Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices

Cited by: 64
Authors
Munteanu, Cristian Robert [2 ]
Gonzalez-Diaz, Humberto [1 ]
Magalhaes, Alexandre L. [2 ]
Affiliations
[1] Univ Santiago de Compostela, Fac Pharm, Dept Microbiol & Parasitol, Santiago De Compostela 15782, Spain
[2] Univ Porto, Dept Chem, REQUIMTE, Fac Sci, P-4169007 Oporto, Portugal
Keywords
protein models; protein secondary structures; star graph; Python application
DOI
10.1016/j.jtbi.2008.06.003
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The large number of new proteins that need fast characterization of their enzymatic activity creates a demand for theoretical protein QSAR models. The protein parameters that can be used for enzyme/non-enzyme classification include simpler indices such as composition, sequence and connectivity indices, the last also called topological indices (TIs), and the computationally expensive 3D descriptors. A comparison of 3D versus lower-dimension indices has not been reported with respect to their power to discriminate proteins according to enzyme action. A set of 966 proteins (enzymes and non-enzymes) whose structural characteristics are provided by PDB/DSSP files was analyzed with Python/Biopython scripts, STATISTICA and Weka. The list of indices includes, but is not restricted to, pure composition indices (residue fractions), DSSP secondary structure protein composition and 3D indices (surface and access). We also used mixed indices such as composition-sequence indices (Chou's pseudo-amino acid compositions or coupling numbers), 3D-composition (surface fractions) and DSSP secondary structure amino acid composition/propensities (obtained with our Prot-2S Web tool). In addition, we extend and test for the first time several classic TIs for Randic's protein sequence Star graphs using our Sequence to Star Graph (S2SG) Python application. All the indices were processed with general discriminant analysis (GDA) models, neural networks (NN) and machine learning (ML) methods, and the results are presented versus complexity, average Shannon information entropy (Sh) and data/method type. This study compares for the first time all these classes of indices to assess the ratios between model accuracy and indices/model complexity in enzyme/non-enzyme discrimination. The use of different methods and data of different complexity shows that one cannot establish a direct relation between the complexity and the accuracy of the model. © 2008 Elsevier Ltd. All rights reserved.
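As an illustration of the simplest index class named in the abstract, the pure composition indices (residue fractions), the following is a minimal Python sketch. It is not the authors' code: the function names, the 20-letter alphabet constant and the toy sequence are assumptions made here, and the Shannon entropy is computed over the residue-fraction distribution only as an example of the quantity, which may differ from how Sh is defined in the paper.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_fractions(sequence):
    """Return the fraction of each standard residue in a protein sequence."""
    seq = sequence.upper()
    # Count only standard residues; other symbols (e.g. X) are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def shannon_entropy(fractions):
    """Shannon entropy (in bits) of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in fractions.values() if p > 0)


# Hypothetical toy sequence, used only to exercise the functions.
fractions = residue_fractions("ACDAAC")
entropy = shannon_entropy(fractions)
print(fractions["A"], fractions["C"], fractions["D"])
print(entropy)
```

The resulting 20-element fraction vector is the kind of low-complexity descriptor that could be passed to a classifier, whereas the 3D and topological indices discussed in the abstract require structure files or graph construction.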
Pages: 476-482
Number of pages: 7