Random Forest classification based on star graph topological indices for antioxidant proteins

Cited by: 49
Authors
Fernandez-Blanco, Enrique [1]
Aguiar-Pulido, Vanessa [1]
Robert Munteanu, Cristian [1]
Dorado, Julian [1]
Affiliations
[1] Univ A Coruna, ICT Dept, Fac Informat, La Coruna 15071, Spain
Keywords
Multi-target QSAR; Star Graph; Topological indices; Antioxidant protein; CODON USAGE; QSAR MODEL; STEADY; LONGEVITY; REPRESENTATION; PREDICTION; SEQUENCE; RULES; INHIBITOR; NETWORKS;
DOI
10.1016/j.jtbi.2012.10.006
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Aging and quality of life are important research topics in areas such as the life sciences, chemistry, and pharmacology. People live longer and therefore want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The resulting graph allows the complex system to be described using topological indices. More specifically, in this work, Randic's Star Networks have been used together with the associated indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques used, Random Forest showed the best performance, correctly classifying 94% of instances. Although the target class (antioxidant proteins) represents a tiny subset of the dataset, the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%. (C) 2012 Elsevier Ltd. All rights reserved.
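The figures reported in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes that the 81.8% figure is the recall of the antioxidant class, that all percentages refer to the full 1999-protein dataset, and that counts can be recovered by rounding. The function name `implied_confusion` is hypothetical, and the resulting confusion matrix is only an approximate reconstruction.

```python
def implied_confusion(total, positives, recall, precision):
    """Estimate confusion-matrix counts from summary figures (by rounding)."""
    tp = round(recall * positives)          # antioxidant proteins found
    predicted_pos = round(tp / precision)   # all proteins flagged as antioxidant
    fp = predicted_pos - tp                 # non-antioxidants flagged by mistake
    fn = positives - tp                     # antioxidant proteins missed
    tn = (total - positives) - fp           # non-antioxidants correctly rejected
    return tp, fp, fn, tn


# Figures quoted in the abstract: 1999 proteins, 324 antioxidant,
# 81.8% of the target class recovered, 81.3% precision.
tp, fp, fn, tn = implied_confusion(1999, 324, 0.818, 0.813)
accuracy = (tp + tn) / 1999

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"implied accuracy = {accuracy:.1%}")
```

Under these assumptions the implied overall accuracy comes out at about 94%, which agrees with the figure stated in the abstract. It also shows why accuracy alone is a weak measure here: with only 324 of 1999 proteins in the target class, a model that labelled every protein as non-antioxidant would already reach roughly 84%.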
Pages: 331-337
Page count: 7
Related papers
59 entries in total
[1]   Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L. [J].
Agüero-Chapin, GA ;
González-Díaz, H ;
Molina, R ;
Varona-Santos, J ;
Uriarte, E ;
González-Díaz, Y .
FEBS LETTERS, 2006, 580 (03) :723-730
[2]   Naive Bayes QSDR classification based on spiral-graph Shannon entropies for protein biomarkers in human colon cancer [J].
Aguiar-Pulido, Vanessa ;
Munteanu, Cristian R. ;
Seoane, Jose A. ;
Fernandez-Blanco, Enrique ;
Perez-Montoto, Lazaro G. ;
Gonzalez-Diaz, Humberto ;
Dorado, Julian .
MOLECULAR BIOSYSTEMS, 2012, 8 (06) :1716-1722
[3]   STEADY-STATE KINETIC-STUDIES WITH THE POLYSULFONATE U-9843, AN HIV REVERSE-TRANSCRIPTASE INHIBITOR [J].
ALTHAUS, IW ;
CHOU, JJ ;
GONZALES, AJ ;
LEMAY, RJ ;
DEIBEL, MR ;
CHOU, KC ;
KEZDY, FJ ;
ROMERO, DL ;
THOMAS, RC ;
ARISTOFF, PA ;
TARPLEY, WG ;
REUSSER, F .
EXPERIENTIA, 1994, 50 (01) :23-28
[4]   The benzylthio-pyrimidine U-31,355, a potent inhibitor of HIV-1 reverse transcriptase [J].
Althaus, IW ;
Chou, KC ;
Lemay, RJ ;
Franks, KM ;
Deibel, MR ;
Kezdy, FJ ;
Resnick, L ;
Busso, ME ;
So, AG ;
Downey, KM ;
Romero, DL ;
Thomas, RC ;
Aristoff, PA ;
Tarpley, WG ;
Reusser, F .
BIOCHEMICAL PHARMACOLOGY, 1996, 51 (06) :743-750
[5]   ALTHAUS IW, 1993, J BIOL CHEM, V268, P6119
[6]   KINETIC-STUDIES WITH THE NONNUCLEOSIDE HIV-1 REVERSE-TRANSCRIPTASE INHIBITOR-U-88204E [J].
ALTHAUS, IW ;
CHOU, JJ ;
GONZALES, AJ ;
DEIBEL, MR ;
CHOU, KC ;
KEZDY, FJ ;
ROMERO, DL ;
PALMER, JR ;
THOMAS, RC ;
ARISTOFF, PA ;
TARPLEY, WG ;
REUSSER, F .
BIOCHEMISTRY, 1993, 32 (26) :6548-6554
[7]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[8]   Distribution moments of 2D-graphs as descriptors of DNA sequences [J].
Bielinska-Waz, Dorota ;
Nowak, Wieslaw ;
Waz, Piotr ;
Nandy, Ashesh ;
Clark, Timothy .
CHEMICAL PHYSICS LETTERS, 2007, 443 (4-6) :408-413
[9]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[10]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32