Bayesian network multi-classifiers for protein secondary structure prediction

Times cited: 37
Authors
Robles, V [1]
Larrañaga, P
Peña, JM
Menasalvas, E
Pérez, MS
Herves, V
Wasilewska, A
Affiliations
[1] Tech Univ Madrid, Dept Comp Architecture & Technol, Madrid, Spain
[2] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, San Sebastian, Spain
[3] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
Keywords
multi-classifier; supervised classification; machine learning; stacked generalization; Bayesian networks; protein secondary structure prediction; Pazzani-EDA
DOI
10.1016/j.artmed.2004.01.009
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Successful secondary structure predictions provide a starting point for direct tertiary structure modelling, and can also significantly improve sequence analysis and sequence-structure threading, aiding in structure and function determination. Hence, improving the predictive accuracy of secondary structure prediction is essential for the future development of the whole field of protein research. In this work we present several multi-classifiers that combine the predictions of the best current classifiers available on the Internet. Our results show that combining the predictions of a set of classifiers into composite classifiers is a fruitful approach: we have created multi-classifiers that are more accurate than any of their component classifiers. The multi-classifiers are based on Bayesian networks and are validated on 9 different datasets. Their predictive accuracy exceeds that of the best secondary structure predictors by 1.21% on average. Our main contributions are: (i) we improved the best known predictive accuracy by 1.21%; (ii) our best results were obtained with a new semi-naive Bayes approach named Pazzani-EDA; and (iii) our multi-classifiers combine the predictions of previously built classifiers, obtained through the Internet by means of a Java application we developed. (C) 2004 Elsevier B.V. All rights reserved.
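The combination step described in the abstract (a Bayesian-network classifier that takes the outputs of several existing predictors as its inputs, in the spirit of stacked generalization) can be illustrated with a minimal sketch. It assumes a three-state alphabet (H = helix, E = strand, C = coil) and that each component predictor returns exactly one label per residue. It uses a plain naive Bayes combiner, the simplest Bayesian network classifier, with Laplace smoothing; it is not the authors' Pazzani-EDA model, and the function name `train_naive_bayes_stacker` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed three-state secondary structure alphabet: helix, strand, coil.
STATES = ("H", "E", "C")


def train_naive_bayes_stacker(predictions, truth, alpha=1.0):
    """Hypothetical naive Bayes combiner over component-classifier outputs.

    predictions: one tuple per residue, holding the label predicted by each
                 component classifier for that residue.
    truth:       the observed secondary structure label of each residue.
    alpha:       Laplace smoothing constant.
    """
    n_components = len(predictions[0])
    total = len(truth)
    class_counts = Counter(truth)
    # cond[j][c][v]: how often component j predicted v when the true class was c.
    cond = [defaultdict(Counter) for _ in range(n_components)]
    for preds, c in zip(predictions, truth):
        for j, v in enumerate(preds):
            cond[j][c][v] += 1

    def log_posterior(preds, c):
        # log P(c) + sum_j log P(pred_j | c), both smoothed.
        lp = math.log((class_counts[c] + alpha) / (total + alpha * len(STATES)))
        for j, v in enumerate(preds):
            lp += math.log((cond[j][c][v] + alpha)
                           / (class_counts[c] + alpha * len(STATES)))
        return lp

    def predict(preds):
        # Pick the class with the highest (unnormalized) posterior.
        return max(STATES, key=lambda c: log_posterior(preds, c))

    return predict


# Toy example: three component predictors, six residues (made-up data).
toy_predictions = [("H", "H", "C"), ("H", "E", "H"), ("E", "E", "E"),
                   ("C", "C", "H"), ("C", "C", "C"), ("E", "C", "E")]
toy_truth = ["H", "H", "E", "C", "C", "E"]
predict = train_naive_bayes_stacker(toy_predictions, toy_truth)
print(predict(("H", "H", "C")))  # combined label for a residue
```

In this toy setting the combiner learns how reliable each component is for each true class, so a component that is usually right about strands carries more weight when it predicts E. A semi-naive approach such as the one named in the abstract would additionally decide which input variables to join or drop, which this sketch does not attempt.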
Pages: 117-136
Number of pages: 20
References
38 in total
[1] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[2] Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 1999, 15(11): 937-946.
[3] Baldi P, 1999, P 16 INT JOINT ART I
[4] Barton G, 1988, J MOL BIOL, V195, P957
[5] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research, 2000, 28(1): 235-242.
[6] Cuff JA, 1999, PROTEINS, V34, P508, DOI 10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4
[8] Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ. JPred: a consensus secondary structure prediction server. Bioinformatics, 1998, 14(10): 892-893.
[9] Duda RO, 1973, PATTERN CLASSIFICATI
[10] Frishman D, 1997, PROTEINS, V27, P329, DOI 10.1002/(SICI)1097-0134(199703)27:3<329::AID-PROT1>3.0.CO