Combining protein secondary structure prediction models with ensemble methods of optimal complexity

Cited by: 22
Authors
Guermeur, Y
Pollastri, G
Elisseeff, A
Zelus, D
Paugam-Moisy, H
Baldi, P
Affiliations
[1] Univ Nancy 1, LORIA, F-54506 Vandoeuvre Les Nancy, France
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Dept Informat & Comp Sci, Irvine, CA 92697 USA
[3] Max Planck Inst Biol Cybernet, D-72076 Tubingen, Germany
[4] CIBIO, Wiener Lab, RA-2000 Rosario, Santa Fe, Argentina
[5] Univ Lyon 2, UMR CNRS 5015, ISC, F-69675 Bron, France
Keywords
protein secondary structure prediction; multi-class support vector machines (M-SVMs); ensemble methods; hierarchical sequence processing systems
DOI
10.1016/j.neucom.2003.10.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many sophisticated methods are currently available to perform protein secondary structure prediction. Since they are frequently based on different principles, and different knowledge sources, significant benefits can be expected from combining them. However, the choice of an appropriate combiner appears to be an issue in its own right. The first difficulty to overcome when combining prediction methods is overfitting. This is the reason why we investigate the implementation of Support Vector Machines to perform the task. A family of multi-class SVMs is introduced. Two of these machines are used to combine some of the current best protein secondary structure prediction methods. Their performance is consistently superior to the performance of the ensemble methods traditionally used in the field. They also outperform the decomposition approaches based on bi-class SVMs. Furthermore, initial experimental evidence suggests that their outputs could be processed by the biologist to perform higher-level treatments. (C) 2003 Elsevier B.V. All rights reserved.
Pages: 305-327
Page count: 23