Boosting classifier for predicting protein domain structural class

Cited by: 127
Authors
Feng, KY
Cai, YD
Chou, KC
Affiliations
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] Univ Manchester, Sch Med, Manchester M13 9PT, Lancs, England
[3] Univ Manchester, Inst Sci & Technol, Biomol Sci Dept, Manchester M60 1QD, Lancs, England
Keywords
domain structural classification; binary LogitBoost; one-vs-others LogitBoost; AdaBoost; support vector machines; neural network
DOI
10.1016/j.bbrc.2005.06.075
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
A novel classifier, the so-called "LogitBoost" classifier, was introduced to predict the structural class of a protein domain from its amino acid sequence. LogitBoost uses a log-likelihood loss function to reduce sensitivity to noise and outliers, and it classifies by combining many weak classifiers into a single strong and robust classifier. Jackknife cross-validation tests demonstrated that LogitBoost outperformed other classifiers, including the support vector machine, a very powerful classifier widely used in the biological literature. It is anticipated that LogitBoost can also become a useful vehicle for classifying other attributes of proteins according to their sequences, such as subcellular localization and enzyme family class, among many others. (c) 2005 Elsevier Inc. All rights reserved.
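To make the two ideas in the abstract concrete (boosting weak learners under a log-likelihood loss, and jackknife testing), here is a minimal sketch of binary LogitBoost with regression stumps as the weak learner, evaluated by leave-one-out. It follows the standard textbook formulation of binary LogitBoost and is not the authors' implementation: the stump learner, the clipping constants, the function names, and the two-feature toy data are all assumptions made for illustration. This record does not say which sequence features the paper used, and the multi-class one-vs-others variant named in the keywords is not shown.

```python
import math

def _prob(f):
    """p(y=1|x) = 1 / (1 + exp(-2F)); F is clamped to avoid overflow."""
    f = max(-30.0, min(30.0, f))
    return 1.0 / (1.0 + math.exp(-2.0 * f))

def stump_value(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right

def fit_stump(X, z, w):
    """Weighted least-squares regression stump (the assumed weak learner)."""
    mean = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    # Fallback: a constant stump, used if no feature can be split.
    best, best_err = (0, float("inf"), mean, mean), float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between consecutive values
            sw_l = sz_l = sw_r = sz_r = 0.0
            for row, zi, wi in zip(X, z, w):
                if row[j] <= t:
                    sw_l += wi
                    sz_l += wi * zi
                else:
                    sw_r += wi
                    sz_r += wi * zi
            if sw_l == 0.0 or sw_r == 0.0:
                continue
            cand = (j, t, sz_l / sw_l, sz_r / sw_r)
            err = sum(wi * (zi - stump_value(cand, row)) ** 2
                      for row, zi, wi in zip(X, z, w))
            if err < best_err:
                best, best_err = cand, err
    return best

def logitboost_fit(X, y, rounds=10):
    """Binary LogitBoost; y holds labels in {0, 1}. Returns a list of stumps."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [_prob(f) for f in F]
        # Newton weights and working responses of the log-likelihood loss.
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        z = [max(-4.0, min(4.0, (yi - pi) / wi))
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_value(stump, row) for f, row in zip(F, X)]
    return stumps

def logitboost_predict(stumps, row):
    score = sum(0.5 * stump_value(s, row) for s in stumps)
    return 1 if score > 0 else 0

def jackknife_accuracy(X, y, rounds=10):
    """Leave-one-out: train on all samples but one, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = logitboost_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], rounds)
        correct += int(logitboost_predict(model, X[i]) == y[i])
    return correct / len(X)

# Hypothetical toy data: two made-up features per sample, two classes.
X_toy = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.5], [0.4, 0.7],
         [0.6, 0.2], [0.7, 0.8], [0.8, 0.4], [0.9, 0.6]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

model = logitboost_fit(X_toy, y_toy, rounds=10)
print([logitboost_predict(model, row) for row in X_toy])
print(jackknife_accuracy(X_toy, y_toy, rounds=10))
```

Each round refits the weak learner to reweighted residuals, so samples the current ensemble already classifies confidently receive small weights. Clipping the working response is a common safeguard that limits how much a single mislabeled or outlying sample can pull on any one round.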
Pages: 213-217
Number of pages: 5