Improved method for predicting protein fold patterns with ensemble classifiers

Cited by: 22
Authors
Chen, W. [1]
Liu, X. [1, 3, 4]
Huang, Y. [2]
Jiang, Y. [1]
Zou, Q. [1]
Lin, C. [1]
Affiliations
[1] Xiamen Univ, Sch Informat Sci & Technol, Xiamen, Fujian, Peoples R China
[2] Henan Univ Sci & Technol, Anim Sci & Technol Coll, Luoyang, Henan, Peoples R China
[3] Xiamen Univ, Shenzhen Res Inst, Guangzhou, Guangdong, Peoples R China
[4] Dalian Univ, Minist Educ, Key Lab Adv Design & Intelligent Comp, Dalian, Peoples R China
Keywords
Protein folding pattern; Ensemble classifier; Machine learning; Bioinformatics; Classification; Database
DOI
10.4238/2012.January.27.4
CLC number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Protein folding is recognized as a critical problem in biophysics in the 21st century. Predicting protein folding patterns is challenging because of the complex structure of proteins. To address this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Both methods achieved higher prediction accuracy than traditional prediction methods. The 188-dimensional feature-based method achieved 71.2% accuracy in five-fold cross-validation, and accuracy rose to 77% with the 20-dimensional feature vector. When applied to recent data, the methods achieved 54.2% accuracy. Source code and datasets, together with a web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
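The abstract does not spell out how the feature vectors are built or how the classifiers are combined. The following is a minimal, hypothetical Python sketch under stated assumptions, not the authors' code. It assumes that the 188-D vector follows the widely used decomposition of 20 amino acid composition values plus 21 composition/transition/distribution (CTD) values for each of eight three-group physicochemical properties (20 + 8 × 21 = 188), and it shows only one such property (hydrophobicity). It also assumes that the 20-D PSSM feature is a plain column average of the L × 20 matrix (the paper's "coupled" construction may differ) and that the ensemble combines base-classifier outputs by majority vote. All function names (`amino_acid_composition`, `ctd`, `pssm_mean_20d`, `majority_vote`) and the `HYDROPHOBICITY` grouping constant are illustrative.

```python
from collections import Counter
from math import ceil
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed example property: Dubchak-style hydrophobicity split into
# polar / neutral / hydrophobic groups. A 188-D vector would use eight
# such three-group properties (20 + 8 * 21 = 188); only one is shown here.
HYDROPHOBICITY = ("RKEDQN", "GASTPHY", "CLVIMFW")


def amino_acid_composition(seq: str) -> List[float]:
    """20-D composition: fraction of the sequence made up by each residue type."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def ctd(seq: str, groups: Sequence[str]) -> List[float]:
    """21-D Composition/Transition/Distribution vector for one 3-group property.

    Assumes seq contains at least one residue covered by `groups`.
    """
    # Map each residue to its group index (0, 1 or 2); other symbols are skipped.
    lookup: Dict[str, int] = {aa: g for g, members in enumerate(groups) for aa in members}
    labels = [lookup[aa] for aa in seq if aa in lookup]
    n = len(labels)

    # Composition: fraction of residues in each group (3 values).
    comp = [labels.count(g) / n for g in range(3)]

    # Transition: fraction of adjacent pairs switching between groups i and j,
    # in either direction (3 values).
    pairs = list(zip(labels, labels[1:]))
    trans = [
        sum(1 for a, b in pairs if {a, b} == {i, j}) / max(len(pairs), 1)
        for i, j in ((0, 1), (0, 2), (1, 2))
    ]

    # Distribution: position (as % of length) of the first residue and of the
    # residues marking 25%, 50%, 75% and 100% of each group (5 x 3 = 15 values).
    dist: List[float] = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                dist.append(0.0)
                continue
            k = max(1, ceil(frac * len(positions)))  # rounding conventions vary
            dist.append(100.0 * positions[k - 1] / n)

    return comp + trans + dist


def pssm_mean_20d(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Collapse an L x 20 PSSM into 20 values by averaging each column."""
    return [sum(row[c] for row in pssm) / len(pssm) for c in range(20)]


def majority_vote(predictions: Sequence[str]) -> str:
    """Combine base-classifier labels; ties go to the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    features = amino_acid_composition(seq) + ctd(seq, HYDROPHOBICITY)
    print(len(features))  # 41 = 20 + 21
    print(majority_vote(["fold_1", "fold_2", "fold_1"]))  # fold_1
```

Adding the remaining seven property groupings would extend the 41 values shown here to 188. Majority voting is only one way to combine base classifiers; weighted voting or stacking are common alternatives.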
Pages: 174-181
Page count: 8