Using Random Forest Algorithm to Predict β-Hairpin Motifs

Times cited: 28
Authors
Jia, Shao-Chun [1]
Hu, Xiu-Zhen [1]
Affiliation
[1] College of Science, Inner Mongolia University of Technology, Hohhot 010051, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Amino acids component of position; auto-correlation function; beta-hairpin motif; hydropathy component of position; predicted secondary structure information; random forest algorithm; AMINO-ACID-COMPOSITION; PROTEIN SECONDARY STRUCTURE; SUBCELLULAR LOCATION; APOPTOSIS PROTEINS; FUNCTIONAL DOMAIN; CLEAVAGE SITES; EUK-MPLOC; CLASSIFICATION; LOCALIZATION; RECOGNITION
DOI
10.2174/092986611795222777
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline code
071010; 081704
Abstract
A novel method is presented for predicting beta-hairpin motifs in protein sequences. It applies the Random Forest algorithm to multiple characteristic parameters: the amino acid component of position, the hydropathy component of position, predicted secondary structure information, and the value of an auto-correlation function. First, the method is trained and tested on a set of 8,291 beta-hairpin motifs and 6,865 non-beta-hairpin motifs; the overall accuracy and Matthews correlation coefficient (MCC) reach 82.2% and 0.64 with 5-fold cross-validation, and 81.7% and 0.63 on the independent test. Second, the method is tested on a set of 4,884 beta-hairpin motifs and 4,310 non-beta-hairpin motifs used in previous studies; the overall accuracy and MCC reach 80.9% and 0.61 with 5-fold cross-validation, and 80.6% and 0.60 on the independent test, which is better than previously reported results on this set. Third, with the 4,884 beta-hairpin and 4,310 non-beta-hairpin motifs as the training set and the 8,291 beta-hairpin and 6,865 non-beta-hairpin motifs as the independent test set, the overall accuracy and MCC reach 81.5% and 0.63.
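The abstract names the ingredients of the pipeline without defining them. The Python sketch below illustrates three of them under stated assumptions: a hydropathy auto-correlation feature, the overall accuracy and MCC used for evaluation, and a 5-fold split. The Kyte-Doolittle scale, the lag-product form of the auto-correlation, and the helper names (`hydropathy_autocorrelation`, `accuracy_and_mcc`, `k_fold_indices`) are hypothetical choices, not taken from the paper. The random forest itself is omitted; a library implementation such as scikit-learn's `RandomForestClassifier` could be fit on four folds and scored on the held-out fold.

```python
import math
import random

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the abstract does not say
# which hydropathy scale the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_autocorrelation(seq, max_lag):
    """Hypothetical feature: R_d = mean of h_i * h_(i+d) for d = 1..max_lag."""
    h = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    features = []
    for d in range(1, max_lag + 1):
        pairs = len(h) - d
        if pairs <= 0:
            features.append(0.0)  # segment too short for this lag
            continue
        features.append(sum(h[i] * h[i + d] for i in range(pairs)) / pairs)
    return features


def accuracy_and_mcc(y_true, y_pred):
    """Overall accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) with a fixed seed and deal it into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

For example, `accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0])` gives an accuracy of 0.75 and an MCC of 2/sqrt(12). MCC is reported alongside accuracy because it stays informative when the hairpin and non-hairpin classes differ in size, as they do in both data sets above.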
Pages: 609-617
Number of pages: 9