Using Random Forest Algorithm to Predict β-Hairpin Motifs

Cited by: 28

Authors:

Jia, Shao-Chun [1]

Hu, Xiu-Zhen [1]

Affiliations:

[1] Inner Mongolia Univ Technol, Coll Sci, Hohhot 010051, Peoples R China

Source:

PROTEIN AND PEPTIDE LETTERS | 2011 / Vol. 18 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Amino acids component of position; auto-correlation function; beta-hairpin motif; hydropathy component of position; predicted secondary structure information; random forest algorithm; AMINO-ACID-COMPOSITION; PROTEIN SECONDARY STRUCTURE; SUBCELLULAR LOCATION; APOPTOSIS PROTEINS; FUNCTIONAL DOMAIN; CLEAVAGE SITES; EUK-MPLOC; CLASSIFICATION; LOCALIZATION; RECOGNITION;

DOI:

10.2174/092986611795222777

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010 ; 081704 ;

Abstract:

A novel method is presented for predicting beta-hairpin motifs in protein sequences. It applies the Random Forest algorithm to multiple characteristic parameters: the amino acid component of position, the hydropathy component of position, predicted secondary structure information, and the value of the auto-correlation function. First, the method is trained and tested on a set of 8,291 beta-hairpin motifs and 6,865 non-beta-hairpin motifs. The overall accuracy and Matthews correlation coefficient reach 82.2% and 0.64 with 5-fold cross-validation, and 81.7% and 0.63 on the independent test. Second, the method is tested on a set of 4,884 beta-hairpin motifs and 4,310 non-beta-hairpin motifs used in previous studies. The overall accuracy and Matthews correlation coefficient reach 80.9% and 0.61 with 5-fold cross-validation, and 80.6% and 0.60 on the independent test, which improves on the previous results. Third, with the 4,884 beta-hairpin motifs and 4,310 non-beta-hairpin motifs as the training set and the 8,291 beta-hairpin motifs and 6,865 non-beta-hairpin motifs as the independent test set, the overall accuracy and Matthews correlation coefficient reach 81.5% and 0.63.
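The two figures of merit quoted in the abstract, overall accuracy and the Matthews correlation coefficient, can both be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions. The function names and the example counts are assumptions made for illustration only; they are not taken from the paper and do not reproduce its results.

```python
import math


def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when one class is never predicted.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: hairpins are the positive class.
tp, tn, fp, fn = 80, 70, 30, 20
print(f"accuracy = {overall_accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size, as they do in both data sets described above.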


Pages: 609-617

Page count: 9

References: 62 in total
