A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem

Cited by: 49
Authors
Dehzangi, Abdollah [1]
Paliwal, Kuldip [1]
Sharma, Alok [1]
Dehzangi, Omid [2]
Sattar, Abdul [1]
Affiliations
[1] Griffith Univ, Inst Integrated & Intelligent Syst, Nathan, Qld 4111, Australia
[2] Univ Texas Dallas, Dept Elect Engn, Embedded Syst & Signal Proc Lab, Richardson, TX 75080 USA
Keywords
Mixture of feature extraction models; overlapped segmented distribution; overlapped segmented autocorrelation; ensemble of different classifiers; physicochemical-based features; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; SEQUENCES; DATABASE; ALGORITHM; HOMOLOGY; FUSION; MODEL
DOI
10.1109/TCBB.2013.65
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Knowing the structural class of a protein reveals important information about its overall folding type and its domain. It also provides critical information about the protein's general tertiary structure, which has a profound impact on protein function determination and drug design. Despite substantial improvements achieved by pattern recognition-based approaches, protein structural class prediction remains an unsolved problem in bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and overlapped segmented autocorrelation feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely AdaBoost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhanced protein structural class prediction accuracy on four popular benchmarks.
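To make the idea of segment-wise autocorrelation features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact definitions, so everything here is an assumption: the attribute (the Kyte-Doolittle hydropathy index, chosen only for illustration), the autocorrelation form AC(d) = (1/(n-d)) Σ x_i x_{i+d}, the fixed window length and step, and all function names are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index; an illustrative attribute only
# (the abstract does not list the 15 attributes that were selected).
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(values: List[float], max_lag: int) -> List[float]:
    """Return AC(d) = (1 / (n - d)) * sum_i values[i] * values[i + d] for d = 1..max_lag."""
    n = len(values)
    feats = []
    for d in range(1, max_lag + 1):
        if d >= n:
            # Lag is longer than the segment: no pairs exist, emit 0.0.
            feats.append(0.0)
            continue
        total = sum(values[i] * values[i + d] for i in range(n - d))
        feats.append(total / (n - d))
    return feats


def overlapped_segments(values: List[float], segment_len: int, step: int) -> List[List[float]]:
    """Split values into windows of segment_len; step < segment_len makes them overlap."""
    if segment_len <= 0 or step <= 0:
        raise ValueError("segment_len and step must be positive")
    if len(values) <= segment_len:
        return [list(values)]
    return [values[s:s + segment_len]
            for s in range(0, len(values) - segment_len + 1, step)]


def segmented_autocorrelation(sequence: str, segment_len: int = 10,
                              step: int = 5, max_lag: int = 3) -> List[float]:
    """Concatenate autocorrelation features computed on each overlapping segment."""
    values = [HYDROPATHY[aa] for aa in sequence]
    feats: List[float] = []
    for segment in overlapped_segments(values, segment_len, step):
        feats.extend(autocorrelation(segment, max_lag))
    return feats


features = segmented_autocorrelation("MKVLAAGIVGLLLAQWTRSE")
```

With fixed-size windows the feature vector grows with sequence length, so a classifier would need segments defined relative to the protein length, or some pooling step, to obtain a fixed-size input; that design choice is left open in this sketch.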
Pages: 564-575
Number of pages: 12