Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features

Cited by: 35
Authors
Bacardit, Jaume [1]
Widera, Pawel [1]
Marquez-Chamorro, Alfonso [2]
Divina, Federico [2]
Aguilar-Ruiz, Jesus S. [2]
Krasnogor, Natalio [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Interdisciplinary Comp & Complex Syst ICOS Res Gr, Nottingham NG8 1BB, England
[2] Pablo de Olavide Univ, Sch Engn, Seville 41013, Spain
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
PROTEIN SECONDARY STRUCTURE; RESIDUE-RESIDUE CONTACTS; DATABASE;
DOI
10.1093/bioinformatics/bts472
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: In recent years, the prediction of a protein's contact map has become a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation.
Results: The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions.
Pages: 2441-2448
Page count: 8