DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning

Citations: 52
Authors
Eickholt, Jesse [1]
Deng, Xin [1]
Cheng, Jianlin [1, 2, 3]
Affiliations
[1] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Inst Informat, Columbia, MO 65211 USA
[3] Univ Missouri, C Bond Life Sci Ctr, Columbia, MO 65211 USA
Keywords
CLASSIFICATION; DATABASE; LINKERS; HISTORY; SEARCH;
DOI
10.1186/1471-2105-12-43
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved. Results: We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between the precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines. Conclusions: The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
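The precision/recall trade-off described in the abstract can be illustrated with a minimal sketch. This is not DoBo's implementation: the scores and labels below are made-up values standing in for SVM boundary scores and true boundary annotations, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when sites scoring >= threshold are called boundaries."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Conventions for empty denominators: no predictions -> precision 1.0,
    # no true boundaries -> recall 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical scores for eight putative boundary sites (1 = true boundary).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    for threshold in (0.25, 0.5, 0.75):
        p, r = precision_recall(scores, labels, threshold)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold keeps only the highest-scoring sites, so precision rises while recall falls; lowering it does the opposite.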
Pages: 8