MUPRED: A tool for bridging the gap between template based methods and sequence profile based methods for protein secondary structure prediction

Cited by: 29
Authors
Bondugula, Rajkumar [1]
Xu, Dong [1]
Affiliation
[1] University of Missouri, Christopher S. Bond Life Sciences Center 271C, Digital Biology Laboratory, Department of Computer Science, Columbia, MO 65211, USA
Keywords
protein secondary structure prediction; fuzzy nearest neighbor; neural network; hybrid prediction system; sequence profile; template; prediction accuracy assessment
DOI
10.1002/prot.21177
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Predicting secondary structure from a protein sequence is an important step in characterizing the structural properties of a protein. Existing methods for protein secondary structure prediction can be broadly classified as template-based or sequence-profile-based. We propose a novel framework that bridges the gap between these two fundamentally different approaches. Our framework uses a neural network to integrate information from the fuzzy k-nearest neighbor algorithm with position-specific scoring matrices. It combines the strengths of the two methods and has greater potential than existing methods to exploit the information in both sequence and structure databases. We implemented the framework in a software system, MUPRED. MUPRED achieves a three-state prediction accuracy (Q3) of 79.2% to 80.14%, depending on the benchmark dataset used. A higher Q3 can be achieved when the query protein has significant sequence identity (>25%) to a template in the PDB. MUPRED also estimates prediction accuracy at the level of individual residues more quantitatively than existing methods. The MUPRED web server and executables are freely available at http://digbio.missouri.edu/mupred. Proteins 2007; 66:664-670. © 2006 Wiley-Liss, Inc.
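The abstract says MUPRED feeds fuzzy k-nearest-neighbor information and position-specific scoring matrices into a neural network, and reports results as Q3, but it does not give the feature encoding, distance measure, value of k, or fuzzifier. The sketch below illustrates only the two quantities the abstract relies on, under stated assumptions: per-residue fuzzy memberships in the three states (H, E, C), computed with the common fuzzy k-NN inverse-distance weighting 1/d^(2/(m-1)) over crisply labelled neighbours with Euclidean distance and m = 2, and the Q3 accuracy as the percentage of residues whose predicted state matches the observed one. The function names, parameters, and toy feature vectors are hypothetical; this is not MUPRED's implementation.

```python
import math

# Three-state alphabet used for Q3: H (helix), E (strand), C (coil).
STATES = ("H", "E", "C")


def fuzzy_knn_memberships(query, templates, k=3, m=2.0, eps=1e-9):
    """Fuzzy k-NN memberships of one residue in each of the three states.

    query     -- feature vector for the residue (hypothetical encoding)
    templates -- list of (feature_vector, state) pairs with crisp H/E/C labels
    k         -- number of nearest templates to use
    m         -- fuzzifier (> 1); each neighbour is weighted 1 / d ** (2 / (m - 1))
    Returns a dict mapping state -> membership; memberships sum to 1.
    """
    if not templates or k < 1 or m <= 1:
        raise ValueError("need templates, k >= 1 and m > 1")
    # Euclidean distance from the query to every labelled template, nearest first.
    nearest = sorted((math.dist(query, vec), state) for vec, state in templates)[:k]
    weights = dict.fromkeys(STATES, 0.0)
    for d, state in nearest:
        # Closer neighbours get larger weights; eps guards against d == 0.
        weights[state] += 1.0 / max(d, eps) ** (2.0 / (m - 1.0))
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example with made-up 2-D feature vectors (hypothetical data).
templates = [([0.0, 0.0], "H"), ([0.1, 0.0], "H"), ([1.0, 1.0], "E"), ([2.0, 2.0], "C")]
mu = fuzzy_knn_memberships([0.0, 0.1], templates, k=3)
predicted_state = max(mu, key=mu.get)  # "H": two of the three nearest templates are helical
print(mu, predicted_state)
print(q3("HHEC", "HHCC"))  # 75.0 (3 of 4 residues correct)
```

Taking the argmax of the memberships gives a purely template-based state call; the membership values themselves are the kind of soft per-state signal a downstream network could take alongside profile features. How MUPRED actually encodes and combines these inputs is not described in the abstract.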
Pages: 664-670
Number of pages: 7
References (23 in total)
[1] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[2] Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 1999, 15(11): 937-946.
[3] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research, 2000, 28(1): 235-242.
[4] Bondugula R. Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, 2001.
[5] Brenner SE, Koehl P, Levitt M. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Research, 2000, 28(1): 254-256.
[6] Cheng HT, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL. Prediction of protein secondary structure by mining structural fragment database. Polymer, 2005, 46(12): 4314-4321.
[7] Cover TM, Hart PE. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 1967, 13(1): 21-27.
[8] Fukunaga K, Hostetler LD. K-nearest-neighbor Bayes-risk estimation. IEEE Transactions on Information Theory, 1975, 21(3): 285-293.
[9] Geourjon C, Deleage G. SOPM: a self-optimized method for protein secondary structure prediction. Protein Engineering, 1994, 7(2): 157-164.
[10] Hobohm U. Protein Science, 1994, 3: 522.