PPRODO: Prediction of protein domain boundaries using neural networks

Cited by: 60
Authors
Sim, J [1]
Kim, SY [1]
Lee, J [1]
Affiliations
[1] Korea Inst Adv Study, Sch Computat Sci, Seoul 130722, South Korea
Keywords
protein domains; domain boundary prediction; neural network
DOI
10.1002/prot.20442
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries) that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with accuracy of about 66% using the 20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/. (c) 2005 Wiley-Liss, Inc.
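The evaluation setup named in the abstract can be illustrated with a minimal Python sketch. This is not the PPRODO code. It assumes that the "20 residue criterion" means a predicted boundary counts as correct when it lies within 20 residues of the annotated boundary, that the 10 folds are assigned round-robin, and that network inputs are fixed-width windows of PSSM rows. The function names, the window half-width, and the zero padding at the chain termini are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def boundary_accuracy(predicted: Sequence[int],
                      annotated: Sequence[int],
                      tolerance: int = 20) -> float:
    """Fraction of proteins whose predicted boundary lies within
    `tolerance` residues of the annotated one (assumed reading of
    the "20 residue criterion")."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, annotated))
    return hits / len(predicted)


def ten_fold_splits(n_items: int, n_folds: int = 10) -> List[List[int]]:
    """Assign item indices to folds round-robin (an assumed scheme);
    each fold serves once as the held-out test set."""
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for i in range(n_items):
        folds[i % n_folds].append(i)
    return folds


def pssm_window(pssm: Sequence[Sequence[float]],
                center: int,
                half_width: int) -> List[float]:
    """Flatten the PSSM rows around `center` into one feature vector,
    padding with zeros beyond the chain termini (hypothetical choice)."""
    n_cols = len(pssm[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    print(boundary_accuracy([100, 150, 200], [110, 180, 200]))
    print([len(fold) for fold in ten_fold_splits(522)])
```

With 522 proteins, the round-robin split gives two folds of 53 proteins and eight folds of 52. A real pipeline would read the PSSM from PSI-BLAST output and train a network on the windowed features, which this sketch leaves out.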
Pages: 627-632
Page count: 6