Analysis and prediction of functional sub-types from protein sequence alignments

Cited by: 208
Authors
Hannenhalli, SS
Russell, RB
Affiliations
[1] SmithKline Beecham Pharmaceut, Res & Dev, Bioinformat Res Grp, Harlow CM19 5AW, Essex, England
[2] SmithKline Beecham Pharmaceut, Res & Dev, Bioinformat Res Grp, King Of Prussia, PA 19406 USA
Keywords
protein function; protein structure; prediction; sequence alignment;
DOI
10.1006/jmbi.2000.4036
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
The increasing number and diversity of protein sequence families require new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and a set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30%, our best method gives an accuracy of 96% compared to 80% obtained for sequence similarity and 74% for BLAST. We describe the derivation of a set of sub-type groupings from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94% compared to 68% for sequence similarity and 79% for BLAST. We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances. © 2000 Academic Press.
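The idea of scoring alignment positions by comparing sub-type specific profiles can be illustrated with a small sketch. This is not the authors' implementation: the function names, the pseudocount smoothing, the symmetric sum of relative entropies, and the "sub-type versus all other sequences" comparison are all assumptions chosen for illustration, and the abstract does not specify how statistical significance is assessed.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column, pseudocount=1.0):
    """Smoothed residue frequencies for one alignment column (gaps ignored).

    The pseudocount is an assumption made here to avoid zero probabilities.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)


def positional_scores(alignment, subtypes):
    """Score each column by how strongly sub-type profiles differ.

    alignment: list of equal-length aligned sequences (strings).
    subtypes:  list of sub-type labels, one per sequence.
    For every sub-type, the column profile of its members is compared with
    the profile of all remaining sequences, and the symmetric relative
    entropies are summed (a hypothetical scoring choice).
    """
    labels = sorted(set(subtypes))
    scores = []
    for i in range(len(alignment[0])):
        score = 0.0
        for label in labels:
            inside = [s[i] for s, t in zip(alignment, subtypes) if t == label]
            outside = [s[i] for s, t in zip(alignment, subtypes) if t != label]
            p = column_profile(inside)
            q = column_profile(outside)
            score += relative_entropy(p, q) + relative_entropy(q, p)
        scores.append(score)
    return scores
```

For a toy alignment in which only one column separates the two groups, that column receives the highest score and fully conserved columns score zero. A real analysis would still need a significance threshold, which this sketch leaves out.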
Pages: 61-76
Number of pages: 16
Related papers
58 entries in total
[1]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[2]   Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families [J].
Andrade, MA ;
Valencia, A .
BIOINFORMATICS, 1998, 14 (07) :600-607
[3]  
ANDRADE MA, 1999, ISMB, V7, P28
[4]   Shaping of Drosophila alcohol dehydrogenase through evolution: Relationship with enzyme functionality [J].
Atrian, S ;
Sánchez-Pulido, L ;
Gonzàlez-Duarte, R ;
Valencia, A .
JOURNAL OF MOLECULAR EVOLUTION, 1998, 47 (02) :211-221
[5]   Model of the Ran-RCC1 interaction using biochemical and docking experiments [J].
Azuma, Y ;
Renault, L ;
García-Ranea, JA ;
Valencia, A ;
Nishimoto, T ;
Wittinghofer, A .
JOURNAL OF MOLECULAR BIOLOGY, 1999, 289 (04) :1119-1130
[6]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :49-54
[7]   ALSCRIPT - A TOOL TO FORMAT MULTIPLE SEQUENCE ALIGNMENTS [J].
BARTON, GJ .
PROTEIN ENGINEERING, 1993, 6 (01) :37-40
[8]   Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins [J].
Bateman, A ;
Birney, E ;
Durbin, R ;
Eddy, SR ;
Finn, RD ;
Sonnhammer, ELL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :260-262
[9]   Effector recognition by the small GTP-binding proteins Ras and Ral [J].
Bauer, B ;
Mirey, G ;
Vetter, IR ;
García-Ranea, JA ;
Valencia, A ;
Wittinghofer, A ;
Camonis, JH ;
Cool, RH .
JOURNAL OF BIOLOGICAL CHEMISTRY, 1999, 274 (25) :17763-17770
[10]   PairWise and SearchWise: Finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames [J].
Birney, E ;
Thompson, JD ;
Gibson, TJ .
NUCLEIC ACIDS RESEARCH, 1996, 24 (14) :2730-2739