RECURRING LOCAL SEQUENCE MOTIFS IN PROTEINS

Cited by: 62
Authors
HAN, KF
BAKER, D
Affiliations
[1] UNIV WASHINGTON,DEPT BIOCHEM,SEATTLE,WA 98195
[2] UNIV CALIF SAN FRANCISCO,GRAD GRP BIOPHYS,SAN FRANCISCO,CA 94143
Funding
U.S. National Science Foundation;
Keywords
MULTIPLE SEQUENCE ALIGNMENTS; SEQUENCE COMPARISON; SUBSTITUTION MATRICES; PROTEIN STRUCTURE PREDICTION; SEQUENCE MOTIFS;
DOI
10.1006/jmbi.1995.0424
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We describe a completely automated approach to identifying local sequence motifs that transcend protein family boundaries. Cluster analysis is used to identify recurring patterns of variation at single positions and in short segments of contiguous positions in multiple sequence alignments for a non-redundant set of protein families. Parallel experiments on simulated data sets constructed with the overall residue frequencies of proteins but not the inter-residue correlations show that naturally occurring protein sequences are significantly more clustered than the corresponding random sequences for window lengths ranging from one to 13 contiguous positions. The patterns of variation at single positions are not in general surprising: chemically similar amino acids tend to be grouped together. More interesting patterns emerge as the window length increases. The patterns of variation for longer window lengths are in part recognizable patterns of hydrophobic and hydrophilic residues, and in part less obvious combinations. A particularly interesting class of patterns features highly conserved glycine residues. The patterns provide a means to abstract the information contained in multiple sequence alignments and may be useful for comparison of distantly related sequences or sequence families and for protein structure prediction. (C) 1995 Academic Press Limited
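The abstract names only "cluster analysis" of patterns of variation in multiple sequence alignments and does not specify the algorithm here. As a rough illustration of the general idea, and not the authors' procedure, the sketch below turns each alignment column into a residue-frequency profile, optionally concatenates contiguous columns into windows, and groups the profiles with plain k-means. The function names, the choice of k-means, the squared Euclidean distance, and the toy alignment are all assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profiles(alignment):
    """Return one 20-dimensional residue-frequency vector per alignment column."""
    profiles = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            continue  # skip columns that contain only gaps or unknown symbols
        profiles.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profiles


def window_profiles(profiles, width):
    """Concatenate `width` contiguous column profiles into one vector per window."""
    return [
        [x for profile in profiles[start:start + width] for x in profile]
        for start in range(len(profiles) - width + 1)
    ]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (assumed, for determinism)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical toy alignment: hydrophobic, charged, glycine, hydrophobic columns.
alignment = ["LKGI", "IKGL", "VRGV", "LEGI"]
profiles = column_profiles(alignment)
labels, _ = kmeans(profiles, k=3)
pairs = window_profiles(profiles, width=2)
```

In this toy case the two hydrophobic columns share a cluster while the charged column and the fully conserved glycine column each form their own. A comparison against simulated sequences, as the abstract describes, would repeat the same clustering on data generated from overall residue frequencies alone.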
Pages: 176-187
Page count: 12