RECURRING LOCAL SEQUENCE MOTIFS IN PROTEINS

Times cited: 62
Authors
HAN, KF
BAKER, D
Affiliations
[1] UNIV WASHINGTON,DEPT BIOCHEM,SEATTLE,WA 98195
[2] UNIV CALIF SAN FRANCISCO,GRAD GRP BIOPHYS,SAN FRANCISCO,CA 94143
Funding
National Science Foundation (USA);
Keywords
MULTIPLE SEQUENCE ALIGNMENTS; SEQUENCE COMPARISON; SUBSTITUTION MATRICES; PROTEIN STRUCTURE PREDICTION; SEQUENCE MOTIFS;
DOI
10.1006/jmbi.1995.0424
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
We describe a completely automated approach to identifying local sequence motifs that transcend protein family boundaries. Cluster analysis is used to identify recurring patterns of variation at single positions and in short segments of contiguous positions in multiple sequence alignments for a non-redundant set of protein families. Parallel experiments on simulated data sets constructed with the overall residue frequencies of proteins but not the inter-residue correlations show that naturally occurring protein sequences are significantly more clustered than the corresponding random sequences for window lengths ranging from one to 13 contiguous positions. The patterns of variation at single positions are not in general surprising: chemically similar amino acids tend to be grouped together. More interesting patterns emerge as the window length increases. The patterns of variation for longer window lengths are in part recognizable patterns of hydrophobic and hydrophilic residues, and in part less obvious combinations. A particularly interesting class of patterns features highly conserved glycine residues. The patterns provide a means to abstract the information contained in multiple sequence alignments and may be useful for comparison of distantly related sequences or sequence families and for protein structure prediction. (C) 1995 Academic Press Limited
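The abstract does not state how alignment windows are represented, which distance is used, or which clustering algorithm is applied. The following is therefore only an illustrative sketch under stated assumptions: each run of w contiguous alignment columns is encoded as the concatenation of its per-column amino acid frequency profiles, plain k-means is substituted for the paper's unspecified cluster analysis, and the simulated control keeps the alignment's overall residue frequencies while drawing every residue independently, so no inter-residue correlations survive. The function names, the toy alignment, and the "explained fraction" statistic (1 - WCSS/TSS) are hypothetical choices made for this example, not the authors' method.

```python
import random
from collections import Counter

# The 20 standard amino acids; gaps ('-') and any other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column):
    """Amino acid frequencies in one alignment column, ignoring gaps."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def window_vectors(alignment, w):
    """One vector per run of w contiguous columns: the w profiles concatenated."""
    n_cols = len(alignment[0])
    profiles = [column_profile([seq[j] for seq in alignment]) for j in range(n_cols)]
    return [[x for p in profiles[i:i + w] for x in p] for i in range(n_cols - w + 1)]


def null_alignment(alignment, rng):
    """Same shape and gap pattern, but every residue is drawn independently
    from the alignment's overall residue frequencies (assumed null model)."""
    pool = [c for seq in alignment for c in seq if c in AMINO_ACIDS]
    return ["".join(rng.choice(pool) if c in AMINO_ACIDS else c for c in seq)
            for seq in alignment]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, rng, iters=100):
    """Plain Lloyd's k-means with Euclidean distance; returns (centroids, labels)."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels


def explained_fraction(vectors, centroids):
    """Hypothetical statistic 1 - WCSS/TSS: share of total spread the clusters capture."""
    mean = [sum(xs) / len(vectors) for xs in zip(*vectors)]
    tss = sum(sq_dist(v, mean) for v in vectors)
    wcss = sum(min(sq_dist(v, c) for c in centroids) for v in vectors)
    return 1.0 - wcss / tss if tss > 0 else 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy alignment: four sequences, twelve columns.
    aln = ["GAVLIGKDEGAV", "GAVMIGKDQGAL", "GSVLVGRDEGSV", "GAILIGKEEGAV"]
    for w in (1, 3):
        real = window_vectors(aln, w)
        null = window_vectors(null_alignment(aln, rng), w)
        f_real = explained_fraction(real, kmeans(real, 3, rng)[0])
        f_null = explained_fraction(null, kmeans(null, 3, rng)[0])
        print(f"w={w}: real {f_real:.3f}  null {f_null:.3f}")
```

A scale-free statistic is used here because conserved columns produce profile vectors with larger norms than randomized ones, so raw within-cluster sums of squares would not be comparable between real and simulated data. A real analysis would use many families, many simulated replicates, and a significance test rather than a single toy alignment.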
Pages: 176-187
Number of pages: 12
References (20 in total)
[1] ALTSCHUL SF, 1989, J MOL BIOL, V20, P647
[2] [Anonymous], 1995, CLUSTER ANAL
[3] [Anonymous], 1978, ATLAS PROTEIN SEQ ST
[4] AURORA R, SRINIVASAN R, ROSE GD. RULES FOR ALPHA-HELIX TERMINATION BY GLYCINE. SCIENCE, 1994, 264(5162): 1126-1130.
[5] BOWIE JU, LUTHY R, EISENBERG D. A METHOD TO IDENTIFY PROTEIN SEQUENCES THAT FOLD INTO A KNOWN 3-DIMENSIONAL STRUCTURE. SCIENCE, 1991, 253(5016): 164-170.
[6] BROWN M, 1993, 1ST P INT C INT SYST, P47
[7] CHOU PY, FASMAN GD. PREDICTION OF PROTEIN CONFORMATION. BIOCHEMISTRY, 1974, 13(2): 222-245.
[8] FRENCH S, ROBSON B. WHAT IS A CONSERVATIVE SUBSTITUTION. JOURNAL OF MOLECULAR EVOLUTION, 1983, 19(2): 171-175.
[9] GONNET G, 1994, BIOCHEM BIOPH RES CO, V199, P496
[10] GONNET GH, COHEN MA, BENNER SA. EXHAUSTIVE MATCHING OF THE ENTIRE PROTEIN-SEQUENCE DATABASE. SCIENCE, 1992, 256(5062): 1443-1445.