Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities

Cited by: 43
Authors
Gracy, J [1]
Argos, P [1]
Affiliation
[1] European Molecular Biology Laboratory (EMBL), D-69012 Heidelberg, Germany
DOI
10.1093/bioinformatics/14.2.174
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: Decomposing each protein into modular domains is a basic prerequisite for accurately classifying the structural units of biological molecules. Domain boundaries are indicated by two similar amino acid sequence segments that occur either within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini. Results: We have developed an automated method that combines such positional constraints, derived from the various pairwise sequence similarities detected, to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and which shared <60% pairwise sequence identity. The resulting clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of this domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, in domain boundaries and in multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
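The positional principle stated in the abstract (similar segments lying at notably different distances from the N- or C-termini imply a domain boundary) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the procedure Gracy and Argos actually implemented: the hypothetical `Hit` record for one pairwise local similarity, the `min_shift` threshold for a "notable" positional difference, and the greedy single-linkage clustering with a `tolerance` parameter used to combine votes from many hits.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one local similarity between proteins A and B.

    Coordinates are 0-based and end-exclusive. For an internal repeat,
    a_id == b_id and the two segments are the two repeat copies.
    """
    a_id: str
    a_start: int
    a_end: int
    a_len: int
    b_id: str
    b_start: int
    b_end: int
    b_len: int


def boundary_votes(hit, min_shift=30):
    """Return (protein_id, position) boundary votes implied by one hit.

    If one segment starts much farther from its N-terminus than the other,
    the extra upstream region suggests a boundary at that segment's start.
    The same reasoning on the C-terminal flanks votes for a segment end.
    """
    votes = []
    # N-terminal constraint: compare distances from the N-termini.
    shift_n = hit.a_start - hit.b_start
    if shift_n >= min_shift:
        votes.append((hit.a_id, hit.a_start))
    elif -shift_n >= min_shift:
        votes.append((hit.b_id, hit.b_start))
    # C-terminal constraint: compare distances from the C-termini.
    shift_c = (hit.a_len - hit.a_end) - (hit.b_len - hit.b_end)
    if shift_c >= min_shift:
        votes.append((hit.a_id, hit.a_end))
    elif -shift_c >= min_shift:
        votes.append((hit.b_id, hit.b_end))
    return votes


def combine_votes(hits, min_shift=30, tolerance=10):
    """Pool votes per protein and merge nearby positions.

    Returns {protein_id: [(boundary_position, support), ...]}, where each
    position is the median of a cluster (rounded down) and support is the
    number of votes in it. Proteins with no votes are omitted.
    """
    by_protein = defaultdict(list)
    for hit in hits:
        for pid, pos in boundary_votes(hit, min_shift):
            by_protein[pid].append(pos)
    result = {}
    for pid, positions in by_protein.items():
        positions.sort()
        clusters = [[positions[0]]]
        for pos in positions[1:]:
            # Single-linkage: join the current cluster if close to its last member.
            if pos - clusters[-1][-1] <= tolerance:
                clusters[-1].append(pos)
            else:
                clusters.append([pos])
        result[pid] = [(int(median(c)), len(c)) for c in clusters]
    return result


if __name__ == "__main__":
    hits = [
        # Internal repeat in P1: copies at [10, 50) and [70, 110).
        Hit("P1", 10, 50, 200, "P1", 70, 110, 200),
        # P1's second copy matches almost the whole of a short homolog P2.
        Hit("P1", 70, 110, 200, "P2", 2, 42, 45),
    ]
    print(combine_votes(hits))
```

With these toy hits, P1 receives votes at 50, 70 (twice) and 110, so the sketch reports boundaries at those positions with supports 1, 2 and 1, while P2 gets no votes because its segment already spans nearly its full length. Pooling votes over many hits is one simple way to let independent similarities reinforce the same boundary; the real method may weigh or reconcile constraints quite differently.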
Pages: 174–187
Number of pages: 14