SnapDRAGON: a method to delineate protein structural domains from sequence data

Times cited: 68
Authors
George, RA [1]
Heringa, J [1]
Affiliation
[1] Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Funding
Medical Research Council (UK)
Keywords
protein; domain; boundaries; prediction; folding
DOI
10.1006/jmbi.2001.5387
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisations. © 2002 Elsevier Science Ltd.
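The abstract's central idea, that a boundary recurring across many alternative ab initio models is more credible than one proposed by a single model, can be illustrated with a minimal Python sketch. It is not the published SnapDRAGON scoring. The triangular smoothing, the `window` and `threshold` parameters, and the function names `boundary_profile` and `consensus_boundaries` are hypothetical choices made for illustration. The sketch also assumes that each 3D model has already been split into domains by a domain assignment step and reduced to a list of 0-based boundary residue positions.

```python
from typing import List, Optional, Sequence


def boundary_profile(model_boundaries: Sequence[Sequence[int]], seq_len: int,
                     window: int = 10) -> List[float]:
    """Per-residue support for a domain boundary, averaged over models.

    Hypothetical scoring: each model contributes a triangular score that is
    1.0 at its own boundary and falls linearly to 0 at window + 1 residues
    away. A model contributes at most 1.0 per residue, so values lie in [0, 1].
    """
    if not model_boundaries:
        return [0.0] * seq_len
    profile = [0.0] * seq_len
    for boundaries in model_boundaries:
        for pos in range(seq_len):
            score = 0.0
            for b in boundaries:
                dist = abs(pos - b)
                if dist <= window:
                    # Keep only the nearest boundary of this model.
                    score = max(score, 1.0 - dist / (window + 1))
            profile[pos] += score
    n_models = len(model_boundaries)
    return [s / n_models for s in profile]


def consensus_boundaries(profile: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the best-supported position in each run of residues at or above threshold."""
    picks: List[int] = []
    start: Optional[int] = None
    # A -inf sentinel guarantees that a run reaching the last residue is closed.
    for i, value in enumerate(list(profile) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            run = list(profile[start:i])
            picks.append(start + run.index(max(run)))
            start = None
    return picks


if __name__ == "__main__":
    # Boundaries proposed by five hypothetical models of a 200-residue protein.
    models = [[100], [100], [102], [98, 150], []]
    profile = boundary_profile(models, seq_len=200)
    print(consensus_boundaries(profile))  # [100]
```

In this toy example, the boundary at residue 150 is proposed by only one of the five models. Its support stays at 0.2, below the threshold, so only the consistently predicted boundary near residue 100 is reported.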
Pages: 839-851
Page count: 13