NONGLOBULAR DOMAINS IN PROTEIN SEQUENCES - AUTOMATED SEGMENTATION USING COMPLEXITY-MEASURES

Cited by: 388
Author
WOOTTON, JC
Affiliation
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, Bldg. 38A, Room 8N805
Source
COMPUTERS & CHEMISTRY | 1994 / Vol. 18 / Issue 03
DOI
10.1016/0097-8485(94)85023-2
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Computational methods based on mathematically-defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the Brookhaven Protein Data Bank differ only slightly from randomly shuffled sequences in the distribution of statistical properties such as local compositional complexity. In contrast, in the much larger body of deduced sequences in the SWISS-PROT database, approximately one quarter of the residues occur in segments of non-randomly low complexity and approximately half of the entries contain at least one such segment. Sequences of proteins with known, physicochemically-defined non-globular regions have been analyzed, including collagens, different classes of coiled-coil proteins, elastins, histones, non-histone proteins, mucins, proteoglycan core proteins and proteins containing long single solvent-exposed alpha-helices. The SEG algorithm provides an effective general method for partitioning the globular and non-globular regions of these sequences fully automatically. This method is also facilitating the discovery of new classes of long, non-globular sequence segments, as illustrated by the example of the human CAN gene product involved in tumor induction.
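The idea of local compositional complexity mentioned in the abstract can be illustrated with a minimal sketch. This is not the SEG algorithm itself: it only computes the Shannon entropy of residue composition in a sliding window and flags windows that fall at or below a cutoff. The function names, the window length of 12, and the cutoff of 2.2 bits are illustrative assumptions and are not taken from this record.

```python
import math
from collections import Counter


def window_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of one segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(sequence, window=12, threshold=2.2):
    """Mark every residue covered by at least one low-entropy window.

    `window` and `threshold` are assumed values chosen for illustration.
    """
    mask = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        if window_entropy(sequence[start:start + window]) <= threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q run flanked by compositionally diverse ends.
    seq = "MKVLAEGTRDSW" + "Q" * 12 + "MKVLAEGTRDSW"
    mask = low_complexity_mask(seq)
    print("".join("x" if flagged else "-" for flagged in mask))
```

A homopolymeric window has entropy 0, while a 12-residue window of 12 distinct residues reaches the maximum of log2(12) bits. A full segmentation method would also need a step that refines segment boundaries, which this sketch omits.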
Pages: 269-285
Page count: 17
Related papers
45 in total
[11]   A MOLECULAR-DYNAMICS SIMULATION OF POLYALANINE - AN ANALYSIS OF EQUILIBRIUM MOTIONS AND HELIX COIL TRANSITIONS [J].
DAGGETT, V ;
KOLLMAN, PA ;
KUNTZ, ID .
BIOPOLYMERS, 1991, 31 (09) :1115-1134
[12]   DENNIS JE, 1990, J BIOL CHEM, V265, P12098
[13]   DEWET W, 1987, J BIOL CHEM, V262, P16032
[14]   DOEGE KJ, 1991, J BIOL CHEM, V266, P894
[15]   WHY DO GLOBULAR-PROTEINS FIT THE LIMITED SET OF FOLDING PATTERNS [J].
FINKELSTEIN, AV ;
PTITSYN, OB .
PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY, 1987, 50 (03) :171-190
[17]   ANCIENT CONSERVED REGIONS IN NEW GENE-SEQUENCES AND THE PROTEIN DATABASES [J].
GREEN, P ;
LIPMAN, D ;
HILLIER, L ;
WATERSTON, R ;
STATES, D ;
CLAVERIE, JM .
SCIENCE, 1993, 259 (5102) :1711-1716
[18]   THE NATURE OF FOLDED STATES OF GLOBULAR-PROTEINS [J].
HONEYCUTT, JD ;
THIRUMALAI, D .
BIOPOLYMERS, 1992, 32 (06) :695-709
[19]   THE COMPLETE SEQUENCE OF THE HUMAN BETA-MYOSIN HEAVY-CHAIN GENE AND A COMPARATIVE-ANALYSIS OF ITS PRODUCT [J].
JAENICKE, T ;
DIEDERICH, KW ;
HAAS, W ;
SCHLEICH, J ;
LICHTER, P ;
PFORDT, M ;
BACH, A ;
VOSBERG, HP .
GENOMICS, 1990, 8 (02) :194-206
[20]   CHANCE AND STATISTICAL SIGNIFICANCE IN PROTEIN AND DNA-SEQUENCE ANALYSIS [J].
KARLIN, S ;
BRENDEL, V .
SCIENCE, 1992, 257 (5066) :39-49