On the origin and highly likely completeness of single-domain protein structures

Cited by: 145
Authors
Zhang, Y
Hubner, IA
Arakaki, AK
Shakhnovich, E
Skolnick, J
Affiliations
[1] SUNY Buffalo, Ctr Excellence Bioinformat, Buffalo, NY 14203 USA
[2] Harvard Univ, Dept Chem & Chem Biol, Cambridge, MA 02138 USA
Keywords
evolution; protein data bank; protein folding; protein structure prediction;
DOI
10.1073/pnas.0509379103
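Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification codes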
07 ; 0710 ; 09 ;
Abstract
The size and origin of the protein fold universe are of fundamental and practical importance. An analysis of randomly generated, compact, sticky homopolypeptide conformations constructed in generic simplified and all-atom protein models shows that all have similar folds in the library of solved structures, the Protein Data Bank; conversely, all compact, single-domain protein structures in the Protein Data Bank have structural analogues in the compact model set. Thus, both sets are highly likely complete, with the protein fold universe arising from compact conformations of hydrogen-bonded secondary structures. Because side chains are represented by their C-beta atoms, these results also suggest that the observed protein folds are insensitive to the details of side-chain packing. Sequence specificity enters both in fine-tuning the structure and in thermodynamically stabilizing a given fold with respect to the set of alternatives. When the models are scanned against a three-dimensional active-site library, close geometric matches are frequently found. Thus, the presence of active-site-like geometries also seems to be a consequence of the packing of compact secondary structural elements. These results have significant implications for the evolution of protein structure and function.
Pages: 2605-2610
Page count: 6