Sequence-similar, structure-dissimilar protein pairs in the PDB

Cited by: 92
Authors
Kosloff, Mickey [1]
Kolodny, Rachel [1,2]
Affiliations
[1] Columbia Univ, Ctr Computat Biol & Bioinformat, Dept Biochem & Mol Biophys, New York, NY 10032 USA
[2] Howard Hughes Med Inst, Chevy Chase, MD 20815 USA
Keywords
structure comparison; structure alignment; structural differences; nonredundant; structure prediction
DOI
10.1002/prot.21770
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information, and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
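The masking effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the greedy culling rule, the function names (`cull_by_sequence`, `masked_pairs`), both cutoffs, and the three toy chains with their identity and RMSD values are all assumptions made up for illustration. The sketch culls a set of chains by sequence identity alone, then lists the sequence-similar, structure-dissimilar pairs that lost a member in the process.

```python
from itertools import combinations

# Hypothetical thresholds, chosen only for illustration (not from the paper).
SEQ_ID_CUTOFF = 0.90  # identity at or above this counts as "redundant"
RMSD_CUTOFF = 6.0     # Angstroms; RMSD above this counts as "dissimilar"


def cull_by_sequence(chains, seq_identity, cutoff=SEQ_ID_CUTOFF):
    """Greedy culling: keep a chain only if it is not redundant
    (by sequence identity) with any chain already kept."""
    kept = []
    for chain in chains:
        if all(seq_identity[frozenset((chain, k))] < cutoff for k in kept):
            kept.append(chain)
    return kept


def masked_pairs(chains, kept, seq_identity, rmsd,
                 seq_cutoff=SEQ_ID_CUTOFF, rmsd_cutoff=RMSD_CUTOFF):
    """Return sequence-similar, structure-dissimilar pairs for which
    at least one member was removed by the sequence-based culling."""
    kept_set = set(kept)
    hits = []
    for a, b in combinations(chains, 2):
        key = frozenset((a, b))
        similar_sequence = seq_identity[key] >= seq_cutoff
        dissimilar_structure = rmsd[key] > rmsd_cutoff
        both_kept = {a, b} <= kept_set
        if similar_sequence and dissimilar_structure and not both_kept:
            hits.append((a, b))
    return hits


# Toy input (invented values): chain_1 and chain_2 are nearly identical in
# sequence but far apart in structure; chain_3 is unrelated to both.
chains = ["chain_1", "chain_2", "chain_3"]
seq_identity = {
    frozenset(("chain_1", "chain_2")): 0.98,
    frozenset(("chain_1", "chain_3")): 0.30,
    frozenset(("chain_2", "chain_3")): 0.31,
}
rmsd = {
    frozenset(("chain_1", "chain_2")): 9.5,
    frozenset(("chain_1", "chain_3")): 15.0,
    frozenset(("chain_2", "chain_3")): 14.0,
}

kept = cull_by_sequence(chains, seq_identity)
lost = masked_pairs(chains, kept, seq_identity, rmsd)
print("kept after culling:", kept)
print("masked dissimilar pairs:", lost)
```

In this toy input, chain_2 is discarded as redundant with chain_1 even though the two differ structurally, which is the kind of information loss the abstract warns about. In practice the identity and RMSD values would come from a sequence alignment and a structural superposition, neither of which is computed here.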
Pages: 891-902
Page count: 12