Improving Contact Prediction along Three Dimensions

Cited by: 58
Authors
Feinauer, Christoph [1 ,2 ]
Skwark, Marcin J. [3 ,4 ]
Pagnani, Andrea [1 ,2 ,5 ]
Aurell, Erik [3 ,4 ,6 ]
Affiliations
[1] Politecn Torino, DISAT, Turin, Italy
[2] Politecn Torino, Ctr Computat Sci, Turin, Italy
[3] Aalto Univ, Dept Informat & Comp Sci, Aalto, Finland
[4] Aalto Univ, Aalto Sci Inst AScI, Aalto, Finland
[5] Human Genet Fdn Torino, Ctr Mol Biotechnol, Turin, Italy
[6] AlbaNova Univ Ctr, Royal Inst Technol, Dept Computat Biol, Stockholm, Sweden
Funding
Academy of Finland;
Keywords
DIRECT-COUPLING ANALYSIS; CORRELATED MUTATIONS; PROTEIN-STRUCTURE; SEQUENCE; CLASSIFICATION; COVARIANCE;
DOI
10.1371/journal.pcbi.1003847
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we refer to in this paper as the three dimensions of contact prediction, is to (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose a predictive model to describe a sequence alignment; (iii) infer the model parameters and interpret them in terms of structural properties, such as an accurate contact map. We show here that all three dimensions are important for overall prediction success. In particular, we show that it is possible to improve significantly along the second dimension by going beyond the pair-wise Potts models from statistical physics, which have hitherto been the focus of the field. These (simple) extensions are motivated by the fact that multiple sequence alignments often contain long stretches of gaps, a data feature that would be rather atypical for independent samples drawn from a Potts model. Using a large test set of proteins, we show that the combined improvements along the three dimensions are as large as any reported to date.
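As an illustration of the data feature mentioned in the abstract (long stretches of gaps in aligned sequences), the following minimal sketch measures runs of consecutive gap symbols in each row of an alignment. It is not the authors' method: the function names, the `-` gap symbol, and the toy alignment are all assumptions made for illustration only.

```python
from itertools import groupby

GAP = "-"  # assumed gap symbol; alignment formats may use other characters


def gap_runs(sequence, gap=GAP):
    """Return the lengths of all maximal runs of consecutive gap symbols."""
    return [
        sum(1 for _ in group)
        for symbol, group in groupby(sequence)
        if symbol == gap
    ]


def longest_gap_run(sequence, gap=GAP):
    """Return the length of the longest gap run, or 0 if there are no gaps."""
    return max(gap_runs(sequence, gap), default=0)


# Toy alignment, made up for illustration; not data from the paper.
toy_alignment = [
    "MKV----LLA",
    "MRV-A--LIA",
    "------ALLA",
]

# Longest contiguous gap stretch in each aligned sequence.
longest = [longest_gap_run(row) for row in toy_alignment]
```

A summary of this kind could be compared against sequences sampled from a fitted model to check whether the model reproduces the gap-stretch statistics of the data; that comparison is only suggested here, not implemented.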
Pages: 13