From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

Cited by: 96
Authors
Cocco, Simona [1, 2]
Monasson, Remi [2, 3]
Weigt, Martin [4, 5]
Affiliations
[1] Ecole Normale Super, CNRS, Lab Phys Stat, UMR 8550, Paris, France
[2] Univ Paris 06, Paris, France
[3] Ecole Normale Super, CNRS, Phys Theor Lab, UMR 8549, Paris, France
[4] Univ Paris 06, Lab Genom Microorganismes, UMR 7238, Paris, France
[5] Human Genet Fdn, Turin, Italy
Keywords
CORRELATED MUTATIONS; CRYSTAL-STRUCTURE; SEQUENCE; INFORMATION; CONTACTS;
DOI
10.1371/journal.pcbi.1003176
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.
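The spectral quantities the abstract refers to can be illustrated with a minimal sketch. It is not the paper's Hopfield-Potts inference: it only one-hot encodes a toy alignment, diagonalizes the resulting covariance matrix, and measures how concentrated each eigenmode is on a few sites. The function names (`one_hot`, `correlation_modes`, `site_localization`), the toy alignment, and the use of an inverse participation ratio as the localization score are all assumptions made for illustration; reweighting and pseudocounts are omitted.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap symbol


def one_hot(msa):
    """Encode equal-length aligned sequences as a (B, L*q) 0/1 matrix."""
    index = {a: k for k, a in enumerate(ALPHABET)}
    B, L, q = len(msa), len(msa[0]), len(ALPHABET)
    X = np.zeros((B, L * q))
    for b, seq in enumerate(msa):
        for i, a in enumerate(seq):
            X[b, i * q + index[a]] = 1.0
    return X


def correlation_modes(msa):
    """Return eigenvalues (ascending) and eigenvectors of the covariance matrix."""
    X = one_hot(msa)
    C = np.cov(X, rowvar=False, bias=True)  # (L*q, L*q), symmetric
    return np.linalg.eigh(C)


def site_localization(vec, L, q):
    """Inverse participation ratio over sites.

    Equals 1/L for a mode spread evenly over all sites and 1 for a mode
    supported on a single site.
    """
    w = (vec.reshape(L, q) ** 2).sum(axis=1)  # weight carried by each site
    w = w / w.sum()
    return float((w ** 2).sum())


# Hypothetical toy alignment: 4 sequences of length 4.
msa = ["ACD-", "ACDA", "GCDA", "GCE-"]
L, q = len(msa[0]), len(ALPHABET)
eigvals, eigvecs = correlation_modes(msa)
ipr = [site_localization(eigvecs[:, k], L, q) for k in range(eigvecs.shape[1])]
```

On a real alignment, one would compare `ipr` at the low and high ends of the spectrum. With only four toy sequences the covariance matrix is highly degenerate, so the values here carry no biological meaning.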
Pages: 17