Inverse statistical physics of protein sequences: a key issues review

Cited by: 148
Authors
Cocco, Simona [1, 2]
Feinauer, Christoph [3]
Figliuzzi, Matteo [3]
Monasson, Remi [2, 4]
Weigt, Martin [3]
Affiliations
[1] Sorbonne Univ UPMC, Ecole Normale Super, Lab Phys Stat, UMR 8549, CNRS, Paris, France
[2] Sorbonne Univ UPMC, PSL Res, Paris, France
[3] Sorbonne Univ, UPMC, Inst Biol Paris Seine, CNRS, Lab Biol Computat & Quantitat, UMR 7238, Paris, France
[4] Sorbonne Univ UPMC, Lab Phys Theor, Ecole Normale Super, UMR 8549, CNRS, Paris, France
Keywords
inverse problems; inverse Ising/Potts problem; statistical inference; protein sequence analysis; coevolution; protein structure prediction; protein-protein interaction; DIRECT-COUPLING ANALYSIS; COEVOLUTIONARY INFORMATION; STRUCTURE PREDICTION; RESIDUE COEVOLUTION; MOLECULAR-DYNAMICS; CONTACT PREDICTION; STRUCTURAL BASIS; HIV EVOLUTION; CO-VARIATION; FAMILIES
DOI
10.1088/1361-6633/aa9965
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous (i.e. evolutionarily related) protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed in the coming years.
Pages: 17