Statistical inference of protein structural alignments using information and compression

Times cited: 15
Authors
Collier, James H. [1]
Allison, Lloyd [1]
Lesk, Arthur M. [2]
Stuckey, Peter J. [3]
de la Banda, Maria Garcia [1]
Konagurthu, Arun S. [1]
Affiliations
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[2] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[3] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic 3010, Australia
Funding
Australian Research Council
Keywords
SEQUENCE; CLASSIFICATION;
DOI
10.1093/bioinformatics/btw757
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment.
Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes.
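The compression criterion described in the abstract can be written in the generic two-part MML form. This is only a sketch: the symbols $S$, $T$, $A$ and the subscript "null" are notation assumed here for illustration, and the abstract does not give the paper's actual encoding of alignments or coordinates.

```latex
% Assumed notation (not taken from the record):
%   S, T   coordinates of the two protein structures
%   A      a candidate structural alignment between S and T
%   I(.)   message length in bits of a lossless encoding
%
% Two-part message: first state the alignment, then the coordinates of T
% given S and that alignment.
I(A, T \mid S) = I(A) + I(T \mid S, A)

% Null message: encode T on its own, without reference to S.
% The compression attributable to the alignment is the saving over the null:
\Delta I(A) = I_{\text{null}}(T) - I(A, T \mid S)
```

Under this reading, an alignment with $\Delta I(A) > 0$ explains the coordinates more concisely than the null encoding, and of two candidate alignments the one with the larger $\Delta I$ is preferred.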
Pages: 1005-1013
Page count: 9