3D deep convolutional neural networks for amino acid environment similarity analysis

Cited by: 94
Authors
Torng, Wen [1]
Altman, Russ B. [1, 2]
Affiliations
[1] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Keywords
Protein structural analysis; Amino acid similarities; Mutation analysis; Structural bioinformatics; Convolutional neural network; Deep learning; TEMPERATURE-SENSITIVE MUTANT; ENHANCED PROTEIN THERMOSTABILITY; T4; LYSOZYME; BACTERIOPHAGE-T4; STRUCTURAL-ANALYSIS; HYDROPHOBIC CORE; MUTATIONS; DATABASE; BINDING; REPRESENTATION;
DOI
10.1186/s12859-017-1702-0
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.
Results: Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve 85% accuracy in predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis when compared with well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.
Conclusions: End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.
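Note: the abstract's "raw atom distribution" input can be pictured as a voxel grid with one channel per element type. The following is a minimal sketch of that idea only, not the authors' implementation. The channel set (C, N, O, S), the 20 Å box, the 1 Å voxel size, and the names `CHANNELS` and `voxelize` are all assumptions made for illustration; the record above does not state them.

```python
from collections import defaultdict

# Assumed element channels; the record does not specify them.
CHANNELS = ("C", "N", "O", "S")


def voxelize(atoms, center, box=20.0, voxel=1.0):
    """Count atoms per element channel in a cubic grid centred on `center`.

    atoms  : iterable of (element, x, y, z) tuples, coordinates in angstroms
    center : (x, y, z) of the microenvironment centre
    box    : edge length of the cube (assumed value)
    voxel  : edge length of one voxel (assumed value)

    Returns (grid, n) where grid maps (channel, i, j, k) -> atom count
    and n is the number of voxels along each axis.
    """
    n = int(round(box / voxel))
    half = box / 2.0
    grid = defaultdict(int)
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue  # elements outside the assumed channel set are ignored
        # Shift so the box spans [0, box) on each axis, then bin by voxel size.
        idx = tuple(
            int((coord - c + half) // voxel)
            for coord, c in zip((x, y, z), center)
        )
        if all(0 <= i < n for i in idx):
            grid[(CHANNELS.index(element),) + idx] += 1
    return grid, n


# Hypothetical toy input: two atoms inside the box, one outside, one hydrogen.
demo_atoms = [
    ("C", 0.0, 0.0, 0.0),
    ("N", 0.5, 0.2, -0.3),
    ("O", 100.0, 0.0, 0.0),
    ("H", 0.0, 0.0, 0.0),
]
demo_grid, demo_n = voxelize(demo_atoms, center=(0.0, 0.0, 0.0))
```

A grid of this kind, stored as a dense 4D array, is the sort of input a 3D convolutional network consumes; the sparse dictionary here is used only to keep the sketch dependency-free.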
Pages: 23