MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction

Cited by: 120
Authors
Fang, Chao [1]
Shang, Yi [1]
Xu, Dong [1,2]
Affiliations
[1] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Christopher S Bond Life Sci Ctr, Columbia, MO USA
Funding
U.S. National Science Foundation;
Keywords
deep learning; deep neural networks; protein secondary structure; protein structure prediction; NEURAL-NETWORKS; GENERATION;
DOI
10.1002/prot.25487
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Protein secondary structure prediction can provide important information for predicting protein 3D structure and protein function. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool, MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as from the context of the protein sequence. Specifically, the feature matrix is composed of the physicochemical properties of amino acids, the PSI-BLAST profile, and the HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structure. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS significantly outperformed the best existing methods and other deep neural networks. MUFOLD-SS can be downloaded from .
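Illustrative sketch (not the authors' implementation): the abstract describes a per-residue feature matrix built from three sources and a stack of nested inception modules that maps it to eight or three states. The Python below shows one way such a pipeline could be wired together, using only the standard library. All function names, channel counts, kernel sizes, branch layouts, and the random untrained weights are assumptions chosen for illustration; the record above does not specify them.

```python
import math
import random

def conv(x, weights, relu=True):
    """1D convolution along the sequence with zero padding (output length = input length).
    x: L rows of C_in floats; weights: k taps, each a C_in x C_out matrix."""
    k, half = len(weights), len(weights) // 2
    c_in, c_out = len(x[0]), len(weights[0][0])
    out = []
    for i in range(len(x)):
        row = [0.0] * c_out
        for t in range(k):
            j = i + t - half
            if 0 <= j < len(x):  # positions outside the sequence count as zeros
                for a in range(c_in):
                    for b in range(c_out):
                        row[b] += x[j][a] * weights[t][a][b]
        out.append([max(0.0, v) for v in row] if relu else row)
    return out

def make_weights(rng, k, c_in, c_out):
    """Random, untrained weights (assumption: stands in for learned parameters)."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k)]

def concat(*branches):
    """Concatenate branch outputs along the channel axis, residue by residue."""
    return [sum((b[i] for b in branches), []) for i in range(len(branches[0]))]

def feature_matrix(physchem, psiblast, hhblits):
    """Per-residue features: physicochemical properties + PSI-BLAST + HHBlits profiles."""
    return concat(physchem, psiblast, hhblits)

def inception(x, rng, c_out=4):
    """Parallel branches with different receptive fields (hypothetical layout)."""
    c_in = len(x[0])
    b1 = conv(x, make_weights(rng, 1, c_in, c_out))
    b2 = conv(x, make_weights(rng, 3, c_in, c_out))
    b3 = conv(conv(x, make_weights(rng, 3, c_in, c_out)),
              make_weights(rng, 3, c_out, c_out))
    return concat(b1, b2, b3)

def inception_inside_inception(x, rng):
    """Outer module whose branches are themselves inception modules."""
    shallow = inception(x, rng)
    deep = inception(inception(x, rng), rng)
    return concat(shallow, deep)

def predict_states(h, rng, n_states=8):
    """Per-residue linear layer followed by a softmax over the states."""
    logits = conv(h, make_weights(rng, 1, len(h[0]), n_states), relu=False)
    probs = []
    for row in logits:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        probs.append([v / s for v in e])
    return probs

# Toy input: 10 residues with placeholder feature widths (2 + 3 + 3 channels).
rng = random.Random(0)
L = 10
physchem = [[rng.random() for _ in range(2)] for _ in range(L)]
psiblast = [[rng.random() for _ in range(3)] for _ in range(L)]
hhblits = [[rng.random() for _ in range(3)] for _ in range(L)]

features = feature_matrix(physchem, psiblast, hhblits)
hidden = inception_inside_inception(features, rng)
probs = predict_states(hidden, rng, n_states=8)
print(len(probs), len(probs[0]))
```

Passing `n_states=3` instead of `n_states=8` gives the three-state variant. Branches with small kernels capture local context, while the stacked branches widen the receptive field, which is the general idea behind combining local and longer-range interactions in this kind of design.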
Pages: 592-598
Number of pages: 7