Protein secondary structure prediction improved by recurrent neural networks integrated with two-dimensional convolutional neural networks

Times cited: 46
Authors
Guo, Yanbu [1]
Wang, Bingyi [2]
Li, Weihua [1]
Yang, Bei [3]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, 2 North Cuihu Rd, Kunming 650091, Yunnan, Peoples R China
[2] Chinese Acad Forestry, Res Inst Resource Insects, Kunming 650224, Yunnan, Peoples R China
[3] Second Peoples Hosp Yunnan Prov, Cardiol Dept, 176 Qingnian Rd, Kunming 650021, Yunnan, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Bioinformatics; protein secondary structure prediction (PSSP); convolutional neural networks (CNNs); recurrent neural networks (RNNs); long short-term memory (LSTM); gated recurrent units (GRUs)
DOI
10.1142/S021972001850021X
CLC classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Protein secondary structure prediction (PSSP) is an important research field in bioinformatics. The features of a protein sequence can be represented as a matrix with two dimensions: the amino-acid residue (time-step) dimension and the feature vector dimension. Common approaches to secondary structure prediction focus only on the amino-acid residue dimension, although the feature vector dimension may also contain information useful for PSSP. To integrate the information from both dimensions of the matrix, we propose a hybrid deep learning framework, the two-dimensional convolutional bidirectional recurrent neural network (2C-BRNN), to improve the accuracy of 8-class secondary structure prediction. The framework first extracts discriminative local interactions between amino-acid residues with two-dimensional convolutional neural networks (2DCNNs), and then captures long-range interactions between residues with bidirectional gated recurrent units (BGRUs) or bidirectional long short-term memory (BLSTM). Specifically, the 2C-BRNN framework comprises four models: 2DConv-BGRUs, 2DCNN-BGRUs, 2DConv-BLSTM and 2DCNN-BLSTM. The 2DConv- models contain only two-dimensional (2D) convolution operations, whereas the 2DCNN- models contain both 2D convolution and pooling operations. Experiments are conducted on four public datasets. The experimental results show that the proposed 2DConv-BLSTM model performs significantly better than the benchmark models. The experiments also demonstrate that the proposed models can extract more meaningful features from the protein feature matrix, and that the feature vector dimension is also useful for PSSP. The code and datasets for the proposed methods are available at https://github.com/guoyanb/JBCB2018/.
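The pipeline summarized in the abstract (a 2D convolution over the residue × feature matrix, then a bidirectional recurrent layer, then an 8-class output per residue) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation (their code is in the linked repository): it assumes a single 3 × 3 convolution channel with ReLU, one bidirectional GRU layer without biases, random untrained weights, and a toy 6 × 20 input. The names `conv2d_same`, `bigru` and `predict_q8` are hypothetical, and only the Python standard library is used.

```python
import math
import random

# Q8 secondary-structure states (DSSP alphabet; "C" is used for coil here).
Q8 = ["H", "G", "I", "E", "B", "T", "S", "C"]


def conv2d_same(x, kernel):
    """Zero-padded 2D convolution (cross-correlation) over an L x F matrix.

    The kernel slides along BOTH the residue axis and the feature axis,
    so the output keeps the L x F shape and mixes neighbouring features.
    """
    rows, cols = len(x), len(x[0])
    k = len(kernel)  # assumes an odd, square k x k kernel
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = max(acc, 0.0)  # ReLU activation
    return out


def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rand_matrix(rng, n_rows, n_cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]


def make_gru(rng, n_in, n_hid):
    # (input weights, recurrent weights) for the update (z), reset (r) and
    # candidate (n) gates; biases are omitted to keep the sketch short.
    return {g: (rand_matrix(rng, n_hid, n_in), rand_matrix(rng, n_hid, n_hid)) for g in "zrn"}


def gru_step(p, x, h):
    def gate(g, hidden):
        wx, wh = p[g]
        return [a + b for a, b in zip(matvec(wx, x), matvec(wh, hidden))]

    z = [1.0 / (1.0 + math.exp(-v)) for v in gate("z", h)]
    r = [1.0 / (1.0 + math.exp(-v)) for v in gate("r", h)]
    n = [math.tanh(v) for v in gate("n", [ri * hi for ri, hi in zip(r, h)])]
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def bigru(seq, p_fwd, p_bwd, n_hid):
    """Run one GRU left-to-right and one right-to-left; concatenate per residue."""
    h, fwd = [0.0] * n_hid, []
    for x in seq:
        h = gru_step(p_fwd, x, h)
        fwd.append(h)
    h, bwd = [0.0] * n_hid, []
    for x in reversed(seq):
        h = gru_step(p_bwd, x, h)
        bwd.append(h)
    bwd.reverse()  # realign backward states with residue order
    return [f + b for f, b in zip(fwd, bwd)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def predict_q8(features, n_hid=4, seed=0):
    """Hypothetical toy 2DConv-BGRU forward pass with random, UNTRAINED weights."""
    rng = random.Random(seed)
    n_feat = len(features[0])
    kernel = rand_matrix(rng, 3, 3, scale=0.5)
    conv = conv2d_same(features, kernel)  # L x F feature map
    p_fwd = make_gru(rng, n_feat, n_hid)
    p_bwd = make_gru(rng, n_feat, n_hid)
    states = bigru(conv, p_fwd, p_bwd, n_hid)  # L x (2 * n_hid)
    w_out = rand_matrix(rng, len(Q8), 2 * n_hid)
    return [softmax(matvec(w_out, s)) for s in states]  # L x 8 probabilities


if __name__ == "__main__":
    rng = random.Random(42)
    profile = [[rng.random() for _ in range(20)] for _ in range(6)]  # 6 residues x 20 features
    probs = predict_q8(profile)
    print("".join(Q8[max(range(len(Q8)), key=p.__getitem__)] for p in probs))
```

Because the kernel also moves along the feature axis, each output cell depends on neighbouring feature columns, which a 1D convolution along residues alone would not capture in the same way. Replacing `gru_step` with an LSTM cell would correspond to the BLSTM variants, and the 2DCNN- variants would add a pooling step after the convolution. Neither is shown here, and the kernel sizes, pooling configuration and layer counts the authors used are not stated in this record.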
Pages: 19