DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory

Cited by: 47
Authors
Zhang, Jun [1]
Chen, Qingcai [1]
Liu, Bin [1,2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Proteins; DNA; Benchmark testing; RNA; Deep learning; Databases; Convolutional neural nets; RNA-binding protein; two-level framework; convolutional neural network; long short-term memory; MOTIF; IDENTIFICATION; MACHINE; RECOGNITION; ALIGNMENT; DOMAIN; DPP;
DOI
10.1109/TCBB.2019.2952338
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two crucial classes of proteins associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L is proposed that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). It is the first computational method able to identify DBPs, RBPs and DRBPs. Rigorous cross-validation and independent tests showed that DeepDRBP-2L overcomes this shortcoming of the existing methods and goes one step further by identifying DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The web server of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
Pages: 1451-1463
Number of pages: 13