DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory

Cited by: 47
Authors
Zhang, Jun [1]
Chen, Qingcai [1]
Liu, Bin [1, 2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Proteins; DNA; Benchmark testing; RNA; Deep learning; Databases; Convolutional neural nets; RNA-binding protein; two-level framework; convolutional neural network; long short-term memory; MOTIF; IDENTIFICATION; MACHINE; RECOGNITION; ALIGNMENT; DOMAIN; DPP;
DOI
10.1109/TCBB.2019.2952338
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two crucial kinds of proteins, associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. It is the first computational method that is able to identify DBPs, RBPs, and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L overcomes this shortcoming of the existing methods and goes one step further by identifying DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
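The abstract does not spell out what each of the two levels decides, so the following is only a minimal sketch of how a two-level cascade could assign the labels it names (DBP, RBP, DRBP). It assumes, hypothetically, that level 1 screens for nucleic-acid binding in general and level 2 then scores DNA binding and RNA binding separately. The function name `two_level_predict`, the three scorer arguments, the 0.5 threshold, and the "non-binding" and "unresolved" labels are all illustrative assumptions, not the authors' implementation; in practice each scorer would be a trained model such as a CNN-LSTM network.

```python
from typing import Callable

# Hypothetical scorer type: maps a protein sequence to a probability in [0, 1].
# In a real system each scorer would wrap a trained model.
Scorer = Callable[[str], float]


def two_level_predict(
    seq: str,
    binds_nucleic_acid: Scorer,
    binds_dna: Scorer,
    binds_rna: Scorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Assign one label to a sequence using an assumed two-level cascade."""
    # Level 1 (assumption): screen out proteins that bind no nucleic acid.
    if binds_nucleic_acid(seq) < threshold:
        return "non-binding"

    # Level 2 (assumption): score DNA and RNA binding independently, so a
    # protein can receive both labels instead of being forced into one class.
    dna = binds_dna(seq) >= threshold
    rna = binds_rna(seq) >= threshold
    if dna and rna:
        return "DRBP"
    if dna:
        return "DBP"
    if rna:
        return "RBP"
    # Level 1 passed but neither level-2 scorer fired.
    return "unresolved"


if __name__ == "__main__":
    # Constant stand-in scorers, used only to exercise the control flow.
    label = two_level_predict(
        "MKRLLSA",
        binds_nucleic_acid=lambda s: 0.9,
        binds_dna=lambda s: 0.8,
        binds_rna=lambda s: 0.7,
    )
    print(label)
```

Scoring DNA and RNA binding separately at the second level is what lets such a cascade report DRBPs, whereas a single binary DBP-versus-rest classifier has no way to express a protein that binds both.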
Pages: 1451-1463
Page count: 13