Deep-RBPPred: Predicting RNA binding proteins in the proteome scale based on deep learning

Cited by: 45
Authors
Zheng, Jinfang [1]
Zhang, Xiaoli [1]
Zhao, Xunyi [1]
Tong, Xiaoxue [1]
Hong, Xu [1]
Xie, Juan [1]
Liu, Shiyong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Phys, Wuhan 430074, Hubei, Peoples R China
Source
SCIENTIFIC REPORTS | 2018 / Volume 8
Keywords
BOUND PROTEOME; SITES; DNA; IDENTIFICATION; INSIGHTS; REVEALS; GENOME; YEAST;
DOI
10.1038/s41598-018-33654-x
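Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes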
07 ; 0710 ; 09 ;
Abstract
RNA-binding proteins (RBPs) play an important role in cellular processes. Identifying RBPs by both computation and experiment is essential. Recently, our group proposed an RBP predictor, RBPPred. However, RBPPred is slow because it must generate a PSSM matrix as a feature. Herein, based on the protein features of RBPPred and a Convolutional Neural Network (CNN), we develop a deep learning model called Deep-RBPPred. Training on balanced and imbalanced training sets yields the Deep-RBPPred-balance and Deep-RBPPred-imbalance models. Deep-RBPPred has three advantages compared with previous methods: (1) it needs only a few physicochemical properties derived from protein sequences; (2) it runs much faster; (3) it generalizes well. At the same time, Deep-RBPPred performs as well as the state-of-the-art method. Tested on the A. thaliana, S. cerevisiae and H. sapiens proteomes, the MCC values are 0.82 (0.82), 0.65 (0.69) and 0.85 (0.80), respectively, for the balance model (imbalance model) when the score cutoff is set to 0.5. On the same testing dataset, different machine learning algorithms (CNN and SVM) are also compared. The results show that the CNN-based model identifies more RBPs than the SVM-based model. Comparing the balance and imbalance models, both the CNN-based and SVM-based models tend to favor the majority class of the imbalanced set. Deep-RBPPred predicts 280 (balance model) and 265 (imbalance model) of 299 new RBPs. The sensitivity of the balance model is about 7% higher than that of the state-of-the-art method. We also apply Deep-RBPPred to 30 eukaryotic and 109 bacterial proteomes downloaded from UniProt to estimate all possible RBPs. The estimates show that the proportion of RBPs in eukaryotic proteomes is much higher than in bacterial proteomes.
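The abstract reports MCC (Matthews correlation coefficient) values at a score cutoff of 0.5 and compares sensitivities. As a minimal sketch of how such figures are usually computed from predictor scores, the Python below thresholds scores and derives both metrics. The function names, the toy scores, and the choice to treat a score equal to the cutoff as positive are illustrative assumptions, not taken from the paper.

```python
import math


def confusion_counts(scores, labels, cutoff=0.5):
    """Count TP, FP, TN, FN after thresholding scores.

    Assumption: a score >= cutoff is called an RBP (label 1).
    The paper's exact tie-handling is not stated in the abstract.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """Fraction of true RBPs that are recovered (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (hypothetical, for illustration only).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(toy_scores, toy_labels)
toy_mcc = mcc(tp, fp, tn, fn)

# Counts quoted in the abstract: 280 and 265 of 299 new RBPs recovered.
balance_sensitivity = sensitivity(280, 299 - 280)
imbalance_sensitivity = sensitivity(265, 299 - 265)
```

With the counts quoted in the abstract, the sensitivities on the 299 new RBPs come to roughly 0.94 for the balance model and 0.89 for the imbalance model.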
Pages: 9