AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques

Cited by: 16
Authors
Mishra, Avdesh [1]
Khanal, Reecha [2]
Ul Kabir, Wasi [2]
Hoque, Tamjidul [2]
Affiliations
[1] Texas A&M Univ Kingsville, Dept Elect Engn & Comp Sci, Kingsville, TX USA
[2] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA
Keywords
Machine learning; Stacking; RNA-binding proteins; RNA-binding prediction; Protein sequence; MOLECULAR RECOGNITION FEATURES; MESSENGER-RNA; COMPUTATIONAL IDENTIFICATION; ENERGY FUNCTION; PREDICTION; SEQUENCE; CLASSIFICATION; ALIGNMENT; PROTEOME; SITES;
DOI
10.1016/j.artmed.2021.102034
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Identification of RNA-binding proteins (RBPs), which bind to ribonucleic acid molecules, is an important problem in Computational Biology and Bioinformatics. Identifying RBPs is essential because they play crucial roles in post-transcriptional control of RNAs and in RNA metabolism, and they have diverse roles in biological processes such as splicing, mRNA stabilization, mRNA localization, translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from the sequence using computational methods can help annotate RBPs and support efficient experimental design. In this work, we present a method called AIRBP, which is designed using an advanced machine learning technique called stacking to predict RBPs effectively from features extracted from evolutionary information, physicochemical properties, and disordered properties. Moreover, AIRBP uses the majority vote from RBPPred, DeepRBPPred, and the stacking model for the prediction of RBPs. The results show that AIRBP attains Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Matthews Correlation Coefficient (MCC) of 95.84%, 94.71%, 0.928, and 0.899, respectively, on the training dataset using 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test sets shows that it achieves ACC, BACC, F1-score, and MCC of 94.36%, 94.28%, 0.897, and 0.860 for the Human test set; 91.25%, 93.00%, 0.896, and 0.835 for the S. cerevisiae test set; and 90.60%, 90.41%, 0.934, and 0.775 for the A. thaliana test set, respectively. These results indicate that AIRBP outperforms the existing Deep- and TriPepSVM methods. Therefore, the better-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from the sequence and can help gain valuable insight for treating critical diseases.
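The abstract states that the final prediction is a majority vote over three predictors and that performance is reported as ACC, BACC, F1-score, and MCC. The sketch below illustrates those two ideas only; it is not the authors' implementation. The function names `majority_vote` and `binary_metrics` and the toy label lists are hypothetical, labels are assumed to be 1 for RBP and 0 for non-RBP, and the metric formulas are the standard confusion-matrix definitions.

```python
import math

def majority_vote(*predictions):
    """Combine equal-length binary label lists (1 = RBP, 0 = non-RBP).

    Hypothetical helper: a protein is called an RBP when more than
    half of the supplied predictors label it 1.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]

def binary_metrics(y_true, y_pred):
    """Return ACC, BACC, F1 and MCC from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "ACC": (tp + tn) / len(pairs),
        "BACC": (sensitivity + specificity) / 2,
        "F1": 2 * tp / f1_denominator if f1_denominator else 0.0,
        "MCC": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }

# Toy labels for eight proteins; these values are made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
rbppred_labels = [1, 0, 1, 0, 0, 1, 1, 1]
deeprbppred_labels = [1, 1, 0, 0, 1, 0, 0, 0]
stacking_labels = [1, 1, 1, 0, 0, 0, 1, 1]

combined = majority_vote(rbppred_labels, deeprbppred_labels, stacking_labels)
print(combined)
print(binary_metrics(y_true, combined))
```

BACC and MCC are reported alongside ACC because they are less sensitive to class imbalance, which matters when RBPs are a minority of the proteins in a dataset.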
Pages: 14