A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine

Cited by: 10
Authors
Jain, Dharm Skandh [1, 3]
Gupte, Sanket Rajan [1 ]
Aduri, Raviprasad [2 ]
Affiliations
[1] Birla Inst Technol & Sci Pilani, Dept Comp Sci & Informat Syst, KK Birla Goa Campus, South Goa, Goa, India
[2] Birla Inst Technol & Sci Pilani, Dept Biol Sci, KK Birla Goa Campus, South Goa 403726, Goa, India
[3] Warsaw Univ Technol, Fac Elect & Informat Technol, Warsaw, Poland
Source
SCIENTIFIC REPORTS | 2018 / Vol. 8
Keywords
LONG NONCODING RNA; BINDING PROTEINS; CLIP;
DOI
10.1038/s41598-018-27814-2
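Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract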
RNA-protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI is time-consuming, which has paved the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of their predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on high-resolution structural data of RNA-protein complexes. The minimum structural unit, consisting of five residues, is used as the descriptor. Comparative analysis with existing methods shows the consistently higher performance of our method irrespective of the length of the RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA and TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in the RPI networks of four different organisms. The robustness of this method provides a way to predict RPI networks of as-yet-unknown interactions for both long noncoding RNA and microRNA.
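The descriptor idea in the abstract (residues grouped into classes, then short residue units used as features) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' method: the residue groupings in `AA_GROUPS` and `NT_GROUPS` are hypothetical placeholders (the paper derives its classes from structural data, which the abstract does not list), and splitting the five-residue unit into 3 amino acids plus 2 nucleotides (`k_protein`, `k_rna`) is only one possible reading.

```python
from collections import Counter
from itertools import product

# Hypothetical residue classes, for illustration only. The paper's classes
# come from structural data and are not given in the abstract.
AA_GROUPS = {"h": "AVLIMFWPGC", "p": "STYNQH", "+": "KR", "-": "DE"}
NT_GROUPS = {"R": "AG", "Y": "CU"}  # purine / pyrimidine


def invert(groups):
    """Map each residue letter to the label of the class containing it."""
    return {res: label for label, members in groups.items() for res in members}


def reduce_sequence(seq, groups):
    """Rewrite a sequence over the reduced class alphabet, skipping unknown letters."""
    mapping = invert(groups)
    return "".join(mapping[c] for c in seq.upper() if c in mapping)


def kmer_frequencies(reduced, alphabet, k):
    """Normalized counts of every possible k-mer, in a fixed sorted order."""
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    keys = ["".join(p) for p in product(sorted(alphabet), repeat=k)]
    return [counts[key] / total if total else 0.0 for key in keys]


def pair_descriptor(protein, rna, k_protein=3, k_rna=2):
    """Concatenate protein and RNA k-mer frequencies into one feature vector.

    The 3 + 2 split is an assumption made for this sketch.
    """
    prot = kmer_frequencies(reduce_sequence(protein, AA_GROUPS), AA_GROUPS, k_protein)
    nucl = kmer_frequencies(reduce_sequence(rna, NT_GROUPS), NT_GROUPS, k_rna)
    return prot + nucl


if __name__ == "__main__":
    features = pair_descriptor("MKRDE", "AUGC")
    print(len(features))
```

A vector built this way, one per RNA-protein pair with an interacting or non-interacting label, is the kind of fixed-length input a gradient boosting classifier could be trained on; the choice of library and hyperparameters is left open here.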
Pages: 10