Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes

Cited by: 2
Authors
Agarwal, Ankita [1, 2]
Kant, Shri [2]
Bahadur, Ranjit Prasad [2, 3]
Affiliations
[1] Indian Inst Technol Kharagpur, Sch Bio Sci, Kharagpur, India
[2] Indian Inst Technol Kharagpur, Dept Biotechnol, Computat Struct Biol Lab, Kharagpur, India
[3] Indian Inst Technol Kharagpur, Dept Biotechnol, Computat Struct Biol Lab, Kharagpur 721302, India
Keywords
balanced random forest; machine learning; prediction; protein-RNA interactions; RNA-binding proteins; RNA-binding residues; PREDICTION; RECOGNITION; SVM; DNA; NUCLEOTIDES; SERVER
DOI
10.1002/prot.26528
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein-RNA interactions play vital roles in a plethora of biological processes such as regulation of gene expression, protein synthesis, mRNA processing and biogenesis. Identification of RNA-binding residues (RBRs) in proteins is essential to understand RNA-mediated protein functioning, to perform site-directed mutagenesis and to develop novel targeted drug therapies. Moreover, the extensive gap between sequence and structural data restricts the identification of binding sites in unsolved structures. However, computational methods that require only sequence to identify binding residues can bridge this sequence-structure gap. In this study, we have extensively studied the protein-RNA interface in known RNA-binding proteins (RBPs). We find that the interface is highly enriched in basic and polar residues, with Gly being the most common interface neighbor. We investigated several amino acid features and developed a method to predict putative RBRs from amino acid sequence. We have implemented a balanced random forest (BRF) classifier with local residue features of protein sequences for prediction. With 5-fold cross-validation, the BRF model based on sequence-pattern-derived dipeptide composition (DCP-BRF) achieved an accuracy of 87.9%, specificity of 88.8%, sensitivity of 82.2%, Matthews correlation coefficient of 0.60 and AUC of 0.93, performing better than a few existing methods. We further validated our prediction model on known human RBPs through RBR prediction and could map approximately 54% of them. Further, knowledge of binding site preferences obtained from computational predictions, combined with experimental validation of potential RNA-binding sites, can enhance our understanding of protein-RNA interactions. This may serve to accelerate investigations on the functional roles of many novel RBPs.
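As a rough illustration of what "local residue features" based on dipeptide composition might look like, the following minimal sketch computes a 400-dimensional dipeptide frequency vector over a sequence window centred on each residue. The abstract does not specify the window size, the handling of sequence ends, or the exact feature definition, so the function names, the `half_width` parameter and the truncation at the termini are assumptions made purely for illustration, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(fragment):
    """Return the relative frequency of each of the 400 dipeptides in a fragment."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for first, second in zip(fragment, fragment[1:]):
        idx = DIPEPTIDE_INDEX.get(first + second)
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def window_features(sequence, half_width=3):
    """Compute one dipeptide-composition vector per residue.

    half_width is a hypothetical parameter: the window spans up to
    half_width residues on each side and is truncated at the termini.
    """
    features = []
    for i in range(len(sequence)):
        start = max(0, i - half_width)
        end = min(len(sequence), i + half_width + 1)
        features.append(dipeptide_composition(sequence[start:end]))
    return features


if __name__ == "__main__":
    feats = window_features("MKRGGA", half_width=1)
    print(len(feats), len(feats[0]))
```

Per-residue vectors of this kind could then be passed, together with binding/non-binding labels, to a classifier that compensates for class imbalance, such as the `BalancedRandomForestClassifier` in the imbalanced-learn package; whether the authors used that particular implementation is not stated in the abstract.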
Pages: 1361-1379
Page count: 19