PST-PRNA: prediction of RNA-binding sites using protein surface topography and deep learning

Cited by: 21
Authors
Li, Pengpai [1]
Liu, Zhi-Ping [1]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Dept Biomed Engn, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DATABASE; RECOGNITION; GENERATION; FEATURES;
DOI
10.1093/bioinformatics/btac078
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Protein-RNA interactions play essential roles in many biological processes, including pre-mRNA processing, post-transcriptional gene regulation and RNA degradation. Accurate identification of binding sites on RNA-binding proteins (RBPs) is important for functional annotation and site-directed mutagenesis. Experimental assays for identifying these binding sites are precise and convincing but also costly and time-consuming. Therefore, flexible and reliable computational methods are required to recognize RNA-binding residues. Results: In this work, we propose PST-PRNA, a novel model for predicting RNA-binding sites (PRNA) based on protein surface topography (PST). Taking full advantage of the 3D structural information of proteins, PST-PRNA creates representative topography images of the entire protein surface by mapping it onto a unit spherical surface. Four kinds of descriptors are encoded to represent residues on the surface. The resulting features are then integrated and optimized using deep learning models. We compile a comprehensive non-redundant RBP dataset to train and test PST-PRNA using 10-fold cross-validation. Numerous experiments demonstrate that PST-PRNA successfully learns the latent structural information of the protein surface. On the non-redundant dataset with a sequence identity of 0.3, PST-PRNA achieves an area under the receiver operating characteristic curve (AUC) of 0.860 and a Matthews correlation coefficient of 0.420. Furthermore, we construct a completely independent test dataset for validation and comparison. PST-PRNA achieves an AUC of 0.913 on the independent dataset, which is superior to other state-of-the-art methods.
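The two evaluation metrics named in the abstract, AUC and the Matthews correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `mcc` and `auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Labels are per-residue (1 = RNA-binding, 0 = non-binding).

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels and predicted binding scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(auc(labels, scores))
print(mcc(labels, predicted))
```

AUC is threshold-free and depends only on how residues are ranked, whereas the Matthews correlation coefficient depends on the chosen decision threshold, which is one reason both are commonly reported together for imbalanced binding-site data.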
Pages: 2162-2168
Number of pages: 7