PASSION: an ensemble neural network approach for identifying the binding sites of RBPs on circRNAs

Times cited: 67
Authors
Jia, Cangzhi [1]
Bi, Yue [1]
Chen, Jinxiang [2,3]
Leier, Andre [4,5]
Li, Fuyi [2,3]
Song, Jiangning [2,3,6]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian 116026, Peoples R China
[2] Monash Univ, Dept Biochem & Mol Biol, Monash Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[3] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic 3800, Australia
[4] Univ Alabama Birmingham, Dept Genet, Sch Med, Birmingham, AL USA
[5] Univ Alabama Birmingham, Dept Cell Dev & Integrat Biol, Sch Med, Birmingham, AL USA
[6] Monash Univ, ARC Ctr Excellence Adv Mol Imaging, Melbourne, Vic 3800, Australia
Funding
UK Medical Research Council; Australian Research Council
Keywords
CIRCULAR RNAS; PROTEINS; DNA
DOI
10.1093/bioinformatics/btaa522
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Unlike traditional linear RNAs, which have 5' and 3' ends, circular RNAs (circRNAs) are a special class of RNAs with a closed-loop structure. Accumulating evidence indicates that circRNAs can bind proteins directly and participate in a wide range of biological processes.

Results: To identify the interactions of circRNAs with 37 different RNA-binding proteins (RBPs), we develop PASSION, an ensemble neural network that combines a concatenated artificial neural network (ANN) with a hybrid deep neural network. The input to the ANN is the optimal feature subset for each RBP, selected from six types of feature-encoding schemes by incremental feature selection together with the XGBoost algorithm. The input to the hybrid deep neural network is a stacked codon-based encoding. Benchmarking experiments show that, compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes, the ensemble neural network achieves the best average area under the curve (AUC), 0.883, across the 37 circRNA datasets. Moreover, each of the 37 RBP models is extensively evaluated in independent tests at sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, yielding average AUCs of 0.883, 0.876, 0.868 and 0.883, respectively, which highlights the effectiveness and robustness of PASSION. Further benchmarking demonstrates that PASSION achieves competitive performance in identifying RBP binding sites on circRNAs compared with several state-of-the-art methods.
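The abstract states that each RBP's ANN input is chosen by incremental feature selection (IFS) together with XGBoost, and that models are compared by AUC. The sketch below illustrates only the generic IFS loop under stated assumptions: features are added one at a time along a precomputed importance ranking (assumed to come from XGBoost; the ranking is supplied as input, not computed here), and the prefix with the highest AUC is kept. The helper names (`roc_auc`, `incremental_feature_selection`), the toy data and the sum-of-features scorer are hypothetical stand-ins for PASSION's per-RBP classifiers and validation protocol, not the authors' implementation.

```python
from __future__ import annotations

from typing import Callable, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: the fraction of positive/negative
    pairs in which the positive scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def incremental_feature_selection(
    ranked_features: Sequence[int],
    evaluate: Callable[[list[int]], float],
) -> tuple[list[int], float]:
    """Grow the feature set one feature at a time along a fixed ranking and
    return the prefix with the highest score (the shortest one on ties)."""
    best_subset: list[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # strict ">" keeps the shorter prefix on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: 6 samples x 3 features; 1 = binding site, 0 = non-site.
X = [
    [3, 0, 2], [2, 5, 3], [1, 1, 4],  # positives
    [2, 4, 0], [0, 3, 1], [1, 6, 1],  # negatives
]
y = [1, 1, 1, 0, 0, 0]

# Assumed importance ranking (e.g. from XGBoost); supplied here, not computed.
ranking = [0, 2, 1]


def evaluate(subset: list[int]) -> float:
    # Hypothetical stand-in for training and validating a classifier on
    # `subset`: score each sample by the sum of its selected feature values.
    scores = [sum(row[j] for j in subset) for row in X]
    return roc_auc(y, scores)


subset, auc = incremental_feature_selection(ranking, evaluate)
print(subset, auc)  # → [0, 2] 1.0
```

On this toy data the top-ranked feature alone gives an AUC of 7/9, adding the second-ranked feature reaches 1.0, and adding the third lowers it to 5.5/9, so the loop keeps the two-feature prefix. In a real pipeline `evaluate` would train and cross-validate a classifier on each candidate subset, which is where almost all of the cost lies.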
Pages: 4276-4282
Page count: 7