EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA-protein interaction prediction

Cited by: 25
Authors
Wang, Jingjing [1]
Zhao, Yanpeng [1]
Gong, Weikang [1]
Liu, Yang [1]
Wang, Mei [1]
Huang, Xiaoqian [1]
Tan, Jianjun [1]
Affiliations
[1] Beijing Univ Technol, Dept Biomed Engn, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
ncRNA-protein interactions; Multi-scale features combination; Conjoint k-mer; Ensemble deep learning; Independent test; ncRNA-protein networks; LONG NONCODING RNAS; SECONDARY STRUCTURE; BINDING PROTEINS; NEURAL-NETWORKS; ACCURATE PREDICTION; GENE-EXPRESSION; WEB SERVER; SEQUENCE; DNA; IDENTIFICATION;
DOI
10.1186/s12859-021-04069-9
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. Experimental methods for identifying ncRNA-protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods that predict ncRNA-protein interactions accurately and efficiently.
Results: In this work, we present an ensemble deep learning-based method, EDLMFC, to predict ncRNA-protein interactions using a combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mer is used to extract protein and ncRNA sequence features, which are integrated with tertiary structure features and then fed into an ensemble deep learning model. The model combines a convolutional neural network (CNN), which learns dominant biological information, with a bi-directional long short-term memory network (BLSTM), which captures long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance, with accuracies of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrate that EDLMFC can effectively predict potential ncRNA-protein interactions from different organisms. Furthermore, EDLMFC is shown to successfully predict hub ncRNAs and proteins in ncRNA-protein networks of Mus musculus.
Conclusions: In general, our proposed method EDLMFC improves the accuracy of ncRNA-protein interaction prediction and is anticipated to provide helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC.
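The conjoint k-mer step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the EDLMFC repository. The 7-class amino-acid grouping (the commonly used conjoint triad classes), the function name `conjoint_kmer_features`, and the choices of k are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import product

# Assumed reduced alphabets (not stated in the abstract): the 7-class
# conjoint triad grouping for proteins and plain nucleotides for RNA.
PROTEIN_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RNA_GROUPS = ["A", "C", "G", "U"]


def conjoint_kmer_features(seq, groups, k):
    """Return normalized frequencies of all reduced-alphabet k-mers."""
    # Map each symbol to the index of its class, e.g. 'A' -> '0' for proteins.
    symbol_class = {ch: str(i) for i, g in enumerate(groups) for ch in g}
    reduced = [symbol_class.get(ch) for ch in seq.upper()]

    # One counter per possible k-mer over the reduced alphabet, in fixed order.
    classes = [str(i) for i in range(len(groups))]
    counts = {"".join(p): 0 for p in product(classes, repeat=k)}

    for i in range(len(reduced) - k + 1):
        window = reduced[i:i + k]
        if None in window:  # skip windows containing unknown symbols
            continue
        counts["".join(window)] += 1

    total = sum(counts.values())
    return [c / total if total else 0.0 for c in counts.values()]


protein_vec = conjoint_kmer_features("MKVLAAGDERC", PROTEIN_GROUPS, k=3)
rna_vec = conjoint_kmer_features("ACGUACGU", RNA_GROUPS, k=4)
```

Under these assumptions a protein yields a 7^3 = 343-dimensional vector and an RNA a 4^4 = 256-dimensional vector. Vectors of this kind could then be concatenated with structural features before being passed to a model.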
Pages: 19