Prediction of Long Non-Coding RNAs Based on Deep Learning

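Times cited: 29
Authors
Liu, Xiu-Qin [1]
Li, Bing-Xiu [1]
Zeng, Guan-Rong [1]
Liu, Qiao-Yue [1]
Ai, Dong-Mei [1]
Affiliation
[1] University of Science and Technology Beijing, School of Mathematics and Physics, Beijing 100083, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords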
deep learning; long non-coding RNAs; k-mer; BLSTM; CNN; GloVe; GENOME ANNOTATION; FEATURES; IDENTIFICATION; TRANSCRIPTION; SEQUENCES; NETWORKS; LNCRNAS;
DOI
10.3390/genes10040273
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
With the rapid development of high-throughput sequencing technology, a large number of transcript sequences have been discovered, and identifying long non-coding RNAs (lncRNAs) among these transcripts is a challenging task. Identifying and cataloguing lncRNAs not only helps us better understand fundamental biological processes but also supports the study of disease at the molecular level. Current methods for detecting lncRNAs are either computational or experimental. Because of the limitations of sequencing technology and unavoidable errors in the sequencing process, these methods do not yet perform satisfactorily. In this paper, we constructed a deep-learning model that effectively distinguishes lncRNAs from mRNAs. We used k-mer embedding vectors obtained by training the GloVe algorithm as input features, and built a deep-learning framework consisting of a bidirectional long short-term memory (BLSTM) layer and a convolutional neural network (CNN) layer, together with three additional hidden layers. On testing, the model achieved best values of 97.9% for the F1 score, 96.4% for accuracy and 99.0% for auROC, outperforming the traditional PLEK, CNCI and CPC methods for identifying lncRNAs. We hope that the model will help distinguish mature mRNAs from lncRNAs and become a potential tool for understanding and detecting lncRNA-associated diseases.
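The input-feature step described in the abstract (splitting transcripts into k-mers and training GloVe embeddings on them) can be illustrated with a minimal sketch. It assumes k = 3, a symmetric context window of 2, and the 1/d distance weighting of co-occurrence counts described in the original GloVe paper; none of these settings are stated in the abstract, and the function names `kmer_tokens`, `cooccurrence_counts` and `glove_weight` are hypothetical. The sketch builds only the co-occurrence statistics X and the GloVe weighting function f(x) (with the commonly cited defaults x_max = 100, alpha = 0.75). GloVe would then fit vectors so that w_i · w̃_j + b_i + b̃_j approximates log X_ij, weighting each pair by f(X_ij).

```python
from collections import defaultdict


def kmer_tokens(seq, k=3):
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    k=3 is an illustrative assumption; the abstract does not state the k used.
    """
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and cDNA alphabets alike
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cooccurrence_counts(tokens, window=2):
    """GloVe-style symmetric co-occurrence counts.

    Two tokens d positions apart (1 <= d <= window) add 1/d to both
    X[(a, b)] and X[(b, a)].
    """
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(center, tokens[j])] += 1.0 / d
            counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): down-weights rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


if __name__ == "__main__":
    tokens = kmer_tokens("AUGGCUAG")
    print(tokens)  # ['ATG', 'TGG', 'GGC', 'GCT', 'CTA', 'TAG']
    X = cooccurrence_counts(tokens, window=2)
    print(X[("ATG", "TGG")], X[("ATG", "GGC")])  # 1.0 0.5
```

Treating k-mers as "words" is what lets a word-embedding method such as GloVe be applied to sequences; the window size and k trade off context range against vocabulary size (4^k possible k-mers).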
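The three metrics reported in the abstract (F1 score, accuracy and auROC) can be computed as in the following minimal pure-Python sketch. It assumes lncRNA is the positive class (label 1), that predicted labels come from thresholding a score at 0.5, and that auROC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half), which equals the area under the ROC curve. The labels and scores in the example are made up for illustration and are not data from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels; 1 (assumed: lncRNA) is positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (O(P*N), fine for small sets)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1]            # illustrative labels, not the paper's data
    scores = [0.9, 0.6, 0.4, 0.7, 0.8]  # illustrative model scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(round(accuracy(y_true, y_pred), 3))  # 0.8
    print(round(f1_score(y_true, y_pred), 3))  # 0.857
    print(round(auroc(y_true, scores), 3))     # 0.833
```

Accuracy and F1 depend on the chosen threshold, whereas auROC summarizes ranking quality across all thresholds, which is why the three values can differ.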
Pages: 16