Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification

Cited by: 2
Authors
Lin, Rattaphon [1]
Wichadakul, Duangdao [1, 2]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Bangkok, Thailand
[2] Chulalongkorn Univ, Fac Med, Ctr Excellence Syst Biol, Bangkok, Thailand
Keywords
long non-coding RNA (lncRNA); one-dimensional convolutional neural network (1D CNN); deep learning; explainable artificial intelligence (XAI); SHAP (SHapley additive exPlanations); PROTEIN
DOI
10.3389/fgene.2022.876721
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. Next-generation sequencing technologies have uncovered a substantial number of unannotated transcripts. Classifying these transcripts through biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying lncRNAs, but they do not explain which features contributed to their predictions. Here, we present Xlnc1DCNN, a tool for distinguishing lncRNAs from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. Evaluation on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also pointed to possibly inconsistent annotations among the public databases: lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at .
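As a rough illustration of what a one-dimensional convolution over a transcript does, the sketch below one-hot encodes a nucleotide sequence and slides a single filter across it. It is a toy example under stated assumptions, not the Xlnc1DCNN architecture: the function names, the filter width, and the "ATG" filter weights are all hypothetical, and a real model would learn many filters from data.

```python
# Illustrative sketch only: NOT the Xlnc1DCNN architecture or its trained weights.
# All names and the filter below are hypothetical and chosen for demonstration.

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a nucleotide string as rows of 4 values (A, C, G, T).

    Symbols outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


def conv1d(encoded, kernel, bias=0.0):
    """Slide one filter over the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(len(NUCLEOTIDES)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        activations.append(max(0.0, total))  # ReLU
    return activations


def global_max_pool(activations):
    """Keep the strongest response of the filter anywhere in the sequence."""
    return max(activations) if activations else 0.0


# Hypothetical width-3 filter whose response peaks on the subsequence "ATG".
ATG_FILTER = one_hot("ATG")

activations = conv1d(one_hot("CCATGGA"), ATG_FILTER)
score = global_max_pool(activations)
```

The position of the largest activation indicates which subsequence drove the filter's response. Attribution methods such as SHAP pursue a similar goal for a whole trained network by assigning a contribution to each input position.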
Pages: 12