Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification

Cited by: 2
Authors
Lin, Rattaphon [1]
Wichadakul, Duangdao [1,2]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Bangkok, Thailand
[2] Chulalongkorn Univ, Fac Med, Ctr Excellence Syst Biol, Bangkok, Thailand
Keywords
long non-coding RNA (lncRNA); one-dimensional convolutional neural network (1D CNN); deep learning; explainable artificial intelligence (XAI); SHAP (SHapley additive exPlanations); PROTEIN;
DOI
10.3389/fgene.2022.876721
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying lncRNAs. These tools, however, do not explain which features contributed to their prediction results. Here, we present Xlnc1DCNN, a tool for distinguishing lncRNAs from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. Evaluation on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also pointed to probable annotation inconsistencies among the public databases: lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at .
Pages: 12