Common Features in lncRNA Annotation and Classification: A Survey

Cited by: 17
Authors
Klapproth, Christopher [1,2]
Sen, Rituparno [1,2]
Stadler, Peter F. [1,2]
Findeiss, Sven [1,2]
Fallmann, Joerg [1,2]
Affiliations
[1] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, Hartelstr 16-18, D-04107 Leipzig, Germany
[2] Univ Leipzig, Interdisciplinary Ctr Bioinformat, Hartelstr 16-18, D-04107 Leipzig, Germany
Keywords
lncRNA; feature extraction; machine learning; coding sequence; classification problems; NONCODING RNAS; CODING REGIONS; DATABASE; CANCER; TRANSCRIPTOME; VERTEBRATE; LNCIPEDIA; SEQUENCES; ONCOGENE; COVERAGE
DOI
10.3390/ncrna7040077
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
Pages: 25