A Robust and Precise ConvNet for Small Non-Coding RNA Classification (RPC-snRC)

Cited by: 7
Authors
Asim, Muhammad Nabeel [1, 2]
Malik, Muhammad Imran [3]
Zehe, Christoph [4]
Trygg, Johan [5, 6]
Dengel, Andreas [1, 2]
Ahmed, Sheraz [1]
Affiliations
[1] German Res Ctr Artificial Intelligence DFKI, D-67663 Kaiserslautern, Germany
[2] TU Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany
[3] Natl Univ Sci & Technol, Natl Ctr Artificial Intelligence NCAI, Islamabad 44000, Pakistan
[4] Sartorius Stedim Cellca GmbH, Sartorius Corp Res, D-89081 Ulm, Germany
[5] Umea Univ, Computat Life Sci Cluster CLiC, S-90187 Umea, Sweden
[6] Sartorius Stedim Data Analyt, Sartorius Corp Res, S-90333 Umea, Sweden
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
RNA; Feature extraction; Encoding; Computer architecture; Proteins; Databases; Support vector machines; RNA sequence analysis; small non-coding RNA classification; DenseNet; ResNet; IDENTIFICATION; SEQUENCES; PREDICTION; MICRORNAS; MIRNAS;
DOI
10.1109/ACCESS.2020.3037642
CLC number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Small non-coding RNAs (ncRNAs) are attracting increasing attention because they are now considered potentially valuable resources for the development of new drugs to treat several human diseases. A prerequisite for developing drugs that target ncRNAs or their related pathways is the identification and correct classification of these ncRNAs. State-of-the-art small ncRNA classification methodologies use secondary structural features as input. However, such feature extraction approaches take only global characteristics into account and completely ignore the correlative effects of local structures. Furthermore, secondary-structure-based approaches rely on a high-dimensional feature space, which is computationally expensive. This paper proposes a novel Robust and Precise ConvNet (RPC-snRC) methodology that classifies small ncRNAs into their relevant families using their primary sequence. RPC-snRC learns a hierarchical feature representation from the positions of nucleotides and information on their occurrence. To avoid exploding and vanishing gradients, we use an approach similar to DenseNet, in which gradients can flow directly from subsequent layers to previous layers. To assess the effectiveness of deeper architectures for small ncRNA classification, we also adapted two ResNet architectures with different numbers of layers. Experimental results on a benchmark small ncRNA dataset show that the proposed methodology not only outperforms existing small ncRNA classification approaches by a significant performance margin of 10% but also gives better results than the adapted ResNet architectures. The source code and dataset needed to reproduce the results are available at https://github.com/muas16/small-non-coding-RNA-classification
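The DenseNet-style connectivity described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a one-hot encoding of A, C, G, U with zero-padding to a fixed length; the abstract only says the model uses nucleotide position and occurrence information, so the actual RPC-snRC encoding and layer configuration may differ. The weights are random and untrained, and `one_hot`, `conv1d_relu`, and `dense_block` are hypothetical helper names, not functions from the authors' repository. Each layer receives the concatenation of the block input and all earlier layer outputs, which is what gives gradients a direct path back to earlier layers.

```python
import random

NUCLEOTIDES = "ACGU"


def one_hot(seq, max_len):
    """One-hot encode an RNA sequence into max_len rows of 4 channels (A, C, G, U).

    Assumption (hypothetical encoding): zero-padding/truncation to max_len;
    T is read as U and unknown symbols (e.g. N) become all-zero rows.
    """
    rows = []
    for i in range(max_len):
        row = [0.0] * len(NUCLEOTIDES)
        if i < len(seq):
            base = seq[i].upper().replace("T", "U")
            if base in NUCLEOTIDES:
                row[NUCLEOTIDES.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(x, weights, bias):
    """'Same'-padded 1D convolution (odd kernel size) followed by ReLU.

    x[t][c]: position t, input channel c.
    weights[o][j][c]: output channel o, kernel offset j, input channel c.
    """
    kernel_size = len(weights[0])
    pad = kernel_size // 2
    length, c_in = len(x), len(x[0])
    out = []
    for t in range(length):
        row = []
        for o, w_o in enumerate(weights):
            s = bias[o]
            for j in range(kernel_size):
                pos = t + j - pad
                if 0 <= pos < length:  # zero padding outside the sequence
                    for c in range(c_in):
                        s += w_o[j][c] * x[pos][c]
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def dense_block(x, num_layers, growth_rate, kernel_size=3, seed=0):
    """DenseNet-style block over a (length x channels) feature map.

    Each layer convolves the concatenation of the block input and all earlier
    layer outputs. Weights are random and untrained (illustration only).
    """
    rng = random.Random(seed)
    features = x
    for _ in range(num_layers):
        c_in = len(features[0])
        weights = [[[rng.gauss(0.0, 0.1) for _ in range(c_in)]
                    for _ in range(kernel_size)]
                   for _ in range(growth_rate)]
        bias = [0.0] * growth_rate
        new = conv1d_relu(features, weights, bias)
        # Dense connectivity: append the new channels to every earlier channel,
        # so later layers read earlier feature maps directly.
        features = [old + fresh for old, fresh in zip(features, new)]
    return features


x = one_hot("GGCUAGCU", max_len=10)
y = dense_block(x, num_layers=3, growth_rate=8)
print(len(y), len(y[0]))  # prints: 10 28
```

Each layer adds `growth_rate` channels, so the block output has 4 + 3 × 8 = 28 channels per position. In a full classifier these features would typically be pooled and passed to a softmax over ncRNA families, a step omitted from this sketch.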
Pages: 19379-19390
Number of pages: 12