A convolution neural network-based computational model to identify the occurrence sites of various RNA modifications by fusing varied features

Cited by: 10
Authors
Tahir, Muhammad [1]
Hayat, Maqsood [1]
Chong, Kil To [2, 3]
Affiliations
[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan 23200, KP, Pakistan
[2] Jeonbuk Natl Univ, Dept Elect & Informat Engn, Jeonju 54896, South Korea
[3] Jeonbuk Natl Univ, Adv Elect & Informat Res Ctr, Jeonju 54896, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; RNA Modifications; k-Gram; Feature extraction; Convolution neural network; Data processing; SEQUENCE-BASED PREDICTOR; N-6-METHYLADENOSINE SITES; N6-METHYLADENOSINE SITES; METHYLATION; 5-METHYLCYTOSINE; PROTEINS;
DOI
10.1016/j.chemolab.2021.104233
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
RNA modification occurs in both prokaryotic and eukaryotic genomes and is considered one of the major RNA properties. RNA modifications are a main part of the regulatory landscape of genes and are involved in several biological processes at the post-transcriptional level. Therefore, identifying the residues at which RNA modifications occur is essential for determining their molecular functions and the relevant mechanisms. Although wet-lab experiments for identifying RNA modification sites have produced satisfactory results, these experimental approaches are highly labor-intensive and expensive. It is therefore indispensable to establish a novel and robust computational approach for predicting RNA modification sites. To address these issues, an intelligent computational predictor called "iRNA-Mod-CNN", based on deep learning, is developed to identify RNA modification sites. First, the biological sequences are encoded using one-hot encoding. The encoded feature vector is then provided to a convolution neural network (CNN) model in order to discern the concealed information. Further, the k-Gram feature space is fused with the CNN feature space. The computational predictor "iRNA-Mod-CNN" showed significant improvement over the existing methods, achieving accuracies of 99.56%, 92.39%, and 86.66% on the m1A, m6A, and m5C benchmark datasets, respectively.
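The encoding and fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The nucleotide ordering `ACGU`, the function names, the choice of k = 2, and fusion by plain concatenation are all assumptions made for illustration, and the CNN is replaced by a placeholder that simply flattens the one-hot matrix.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed nucleotide ordering, chosen for illustration


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    index = {nt: i for i, nt in enumerate(ALPHABET)}
    encoded = []
    for nt in seq.upper():
        row = [0] * len(ALPHABET)
        if nt in index:  # unknown symbols (e.g. N) stay all-zero
            row[index[nt]] = 1
        encoded.append(row)
    return encoded


def kgram_frequencies(seq, k=2):
    """Return normalized counts of every length-k substring over ALPHABET."""
    grams = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(grams, 0)
    seq = seq.upper()
    for i in range(max(len(seq) - k + 1, 0)):
        gram = seq[i:i + k]
        if gram in counts:
            counts[gram] += 1
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0 for g in grams]


def fuse(cnn_features, kgram_features):
    """Fuse two feature spaces by concatenation (an assumed fusion rule)."""
    return list(cnn_features) + list(kgram_features)


example = "GGACU"  # hypothetical toy sequence, not from the benchmark data
encoded = one_hot(example)
kgrams = kgram_frequencies(example, k=2)
# Placeholder for learned CNN features: the flattened one-hot matrix.
placeholder_cnn = [value for row in encoded for value in row]
fused = fuse(placeholder_cnn, kgrams)
```

In a real pipeline the placeholder would be replaced by the activations of a trained convolutional layer, and the fused vector would feed a classifier; those details are not given in this record.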
Pages: 6