4mCPred-CNN-Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network

Cited by: 21
Authors
Abbas, Zeeshan [1, 2]
Tayara, Hilal [3]
Chong, Kil To [1, 4]
Affiliations
[1] Jeonbuk Natl Univ, Dept Elect & Informat Engn, Jeonju 54896, South Korea
[2] Air Univ, Inst Avion & Aeronaut IAA, Islamabad 44000, Pakistan
[3] Jeonbuk Natl Univ, Sch Int Engn & Sci, Jeonju 54896, South Korea
[4] Jeonbuk Natl Univ, Adv Elect & Informat Res Ctr, Jeonju 54896, South Korea
Funding
National Research Foundation, Singapore;
Keywords
N4-methylcytosine; computational biology; neural networks; epigenetics; SITES; IDENTIFICATION; METHYLATION; EPIGENETICS; TOOL;
DOI
10.3390/genes12020296
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. To understand its different biological functions, accurate detection of 4mC sites is required. Although several techniques exist for predicting 4mC sites in different genomes, based on both machine learning (ML) and convolutional neural networks (CNNs), there is no CNN-based tool for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models were available for this purpose; they utilized several feature encoding schemes and still left considerable room to improve prediction accuracy. Utilizing only a single feature encoding scheme, one-hot encoding, we outperformed both of the previous ML-based techniques. In a ten-fold validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, the achieved accuracy was 87.50% with an MCC value of 0.750. These results indicate that the proposed model can be of great use to researchers in the fields of biology and bioinformatics.
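The abstract names two concrete ingredients, one-hot encoding of the input sequence and evaluation by accuracy and MCC. The following is a minimal sketch of both, not the authors' implementation. The nucleotide order `ACGT`, the all-zero row for unknown bases, and the function names are assumptions made for illustration, and the confusion-matrix counts in the usage lines are hypothetical, not taken from the paper.

```python
import math

# Assumed column order; the record does not state the order used in the paper.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Map a DNA string to a list of 4-element 0/1 rows, one row per base."""
    rows = []
    for base in seq.upper():
        # Assumption: symbols outside ACGT (e.g. 'N') become an all-zero row.
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
    # Hypothetical balanced counts, chosen only to illustrate the formulas.
    print(accuracy(7, 7, 1, 1), mcc(7, 7, 1, 1))
```

With the hypothetical balanced counts above (7 true positives, 7 true negatives, 1 false positive, 1 false negative), accuracy is 0.875 and MCC is 0.75. This shows that an accuracy/MCC pair like the one reported for the independent dataset is what a balanced test set with symmetric errors would produce; it says nothing about the paper's actual confusion matrix.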
Pages: 1 - 10
Page count: 10
Related papers
39 in total
  • [1] SpineNet-6mA: A Novel Deep Learning Tool for Predicting DNA N6-Methyladenine Sites in Genomes
    Abbas, Zeeshan
    Tayara, Hilal
    Chong, Kil To
    [J]. IEEE ACCESS, 2020, 8 : 201450 - 201457
  • [2] A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation
    Alam, Waleed
    Ali, Syed Danish
    Tayara, Hilal
    Chong, Kil To
    [J]. IEEE ACCESS, 2020, 8 (08) : 138203 - 138209
  • [3] Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics
    Ardui, Simon
    Ameur, Adam
    Vermeesch, Joris R.
    Hestand, Matthew S.
    [J]. NUCLEIC ACIDS RESEARCH, 2018, 46 (05) : 2159 - 2168
  • [4] Nucleic Acid Modifications in Regulation of Gene Expression
    Chen, Kai
    Zhao, Boxuan Simen
    He, Chuan
    [J]. CELL CHEMICAL BIOLOGY, 2016, 23 (01) : 74 - 85
  • [5] iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
    Chen, Zhen
    Zhao, Pei
    Li, Fuyi
    Marquez-Lago, Tatiana T.
    Leier, Andre
    Revote, Jerico
    Zhu, Yan
    Powell, David R.
    Akutsu, Tatsuya
    Webb, Geoffrey I.
    Chou, Kuo-Chen
    Smith, A. Ian
    Daly, Roger J.
    Li, Jian
    Song, Jiangning
    [J]. BRIEFINGS IN BIOINFORMATICS, 2020, 21 (03) : 1047 - 1057
  • [6] DNA MODIFICATION BY METHYLTRANSFERASES
    CHENG, XD
    [J]. CURRENT OPINION IN STRUCTURAL BIOLOGY, 1995, 5 (01) : 4 - 10
  • [7] Exploring genome wide bisulfite sequencing for DNA methylation analysis in livestock: a technical assessment
    Doherty, Rachael
    Couldrey, Christine
    [J]. FRONTIERS IN GENETICS, 2014, 5 : 1 - 1
  • [8] Review - Mass spectrometry and protein analysis
    Domon, B
    Aebersold, R
    [J]. SCIENCE, 2006, 312 (5771) : 212 - 217
  • [9] Mouse models in epigenetics: insights in development and disease
    Espada, Jesus
    Esteller, Manel
    [J]. BRIEFINGS IN FUNCTIONAL GENOMICS, 2013, 12 (03) : 279 - 287
  • [10] i4mC-Mouse: Improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes
    Hasan, Md. Mehedi
    Manavalan, Balachandran
    Shoombuatong, Watshara
    Khatun, Mst. Shamima
    Kurata, Hiroyuki
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2020, 18 : 906 - 912