Leveraging the attention mechanism to improve the identification of DNA N6-methyladenine sites

Cited by: 29
Authors
Zhang, Ying [1]
Liu, Yan [1]
Xu, Jian [1]
Wang, Xiaoyu [2]
Peng, Xinxin [2]
Song, Jiangning [3, 4]
Yu, Dong-Jun [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei, Nanjing 210094, Peoples R China
[2] Monash Univ, Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[3] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[4] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
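Funding
UK Medical Research Council; Australian Research Council; National Natural Science Foundation of China; US National Institutes of Health
Keywords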
DNA modification; 6mA; self-attention mechanism; deep learning; LSTM; attention interpretation; RICE GENOME; METHYLATION; N-6-ADENINE; PREDICTION;
DOI
10.1093/bib/bbab351
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
DNA N6-methyladenine (6mA) is an important type of DNA modification that plays key roles in multiple biological processes. Despite recent progress in developing DNA 6mA site prediction methods, several challenges remain. For example, although hand-crafted features are interpretable, they contain redundant information that may bias model training and degrade the trained model. Furthermore, although deep learning (DL)-based models can perform feature extraction and classification automatically, the crucial features they learn are difficult to interpret. As such, considerable research effort has focused on achieving a trade-off between the interpretability and straightforwardness of DL neural networks. In this study, we develop two new DL-based models for improving the prediction of 6mA sites, termed LA6mA and AL6mA, which use bidirectional long short-term memory (LSTM) to capture long-range information and a self-attention mechanism to extract key position information from DNA sequences. The performance of the two proposed methods is benchmarked on two model organisms, Arabidopsis thaliana and Drosophila melanogaster. On the two benchmark datasets, LA6mA achieves area under the receiver operating characteristic curve (AUROC) values of 0.962 and 0.966, whereas AL6mA achieves AUROC values of 0.945 and 0.941, respectively. Moreover, an in-depth analysis of the attention matrix is conducted to interpret the important information that is hidden in the sequence and relevant for 6mA site prediction. The two novel pipelines developed for DNA 6mA site prediction in this work will facilitate a better understanding of the underlying principles of DL-based DNA methylation site prediction and its future applications.
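To make the idea of an attention matrix over sequence positions concrete, the following is a minimal, illustrative sketch of scaled dot-product self-attention applied to a one-hot encoded DNA string. It is not the authors' LA6mA or AL6mA implementation: the identity query/key/value projections, the toy sequence, and the helper names `one_hot`, `softmax`, and `self_attention` are all assumptions made for illustration, and the bidirectional LSTM component is omitted entirely.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    A trained model would use learned projection matrices; identity
    projections are an assumption that keeps the sketch short.
    Returns (context, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(x[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        for q in x
    ]
    weights = [softmax(r) for r in scores]
    context = [
        [sum(w * v[c] for w, v in zip(w_row, x)) for c in range(d)]
        for w_row in weights
    ]
    return context, weights


seq = "GATCA"  # hypothetical toy sequence, not taken from the benchmark data
context, weights = self_attention(one_hot(seq))
```

Each row of `weights` is a probability distribution over positions, which is the kind of matrix that can be inspected to see which positions a model emphasizes. In this toy setting a position simply attends more to positions carrying the same base.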
Pages: 13