EpiSemble: A Novel Ensemble-based Machine-learning Framework for Prediction of DNA N6-methyladenine Sites Using Hybrid Features Selection Approach for Crops

Cited by: 8
Authors
Sinha, Dipro [1]
Dasmandal, Tanwy [1]
Yeasin, Md [2]
Mishra, Dwijesh C. [2]
Rai, Anil [2]
Archak, Sunil [3]
Affiliations
[1] Indian Agr Res Inst, Div Agr Bioinformat, ICAR, New Delhi, India
[2] Indian Agr Res Inst, ICAR, New Delhi, India
[3] Natl Bur Plant Genet Resources, ICAR, New Delhi 110012, India
Keywords
Epigenetics; methylation prediction; machine learning; ensemble model; feature selection; EpiSemble; MISMATCH REPAIR; MODEL; METHYLATION; TOOL;
DOI
10.2174/1574893618666230316151648
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Aim: The study aimed to develop a robust and more precise 6mA methylation prediction tool that assists researchers in studying the epigenetic behaviour of crop plants.
Background: N6-methyladenine (6mA) is one of the predominant epigenetic modifications involved in a variety of biological processes in all three kingdoms of life. While in vitro approaches are more precise in detecting epigenetic alterations, they are resource-intensive and time-consuming. Artificial intelligence-based in silico methods have helped overcome these bottlenecks.
Methods: A novel machine learning framework was developed by combining four techniques: ensemble machine learning, a hybrid approach to feature selection, the addition of features such as the Average Mutual Information Profile (AMIP), and bootstrap samples. Four feature sets, namely di-nucleotide frequency, GC content, AMIP, and nucleotide chemical properties, were chosen for the vectorization of DNA sequences. Nine machine learning models (support vector machine, random forest, k-nearest neighbor, artificial neural network, multiple logistic regression, decision tree, naive Bayes, AdaBoost, and gradient boosting) were trained on the relevant features extracted by the feature selection module. The three best-performing models were selected and combined into a robust ensemble model to predict sequences with 6mA sites.
Results: EpiSemble, a novel ensemble model, was developed for the prediction of 6mA methylation sites. With the new model, accuracy improved by 7.0%, 3.74%, and 6.65% over existing models on the RiceChen, RiceLv, and Arabidopsis datasets, respectively. An R package, EpiSemble, based on the new model was developed and made available at https://cran.r-project.org/web/packages/EpiSemble/index.html.
Conclusion: The EpiSemble model added AMIP as a novel feature and integrated feature selection modules, bootstrapping of samples, and an ensemble technique to achieve improved accuracy in predicting 6mA sites in plants. To our knowledge, this is the first R package developed for predicting epigenetic sites of genomes in crop plants, and it is expected to help plant researchers in their future explorations.
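The vectorization and ensemble steps described in the abstract can be illustrated with a minimal Python sketch. It is not the EpiSemble R package's API or the authors' code. All function names are hypothetical. The chemical-property encoding (ring structure, functional group, hydrogen bonding) is a commonly used scheme assumed here. The mutual-information formula is the standard lag-k definition, I_k = sum over base pairs (x, y) of p_k(x, y) * log2(p_k(x, y) / (p(x) p(y))). The abstract does not say how the three selected models are combined, so majority voting is also an assumption.

```python
from collections import Counter
from itertools import product
from math import log2

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]  # 16 pairs, AA..TT

# Assumed encoding: (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def dinucleotide_frequency(seq):
    """Relative frequency of each of the 16 overlapping dinucleotides."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DINUCLEOTIDES]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def chemical_properties(seq):
    """Three binary properties per base, flattened (A/C/G/T only)."""
    return [bit for base in seq for bit in NCP[base]]


def average_mutual_information(seq, lag):
    """Mutual information between bases separated by `lag` positions."""
    n = len(seq)
    n_pairs = n - lag
    single = Counter(seq)
    pairs = Counter((seq[i], seq[i + lag]) for i in range(n_pairs))
    ami = 0.0
    for (x, y), count in pairs.items():
        p_xy = count / n_pairs
        ami += p_xy * log2(p_xy / ((single[x] / n) * (single[y] / n)))
    return ami


def vectorize(seq, max_lag=3):
    """Concatenate the four feature sets; max_lag is an illustrative choice."""
    seq = seq.upper()
    amip = [average_mutual_information(seq, k) for k in range(1, max_lag + 1)]
    return (dinucleotide_frequency(seq) + [gc_content(seq)]
            + amip + chemical_properties(seq))


def majority_vote(predictions):
    """Combine 0/1 labels from an odd number of models, one list per model."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*predictions)]
```

For a sequence of length n, `vectorize` returns 16 + 1 + `max_lag` + 3n values, so fixed-length input windows are needed before the vectors can be passed to a classifier. The feature selection module and the bootstrap sampling mentioned in the abstract are left out of this sketch.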
Pages: 587-597
Page count: 11
Related papers
26 in total
  • [1] Ense-i6mA: Identification of DNA N6-Methyladenine Sites Using XGB-RFE Feature Selection and Ensemble Machine Learning
    Fan, Xueqiang
    Lin, Bing
    Hu, Jun
    Guo, Zhongyi
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2024, 21 (06) : 1842 - 1854
  • [2] i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome
    Khanal, Jhabindra
    Lim, Dae Young
    Tayara, Hilal
    Chong, Kil To
    GENOMICS, 2021, 113 (01) : 582 - 592
  • [3] Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework
    Hasan, Md Mehedi
    Basith, Shaherin
    Khatun, Mst Shamima
    Lee, Gwang
    Manavalan, Balachandran
    Kurata, Hiroyuki
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (03)
  • [4] Tissue specific prediction of N6-methyladenine sites based on an ensemble of multi-input hybrid neural network
    Jia, Cangzhi
    Jin, Dong
    Wang, Xin
    Zhao, Qi
    BIOCELL, 2022, 46 (04) : 1105 - 1121
  • [5] Detection of DNA N6-Methyladenine Modification through SMRT-seq Features and Machine Learning Model
    Guo, Yichu
    Zhang, Yixuan
    Liu, Xiaoqing
    He, Pingan
    Zeng, Yuni
    Dai, Qi
    CURRENT BIOINFORMATICS, 2024,
  • [6] BERT6mA: prediction of DNA N6-methyladenine site using deep learning-based approaches
    Tsukiyama, Sho
    Hasan, Md Mehedi
    Deng, Hong-Wen
    Kurata, Hiroyuki
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (02)
  • [7] i6mA-Caps: a CapsuleNet-based framework for identifying DNA N6-methyladenine sites
    Rehman, Mobeen Ur
    Tayara, Hilal
    Zou, Quan
    Chong, Kil To
    BIOINFORMATICS, 2022, 38 (16) : 3885 - 3891
  • [8] SpineNet-6mA: A Novel Deep Learning Tool for Predicting DNA N6-Methyladenine Sites in Genomes
    Abbas, Zeeshan
    Tayara, Hilal
    Chong, Kil To
    IEEE ACCESS, 2020, 8 : 201450 - 201457
  • [9] GC6mA-Pred: A deep learning approach to identify DNA N6-methyladenine sites in the rice genome
    Cai, Jianhua
    Xiao, Guobao
    Su, Ran
    METHODS, 2022, 204 : 14 - 21
  • [10] I-DNAN6mA: Accurate Identification of DNA N6-Methyladenine Sites Using the Base-Pairing Map and Deep Learning
    Fan, Xue-Qiang
    Lin, Bing
    Hu, Jun
    Guo, Zhong-Yi
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (03) : 1076 - 1086