SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome

Times cited: 131
Authors
Basith, Shaherin [1]
Manavalan, Balachandran [1]
Shin, Tae Hwan [1]
Lee, Gwang [1]
Affiliation
[1] Ajou University, School of Medicine, Department of Physiology, Suwon, South Korea
Funding
National Research Foundation of Singapore
Keywords
DNA METHYLATION; N-6-ADENINE; N-6-METHYLADENINE; IDENTIFICATION
DOI
10.1016/j.omtn.2019.08.011
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
DNA N6-adenine methylation (6mA) is an epigenetic modification present in both prokaryotes and eukaryotes. Identifying 6mA sites in the rice genome is important for rice epigenetics and breeding, but the non-random distribution and biological functions of these sites remain unclear. Several machine-learning tools can identify 6mA sites, but their limited prediction accuracy restricts their usefulness in epigenetic research. Here, we developed a novel computational predictor, the Sequence-based DNA N6-methyladenine predictor (SDM6A), which uses a two-layer ensemble approach to identify 6mA sites in the rice genome. Unlike existing methods, which rely on single models built on basic features, SDM6A explores a variety of features, from which five encoding methods were identified as appropriate for this problem. An optimal feature set was then identified from these encodings, and corresponding models were developed individually using a support vector machine (SVM) and an extremely randomized tree (ERT). In the first layer, the five single models of each classifier were integrated via an ensemble approach to define that classifier's class prediction. In the second layer, the two classifiers were integrated to generate the final prediction. SDM6A achieved robust performance in both cross-validation and independent evaluation, with an average accuracy of 88.2% and a Matthews correlation coefficient (MCC) of 0.764. These metrics were 4.7%-11.0% and 2.3%-5.5% higher, respectively, than those of existing methods. A user-friendly, publicly accessible web server (http://thegleelab.org/SDM6A) was implemented to predict novel putative 6mA sites in the rice genome.
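The abstract describes the two-layer ensemble and the evaluation metrics only at a high level. The following is a minimal sketch under stated assumptions, not the SDM6A authors' implementation: each of the five single-encoding models of a classifier is assumed to output a probability that a site is 6mA; layer one averages those five probabilities per classifier (SVM or ERT); layer two averages the two classifier-level scores and applies an assumed 0.5 threshold. Accuracy and MCC are then computed from the confusion matrix with their standard definitions. The averaging rule, the threshold, the function names (`layer_one`, `layer_two`, `accuracy_and_mcc`) and the toy probabilities are all hypothetical.

```python
"""Hypothetical sketch of a two-layer probability-averaging ensemble and
accuracy/MCC evaluation. Illustrative only; not the SDM6A source code."""
import math
from typing import List, Sequence, Tuple

THRESHOLD = 0.5  # assumed decision threshold for calling a site 6mA


def layer_one(probs: Sequence[float]) -> float:
    """Layer 1 (assumed rule): average the 6mA probabilities produced by the
    five single-encoding models of one classifier type (SVM or ERT)."""
    return sum(probs) / len(probs)


def layer_two(svm_probs: Sequence[float], ert_probs: Sequence[float]) -> int:
    """Layer 2 (assumed rule): average the two classifier-level scores and
    threshold the result. Returns 1 for 6mA, 0 for non-6mA."""
    score = (layer_one(svm_probs) + layer_one(ert_probs)) / 2
    return 1 if score >= THRESHOLD else 0


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Standard accuracy and Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC is 0 when undefined
    return acc, mcc


if __name__ == "__main__":
    # Toy (hypothetical) probabilities: five SVM models, five ERT models per site.
    samples: List[Tuple[List[float], List[float]]] = [
        ([0.9, 0.8, 0.7, 0.85, 0.6], [0.7, 0.9, 0.8, 0.75, 0.65]),
        ([0.2, 0.1, 0.3, 0.25, 0.4], [0.3, 0.2, 0.1, 0.35, 0.2]),
        ([0.6, 0.4, 0.55, 0.5, 0.45], [0.3, 0.35, 0.4, 0.2, 0.25]),
        ([0.4, 0.6, 0.5, 0.7, 0.55], [0.6, 0.5, 0.45, 0.65, 0.7]),
        ([0.8] * 5, [0.9] * 5),
        ([0.1] * 5, [0.2] * 5),
    ]
    y_true = [1, 0, 1, 0, 1, 0]
    y_pred = [layer_two(svm, ert) for svm, ert in samples]
    acc, mcc = accuracy_and_mcc(y_true, y_pred)
    print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")  # prints: accuracy=0.667, MCC=0.333
```

Simple averaging is used here only because it is the most transparent way to combine probabilities; the published method may weight, vote or threshold differently.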
Pages: 131-141
Page count: 11