4mCPred: machine learning methods for DNA N4-methylcytosine sites prediction

Cited by: 145
Authors
He, Wenying [1]
Jia, Cangzhi [2]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
SEQUENCE-BASED PREDICTOR; AMINO-ACID-COMPOSITION; FEATURE-SELECTION; METHYLATION; INFORMATION; REPLICATION; PROTEINS; PSEKNC; N4-METHYLCYTOSINE; PROMOTERS;
DOI
10.1093/bioinformatics/bty668
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: N4-methylcytosine (4mC), an important epigenetic modification formed by the action of specific methyltransferases, plays an essential role in DNA repair, expression and replication. The accurate identification of 4mC sites aids in-depth research into its biological functions and mechanisms. Because experimental identification of 4mC sites is time-consuming and costly, especially given the rapid accumulation of gene sequences, efficient computational methods are urgently needed as a supplement. Results: In this study, we developed a new tool, 4mCPred, for predicting 4mC sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii. 4mCPred consists of two independent models, 4mCPred_I and 4mCPred_II, for each species. The results of independent and cross-species tests demonstrated that 4mCPred_I is a useful tool. To identify position-specific trinucleotide propensity (PSTNP) and electron-ion interaction potential features, we used the F-score method to construct predictive models and to compare their PSTNP features. Compared with other existing predictors, 4mCPred achieved much higher accuracies in rigorous jackknife and independent tests. We also analyzed the importance of different features in detail.
Pages: 593-601
Page count: 9
Related papers
65 in total
[1]   The DNA methyltransferases of mammals [J].
Bestor, TH .
HUMAN MOLECULAR GENETICS, 2000, 9 (16) :2395-2402
[2]   ESCHERICHIA-COLI ORIC AND THE DNAA GENE PROMOTER ARE SEQUESTERED FROM DAM METHYLTRANSFERASE FOLLOWING THE PASSAGE OF THE CHROMOSOMAL REPLICATION FORK [J].
CAMPBELL, JL ;
KLECKNER, N .
CELL, 1990, 62 (05) :967-979
[3]   ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network [J].
Cao, Renzhi ;
Freitas, Colton ;
Chan, Leong ;
Sun, Miao ;
Jiang, Haiqing ;
Chen, Zhangxin .
MOLECULES, 2017, 22 (10)
[4]   QAcon: single model quality assessment using protein structural and contact information with machine learning techniques [J].
Cao, Renzhi ;
Adhikari, Badri ;
Bhattacharya, Debswapna ;
Sun, Miao ;
Hou, Jie ;
Cheng, Jianlin .
BIOINFORMATICS, 2017, 33 (04) :586-588
[5]   DeepQA: improving the estimation of single protein model quality with deep belief networks [J].
Cao, Renzhi ;
Bhattacharya, Debswapna ;
Hou, Jie ;
Cheng, Jianlin .
BMC BIOINFORMATICS, 2016, 17
[6]   SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines [J].
Cao, Renzhi ;
Wang, Zheng ;
Wang, Yiheng ;
Cheng, Jianlin .
BMC BIOINFORMATICS, 2014, 15
[7]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[8]   iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties [J].
Chen, Wei ;
Yang, Hui ;
Feng, Pengmian ;
Ding, Hui ;
Lin, Hao .
BIOINFORMATICS, 2017, 33 (22) :3518-3523
[9]   IACP: a sequence-based tool for identifying anticancer peptides [J].
Chen, Wei ;
Ding, Hui ;
Feng, Pengmian ;
Lin, Hao ;
Chou, Kuo-Chen .
ONCOTARGET, 2016, 7 (13) :16895-16909
[10]   Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences [J].
Chen, Wei ;
Lin, Hao ;
Chou, Kuo-Chen .
MOLECULAR BIOSYSTEMS, 2015, 11 (10) :2620-2634