Evaluation of different computational methods on 5-methylcytosine sites identification

Cited by: 129
Authors
Lv, Hao [1]
Zhang, Zi-Mei [1]
Li, Shi-Hao [1]
Tan, Jiu-Xin [1]
Chen, Wei [2]
Lin, Hao [1]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Informat Biol, Chengdu 610054, Peoples R China
[2] Chengdu Univ Tradit Chinese Med, Innovat Inst Chinese Med & Pharm, Chengdu, Peoples R China
Keywords
m5C site; feature description; computational method; webserver; iRNA-m5C; BAYES CLASSIFICATION MODELS; FLEXIBLE WEB SERVER; RNA; 5-METHYLCYTOSINE; SUBCELLULAR LOCATION; QUALITY ASSESSMENT; RIBOSOMAL-RNA; PSEUDO; PREDICTION; METHYLATION; DNA;
DOI
10.1093/bib/bbz048
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Although the number of identified m5C sites has increased greatly across a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, increasing attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized the current progress in computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.
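To make the notion of "feature extraction" in the abstract concrete, the following is a minimal sketch of two generic sequence encodings that are commonly used for RNA modification site prediction: k-mer frequency and one-hot encoding. It is an illustration only and is not taken from the paper; the function names `kmer_frequencies` and `one_hot`, the alphabet ordering, and the example sequences are all assumptions, and the abstract does not state which encodings iRNA-m5C actually uses.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration; not the paper's method.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def one_hot(seq, alphabet="ACGU"):
    """Encode each base as a 4-element indicator vector, concatenated."""
    seq = seq.upper().replace("T", "U")
    vector = []
    for base in seq:
        vector.extend(1 if base == symbol else 0 for symbol in alphabet)
    return vector


if __name__ == "__main__":
    # Made-up sequence window centered on a cytosine (illustrative only).
    window = "GAUCGCUAGCA"
    print(len(kmer_frequencies(window, k=2)))  # 16 dinucleotide features
    print(len(one_hot(window)))                # 4 features per position
```

Feature vectors of this kind could then be passed to any standard classifier; comparing several encodings and classifiers on the same benchmark data is the type of evaluation the abstract describes.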
Pages: 982-995
Number of pages: 14