Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique

Cited by: 155
Authors
Wei, Leyi [1]
Xing, Pengwei [1]
Shi, Gaotao [1]
Ji, Zhiliang [2, 3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Xiamen Univ, Sch Life Sci, State Key Lab Stress Cell Biol, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Key Lab Chem Biol Fujian Prov, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein methylation site; machine learning based method; feature representation; feature selection technique; CITRULLINATION; PSEKNC;
DOI
10.1109/TCBB.2017.2670558
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods remains unsatisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, with overall accuracy higher by 4.5 percent on average. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that this research tool will be useful for the large-scale prediction and analysis of protein methylation sites.
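As a rough illustration of the evaluation protocol named in the abstract, the sketch below runs a jackknife (leave-one-out) test on sequence-derived features. It is not the MePred-RF implementation: the amino acid composition descriptor, the nearest-centroid classifier (a simple stand-in for the random forest), the toy sequence windows, and all function names are assumptions made for this example, and no feature selection step is included.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT a random forest): label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels):
    """Jackknife test: hold out each sample once and train on all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy windows, made up for this example (not data from the paper).
windows = ["GGRGGRGGA", "RGGRGGRGG", "GRGGAGRGG",
           "LLKALLEAL", "ALLKELLAA", "KLLAALLEL"]
labels = [1, 1, 1, 0, 0, 0]  # 1 = methylation site, 0 = non-site (assumed labels)

features = [composition(w) for w in windows]
acc = jackknife_accuracy(features, labels)
print(f"jackknife accuracy: {acc:.2f}")
```

Each sample is predicted by a model that never saw it, which is why the jackknife is regarded as a strict test on small benchmark sets. Swapping in a random forest and a feature selection step would change only the classifier function and the feature list, not the evaluation loop.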
Pages: 1264-1273
Page count: 10