Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique

Cited by: 156
Authors
Wei, Leyi [1]
Xing, Pengwei [1]
Shi, Gaotao [1]
Ji, Zhiliang [2, 3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Xiamen Univ, Sch Life Sci, State Key Lab Stress Cell Biol, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Key Lab Chem Biol Fujian Prov, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein methylation site; machine learning based method; feature representation; feature selection technique; citrullination; PseKNC
DOI
10.1109/TCBB.2017.2670558
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods is still not satisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, the proposed MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, by an average of 4.5 percent in overall accuracy. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that this research tool will be useful for the large-scale prediction and analysis of protein methylation sites.
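The jackknife test mentioned in the abstract is a leave-one-out evaluation: each sample is held out in turn and predicted by a model built on all remaining samples. The sketch below illustrates only that evaluation loop on sequence-derived features. It is not the MePred-RF implementation. The amino acid composition descriptor, the 1-nearest-neighbor classifier (a stand-in for the random forest, chosen so the example needs only the standard library), the function names, and the toy windows are all assumptions made for illustration; the abstract does not name the descriptors used, and the feature selection step is omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(window):
    """Return the 20-dimensional amino acid composition of a sequence window."""
    counts = Counter(window)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor_label(train_x, train_y, query):
    """Stand-in classifier (NOT a random forest): label of the closest training vector."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(windows, labels):
    """Leave-one-out test: predict each sample from a model built on all the others."""
    features = [composition(w) for w in windows]
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Made-up toy windows (1 = hypothetical methylation site, 0 = non-site).
windows = ["GGRGGRGGR", "RGGRGGFGG", "GRGGRGGYG", "AKLLEDVSA", "LKEVADLSA", "EDLKAVLSA"]
labels = [1, 1, 1, 0, 0, 0]
accuracy = jackknife_accuracy(windows, labels)
print(f"jackknife accuracy: {accuracy:.2f}")
```

Because every sample is held out exactly once, the jackknife result is deterministic for a given dataset and deterministic classifier, which is one reason it is commonly used to compare predictors on a fixed benchmark.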
Pages: 1264-1273
Page count: 10