Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique

Cited by: 156
Authors
Wei, Leyi [1]
Xing, Pengwei [1]
Shi, Gaotao [1]
Ji, Zhiliang [2, 3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Xiamen Univ, Sch Life Sci, State Key Lab Stress Cell Biol, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Key Lab Chem Biol Fujian Prov, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein methylation site; machine learning based method; feature representation; feature selection technique; CITRULLINATION; PSEKNC;
DOI
10.1109/TCBB.2017.2670558
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods remains unsatisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, our proposed MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, improving overall accuracy by 4.5 percent on average. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that our research tool will be useful for the large-scale prediction and analysis of protein methylation sites.
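The abstract mentions two ideas that a small example may help make concrete: features computed from sequence alone, and evaluation by jackknife (leave-one-out) testing. The sketch below is illustrative only and is not the MePred-RF implementation. The peptide windows and labels are made-up toy data, amino acid composition is just one simple sequence-based descriptor (the abstract does not list the descriptors actually used), and a nearest-centroid rule stands in for the random forest so that the example needs only the Python standard library. All function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Amino acid composition: fraction of each of the 20 residues in the window."""
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def predict(train, query):
    """Nearest-centroid stand-in classifier (assumption: NOT the paper's random forest)."""
    by_label = {}
    for vector, label in train:
        by_label.setdefault(label, []).append(vector)
    return min(by_label, key=lambda lab: squared_distance(centroid(by_label[lab]), query))

def jackknife_accuracy(samples):
    """Jackknife test: hold out each sample once and predict it from all the others."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        correct += predict(train, vector) == label
    return correct / len(samples)

# Made-up 9-residue windows centred on an arginine; label 1 = methylated, 0 = not.
toy_windows = [
    ("GRGGRGGFG", 1), ("GGFGRGGRG", 1), ("RGGGRGFGG", 1),
    ("LLKARELKA", 0), ("AEKLRLKEL", 0), ("KLAERLLKA", 0),
]
samples = [(composition(peptide), label) for peptide, label in toy_windows]
accuracy = jackknife_accuracy(samples)
print(f"jackknife accuracy on toy data: {accuracy:.2f}")
```

Because each sample is predicted by a model that never saw it, the jackknife estimate uses all of the data for both training and testing, at the cost of fitting one model per sample. The toy windows are deliberately easy to separate, so the accuracy here says nothing about real methylation data.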
Pages: 1264-1273
Page count: 10