Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique

Cited by: 156
Authors
Wei, Leyi [1]
Xing, Pengwei [1]
Shi, Gaotao [1]
Ji, Zhiliang [2, 3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Xiamen Univ, Sch Life Sci, State Key Lab Stress Cell Biol, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Key Lab Chem Biol Fujian Prov, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein methylation site; machine learning based method; feature representation; feature selection technique; CITRULLINATION; PSEKNC;
DOI
10.1109/TCBB.2017.2670558
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods remains unsatisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, leading by 4.5 percent on average in overall accuracy. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that our research tool will be useful for the large-scale prediction and analysis of protein methylation sites.
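To make the evaluation protocol named in the abstract concrete, the following is a minimal sketch of a jackknife (leave-one-out) test over sequence-only features. It is not the authors' implementation: the amino-acid composition descriptor, the nearest-centroid classifier (a simple stand-in for the random forest), the function names, and the six toy sequence windows with their labels are all assumptions made up for illustration, and no feature selection step is shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(window):
    """Amino-acid composition: fraction of each standard residue in the window."""
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict(train, x):
    """Nearest-centroid stand-in classifier (NOT a random forest).

    train is a list of (features, label) pairs; returns the label whose
    class centroid lies closest to x.
    """
    by_label = {}
    for feats, label in train:
        by_label.setdefault(label, []).append(feats)
    return min(by_label, key=lambda lab: sq_dist(centroid(by_label[lab]), x))


def jackknife_accuracy(samples):
    """Hold out each sample once, train on the rest, test on the held-out one."""
    hits = 0
    for i, (feats, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += predict(train, feats) == label
    return hits / len(samples)


# Hypothetical toy windows (invented, not real methylation data):
# label 1 = "site", label 0 = "non-site".
windows = [
    ("GGRGGRGGRGG", 1),
    ("RGGRGGFGGRG", 1),
    ("GRGGSRGGRGG", 1),
    ("LLEALKRELAE", 0),
    ("AEELLKRAEEL", 0),
    ("KELLERLAEAL", 0),
]
samples = [(composition(w), y) for w, y in windows]
accuracy = jackknife_accuracy(samples)
print(f"jackknife accuracy: {accuracy:.2f}")
```

The jackknife test trains one model per sample, so its cost grows linearly with dataset size times the training cost; that is acceptable for a sketch like this one but becomes expensive for large benchmark sets.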
Pages: 1264-1273
Number of pages: 10