Fast Prediction of Protein Methylation Sites Using a Sequence-Based Feature Selection Technique

Cited by: 156
Authors
Wei, Leyi [1]
Xing, Pengwei [1]
Shi, Gaotao [1]
Ji, Zhiliang [2, 3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Xiamen Univ, Sch Life Sci, State Key Lab Stress Cell Biol, Xiamen 361005, Peoples R China
[3] Xiamen Univ, Key Lab Chem Biol Fujian Prov, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein methylation site; machine learning based method; feature representation; feature selection technique; CITRULLINATION; PSEKNC
DOI
10.1109/TCBB.2017.2670558
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein methylation, an important post-translational modification, plays crucial roles in many cellular processes. Accurate prediction of protein methylation sites is fundamentally important for revealing the molecular mechanisms underlying methylation. In recent years, computational prediction based on machine learning algorithms has emerged as a powerful and robust approach for identifying methylation sites, and much progress has been made in improving predictive performance. However, the overall accuracy of existing methods remains unsatisfactory. Motivated by this, we propose a novel random-forest-based predictor called MePred-RF, which integrates several discriminative sequence-based feature descriptors and improves feature representation capability using a powerful feature selection technique. Importantly, unlike other methods that rely on multiple, complex information inputs, MePred-RF is based on sequence information alone. Comparative studies on benchmark datasets via rigorous jackknife tests indicate that MePred-RF remarkably outperforms other state-of-the-art predictors, improving overall accuracy by 4.5 percent on average. A user-friendly webserver that implements the proposed method has been established for researchers' convenience and is freely available for public use at http://server.malab.cn/MePred-RF. We anticipate that this tool will be useful for the large-scale prediction and analysis of protein methylation sites.
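The workflow named in the abstract (sequence-only features, feature selection, a classifier, and a jackknife test) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the peptide windows are invented toy data, amino acid composition stands in for the paper's feature descriptors, a mean-difference ranking stands in for its feature selection technique, and a nearest-centroid rule stands in for the random forest so that the example needs only the standard library.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Sequence-only feature vector: fraction of each of the 20 residues."""
    counts = Counter(peptide)
    return [counts[a] / len(peptide) for a in AMINO_ACIDS]

def centroid(vectors):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def select_features(train_x, train_y, k):
    """Toy feature selection: keep the k features whose class means differ most.
    Assumes binary labels (1 = site, 0 = non-site) and both classes present."""
    pos = centroid([v for v, y in zip(train_x, train_y) if y == 1])
    neg = centroid([v for v, y in zip(train_x, train_y) if y == 0])
    ranked = sorted(range(len(pos)), key=lambda j: abs(pos[j] - neg[j]), reverse=True)
    return sorted(ranked[:k])

def predict(train_x, train_y, x):
    """Nearest-centroid classifier (a stand-in, NOT a random forest)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([v for v, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(peptides, labels, k=5):
    """Jackknife (leave-one-out) test: hold out each sample once, select
    features and fit on the remaining samples, then score the held-out one."""
    feats = [composition(p) for p in peptides]
    correct = 0
    for i in range(len(feats)):
        train_x = feats[:i] + feats[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        keep = select_features(train_x, train_y, k)
        project = lambda v: [v[j] for j in keep]
        guess = predict([project(v) for v in train_x], train_y, project(feats[i]))
        correct += guess == labels[i]
    return correct / len(feats)

# Invented toy windows, for illustration only (not benchmark data).
peptides = ["GGRGGRGGR", "RGGRGGFGG", "GRGGRGGGR",
            "LKELAEKLL", "AELLKKAEL", "KLLEEALKA"]
labels = [1, 1, 1, 0, 0, 0]
acc = jackknife_accuracy(peptides, labels)
print(acc)
```

Feature selection is repeated inside each held-out round so that the held-out sample never influences which features are kept. A real evaluation would replace the toy windows with a benchmark dataset and the stand-in classifier with a random forest implementation.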
Pages: 1264-1273
Page count: 10