STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction

Times cited: 57
Authors
Basith, Shaherin [1]
Lee, Gwang [1]
Manavalan, Balachandran [1,2]
Affiliations
[1] Ajou Univ, Sch Med, Dept Physiol, Suwon 16499, South Korea
[2] Korea Inst Adv Study, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
lysine acetylation sites; bioinformatics; stacking strategy; machine learning; feature optimization; performance assessment; posttranslational modification; proteins
DOI
10.1093/bib/bbab376
CLC number (Chinese Library Classification)
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed. Of those, a few are prokaryotic species-specific. Despite their attractive advantages and performance, these methods have certain limitations. Therefore, this study proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for each of five tree-based ensemble algorithms and to build their respective baseline models for each species. Finally, the predicted values from the baseline models were used as inputs to train an appropriate classifier using the stacking strategy to develop STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed the existing predictor on independent tests. To facilitate direct access to the STALLION models, a user-friendly online predictor was implemented, which is available at: http://thegleelab.org/STALLION.
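The stacking step described in the abstract (baseline-model outputs fed to a second-level classifier) can be illustrated with a small, self-contained Python sketch. It is not STALLION's implementation. The one-hot window encoding, the decision stumps standing in for the five tree-based ensembles, the per-position feature subsets standing in for the selected optimal feature sets, the logistic-regression meta-classifier, and the synthetic data are all assumptions chosen for illustration, and every function name (`one_hot`, `fit_stump`, `fit_stack`, `toy_windows`, etc.) is hypothetical.

```python
# Hypothetical sketch of generic stacking; NOT STALLION's encodings, models, or data.
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(window):
    """Binary (one-hot) encoding: 20 indicator bits per residue; other letters -> all zeros."""
    return [1.0 if aa == ref else 0.0 for aa in window for ref in AMINO_ACIDS]


def fit_stump(X, y, features):
    """Depth-1 decision tree restricted to `features`; each leaf stores a smoothed P(y=1).

    Assumed stand-in for a tree-based ensemble base model."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            counts = {False: [0, 0], True: [0, 0]}  # leaf (row[f] >= t) -> [n_rows, n_pos]
            for row, label in zip(X, y):
                counts[row[f] >= t][0] += 1
                counts[row[f] >= t][1] += label
            err = sum(min(pos, n - pos) for n, pos in counts.values())  # majority-vote errors
            if best is None or err < best[0]:
                probs = {leaf: (pos + 1) / (n + 2) for leaf, (n, pos) in counts.items()}
                best = (err, f, t, probs)
    _, best_f, best_t, leaf_p = best
    return lambda row: leaf_p[row[best_f] >= best_t]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Assumed meta-classifier: logistic regression fitted by full-batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def out_of_fold(X, y, feature_sets, k=5):
    """Level-1 features: each base model scores only rows held out of its training folds."""
    Z = [[0.0] * len(feature_sets) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        for m, feats in enumerate(feature_sets):
            base = fit_stump([X[i] for i in train], [y[i] for i in train], feats)
            for i in held_out:
                Z[i][m] = base(X[i])
    return Z


def fit_stack(X, y, feature_sets, k=5):
    """Fit the meta-classifier on out-of-fold predictions, then refit base models on all rows."""
    meta = fit_logistic(out_of_fold(X, y, feature_sets, k), y)
    bases = [fit_stump(X, y, feats) for feats in feature_sets]
    return lambda row: meta([base(row) for base in bases])


def toy_windows(n, seed):
    """Hypothetical 5-residue windows centred on K with a planted signal; NOT real Kace data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        window = [rng.choice(AMINO_ACIDS) for _ in range(5)]
        window[2] = "K"
        label = int(rng.random() < 0.5)
        if label:
            window[1] = "G"  # toy rule: positives carry G at position -1
        X.append(one_hot(window))
        y.append(label)
    return X, y


# One hypothetical feature subset per flanking position (the centre K is constant).
FEATURE_SETS = [range(20 * p, 20 * p + 20) for p in (0, 1, 3, 4)]
X_train, y_train = toy_windows(200, seed=0)
predict = fit_stack(X_train, y_train, FEATURE_SETS)
X_test, y_test = toy_windows(100, seed=1)
accuracy = sum((predict(x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)) / len(y_test)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

The meta-classifier is trained on out-of-fold predictions rather than on each base model's predictions for its own training rows; in-sample predictions are optimistically biased and would teach the second-level classifier to over-trust the base models.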
Pages: 15