STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction

Cited by: 58
Authors
Basith, Shaherin [1]
Lee, Gwang [1]
Manavalan, Balachandran [1, 2]
Affiliations
[1] Ajou Univ, Sch Med, Dept Physiol, Suwon 16499, South Korea
[2] Korea Inst Adv Study, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
lysine acetylation sites; bioinformatics; stacking strategy; machine learning; feature optimization; performance assessment; POSTTRANSLATIONAL MODIFICATION; PROTEINS;
DOI
10.1093/bib/bbab376
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Protein post-translational modification (PTM) is an important regulatory mechanism that plays a key role in both normal and disease states. Acetylation on lysine residues is one of the most potent PTMs owing to its critical role in cellular metabolism and regulatory processes. Identifying protein lysine acetylation (Kace) sites is a challenging task in bioinformatics. To date, several machine learning-based methods for the in silico identification of Kace sites have been developed, and a few of them are specific to prokaryotic species. Despite their advantages and performance, these methods have certain limitations. Therefore, this study proposes a novel predictor, STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), containing six prokaryotic species-specific models to identify Kace sites accurately. To extract crucial patterns around Kace sites, we employed 11 different encodings representing three different characteristics. Subsequently, a systematic and rigorous feature selection approach was employed to identify the optimal feature set independently for each of five tree-based ensemble algorithms, and the respective baseline models were built for each species. Finally, the predicted values from the baseline models were used to train an appropriate classifier with the stacking strategy to develop STALLION. Comparative benchmarking experiments showed that STALLION significantly outperformed existing predictors on independent tests. To provide direct access to the STALLION models, a user-friendly online predictor was implemented, which is available at: http://thegleelab.org/STALLION.
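The pipeline described in the abstract (encode the sequence context around each lysine, score it with several baseline models, then feed those scores to a second-level classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the 11 encodings or the window length, so the flank size, the padding symbol `X`, the one-hot encoding, and all function names here are assumptions chosen for illustration.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def lysine_windows(sequence: str, flank: int = 3) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence.

    The flank size is an assumed parameter, not a value taken from the paper.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string,
            # so the centred window spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """Binary encoding: 20 indicators per position; padding maps to all zeros."""
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def stack_features(base_probabilities: List[List[float]]) -> List[List[float]]:
    """Turn per-model probability lists into per-site meta-feature rows.

    Each inner list holds one baseline model's predicted values for all sites;
    each output row holds all models' values for one site, which is the input
    a second-level (meta) classifier would be trained on.
    """
    return [list(row) for row in zip(*base_probabilities)]
```

In a real stacking setup the baseline predictions used to train the meta-classifier would normally come from cross-validation folds rather than from models that have already seen those samples; the sketch only shows how the data is reshaped between the two levels.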
Pages: 15