StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides

Cited by: 101
Authors
Charoenkwan, Phasit [3]
Chiangjong, Wararat [4]
Nantasenamat, Chanin [5]
Hasan, Md Mehedi [6]
Manavalan, Balachandran [1, 7]
Shoombuatong, Watshara [2]
Affiliations
[1] Ajou Univ, Dept Physiol, Sch Med, Suwon 443380, South Korea
[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[3] Chiang Mai Univ, Coll Arts Media & Technol, Chiang Mai, Thailand
[4] Mahidol Univ, Pediat Translat Res Unit, Dept Pediat, Fac Med,Ramathibodi Hosp, Salaya, Nakhon Pathom, Thailand
[5] Mahidol Univ, Ctr Data Min & Biomed Informat, Fac Med Technol, Salaya, Nakhon Pathom, Thailand
[6] Tulane Univ, New Orleans, LA 70118 USA
[7] Korea Inst Adv Study, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
interleukin 6; IL-6; bioinformatics; sequence analysis; machine learning; ensemble learning; PROTEINS
DOI
10.1093/bib/bbab172
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The release of interleukin (IL)-6 is stimulated by antigenic peptides from pathogens as well as by immune cells to activate aggressive inflammation. IL-6 inducing peptides are derived from pathogens and can be used as diagnostic biomarkers for predicting various stages of disease severity, as well as IL-6 inhibitors for suppressing aggressive multi-signaling immune responses. Thus, the accurate identification of IL-6 inducing peptides is of great importance for investigating their mechanism of action and for developing diagnostic and immunotherapeutic applications. This study proposes a novel stacking ensemble model (termed StackIL6) for accurately identifying IL-6 inducing peptides. More specifically, StackIL6 was constructed from twelve different feature descriptors derived from three major groups of features (composition-based features, composition-transition-distribution-based features and physicochemical properties-based features) and five popular machine learning algorithms (extremely randomized trees, logistic regression, multi-layer perceptron, support vector machine and random forest). To enhance the utility of the baseline models, they were systematically integrated through a stacking strategy to build the final meta-based model. Extensive benchmarking experiments demonstrated that StackIL6 achieved significantly better performance than the existing method (IL6PRED) and outperformed its constituent baseline models on both the training and independent test datasets, thereby supporting its excellent discrimination and generalization abilities. To facilitate easy access to the StackIL6 model, it was established as a freely available web server accessible at http://camt.pythonanywhere.com/StackIL6. It is anticipated that StackIL6 can facilitate rapid screening of promising IL-6 inducing peptides for the development of diagnostic and immunotherapeutic applications in the future.
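To make the "composition-based features" mentioned in the abstract concrete, the following is a minimal sketch of two widely used descriptors of that kind: amino acid composition and dipeptide composition. The abstract does not list the twelve descriptors, so treating these two as representative is an assumption, and the function names `aac` and `dpc` are hypothetical and not taken from the StackIL6 code.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(peptide):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = peptide.upper()
    if len(seq) < 2:
        raise ValueError("peptide must contain at least two residues")
    # Overlapping windows of length two, e.g. "ACDA" -> ["AC", "CD", "DA"].
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical example peptide; the concatenated vector has 20 + 400 entries.
features = aac("ACDA") + dpc("ACDA")
```

In a stacking setup of the kind the abstract describes, each such feature vector would be fed to several base classifiers, and their predicted probabilities would in turn become the inputs of the meta-model. Residues outside the 20 standard letters are simply not counted here, so the fractions for such sequences sum to less than one.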
Pages: 13