SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins

Times cited: 36
Authors
Charoenkwan, Phasit [1]
Schaduangrat, Nalini [2]
Moni, Mohammad Ali [3]
Lio, Pietro [4]
Manavalan, Balachandran [5]
Shoombuatong, Watshara [2]
Affiliations
[1] Chiang Mai Univ, Coll Arts Media & Technol, Modern Management & Informat Technol, Chiang Mai 50200, Thailand
[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[3] Univ Queensland, Fac Hlth & Behav Sci, Sch Hlth & Rehabil Sci, Artificial Intelligence & Digital Hlth Data Sci, St Lucia, Qld 4072, Australia
[4] Univ Cambridge, Dept Comp Sci & Technol, Cambridge CB3 0FD, England
[5] Sungkyunkwan Univ, Coll Biotechnol & Bioengn, Dept Integrat Biotechnol, Computat Biol & Bioinformat Lab, Suwon 16419, South Korea
Funding
National Research Foundation of Singapore
Keywords
Thermophilic protein; Sequence analysis; Bioinformatics; Stacking strategy; Feature selection; Machine learning; Amino-acid composition; Web server; Thermostability; Discrimination; Information; Mutation; Features
DOI
10.1016/j.compbiomed.2022.105704
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes, so computational methods that identify TPPs accurately and rapidly are urgently needed. Several computational methods for TPP identification have been developed to date; however, limitations in terms of performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, without requiring structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models (12 × 6) that capture the key information of TPPs. Subsequently, the predicted probabilities output by the baseline models were used as features, and the most informative ones were selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor SAPPHIRE was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, on the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient (MCC) of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The code and datasets are available at https://github.com/plenoi/SAPPHIRE.
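The abstract describes a two-level stacking design: each (feature encoding, learning algorithm) pair yields a baseline model, the baseline models' predicted probabilities become the input features of a meta-predictor, and performance is reported as accuracy and MCC. The sketch below illustrates that general idea in plain Python (standard library only, Python 3.8+ for `math.dist`). It is not SAPPHIRE's code: the two encodings (amino acid composition and a hypothetical three-group residue classification), the two learners (nearest centroid and gradient-descent logistic regression), the 3-fold strided out-of-fold scheme, and all function names are assumptions chosen for brevity, and the genetic-algorithm/self-assessment-report feature selection step is omitted entirely.

```python
# Minimal stdlib-only sketch of two-level stacking (NOT SAPPHIRE's code).
# Hypothetical stand-ins: 2 encodings x 2 learners = 4 baseline models
# (SAPPHIRE uses 12 x 6 = 72); the GA-based feature selection is omitted.
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical residue groups: hydrophobic / polar / charged.
GROUPS = ("AVLIMFWC", "GSTYNQPH", "DEKR")


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def group_comp(seq):
    """Fraction of residues in each hypothetical physicochemical group."""
    n = max(len(seq), 1)
    return [sum(seq.count(a) for a in g) / n for g in GROUPS]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def centroid_learner(X, y):
    """Nearest-centroid model; returns a function x -> P(class 1)."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def proba(x):
        d0, d1 = math.dist(x, cents[0]), math.dist(x, cents[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)  # closer to class 1 -> higher
    return proba


def logistic_learner(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by batch gradient descent; returns x -> P(class 1)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Every (encoding, learner) pair is one baseline model.
BASELINES = list(product((aac, group_comp), (centroid_learner, logistic_learner)))


def oof_meta_features(seqs, y, k=3):
    """Out-of-fold P(class 1) from each baseline model: one column per model."""
    n = len(seqs)
    meta = [[0.0] * len(BASELINES) for _ in range(n)]
    for col, (encode, learn) in enumerate(BASELINES):
        X = [encode(s) for s in seqs]
        for fold in range(k):
            held_out = set(range(fold, n, k))  # strided folds
            train = [i for i in range(n) if i not in held_out]
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta[i][col] = model(X[i])
    return meta


def fit_stack(seqs, y, k=3):
    """Fit the meta-learner on OOF probabilities, then refit baselines on all data."""
    meta_model = logistic_learner(oof_meta_features(seqs, y, k), y)
    full = [(enc, learn([enc(s) for s in seqs], y)) for enc, learn in BASELINES]
    return lambda seq: meta_model([model(enc(seq)) for enc, model in full])


def acc_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return (tp + tn) / len(pairs), mcc


if __name__ == "__main__":
    # Synthetic toy sequences, not real TPP/non-TPP data.
    pos = ["EKREKLVIEK", "KRKEEVLIRE", "EEKRRVILKE", "RKEKEVIVRK", "KEEKRLLVEK", "EKKRELIVKR"]
    neg = ["NQSTGPNQST", "QNSTPGSQNT", "STNQGPTSNQ", "GPNQSTQNSG", "TSQNPGNSTQ", "NQGSTPQSNT"]
    predict = fit_stack(pos + neg, [1] * 6 + [0] * 6)
    print(predict("KEKRVLEKIR"), predict("SNQTGPSTQN"))
```

The meta-learner is trained on out-of-fold probabilities so that no meta-feature comes from a baseline model that had already seen that sequence during training; the baselines are refitted on all data only for predicting new sequences. The toy sequences are synthetic and say nothing about real thermophilic proteins.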
Pages: 9