SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins

Cited by: 36
Authors
Charoenkwan, Phasit [1]
Schaduangrat, Nalini [2]
Moni, Mohammad Ali [3]
Lio, Pietro [4]
Manavalan, Balachandran [5]
Shoombuatong, Watshara [2]
Affiliations
[1] Chiang Mai Univ, Coll Arts Media & Technol, Modern Management & Informat Technol, Chiang Mai 50200, Thailand
[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[3] Univ Queensland, Fac Hlth & Behav Sci, Sch Hlth & Rehabil Sci, Artificial Intelligence & Digital Hlth Data Sci, St Lucia, Qld 4072, Australia
[4] Univ Cambridge, Dept Comp Sci & Technol, Cambridge CB3 0FD, England
[5] Sungkyunkwan Univ, Coll Biotechnol & Bioengn, Dept Integrat Biotechnol, Computat Biol & Bioinformat Lab, Suwon 16419, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Thermophilic protein; Sequence analysis; Bioinformatics; Stacking strategy; Feature selection; Machine learning; AMINO-ACID-COMPOSITION; FEATURE-SELECTION; WEB SERVER; THERMOSTABILITY; DISCRIMINATION; INFORMATION; MUTATION; FEATURES;
DOI
10.1016/j.compbiomed.2022.105704
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes, so computational methods that identify TPPs accurately and rapidly are needed. Several computational methods have been developed for TPP identification; however, some limitations in performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, with no need for structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models (12 × 6) and extract the key information of TPPs. The informative predicted probabilities from the baseline models were then mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, on the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The code and datasets are available at https://github.com/plenoi/SAPPHIRE.
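The stacking idea summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the two encodings, the two toy learners, the example sequences, and every function name below are assumptions made for illustration, and the genetic-algorithm selection step is omitted. It only shows how each (encoding, learner) pair yields one column of out-of-fold predicted probabilities, and how a meta-predictor is then trained on those columns.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def charge_hydro(seq):
    """Hypothetical 2-feature encoding: charged and hydrophobic fractions."""
    charged = sum(seq.count(a) for a in "DEKR") / len(seq)
    hydrophobic = sum(seq.count(a) for a in "AILMFVW") / len(seq)
    return [charged, hydrophobic]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

class CentroidModel:
    """Toy learner: score derived from distances to the class centroids."""
    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self
    def predict_proba(self, x):
        dp, dn = dist(x, self.pos), dist(x, self.neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)

class NearestNeighborModel:
    """Toy learner: returns the label of the single nearest training point."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict_proba(self, x):
        nearest = min(range(len(self.X)), key=lambda i: dist(x, self.X[i]))
        return float(self.y[nearest])

ENCODERS = [aac, charge_hydro]                 # the paper uses twelve encodings
LEARNERS = [CentroidModel, NearestNeighborModel]  # the paper uses six algorithms

def out_of_fold(encode, make_model, seqs, y, k=3):
    """Predicted probability for each sequence from a model that never saw it."""
    X = [encode(s) for s in seqs]
    oof = [0.0] * len(seqs)
    for fold in range(k):
        train = [i for i in range(len(seqs)) if i % k != fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(seqs), k):
            oof[i] = model.predict_proba(X[i])
    return oof

def stack_fit(seqs, y):
    combos = list(product(ENCODERS, LEARNERS))  # one baseline model per pair
    columns = [out_of_fold(enc, mk, seqs, y) for enc, mk in combos]
    meta_X = [list(row) for row in zip(*columns)]  # probabilistic features
    bases = [(enc, mk().fit([enc(s) for s in seqs], y)) for enc, mk in combos]
    meta = CentroidModel().fit(meta_X, y)       # toy meta-predictor
    return bases, meta, meta_X

def stack_predict(bases, meta, seq):
    feats = [model.predict_proba(enc(seq)) for enc, model in bases]
    return meta.predict_proba(feats)

# Made-up sequences for demonstration only (1 = thermophilic, 0 = not).
SEQS = ["EKRIVEEKLKRVEE", "QNSTQGASNTQSAG", "KEEIRKVLEEKRIE",
        "SNQTAGSQNTSAQG", "REKVIEELKKREVE", "TQNSGASTQNSQAG"]
LABELS = [1, 0, 1, 0, 1, 0]

bases, meta, meta_X = stack_fit(SEQS, LABELS)
print(len(meta_X[0]), "baseline-model features per sequence")
print("score for a new sequence:", stack_predict(bases, meta, "EKKIVEELRRKVEE"))
```

With two encodings and two learners the sketch builds 2 × 2 = 4 baseline models, mirroring the 12 × 6 = 72 of the abstract. Using out-of-fold probabilities keeps the meta-predictor from being trained on predictions that the baseline models made for their own training data.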
Pages: 9
Related papers
50 in total
  • [31] Stacking-based multi-objective approach for detection of smart power grid attacks using evolutionary ensemble learning
    Panthi, Manikant
    Das, Tanmoy Kanti
    INTERNATIONAL JOURNAL OF CRITICAL INFRASTRUCTURES, 2024, 20 (03) : 195 - 215
  • [32] Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework
    Charoenkwan, Phasit
    Chumnanpuen, Pramote
    Schaduangrat, Nalini
    Shoombuatong, Watshara
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2024,
  • [33] A hybrid ensemble learning-based prediction model to minimise delay in air cargo transport using bagging and stacking
    Sahoo, Rosalin
    Pasayat, Ajit Kumar
    Bhowmick, Bhaskar
    Fernandes, Kiran
    Tiwari, Manoj Kumar
    INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2022, 60 (02) : 644 - 660
  • [34] Harnessing Ensemble in Machine Learning for Accurate Early Prediction and Prevention of Heart Disease
    Husain, Mohammad
    Kumar, Pankaj
    Ahmed, Mohammad Nadeem
    Ali, Arshad
    Rasool, Mohammad Ashiquee
    Hussain, Mohammad Rashid
    Dildar, Muhammad Shahid
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (10) : 182 - 195
  • [35] Prediction Method for Ocean Wave Height Based on Stacking Ensemble Learning Model
    Zhan, Yu
    Zhang, Huajun
    Li, Jianhao
    Li, Gen
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2022, 10 (08)
  • [36] Injection Molding Part Size Prediction Method Based on Stacking Ensemble Learning
    Song J.
    Wang W.
    Li D.
    Liang J.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (06) : 19 - 26
  • [37] Prediction Model of Thermophilic Protein Based on Stacking Method
    Wang, Xian-Fang
    Lu, Fan
    Du, Zhi-Yong
    Li, Qi-Meng
    CURRENT BIOINFORMATICS, 2021, 16 (10) : 1328 - 1340
  • [38] Accurate eQTL prioritization with an ensemble-based framework
    Zeng, Haoyang
    Edwards, Matthew D.
    Guo, Yuchun
    Gifford, David K.
    HUMAN MUTATION, 2017, 38 (09) : 1259 - 1265
  • [39] RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information
    Yi, Hai-Cheng
    You, Zhu-Hong
    Wang, Mei-Neng
    Guo, Zhen-Hao
    Wang, Yan-Bin
    Zhou, Ji-Ren
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [40] Towards an Accurate Breast Cancer Classification Model based on Ensemble Learning
    Hesham, Aya
    El-Rashidy, Nora
    Rezk, Amira
    Hikal, Noha A.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 590 - 602