SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins

Cited by: 36
Authors
Charoenkwan, Phasit [1]
Schaduangrat, Nalini [2]
Moni, Mohammad Ali [3]
Lio, Pietro [4]
Manavalan, Balachandran [5]
Shoombuatong, Watshara [2]
Affiliations
[1] Chiang Mai Univ, Coll Arts Media & Technol, Modern Management & Informat Technol, Chiang Mai 50200, Thailand
[2] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[3] Univ Queensland, Fac Hlth & Behav Sci, Sch Hlth & Rehabil Sci, Artificial Intelligence & Digital Hlth Data Sci, St Lucia, Qld 4072, Australia
[4] Univ Cambridge, Dept Comp Sci & Technol, Cambridge CB3 0FD, England
[5] Sungkyunkwan Univ, Coll Biotechnol & Bioengn, Dept Integrat Biotechnol, Computat Biol & Bioinformat Lab, Suwon 16419, South Korea
Funding
National Research Foundation, Singapore
Keywords
Thermophilic protein; Sequence analysis; Bioinformatics; Stacking strategy; Feature selection; Machine learning; AMINO-ACID-COMPOSITION; FEATURE-SELECTION; WEB SERVER; THERMOSTABILITY; DISCRIMINATION; INFORMATION; MUTATION; FEATURES;
DOI
10.1016/j.compbiomed.2022.105704
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes. Computational methods that identify TPPs accurately and rapidly are therefore urgently needed. Several computational methods have been developed for TPP identification to date; however, some limitations in performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, without any need for structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, in the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.
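The stacking idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and NumPy, uses only two encodings (amino acid composition and dipeptide composition) and two learners instead of twelve and six, omits the genetic-algorithm feature selection step, and runs on synthetic sequences whose residue enrichment is an arbitrary choice made purely for illustration. All function and variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import cross_val_predict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs) for a in AMINO_ACIDS for b in AMINO_ACIDS]


def toy_sequences(rng, n, enriched, length=80):
    """Synthetic data only: residues in `enriched` are drawn three times as often."""
    weights = np.array([3.0 if a in enriched else 1.0 for a in AMINO_ACIDS])
    probs = weights / weights.sum()
    letters = list(AMINO_ACIDS)
    return ["".join(rng.choice(letters, size=length, p=probs)) for _ in range(n)]


def encode(seqs, encoder):
    return np.array([encoder(s) for s in seqs])


rng = np.random.default_rng(0)
train_seqs = toy_sequences(rng, 60, "EK") + toy_sequences(rng, 60, "QS")
test_seqs = toy_sequences(rng, 20, "EK") + toy_sequences(rng, 20, "QS")
y_train = np.array([1] * 60 + [0] * 60)
y_test = np.array([1] * 20 + [0] * 20)

encoders = {"AAC": aac, "DPC": dpc}
learners = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "RF": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}

# One baseline model per (encoding, learner) pair; each contributes one
# probability column to the meta-level feature matrix.
meta_train, meta_test = [], []
for encoder in encoders.values():
    X_train, X_test = encode(train_seqs, encoder), encode(test_seqs, encoder)
    for make_model in learners.values():
        # Out-of-fold probabilities, so the meta-learner never sees a
        # prediction made by a model that was trained on that same sample.
        oof = cross_val_predict(
            make_model(), X_train, y_train, cv=5, method="predict_proba"
        )[:, 1]
        meta_train.append(oof)
        # For the independent test set, refit the baseline on all training data.
        fitted = make_model().fit(X_train, y_train)
        meta_test.append(fitted.predict_proba(X_test)[:, 1])

meta_train = np.column_stack(meta_train)
meta_test = np.column_stack(meta_test)

# Meta-predictor trained on the baseline probabilities.
meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_train)
pred = meta_model.predict(meta_test)
acc = accuracy_score(y_test, pred)
mcc = matthews_corrcoef(y_test, pred)
print(f"ACC={acc:.3f}  MCC={mcc:.3f}")
```

In this sketch, a feature selection step such as the genetic algorithm mentioned in the abstract would operate on the columns of `meta_train` before the meta-predictor is fitted. The numbers printed here reflect only the synthetic data and say nothing about the performance reported for SAPPHIRE.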
Pages: 9