Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification

Cited by: 25
Authors
Bifarin, Olatomiwa O. [1, 2]
Affiliations
[1] Univ Georgia, Dept Biochem & Mol Biol, Athens, GA 30602 USA
[2] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30602 USA
Source
PLOS ONE | 2023 / Vol. 18 / Issue 05
Funding
UK Research and Innovation (UKRI);
Keywords
METABOLIGHTS; REPOSITORY;
DOI
10.1371/journal.pone.0284315
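Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code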
07 ; 0710 ; 09 ;
Abstract
Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between case and control groups. Model interpretability is essential for improving understanding of the underlying biomedical problem and for bolstering confidence in these discoveries. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly because the model can be interpreted through Variable Influence in Projection (VIP) scores, a global interpretability method. Herein, tree-based Shapley Additive Explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted on three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). For one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP scores, making it a powerful method for rationalizing ML predictions from metabolomics studies.
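The "local explanation" property mentioned in the abstract can be illustrated with a brute-force Shapley computation. The sketch below is not the Tree SHAP algorithm and not the authors' code: it enumerates every feature coalition, which is only feasible for a handful of features. The names `shapley_values` and `toy_model`, the feature values, and the all-zeros reference point are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for a single sample x.

    Assumption: features outside a coalition are set to the background
    (reference) values. This is one simple choice of value function,
    not the one Tree SHAP uses internally.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical score from three metabolite intensities."""
    a, b, c = features
    return 0.5 * a + 0.2 * b * c


sample = [2.0, 1.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
baseline = toy_model(reference)

# Additivity: baseline plus per-feature contributions recovers the prediction
print(phi, baseline + sum(phi), toy_model(sample))
```

Each sample gets its own contribution vector, and the contributions sum to the difference between that sample's prediction and the baseline. A global score such as VIP, by contrast, gives a single ranking for the whole dataset.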
Pages: 21
Related papers
49 in total
  • [1] Multiclass feature selection with metaheuristic optimization algorithms: a review
    Akinola, Olatunji O.
    Ezugwu, Absalom E.
    Agushaka, Jeffrey O.
    Abu Zitar, Raed
    Abualigah, Laith
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (22) : 19751 - 19790
  • [2] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [3] Statistical modeling: The two cultures
    Breiman, L
    [J]. STATISTICAL SCIENCE, 2001, 16 (03) : 199 - 215
  • [4] Metabolomic profiles predict individual multidisease outcomes
    Buergel, Thore
    Steinfeldt, Jakob
    Ruyoga, Greg
    Pietzner, Maik
    Bizzarri, Daniele
    Vojinovic, Dina
    zu Belzen, Julius Upmeier
    Loock, Lukas
    Kittner, Paul
    Christmann, Lara
    Hollmann, Noah
    Strangalies, Henrik
    Braunger, Jana M.
    Wild, Benjamin
    Chiesa, Scott T.
    Spranger, Joachim
    Klostermann, Fabian
    van den Akker, Erik B.
    Trompet, Stella
    Mooijaart, Simon P.
    Sattar, Naveed
    Jukema, J. Wouter
    Lavrijssen, Birgit
    Kavousi, Maryam
    Ghanbari, Mohsen
    Ikram, Mohammad A.
    Slagboom, Eline
    Kivimaki, Mika
    Langenberg, Claudia
    Deanfield, John
    Eils, Roland
    Landmesser, Ulf
    [J]. NATURE MEDICINE, 2022, 28 (11) : 2309 - +
  • [5] POINTS OF SIGNIFICANCE Statistics versus machine learning
    Bzdok, Danilo
    Altman, Naomi
    Krzywinski, Martin
    [J]. NATURE METHODS, 2018, 15 (04) : 232 - 233
  • [6] POINTS OF SIGNIFICANCE Machine learning: a primer
    Bzdok, Danilo
    Krzywinski, Martin
    Altman, Naomi
    [J]. NATURE METHODS, 2017, 14 (12) : 1119 - 1120
  • [7] An interpretable machine learning method for supporting ecosystem management: Application to species distribution models of freshwater macroinvertebrates
    Cha, YoonKyung
    Shin, Jihoon
    Go, ByeongGeon
    Lee, Dae-Seong
    Kim, YoungWoo
    Kim, TaeHo
    Park, Young-Seuk
    [J]. JOURNAL OF ENVIRONMENTAL MANAGEMENT, 2021, 291
  • [8] XGBoost: A Scalable Tree Boosting System
    Chen, Tianqi
    Guestrin, Carlos
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 785 - 794
  • [9] Investigation of Metabolomic Blood Biomarkers for Detection of Adenocarcinoma Lung Cancer
    Fahrmann, Johannes F.
    Kim, Kyoungmi
    DeFelice, Brian C.
    Taylor, Sandra L.
    Gandara, David R.
    Yoneda, Ken Y.
    Cooke, David T.
    Fiehn, Oliver
    Kelly, Karen
    Miyamoto, Suzanne
    [J]. CANCER EPIDEMIOLOGY BIOMARKERS & PREVENTION, 2015, 24 (11) : 1716 - 1723
  • [10] Fisher A, 2019, Arxiv, DOI arXiv:1801.01489