Automated Machine Learning and Explainable AI (AutoML-XAI) for Metabolomics: Improving Cancer Diagnostics

Cited by: 6

Authors:

Bifarin, Olatomiwa O. [1]

Fernandez, Facundo M. [1,2]

Affiliations:

[1] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30332 USA

[2] Georgia Inst Technol, Petit Inst Bioengn & Biosci, Atlanta, GA 30332 USA

Source:

JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY | 2024 / Vol. 35 / No. 06

Keywords:

metabolomics; automated machine learning; explainable AI; cancer biology; Shapley additive explanations

DOI:

10.1021/jasms.3c00403

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for nonexperts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. We tested our approach on two data sets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using Auto-sklearn, surpassed standalone ML algorithms like SVM and k-Nearest Neighbors in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers. The effectiveness of Auto-sklearn is highlighted by its AUC scores of 0.97 for RCC and 0.85 for OC, obtained from the unseen test sets. Importantly, on most of the metrics considered, Auto-sklearn demonstrated a better classification performance, leveraging a mix of algorithms and ensemble techniques. Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science.
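
The abstract distinguishes local explanations (each metabolite's influence on one prediction, as in a waterfall plot) from a global feature ranking. As a rough illustration of the Shapley idea behind both, the sketch below computes exact Shapley values by enumerating feature coalitions. It is not the authors' pipeline: the function `shapley_values`, the three-feature `toy_model`, the zero baseline, and the sample values are all hypothetical, and features left out of a coalition are simply set to their baseline value, which is only one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample.

    Assumption: a feature that is absent from a coalition takes its
    baseline value. Cost grows as 2^n, so this only suits tiny examples.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the features in `coalition` come from x.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model over three made-up feature intensities, with an
# interaction between features 0 and 1. Not a model from the paper.
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[1] - 1.0 * z[2]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 2.0, 1.0], [0.5, 0.0, 2.0], [2.0, 1.0, 0.0]]

# Local explanations: one list of per-feature contributions per sample.
local = [shapley_values(toy_model, x, baseline) for x in samples]

# Global ranking: features ordered by mean absolute Shapley value.
n_features = len(baseline)
global_importance = [
    sum(abs(phi[j]) for phi in local) / len(local) for j in range(n_features)
]
ranking = sorted(range(n_features), key=lambda j: global_importance[j], reverse=True)

for x, phi in zip(samples, local):
    print(x, [round(p, 3) for p in phi])
print("global ranking (feature indices):", ranking)
```

For each sample, the contributions sum to the difference between the model output for that sample and the output at the baseline, which is the additive property a waterfall plot visualizes. In this toy model the interaction term between features 0 and 1 is split evenly between them.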


Pages: 1089-1100

Page count: 12

