Explainable discovery of disease biomarkers: The case of ovarian cancer to illustrate the best practice in machine learning and Shapley analysis

Cited by: 24
Authors
Huang, Weitong [1]
Suominen, Hanna [1, 2]
Liu, Tommy [1]
Rice, Gregory [3, 4]
Salomon, Carlos [4]
Barnard, Amanda S. [1]
Affiliations
[1] Australian Natl Univ, Sch Comp, Acton, ACT 2601, Australia
[2] Univ Turku, Dept Comp, Turku, Finland
[3] Inoviq Ltd, Notting Hill, Australia
[4] Univ Queensland, Ctr Clin Res, Royal Brisbane & Womens Hosp, Fac Med, Translat Extracellular Vesicles Obstet &, Brisbane, Australia
Funding
UK Medical Research Council
Keywords
Cancer screening; Supervised machine learning; Medical informatics; Evaluation study as topic; ARTIFICIAL-INTELLIGENCE; BIG DATA; BLACK-BOX; DIAGNOSIS; ROMA; ACCOUNTABILITY; PREDICTION; ACCURACY; PROTEIN; CA125;
DOI
10.1016/j.jbi.2023.104365
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: Ovarian cancer is a significant health issue with lasting impacts on the community. Recent advances in surgical, chemotherapeutic and radiotherapeutic interventions have had only marginal impacts due to an inability to identify biomarkers at an early stage. Biomarker discovery is challenging, yet essential for improving drug discovery and clinical care. Machine learning (ML) techniques are invaluable for recognising complex patterns in biomarkers compared to conventional methods, yet they can lack physical insights into diagnosis. eXplainable Artificial Intelligence (XAI) is capable of providing deeper insights into the decision-making of complex ML algorithms, increasing their applicability. We aim to introduce best practice for combining ML and XAI techniques for biomarker validation tasks. Methods: We focused on classification tasks and a game theoretic approach based on Shapley values to build and evaluate models and visualise results. We described the workflow and applied the pipeline in a case study using the CDAS PLCO Ovarian Biomarkers dataset to demonstrate the potential for accuracy and utility. Results: The case study results demonstrate the efficacy of the ML pipeline, its consistency, and advantages compared to conventional statistical approaches. Conclusion: The resulting guidelines provide a general framework for practical application of XAI in medical research that can inform clinicians and validate and explain cancer biomarkers.
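For readers unfamiliar with the game theoretic approach named in the Methods, the following is a minimal sketch of how exact Shapley values attribute a model's output to its input features. It is not the authors' pipeline: the scoring function, its weights, and the feature names `marker_B` and `marker_C` are hypothetical and chosen only for illustration (CA125 is taken from the keyword list), and practical tools typically approximate these values rather than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function `value_fn` over `features`.

    `value_fn` takes a frozenset of feature names and returns a number.
    The cost grows exponentially with the number of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical risk score; the weights are invented for illustration."""
    score = 0.0
    if "CA125" in present:
        score += 0.4
    if "marker_B" in present:
        score += 0.2
    if "CA125" in present and "marker_B" in present:
        score += 0.1  # interaction term, shared between the two features
    return score


features = ["CA125", "marker_B", "marker_C"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(name, round(value, 3))
```

The attributions sum to the difference between the score with all features present and the score with none, which is the efficiency property that makes Shapley values convenient for explaining individual predictions. A feature the score never uses, here `marker_C`, receives an attribution of zero.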
Pages: 9