An explainable AI approach for diagnosis of COVID-19 using MALDI-ToF mass spectrometry

Cited by: 3
Authors
Seethi, Venkata Devesh Reddy [1]
Lacasse, Zane [1]
Chivte, Prajkta [1]
Bland, Joshua [2]
Kadkol, Shrihari S. [2]
Gaillard, Elizabeth R. [1]
Bharti, Pratool [1]
Alhoori, Hamed [1]
Affiliations
[1] Northern Illinois Univ, 1425 W Lincoln Hwy, De Kalb, IL 60115 USA
[2] Univ Illinois, 840 S Wood St, Chicago, IL 60612 USA
Keywords
COVID-19; testing; Explainable AI; Machine learning; RT-PCR test; Mass spectrometry; MALDI-ToF; SELECTION
DOI
10.1016/j.eswa.2023.121226
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current artificial intelligence (AI) applications for the diagnosis of coronavirus disease 2019 (COVID-19) often lack a biological foundation in the decision-making process. In this study, we employed AI for COVID-19 diagnosis using mass spectrometry (MS) data and leveraged explainable AI (X-AI) to explain the decision-making process on a local (per-sample) and global (all samples) basis. We first assessed eight machine learning models with five feature engineering techniques using five-fold stratified cross-validation. The best accuracy was achieved by a Random Forest (RF) classifier using the ratios of areas under the curve (AUC) from the MS data as features. These features were chosen because they tentatively represent both human and viral proteins in human gargle samples. We evaluated the RF classifier with a 70%-30% train-test split of 152 human gargle samples, yielding an accuracy of 94.12% on the test dataset. Employing X-AI, we further interpreted the RF model using Shapley additive explanations (SHAP) and feature importance techniques, including permutation and impurity-based feature importances. With these interpretation methods offering local and global explanations for the machine learning model's decisions, we devised a straightforward, three-stage X-AI framework that can enable medical practitioners to understand the mechanisms of a black-box AI model. For the medical practitioner, this instills trust in the AI model by providing the rationales for its decisions.
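The abstract names ratios of areas under the curve from the MS data as the best-performing features. Note that AUC here refers to the area under a spectral peak, not to a ROC curve. The following is a minimal sketch of how such ratio features might be computed, not the authors' implementation: the function names `peak_area` and `auc_ratio_features`, the window names `peak_a` and `peak_b`, the m/z bounds, and the toy spectrum are all hypothetical, and the trapezoidal rule is an assumed integration method.

```python
from itertools import combinations


def peak_area(mz, intensity, lo, hi):
    """Trapezoidal area under the spectrum for points with lo <= m/z <= hi."""
    pts = [(m, i) for m, i in zip(mz, intensity) if lo <= m <= hi]
    return sum(
        (m2 - m1) * (i1 + i2) / 2.0
        for (m1, i1), (m2, i2) in zip(pts, pts[1:])
    )


def auc_ratio_features(mz, intensity, windows, eps=1e-12):
    """Return pairwise area ratios for the given m/z windows.

    `windows` maps a (hypothetical) peak name to its (lo, hi) m/z bounds.
    `eps` guards against division by zero when a window is empty.
    """
    areas = {
        name: peak_area(mz, intensity, lo, hi)
        for name, (lo, hi) in windows.items()
    }
    return {
        f"{a}/{b}": areas[a] / (areas[b] + eps)
        for a, b in combinations(areas, 2)
    }


# Toy spectrum with two triangular peaks (illustrative values only).
mz = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
intensity = [0.0, 2.0, 0.0, 0.0, 4.0, 0.0]
windows = {"peak_a": (100.0, 102.0), "peak_b": (103.0, 105.0)}

features = auc_ratio_features(mz, intensity, windows)
print(features)  # one ratio, peak_a/peak_b, approximately 0.5
```

Each sample's ratio dictionary could then be flattened into one row of a feature matrix for a classifier. Using ratios rather than raw areas makes the features less sensitive to overall intensity differences between spectra, which is one plausible reason for choosing them.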
Pages: 16