Advanced Receiver Operating Characteristic Curve Analysis to Identify Outliers in Binary Machine Learning Classifications for Precision Medicine

Times cited: 0

Authors:

Namdar, Khashayar [1,2,4,7]

Khalvati, Farzad [1,2,3,4,5,6,7]

Affiliations:

[1] Hosp Sick Children, Dept Diagnost Intervent Radiol, Toronto, ON, Canada

[2] SickKids Res Inst, Neurosci & Mental Hlth Res Program, Toronto, ON, Canada

[3] Univ Toronto, Dept Med Imaging, Toronto, ON, Canada

[4] Univ Toronto, Inst Med Sci, Toronto, ON, Canada

[5] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada

[6] Univ Toronto, Dept Mech & Ind Engn, Toronto, ON, Canada

[7] Vector Inst, Toronto, ON, Canada

Source:

2024 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS, BHI | 2024

Keywords:

AUROC; Binary Classification; Evaluation Metric; Machine Learning; Outlier Identification; ROC;

DOI:

10.1109/BHI62660.2024.10913597

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Receiver Operating Characteristic (ROC) curve is a critical tool for binary classification analysis in medicine, with the Area Under the ROC Curve (AUROC) serving as a widely accepted metric to evaluate the performance of binary classifiers. This study conducts a comprehensive review of the ROC curve with a focus on its utility in outlier identification. We introduce a novel scoring method to rank actual positives and actual negatives within a test set according to their impact on AUROC degradation. We bridge the scoring system with ROC curve analysis to quantify each data point's contribution to AUROC loss. Furthermore, we introduce the IMICS ROC Analyzer, a graphical user interface-based software tool that embeds our algorithms. Using an open-source prostate cancer dataset, we illustrate the application of our algorithms to practical outlier detection in binary classification tasks. The IMICS ROC Analyzer supports precision medicine by measuring the contribution of each individual case (a patient, lesion, or sample) to the overall AUROC, thus facilitating confidence measurement of Machine Learning (ML) classifiers for individual cases of interest in a cohort.
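
The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea, assuming the standard pairwise (Mann-Whitney) formulation of AUROC rather than the authors' actual algorithm. Each actual positive is scored by the fraction of negatives ranked above it, and each actual negative by the fraction of positives ranked below it; ties count as half. The function name `auroc_loss_scores` and the toy data are hypothetical, and nothing here reflects the IMICS ROC Analyzer's interface.

```python
import numpy as np

def auroc_loss_scores(y_true, y_score):
    """Return (auroc, per-sample loss scores) from pairwise comparisons.

    Hypothetical helper, not the published method. Assumes both classes
    are present in y_true.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]    # predicted scores of actual positives
    neg = y_score[~y_true]   # predicted scores of actual negatives

    # loss[i, j] = 1 if negative j outranks positive i, 0.5 on a tie, else 0
    loss = (neg[None, :] > pos[:, None]) + 0.5 * (neg[None, :] == pos[:, None])

    # AUROC is the fraction of correctly ordered (positive, negative) pairs
    auroc = 1.0 - loss.mean()

    scores = np.zeros(len(y_true))
    scores[y_true] = loss.mean(axis=1)   # share of negatives each positive loses to
    scores[~y_true] = loss.mean(axis=0)  # share of positives each negative beats
    return auroc, scores

# Toy example: the third case is a positive with a low predicted score.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2]
auroc, scores = auroc_loss_scores(y_true, y_score)
ranking = np.argsort(-scores)  # indices ordered from most to least AUROC loss
```

Under this assumption, the mean score over the positives (and likewise over the negatives) equals 1 - AUROC, so the scores split the total AUROC loss among individual cases, and the cases at the top of `ranking` are the outlier candidates.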


Pages: 8
