MedTric: A clinically applicable metric for evaluation of multi-label computational diagnostic systems

Cited by: 4

Authors:

Saha, Soumadeep [1,2]

Garain, Utpal [1]

Ukil, Arijit [2]

Pal, Arpan [2]

Khandelwal, Sundeep [2]

Affiliations:

[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, West Bengal, India

[2] Tata Consultancy Serv, TCS Res, Kolkata, West Bengal, India

Source:

PLOS ONE | 2023 / Volume 18 / Issue 08

Keywords:

CLASSIFICATION;

DOI:

10.1371/journal.pone.0283895

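Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];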

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

When judging the quality of a computational system for a pathological screening task, several factors seem to be important, like sensitivity, specificity, accuracy, etc. With machine learning based approaches showing promise in the multi-label paradigm, they are being widely adopted to diagnostics and digital therapeutics. Metrics are usually borrowed from machine learning literature, and the current consensus is to report results on a diverse set of metrics. It is infeasible to compare efficacy of computational systems which have been evaluated on different sets of metrics. From a diagnostic utility standpoint, the current metrics themselves are far from perfect, often biased by prevalence of negative samples or other statistical factors and importantly, they are designed to evaluate general purpose machine learning tasks. In this paper we outline the various parameters that are important in constructing a clinical metric aligned with diagnostic practice, and demonstrate their incompatibility with existing metrics. We propose a new metric, MedTric that takes into account several factors that are of clinical importance. MedTric is built from the ground up keeping in mind the unique context of computational diagnostics and the principle of risk minimization, penalizing missed diagnosis more harshly than over-diagnosis. MedTric is a unified metric for medical or pathological screening system evaluation. We compare this metric against other widely used metrics and demonstrate how our system outperforms them in key areas of medical relevance.

Pages: 19

References (20 in total)

[1] Multi-label classification of symptom terms from free-text bilingual adverse drug reaction reports using natural language processing [J].

Chaichulee, Sitthichok ;

Promchai, Chissanupong ;

Kaewkomon, Tanyamai ;

Kongkamol, Chanon ;

Ingviya, Thammasin ;

Sangsupawanich, Pasuree .

PLOS ONE, 2022, 17 (08)

[2] The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation [J].

Chicco, Davide ;

Jurman, Giuseppe .

BMC GENOMICS, 2020, 21 (01)

[3]

El Kafrawy P., 2015, International Journal of Computers and Applications, V114, P1, DOI 10.5120/20083-1666

[4]

Elkan C., 2001, FDN COST SENSITIVE L, P973

[5]

Giraldo-Forero AF, 2015, LECT N BIOINFORMAT, V9043, P557, DOI 10.1007/978-3-319-16483-0_54

[6] Identifying neuroanatomical and behavioral features for autism spectrum disorder diagnosis in children using machine learning [J].

Han, Yu ;

Rizzo, Donna M. ;

Hanley, John P. ;

Coderre, Emily L. ;

Prelock, Patricia A. .

PLOS ONE, 2022, 17 (07)

[7] Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network [J].

Hannun, Awni Y. ;

Rajpurkar, Pranav ;

Haghpanahi, Masoumeh ;

Tison, Geoffrey H. ;

Bourn, Codie ;

Turakhia, Mintu P. ;

Ng, Andrew Y. .

NATURE MEDICINE, 2019, 25 (01) :65-+

[8] On evaluation metrics for medical applications of artificial intelligence [J].

Hicks, Steven A. ;

Struemke, Inga ;

Thambawita, Vajira ;

Hammou, Malek ;

Riegler, Michael A. ;

Halvorsen, Pal ;

Parasa, Sravanthi .

SCIENTIFIC REPORTS, 2022, 12 (01)

[9]

Irvin J, 2019, AAAI CONF ARTIF INTE, P590

[10] Automatic Multi-Label ECG Classification with Category Imbalance and Cost-Sensitive Thresholding [J].

Liu, Yang ;

Li, Qince ;

Wang, Kuanquan ;

Liu, Jun ;

He, Runnan ;

Yuan, Yongfeng ;

Zhang, Henggui .

BIOSENSORS-BASEL, 2021, 11 (11)
