MedTric: A clinically applicable metric for evaluation of multi-label computational diagnostic systems

Cited by: 1
Authors
Saha, Soumadeep [1, 2]
Garain, Utpal [1]
Ukil, Arijit [2]
Pal, Arpan [2]
Khandelwal, Sundeep [2]
Affiliations
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, West Bengal, India
[2] Tata Consultancy Serv, TCS Res, Kolkata, West Bengal, India
Source
PLOS ONE | 2023 / Volume 18 / Issue 08
Keywords
CLASSIFICATION;
DOI
10.1371/journal.pone.0283895
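Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes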
07 ; 0710 ; 09 ;
Abstract
When judging the quality of a computational system for a pathological screening task, several factors are important, such as sensitivity, specificity, and accuracy. Machine learning based approaches have shown promise in the multi-label paradigm and are being widely adopted in diagnostics and digital therapeutics. Metrics are usually borrowed from the machine learning literature, and the current consensus is to report results on a diverse set of metrics. It is infeasible to compare the efficacy of computational systems that have been evaluated on different sets of metrics. From a diagnostic utility standpoint, the current metrics themselves are far from perfect: they are often biased by the prevalence of negative samples or other statistical factors and, importantly, they are designed to evaluate general-purpose machine learning tasks. In this paper we outline the various parameters that are important in constructing a clinical metric aligned with diagnostic practice, and demonstrate their incompatibility with existing metrics. We propose a new metric, MedTric, that takes into account several factors of clinical importance. MedTric is built from the ground up with the unique context of computational diagnostics and the principle of risk minimization in mind, penalizing missed diagnoses more harshly than over-diagnosis. MedTric is a unified metric for evaluating medical or pathological screening systems. We compare this metric against other widely used metrics and demonstrate how it outperforms them in key areas of medical relevance.
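The abstract's two central claims (that common metrics are biased by the prevalence of negative samples, and that a missed diagnosis should cost more than an over-diagnosis) can be illustrated with a small sketch. The code below is not the MedTric formula, which this record does not give. The function names, the toy label matrices, and the weights `miss_weight` and `over_weight` are all assumptions chosen only for illustration.

```python
# Illustrative sketch only: NOT the MedTric definition.
# All names, toy data, and weights below are assumptions for demonstration.

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN over every (patient, label) cell of 0/1 matrices."""
    tp = fp = tn = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 0 and p == 0:
                tn += 1
            else:  # t == 1 and p == 0: a missed diagnosis
                fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp, fp, tn, fn):
    return tp / (tp + fn)

def asymmetric_penalty(tp, fp, tn, fn, miss_weight=5.0, over_weight=1.0):
    """Hypothetical cost that weights missed diagnoses above over-diagnoses."""
    return (miss_weight * fn + over_weight * fp) / (tp + fp + tn + fn)

# Toy data: 4 patients, 5 possible conditions; most cells are negative.
y_true = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0]]
# System A predicts "healthy" everywhere and misses both true conditions.
y_miss = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
# System B finds both true conditions but raises two false alarms.
y_over = [[1, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]

miss_counts = confusion_counts(y_true, y_miss)
over_counts = confusion_counts(y_true, y_over)

for name, counts in (("misses", miss_counts), ("over-diagnoses", over_counts)):
    print(name, accuracy(*counts), sensitivity(*counts), asymmetric_penalty(*counts))
```

In this toy setting both systems reach the same accuracy because negative cells dominate, while sensitivity and the asymmetric cost separate them. Any clinically meaningful choice of weights would have to come from the paper itself or from domain experts.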
Pages: 19
Related papers
20 in total
  • [1] Multi-label classification of symptom terms from free-text bilingual adverse drug reaction reports using natural language processing
    Chaichulee, Sitthichok
    Promchai, Chissanupong
    Kaewkomon, Tanyamai
    Kongkamol, Chanon
    Ingviya, Thammasin
    Sangsupawanich, Pasuree
    [J]. PLOS ONE, 2022, 17 (08):
  • [2] The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
    Chicco, Davide
    Jurman, Giuseppe
    [J]. BMC GENOMICS, 2020, 21 (01)
  • [3] El Kafrawy P., 2015, Int. J. Comput. Appl, V114, P1, DOI [DOI 10.5120/20083-1666, 10.5120/20083-1666]
  • [4] Elkan C., 2001, INT JOINT C ART INT, V17, P973
  • [5] Giraldo-Forero AF, 2015, LECT N BIOINFORMAT, V9043, P557, DOI 10.1007/978-3-319-16483-0_54
  • [6] Identifying neuroanatomical and behavioral features for autism spectrum disorder diagnosis in children using machine learning
    Han, Yu
    Rizzo, Donna M.
    Hanley, John P.
    Coderre, Emily L.
    Prelock, Patricia A.
    [J]. PLOS ONE, 2022, 17 (07):
  • [7] Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
    Hannun, Awni Y.
    Rajpurkar, Pranav
    Haghpanahi, Masoumeh
    Tison, Geoffrey H.
    Bourn, Codie
    Turakhia, Mintu P.
    Ng, Andrew Y.
    [J]. NATURE MEDICINE, 2019, 25 (01) : 65 - +
  • [8] On evaluation metrics for medical applications of artificial intelligence
    Hicks, Steven A.
    Struemke, Inga
    Thambawita, Vajira
    Hammou, Malek
    Riegler, Michael A.
    Halvorsen, Pal
    Parasa, Sravanthi
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [9] Irvin J, 2019, AAAI CONF ARTIF INTE, P590
  • [10] Automatic Multi-Label ECG Classification with Category Imbalance and Cost-Sensitive Thresholding
    Liu, Yang
    Li, Qince
    Wang, Kuanquan
    Liu, Jun
    He, Runnan
    Yuan, Yongfeng
    Zhang, Henggui
    [J]. BIOSENSORS-BASEL, 2021, 11 (11):