Comparison of evaluation metrics of deep learning for imbalanced imaging data in osteoarthritis studies

Cited by: 7
Authors
Liu, Shen [1]
Roemer, Frank [2, 4]
Ge, Yong [3]
Bedrick, Edward J. [1]
Li, Zong-Ming [16]
Guermazi, Ali [4]
Sharma, Leena [11]
Eaton, Charles [6, 7, 8]
Hochberg, Marc C. [9, 10]
Hunter, David J. [12]
Nevitt, Michael C. [5]
Wirth, Wolfgang [13, 14, 15]
Kwoh, C. Kent [16]
Sun, Xiaoxiao [1]
Affiliations
[1] Univ Arizona, Dept Epidemiol & Biostat, 1295 N Martin Ave, Tucson, AZ 85724 USA
[2] Univ Hosp Erlangen Nuremberg, Dept Radiol, Erlangen, Germany
[3] Univ Arizona, Dept Management Informat Syst, Tucson, AZ USA
[4] Boston Univ, Sch Med, Dept Neurol, Boston, MA USA
[5] Univ Calif San Francisco, Dept Epidemiol & Biostat, San Francisco, CA USA
[6] Brown Univ, Kent Mem Hosp, Warren Alpert Med Sch, Sch Publ Hlth, Providence, RI USA
[7] Brown Univ, Warren Alpert Med Sch, Sch Publ Hlth, Dept Family Med, Providence, RI USA
[8] Brown Univ, Sch Publ Hlth, Dept Epidemiol, Providence, RI USA
[9] Univ Maryland, Sch Med, Baltimore, MD USA
[10] VA Maryland Hlth Care Syst, Med Care Clin Ctr, Baltimore, MD USA
[11] Northwestern Univ, Feinberg Sch Med, Wilmette, IL USA
[12] Univ Sydney, Fac Med & Hlth, Kolling Inst, Sydney Musculoskeletal Hlth, Sydney, NSW 2065, Australia
[13] Paracelsus Med Univ Salzburg & Nuremberg, Inst Anat & Cell Biol, Dept Imaging & Funct Musculoskeletal Res, Salzburg, Austria
[14] Paracelsus Med Univ Salzburg & Nuremberg, Ludwig Boltzmann Inst Arthrit & Rehabil, Salzburg, Austria
[15] Chondrometrics GmbH, Ainring, Germany
[16] Univ Arizona, Coll Med, Arizona Arthrit Ctr, Arthrit Ctr, Tucson, AZ USA
Funding
National Institutes of Health (USA)
Keywords
Osteoarthritis; Bone marrow lesion; Imbalanced data; Deep learning; Receiver operating characteristic; Precision recall curve; Knee osteoarthritis
DOI
10.1016/j.joca.2023.05.006
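Chinese Library Classification number
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]
Discipline classification code
Abstract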
Purpose: To compare the evaluation metrics for deep learning methods that were developed using imbalanced imaging data in osteoarthritis studies.

Materials and methods: This retrospective study utilized 2996 sagittal intermediate-weighted fat-suppressed knee MRIs with MRI Osteoarthritis Knee Score readings from 2467 participants in the Osteoarthritis Initiative study. We obtained probabilities of the presence of bone marrow lesions (BMLs) from MRIs in the testing dataset at the sub-region (15 sub-regions), compartment, and whole-knee levels based on the trained deep learning models. We compared different evaluation metrics (e.g., receiver operating characteristic (ROC) and precision-recall (PR) curves) in the testing dataset with various class ratios (presence of BMLs vs. absence of BMLs) at these three data levels to assess the model's performance.

Results: In a sub-region with an extremely high imbalance ratio, the model achieved a ROC-AUC of 0.84, a PR-AUC of 0.10, a sensitivity of 0, and a specificity of 1.

Conclusion: The commonly used ROC curve is not sufficiently informative, especially in the case of imbalanced data. We provide the following practical suggestions based on our data analysis: 1) ROC-AUC is recommended for balanced data, 2) PR-AUC should be used for moderately imbalanced data (i.e., when the proportion of the minor class is above 5% and less than 50%), and 3) for severely imbalanced data (i.e., when the proportion of the minor class is below 5%), it is not practical to apply a deep learning model, even with the application of techniques addressing imbalanced data issues.

© 2023 The Author(s). Published by Elsevier Ltd on behalf of Osteoarthritis Research Society International. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
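The pattern reported in the Results (a high ROC-AUC alongside a low PR-AUC and zero sensitivity) can be reproduced on synthetic data. The sketch below is illustrative only and is not the authors' code or data: the simulated scores, the 2% prevalence, the 0.5 decision threshold, and all function names (`simulate`, `roc_auc`, `average_precision`, `sens_spec`, `suggested_metric`) are assumptions made for this example. PR-AUC is approximated here by average precision, and the handling of a minority proportion of exactly 5% is also an assumption, since the abstract does not specify it.

```python
import math
import random


def simulate(n=5000, prevalence=0.02, seed=0):
    """Hypothetical data: rare positives whose scores are only slightly higher."""
    rng = random.Random(seed)
    n_pos = int(n * prevalence)
    labels = [1] * n_pos + [0] * (n - n_pos)
    # Logistic transform of Gaussian logits; both classes sit well below 0.5.
    scores = [1 / (1 + math.exp(-rng.gauss(-2.3 if y else -3.0, 0.5)))
              for y in labels]
    return labels, scores


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the PR curve (assumes at least one positive)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at each recalled positive
    return total / tp


def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


def suggested_metric(minority_fraction):
    """Encodes the abstract's three suggestions; the 5% boundary is an assumption."""
    if minority_fraction < 0.05:
        return "severely imbalanced: applying a deep learning model is not practical"
    if minority_fraction < 0.50:
        return "PR-AUC"
    return "ROC-AUC"


labels, scores = simulate()
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
sensitivity, specificity = sens_spec(labels, scores)
print(f"ROC-AUC={auc:.2f} PR-AUC={ap:.2f} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
print(suggested_metric(sum(labels) / len(labels)))
```

Because ROC-AUC depends only on how positives rank against negatives, it stays high even when no positive score crosses the decision threshold, whereas precision is diluted by the much larger negative class.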
Pages: 1242-1248
Number of pages: 7