Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Cited by: 20
Authors
Calderon-Ramirez, Saul [1, 2]
Yang, Shengxiang [1 ]
Moemeni, Armaghan [3 ]
Elizondo, David [1 ]
Colreavy-Donnelly, Simon [1 ]
Chavarria-Estrada, Luis Fernando [4 ]
Molina-Cabello, Miguel A. [5, 6]
Affiliations
[1] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
[2] Inst Tecnol Costa Rica, Cartago, Costa Rica
[3] Univ Nottingham, Sch Comp Sci, Nottingham, England
[4] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
[5] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
[6] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning; DEEP; RADIOLOGY; FEATURES
DOI
10.1016/j.asoc.2021.107692
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. © 2021 Elsevier B.V. All rights reserved.
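The re-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weight formula, so the inverse-frequency weights below are an assumption, as are the function names (`class_weights`, `weighted_cross_entropy`, `weighted_unlabelled_loss`) and the toy data. The unlabelled term uses a squared error against the pseudo-label and picks the weight from the class the pseudo-label favours.

```python
import numpy as np

def class_weights(labels, num_classes):
    """Assumed scheme: inverse class frequency, normalised to sum to 1."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)  # guard against empty classes
    return inv / inv.sum()

def weighted_cross_entropy(probs, labels, weights):
    """Labelled term: per-observation cross-entropy scaled by its class weight."""
    eps = 1e-12
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * -np.log(picked + eps)))

def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Unlabelled term: squared error to the pseudo-label, weighted by the
    class with the highest pseudo-label probability."""
    w = weights[np.argmax(pseudo_labels, axis=1)]
    sq_err = np.sum((probs - pseudo_labels) ** 2, axis=1)
    return float(np.mean(w * sq_err))

# Toy imbalanced labelled set (hypothetical): 9 normal (class 0), 1 COVID-19 positive (class 1).
labels = np.array([0] * 9 + [1])
weights = class_weights(labels, num_classes=2)

# Uniform predictions, purely for illustration.
probs = np.full((10, 2), 0.5)
labelled_loss = weighted_cross_entropy(probs, labels, weights)

# Two unlabelled observations with soft pseudo-labels (hypothetical values).
pseudo = np.array([[0.9, 0.1], [0.2, 0.8]])
unlabelled_loss = weighted_unlabelled_loss(pseudo, pseudo, weights)
```

With this assumed scheme the single positive observation carries nine times the weight of each normal observation, so errors on the under-represented class contribute more to the loss.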
Pages: 15