Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Cited by: 26
Authors
Calderon-Ramirez, Saul [1, 2]
Yang, Shengxiang [1]
Moemeni, Armaghan [3]
Elizondo, David [1]
Colreavy-Donnelly, Simon [1]
Chavarria-Estrada, Luis Fernando [4]
Molina-Cabello, Miguel A. [5, 6]
Affiliations
[1] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
[2] Inst Tecnol Costa Rica, Cartago, Costa Rica
[3] Univ Nottingham, Sch Comp Sci, Nottingham, England
[4] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
[5] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
[6] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning; DEEP; RADIOLOGY; FEATURES;
DOI
10.1016/j.asoc.2021.107692
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of a virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. (C) 2021 Elsevier B.V. All rights reserved.
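The re-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-frequency weighting scheme, the normalisation, and the helper names `class_weights`, `observation_weights` and `weighted_mean_loss` are all assumptions made for illustration. The sketch only shows how a per-class weight might be chosen for each observation from the arg-max of its (pseudo-)label and then applied to a per-observation loss computed elsewhere.

```python
from collections import Counter


def class_weights(labels, num_classes):
    """Assumed scheme: inverse class frequency, normalised to sum to 1."""
    counts = Counter(labels)
    inverse = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    total = sum(inverse)
    return [v / total for v in inverse]


def observation_weights(targets, weights):
    """Pick one weight per observation from the arg-max class of its target.

    `targets` may hold one-hot labels (labelled data) or soft pseudo-labels
    (unlabelled data); in both cases the most probable class selects the weight.
    """
    chosen = []
    for target in targets:
        cls = max(range(len(target)), key=lambda k: target[k])
        chosen.append(weights[cls])
    return chosen


def weighted_mean_loss(per_observation_losses, obs_weights):
    """Average of per-observation losses, each scaled by its weight."""
    scaled = [w * l for w, l in zip(obs_weights, per_observation_losses)]
    return sum(scaled) / len(scaled)


# Hypothetical labelled set: 9 normal (class 0) and 1 COVID-19 positive (class 1).
labelled = [0] * 9 + [1]
weights = class_weights(labelled, num_classes=2)

# Hypothetical soft pseudo-labels for two unlabelled observations.
pseudo_labels = [[0.8, 0.2], [0.3, 0.7]]
unlabelled_weights = observation_weights(pseudo_labels, weights)

# Hypothetical per-observation loss values produced by some model.
loss = weighted_mean_loss([1.0, 2.0], unlabelled_weights)
```

Under these assumptions the under-represented class receives the larger weight, so an error on a rare positive case contributes more to the total loss than an error on a normal case.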
Pages: 15