Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Cited by: 20
Authors
Calderon-Ramirez, Saul [1 ,2 ]
Yang, Shengxiang [1 ]
Moemeni, Armaghan [3 ]
Elizondo, David [1 ]
Colreavy-Donnelly, Simon [1 ]
Chavarria-Estrada, Luis Fernando [4 ]
Molina-Cabello, Miguel A. [5 ,6 ]
Affiliations
[1] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
[2] Inst Tecnol Costa Rica, Cartago, Costa Rica
[3] Univ Nottingham, Sch Comp Sci, Nottingham, England
[4] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
[5] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
[6] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning; DEEP; RADIOLOGY; FEATURES
DOI
10.1016/j.asoc.2021.107692
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and as quickly as possible, in a cheap and efficient manner. The application of deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where only small labelled datasets are available. Moreover, in such a context the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to observations from the under-represented class. For unlabelled observations, we use the pseudo-labels and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. (C) 2021 Elsevier B.V. All rights reserved.
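The loss re-weighting described in the abstract can be sketched as below. This is a minimal, hedged illustration under stated assumptions, not the authors' implementation: the inverse-frequency formula w_k = N / (C * n_k), all function names, and the rule of taking an unlabelled example's weight from the argmax of its MixMatch target are assumptions. The label-guessing step (averaging predictions over K augmentations, then sharpening with temperature T) follows the published MixMatch recipe; the mixup step is omitted for brevity. In MixMatch the two terms would be combined as L_X + lambda_U * L_U.

```python
import math
from typing import List, Sequence

Vector = List[float]


def class_weights(labels: Sequence[int], num_classes: int) -> Vector:
    """ASSUMPTION: inverse-frequency weights w_k = N / (C * n_k).

    Rarer classes (e.g. COVID-19 positive) receive larger weights.
    The paper's exact weighting formula may differ.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    if any(c == 0 for c in counts):
        raise ValueError("every class needs at least one labelled example")
    n = len(labels)
    return [n / (num_classes * c) for c in counts]


def sharpen(p: Sequence[float], temperature: float) -> Vector:
    """MixMatch-style sharpening: raise each p_k to 1/T, then renormalise."""
    powered = [x ** (1.0 / temperature) for x in p]
    total = sum(powered)
    return [x / total for x in powered]


def guess_label(aug_predictions: Sequence[Sequence[float]],
                temperature: float) -> Vector:
    """Average predictions over K augmentations of one image, then sharpen."""
    k = len(aug_predictions)
    mean = [sum(col) / k for col in zip(*aug_predictions)]
    return sharpen(mean, temperature)


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           weights: Sequence[float]) -> float:
    """Labelled term: each example's cross-entropy scaled by its class weight."""
    terms = [weights[y] * -math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(terms) / len(terms)


def weighted_l2(preds: Sequence[Sequence[float]],
                targets: Sequence[Sequence[float]],
                weights: Sequence[float]) -> float:
    """Unlabelled term: squared error to the guessed target, scaled by the
    weight of the class the target points to.

    ASSUMPTION: the class is taken as the argmax of the (pseudo or mixed) target.
    """
    num_classes = len(targets[0])
    terms = []
    for p, q in zip(preds, targets):
        cls = list(q).index(max(q))
        sq_err = sum((pk - qk) ** 2 for pk, qk in zip(p, q)) / num_classes
        terms.append(weights[cls] * sq_err)
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical labelled set: 8 normal (class 0), 2 COVID-19 (class 1).
    w = class_weights([0] * 8 + [1] * 2, num_classes=2)
    print(w)  # [0.625, 2.5]
```

With these weights, a confident mistake on a minority-class example costs four times as much as the same mistake on a majority-class example, which is the intended effect of re-weighting under imbalance.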
Pages: 15
References
62 in total
  • [41] Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks
    Narin, Ali
    Kaya, Ceren
    Pamuk, Ziynet
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (03) : 1207 - 1220
  • [42] Niu S, IEEE J BIOMED HLTH I
  • [43] Oala L, 2020, PR MACH LEARN RES, V136, P280
  • [44] Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets
    Oh, Yujin
    Park, Sangjoon
    Ye, Jong Chul
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (08) : 2688 - 2700
  • [45] Oliver A, 2018, ADV NEUR IN, V31
  • [46] Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans
    Roberts, Michael
    Driggs, Derek
    Thorpe, Matthew
    Gilbey, Julian
    Yeung, Michael
    Ursprung, Stephan
    Aviles-Rivero, Angelica I.
    Etmann, Christian
    McCague, Cathal
    Beer, Lucian
    Weir-McCall, Jonathan R.
    Teng, Zhongzhao
    Gkrania-Klotsas, Effrossyni
    Rudd, James H. F.
    Sala, Evis
    Schonlieb, Carola-Bibiane
    [J]. NATURE MACHINE INTELLIGENCE, 2021, 3 (03) : 199 - 217
  • [47] Salman F.M., 2020, Covid-19 detection using artificial intelligence, V4, P18
  • [48] Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
    Selvaraju, Ramprasaath R.
    Cogswell, Michael
    Das, Abhishek
    Vedantam, Ramakrishna
    Parikh, Devi
    Batra, Dhruv
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 618 - 626
  • [49] Detection of Coronavirus Disease (COVID-19) based on Deep Features and Support Vector Machine
    Sethy, Prabira Kumar
    Behera, Santi Kumari
    Ratha, Pradyumna Kumar
    Biswas, Preesat
    [J]. INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2020, 5 (04) : 643 - 651
  • [50] Assessment of the Availability of Technology for Trauma Care in India
    Shah, Mihir Tejanshu
    Joshipura, Manjul
    Singleton, Jered
    LaBarre, Paul
    Desai, Hem
    Sharma, Eliza
    Mock, Charles
    [J]. WORLD JOURNAL OF SURGERY, 2015, 39 (02) : 363 - 372