Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Cited by: 26
Authors
Calderon-Ramirez, Saul [1, 2]
Yang, Shengxiang [1]
Moemeni, Armaghan [3]
Elizondo, David [1]
Colreavy-Donnelly, Simon [1]
Chavarria-Estrada, Luis Fernando [4]
Molina-Cabello, Miguel A. [5, 6]
Affiliations
[1] De Montfort Univ, Ctr Computat Intelligence CCI, Leicester, Leics, England
[2] Inst Tecnol Costa Rica, Cartago, Costa Rica
[3] Univ Nottingham, Sch Comp Sci, Nottingham, England
[4] Imagenes Med Dr Chavarria Estrada, San Jose, Costa Rica
[5] Univ Malaga, Dept Comp Languages & Comp Sci, Malaga, Spain
[6] Inst Invest Biomed Malaga IBIMA, Malaga, Spain
Keywords
Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning; DEEP; RADIOLOGY; FEATURES
DOI
10.1016/j.asoc.2021.107692
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. Therefore, we propose a simple approach for correcting data imbalance by re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. © 2021 Elsevier B.V. All rights reserved.
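The re-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact weighting formula, so the code below assumes inverse-frequency class weights normalised to sum to one, and assumes the usual MixMatch loss shapes (cross-entropy on labelled data, squared error on unlabelled data). The function names `class_weights`, `weighted_labelled_loss` and `weighted_unlabelled_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def class_weights(labels, num_classes):
    # Assumed scheme: weight each class by the inverse of its labelled count,
    # then normalise so the weights sum to 1. Rare classes get larger weights.
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)  # guard against empty classes
    return inv / inv.sum()

def weighted_labelled_loss(probs, targets, weights):
    # Cross-entropy per observation, scaled by the weight of its target class.
    # probs and targets have shape (n, k); targets may be soft (e.g. after MixUp).
    per_obs = -np.sum(targets * np.log(probs + 1e-12), axis=1)
    w = weights[np.argmax(targets, axis=1)]
    return float(np.mean(w * per_obs))

def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    # Squared error per observation; the weight is chosen from the class
    # that the pseudo-label assigns the highest probability to.
    per_obs = np.sum((pseudo_labels - probs) ** 2, axis=1)
    w = weights[np.argmax(pseudo_labels, axis=1)]
    return float(np.mean(w * per_obs))

# Illustrative labelled set: 8 normal (class 0) and 2 COVID-19 positive (class 1).
labels = np.array([0] * 8 + [1] * 2)
weights = class_weights(labels, num_classes=2)

# Illustrative pseudo-labels for two unlabelled observations.
pseudo = np.array([[0.9, 0.1], [0.3, 0.7]])
pseudo_weights = weights[np.argmax(pseudo, axis=1)]
```

With these illustrative counts, the under-represented positive class receives four times the weight of the normal class, so errors on positive cases contribute more to both loss terms.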
Pages: 15