Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients

Cited by: 21
Authors
Del Tejo Catala, Omar [1]
Salvador Igual, Ismael [1]
Javier Perez-Benito, Francisco [1]
Millan Escriva, David [1]
Ortiz Castello, Vicent [1]
Llobet, Rafael [1, 2]
Perez-Cortes, Juan-Carlos [1, 3]
Affiliations
[1] Univ Politecn Valencia, Inst Tecnol Informat ITI, Valencia 46022, Spain
[2] Univ Politecn Valencia, Dept Comp Syst & Computat DSIC, Valencia 46022, Spain
[3] Univ Politecn Valencia, Dept Comp Engn DISCA, Valencia 46022, Spain
Keywords
Lung; COVID-19; X-ray imaging; Heating systems; Feature extraction; Task analysis; Licenses; Deep learning; convolutional neural networks; chest X-ray; bias; segmentation; saliency map
DOI
10.1109/ACCESS.2021.3065456
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Chest X-ray images are useful for early COVID-19 diagnosis, with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images of cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly assembled from different sources: pre-COVID-19 datasets and COVID-19 datasets. In particular, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the published results are optimistic and overestimate the actual predictive capacity of the proposed techniques. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, avoid them where possible, and report results that are more representative of the actual predictive power of the methods under analysis.
Pages: 42370-42383
Number of pages: 14