Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients

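Chest X-ray images are useful for early COVID-19 diagnosis with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images with cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly composed of different sources coming from pre-COVID-19 datasets and COVID-19 datasets. Particularly, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the results published are optimistic and may overestimate the actual predictive capacity of the techniques proposed. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, to avoid them if possible and to report results that are more representative of the actual predictive power of the methods under analysis.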

Bibliographic Details
Main Authors: Omar Del Tejo Catala, Ismael Salvador Igual, Francisco Javier Perez-Benito, David Millan Escriva, Vicent Ortiz Castello, Rafael Llobet, Juan-Carlos Perez-Cortes
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Deep learning; COVID-19; convolutional neural networks; chest X-ray; bias; segmentation
Online Access: https://ieeexplore.ieee.org/document/9374968/
author Omar Del Tejo Catala
Ismael Salvador Igual
Francisco Javier Perez-Benito
David Millan Escriva
Vicent Ortiz Castello
Rafael Llobet
Juan-Carlos Perez-Cortes
collection DOAJ
description Chest X-ray images are useful for early COVID-19 diagnosis with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images with cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly composed of different sources coming from pre-COVID-19 datasets and COVID-19 datasets. Particularly, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the results published are optimistic and may overestimate the actual predictive capacity of the techniques proposed. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, to avoid them if possible and to report results that are more representative of the actual predictive power of the methods under analysis.
first_indexed 2024-04-12T23:17:40Z
format Article
id doaj.art-d9c75879ae434c1bbd124ceb66d22ce0
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T23:17:40Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-d9c75879ae434c1bbd124ceb66d22ce0
Record timestamp: 2022-12-22T03:12:38Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2021-01-01
Volume 9, pages 42370-42383
DOI: 10.1109/ACCESS.2021.3065456
Article number: 9374968
Title: Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients
Authors:
Omar Del Tejo Catala (https://orcid.org/0000-0002-8953-0344)
Ismael Salvador Igual (https://orcid.org/0000-0001-9269-3737)
Francisco Javier Perez-Benito (https://orcid.org/0000-0002-6290-5644)
David Millan Escriva (https://orcid.org/0000-0003-4224-2334)
Vicent Ortiz Castello (https://orcid.org/0000-0002-4390-6190)
Rafael Llobet (https://orcid.org/0000-0002-8278-9740)
Juan-Carlos Perez-Cortes (https://orcid.org/0000-0001-6506-090X)
Affiliation (all authors): Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, Valencia, Spain
Abstract: Chest X-ray images are useful for early COVID-19 diagnosis with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Some datasets containing X-ray images with cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mainly composed of different sources coming from pre-COVID-19 datasets and COVID-19 datasets. Particularly, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which might imply that the results published are optimistic and may overestimate the actual predictive capacity of the techniques proposed. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, to avoid them if possible and to report results that are more representative of the actual predictive power of the methods under analysis.
URL: https://ieeexplore.ieee.org/document/9374968/
Keywords: Deep learning; COVID-19; convolutional neural networks; chest X-ray; bias; segmentation
title Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients
topic Deep learning
COVID-19
convolutional neural networks
chest X-ray
bias
segmentation
url https://ieeexplore.ieee.org/document/9374968/