A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer

Tools based on deep learning models have been created in recent years to aid radiologists in the diagnosis of breast cancer from mammograms. However, the datasets used to train these models may suffer from class imbalance, i.e., there are often fewer malignant samples than benign or healthy cases, w...

Bibliographic Details
Main Authors:	Ricky Walsh, Mickael Tardy
Format:	Article
Language:	English
Published:	MDPI AG 2022-12-01
Series:	Diagnostics
Subjects:	mammography, medical imaging, breast cancer, class imbalance, deep learning, synthetic data
Online Access:	https://www.mdpi.com/2075-4418/13/1/67

author	Ricky Walsh, Mickael Tardy
collection	DOAJ
description	Tools based on deep learning models have been created in recent years to aid radiologists in the diagnosis of breast cancer from mammograms. However, the datasets used to train these models may suffer from class imbalance, i.e., there are often fewer malignant samples than benign or healthy cases, which can bias the model towards the healthy class. In this study, we systematically evaluate several popular techniques to deal with this class imbalance, namely, class weighting, over-sampling, and under-sampling, as well as a synthetic lesion generation approach to increase the number of malignant samples. These techniques are applied when training on three diverse Full-Field Digital Mammography datasets, and tested on in-distribution and out-of-distribution samples. The experiments show that a greater imbalance is associated with a greater bias towards the majority class, which can be counteracted by any of the standard class imbalance techniques. On the other hand, these methods provide no benefit to model performance with respect to the Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC), and indeed under-sampling leads to a reduction of 0.066 in AUC in the case of a 19:1 benign-to-malignant imbalance. Our synthetic lesion methodology leads to better performance in most cases, with increases of up to 0.07 in AUC on out-of-distribution test sets over the next best experiment.
first_indexed	2024-03-11T10:04:48Z
format	Article
id	doaj.art-476c91479ecb4290819a0158a53b9aee
institution	Directory Open Access Journal
issn	2075-4418
language	English
last_indexed	2024-03-11T10:04:48Z
publishDate	2022-12-01
publisher	MDPI AG
record_format	Article
series	Diagnostics
spelling	doaj.art-476c91479ecb4290819a0158a53b9aee; 2023-11-16T15:08:22Z; eng; MDPI AG; Diagnostics; 2075-4418; 2022-12-01; 13; 1; 67; 10.3390/diagnostics13010067; A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer; Ricky Walsh (ISTIC, Campus Beaulieu, Université de Rennes 1, 35700 Rennes, France); Mickael Tardy (Hera-MI SAS, 44800 Saint-Herblain, France); https://www.mdpi.com/2075-4418/13/1/67; mammography, medical imaging, breast cancer, class imbalance, deep learning, synthetic data
title	A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer
topic	mammography, medical imaging, breast cancer, class imbalance, deep learning, synthetic data
url	https://www.mdpi.com/2075-4418/13/1/67

Similar Items