Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP
This study addresses the challenge of training generative adversarial networks (GANs) on small tabular clinical trial datasets for data augmentation; such datasets are known to make training difficult because of their limited sample sizes. To overcome this obstacle, a hybrid approach is proposed, combining the...
Main Authors: | Winston Wang, Tun-Wen Pai |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Data |
Subjects: | clinical trial; GAN; multiple sclerosis; small tabular dataset; SMOTE; WCGAN-GP |
Online Access: | https://www.mdpi.com/2306-5729/8/9/135 |
_version_ | 1797580643590209536 |
---|---|
author | Winston Wang; Tun-Wen Pai |
author_facet | Winston Wang; Tun-Wen Pai |
author_sort | Winston Wang |
collection | DOAJ |
description | This study addresses the challenge of training generative adversarial networks (GANs) on small tabular clinical trial datasets for data augmentation; such datasets are known to make training difficult because of their limited sample sizes. To overcome this obstacle, a hybrid approach is proposed: the synthetic minority oversampling technique (SMOTE) first augments the original data to a more substantial size to improve subsequent GAN training, and a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), proven for its state-of-the-art performance and enhanced stability, is then trained on the augmented data. The ultimate objective of this research was to demonstrate that, with this hybrid approach, the synthetic tabular data generated by the final WCGAN-GP model maintain the structural integrity and statistical representation of the original small dataset. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted access to subject enrollment is a common challenge. Despite the limited data, the findings demonstrate that the hybrid approach successfully generates synthetic data that closely preserve the characteristics of the original small dataset. Using this hybrid approach to generate faithful synthetic data shows clear potential for enhancing data-driven research in drug clinical trials. This includes enabling robust analysis of small datasets, supplementing scarce clinical trial data, facilitating its use in machine learning tasks, and even using the model for anomaly detection to ensure better quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures. |
first_indexed | 2024-03-10T22:53:47Z |
format | Article |
id | doaj.art-ff50ad3621ce472d85dc39a9b241e25f |
institution | Directory Open Access Journal |
issn | 2306-5729 |
language | English |
last_indexed | 2024-03-10T22:53:47Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
spelling | doaj.art-ff50ad3621ce472d85dc39a9b241e25f; 2023-11-19T10:11:36Z; eng; MDPI AG; Data; 2306-5729; 2023-08-01; vol. 8, no. 9, art. 135; 10.3390/data8090135; Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP; Winston Wang; Tun-Wen Pai; Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan (both authors); https://www.mdpi.com/2306-5729/8/9/135; clinical trial; GAN; multiple sclerosis; small tabular dataset; SMOTE; WCGAN-GP |
spellingShingle | Winston Wang; Tun-Wen Pai; Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP; Data; clinical trial; GAN; multiple sclerosis; small tabular dataset; SMOTE; WCGAN-GP |
title | Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP |
title_full | Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP |
title_fullStr | Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP |
title_full_unstemmed | Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP |
title_short | Enhancing Small Tabular Clinical Trial Dataset through Hybrid Data Augmentation: Combining SMOTE and WCGAN-GP |
title_sort | enhancing small tabular clinical trial dataset through hybrid data augmentation combining smote and wcgan gp |
topic | clinical trial; GAN; multiple sclerosis; small tabular dataset; SMOTE; WCGAN-GP |
url | https://www.mdpi.com/2306-5729/8/9/135 |
work_keys_str_mv | AT winstonwang enhancingsmalltabularclinicaltrialdatasetthroughhybriddataaugmentationcombiningsmoteandwcgangp AT tunwenpai enhancingsmalltabularclinicaltrialdatasetthroughhybriddataaugmentationcombiningsmoteandwcgangp |
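
The description above says SMOTE is used first to enlarge the small dataset before GAN training. As a rough illustration of that first stage only, the sketch below shows the interpolation idea behind SMOTE in plain NumPy. It is not the authors' implementation: the function name `smote_like_oversample`, the parameters `k` and `seed`, and the toy array `X_small` are all hypothetical, the sketch assumes purely numeric features, and it ignores class labels. In practice a library such as imbalanced-learn provides a full SMOTE implementation.

```python
import numpy as np


def smote_like_oversample(X, n_new, k=3, seed=0):
    """Return n_new synthetic rows, each interpolated between a real row
    and one of its k nearest neighbours (the core idea behind SMOTE).

    Hypothetical helper for illustration; assumes numeric features only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two rows to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other rows

    # Pairwise Euclidean distances; a row is never its own neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base row and one of its neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbours[base, pick]

    # Place the synthetic row at a random point on the segment base -> partner.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[partner] - X[base])


# Toy data standing in for a small tabular dataset (made-up values).
X_small = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_small, n_new=20)
X_augmented = np.vstack([X_small, synthetic])
```

Because every synthetic row lies on a line segment between two real rows, the augmented set stays inside the range of the original features. In the pipeline the description outlines, an enlarged set of this kind would then serve as training data for the WCGAN-GP stage, which is not sketched here.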