ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples

Bibliographic Details
Main Authors: Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Mathematics
Subjects: data synthesis; unknown noise; interpolation; sample optimization; robust and stable
Online Access: https://www.mdpi.com/2227-7390/11/18/3891
_version_ 1797578991825059840
author Yukun Du
Yitao Cai
Xiao Jin
Hongxia Wang
Yao Li
Min Lu
author_sort Yukun Du
collection DOAJ
description Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in the adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset that contains unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of the data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated original data and benchmark original datasets demonstrate that ASIDS is a robust and stable method for data synthesis.
first_indexed 2024-03-10T22:30:31Z
format Article
id doaj.art-fb69f13b31894f7b8baccf308f630c84
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-10T22:30:31Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-fb69f13b31894f7b8baccf308f630c84; 2023-11-19T11:49:02Z; eng; MDPI AG; Mathematics; 2227-7390; 2023-09-01; 11; 18; 3891; 10.3390/math11183891
ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples
Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu (all: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China)
title ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples
topic data synthesis
unknown noise
interpolation
sample optimization
robust and stable
url https://www.mdpi.com/2227-7390/11/18/3891
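
The description field above outlines the core idea of ASIDS: split the data into subspaces holding an equal number of points, then interpolate between points in adjacent subspaces. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the choice to form subspaces by sorting on a single feature (`order_by`), and the fixed interpolation weight of 0.5 are all assumptions made for illustration; the record does not specify how the paper partitions the feature space or adapts the synthetic sample size.

```python
from itertools import product


def equal_count_subspaces(points, n_subspaces, order_by=0):
    """Split points into groups that hold (nearly) equal numbers of points.

    Assumption: groups are formed by sorting on one feature; the paper may
    partition the feature space differently.
    """
    ordered = sorted(points, key=lambda p: p[order_by])
    size, extra = divmod(len(ordered), n_subspaces)
    groups, start = [], 0
    for i in range(n_subspaces):
        # The first `extra` groups take one additional point each.
        stop = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:stop])
        start = stop
    return groups


def interpolate_adjacent(groups, weights=(0.5,)):
    """Linearly interpolate between every pair of points in adjacent groups.

    Assumption: plain linear interpolation at fixed weights; hypothetical
    stand-in for whatever adaptive scheme the paper uses.
    """
    synthetic = []
    for left, right in zip(groups[:-1], groups[1:]):
        for a, b, t in product(left, right, weights):
            synthetic.append(
                tuple((1.0 - t) * x + t * y for x, y in zip(a, b))
            )
    return synthetic


# Toy data: six two-feature points, split into three groups of two.
points = [(float(i), float(i * i)) for i in range(6)]
groups = equal_count_subspaces(points, n_subspaces=3)
synthetic = interpolate_adjacent(groups)
print(len(synthetic), synthetic[0])
```

With this toy input, each of the two adjacent group pairs contributes four midpoints. Passing more values in `weights` is one simple way to vary the size of the synthetic set, although the paper's adaptive adjustment may work differently.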