ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples
Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and...
Main Authors: | Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-09-01 |
Series: | Mathematics |
Subjects: | data synthesis; unknown noise; interpolation; sample optimization; robust and stable |
Online Access: | https://www.mdpi.com/2227-7390/11/18/3891 |
_version_ | 1797578991825059840 |
---|---|
author | Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu |
author_facet | Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu |
author_sort | Yukun Du |
collection | DOAJ |
description | Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in the adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset that contains unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of the data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated original data and benchmark original datasets demonstrate that ASIDS is a robust and stable method for data synthesis. |
first_indexed | 2024-03-10T22:30:31Z |
format | Article |
id | doaj.art-fb69f13b31894f7b8baccf308f630c84 |
institution | Directory of Open Access Journals |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T22:30:31Z |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-fb69f13b31894f7b8baccf308f630c84; 2023-11-19T11:49:02Z; eng; MDPI AG; Mathematics; 2227-7390; 2023-09-01; 11; 18; 3891; 10.3390/math11183891; ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples; Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li, Min Lu (all six authors: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China); Most existing data synthesis methods are designed to tackle problems with dataset imbalance, data anonymization, and an insufficient sample size. There is a lack of effective synthesis methods in cases where the actual datasets have a limited number of data points but a large number of features and unknown noise. Thus, in this paper we propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original data feature space into several subspaces with an equal number of data points, and then perform interpolation on the data points in the adjacent subspaces. This method can adaptively adjust the sample size of the synthetic dataset that contains unknown noise, and the generated sample data typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of the data points with large fitting errors. Furthermore, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Analysis results obtained using simulated original data and benchmark original datasets demonstrate that ASIDS is a robust and stable method for data synthesis.; https://www.mdpi.com/2227-7390/11/18/3891; data synthesis; unknown noise; interpolation; sample optimization; robust and stable |
spellingShingle | Yukun Du Yitao Cai Xiao Jin Hongxia Wang Yao Li Min Lu ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples Mathematics data synthesis unknown noise interpolation sample optimization robust and stable |
title | ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples |
title_full | ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples |
title_fullStr | ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples |
title_full_unstemmed | ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples |
title_short | ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples |
title_sort | asids a robust data synthesis method for generating optimal synthetic samples |
topic | data synthesis; unknown noise; interpolation; sample optimization; robust and stable |
url | https://www.mdpi.com/2227-7390/11/18/3891 |
work_keys_str_mv | AT yukundu asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples AT yitaocai asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples AT xiaojin asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples AT hongxiawang asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples AT yaoli asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples AT minlu asidsarobustdatasynthesismethodforgeneratingoptimalsyntheticsamples |
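
The description above outlines the core idea of ASIDS: split the feature space into subspaces holding equal numbers of data points, then interpolate between points in adjacent subspaces. The record does not say how the subspaces are formed or how the interpolation is weighted, so the following is only a minimal sketch under stated assumptions and not the authors' algorithm. It assumes that points are ordered by their first feature to define the subspaces and that interpolation is linear with a uniformly random weight. The function name `interpolate_adjacent_subspaces` and its parameters are hypothetical.

```python
import random


def interpolate_adjacent_subspaces(points, n_subspaces, n_per_pair, seed=0):
    """Illustrative sketch only; NOT the published ASIDS procedure.

    Assumptions (not taken from the record):
      - subspaces are formed by sorting points on their first feature
        and cutting the sorted list into equal-sized groups;
      - each synthetic point is a linear interpolation, with a random
        weight t in [0, 1), between one point from each of two
        adjacent groups.
    """
    if n_subspaces < 2:
        raise ValueError("need at least two subspaces to interpolate")
    if len(points) % n_subspaces != 0:
        raise ValueError("point count must divide evenly into subspaces")

    rng = random.Random(seed)

    # Equal-sized groups along the first feature (assumed ordering rule).
    ordered = sorted(points, key=lambda p: p[0])
    size = len(ordered) // n_subspaces
    groups = [ordered[i * size:(i + 1) * size] for i in range(n_subspaces)]

    synthetic = []
    # Walk over each pair of neighbouring groups.
    for left, right in zip(groups, groups[1:]):
        for _ in range(n_per_pair):
            a = rng.choice(left)
            b = rng.choice(right)
            t = rng.random()
            # Convex combination of the two sampled points.
            synthetic.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    # Twelve toy points with three features each.
    data = [[float(i), float(i % 4), float(-i)] for i in range(12)]
    samples = interpolate_adjacent_subspaces(data, n_subspaces=3, n_per_pair=5)
    print(len(samples), "synthetic points generated")
```

In this sketch, `n_per_pair` controls how many synthetic points are drawn for each pair of neighbouring groups, so the synthetic sample size is `(n_subspaces - 1) * n_per_pair`. The adaptive sample-size adjustment and noise handling mentioned in the description are not modelled here.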