Synthetic Data Generation for Data Envelopment Analysis

Bibliographic Details
Main Author: Andrey V. Lychev
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Data
Online Access: https://www.mdpi.com/2306-5729/8/10/146
collection DOAJ
description This paper addresses the problem of generating artificial datasets for data envelopment analysis (DEA) that can be used to test DEA models and methods. Papers that apply DEA to big data often rely on synthetic data generation to obtain large-scale datasets, because large real datasets in the public domain are extremely rare. The paper proposes an algorithm that takes a real dataset as input and supplements it with artificial efficient and inefficient units. The generation process extends the efficient part of the frontier by inserting artificial efficient units while keeping the original efficient frontier unchanged. To this end, the algorithm uses the assurance region method and successively relaxes the weight restrictions over the iterations. This approach produces synthetic datasets that are closer to real ones than those produced by algorithms that generate data from scratch. The proposed algorithm is applied to a pair of small real-life datasets, expanding them to 50,000 units. Computational experiments show that the artificially generated decision-making units (DMUs) preserve isotonicity and do not increase the collinearity of the original data as a whole.
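As a rough illustration of the "inefficient units" half of this idea only, the sketch below shows one simple way to add artificial units without moving an existing frontier. It is not the paper's algorithm and does not use the assurance region method: it is a swapped-in technique that takes a random convex combination of two real units and inflates its inputs, so the new unit lies inside the variable-returns-to-scale hull of the real data. The function name `make_inefficient_units`, its parameters, and the toy data are all hypothetical.

```python
import random


def make_inefficient_units(units, n_new, theta_range=(0.5, 0.95), seed=0):
    """Return n_new synthetic (inputs, outputs) pairs dominated by the real data.

    units: list of (inputs, outputs) tuples of positive floats (assumed layout).
    Each new unit is a random convex combination of two real units whose
    inputs are then divided by theta < 1. The result uses more input for the
    same output, so it stays inside the production possibility set spanned
    by the real units and cannot shift their frontier.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        (xa, ya), (xb, yb) = rng.sample(units, 2)
        lam = rng.random()                 # convex weight in [0, 1)
        theta = rng.uniform(*theta_range)  # upper bound on input efficiency
        inputs = tuple((lam * a + (1 - lam) * b) / theta for a, b in zip(xa, xb))
        outputs = tuple(lam * a + (1 - lam) * b for a, b in zip(ya, yb))
        synthetic.append((inputs, outputs))
    return synthetic


# Hypothetical toy data: two inputs and one output per unit.
real_units = [((2.0, 3.0), (1.0,)), ((4.0, 1.0), (1.0,)), ((3.0, 3.0), (1.5,))]
new_units = make_inefficient_units(real_units, n_new=5)
```

Inserting artificial efficient units that extend the frontier, as the abstract describes, is the harder part and would require solving DEA models with weight restrictions; that step is deliberately left out of this sketch.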
id doaj.art-f5d11e5bf67d4edfb9811dfcb820b6a6
institution Directory of Open Access Journals
issn 2306-5729
affiliation College of Information Technologies and Computer Sciences, National University of Science and Technology “MISIS”, 4 Leninsky Ave., Bldg. 1, 119049 Moscow, Russia
doi 10.3390/data8100146
citation Data, vol. 8, no. 10, article 146 (2023-09-01)
topic synthetic data generation
data augmentation
research data
data envelopment analysis
weight restrictions