DEGAIN: Generative-Adversarial-Network-Based Missing Data Imputation

Insights and analysis are only as good as the available data. Data cleaning is one of the most important steps in producing quality data for decision making. Machine learning (ML) helps process data quickly and produce error-free or limited-error datasets. One of the quality standards for cleaning the...


Bibliographic Details
Main Authors: Reza Shahbazian, Irina Trubitsyna
Format: Article
Language: English
Published: MDPI AG 2022-12-01
Series: Information
Subjects: machine learning; data cleaning; missing data; data imputation; generative networks
Online Access: https://www.mdpi.com/2078-2489/13/12/575
author Reza Shahbazian
Irina Trubitsyna
collection DOAJ
description Insights and analysis are only as good as the available data. Data cleaning is one of the most important steps in producing quality data for decision making. Machine learning (ML) helps process data quickly and produce error-free or limited-error datasets. One of the quality standards for data cleaning is the handling of missing data, also known as data imputation. This research focuses on the use of machine learning methods to deal with missing data. In particular, we propose a generative adversarial network (GAN)-based model called DEGAIN to estimate the missing values in a dataset. We evaluate the performance of the proposed method and compare it with several existing methods on the publicly available Letter Recognition and SPAM datasets. The Letter dataset consists of 20,000 samples with 16 input features, and the SPAM dataset consists of 4601 samples with 57 input features. The results show that DEGAIN outperforms the existing methods in terms of root mean square error and Fréchet inception distance.
first_indexed 2024-03-09T16:17:24Z
format Article
id doaj.art-0a2e2a726765454380effc1b9ada05e7
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-09T16:17:24Z
publishDate 2022-12-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-0a2e2a726765454380effc1b9ada05e7
2023-11-24T15:37:22Z
eng
MDPI AG
Information
2078-2489
2022-12-01
Volume 13, Issue 12, Article 575
10.3390/info13120575
DEGAIN: Generative-Adversarial-Network-Based Missing Data Imputation
Reza Shahbazian (0)
Irina Trubitsyna (1)
(0) Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy
(1) Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy
https://www.mdpi.com/2078-2489/13/12/575
machine learning
data cleaning
missing data
data imputation
generative networks
title DEGAIN: Generative-Adversarial-Network-Based Missing Data Imputation
topic machine learning
data cleaning
missing data
data imputation
generative networks
url https://www.mdpi.com/2078-2489/13/12/575