Missing Data Imputation Method Combining Random Forest and Generative Adversarial Imputation Network

Bibliographic Details
Main Authors: Hongsen Ou, Yunan Yao, Yi He
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/24/4/1112
Collection: DOAJ (Directory of Open Access Journals)
Abstract: (1) Background: Time-series data can be incomplete because of the acquisition system or other external factors. To address this problem, a missing-data imputation method for time series that combines random forest with a generative adversarial imputation network is proposed. (2) Methods: First, the positions of the missing data are identified, and a trained random forest performs a first imputation. The random forest output is then used as the input to the generative adversarial imputation network, which uses the identified missing positions to impute the data a second time. Combining the strengths of the two algorithms brings the imputed values closer to the true values. (3) Results: The imputation performance was tested on a bearing data set, with the root mean square error (RMSE) as the evaluation metric. For single-segment and multi-segment missing data, the combined random forest and generative adversarial imputation network method achieves RMSE values of only 0.0157, 0.0386, and 0.0527, outperforming the standalone random forest, the standalone generative adversarial imputation network, and the K-nearest neighbor algorithm. (4) Conclusions: The proposed algorithm performs well on each data set and provides a reference method for data imputation.
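The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random forest and the generative adversarial imputation network are replaced by two hypothetical stand-ins (`linear_fill`, a linear interpolator, and `smooth3`, a three-point moving average), and every function name here is illustrative. What the sketch does show is the mask of missing positions, the stage-1 output being passed as the stage-2 input, observed samples never being overwritten, and an RMSE evaluated only over the missing positions (that last choice is an assumption; the abstract does not say how the RMSE was computed).

```python
import math
from typing import Callable, List, Optional

# A sampled series with gaps: None marks a missing sample.
Series = List[Optional[float]]
# Hypothetical imputer signature: (series, mask) -> a full-length proposal.
Imputer = Callable[[Series, List[int]], List[float]]


def missing_mask(x: Series) -> List[int]:
    """Locate the gaps: 1 = observed sample, 0 = missing sample."""
    return [0 if v is None else 1 for v in x]


def keep_observed(x: Series, mask: List[int], proposal: List[float]) -> List[float]:
    """Keep every observed sample; take the proposal only where mask == 0."""
    return [v if m == 1 else p for v, m, p in zip(x, mask, proposal)]


def two_stage_impute(x: Series, stage1: Imputer, stage2: Imputer) -> List[float]:
    mask = missing_mask(x)
    # Stage 1 (a random forest in the paper): first-pass fill of the gaps.
    first = keep_observed(x, mask, stage1(x, mask))
    # Stage 2 (a GAIN generator in the paper): receives the stage-1 series
    # plus the mask and proposes refined values for the same positions.
    return keep_observed(x, mask, stage2(first, mask))


def rmse_on_missing(truth: List[float], imputed: List[float], mask: List[int]) -> float:
    """RMSE over missing positions only (assumes at least one gap)."""
    sq = [(t - y) ** 2 for t, y, m in zip(truth, imputed, mask) if m == 0]
    return math.sqrt(sum(sq) / len(sq))


# ---- Hypothetical stand-ins: NOT a random forest and NOT a GAIN ----
def linear_fill(x: Series, mask: List[int]) -> List[float]:
    """Linear interpolation between nearest observed neighbours
    (assumes at least one observed sample)."""
    obs = [i for i, m in enumerate(mask) if m == 1]
    out: List[float] = []
    for i, v in enumerate(x):
        if mask[i] == 1:
            out.append(v)
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out.append(x[right])  # leading gap: copy first observed value
        elif right is None:
            out.append(x[left])  # trailing gap: copy last observed value
        else:
            w = (i - left) / (right - left)
            out.append(x[left] + w * (x[right] - x[left]))
    return out


def smooth3(x: Series, mask: List[int]) -> List[float]:
    """Three-point moving average of an already-filled series (mask unused)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


# Usage: a sine wave with one three-sample gap and one single-sample gap.
truth = [math.sin(0.3 * t) for t in range(20)]
x: Series = list(truth)
for i in (5, 6, 7, 14):
    x[i] = None
mask = missing_mask(x)
imputed = two_stage_impute(x, linear_fill, smooth3)
print(f"RMSE on missing samples: {rmse_on_missing(truth, imputed, mask):.4f}")
```

To move toward the method in the paper, the two stand-ins would be replaced by a trained random forest and a trained generative adversarial imputation network generator exposing the same (series, mask) signature; how the authors built input features and trained either model is not described in this record.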
ISSN: 1424-8220
Affiliation (all authors): School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430063, China
DOI: 10.3390/s24041112
Citation: Sensors 2024, 24(4), 1112
Keywords: random forest; generative adversarial interpolation network; time-series data; data interpolation