An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics

With rapid urbanization, awareness of environmental pollution is growing and, accordingly, interest in environmental sensors that measure atmospheric and indoor air quality is increasing. Because these IoT-based environmental sensors are sensitive and reliability is a priority, it is essential to deal...

Bibliographic Details
Main Authors: Chanyoung Choi, Haewoong Jung, Jaehyuk Cho
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: Sensors
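Subjects: missing data; environmental sensor; univariate and multivariate imputation; machine learning; ensemble method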
Online Access: https://www.mdpi.com/1424-8220/21/22/7595
author Chanyoung Choi
Haewoong Jung
Jaehyuk Cho
collection DOAJ
description With rapid urbanization, awareness of environmental pollution is growing and, accordingly, interest in environmental sensors that measure atmospheric and indoor air quality is increasing. Because these IoT-based environmental sensors are sensitive and reliability is a priority, it is essential to deal with missing values, which are one of the causes of reliability problems. Two characteristics can be used to impute missing values in environmental sensor data: the time dependency of a single variable and the correlation between variables. However, existing imputation methods have used only one of these characteristics, and there has been no case in which both were used. In this work, we introduce a new ensemble imputation method that uses both. First, the situations in which missing values frequently occur were divided into four cases, and experimental data were generated for each: communication error (aperiodic, periodic) and sensor error (rapid change, measurement range). To compare existing methods with the proposed method, five widely used univariate imputation methods and five widely used multivariate imputation methods were each applied as a single model to predict the missing values in the four cases. The values predicted by the single models were then passed to the ensemble method. Two ensemble methods, weighted average and stacking, were used to derive the final predicted values and replace the missing values. Finally, the predicted values, substituted into the original data, were evaluated using the mean absolute error (MAE) and the root mean square error (RMSE). The proposed ensemble method generally performed better than the single methods. In addition, it simultaneously considers the correlation between variables and time dependence, both of which must be considered for environmental sensors. As a result, the proposed ensemble technique can contribute to replacing the missing values generated by environmental sensors, which can help to increase the reliability of environmental sensor data.
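The pipeline summarized in the description (single-model imputation, a weighted-average ensemble, then MAE/RMSE evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name the five univariate or five multivariate methods, so linear interpolation and a one-variable least-squares regression stand in for them, the equal weights are an assumption, and the readings and all function names are hypothetical. Stacking is not shown.

```python
import math

def interpolate_linear(series):
    """Univariate stand-in: fill None gaps by linear interpolation in time."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: carry the next value back
            filled[i] = series[right]
        elif right is None:         # gap at the end: carry the last value forward
            filled[i] = series[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled

def regress_on_other(target, other):
    """Multivariate stand-in: fit target ~ a + b * other on observed rows."""
    pairs = [(x, y) for x, y in zip(other, target) if y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    return [y if y is not None else a + b * x for x, y in zip(other, target)]

def weighted_average(preds, weights):
    """Combine single-model predictions element-wise with the given weights."""
    total = sum(weights)
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) / total for i in range(n)]

def mae(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

# Hypothetical readings: a target variable and a correlated second variable.
truth = [10.0, 12.0, 15.0, 19.0, 18.0, 16.0, 13.0, 11.0]
other = [20.0, 24.0, 31.0, 37.0, 37.0, 33.0, 25.0, 23.0]

# Simulate a short communication dropout by masking two consecutive samples.
missing_idx = [3, 4]
masked = [None if i in missing_idx else v for i, v in enumerate(truth)]

uni = interpolate_linear(masked)          # uses time dependency only
multi = regress_on_other(masked, other)   # uses cross-variable correlation only
ensemble = weighted_average([uni, multi], weights=[0.5, 0.5])  # assumed weights

# Score each candidate on the masked positions only.
held_out = [truth[i] for i in missing_idx]
for name, pred in [("univariate", uni), ("multivariate", multi), ("ensemble", ensemble)]:
    imputed = [pred[i] for i in missing_idx]
    print(name, "MAE:", mae(imputed, held_out), "RMSE:", rmse(imputed, held_out))
```

In practice the weights would be chosen from validation error rather than fixed, and a stacking variant would train a second-level model on the single-model predictions; both choices are left open here because the record does not specify them.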
first_indexed 2024-03-10T05:05:34Z
format Article
id doaj.art-f2eff07a07814bb78470a1c93081f821
institution Directory of Open Access Journals
issn 1424-8220
language English
last_indexed 2024-03-10T05:05:34Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-f2eff07a07814bb78470a1c93081f821 2023-11-23T01:26:19Z eng MDPI AG Sensors 1424-8220 2021-11-01 vol. 21, no. 22, art. 7595 10.3390/s21227595
Chanyoung Choi (School of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea)
Haewoong Jung (School of Electronic Engineering, Soongsil University, Seoul 06978, Korea)
Jaehyuk Cho (School of Electronic Engineering, Soongsil University, Seoul 06978, Korea)
title An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics
topic missing data
environmental sensor
univariate and multivariate imputation
machine learning
ensemble method
url https://www.mdpi.com/1424-8220/21/22/7595