Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data

Bibliographic Details
Main Authors: Cong Li, Xupeng Ren, Guohui Zhao
Format: Article
Language: English
Published: MDPI AG 2023-09-01
Series: Algorithms
Subjects: meteorological data; missing value imputation; machine learning; reconstruction
Online Access: https://www.mdpi.com/1999-4893/16/9/422
collection DOAJ
description Ground meteorological observation data (GMOD) underpin research in earth-related disciplines and are an important reference for societal production and daily life. Operational issues and equipment failures, however, can leave missing values in GMOD, so imputing missing data is a common task when pre-processing GMOD. Many machine-learning methods have been applied to meteorological missing-value imputation with good results, but they usually target specific meteorological elements, and few studies address imputation when multiple elements are randomly missing in the dataset. This paper designs a machine-learning-based multidimensional meteorological data imputation framework (MMDIF), which uses the predictions of machine-learning methods to impute GMOD with random missing values in multiple attributes. Based on the MMDIF, it tests the effectiveness of 20 machine-learning methods at imputing missing values for 124 meteorological stations across six climatic regions. The results show that MMDIF-RF was the most effective imputation method, outperforming the other methods on 11 types of hourly meteorological elements. Although this paper applies MMDIF to meteorological data, the method can also guide dataset reconstruction in other industries.
first_indexed 2024-03-10T23:07:54Z
id doaj.art-cd97678810144ba9a583368bce5e04c5
institution Directory Open Access Journal
issn 1999-4893
last_indexed 2024-03-10T23:07:54Z
spelling doaj.art-cd97678810144ba9a583368bce5e04c5
Timestamp: 2023-11-19T09:12:55Z
Citation: Algorithms, vol. 16, no. 9, article 422 (2023-09-01)
DOI: 10.3390/a16090422
Affiliations:
Cong Li: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Xupeng Ren: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Guohui Zhao: National Cryosphere Desert Data Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
topic meteorological data
missing value imputation
machine learning
reconstruction