Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data
Ground meteorological observation data (GMOD) are the core of research on earth-related disciplines and an important reference for societal production and life. Unfortunately, due to operational issues or equipment failures, missing values may occur in GMOD. Hence, the imputation of missing data is...
Main Authors: | Cong Li, Xupeng Ren, Guohui Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-09-01 |
Series: | Algorithms |
Subjects: | meteorological data; missing value imputation; machine learning; reconstruction |
Online Access: | https://www.mdpi.com/1999-4893/16/9/422 |
_version_ | 1827727617047920640 |
---|---|
author | Cong Li; Xupeng Ren; Guohui Zhao |
author_facet | Cong Li; Xupeng Ren; Guohui Zhao |
author_sort | Cong Li |
collection | DOAJ |
description | Ground meteorological observation data (GMOD) are the core of research on earth-related disciplines and an important reference for societal production and life. Unfortunately, due to operational issues or equipment failures, missing values may occur in GMOD. Hence, the imputation of missing data is a prevalent issue during the pre-processing of GMOD. Although a large number of machine-learning methods have been applied to the field of meteorological missing value imputation and have achieved good results, they are usually aimed at specific meteorological elements, and few studies discuss imputation when multiple elements are randomly missing in the dataset. This paper designed a machine-learning-based multidimensional meteorological data imputation framework (MMDIF), which can use the predictions of machine-learning methods to impute the GMOD with random missing values in multiple attributes, and tested the effectiveness of 20 machine-learning methods on imputing missing values within 124 meteorological stations across six different climatic regions based on the MMDIF. The results show that MMDIF-RF was the most effective missing value imputation method; it is better than other methods for imputing 11 types of hourly meteorological elements. Although this paper applied MMDIF to the imputation of missing values in meteorological data, the method can also provide guidance for dataset reconstruction in other industries. |
first_indexed | 2024-03-10T23:07:54Z |
format | Article |
id | doaj.art-cd97678810144ba9a583368bce5e04c5 |
institution | Directory Open Access Journal |
issn | 1999-4893 |
language | English |
last_indexed | 2024-03-10T23:07:54Z |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Algorithms |
spelling | doaj.art-cd97678810144ba9a583368bce5e04c5; 2023-11-19T09:12:55Z; eng; MDPI AG; Algorithms; 1999-4893; 2023-09-01; vol. 16, no. 9, article 422; 10.3390/a16090422; Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data; Cong Li (School of Computer and Communication, LanZhou University of Technology, LanZhou 730050, China); Xupeng Ren (School of Computer and Communication, LanZhou University of Technology, LanZhou 730050, China); Guohui Zhao (National Cryosphere Desert Data Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China); https://www.mdpi.com/1999-4893/16/9/422; meteorological data; missing value imputation; machine learning; reconstruction |
spellingShingle | Cong Li; Xupeng Ren; Guohui Zhao; Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data; Algorithms; meteorological data; missing value imputation; machine learning; reconstruction |
title | Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data |
title_sort | machine learning based imputation method for filling missing values in ground meteorological observation data |
topic | meteorological data; missing value imputation; machine learning; reconstruction |
url | https://www.mdpi.com/1999-4893/16/9/422 |
work_keys_str_mv | AT congli machinelearningbasedimputationmethodforfillingmissingvaluesingroundmeteorologicalobservationdata AT xupengren machinelearningbasedimputationmethodforfillingmissingvaluesingroundmeteorologicalobservationdata AT guohuizhao machinelearningbasedimputationmethodforfillingmissingvaluesingroundmeteorologicalobservationdata |
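
The description above says the framework uses the predictions of machine-learning methods to impute values that are randomly missing across several attributes. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea, not the authors' MMDIF: each attribute with gaps is predicted from the remaining attributes, using rows where it was observed as training data. A k-nearest-neighbour regressor stands in for the random forest purely to keep the example dependency-free. The function names, the toy rows (temperature, humidity, pressure), and the mean-based provisional fill are all assumptions made for illustration.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def impute(rows, k=3):
    """Return a copy of `rows` with None entries filled column by column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])

    # Step 1: provisional fill with column means so that every row can
    # serve as a complete feature vector.
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]

    # Step 2: for each column, train on rows where it was observed and
    # replace the provisional values with model predictions.
    for j in range(n_cols):
        missing = [i for i, r in enumerate(rows) if r[j] is None]
        if not missing:
            continue
        observed = [i for i, r in enumerate(rows) if r[j] is not None]

        def features(i, j=j):
            # All attributes of row i except the one being predicted.
            return filled[i][:j] + filled[i][j + 1:]

        train_X = [features(i) for i in observed]
        train_y = [rows[i][j] for i in observed]
        for i in missing:
            filled[i][j] = knn_predict(train_X, train_y, features(i), k)
    return filled


# Hypothetical hourly readings: temperature (°C), humidity (%), pressure (hPa).
rows = [
    [20.0, 60.0, 1012.0],
    [21.0, None, 1011.0],
    [None, 55.0, 1010.0],
    [23.0, 50.0, None],
    [24.0, 48.0, 1008.0],
]
completed = impute(rows, k=2)
```

Swapping `knn_predict` for another regressor with the same call shape would give a different variant of the same loop; the abstract reports that the random-forest variant performed best in the authors' tests.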