K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China

Bibliographic Details
Main Authors: Dan Lou, Mengxi Yang, Dawei Shi, Guojie Wang, Waheed Ullah, Yuanfang Chai, Yutian Chen
Format: Article
Language: English
Published: MDPI AG 2021-06-01
Series: Atmosphere
Subjects: Poyang Lake basin; K-means; C4.5 decision tree; precipitation; climate indices; dynamics
Online Access: https://www.mdpi.com/2073-4433/12/7/834
author Dan Lou
Mengxi Yang
Dawei Shi
Guojie Wang
Waheed Ullah
Yuanfang Chai
Yutian Chen
collection DOAJ
description Applying machine learning algorithms in the atmospheric sciences, alongside Earth System Models, has the potential to improve prediction, forecasting, and the reconstruction of missing data. In this study, two machine learning techniques, the K-means and C4.5 decision tree algorithms, are combined to separate observed precipitation into clusters and to classify the associated large-scale circulation indices. Observed precipitation from the Chinese Meteorological Agency (CMA) for 1961–2016 at 83 stations in the Poyang Lake basin (PLB) is used. K-means clustering yields two precipitation clusters that divide the PLB into a northern and a southern cluster, with a silhouette coefficient of about 0.5. The leading cluster (C1) contains 48 stations (58% of the stations in the basin), while Cluster 2 (C2) contains 35 (42%). The interannual variability of precipitation differs significantly between the two clusters. The C4.5 decision tree is then used to identify which large-scale atmospheric circulation indices from the National Climate Center (NCC), taken from the preceding spring season as predictors, are associated with each cluster. C1 precipitation is linked to the position and intensity of the subtropical ridgeline over Northern Africa, whereas C2 precipitation appears to be associated with the Atlantic-European Polar Vortex Area Index. Precipitation anomalies further validate the results of both algorithms. The findings agree with previous studies conducted in other regions, which supports applying machine learning techniques in atmospheric science at sub-regional and sub-seasonal scales. Future studies should explore the dynamics behind the K-means- and C4.5-derived indicators to improve assessments at the regional scale. This machine-learning-based approach may offer a new tool for climate forecasting.
first_indexed 2024-03-10T09:59:00Z
format Article
id doaj.art-bc51ab0f6cd54d089afb27fa1538a9c7
institution Directory of Open Access Journals
issn 2073-4433
language English
last_indexed 2024-03-10T09:59:00Z
publishDate 2021-06-01
publisher MDPI AG
record_format Article
series Atmosphere
doi 10.3390/atmos12070834
citation Atmosphere 2021, 12(7), 834
affiliation Dan Lou, Guojie Wang, Waheed Ullah, Yutian Chen: Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China
Mengxi Yang: Taizhou Meteorological Bureau, Taizhou 318000, China
Dawei Shi: Lianyungang Meteorological Bureau, Lianyungang 222199, China
Yuanfang Chai: School of Earth Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
title K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China
topic Poyang Lake basin
K-means
C4.5 decision tree
precipitation
climate indices
dynamics
url https://www.mdpi.com/2073-4433/12/7/834
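The description field above names K-means, the silhouette coefficient, and the C4.5 decision tree without showing how they work. The following is a minimal, standard-library Python sketch under stated assumptions, not the authors' code: it clusters six hypothetical station feature vectors with Lloyd's K-means (k = 2, fixed initial centroids), scores the partition with the mean silhouette coefficient, and evaluates a C4.5-style threshold split on a hypothetical spring-season index using information gain and gain ratio. The toy data, the function names (`dist`, `kmeans`, `silhouette`, `entropy`, `split_scores`, `best_split`), and the simplified threshold search are illustrative assumptions; this record does not say which features, initialisation, or NCC indices the study used.

```python
"""Illustrative sketch only: K-means with a silhouette score, plus a C4.5-style
gain-ratio split. All data, names, and parameters are hypothetical and are not
taken from the study described in this record."""
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], init: List[Vector], iters: int = 100) -> List[int]:
    """Lloyd's K-means from fixed initial centroids; returns one label per point."""
    centroids = [list(c) for c in init]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette s = (b - a) / max(a, b); needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == k)
            / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def entropy(classes: List[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(classes)
    return -sum((c / n) * math.log2(c / n) for c in (classes.count(v) for v in set(classes)))


def split_scores(values: Sequence[float], classes: List[int], threshold: float) -> Tuple[float, float]:
    """Information gain and gain ratio of the binary split value <= threshold."""
    left = [c for v, c in zip(values, classes) if v <= threshold]
    right = [c for v, c in zip(values, classes) if v > threshold]
    if not left or not right:  # degenerate split carries no information
        return 0.0, 0.0
    n = len(classes)
    gain = entropy(classes) - sum(len(s) / n * entropy(s) for s in (left, right))
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in (left, right))
    return gain, gain / split_info


def best_split(values: Sequence[float], classes: List[int]) -> Tuple[float, float]:
    """Scan midpoints between sorted distinct values; keep the one with the
    highest information gain and return it with its gain ratio."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: split_scores(values, classes, t)[0])
    return best, split_scores(values, classes, best)[1]


if __name__ == "__main__":
    # Hypothetical standardized precipitation features for six stations.
    stations = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0]]
    labels = kmeans(stations, init=[stations[0], stations[3]])
    print("cluster labels:", labels)
    print("mean silhouette:", round(silhouette(stations, labels), 3))
    # Hypothetical spring-season index values and wet (1) / dry (0) summer classes.
    index = [0.2, 0.5, 0.9, 1.4, 1.6, 2.0]
    wet = [0, 0, 0, 1, 1, 1]
    threshold, ratio = best_split(index, wet)
    print(f"split at {threshold:.2f}, gain ratio {ratio:.2f}")
```

Because the toy points are deliberately well separated, the silhouette here comes out far above the ~0.5 reported in the abstract; these numbers say nothing about the PLB data. The fixed initial centroids keep the run deterministic, whereas a practical run would typically try several random or k-means++ initialisations, and full C4.5 implementations add refinements (pruning, handling of missing values, multi-attribute selection) that this sketch omits.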