K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China
Applying machine learning algorithms in the atmospheric sciences alongside Earth System Models has the potential to improve prediction, forecasting, and the reconstruction of missing data. In the current study, a combination of two machine learning techniques, namely K-means and decision tree (C4.5) a...
Main Authors: | Dan Lou, Mengxi Yang, Dawei Shi, Guojie Wang, Waheed Ullah, Yuanfang Chai, Yutian Chen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-06-01 |
Series: | Atmosphere |
Subjects: | Poyang Lake basin; K-means; C4.5 decision tree; precipitation; climate indices; dynamics |
Online Access: | https://www.mdpi.com/2073-4433/12/7/834 |
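
The summary above names K-means clustering, and the full record reports a silhouette coefficient for the resulting two-cluster split. As a minimal sketch of that workflow, and not the authors' implementation, the following pure-Python example clusters a handful of synthetic points and scores the result. The `stations` values are invented two-dimensional features, the farthest-first seeding is an assumption chosen only to keep the run deterministic, and none of the numbers come from the paper.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption for this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        a = mean_dist.pop(labels[i], None)  # mean distance within own cluster
        if a is None or not mean_dist:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        b = min(mean_dist.values())  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical station features (not data from the paper): two loose groups.
stations = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (1.2, 1.1),
            (3.0, 3.1), (3.2, 2.9), (2.9, 3.0), (3.1, 3.2)]

labels, centroids = kmeans(stations, k=2)
score = silhouette(stations, labels)
print(labels, round(score, 2))
```

Because the synthetic groups are far apart, the score here is much higher than the ~0.5 reported in the record. Real station data would normally be standardized first and checked against several values of `k`.
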
_version_ | 1797528485969788928 |
---|---|
author | Dan Lou; Mengxi Yang; Dawei Shi; Guojie Wang; Waheed Ullah; Yuanfang Chai; Yutian Chen |
author_sort | Dan Lou |
collection | DOAJ |
description | Applying machine learning algorithms in the atmospheric sciences alongside Earth System Models has the potential to improve prediction, forecasting, and the reconstruction of missing data. In the current study, two machine learning techniques, K-means clustering and the C4.5 decision tree, are combined to separate observed precipitation into clusters and to classify the associated large-scale circulation indices. Observed precipitation from the Chinese Meteorological Agency (CMA) for 83 stations in the Poyang Lake basin (PLB) during 1961–2016 is used. K-means yields two precipitation clusters that split the PLB into a northern and a southern cluster, with a silhouette coefficient of ~0.5. The leading cluster (C1) contains 48 stations, or 58% of the stations in the region, while Cluster 2 (C2) contains 35, or 42%. The interannual variability in precipitation exhibited significant differences between the two clusters. The C4.5 decision tree is then used to explore which large-scale atmospheric indices from the National Climate Center (NCC), taken from the preceding spring season as predictors, are associated with each cluster. C1 precipitation was linked with the location and intensity of the subtropical ridgeline over Northern Africa, whereas C2 precipitation appeared to be associated with the Atlantic-European Polar Vortex Area Index. Precipitation anomalies further validated the results of both algorithms. The findings agree with previous studies conducted globally and support the application of machine learning techniques in atmospheric science on sub-regional and sub-seasonal scales. Future studies should explore the dynamics of the K-means- and C4.5-derived indicators for a better assessment on a regional scale. Research of this kind, based on machine learning methods, may offer a new approach to climate forecasting. |
first_indexed | 2024-03-10T09:59:00Z |
format | Article |
id | doaj.art-bc51ab0f6cd54d089afb27fa1538a9c7 |
institution | Directory of Open Access Journals |
issn | 2073-4433 |
language | English |
last_indexed | 2024-03-10T09:59:00Z |
publishDate | 2021-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Atmosphere |
spelling | doaj.art-bc51ab0f6cd54d089afb27fa1538a9c7; 2023-11-22T02:05:28Z; eng; MDPI AG; Atmosphere; ISSN 2073-4433; 2021-06-01; vol. 12, no. 7, art. 834; DOI 10.3390/atmos12070834; K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China; Dan Lou (0), Mengxi Yang (1), Dawei Shi (2), Guojie Wang (3), Waheed Ullah (4), Yuanfang Chai (5), Yutian Chen (6); affiliations: (0, 3, 4, 6) Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China; (1) Taizhou Meteorological Bureau, Taizhou 318000, China; (2) Lianyungang Meteorological Bureau, Lianyungang 222199, China; (5) School of Earth Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands; https://www.mdpi.com/2073-4433/12/7/834; keywords: Poyang Lake basin, K-means, C4.5 decision tree, precipitation, climate indices, dynamics |
title | K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China |
topic | Poyang Lake basin; K-means; C4.5 decision tree; precipitation; climate indices; dynamics |
url | https://www.mdpi.com/2073-4433/12/7/834 |
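
The description field says a C4.5 decision tree relates preceding-spring circulation indices to each precipitation cluster. C4.5 ranks candidate splits by gain ratio (information gain divided by split information), and the sketch below illustrates only that criterion for one continuous predictor. It is a simplified illustration under stated assumptions, not the authors' model. The `index_values` and `wet_dry` series are hypothetical, midpoint thresholds are a simplification, and full C4.5 also handles pruning, missing values, and multiple attributes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best gain ratio."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Hypothetical spring index values and the following season's precipitation
# class for eight years (illustrative only, not data from the paper).
index_values = [0.2, 0.5, 0.9, 1.1, 1.8, 2.0, 2.4, 2.9]
wet_dry = ["dry", "dry", "dry", "dry", "wet", "wet", "wet", "wet"]

threshold = best_threshold(index_values, wet_dry)
ratio = gain_ratio(index_values, wet_dry, threshold)
print(threshold, ratio)
```

In this toy series the chosen cut falls between 1.1 and 1.8, where the two classes separate cleanly. With several candidate indices, the same score would be computed for each one and the highest-scoring index would become the split at that node.
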