Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms
Storm surge is the most severe marine disaster in China, affecting the whole coastal area. Estimating storm surge disaster loss (SSDL) is significant to disaster prevention, sustainability and decision-making. Taking 11 provincial administrative regions in the coastal areas of China as the study are...
Main Authors: | Suming Zhang, Jie Zhang, Xiaomin Li, Xuexue Du, Tangqi Zhao, Qi Hou, Xifang Jin |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-03-01 |
Series: | Ecological Indicators |
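Subjects: | Storm surge disaster loss, Machine learning algorithms, Indicator screening, Model interpretability |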
Online Access: | http://www.sciencedirect.com/science/article/pii/S1470160X22000012 |
_version_ | 1818908510702796800 |
---|---|
author | Suming Zhang Jie Zhang Xiaomin Li Xuexue Du Tangqi Zhao Qi Hou Xifang Jin |
author_sort | Suming Zhang |
collection | DOAJ |
description | Storm surge is the most severe marine disaster in China, affecting the whole coastal area. Estimating storm surge disaster loss (SSDL) is important for disaster prevention, sustainability, and decision-making. Taking 11 provincial administrative regions in the coastal areas of China as the study area, this paper estimates SSDL grades using four machine learning (ML) algorithms. A total of 132 official open-source records of storm surge disasters were collected and divided into a cross-validation (CV) set and a test set. First, a comprehensive system of 50 indicators was constructed from three perspectives: the hazard of disaster-causing factors (16 indicators), and the vulnerability (22) and resilience (12) of disaster-bearing bodies. Several data preprocessing methods, such as normalization and SMOTE, were applied to improve model performance. Then, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Logistic Model Tree (LMT), and K-star were used to construct estimation models of SSDL grades. Principal component analysis (PCA) and recursive feature elimination (RFE) were adopted to screen the indicators. Finally, the models’ performance was compared using Precision, Recall, F1 score, and Kappa. The results show that sound and efficient data preparation is key to the reliability and stability of the models. RFE proved more suitable than PCA for indicator selection in this study. The RFE importance ranking enhances the interpretability of the ML models and shows that hazard indicators are the most important, followed by vulnerability indicators, with resilience indicators the least important. The 27-indicator K-star model, which offers accurate estimation, strong generalization, and a reduced workload, is the optimal SSDL estimation model.
On the CV set, its Precision, Recall, F1 score, and Kappa are 0.838, 0.832, 0.827, and 0.776, respectively; on the test set, they are 0.819, 0.786, 0.781, and 0.714. This paper provides a scientific basis for government decision-making and risk management and can serve as a demonstration case for SSDL research. |
first_indexed | 2024-12-19T22:12:10Z |
format | Article |
id | doaj.art-f1910e99ba3d42059e87011c8b1db66f |
institution | Directory Open Access Journal |
issn | 1470-160X |
language | English |
last_indexed | 2024-12-19T22:12:10Z |
publishDate | 2022-03-01 |
publisher | Elsevier |
record_format | Article |
series | Ecological Indicators |
spelling | doaj.art-f1910e99ba3d42059e87011c8b1db66f; 2022-12-21T20:03:52Z; eng; Elsevier; Ecological Indicators; 1470-160X; 2022-03-01; 136; 108533; Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms; Suming Zhang [0], Jie Zhang [1], Xiaomin Li [2], Xuexue Du [3], Tangqi Zhao [4], Qi Hou [5], Xifang Jin [6]; [0] College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; [1] College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; First Institute of Oceanography, Ministry of Natural Resources of China, Qingdao 266061, China; Corresponding author at: College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; [2] First Institute of Oceanography, Ministry of Natural Resources of China, Qingdao 266061, China; [3] College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; [4] College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; [5] College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; [6] North Sea Marine Forecast Center of State Oceanic Administration, Qingdao 266001, China; http://www.sciencedirect.com/science/article/pii/S1470160X22000012; Storm surge disaster loss; Machine learning algorithms; Indicator screening; Model interpretability |
title | Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms |
topic | Storm surge disaster loss Machine learning algorithms Indicator screening Model interpretability |
url | http://www.sciencedirect.com/science/article/pii/S1470160X22000012 |
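The description compares models by Precision, Recall, F1 score, and Kappa. As a minimal sketch of how such scores can be computed for multi-class grade labels, the Python below derives them from paired true and predicted grades. It assumes macro-averaging across grades, which this record does not specify, and the function name `classification_metrics` and the sample grades are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return macro precision, recall, F1 and Cohen's kappa.

    Assumption: scores are macro-averaged over grades; the catalog
    record does not state which averaging the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)          # row totals of the confusion matrix
    pred_counts = Counter(y_pred)          # column totals of the confusion matrix

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        p = tp / pred_counts[label] if pred_counts[label] else 0.0
        r = tp / true_counts[label] if true_counts[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = sum(pairs[(label, label)] for label in labels) / n
    expected = sum(true_counts[label] * pred_counts[label] for label in labels) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    k = len(labels)
    return {
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical loss grades (1 to 4) for eight events; not data from the paper.
    true_grades = [1, 1, 2, 2, 3, 3, 4, 4]
    predicted_grades = [1, 1, 2, 3, 3, 3, 4, 2]
    for name, value in classification_metrics(true_grades, predicted_grades).items():
        print(f"{name}: {value:.3f}")
```

Kappa is included alongside the other three scores because it discounts the agreement a classifier would reach by chance, which matters when some loss grades are much rarer than others.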