Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms

Bibliographic Details
Main Authors: Suming Zhang, Jie Zhang, Xiaomin Li, Xuexue Du, Tangqi Zhao, Qi Hou, Xifang Jin
Format: Article
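Language: English
Published: Elsevier, 2022-03-01
Series: Ecological Indicators
Subjects: Storm surge disaster loss; Machine learning algorithms; Indicator screening; Model interpretability
Online Access: http://www.sciencedirect.com/science/article/pii/S1470160X22000012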
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Storm surge is the most severe marine disaster in China and affects the entire coastal area. Estimating storm surge disaster loss (SSDL) is important for disaster prevention, sustainability and decision-making. Taking 11 provincial administrative regions in the coastal areas of China as the study area, this paper estimated SSDL grades with four machine learning (ML) algorithms. A total of 132 official open-source storm surge disaster records were collected and divided into a cross-validation set (CV set) and a test set. First, a comprehensive system of 50 indicators was constructed from three perspectives: the hazard of disaster-causing factors (16 indicators) and the vulnerability (22) and resilience (12) of disaster-bearing bodies. Several data preprocessing methods, such as normalization and SMOTE (Synthetic Minority Over-sampling Technique), were applied to improve model performance. Then, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Logistic Model Tree (LMT) and K-star were used to build SSDL grade estimation models, and principal component analysis (PCA) and recursive feature elimination (RFE) were adopted to screen the indicators automatically. Finally, model performance was compared using Precision, Recall, F1 score and Kappa. The results show that sound and efficient data preparation strongly underpins the reliability and stability of the models. RFE proved more suitable than PCA for indicator selection in this study. The RFE importance ranking also improves the interpretability of the ML models: hazard indicators are the most important, vulnerability indicators come second, and resilience indicators matter least. The 27-indicator K-star model, which offers accurate estimation, strong generalization and a lighter workload, is the optimal SSDL estimation model.
With 27 input indicators, the optimal model achieves a Precision, Recall, F1 score and Kappa of 0.838, 0.832, 0.827 and 0.776 on the CV set, and of 0.819, 0.786, 0.781 and 0.714 on the test set, respectively. This paper provides a scientific basis for government decision-making and risk management and can serve as a typical demonstration case for SSDL research.
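The abstract reports that RFE was more suitable than PCA for screening indicators. As a rough illustration of how recursive feature elimination works, here is a minimal, dependency-free Python sketch. It assumes an ordinary-least-squares estimator on z-scored indicators and uses the absolute coefficient as the importance score; the abstract does not say which estimator or importance measure the authors paired with RFE, so those choices and the helper names `standardize`, `least_squares` and `rfe` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Z-score every column (population std); a constant column becomes all zeros."""
    n, p = len(X), len(X[0])
    out = [list(map(float, row)) for row in X]
    for j in range(p):
        mean = sum(row[j] for row in X) / n
        std = (sum((row[j] - mean) ** 2 for row in X) / n) ** 0.5 or 1.0
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / std
    return out


def least_squares(X: Matrix, y: Sequence[float]) -> List[float]:
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination.

    No intercept term: callers pass centered data. Raises ZeroDivisionError
    if the columns are constant or perfectly collinear.
    """
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    rhs = [sum(r[a] * t for r, t in zip(X, y)) for a in range(p)]
    for c in range(p):
        # Partial pivoting: bring the row with the largest |A[r][c]| into place.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        # Eliminate column c from every other row, leaving A diagonal at the end.
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                for k in range(c, p):
                    A[r][k] -= f * A[c][k]
                rhs[r] -= f * rhs[c]
    return [rhs[i] / A[i][i] for i in range(p)]


def rfe(X: Matrix, y: Sequence[float], n_keep: int) -> Tuple[List[int], List[int]]:
    """Recursive feature elimination using |standardized OLS coefficient| as importance.

    Returns (kept column indices, eliminated indices in removal order,
    i.e. least important first).
    """
    Z = standardize(X)
    y_mean = sum(y) / len(y)
    yc = [t - y_mean for t in y]
    active = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(active) > n_keep:
        # Refit on the surviving indicators only, then drop the weakest one.
        w = least_squares([[row[j] for j in active] for row in Z], yc)
        weakest = min(range(len(active)), key=lambda i: abs(w[i]))
        eliminated.append(active.pop(weakest))
    return active, eliminated
```

For example, with `X = [[i, (3 * i) % 7, (i * i) % 11] for i in range(10)]` and `y = [3 * r[0] + r[2] for r in X]`, `rfe(X, y, 2)` keeps columns `[0, 2]` and eliminates column `1`, which does not enter `y`. The key design point is that the model is refit after every removal, so importances are re-evaluated on the surviving subset; one could swap in a classifier's importance scores (for instance from a random forest) without changing the elimination loop.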
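The abstract compares models with Precision, Recall, F1 score and Kappa but does not state how per-grade scores were averaged across SSDL grades. The sketch below shows one common way to compute these metrics for multi-class grade labels in plain Python, assuming "Kappa" means Cohen's kappa, (p_o - p_e) / (1 - p_e). The averaging mode (`"macro"` or support-`"weighted"`) is an assumption, and `grade_metrics` is a hypothetical helper, not the authors' code.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def grade_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], average: str = "macro"
) -> Dict[str, float]:
    """Multi-class Precision, Recall, F1 and Cohen's kappa for grade labels.

    average="macro" weights every grade equally; average="weighted" weights
    each grade by its number of true samples (support).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per grade
    support = Counter(y_true)    # samples whose true grade is c
    predicted = Counter(y_pred)  # samples predicted as grade c

    if average == "macro":
        weight = {c: 1 / len(labels) for c in labels}
    elif average == "weighted":
        weight = {c: support[c] / n for c in labels}
    else:
        raise ValueError("average must be 'macro' or 'weighted'")

    precision = recall = f1 = 0.0
    for c in labels:
        p = hits[c] / predicted[c] if predicted[c] else 0.0
        r = hits[c] / support[c] if support[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += weight[c] * p
        recall += weight[c] * r
        f1 += weight[c] * f  # F1 is averaged per grade, not recomputed from averaged P and R

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_o = sum(hits.values()) / n
    p_e = sum(support[c] * predicted[c] for c in labels) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}
```

A quick sanity check on the weighted mode: weighted recall always equals overall accuracy, because each grade's recall is multiplied by its share of the samples.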

Journal: Ecological Indicators, vol. 136, article no. 108533, 2022-03-01 (ISSN 1470-160X)

Author Affiliations
Suming Zhang: College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
Jie Zhang (corresponding author): College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China; First Institute of Oceanography, Ministry of Natural Resources of China, Qingdao 266061, China
Xiaomin Li: First Institute of Oceanography, Ministry of Natural Resources of China, Qingdao 266061, China
Xuexue Du: College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
Tangqi Zhao: College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
Qi Hou: College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
Xifang Jin: North Sea Marine Forecast Center of State Oceanic Administration, Qingdao 266001, China