Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia

Missing values in rainfall records might result in erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM) due to the high level of missingness that...

Bibliographic Details
Main Authors: Zulfaqar Sa’adi, Zulkifli Yusop, Nor Eliza Alias, Ming Fai Chow, Mohd Khairul Idlan Muhammad, Muhammad Wafiy Adli Ramli, Zafar Iqbal, Mohammed Sanusi Shiru, Faizal Immaddudin Wira Rohmat, Nur Athirah Mohamad, Mohamad Faizal Ahmad
Format: Article
Language: English
Published: Elsevier 2023-12-01
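Series: Applied Computing and Geosciences
Subjects: Daily rainfall; Johor river basin; Missing data; Multiple imputation methods; Peninsular Malaysia; Spatiotemporal variability
Online Access: http://www.sciencedirect.com/science/article/pii/S2590197423000344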
collection DOAJ
description Missing values in rainfall records can lead to erroneous predictions and inefficient management practices with significant economic, environmental, and social consequences. This is particularly important for rainfall datasets in Peninsular Malaysia (PM), where the high level of missingness can distort the inherent pattern of the highly variable time series. In this work, daily data from 21 target rainfall stations in the Johor River Basin (JRB) between 1970 and 2015 were used to examine 19 multiple imputation methods, carried out with the Multivariate Imputation by Chained Equations (MICE) package in R. For each station, artificial missing data were added at rates of up to 5%, 10%, 20%, and 30% under three types of missingness, namely Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), leaving the original missing data intact. Imputation quality was evaluated with several statistical performance metrics: mean absolute error (MAE), root mean square error (RMSE), normalized root mean square error (NRMSE), Nash-Sutcliffe efficiency (NSE), modified degree of agreement (MD), coefficient of determination (R²), Kling-Gupta efficiency (KGE), and volumetric efficiency (VE). These were then ranked and aggregated with the compromise programming index (CPI) to select the best method. The results showed that linear regression predicted values (norm.predict) consistently ranked highest under all types and levels of missingness. For example, under MAR, MNAR, and MCAR, this method showed the lowest MAE values, of 0.78–2.25, 0.93–2.57, and 0.87–2.43, respectively. It also consistently showed higher NSE values of 0.71–0.92, 0.6–0.92, and 0.66–0.91, and higher R² values of 0.77–0.92, 0.71–0.93, and 0.75–0.92, under MAR, MCAR, and MNAR, respectively. The mean, rf, and cart methods also appeared to be efficient. Using the CPI as a decision-support tool enabled an objective assessment of the output from the multiple performance metrics when ranking and selecting the top-performing method. During validation, the Probability Density Function (PDF) demonstrated that, even with up to 30% missingness, the shape of the distribution after imputation was retained relative to the actual data. The methodology proposed in this study can help in choosing suitable imputation methods for other tropical rainfall datasets, leading to improved accuracy in rainfall estimation and prediction.
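The evaluation design summarised in the description (hide known values, impute them, score the imputed values against the hidden truth) can be sketched as follows. This is a minimal illustration in Python, not the authors' R/MICE workflow: the rainfall series is synthetic, only MCAR masking at a 10% rate is shown, plain mean filling stands in for the 19 MICE methods, and the helper names (`mask_mcar`, `impute_mean`) are hypothetical. The MAE, RMSE, and NSE functions follow their standard textbook definitions.

```python
import math
import random
import statistics


def mae(obs, sim):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect match."""
    mean_obs = statistics.fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def mask_mcar(series, rate, rng):
    """Pick indices to hide uniformly at random (MCAR); hypothetical helper."""
    n_hide = round(rate * len(series))
    return sorted(rng.sample(range(len(series)), n_hide))


def impute_mean(series, hidden):
    """Fill hidden positions with the mean of the remaining values.

    Stand-in for a real imputation method; an assumption of this sketch.
    """
    hidden_set = set(hidden)
    observed = [v for i, v in enumerate(series) if i not in hidden_set]
    fill = statistics.fmean(observed)
    return [fill if i in hidden_set else v for i, v in enumerate(series)]


# Synthetic daily "rainfall" (assumption): non-negative, many dry days.
rng = random.Random(0)
rain = [max(0.0, rng.gauss(5.0, 8.0)) for _ in range(1000)]

hidden = mask_mcar(rain, 0.10, rng)      # hide 10% of the record
filled = impute_mean(rain, hidden)       # impute the hidden values

truth = [rain[i] for i in hidden]        # values that were hidden
guess = [filled[i] for i in hidden]      # values the method produced

print("MAE :", mae(truth, guess))
print("RMSE:", rmse(truth, guess))
print("NSE :", nse(truth, guess))
```

Scoring only the artificially hidden positions keeps any originally missing values out of the comparison, which matches the description's note that the original missing data were left intact.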
first_indexed 2024-03-08T21:49:07Z
format Article
id doaj.art-67265cb74d7f4ef3ba34e3df26d8e54d
institution Directory Open Access Journal
issn 2590-1974
language English
last_indexed 2024-03-08T21:49:07Z
publishDate 2023-12-01
publisher Elsevier
record_format Article
series Applied Computing and Geosciences
spelling doaj.art-67265cb74d7f4ef3ba34e3df26d8e54d; 2023-12-20T07:36:39Z; eng; Elsevier; Applied Computing and Geosciences; 2590-1974; 2023-12-01; volume 20; article 100145
Author affiliations:
Zulfaqar Sa’adi (0): Centre for Environmental Sustainability and Water Security, Research Institute for Sustainable Environment, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia; Department of Water and Environmental Engineering, Faculty of Civil Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia. Corresponding author.
Zulkifli Yusop (1): Centre for Environmental Sustainability and Water Security, Research Institute for Sustainable Environment, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia; Department of Water and Environmental Engineering, Faculty of Civil Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia
Nor Eliza Alias (2): Centre for Environmental Sustainability and Water Security, Research Institute for Sustainable Environment, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia; Department of Water and Environmental Engineering, Faculty of Civil Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia
Ming Fai Chow (3): Department of Civil Engineering, School of Engineering, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Bandar Sunway, Selangor, Malaysia
Mohd Khairul Idlan Muhammad (4): Department of Water & Environmental Engineering, School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
Muhammad Wafiy Adli Ramli (5): School of Humanities, Universiti Sains Malaysia, 11700, Penang, Malaysia
Zafar Iqbal (6): NUST Institute of Civil Engineering-SCEE, National University of Sciences and Technology (NUST), H-12, Islamabad, 44000, Pakistan
Mohammed Sanusi Shiru (7): Department of Environmental Sciences, Faculty of Science, Federal University Dutse, P.M.B 7156, Dutse, Nigeria
Faizal Immaddudin Wira Rohmat (8): Water Resources Development Center, Bandung Institute of Technology, Indonesia; Water Resources Research Group, Faculty of Civil and Environmental Engineering, Bandung Institute of Technology, Indonesia
Nur Athirah Mohamad (9): Department of Water & Environmental Engineering, School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia (UTM), 81310, Johor Bahru, Malaysia
Mohamad Faizal Ahmad (10): Faculty of Civil Engineering, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Malaysia
title Evaluating Imputation Methods for rainfall data under high variability in Johor River Basin, Malaysia
topic Daily rainfall
Johor river basin
Missing data
Multiple imputation methods
Peninsular Malaysia
Spatiotemporal variability
url http://www.sciencedirect.com/science/article/pii/S2590197423000344