Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques

Bibliographic Details
Main Authors: Nur Hanisah Abdul Malek, Wan Fairos Wan Yaacob, Syerina Azlin Md Nasir, Norshahida Shaadan
Format: Article
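Language: English
Published: MDPI AG 2022-03-01
Series: Water
Subjects: water quality class; water quality index; supervised machine learning; random forest; gradient boosting; decision tree
Online Access: https://www.mdpi.com/2073-4441/14/7/1067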
author Nur Hanisah Abdul Malek
Wan Fairos Wan Yaacob
Syerina Azlin Md Nasir
Norshahida Shaadan
collection DOAJ
description Machine learning (ML) has been in use for a long time and has gained wide attention in recent years. It can handle large amounts of data and model non-linear structures through complex mathematical computations. However, traditional ML models suffer from problems such as high bias and overfitting. These problems have driven the development of improved ML techniques, such as bagging and boosting. This study explores a series of ML models for predicting the water quality classification (WQC) of the Kelantan River using data from 2005 to 2020. The proposed methodology employs 13 physical and chemical water quality parameters and seven ML models: Decision Tree, Artificial Neural Networks, K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Random Forest and Gradient Boosting. Of these, the Gradient Boosting (GB) ensemble model with a learning rate of 0.1 showed the best prediction performance, with the highest accuracy (94.90%), sensitivity (80.00%) and F-measure (86.49%) and the lowest classification error. Total Suspended Solids (TSS) was the most significant variable for the GB model in predicting WQC, followed by Ammoniacal Nitrogen (NH3N), Biochemical Oxygen Demand (BOD) and Chemical Oxygen Demand (COD). Accurate water quality prediction of this kind could help to improve the National Environmental Policy on water resources and support the continuous improvement of water quality.
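The three performance measures quoted in the description are related through the confusion matrix. The sketch below shows the arithmetic for a single positive class. The function name and the counts are hypothetical and are not taken from the article: the counts were picked only because they happen to reproduce the reported percentages, and the study's actual confusion matrix (which may be multi-class) is not given in this record.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return standard metrics for one positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    precision = tp / (tp + fp)
    # F-measure is the harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "classification_error": 1 - accuracy,
    }


# Hypothetical counts (an assumption, not data from the study)
metrics = classification_metrics(tp=16, fp=1, fn=4, tn=77)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative counts, accuracy is 93/98, sensitivity is 16/20 and the F-measure is 32/37, which shows how a model can pair a high accuracy with a noticeably lower sensitivity when the positive class is small.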
first_indexed 2024-03-09T11:20:05Z
format Article
id doaj.art-44c2b72a371d4d0595a131bf6f119113
institution Directory Open Access Journal
issn 2073-4441
language English
last_indexed 2024-03-09T11:20:05Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series Water
spelling doaj.art-44c2b72a371d4d0595a131bf6f119113 | 2023-12-01T00:19:33Z | eng | MDPI AG | Water | 2073-4441 | 2022-03-01 | 14 | 7 | 1067 | 10.3390/w14071067
Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques
Nur Hanisah Abdul Malek (0), Wan Fairos Wan Yaacob (1), Syerina Azlin Md Nasir (2), Norshahida Shaadan (3)
(0), (1), (2) Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Cawangan Kelantan, Kampus Kota Bharu, Lembah Sireh, Kota Bharu 15050, Kelantan, Malaysia
(3) Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
https://www.mdpi.com/2073-4441/14/7/1067
water quality class | water quality index | supervised machine learning | random forest | gradient boosting | decision tree
title Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques
topic water quality class
water quality index
supervised machine learning
random forest
gradient boosting
decision tree
url https://www.mdpi.com/2073-4441/14/7/1067