Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques
Main Authors: | Nur Hanisah Abdul Malek, Wan Fairos Wan Yaacob, Syerina Azlin Md Nasir, Norshahida Shaadan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-03-01 |
Series: | Water |
Subjects: | |
Online Access: | https://www.mdpi.com/2073-4441/14/7/1067 |
_version_ | 1797437490587500544 |
---|---|
author | Nur Hanisah Abdul Malek; Wan Fairos Wan Yaacob; Syerina Azlin Md Nasir; Norshahida Shaadan |
collection | DOAJ |
description | Machine Learning (ML) has been used for a long time and has gained wide attention over the last several years. It can handle large amounts of data and model non-linear structures through complex mathematical computation. However, traditional ML models suffer from problems such as high bias and overfitting, which have driven the development of improved techniques such as bagging and boosting. This study explores a series of ML models to predict the water quality classification (WQC) of the Kelantan River using data from 2005 to 2020. The proposed methodology employed 13 physical and chemical water quality parameters and seven ML models: Decision Tree, Artificial Neural Networks, K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Random Forest and Gradient Boosting. The ensemble Gradient Boosting model with a learning rate of 0.1 gave the best prediction performance of all the algorithms, with the highest accuracy (94.90%), sensitivity (80.00%) and F-measure (86.49%) and the lowest classification error. Total Suspended Solids (TSS) was the most significant variable for the Gradient Boosting (GB) model in predicting WQC, followed by Ammoniacal Nitrogen (NH₃-N), Biochemical Oxygen Demand (BOD) and Chemical Oxygen Demand (COD). These accurate water quality predictions could help improve the National Environmental Policy on water resources and support the continuous improvement of water quality. |
first_indexed | 2024-03-09T11:20:05Z |
format | Article |
id | doaj.art-44c2b72a371d4d0595a131bf6f119113 |
institution | Directory Open Access Journal |
issn | 2073-4441 |
language | English |
last_indexed | 2024-03-09T11:20:05Z |
publishDate | 2022-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Water |
spelling | doaj.art-44c2b72a371d4d0595a131bf6f119113; 2023-12-01T00:19:33Z; eng; MDPI AG; Water; ISSN 2073-4441; 2022-03-01; vol. 14, no. 7, art. 1067; doi:10.3390/w14071067; Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques; Nur Hanisah Abdul Malek, Wan Fairos Wan Yaacob and Syerina Azlin Md Nasir (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Cawangan Kelantan, Kampus Kota Bharu, Lembah Sireh, Kota Bharu 15050, Kelantan, Malaysia); Norshahida Shaadan (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia); https://www.mdpi.com/2073-4441/14/7/1067; keywords: water quality class, water quality index, supervised machine learning, random forest, gradient boosting, decision tree |
title | Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques |
topic | water quality class; water quality index; supervised machine learning; random forest; gradient boosting; decision tree |
url | https://www.mdpi.com/2073-4441/14/7/1067 |
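
The abstract reports that Gradient Boosting with a learning rate of 0.1 performed best and that TSS was its most important variable, but the record does not describe the implementation. As a hedged illustration of what the learning rate controls, the Python sketch below is a simplified binary gradient boosting on log-loss: each round fits a one-split regression stump to the pseudo-residuals y - p and adds only `learning_rate` times its output to the running log-odds. Feature importance is accumulated as each feature's total squared-error reduction, normalized to sum to one. The function names (`fit_stump`, `gradient_boost`, `predict`) and the two-feature toy data are hypothetical; the study predicted multi-class WQC from 13 parameters, and production libraries typically refine leaf values further (for example with a Newton step) rather than using the plain mean residual used here.

```python
import math
from typing import List, Tuple

Row = List[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[Row], r: List[float]) -> Tuple[Stump, float]:
    """Least-squares stump on residuals r; returns the stump and its SSE reduction."""
    n, d = len(X), len(X[0])
    mean_all = sum(r) / n
    sse_all = sum((v - mean_all) ** 2 for v in r)
    best: Stump = (0, math.inf, mean_all, mean_all)  # fallback: constant prediction
    best_gain = 0.0
    for j in range(d):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse_all - sse > best_gain:
                best, best_gain = (j, t, ml, mr), sse_all - sse
    return best, best_gain


def stump_value(stump: Stump, row: Row) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def gradient_boost(X: List[Row], y: List[int], n_rounds: int = 100,
                   learning_rate: float = 0.1):
    """Binary log-loss boosting with stumps; y must contain both classes 0 and 1."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # initial log-odds of the positive class
    F = [f0] * len(y)
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of log-loss with respect to the log-odds is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump, gain = fit_stump(X, residuals)
        stumps.append(stump)
        importance[stump[0]] += gain
        # Shrinkage: only a fraction (learning_rate) of each stump is added.
        F = [fi + learning_rate * stump_value(stump, row) for fi, row in zip(F, X)]
    total = sum(importance) or 1.0
    model = (f0, learning_rate, stumps)
    return model, [g / total for g in importance]


def predict(model, row: Row) -> int:
    f0, learning_rate, stumps = model
    z = f0 + learning_rate * sum(stump_value(s, row) for s in stumps)
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [6, 7], [7, 2], [8, 9], [9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model, importances = gradient_boost(X, y, n_rounds=100, learning_rate=0.1)
print([predict(model, row) for row in X])  # matches y on this separable toy set
print(importances)  # feature 0 receives all of the importance on this toy set
```

A smaller learning rate makes each round's correction more conservative, so more rounds are needed to reach the same fit; that trade-off is the usual reason a value such as 0.1 is tuned alongside the number of boosting rounds.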
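
The abstract compares the models by accuracy, sensitivity, F-measure and classification error, but the record does not state how per-class scores were averaged over the water quality classes. The Python sketch below shows the standard definitions computed from a confusion matrix, assuming overall accuracy and classification error are taken over all classes while sensitivity, precision and F-measure are computed one-vs-rest for a single class. The function names and the 3-class counts are hypothetical and are not the study's data.

```python
from typing import Dict, List

Matrix = List[List[int]]  # confusion[i][j]: samples of true class i predicted as class j


def overall_accuracy(confusion: Matrix) -> float:
    """Share of all samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def classification_error(confusion: Matrix) -> float:
    """Misclassification rate, the complement of overall accuracy."""
    return 1.0 - overall_accuracy(confusion)


def class_metrics(confusion: Matrix, positive: int) -> Dict[str, float]:
    """One-vs-rest sensitivity (recall), precision and F-measure for one class."""
    n = len(confusion)
    tp = confusion[positive][positive]
    fn = sum(confusion[positive]) - tp  # this class predicted as another class
    fp = sum(confusion[i][positive] for i in range(n)) - tp  # other classes predicted as this one
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f_measure": f_measure}


# Hypothetical 3-class confusion matrix (not data from the study).
cm = [[50, 2, 0],
      [3, 30, 1],
      [0, 4, 10]]
print(overall_accuracy(cm), classification_error(cm))
print(class_metrics(cm, positive=2))
```

For this hypothetical matrix, overall accuracy is 90/100 = 0.90, so the classification error is 0.10 (up to floating-point rounding). For class index 2 the sensitivity is 10/14, the precision is 10/11, and the F-measure, the harmonic mean of the two, equals 2·10 / (2·10 + 1 + 4) = 0.80.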