Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning
Water quality indices (WQIs) are used for the simple assessment and classification of the water quality of surface water sources. However, considerable time, financial resources, and effort are required to measure the parameters used for their calculation. Prediction of WQIs through supervised machi...
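Main Authors: | Alberto Fernández del Castillo, Carlos Yebra-Montes, Marycarmen Verduzco Garibay, José de Anda, Alejandro Garcia-Gonzalez, Misael Sebastián Gradilla-Hernández |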
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Water |
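Subjects: | water quality index prediction; regression and classification algorithms; Santiago-Guadalajara River |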
Online Access: | https://www.mdpi.com/2073-4441/14/8/1235 |
_version_ | 1797434151140327424 |
---|---|
author | Alberto Fernández del Castillo; Carlos Yebra-Montes; Marycarmen Verduzco Garibay; José de Anda; Alejandro Garcia-Gonzalez; Misael Sebastián Gradilla-Hernández |
author_facet | Alberto Fernández del Castillo; Carlos Yebra-Montes; Marycarmen Verduzco Garibay; José de Anda; Alejandro Garcia-Gonzalez; Misael Sebastián Gradilla-Hernández |
author_sort | Alberto Fernández del Castillo |
collection | DOAJ |
description | Water quality indices (WQIs) are used for the simple assessment and classification of the water quality of surface water sources. However, considerable time, financial resources, and effort are required to measure the parameters used for their calculation. Prediction of WQIs through supervised machine learning is a useful and simple approach to reduce the cost of the analysis through the development of predictive models with a reduced number of water quality parameters. In this study, regression and classification machine-learning models were developed to estimate the ecosystem-specific WQI previously developed for the Santiago-Guadalajara River (SGR-WQI), which involves the measurement of 17 water quality parameters. The best subset selection method was employed to reduce the number of significant parameters required for the SGR-WQI prediction. The multiple linear regression model using 12 parameters displayed a residual square error (RSE) of 3.262, similar to that of the multiple linear regression model using 17 parameters (RSE = 3.255), which translates into significant savings for WQI estimation. Additionally, the generalized additive model not only displayed an adjusted R² of 0.9992, which is the best fit of all the models evaluated, but also fitted the rating curves of each parameter developed for the original algorithm for the SGR-WQI calculation with great accuracy. Regarding the classification models, an overall proportion of 93% and 86% of data were correctly classified using the logistic regression model with 17 and 12 parameters, respectively, while the linear discriminant functions using 12 parameters correctly classified an overall proportion of 84%. The models evaluated were found to be efficient in predicting the SGR-WQI with a reduced number of parameters as complementary tools to extend the current water quality monitoring program of the Santiago-Guadalajara River. |
first_indexed | 2024-03-09T10:27:09Z |
format | Article |
id | doaj.art-5a791cb68c7e4f9989a5a03c07843082 |
institution | Directory Open Access Journal |
issn | 2073-4441 |
language | English |
last_indexed | 2024-03-09T10:27:09Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Water |
spelling | doaj.art-5a791cb68c7e4f9989a5a03c07843082; 2023-12-01T21:31:46Z; eng; MDPI AG; Water; 2073-4441; 2022-04-01; 14; 8; 1235; 10.3390/w14081235; Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning; Alberto Fernández del Castillo [0]; Carlos Yebra-Montes [1]; Marycarmen Verduzco Garibay [2]; José de Anda [3]; Alejandro Garcia-Gonzalez [4]; Misael Sebastián Gradilla-Hernández [5]; [0] Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico; [1] ENES-León, Universidad Nacional Autónoma de México, Blvd. UNAM 2011, Predio el Saucillo y El Potrero, León CP 37684, Guanajuato, Mexico; [2] Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico; [3] Unidad de Tecnología Ambiental, Centro de Investigación y Asistencia en Tecnología y Diseño del Estado de Jalisco, A. C. Av. Normalistas 800, Colinas de la Normal, Guadalajara CP 44270, Jalisco, Mexico; [4] Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Av. General Ramon Corona 2514, Nuevo Mexico, Zapopan CP 45138, Jalisco, Mexico; [5] Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico; https://www.mdpi.com/2073-4441/14/8/1235; water quality index prediction; regression and classification algorithms; Santiago-Guadalajara River |
title | Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning |
title_sort | simple prediction of an ecosystem specific water quality index and the water quality classification of a highly polluted river through supervised machine learning |
topic | water quality index prediction; regression and classification algorithms; Santiago-Guadalajara River |
url | https://www.mdpi.com/2073-4441/14/8/1235 |
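The description above compares regression models by RSE and adjusted R². As a minimal sketch only, assuming that RSE denotes the usual residual standard error of a linear model with an intercept (the record does not define it), the two fit measures could be computed as follows. The function names and the sample values are hypothetical and are not taken from the article.

```python
import math


def residual_standard_error(observed, predicted, n_predictors):
    """Assumed definition: RSE = sqrt(RSS / (n - p - 1)), intercept included."""
    n = len(observed)
    dof = n - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than fitted coefficients")
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(rss / dof)


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (RSS / (n - p - 1)) / (TSS / (n - 1))."""
    n = len(observed)
    dof = n - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than fitted coefficients")
    mean_y = sum(observed) / n
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    tss = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - (rss / dof) / (tss / (n - 1))


# Hypothetical index values and model predictions, for illustration only.
observed = [40.0, 55.0, 62.0, 48.0, 70.0, 35.0]
predicted = [42.0, 53.0, 60.0, 50.0, 69.0, 37.0]

rse = residual_standard_error(observed, predicted, n_predictors=2)
adj = adjusted_r2(observed, predicted, n_predictors=2)
print(f"RSE = {rse:.3f}, adjusted R^2 = {adj:.4f}")
```

Both measures penalize the number of predictors through the degrees of freedom, which is why a 12-parameter and a 17-parameter model can be compared on them directly.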