Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning

Bibliographic Details
Main Authors: Alberto Fernández del Castillo, Carlos Yebra-Montes, Marycarmen Verduzco Garibay, José de Anda, Alejandro Garcia-Gonzalez, Misael Sebastián Gradilla-Hernández
Format: Article
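Language: English
Published: MDPI AG, 2022-04-01
Series: Water
Subjects: water quality index prediction; regression and classification algorithms; Santiago-Guadalajara River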
Online Access:https://www.mdpi.com/2073-4441/14/8/1235
Collection: Directory of Open Access Journals (DOAJ)
Abstract: Water quality indices (WQIs) are used for the simple assessment and classification of the water quality of surface water sources. However, considerable time, financial resources, and effort are required to measure the parameters used for their calculation. Prediction of WQIs through supervised machine learning is a useful and simple approach to reducing the cost of the analysis, because the resulting predictive models need fewer water quality parameters. In this study, regression and classification machine-learning models were developed to estimate the ecosystem-specific WQI previously developed for the Santiago-Guadalajara River (SGR-WQI), which involves the measurement of 17 water quality parameters. The best subset selection method was employed to reduce the number of significant parameters required for the SGR-WQI prediction. The multiple linear regression model using 12 parameters displayed a residual standard error (RSE) of 3.262, similar to that of the multiple linear regression model using all 17 parameters (RSE = 3.255); measuring five fewer parameters translates into significant savings for WQI estimation. Additionally, the generalized additive model not only displayed an adjusted R² of 0.9992, the best fit of all the models evaluated, but also closely reproduced the rating curve of each parameter used in the original SGR-WQI calculation algorithm. Regarding the classification models, the logistic regression models with 17 and 12 parameters correctly classified 93% and 86% of the data overall, respectively, while the linear discriminant functions using 12 parameters correctly classified 84%. The models evaluated were found to predict the SGR-WQI efficiently with a reduced number of parameters and can serve as complementary tools to extend the current water quality monitoring program of the Santiago-Guadalajara River.
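The abstract mentions best subset selection and compares models by their RSE. The record does not include the authors' code, so the following is only a minimal sketch of that idea, assuming ordinary least squares with an intercept, an exhaustive search over all predictor subsets of one fixed size k, and RSE computed as sqrt(RSS / (n - k - 1)). The function names (`solve_linear_system`, `fit_ols`, `residual_standard_error`, `best_subset`) and the toy data are hypothetical and made up for illustration; they are not SGR-WQI measurements.

```python
from itertools import combinations
from math import sqrt


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]: row operations are applied to both sides at once.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the chosen predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    # The left block is now diagonal, so each unknown is a single division.
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y, cols):
    """Least-squares fit of y on an intercept plus the predictor columns `cols`."""
    design = [[1.0] + [float(row[c]) for c in cols] for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    # Residual sum of squares of the fitted model.
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    return beta, rss


def residual_standard_error(rss, n, k):
    """RSE = sqrt(RSS / (n - k - 1)) for k predictors plus an intercept."""
    return sqrt(rss / (n - k - 1))


def best_subset(rows, y, k):
    """Try every size-k subset of predictors and keep the one with the lowest RSS."""
    best_cols, best_rss = None, float("inf")
    for cols in combinations(range(len(rows[0])), k):
        _, rss = fit_ols(rows, y, cols)
        if rss < best_rss:
            best_cols, best_rss = cols, rss
    # For a fixed k, the lowest RSS is also the lowest RSE.
    return best_cols, residual_standard_error(best_rss, len(y), k)


# Hypothetical toy data: 8 samples x 4 "water quality parameters".
rows = [
    [1.0, 5.0, 2.0, 7.0],
    [2.0, 3.0, 1.0, 4.0],
    [3.0, 8.0, 4.0, 1.0],
    [4.0, 1.0, 3.0, 6.0],
    [5.0, 6.0, 6.0, 2.0],
    [6.0, 2.0, 5.0, 8.0],
    [7.0, 7.0, 8.0, 3.0],
    [8.0, 4.0, 7.0, 5.0],
]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
# The response depends only on parameters 0 and 2, plus a little noise.
y = [1.0 + 2.0 * r[0] + 3.0 * r[2] + e for r, e in zip(rows, noise)]

cols, rse = best_subset(rows, y, k=2)
print("best 2-parameter subset:", cols, "RSE:", round(rse, 3))
```

This only covers the per-size step: choosing between subset sizes (the study compares 12 and 17 parameters) typically needs an extra criterion such as adjusted R², BIC, or cross-validation, which the sketch leaves out. An exhaustive search stays tractable at this scale: choosing 12 of 17 parameters means C(17, 12) = 6,188 fits, and all subset sizes together amount to 2^17 - 1 = 131,071 fits. The normal equations are used here for brevity; a QR-based solver is numerically safer for real, correlated water quality data.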
ISSN: 2073-4441
DOI: 10.3390/w14081235
Volume/Issue: 14(8), article 1235
Affiliations:
Alberto Fernández del Castillo: Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico
Carlos Yebra-Montes: ENES-León, Universidad Nacional Autónoma de México, Blvd. UNAM 2011, Predio el Saucillo y El Potrero, León CP 37684, Guanajuato, Mexico
Marycarmen Verduzco Garibay: Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico
José de Anda: Unidad de Tecnología Ambiental, Centro de Investigación y Asistencia en Tecnología y Diseño del Estado de Jalisco, A. C., Av. Normalistas 800, Colinas de la Normal, Guadalajara CP 44270, Jalisco, Mexico
Alejandro Garcia-Gonzalez: Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico
Misael Sebastián Gradilla-Hernández: Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Av. General Ramon Corona 2514, Nuevo México, Zapopan CP 45138, Jalisco, Mexico