Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear
The aim of the study was to evaluate whether different data mining methods can be applied to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbour (k-NN)...
Main Authors: | Szeląg Bartosz, Bartkiewicz Lidia, Studziński Jan, Barbusiński Krzysztof |
---|---|
Format: | Article |
Language: | English |
Published: | Polish Academy of Sciences, 2017-09-01 |
Series: | Archives of Environmental Protection |
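Subjects: | wastewater treatment plant, Data Mining, Random forest, forecasting inflow, k – nearest neighbour, Kernel regression |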
Online Access: | http://www.degruyter.com/view/j/aep.2017.43.issue-3/aep-2017-0030/aep-2017-0030.xml?format=INT |
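
The abstract above describes models that predict daily inflow from inputs delayed by 1 day. The following is a minimal sketch of that setup, not the authors' implementation: it builds 1-day-lagged inputs and compares only two of the four methods named, k-NN regression and kernel regression. The kernel regression is assumed here to be a Nadaraya-Watson estimator with a Gaussian kernel, which the record does not confirm. The data are synthetic and made up for illustration, and all function names and parameter values (`k`, `bandwidth`, `n_test`) are hypothetical.

```python
import math
import random


def make_lagged(inflow, rainfall, lag=1):
    """Pair each day's inflow with inflow and rainfall from `lag` days earlier."""
    X = [(inflow[t - lag], rainfall[t - lag]) for t in range(lag, len(inflow))]
    y = [inflow[t] for t in range(lag, len(inflow))]
    return X, y


def knn_predict(X_train, y_train, x, k=3):
    """k-NN regression: average the targets of the k closest training points."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def kernel_predict(X_train, y_train, x, bandwidth=1.0):
    """Nadaraya-Watson estimate with a Gaussian kernel (an assumed form)."""
    weights = [math.exp(-0.5 * (math.dist(xi, x) / bandwidth) ** 2) for xi in X_train]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the training mean
        return sum(y_train) / len(y_train)
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def synthetic_series(n=200, seed=0):
    """Made-up daily rainfall and inflow series, for illustration only."""
    rng = random.Random(seed)
    rainfall = [max(0.0, rng.gauss(0.0, 4.0)) for _ in range(n)]
    inflow = [50.0]
    for t in range(1, n):
        noise = rng.gauss(0.0, 1.0)
        inflow.append(20.0 + 0.6 * inflow[t - 1] + 1.5 * rainfall[t - 1] + noise)
    return inflow, rainfall


def compare_models(k=5, bandwidth=2.0, n_test=50):
    """Fit on the early part of the series, report MAE on the last n_test days."""
    inflow, rainfall = synthetic_series()
    X, y = make_lagged(inflow, rainfall, lag=1)
    X_train, y_train = X[:-n_test], y[:-n_test]
    X_test, y_test = X[-n_test:], y[-n_test:]
    knn = [knn_predict(X_train, y_train, x, k) for x in X_test]
    ker = [kernel_predict(X_train, y_train, x, bandwidth) for x in X_test]
    return {
        "k-NN": mean_absolute_error(y_test, knn),
        "kernel": mean_absolute_error(y_test, ker),
    }


if __name__ == "__main__":
    for name, mae in compare_models().items():
        print(f"{name}: MAE = {mae:.3f}")
```

The split is chronological rather than random, so the test days always follow the training days, as they would in a forecasting setting. The inputs are left unscaled for brevity; with real data, inflow and rainfall would normally be standardised before computing distances. Any ranking of the two methods on this synthetic series says nothing about the results reported in the article.
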
_version_ | 1797430792856535040 |
---|---|
author | Szeląg Bartosz Bartkiewicz Lidia Studziński Jan Barbusiński Krzysztof |
author_sort | Szeląg Bartosz |
collection | DOAJ |
description | The aim of the study was to evaluate whether different data mining methods can be applied to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbour (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the recipient of the clarified sewage, and the wastewater inflow into the Rzeszow city plant. The results indicate that, among models with one input delayed by 1 day, the k-NN method performed best and the K method worst. For models with two inputs and one explanatory variable, the smallest errors were obtained when the inputs were sewage inflow and rainfall data delayed by 1 day; the RF method gave the best fit and the K method the worst. For models with three inputs and two explanatory variables, the SVM method gave the best results and the K method the worst. In most of the modelling runs, the smallest prediction errors were obtained with the SVM method and the largest with the K method. The exceptions were the simplest model with one input delayed by 1 day, for which the k-NN method gave the best results, and two modelling runs of the two-input models, in which the RF method performed best. |
first_indexed | 2024-03-09T09:33:38Z |
format | Article |
id | doaj.art-366c6883047e4849a0d0dd4e87bc3cce |
institution | Directory Open Access Journal |
issn | 2083-4810 |
language | English |
last_indexed | 2024-03-09T09:33:38Z |
publishDate | 2017-09-01 |
publisher | Polish Academy of Sciences |
record_format | Article |
series | Archives of Environmental Protection |
spelling | doaj.art-366c6883047e4849a0d0dd4e87bc3cce; 2023-12-02T02:44:44Z; eng; Polish Academy of Sciences; Archives of Environmental Protection; ISSN 2083-4810; 2017-09-01; vol. 43, no. 3, pp. 74-81; DOI 10.1515/aep-2017-0030; aep-2017-0030; Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear; Szeląg Bartosz (Kielce University of Technology, Poland); Bartkiewicz Lidia (Kielce University of Technology, Poland); Studziński Jan (Systems Research Institute PAN, Poland); Barbusiński Krzysztof (Silesian University of Technology, Poland); http://www.degruyter.com/view/j/aep.2017.43.issue-3/aep-2017-0030/aep-2017-0030.xml?format=INT; wastewater treatment plant; Data Mining; Random forest; forecasting inflow; k – nearest neighbour; Kernel regression |
title | Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear |
topic | wastewater treatment plant Data Mining Random forest forecasting inflow k – nearest neighbour Kernel regression |
url | http://www.degruyter.com/view/j/aep.2017.43.issue-3/aep-2017-0030/aep-2017-0030.xml?format=INT |