Evaluation of the impact of explanatory variables on the accuracy of prediction of daily inflow to the sewage treatment plant by selected models nonlinear

Bibliographic Details
Main Authors: Szeląg Bartosz, Bartkiewicz Lidia, Studziński Jan, Barbusiński Krzysztof
Format: Article
Language: English
Published: Polish Academy of Sciences, 2017-09-01
Series: Archives of Environmental Protection
Online Access:http://www.degruyter.com/view/j/aep.2017.43.issue-3/aep-2017-0030/aep-2017-0030.xml?format=INT
Affiliations: Szeląg Bartosz, Bartkiewicz Lidia (Kielce University of Technology, Poland); Studziński Jan (Systems Research Institute PAN, Poland); Barbusiński Krzysztof (Silesian University of Technology, Poland)
ISSN: 2083-4810
Volume: 43, Issue: 3, Pages: 74-81
DOI: 10.1515/aep-2017-0030
Keywords: wastewater treatment plant; Data Mining; Random forest; forecasting inflow; k-nearest neighbour; Kernel regression

Abstract

The aim of the study was to evaluate whether different data mining methods can be used to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbours (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the receiver of the clarified sewage, and wastewater inflow into the Rzeszow city plant. For models with one input delayed by 1 day, the best results were obtained with the k-NN method and the worst with the K method. For models with two input variables and one explanatory variable, the smallest errors were obtained when the inputs were sewage inflow and rainfall data delayed by 1 day; in this case RF gave the best fit and K the worst. For models with three inputs and two explanatory variables, SVM gave the best results and K the worst. In most modelling runs, SVM produced the smallest prediction errors and K the largest. The exceptions were the simplest model with one input delayed by 1 day, where k-NN performed best, and two modelling runs with two-input models, where RF performed best.
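To illustrate what "inputs delayed by 1 day" means for two of the four methods, the following pure-Python sketch builds lagged inputs (inflow and rainfall at day t-1) and predicts inflow at day t with k-nearest neighbours and with kernel regression. It is not the authors' code, data or settings: the record does not state which kernel regression variant was used, so a Nadaraya-Watson estimator with a Gaussian kernel is assumed, and the synthetic series, function names, k=3 and bandwidth=0.5 are hypothetical. SVM and RF are omitted to keep the example dependency-free.

```python
"""Hypothetical sketch: 1-day-lagged inflow prediction with k-NN and
Nadaraya-Watson kernel regression (not the study's implementation)."""
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def make_lagged(inflow: Sequence[float], rain: Sequence[float],
                lag: int = 1) -> Tuple[List[Row], List[float]]:
    """Inputs (Q[t-lag], P[t-lag]) paired with target Q[t]."""
    X = [(inflow[t - lag], rain[t - lag]) for t in range(lag, len(inflow))]
    y = [inflow[t] for t in range(lag, len(inflow))]
    return X, y


def fit_scaler(X: List[Row]) -> Tuple[List[float], List[float]]:
    """Column means and population std devs (0 std replaced by 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(X: List[Row], means: List[float], stds: List[float]) -> List[Row]:
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in X]


def _dist(a: Row, b: Row) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_tr: List[Row], y_tr: List[float], x: Row, k: int = 3) -> float:
    """Mean target of the k nearest training inputs (Euclidean distance)."""
    nearest = sorted(range(len(X_tr)), key=lambda i: _dist(X_tr[i], x))[:k]
    return sum(y_tr[i] for i in nearest) / len(nearest)


def kernel_predict(X_tr: List[Row], y_tr: List[float], x: Row,
                   bandwidth: float = 0.5) -> float:
    """Gaussian-weighted average of training targets (Nadaraya-Watson)."""
    w = [math.exp(-0.5 * (_dist(xi, x) / bandwidth) ** 2) for xi in X_tr]
    total = sum(w)
    if total == 0.0:  # query far from all training points: fall back to mean
        return sum(y_tr) / len(y_tr)
    return sum(wi * yi for wi, yi in zip(w, y_tr)) / total


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Synthetic (hypothetical) series: inflow responds to yesterday's rain.
    rain = [float((7 * t) % 11) if t % 3 == 0 else 0.0 for t in range(60)]
    inflow = [100.0]
    for t in range(1, 60):
        inflow.append(0.5 * inflow[t - 1] + 50.0 + 3.0 * rain[t - 1])
    X, y = make_lagged(inflow, rain, lag=1)
    split = 40
    means, stds = fit_scaler(X[:split])  # statistics from training part only
    X_tr, X_te = scale(X[:split], means, stds), scale(X[split:], means, stds)
    y_tr, y_te = y[:split], y[split:]
    knn = [knn_predict(X_tr, y_tr, x, k=3) for x in X_te]
    ker = [kernel_predict(X_tr, y_tr, x, bandwidth=0.5) for x in X_te]
    print(f"k-NN RMSE:   {rmse(y_te, knn):.2f}")
    print(f"kernel RMSE: {rmse(y_te, ker):.2f}")
```

Inputs are standardized with statistics from the training part only, so that the larger inflow values do not dominate the Euclidean distance that both methods rely on, and so that no information from the test period leaks into the model.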