KNN vs. Bluecat—Machine Learning vs. Classical Statistics

Bibliographic Details
Main Authors: Evangelos Rozos, Demetris Koutsoyiannis, Alberto Montanari
Format: Article
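Language: English
Published: MDPI AG, 2022-06-01
Series: Hydrology
Subjects: k-nearest neighbours; data-driven modelling; model uncertainty; machine learning; statistical analysis; hydrological modelling
Online Access: https://www.mdpi.com/2306-5338/9/6/101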
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Uncertainty is inherent in the modelling of any physical process. In hydrological modelling, uncertainty has multiple sources, including measurement errors in the stresses (the model inputs), measurement errors in the hydrological process of interest (the observations against which the model is calibrated), and model limitations, among others. The typical techniques for assessing this uncertainty (e.g., Monte Carlo simulation) are computationally expensive and require specific preparation for each individual application (e.g., selection of an appropriate probability distribution). Recently, data-driven methods have been proposed that attempt to estimate the uncertainty of a model simulation based exclusively on the available data. In this study, two data-driven methods were employed, one based on machine learning techniques (k-nearest neighbours, KNN) and one based on statistical approaches (Bluecat). These methods were tested in two real-world case studies to assess their reliability. Furthermore, the flexibility of the machine learning method made it possible to assess more complex sampling schemes for the data-driven estimation of the uncertainty. Dissecting the algorithmic background of the two methods revealed similarities between them, with the statistical method resting on a more robust theoretical foundation. Nevertheless, the results from the case studies indicated that both methods perform equally well. For this reason, data-driven methods can become a valuable tool for practitioners.
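
The abstract describes estimating simulation uncertainty "based exclusively on the available data". As a rough illustration of the k-nearest-neighbours idea named in the title, the sketch below conditions on the simulated value: for each new simulation it selects the k calibration time steps whose simulated values are closest and reports empirical quantiles of the corresponding observations as a prediction band. This is a minimal sketch under stated assumptions, not the authors' KNN implementation and not Bluecat: the function `knn_prediction_interval`, the neighbourhood size `k`, the quantile levels, and the synthetic gamma/lognormal data are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_prediction_interval(sim_calib, obs_calib, sim_new, k=50, levels=(0.05, 0.5, 0.95)):
    """Hypothetical sketch (not the article's code): empirical quantiles of the
    observations whose paired simulated values are the k nearest neighbours
    (absolute difference) of each new simulated value."""
    sim_calib = np.asarray(sim_calib, dtype=float)
    obs_calib = np.asarray(obs_calib, dtype=float)
    sim_new = np.atleast_1d(np.asarray(sim_new, dtype=float))
    if sim_calib.shape != obs_calib.shape:
        raise ValueError("simulated and observed calibration series must be paired")
    if not 1 <= k <= sim_calib.size:
        raise ValueError("k must be between 1 and the calibration length")

    out = np.empty((sim_new.size, len(levels)))
    for i, s in enumerate(sim_new):
        # Indices of the k calibration time steps whose simulation is closest to s
        nearest = np.argsort(np.abs(sim_calib - s))[:k]
        # Empirical quantiles of the matching observations approximate the
        # conditional distribution of the observation given the simulation
        out[i] = np.quantile(obs_calib[nearest], levels)
    return out


# Usage on synthetic data (assumed for illustration, not from the case studies)
rng = np.random.default_rng(42)
sim = rng.gamma(2.0, 5.0, size=2000)            # synthetic "simulated" flows
obs = sim * rng.lognormal(0.0, 0.2, size=2000)  # synthetic "observed" flows, multiplicative error
bands = knn_prediction_interval(sim, obs, [5.0, 20.0], k=100)
print(bands)  # rows: new simulations; columns: 5%, 50%, 95% quantiles
```

Conditioning only on the simulated value is the simplest sampling scheme. The abstract notes that the flexibility of the machine learning method allowed more complex sampling schemes to be assessed, but this record does not describe them, so none are attempted here.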

ISSN: 2306-5338
DOI: 10.3390/hydrology9060101
Volume: 9; Issue: 6; Article number: 101
Affiliations:
Evangelos Rozos: Institute for Environmental Research & Sustainable Development, National Observatory of Athens, 15236 Athens, Greece
Demetris Koutsoyiannis: Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, 15780 Athens, Greece
Alberto Montanari: Department of Civil, Chemical, Environmental and Materials Engineering (DICAM), University of Bologna, 40136 Bologna, Italy