Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams

Bibliographic Details
Main Authors: Jonathan Jakob, André Artelt, Martina Hasenjäger, Barbara Hammer
Format: Article
Language: English
Published: Taylor & Francis Group 2023-12-01
Series: Applied Artificial Intelligence
Online Access: http://dx.doi.org/10.1080/08839514.2023.2198846
Collection: DOAJ (Directory of Open Access Journals)
ISSN: 0883-9514, 1087-6545
Affiliations: Jonathan Jakob, André Artelt and Barbara Hammer (Bielefeld University); Martina Hasenjäger (Honda Research Institute)

Description
In many real-world scenarios, data are provided as a potentially infinite stream of samples that are subject to changes in the underlying data distribution, a phenomenon often referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the problem at hand changes over time. High-dimensionality of the data poses an additional challenge to learning algorithms operating in such environments. Such scenarios arise, for example, in the sensor-based maintenance of industrial machines or within entire networks, such as power grids or water distribution systems. However, since most existing methods for incremental learning focus on classification tasks, efficient online learning for regression is still an underdeveloped area. In this work, we introduce an extension to the SAM-kNN Regressor that incorporates metric learning in order to improve prediction quality on data streams, to gain insight into the relevance of different input features, and, based on that, to transform the input data into a lower-dimensional space, which reduces computational complexity and improves suitability for high-dimensional data. We evaluate the proposed method on artificial data to demonstrate its applicability in various scenarios. In addition, we apply the method to the real-world problem of water distribution network monitoring. Specifically, we demonstrate that sensor faults in the water distribution network can be detected by monitoring the feature relevances computed by our algorithm.
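
The abstract only outlines the method. To illustrate the general idea of relevance-based metric learning, the following minimal Python sketch works under stated assumptions. It assumes the widely used adaptive distance d(x, y) = ‖Ω(x − y)‖², reads per-feature relevances off the diagonal of Λ = ΩᵀΩ, and treats a rectangular Ω (fewer rows than input features) as a projection to a lower dimension. Ω is hand-set rather than learned, SAM-kNN's self-adjusting memories are omitted, and all function names (project, metric_distance, feature_relevances, knn_predict, relevance_alarm) as well as the threshold-based alarm rule are hypothetical illustrations, not the authors' algorithm.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # rows of Omega, shape m x d (hypothetical, hand-set here)


def project(omega: Matrix, x: Sequence[float]) -> Vector:
    """Map an input x (length d) to z = Omega x (length m)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in omega]


def metric_distance(omega: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Squared adaptive distance d(x, y) = ||Omega (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(v * v for v in project(omega, diff))


def feature_relevances(omega: Matrix) -> Vector:
    """Diagonal of Lambda = Omega^T Omega, normalised to sum to one."""
    d = len(omega[0])
    diag = [sum(row[j] ** 2 for row in omega) for j in range(d)]
    total = math.fsum(diag)
    if total == 0.0:
        raise ValueError("Omega is all zeros; relevances are undefined")
    return [v / total for v in diag]


def knn_predict(omega: Matrix, X: Sequence[Sequence[float]], y: Sequence[float],
                query: Sequence[float], k: int = 3) -> float:
    """Unweighted kNN regression: mean target of the k nearest samples under the metric."""
    ranked = sorted((metric_distance(omega, xi, query), yi) for xi, yi in zip(X, y))
    nearest = ranked[:k]
    return sum(t for _, t in nearest) / len(nearest)


def relevance_alarm(reference: Sequence[float], current: Sequence[float],
                    threshold: float = 0.2) -> List[int]:
    """Hypothetical rule: indices of features whose relevance moved by more than threshold."""
    return [j for j, (r, c) in enumerate(zip(reference, current)) if abs(r - c) > threshold]


# Usage: the third feature (index 2) has a zero column in Omega, so it is ignored.
omega = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0]]  # m = 2 < d = 3, so project() also reduces the dimension
print(feature_relevances(omega))  # [0.8, 0.2, 0.0]

X = [[0.0, 0.0, 0.0], [1.0, 0.0, 5.0], [2.0, 0.0, -3.0], [10.0, 0.0, 0.0]]
y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(omega, X, y, [1.1, 0.0, 99.0], k=2))  # 1.5
```

Because d(x, y) equals the squared Euclidean distance between Ωx and Ωy, a kNN search over the projected m-dimensional vectors returns the same neighbours as the full metric, which is one way a rectangular Ω can cut per-query cost. How Ω is learned and updated on the stream, and how relevance changes are turned into sensor-fault alarms in the paper, is not reproduced by this sketch.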