Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams
In many real-world scenarios, data are provided as a potentially infinite stream of samples that are subject to changes in the underlying data distribution, a phenomenon often referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the pr...
Main Authors: | Jonathan Jakob, André Artelt, Martina Hasenjäger, Barbara Hammer |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2023-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2023.2198846 |
_version_ | 1797684794566377472 |
---|---|
author | Jonathan Jakob André Artelt Martina Hasenjäger Barbara Hammer |
author_facet | Jonathan Jakob André Artelt Martina Hasenjäger Barbara Hammer |
author_sort | Jonathan Jakob |
collection | DOAJ |
description | In many real-world scenarios, data are provided as a potentially infinite stream of samples that are subject to changes in the underlying data distribution, a phenomenon often referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the problem at hand changes over time. High-dimensionality of the data poses an additional challenge to learning algorithms operating in such environments. Common scenarios of this nature can for example be found in sensor-based maintenance operations of industrial machines or inside entire networks, such as power grids or water distribution systems. However, since most existing methods for incremental learning focus on classification tasks, efficient online learning for regression is still an underdeveloped area. In this work, we introduce an extension to the SAM-kNN Regressor that incorporates metric learning in order to improve the prediction quality on data streams, gain insights into the relevance of different input features and based on that, transform the input data into a lower dimension in order to improve computational complexity and suitability for high-dimensional data. We evaluate our proposed method on artificial data, to demonstrate its applicability in various scenarios. In addition to that, we apply the method to the real-world problem of water distribution network monitoring. Specifically, we demonstrate that sensor faults in the water distribution network can be detected by monitoring the feature relevances computed by our algorithm. |
first_indexed | 2024-03-12T00:34:57Z |
format | Article |
id | doaj.art-32738ebc4484455c85f7390faaf97305 |
institution | Directory of Open Access Journals |
issn | 0883-9514 1087-6545 |
language | English |
last_indexed | 2024-03-12T00:34:57Z |
publishDate | 2023-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Applied Artificial Intelligence |
spelling | doaj.art-32738ebc4484455c85f7390faaf97305; 2023-09-15T10:01:06Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2023-12-01; 37; 1; 10.1080/08839514.2023.2198846; 2198846; Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams; Jonathan Jakob (Bielefeld University); André Artelt (Bielefeld University); Martina Hasenjäger (Honda Research Institute); Barbara Hammer (Bielefeld University); In many real-world scenarios, data are provided as a potentially infinite stream of samples that are subject to changes in the underlying data distribution, a phenomenon often referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the problem at hand changes over time. High-dimensionality of the data poses an additional challenge to learning algorithms operating in such environments. Common scenarios of this nature can for example be found in sensor-based maintenance operations of industrial machines or inside entire networks, such as power grids or water distribution systems. However, since most existing methods for incremental learning focus on classification tasks, efficient online learning for regression is still an underdeveloped area. In this work, we introduce an extension to the SAM-kNN Regressor that incorporates metric learning in order to improve the prediction quality on data streams, gain insights into the relevance of different input features and based on that, transform the input data into a lower dimension in order to improve computational complexity and suitability for high-dimensional data. We evaluate our proposed method on artificial data, to demonstrate its applicability in various scenarios. In addition to that, we apply the method to the real-world problem of water distribution network monitoring. Specifically, we demonstrate that sensor faults in the water distribution network can be detected by monitoring the feature relevances computed by our algorithm.; http://dx.doi.org/10.1080/08839514.2023.2198846 |
spellingShingle | Jonathan Jakob André Artelt Martina Hasenjäger Barbara Hammer Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams Applied Artificial Intelligence |
title | Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams |
title_full | Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams |
title_fullStr | Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams |
title_full_unstemmed | Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams |
title_short | Interpretable SAM-kNN Regressor for Incremental Learning on High-Dimensional Data Streams |
title_sort | interpretable sam knn regressor for incremental learning on high dimensional data streams |
url | http://dx.doi.org/10.1080/08839514.2023.2198846 |
work_keys_str_mv | AT jonathanjakob interpretablesamknnregressorforincrementallearningonhighdimensionaldatastreams AT andreartelt interpretablesamknnregressorforincrementallearningonhighdimensionaldatastreams AT martinahasenjager interpretablesamknnregressorforincrementallearningonhighdimensionaldatastreams AT barbarahammer interpretablesamknnregressorforincrementallearningonhighdimensionaldatastreams |