Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters

In this paper, we present a new framework dedicated to the robust detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Representative variables are selected by using an original regularization strategy: they are the center of specific va...

Bibliographic Details
Main Authors: Camille Champion, Anne-Claire Brunet, Rémy Burcelin, Jean-Michel Loubes, Laurent Risser
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Algorithms
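Subjects: feature selection; representative variable detection; interpretable machine learning; regularization; complex data; graph clustering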
Online Access: https://www.mdpi.com/1999-4893/14/2/66
author Camille Champion
Anne-Claire Brunet
Rémy Burcelin
Jean-Michel Loubes
Laurent Risser
collection DOAJ
description This paper presents a framework for the robust detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Representative variables are selected using an original regularization strategy: they are the centers of specific variable clusters, denoted CORE-clusters, which satisfy fully interpretable constraints. Each CORE-cluster contains more than a predefined number of variables, and each pair of its variables behaves coherently in the observed data. The key advantage of this regularization strategy is therefore that only two intuitive parameters need tuning: the minimal size of the CORE-clusters and the minimum level of similarity between their variables. The role of a selected representative variable is also straightforward to interpret, because its observed behavior is similar to that of a controlled number of other variables. After introducing and justifying this variable selection formalism, we propose two algorithmic strategies to detect the CORE-clusters, one of which scales particularly well to high-dimensional data. Finally, results obtained on synthetic and real data are presented.
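The two constraints named in the description (a minimum cluster size and a minimum pairwise similarity) can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' detection algorithm: the function names `is_core_cluster` and `cluster_center` are hypothetical, the size constraint is read as "at least `min_size`", the similarity matrix is hand-made for illustration, and the "center" is taken to be the member with the highest mean similarity to the others, which the record itself does not specify.

```python
import numpy as np


def is_core_cluster(similarity, members, min_size, min_similarity):
    """Check the two constraints from the description (assumed reading):
    the cluster is large enough and every pair of its variables is
    at least `min_similarity` similar."""
    members = list(members)
    if len(members) < min_size:
        return False
    sub = similarity[np.ix_(members, members)]
    # Keep only off-diagonal entries, i.e. similarities between distinct variables.
    off_diag = sub[~np.eye(len(members), dtype=bool)]
    return bool(np.all(off_diag >= min_similarity))


def cluster_center(similarity, members):
    """Pick the member with the highest mean similarity to the other members.
    This definition of 'center' is an assumption made for illustration."""
    members = list(members)
    if len(members) < 2:
        raise ValueError("need at least two members to define a center")
    sub = similarity[np.ix_(members, members)]
    mean_sim = (sub.sum(axis=1) - np.diag(sub)) / (len(members) - 1)
    return members[int(np.argmax(mean_sim))]


# Hand-made symmetric similarity matrix for four variables (illustrative values).
S = np.array([
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.95, 0.20],
    [0.85, 0.95, 1.00, 0.30],
    [0.10, 0.20, 0.30, 1.00],
])

print(is_core_cluster(S, [0, 1, 2], min_size=3, min_similarity=0.8))
print(is_core_cluster(S, [0, 1, 2, 3], min_size=3, min_similarity=0.8))
print(cluster_center(S, [0, 1, 2]))
```

With observed data, one possible choice (again an assumption) is to build the similarity matrix from absolute Pearson correlations between variables, for example `np.abs(np.corrcoef(X, rowvar=False))` for a data matrix `X` with one column per variable.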
first_indexed 2024-03-09T00:38:41Z
format Article
id doaj.art-a4e6f0d76765453a83ad5a4b0ac407be
institution Directory Open Access Journal
issn 1999-4893
language English
last_indexed 2024-03-09T00:38:41Z
publishDate 2021-02-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-a4e6f0d76765453a83ad5a4b0ac407be
2023-12-11T17:57:43Z
eng
MDPI AG
Algorithms, ISSN 1999-4893
2021-02-01, vol. 14, no. 2, art. 66
doi 10.3390/a14020066
Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters
Camille Champion (Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France)
Anne-Claire Brunet (Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France)
Rémy Burcelin (Institute of Cardiovascular and Metabolic Diseases INSERM, F-31432 Toulouse, France)
Jean-Michel Loubes (Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France)
Laurent Risser (Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France)
https://www.mdpi.com/1999-4893/14/2/66
feature selection; representative variable detection; interpretable machine learning; regularization; complex data; graph clustering
title Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters
topic feature selection
representative variable detection
interpretable machine learning
regularization
complex data
graph clustering
url https://www.mdpi.com/1999-4893/14/2/66