Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle

Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze...


Bibliographic Details
Main Authors: Francisco J. Valverde-Albacete, Carmen Peláez-Moreno
Format: Article
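Language: English
Published: MDPI AG 2018-06-01
Series: Entropy
Subjects: entropy, entropy visualization; entropy balance equation; Shannon-type relations; multivariate analysis; machine learning evaluation; data transformation
Online Access: http://www.mdpi.com/1099-4300/20/7/498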
author Francisco J. Valverde-Albacete
Carmen Peláez-Moreno
collection DOAJ
description Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information $\overline{X}$ into a discrete, multivariate sink of information $\overline{Y}$ related by a distribution $P_{\overline{X}\,\overline{Y}}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline{X}, \overline{Y})$, which we call a balance equation, into its (a) non-transferable, (b) transferable, but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decompositions and balance equations apply to the entropies of $\overline{X}$ and $\overline{Y}$ individually, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.
first_indexed 2024-04-14T05:22:35Z
format Article
id doaj.art-86b5550402eb44e981bd5fa219f1d292
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-04-14T05:22:35Z
publishDate 2018-06-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-86b5550402eb44e981bd5fa219f1d292
2022-12-22T02:10:08Z
eng
MDPI AG
Entropy
1099-4300
2018-06-01
vol. 20, no. 7, art. 498
10.3390/e20070498
e20070498
Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle
Francisco J. Valverde-Albacete (Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés 28911, Spain)
Carmen Peláez-Moreno (Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés 28911, Spain)
http://www.mdpi.com/1099-4300/20/7/498
entropy, entropy visualization
entropy balance equation
Shannon-type relations
multivariate analysis
machine learning evaluation
data transformation
title Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle
topic entropy, entropy visualization
entropy balance equation
Shannon-type relations
multivariate analysis
machine learning evaluation
data transformation
url http://www.mdpi.com/1099-4300/20/7/498
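
The three-part decomposition mentioned in the description can be illustrated with a small numerical sketch. The record does not state the balance equation itself, so the code below rests on assumptions: it handles only the simplest case of one input and one output variable, and it uses the elementary identity log2|X| + log2|Y| = [log2|X| + log2|Y| - H(X) - H(Y)] + [H(X|Y) + H(Y|X)] + 2 I(X;Y). Reading these three terms as the (a) non-transferable, (b) transferable but not transferred, and (c) transferred parts is our interpretation, not a statement taken from the record. The function name `balance` and the dictionary keys are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def balance(pairs, x_size=None, y_size=None):
    """Split log2|X| + log2|Y| into three non-negative parts, in bits.

    pairs  : iterable of (x, y) observations of a discrete input and output.
    x_size : assumed alphabet size of X (defaults to the number of observed values).
    y_size : assumed alphabet size of Y (defaults to the number of observed values).
    """
    pairs = list(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    xy_counts = Counter(pairs)

    h_x = entropy(x_counts.values())
    h_y = entropy(y_counts.values())
    h_xy = entropy(xy_counts.values())
    h_max = math.log2(x_size or len(x_counts)) + math.log2(y_size or len(y_counts))

    return {
        "h_max": h_max,
        # Assumed part (a): entropy lost because the marginals are not uniform.
        "divergence_from_uniform": h_max - h_x - h_y,
        # Assumed part (b): H(X|Y) + H(Y|X), the variation of information.
        "variation_of_information": 2 * h_xy - h_x - h_y,
        # Assumed part (c): twice the mutual information I(X;Y).
        "twice_mutual_information": 2 * (h_x + h_y - h_xy),
    }


# Usage: a noiseless binary channel and a pair of independent fair bits.
perfect = balance([(0, 0), (1, 1)] * 5)
independent = balance([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Dividing the three parts by `h_max` gives three non-negative fractions that sum to one, which is the kind of coordinate triple a ternary (de Finetti) diagram plots. The multivariate and aggregate versions described in the record are not covered by this sketch.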