Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm

Dimensionality reduction techniques are often used by researchers in order to make high dimensional data easier to interpret visually, as data visualization is only possible in low dimensional spaces. Recent research in nonlinear dimensionality reduction introduced many effective algorithms, includi...


Bibliographic Details
Main Authors: Liliya A. Demidova, Artyom V. Gorchakov
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Journal of Imaging
Subjects: dimension reduction, data visualization, entropy, cross-entropy, fuzzy logic
Online Access: https://www.mdpi.com/2313-433X/8/4/113
author Liliya A. Demidova
Artyom V. Gorchakov
collection DOAJ
description Dimensionality reduction techniques are often used by researchers in order to make high dimensional data easier to interpret visually, as data visualization is only possible in low dimensional spaces. Recent research in nonlinear dimensionality reduction introduced many effective algorithms, including t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), dimensionality reduction technique based on triplet constraints (TriMAP), and pairwise controlled manifold approximation (PaCMAP), aimed to preserve both the local and global structure of high dimensional data while reducing the dimensionality. The UMAP algorithm has found its application in bioinformatics, genetics, genomics, and has been widely used to improve the accuracy of other machine learning algorithms. In this research, we compare the performance of different fuzzy information discrimination measures used as loss functions in the UMAP algorithm while constructing low dimensional embeddings. In order to achieve this, we derive the gradients of the considered losses analytically and employ the Adam algorithm during the loss function optimization process. From the conducted experimental studies we conclude that the use of either the logarithmic fuzzy cross entropy loss without reduced repulsion or the symmetric logarithmic fuzzy cross entropy loss with sufficiently large neighbor count leads to better global structure preservation of the original multidimensional data when compared to the loss function used in the original UMAP algorithm implementation.
first_indexed 2024-03-09T13:29:25Z
format Article
id doaj.art-9ab65ff7a87c47e7a7279fd6238ae148
institution Directory Open Access Journal
issn 2313-433X
language English
last_indexed 2024-03-09T13:29:25Z
publishDate 2022-04-01
publisher MDPI AG
record_format Article
series Journal of Imaging
spelling doaj.art-9ab65ff7a87c47e7a7279fd6238ae148 2023-11-30T21:20:39Z eng MDPI AG Journal of Imaging 2313-433X 2022-04-01 vol. 8, no. 4, art. 113 doi:10.3390/jimaging8040113
Liliya A. Demidova, Artyom V. Gorchakov (both): Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education “MIREA–Russian Technological University”, 78, Vernadsky Avenue, 119454 Moscow, Russia
title Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm
topic dimension reduction
data visualization
entropy
cross-entropy
fuzzy logic
url https://www.mdpi.com/2313-433X/8/4/113
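
Illustrative note on the abstract: the description mentions fuzzy cross entropy losses, analytically derived gradients, and optimization with Adam. The sketch below is only a rough illustration of those ideas and is not the authors' implementation. It uses the fuzzy set cross entropy form known from the UMAP literature, sum of v*log(v/w) + (1-v)*log((1-v)/(1-w)). The symmetric variant shown here (the sum of both directions) is an assumption, since this record does not define it. As a further simplification, Adam updates the low dimensional membership values w directly instead of embedding coordinates, and all function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero


def _clip(x):
    """Keep a membership value strictly inside (0, 1)."""
    return min(max(x, EPS), 1.0 - EPS)


def fuzzy_cross_entropy(v, w):
    """Sum over pairs of v*log(v/w) + (1-v)*log((1-v)/(1-w))."""
    total = 0.0
    for vi, wi in zip(v, w):
        vi, wi = _clip(vi), _clip(wi)
        total += vi * math.log(vi / wi)
        total += (1.0 - vi) * math.log((1.0 - vi) / (1.0 - wi))
    return total


def symmetric_fuzzy_cross_entropy(v, w):
    """Assumed symmetric variant: the loss summed in both directions."""
    return fuzzy_cross_entropy(v, w) + fuzzy_cross_entropy(w, v)


def grad_wrt_w(v, w):
    """Analytic gradient of fuzzy_cross_entropy with respect to each w_i."""
    grads = []
    for vi, wi in zip(v, w):
        vi, wi = _clip(vi), _clip(wi)
        grads.append(-vi / wi + (1.0 - vi) / (1.0 - wi))
    return grads


def adam_minimize(v, w0, steps=200, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize fuzzy_cross_entropy(v, w) over w with standard Adam updates."""
    w = list(w0)
    m = [0.0] * len(w)  # first-moment estimates
    s = [0.0] * len(w)  # second-moment estimates
    for t in range(1, steps + 1):
        g = grad_wrt_w(v, w)
        for i in range(len(w)):
            m[i] = b1 * m[i] + (1.0 - b1) * g[i]
            s[i] = b2 * s[i] + (1.0 - b2) * g[i] ** 2
            m_hat = m[i] / (1.0 - b1 ** t)  # bias correction
            s_hat = s[i] / (1.0 - b2 ** t)
            w[i] = _clip(w[i] - lr * m_hat / (math.sqrt(s_hat) + eps))
    return w


if __name__ == "__main__":
    v = [0.9, 0.1, 0.5]   # high dimensional memberships (made-up values)
    w0 = [0.5, 0.5, 0.5]  # initial low dimensional memberships
    w_fit = adam_minimize(v, w0)
    print(fuzzy_cross_entropy(v, w0), fuzzy_cross_entropy(v, w_fit))
```

In an actual UMAP-style setting the memberships w would be a function of distances between embedding points, and the chain rule would carry the gradient above through to the coordinates.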