How to visualize high-dimensional data: a roadmap

Bibliographic Details
Main Author: Hermann Moisl
Format: Article
Language: English
Published: Nicolas Turenne, 2020-12-01
Series: Journal of Data Mining and Digital Humanities
Special Issue: Visualisations in Historical Linguistics
ISSN: 2416-5999
Subjects: dimensionality reduction; high dimensionality; multivariate data; data visualization; cluster analysis; humanities and social sciences
Online Access: https://jdmdh.episciences.org/7021/pdf
Collection: DOAJ (Directory of Open Access Journals)

Description
Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data, because multivariate data provide a more complete description. Where the data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize them using graphical methods in order to identify any latent structure. If found, such structure facilitates the formulation of hypotheses, which can then be tested using a range of mathematical and statistical methods. Where the dimensionality is greater than 3, however, direct graphical investigation is impossible. The present discussion offers a roadmap for overcoming this obstacle and is in three main parts: the first presents some fundamental data concepts, the second describes an example corpus and a high-dimensional data set derived from it, and the third outlines two approaches to visualizing that data set: dimensionality reduction and cluster analysis.
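The abstract names two ways around the "more than 3 dimensions" obstacle: dimensionality reduction and cluster analysis. The sketch below illustrates both on a data matrix with one row per text and one column per variable. It rests on several assumptions that the record does not confirm. The data are synthetic random numbers, not the article's corpus. Principal component analysis (PCA, computed via NumPy's SVD) stands in for dimensionality reduction. k-means with a deterministic farthest-first initialization stands in for cluster analysis. The helper names `pca_project`, `farthest_first` and `kmeans` are hypothetical. Nothing in the record states which specific methods the article itself uses.

```python
import numpy as np


def pca_project(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Project the rows of X (n_texts x n_variables) onto the first k principal components."""
    Xc = X - X.mean(axis=0)  # mean-centre each variable (column)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def farthest_first(X: np.ndarray, k: int) -> np.ndarray:
    """Pick k initial centres: start at row 0, then repeatedly take the row
    farthest from all centres chosen so far (deterministic initialization)."""
    idx = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        i = int(d.argmax())
        idx.append(i)
        d = np.minimum(d, ((X - X[i]) ** 2).sum(axis=1))
    return X[idx].copy()


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means in the full-dimensional space; returns one label per row."""
    centres = farthest_first(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centre as the mean of its members (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


# Hypothetical stand-in for a corpus: 30 texts x 50 variables,
# drawn from three synthetic groups with different means.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(10, 50)) for m in (0.0, 3.0, 6.0)])

Y = pca_project(X, k=2)   # 50 dimensions reduced to 2, which can be drawn as a scatter plot
labels = kmeans(X, k=3)   # groups found directly in the 50-dimensional space
print(Y.shape, np.bincount(labels))
```

The two results complement each other. `Y` can be plotted, for example with matplotlib's `plt.scatter(Y[:, 0], Y[:, 1], c=labels)`, so that the cluster labels computed in the full space are shown on the 2-dimensional projection. Agreement between the two is some evidence that the apparent structure is not an artefact of either method alone.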