Clustering of samples and variables with mixed-type data.

Bibliographic Details
Main Authors: Manuela Hummel, Dominic Edelmann, Annette Kopp-Schneider
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5705083?pdf=render
author Manuela Hummel
Dominic Edelmann
Annette Kopp-Schneider
collection DOAJ
description Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. 
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
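The description mentions a variable-clustering approach based on distance correlation that yields a dissimilarity matrix. The following is a minimal Python sketch of that general idea, not the CluMix implementation: it computes the sample distance correlation between every pair of variables and uses one minus that value as the dissimilarity. The function names, the toy data, and the handling of categorical variables through a 0/1 mismatch distance are all assumptions made for illustration.

```python
from math import sqrt


def pairwise_distances(values):
    """Distance matrix for one variable over all samples.

    Assumption for this sketch: numeric values use |a - b|,
    anything else (e.g. category labels) uses a 0/1 mismatch distance.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        return [[abs(a - b) for b in values] for a in values]
    return [[0.0 if a == b else 1.0 for b in values] for a in values]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two variables (value in [0, 1])."""
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))

    def dcov2(p, q):
        # Squared sample distance covariance.
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dvar_x, dvar_y = dcov2(a, a), dcov2(b, b)
    if dvar_x <= 0 or dvar_y <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(max(dcov2(a, b), 0.0) / sqrt(dvar_x * dvar_y))


def variable_dissimilarity(columns):
    """Nested dict of 1 - distance correlation for every pair of variables."""
    names = list(columns)
    return {
        u: {
            v: 0.0 if u == v else 1.0 - distance_correlation(columns[u], columns[v])
            for v in names
        }
        for u in names
    }


if __name__ == "__main__":
    # Hypothetical toy data: two quantitative variables and one categorical.
    toy = {
        "expression": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "age": [34, 51, 47, 62, 29, 58],
        "subtype": ["a", "a", "a", "b", "b", "b"],
    }
    for name, row in variable_dissimilarity(toy).items():
        print(name, {k: round(v, 3) for k, v in row.items()})
```

A matrix of this kind could then be passed to any hierarchical clustering routine or used to order the rows of a heatmap, which is the advantage of dissimilarity-based methods that the description points to.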
first_indexed 2024-12-13T09:28:28Z
format Article
id doaj.art-baf6f71e69c34a70a05be7103e3ad369
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-13T09:28:28Z
publishDate 2017-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-baf6f71e69c34a70a05be7103e3ad369
2022-12-21T23:52:33Z
eng
Public Library of Science (PLoS)
PLoS ONE
1932-6203
2017-01-01
12
11
e0188274
10.1371/journal.pone.0188274
Clustering of samples and variables with mixed-type data.
Manuela Hummel
Dominic Edelmann
Annette Kopp-Schneider
http://europepmc.org/articles/PMC5705083?pdf=render
title Clustering of samples and variables with mixed-type data.
url http://europepmc.org/articles/PMC5705083?pdf=render