M3C: Monte Carlo reference-based consensus clustering

Bibliographic Details
Main Authors: John, CR; Watson, D; Russ, D; Goldmann, K; Ehrenstein, M; Pitzalis, C; Lewis, M; Barnes, M
Format: Journal article
Language: English
Published: Springer 2020
Collection: OXFORD
Institution: University of Oxford
Record ID: oxford-uuid:1e5eebaf-d899-454d-b0f8-1f8a4539a54b
Record format: dspace
Resource type: http://purl.org/coar/resource_type/c_dcae04bc
Source: Symplectic Elements
Record timestamp: 2022-03-26T11:16:04Z
Indexed: 2024-03-06T19:33:51Z

Full description
Genome-wide data is used to stratify patients into classes for precision medicine using clustering algorithms. A common problem in this area is selection of the number of clusters (K). The Monti consensus clustering algorithm is a widely used method which uses stability selection to estimate K. However, the method has bias towards higher values of K and yields high numbers of false positives. As a solution, we developed Monte Carlo reference-based consensus clustering (M3C), which is based on this algorithm. M3C simulates null distributions of stability scores for a range of K values thus enabling a comparison with real data to remove bias and statistically test for the presence of structure. M3C corrects the inherent bias of consensus clustering as demonstrated on simulated and real expression data from The Cancer Genome Atlas (TCGA). For testing M3C, we developed clusterlab, a new method for simulating multivariate Gaussian clusters.
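The description states the idea of M3C only at a high level: score cluster stability for a range of K, simulate the same scores on structureless reference data, and compare the two. The Python sketch below illustrates that idea on toy data under several assumptions of our own, none taken from this record: k-means with farthest-first initialisation as the base clusterer; the proportion of ambiguous clustering (PAC, the share of consensus values strictly between 0.1 and 0.9) as the stability score; a single Gaussian with the data's mean and covariance as the cluster-free reference; an RCSI-style score (mean log reference PAC minus log real PAC) for choosing K; and a plain empirical Monte Carlo p-value. All function names (`simulate_clusters`, `kmeans`, `pac_score`, `gaussian_null`, `m3c_sketch`) are hypothetical, and `simulate_clusters` is only a stand-in for clusterlab, not its interface. This is not the authors' implementation.

```python
import math
import random


def simulate_clusters(centers, per_cluster, sd, rng):
    """Hypothetical stand-in for clusterlab: isotropic 2-D Gaussian blobs."""
    return [(cx + rng.gauss(0, sd), cy + rng.gauss(0, sd))
            for cx, cy in centers for _ in range(per_cluster)]


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means with farthest-first initialisation; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def pac_score(data, k, rng, resamples=20, frac=0.8, lo=0.1, hi=0.9):
    """Monti-style consensus clustering summarised by PAC (lower = more stable)."""
    n = len(data)
    together = [[0] * n for _ in range(n)]  # resamples in which i and j co-cluster
    sampled = [[0] * n for _ in range(n)]   # resamples containing both i and j
    for _ in range(resamples):
        idx = sorted(rng.sample(range(n), int(frac * n)))  # sorted, so idx[a] < idx[b]
        labels = kmeans([data[i] for i in idx], k, rng)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                sampled[idx[a]][idx[b]] += 1
                together[idx[a]][idx[b]] += labels[a] == labels[b]
    consensus = [together[i][j] / sampled[i][j]
                 for i in range(n) for j in range(i + 1, n) if sampled[i][j]]
    # PAC: share of pairs whose consensus value is ambiguous, i.e. strictly in (lo, hi)
    return sum(lo < c < hi for c in consensus) / len(consensus)


def gaussian_null(data, rng):
    """Assumed reference: one Gaussian with the data's mean and covariance, so no clusters."""
    n, d = len(data), len(data[0])
    mu = [sum(col) / n for col in zip(*data)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in data) / (n - 1)
            for b in range(d)] for a in range(d)]
    chol = [[0.0] * d for _ in range(d)]  # lower-triangular L with cov = L L^T
    for a in range(d):
        for b in range(a + 1):
            s = cov[a][b] - sum(chol[a][m] * chol[b][m] for m in range(b))
            chol[a][b] = math.sqrt(max(s, 1e-12)) if a == b else s / chol[b][b]
    sims = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(d)]
        sims.append(tuple(mu[a] + sum(chol[a][m] * z[m] for m in range(a + 1))
                          for a in range(d)))
    return sims


def m3c_sketch(data, ks=(2, 3, 4), n_null=5, seed=1, eps=1e-3):
    """Compare real PAC with Monte Carlo reference PACs for each K (simplified)."""
    rng = random.Random(seed)
    results = {}
    for k in ks:
        real = pac_score(data, k, rng)
        null = [pac_score(gaussian_null(data, rng), k, rng) for _ in range(n_null)]
        # RCSI-style score: mean log reference PAC minus log real PAC (eps avoids log(0))
        rcsi = sum(math.log(p + eps) for p in null) / n_null - math.log(real + eps)
        # empirical p-value (+1 corrected): reference runs at least as stable as the data
        pval = (1 + sum(p <= real for p in null)) / (1 + n_null)
        results[k] = {"pac": real, "rcsi": rcsi, "p": pval}
    best_k = max(results, key=lambda k: results[k]["rcsi"])
    return best_k, results


if __name__ == "__main__":
    rng = random.Random(0)
    blobs = simulate_clusters([(0, 0), (6, 0), (3, 5)], 15, 0.5, rng)
    best_k, res = m3c_sketch(blobs)
    for k, r in res.items():
        print(k, round(r["pac"], 3), round(r["rcsi"], 2), round(r["p"], 3))
    print("chosen K:", best_k)
```

On three well-separated blobs, the real PAC at K = 3 should be near zero while the Gaussian reference stays ambiguous, so K = 3 should receive the highest score. The design choice being illustrated is that K is chosen relative to a structureless reference rather than by the lowest raw stability score; a real analysis would use far more resamples and reference datasets than this toy setting.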