Robust, scalable, and informative clustering for diverse biological networks

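Abstract: Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm, SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.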

Bibliographic Details
Main Authors: Chris Gaiteri, David R. Connell, Faraz A. Sultan, Artemis Iatrou, Bernard Ng, Boleslaw K. Szymanski, Ada Zhang, Shinya Tasaki
Format: Article
Language: English
Published: BMC 2023-10-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-023-03062-0
collection DOAJ
description Abstract Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm—SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.
first_indexed 2024-03-09T15:08:41Z
format Article
id doaj.art-3f74f51a008c47ccbc239894876d10e3
institution Directory Open Access Journal
issn 1474-760X
language English
last_indexed 2024-03-09T15:08:41Z
spelling doaj.art-3f74f51a008c47ccbc239894876d10e3 2023-11-26T13:29:21Z eng BMC Genome Biology 1474-760X 2023-10-01 241127 10.1186/s13059-023-03062-0
Robust, scalable, and informative clustering for diverse biological networks
Author affiliations:
Chris Gaiteri (0): Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University
David R. Connell (1): Rush University Graduate College, Rush University Medical Center
Faraz A. Sultan (2): Rush Alzheimer’s Disease Center, Rush University Medical Center
Artemis Iatrou (3): Rush Alzheimer’s Disease Center, Rush University Medical Center
Bernard Ng (4): Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University
Boleslaw K. Szymanski (5): Department of Computer Science, Rensselaer Polytechnic Institute
Ada Zhang (6): Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University
Shinya Tasaki (7): Rush Alzheimer’s Disease Center, Rush University Medical Center