Robust, scalable, and informative clustering for diverse biological networks
Abstract Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering...
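Main Authors: | Chris Gaiteri, David R. Connell, Faraz A. Sultan, Artemis Iatrou, Bernard Ng, Boleslaw K. Szymanski, Ada Zhang, Shinya Tasaki |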
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-10-01 |
Series: | Genome Biology |
Online Access: | https://doi.org/10.1186/s13059-023-03062-0 |
_version_ | 1797452432192569344 |
---|---|
author | Chris Gaiteri, David R. Connell, Faraz A. Sultan, Artemis Iatrou, Bernard Ng, Boleslaw K. Szymanski, Ada Zhang, Shinya Tasaki |
author_facet | Chris Gaiteri, David R. Connell, Faraz A. Sultan, Artemis Iatrou, Bernard Ng, Boleslaw K. Szymanski, Ada Zhang, Shinya Tasaki |
author_sort | Chris Gaiteri |
collection | DOAJ |
description | Abstract Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm—SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications. |
first_indexed | 2024-03-09T15:08:41Z |
format | Article |
id | doaj.art-3f74f51a008c47ccbc239894876d10e3 |
institution | Directory of Open Access Journals |
issn | 1474-760X |
language | English |
last_indexed | 2024-03-09T15:08:41Z |
publishDate | 2023-10-01 |
publisher | BMC |
record_format | Article |
series | Genome Biology |
spelling | doaj.art-3f74f51a008c47ccbc239894876d10e3; 2023-11-26T13:29:21Z; eng; BMC; Genome Biology; 1474-760X; 2023-10-01; 241127; 10.1186/s13059-023-03062-0; Robust, scalable, and informative clustering for diverse biological networks; Chris Gaiteri [0]; David R. Connell [1]; Faraz A. Sultan [2]; Artemis Iatrou [3]; Bernard Ng [4]; Boleslaw K. Szymanski [5]; Ada Zhang [6]; Shinya Tasaki [7]; [0] Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University; [1] Rush University Graduate College, Rush University Medical Center; [2] Rush Alzheimer’s Disease Center, Rush University Medical Center; [3] Rush Alzheimer’s Disease Center, Rush University Medical Center; [4] Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University; [5] Department of Computer Science, Rensselaer Polytechnic Institute; [6] Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University; [7] Rush Alzheimer’s Disease Center, Rush University Medical Center; https://doi.org/10.1186/s13059-023-03062-0 |
title | Robust, scalable, and informative clustering for diverse biological networks |
url | https://doi.org/10.1186/s13059-023-03062-0 |