Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Abstract. Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell....


Bibliographic Details
Main Authors:	Olszewski Kellen L, Myers Chad L, Sahi Sauhard, Landis Jessica N, Flamholz Avi I, Huttenhower Curtis, Hibbs Matthew A, Siemers Nathan O, Troyanskaya Olga G, Coller Hilary A
Format:	Article
Language:	English
Published:	BMC 2007-07-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/250

_version_	1828219612154560512
author	Olszewski Kellen L Myers Chad L Sahi Sauhard Landis Jessica N Flamholz Avi I Huttenhower Curtis Hibbs Matthew A Siemers Nathan O Troyanskaya Olga G Coller Hilary A
author_facet	Olszewski Kellen L Myers Chad L Sahi Sauhard Landis Jessica N Flamholz Avi I Huttenhower Curtis Hibbs Matthew A Siemers Nathan O Troyanskaya Olga G Coller Hilary A
author_sort	Olszewski Kellen L
collection	DOAJ
description	Abstract. Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
first_indexed	2024-04-12T16:17:42Z
format	Article
id	doaj.art-df965bbc2d8e4f3a9e2a026d189f5e43
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-04-12T16:17:42Z
publishDate	2007-07-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-df965bbc2d8e4f3a9e2a026d189f5e43 2022-12-22T03:25:41Z eng BMC BMC Bioinformatics 1471-2105 2007-07-01 8 1 250 10.1186/1471-2105-8-250 Nearest Neighbor Networks: clustering expression data based on gene neighborhoods Olszewski Kellen L; Myers Chad L; Sahi Sauhard; Landis Jessica N; Flamholz Avi I; Huttenhower Curtis; Hibbs Matthew A; Siemers Nathan O; Troyanskaya Olga G; Coller Hilary A http://www.biomedcentral.com/1471-2105/8/250
spellingShingle	Olszewski Kellen L Myers Chad L Sahi Sauhard Landis Jessica N Flamholz Avi I Huttenhower Curtis Hibbs Matthew A Siemers Nathan O Troyanskaya Olga G Coller Hilary A Nearest Neighbor Networks: clustering expression data based on gene neighborhoods BMC Bioinformatics
title	Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
title_full	Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
title_fullStr	Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
title_full_unstemmed	Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
title_short	Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
title_sort	nearest neighbor networks clustering expression data based on gene neighborhoods
url	http://www.biomedcentral.com/1471-2105/8/250
work_keys_str_mv	AT olszewskikellenl nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT myerschadl nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT sahisauhard nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT landisjessican nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT flamholzavii nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT huttenhowercurtis nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT hibbsmatthewa nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT siemersnathano nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT troyanskayaolgag nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods AT collerhilarya nearestneighbornetworksclusteringexpressiondatabasedongeneneighborhoods
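
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the Results paragraph of the abstract describes NNN only in outline, as clusters built from overlapping cliques in a network of mutual nearest neighbors. The Python below shows one way that outline could be read. Everything specific in it is an assumption made for illustration: Pearson correlation as the similarity measure, the neighborhood size `n_neighbors`, cliques of size 3, the rule that cliques sharing at least one gene are merged, and all function names and example profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_neighbor_graph(profiles, n_neighbors):
    """Link two genes only if each ranks the other among its n nearest neighbors."""
    nearest = {}
    for g in profiles:
        # Assumption: "nearest" means highest Pearson correlation.
        ranked = sorted((h for h in profiles if h != g),
                        key=lambda h: pearson(profiles[g], profiles[h]),
                        reverse=True)
        nearest[g] = set(ranked[:n_neighbors])
    return {g: {h for h in nearest[g] if g in nearest[h]} for g in profiles}


def cluster(profiles, n_neighbors=2, clique_size=3):
    """Merge overlapping cliques of the mutual-neighbor graph into clusters.

    Assumption: two cliques "overlap" when they share at least one gene.
    Genes that belong to no clique are left unclustered.
    """
    graph = mutual_neighbor_graph(profiles, n_neighbors)
    cliques = [set(c) for c in combinations(sorted(graph), clique_size)
               if all(b in graph[a] for a, b in combinations(c, 2))]
    clusters = []
    for clique in cliques:
        # Absorb every existing cluster that shares a gene with this clique.
        overlapping = [c for c in clusters if c & clique]
        for c in overlapping:
            clusters.remove(c)
            clique = clique | c
        clusters.append(clique)
    return clusters


# Hypothetical expression profiles: two co-expressed groups and one outlier "x".
profiles = {
    "a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [1, 2, 3.2, 4.1],
    "d": [4, 3, 2, 1], "e": [8, 6, 4, 2.5], "f": [4.1, 3, 2.2, 1],
    "x": [1, -1, 1, -1],
}
clusters = cluster(profiles)
```

In this toy input the outlier `x` ranks `f` and `d` among its nearest neighbors, but neither ranks `x` in return, so it gains no edges and stays out of every cluster. That is the behavior the abstract attributes to the mutual-neighbor requirement.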


Similar Items