A comparison of four clustering methods for brain expression microarray data

Abstract. Background: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Cluster...


Bibliographic Details
Main Authors: Owen Michael J, O'Donovan Michael C, Holmans Peter, Richards Alexander L, Jones Lesley
Format: Article
Language: English
Published: BMC 2008-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/490
author Owen Michael J
O'Donovan Michael C
Holmans Peter
Richards Alexander L
Jones Lesley
collection DOAJ
description Abstract
Background: DNA microarrays, which measure the expression levels of tens of thousands of genes in a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpreting the results. Clustering genes by the similarity of their expression profiles can simplify the data and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods (CRC, k-means, ISA and memISA) are applied to three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed.
Results: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by those of memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters and lower speed. The clusters they find are largely different from those produced by k-means. Combining the clusters produced by k-means with those from memISA or ISA increases both GO enrichment and the number of clusters (compared to k-means alone) without reducing gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are enriched for genes associated with schizophrenia, for genes differentially expressed in schizophrenia, or for both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia.
Conclusion: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets. However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the approach of choice.
first_indexed 2024-12-11T12:44:18Z
format Article
id doaj.art-3cb30db47f16460984c1b7341f84ded7
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T12:44:18Z
publishDate 2008-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-3cb30db47f16460984c1b7341f84ded7; 2022-12-22T01:06:52Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-11-01; 9; 1; 490; 10.1186/1471-2105-9-490; A comparison of four clustering methods for brain expression microarray data; Owen Michael J; O'Donovan Michael C; Holmans Peter; Richards Alexander L; Jones Lesley; http://www.biomedcentral.com/1471-2105/9/490
title A comparison of four clustering methods for brain expression microarray data
url http://www.biomedcentral.com/1471-2105/9/490
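
The abstract compares cluster sets on gene coverage and GO enrichment, and on what happens when sets from different methods are combined. The following is a minimal sketch of those two measures, not the authors' implementation. The abstract does not say which enrichment statistic the paper uses, so the one-sided hypergeometric test here is an assumption (it is a common choice for this kind of analysis). The function names, the gene identifiers and the toy clusters are all hypothetical.

```python
from math import comb


def gene_coverage(clusters, all_genes):
    """Fraction of genes in all_genes that appear in at least one cluster."""
    covered = set().union(*clusters) if clusters else set()
    return len(covered & all_genes) / len(all_genes)


def enrichment_p_value(cluster, annotated, all_genes):
    """One-sided hypergeometric P(X >= k): the chance of seeing at least k
    annotated genes in a random gene set of the same size as the cluster.
    ASSUMPTION: the paper's actual enrichment statistic may differ."""
    N = len(all_genes)                        # genes on the array
    K = len(annotated & all_genes)            # genes carrying the annotation
    n = len(cluster & all_genes)              # genes in the cluster
    k = len(cluster & annotated & all_genes)  # annotated genes in the cluster
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy data, invented purely for illustration.
all_genes = {f"g{i}" for i in range(1, 11)}

# A partition of every gene, as a k-means-style method would produce.
partition_clusters = [
    {"g1", "g2", "g3", "g4", "g5"},
    {"g6", "g7", "g8", "g9", "g10"},
]

# Small overlapping modules that leave most genes unassigned,
# as a biclustering-style method might produce.
module_clusters = [{"g1", "g2", "g3"}, {"g3", "g4"}]

# Combining cluster sets is simply pooling them.
combined_clusters = partition_clusters + module_clusters

# Hypothetical annotation category containing three genes.
annotated = {"g1", "g2", "g3"}

coverage_partition = gene_coverage(partition_clusters, all_genes)
coverage_modules = gene_coverage(module_clusters, all_genes)
coverage_combined = gene_coverage(combined_clusters, all_genes)

p_partition = enrichment_p_value(partition_clusters[0], annotated, all_genes)
p_module = enrichment_p_value(module_clusters[0], annotated, all_genes)

print(coverage_partition, coverage_modules, coverage_combined)
print(p_partition, p_module)
```

In this toy case the small module has the stronger enrichment but covers few genes on its own, while pooling it with the partition keeps full coverage. That mirrors the trade-off the abstract describes, although the numbers themselves carry no biological meaning.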