Combining gene expression data from different generations of oligonucleotide arrays


Bibliographic Details
Main Authors: Kong Sek, Hwang Kyu-Baek, Greenberg Steve A, Park Peter J
Format: Article
Language: English
Published: BMC 2004-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/159
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background
One of the important challenges in microarray analysis is to take full advantage of previously accumulated data, both from one's own laboratory and from public repositories. Through a comparative analysis of a variety of datasets, a more comprehensive view of the underlying mechanism or structure can be obtained. However, as we show in this work, continual changes in genomic sequence annotations and probe design criteria make it difficult to compare gene expression data even across different generations of the same microarray platform.

Results
We first describe the extent of discordance between the results derived from two generations of Affymetrix oligonucleotide arrays, as revealed by cluster analysis and by the identification of differentially expressed genes. We then propose a method for increasing comparability. The dataset we use consists of 14 human muscle biopsy samples from patients with inflammatory myopathies, each hybridized on both HG-U95Av2 and HG-U133A human arrays. We find that matching with the probe set matching table provided by Affymetrix for comparative analysis produces better results than matching by UniGene or LocusLink identifiers, but it is still inadequate. Rescaling the expression values of each gene across samples and filtering data by expression value enhance comparability, but only for a few specific analyses. As a generic method for improving comparability, we select a subset of probes with overlapping sequence segments in the two array types and recalculate expression values based only on the selected probes. We show that this probe filtering significantly improves comparability while retaining a sufficient number of probe sets for further analysis.

Conclusions
Compatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively.
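The abstract describes the probe-filtering step only in outline. The Python sketch below illustrates one way it could work, under assumptions that are not taken from the paper: each probe is given a hypothetical start coordinate on a shared reference transcript, two 25-base probes "overlap" if they share at least one base, and a probe set's expression is recalculated as the mean log2 intensity of the retained probes (a simple stand-in, not MAS5 or RMA). All names (`Probe`, `select_overlapping`, `summarize`, `rescale`, `pearson`) are hypothetical, and the toy intensities are made up for illustration.

```python
from dataclasses import dataclass
from math import log2, sqrt
from statistics import mean, pstdev

PROBE_LEN = 25  # Affymetrix GeneChip probes are 25-mers


@dataclass
class Probe:
    start: int                # hypothetical: 0-based start on a shared reference transcript
    intensities: list[float]  # raw intensity per sample (same sample order on both arrays)


def overlaps(a_start: int, b_start: int, min_shared: int = 1) -> bool:
    """True if two 25-base probe intervals share at least `min_shared` bases."""
    shared = min(a_start, b_start) + PROBE_LEN - max(a_start, b_start)
    return shared >= min_shared


def select_overlapping(probes_a, probes_b, min_shared=1):
    """Keep only probes whose target segment overlaps some probe on the other array."""
    keep_a = [p for p in probes_a
              if any(overlaps(p.start, q.start, min_shared) for q in probes_b)]
    keep_b = [q for q in probes_b
              if any(overlaps(q.start, p.start, min_shared) for p in probes_a)]
    return keep_a, keep_b


def summarize(probes):
    """Recalculate probe-set expression per sample as the mean log2 intensity.

    Assumption: a deliberately simple summarization, not the one used in the paper.
    """
    if not probes:
        raise ValueError("no probes left after filtering")
    n_samples = len(probes[0].intensities)
    return [mean(log2(p.intensities[s]) for p in probes) for s in range(n_samples)]


def rescale(values):
    """Standardize one gene's values across samples to mean 0 and unit variance."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Made-up data: one gene, 4 samples, two probes per array type.
old_array = [Probe(0, [2, 4, 8, 16]), Probe(100, [16, 2, 8, 4])]
new_array = [Probe(10, [8, 16, 32, 64]), Probe(300, [4, 4, 4, 4])]

keep_old, keep_new = select_overlapping(old_array, new_array)
r_unfiltered = pearson(summarize(old_array), summarize(new_array))
r_filtered = pearson(summarize(keep_old), summarize(keep_new))
print(f"unfiltered r = {r_unfiltered:.3f}, filtered r = {r_filtered:.3f}")
# prints: unfiltered r = 0.548, filtered r = 1.000
```

On this toy example, only the probes starting at 0 and 10 overlap, and dropping the others raises the cross-array correlation of the gene's profile. Note that per-gene rescaling leaves a Pearson correlation unchanged; it matters when values from both arrays are pooled, for example in a cluster analysis based on Euclidean distance.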

ISSN: 1471-2105
DOI: 10.1186/1471-2105-5-159
Citation: BMC Bioinformatics 2004, 5:159