Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data


Bibliographic Details
Main Authors: Kilpinen Sami K, Ojala Kalle A, Kallioniemi Olli P
Format: Article
Language: English
Published: BMC 2011-03-01
Series: BioData Mining
Online Access: http://www.biodatamining.org/content/4/1/5
collection DOAJ
description Abstract

Background: Gene expression microarray data have been organized and made available as public databases, but the utilization of such highly heterogeneous reference datasets in the interpretation of data from individual test samples is not as developed as e.g. in the field of nucleotide sequence comparisons. We have created a rapid and powerful approach for the alignment of microarray gene expression profiles (AGEP) from test samples with those contained in a large annotated public reference database and demonstrate here how this can facilitate interpretation of microarray data from individual samples.

Methods: AGEP is based on the calculation of kernel density distributions for the levels of expression of each gene in each reference tissue type and provides a quantitation of the similarity between the test sample and the reference tissue types as well as the identity of the typical and atypical genes in each comparison. As a reference database, we used 1654 samples from 44 normal tissues (extracted from the Genesapiens database).

Results: Using leave-one-out validation, AGEP correctly defined the tissue of origin for 1521 (93.6%) of all the 1654 samples in the original database. Independent validation of 195 external normal tissue samples resulted in 87% accuracy for the exact tissue type and 97% accuracy with related tissue types. AGEP analysis of 10 Duchenne muscular dystrophy (DMD) samples provided quantitative description of the key pathogenetic events, such as the extent of inflammation, in individual samples and pinpointed tissue-specific genes whose expression changed (SAMD4A) in DMD. AGEP analysis of microarray data from adipocytic differentiation of mesenchymal stem cells and from normal myeloid cell types and leukemias provided quantitative characterization of the transcriptomic changes during normal and abnormal cell differentiation.

Conclusions: The AGEP method is a widely applicable method for the rapid comprehensive interpretation of microarray data, as proven here by the definition of tissue- and disease-specific changes in gene expression as well as during cellular differentiation. The capability to quantitatively compare data from individual samples against a large-scale annotated reference database represents a widely applicable paradigm for the analysis of all types of high-throughput data. AGEP enables systematic and quantitative comparison of gene expression data from test samples against a comprehensive collection of different cell/tissue types previously studied by the entire research community.
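The Methods paragraph of the abstract describes per-gene, per-tissue kernel density estimates and a similarity score between a test sample and each reference tissue. The following is a minimal sketch of that general idea, not the authors' AGEP implementation: the abstract does not state the kernel, the bandwidth, or the similarity formula, so the Gaussian kernel, Silverman's rule-of-thumb bandwidth, the mean log-density score, the `threshold` for "atypical" genes, and all gene names and expression values below are illustrative assumptions.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a density function for `values` using a Gaussian kernel."""
    n = len(values)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; the abstract names no bandwidth).
        spread = statistics.stdev(values) if n > 1 else 0.0
        bandwidth = 1.06 * spread * n ** (-1 / 5)
        if bandwidth <= 0:
            bandwidth = 1.0  # fallback for constant or single-value input
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def build_models(reference):
    """Map {tissue: {gene: [values]}} to {tissue: {gene: density function}}."""
    return {
        tissue: {gene: gaussian_kde(vals) for gene, vals in genes.items()}
        for tissue, genes in reference.items()
    }


def score_sample(sample, models, floor=1e-12):
    """Mean log density of the sample's genes under each tissue (hypothetical score)."""
    scores = {}
    for tissue, gene_models in models.items():
        shared = [g for g in sample if g in gene_models]
        if not shared:
            continue  # no genes in common with this tissue
        logs = [math.log(max(gene_models[g](sample[g]), floor)) for g in shared]
        scores[tissue] = sum(logs) / len(logs)
    return scores


def atypical_genes(sample, gene_models, threshold=0.01):
    """Genes whose observed level has density below `threshold` in the tissue."""
    return sorted(
        g for g in sample
        if g in gene_models and gene_models[g](sample[g]) < threshold
    )


# Toy reference with made-up genes and log-scale expression values.
reference = {
    "muscle": {"GENE_A": [9.8, 10.1, 10.4, 9.9], "GENE_B": [2.1, 1.9, 2.3, 2.0]},
    "liver": {"GENE_A": [3.0, 3.2, 2.9, 3.1], "GENE_B": [8.0, 8.2, 7.9, 8.1]},
}
sample = {"GENE_A": 10.0, "GENE_B": 5.0}

models = build_models(reference)
scores = score_sample(sample, models)
best = max(scores, key=scores.get)
odd = atypical_genes(sample, models[best])
print(best, odd)
```

In this toy case the sample matches the hypothetical "muscle" reference on GENE_A while GENE_B sits far outside that tissue's distribution, which mirrors the abstract's distinction between an overall tissue match and the individual genes that are typical or atypical within it.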
first_indexed 2024-04-13T06:08:07Z
format Article
id doaj.art-aa4d556dcf24497da30c797e90ec8cc5
institution Directory Open Access Journal
issn 1756-0381
language English
last_indexed 2024-04-13T06:08:07Z
publishDate 2011-03-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-aa4d556dcf24497da30c797e90ec8cc5 | 2022-12-22T02:59:11Z | eng | BMC | BioData Mining | 1756-0381 | 2011-03-01 | 4 | 1 | 5 | 10.1186/1756-0381-4-5 | Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data | Kilpinen Sami K | Ojala Kalle A | Kallioniemi Olli P | http://www.biodatamining.org/content/4/1/5
title Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data
url http://www.biodatamining.org/content/4/1/5