Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution
Abstract. Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single ...
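Main Authors: | Kleinjans Jos, van Delft Joost, Bonassi Stefano, Gmuender Hans, van Leeuwen Danitsja, Moretti Stefano, Patrone Fioravante, Merlo Domenico |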
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2008-09-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/9/361 |
_version_ | 1811298529346322432 |
---|---|
author | Kleinjans Jos, van Delft Joost, Bonassi Stefano, Gmuender Hans, van Leeuwen Danitsja, Moretti Stefano, Patrone Fioravante, Merlo Domenico |
author_facet | Kleinjans Jos, van Delft Joost, Bonassi Stefano, Gmuender Hans, van Leeuwen Danitsja, Moretti Stefano, Patrone Fioravante, Merlo Domenico |
author_sort | Kleinjans Jos |
collection | DOAJ |
description | Abstract. Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes that each have a sufficiently low p-value. However, interpreting each single p-value within complex systems involving several interacting genes is problematic. In parallel, over the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data set. This method, called Comparative Analysis of Shapley value (CASh for short), is applied to data on gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with those of a parametric statistical test for differential gene expression. Both lists of genes, the one provided by CASh and the one provided by the t-test, are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected by both CASh and the parametric test, the biological interpretation of the differences between the two selections turns out to be more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for detecting differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data set where the joint expression behavior of genes may be critical to characterizing the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways. |
first_indexed | 2024-04-13T06:20:41Z |
format | Article |
id | doaj.art-24fbb2c234fb4b8b9b2e5b23b390e2ac |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-13T06:20:41Z |
publishDate | 2008-09-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
title | Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution |
url | http://www.biomedcentral.com/1471-2105/9/361 |
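The abstract describes a bootstrap test of whether a gene's Shapley value differs between two conditions. The record does not give the definition of the coalitional game or the details of the resampling scheme, so the following is only a minimal sketch under stated assumptions, not the CASh procedure itself. It assumes that each sample is summarized as the set of genes flagged as abnormally expressed, that a coalition's value is the share of samples whose non-empty gene set lies inside the coalition, and that the null distribution is obtained by resampling from the pooled samples. The names `make_game`, `shapley_values`, `relevance_difference`, and `bootstrap_p_values`, as well as the toy data, are hypothetical.

```python
import random
from itertools import permutations


def make_game(samples):
    """Assumed game: value of a coalition is the share of samples whose
    non-empty set of flagged genes is contained in the coalition."""
    n = len(samples)

    def value(coalition):
        return sum(1 for s in samples if s and s <= coalition) / n

    return value


def shapley_values(genes, value):
    """Exact Shapley values: average marginal contribution over all orderings.
    Only feasible for a handful of genes, which is enough for a toy example."""
    totals = {g: 0.0 for g in genes}
    orderings = list(permutations(genes))
    for order in orderings:
        coalition = frozenset()
        for g in order:
            grown = coalition | {g}
            totals[g] += value(grown) - value(coalition)
            coalition = grown
    return {g: totals[g] / len(orderings) for g in genes}


def relevance_difference(genes, group_a, group_b):
    """Per-gene difference in Shapley value between two conditions."""
    phi_a = shapley_values(genes, make_game(group_a))
    phi_b = shapley_values(genes, make_game(group_b))
    return {g: phi_a[g] - phi_b[g] for g in genes}


def bootstrap_p_values(genes, group_a, group_b, n_boot=200, seed=0):
    """Assumed resampling scheme: draw both groups with replacement from the
    pooled samples and count how often the resampled difference is at least
    as large in absolute value as the observed one."""
    rng = random.Random(seed)
    observed = relevance_difference(genes, group_a, group_b)
    pooled = list(group_a) + list(group_b)
    exceed = {g: 0 for g in genes}
    for _ in range(n_boot):
        boot_a = [rng.choice(pooled) for _ in group_a]
        boot_b = [rng.choice(pooled) for _ in group_b]
        diff = relevance_difference(genes, boot_a, boot_b)
        for g in genes:
            if abs(diff[g]) >= abs(observed[g]) - 1e-12:
                exceed[g] += 1
    return {g: exceed[g] / n_boot for g in genes}


# Hypothetical toy data: each set lists the genes flagged in one sample.
genes = ["g1", "g2", "g3"]
exposed = [{"g1"}, {"g1", "g2"}, {"g2", "g3"}, set()]
control = [{"g3"}, set(), {"g3"}, {"g2", "g3"}]

phi_exposed = shapley_values(genes, make_game(exposed))
p_values = bootstrap_p_values(genes, exposed, control)
```

Enumerating all orderings grows factorially with the number of genes, so a real microarray analysis would need a closed-form or sampled approximation of the Shapley value; the enumeration is used here only because it mirrors the textbook definition directly.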