Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges

Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results...

Bibliographic Details
Main Authors: Samarendra Das, Craig J. McClain, Shesh N. Rai
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Entropy
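Subjects: gene set analysis; microarrays; RNA-sequencing; genome wide association study; competitive; self-contained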
Online Access: https://www.mdpi.com/1099-4300/22/4/427
_version_ 1797571000469028864
author Samarendra Das
Craig J. McClain
Shesh N. Rai
author_facet Samarendra Das
Craig J. McClain
Shesh N. Rai
author_sort Samarendra Das
collection DOAJ
description Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
first_indexed 2024-03-10T20:33:23Z
format Article
id doaj.art-d9ae74a811be4d41bfc7e8d1595eb177
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-10T20:33:23Z
publishDate 2020-04-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-d9ae74a811be4d41bfc7e8d1595eb177 2023-11-19T21:14:22Z eng MDPI AG Entropy 1099-4300 2020-04-01 22 4 427 10.3390/e22040427
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
Samarendra Das (0), Craig J. McClain (1), Shesh N. Rai (2)
(0) Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
(1) Department of Medicine, University of Louisville, Louisville, KY 40202, USA
(2) School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
https://www.mdpi.com/1099-4300/22/4/427
gene set analysis; microarrays; RNA-sequencing; genome wide association study; competitive; self-contained
title Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
topic gene set analysis
microarrays
RNA-sequencing
genome wide association study
competitive
self-contained
url https://www.mdpi.com/1099-4300/22/4/427