NGS allele counts versus called genotypes for testing genetic association

Bibliographic Details
Main Authors: Rosa González Silos, Christine Fischer, Justo Lorenzo Bermejo
Format: Article
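Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Genotype calling; Next generation sequencing; Allele counts; Genetic association tests; Statistical power
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022002951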
author Rosa González Silos
Christine Fischer
Justo Lorenzo Bermejo
collection DOAJ
description RNA sequence data are commonly summarized as read counts. By contrast, there has so far been no alternative to genotype calling for investigating the relationship between genetic variants determined by next-generation sequencing (NGS) and a phenotype of interest. Here we propose and evaluate the direct analysis of allele counts for genetic association tests. Specifically, we assess the potential advantage over called genotypes of the ratio of alternative allele counts to coverage, that is, the total number of reads aligned at a specific position of the genome. We simulated association studies based on NGS data from HapMap individuals. Genotype quality scores and allele counts were simulated using NGS data from the Personal Genome Project. Real data from the 1000 Genomes Project were also used to compare the two competing approaches. In the null scenario, the average proportion of probability values lower than or equal to 0.05 was 0.0496 for called genotypes and 0.0485 for the ratio of alternative allele counts to coverage. In the alternative scenario, it was 0.69 for called genotypes and 0.75 for the ratio of alternative allele counts to coverage (a 9% relative power increase). The advantage in statistical power of the novel approach grew with decreasing coverage, with decreasing genotype quality and with decreasing allele frequency, with a 124% power increase for variants with a minor allele frequency lower than 0.05. We provide computer code in R to implement the novel approach, which does not preclude the use of complementary data quality filters before or after identification of the most promising association signals.
Author summary: Genetic association tests usually rely on called genotypes. We postulate here that the direct analysis of allele counts from sequence data improves the quality of statistical inference. To evaluate this hypothesis, we investigate simulated and real data using distinct statistical approaches. We demonstrate that association tests based on allele counts rather than called genotypes achieve higher statistical power with controlled type I error rates.
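The predictor described in the abstract, the ratio of alternative allele counts to coverage, can be illustrated with a minimal Python sketch. This is not the authors' R code, and the record does not say which association tests the article uses; a simple linear regression t statistic is swapped in here purely for illustration. The function names (`allele_ratio`, `pearson_r`, `association_t`) and all toy values are hypothetical.

```python
from math import sqrt

def allele_ratio(alt_count, coverage):
    """Alternative allele reads divided by coverage; None if no reads cover the site."""
    if coverage <= 0:
        return None
    return alt_count / coverage

def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def association_t(x, y):
    """t statistic (n - 2 degrees of freedom) for the slope of y regressed on x.

    Assumed stand-in test, not taken from the article.
    """
    r = pearson_r(x, y)
    return r * sqrt((len(x) - 2) / (1 - r * r))

# Hypothetical toy data for six individuals at one variant position.
alt_counts = [0, 1, 3, 4, 9, 10]       # reads carrying the alternative allele
coverages = [8, 10, 6, 10, 9, 10]      # total reads aligned at the position
called_genotypes = [0, 0, 1, 1, 2, 2]  # hypothetical genotype calls (0/1/2)
phenotype = [1.0, 1.3, 2.1, 1.8, 3.2, 2.9]

# Keep only individuals with at least one read at the position.
rows = [
    (allele_ratio(a, c), g, p)
    for a, c, g, p in zip(alt_counts, coverages, called_genotypes, phenotype)
]
rows = [row for row in rows if row[0] is not None]

ratios = [row[0] for row in rows]
genotypes = [row[1] for row in rows]
outcomes = [row[2] for row in rows]

# Same test, two competing predictors.
t_ratio = association_t(ratios, outcomes)
t_genotype = association_t(genotypes, outcomes)
print(f"t (allele-count ratio): {t_ratio:.2f}")
print(f"t (called genotype):    {t_genotype:.2f}")
```

The ratio keeps read-level information that a hard genotype call discards: in the toy data, the second individual has 1 alternative read out of 10 and is called 0, while the ratio records 0.1. The toy values say nothing about the relative power of the two approaches.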
first_indexed 2024-04-11T05:19:44Z
format Article
id doaj.art-a150a0dc7f72477ea63ddeccb9beef9f
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:19:44Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-a150a0dc7f72477ea63ddeccb9beef9f
2022-12-24T04:53:25Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01
Volume 20, pages 3729-3733
NGS allele counts versus called genotypes for testing genetic association
Rosa González Silos (Institute of Medical Biometry, University of Heidelberg, 69120, Germany)
Christine Fischer (Institute of Human Genetics, University of Heidelberg, 69120, Germany)
Justo Lorenzo Bermejo (Institute of Medical Biometry, University of Heidelberg, 69120, Germany; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2001037022002951
Genotype calling
Next generation sequencing
Allele counts
Genetic association tests
Statistical power
title NGS allele counts versus called genotypes for testing genetic association
topic Genotype calling
Next generation sequencing
Allele counts
Genetic association tests
Statistical power
url http://www.sciencedirect.com/science/article/pii/S2001037022002951