Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation

Bibliographic Details
Main Authors: Flannick, Jason; Korn, Joshua M.; Fontanillas, Pierre; Grant, George B.; DePristo, Mark A.; Altshuler, David; Banks, Eric, 1976-
Other Authors: Harvard University--MIT Division of Health Sciences and Technology
Format: Article
Language: en_US
Published: Public Library of Science 2012
Online Access: http://hdl.handle.net/1721.1/72418
https://orcid.org/0000-0002-7250-4107
Collection: MIT
Description: High-coverage whole-genome sequencing provides near-complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower-coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low-coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome-wide association studies and supports the development of improved methods for genotype calling.
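The abstract describes estimating genotypes jointly from sequence reads, array intensities, and imputation. The sketch below is not the authors' framework. It illustrates only the generic Bayesian idea behind such joint calling: multiply per-genotype likelihoods from each data source by a genotype prior, then normalize. The sketch rests on the following assumptions, all hypothetical:

- A single biallelic site.
- Data sources that are conditionally independent given the genotype.
- A simple per-read error model.
- A made-up one-dimensional Gaussian cluster model for array intensity.
- Imputation output used directly as the genotype prior.

The function names, cluster means, error rate, and example numbers are illustrative only.

```python
import math
from typing import List, Sequence

# Genotype coded as the number of non-reference alleles at a biallelic site.
GENOTYPES = (0, 1, 2)


def read_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> List[float]:
    """P(reads | G) for each genotype under a simple per-read error model.

    Assumption: each read independently shows the alternate allele with
    probability (G/2)(1 - error) + (1 - G/2) * error. The binomial
    coefficient is omitted because it is identical for every genotype and
    cancels when the posterior is normalized.
    """
    likelihoods = []
    for g in GENOTYPES:
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def array_likelihoods(contrast: float,
                      means: Sequence[float] = (-1.0, 0.0, 1.0),
                      sd: float = 0.2) -> List[float]:
    """P(intensity | G) under a hypothetical 1-D Gaussian cluster model.

    `contrast` stands in for an allele-intensity contrast; the cluster means
    and shared standard deviation are made-up values. The Gaussian
    normalizing constant is dropped because all genotypes share it.
    """
    return [math.exp(-0.5 * ((contrast - m) / sd) ** 2) for m in means]


def joint_posterior(prior: Sequence[float],
                    *likelihoods: Sequence[float]) -> List[float]:
    """Multiply a genotype prior by each likelihood vector and normalize.

    Assumes the data sources are conditionally independent given the genotype.
    """
    unnormalized = list(prior)
    for lik in likelihoods:
        unnormalized = [u * l for u, l in zip(unnormalized, lik)]
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("evidence is incompatible with every genotype")
    return [u / total for u in unnormalized]


# Usage: an imputation prior favoring homozygous reference, three reads at
# this site (2 reference, 1 alternate), and an array intensity near the
# heterozygous cluster.
prior = [0.90, 0.09, 0.01]
reads = read_likelihoods(n_ref=2, n_alt=1)
intensity = array_likelihoods(contrast=0.05)

for label, post in [("prior only", joint_posterior(prior)),
                    ("prior + reads", joint_posterior(prior, reads)),
                    ("prior + reads + array", joint_posterior(prior, reads, intensity))]:
    call = max(GENOTYPES, key=lambda g: post[g])
    print(f"{label}: call G={call}, P={post[call]:.3f}")
```

With the prior alone, homozygous reference is called. Adding one alternate read out of three shifts the call to heterozygous, with a posterior of roughly 0.56. Adding an array intensity near the heterozygous cluster pushes that posterior above 0.99.

In practice, the prior would have to come from data that do not already include the same reads or array intensities. Otherwise, the same evidence would be counted twice.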
Institution: Massachusetts Institute of Technology
Citation: Flannick, Jason et al. "Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation." Ed. Jan Korbel. PLoS Computational Biology 8.7 (2012): e1002604.
DOI: http://dx.doi.org/10.1371/journal.pcbi.1002604
Journal: PLoS Computational Biology (ISSN 1553-734X, 1553-7358)
Departments: Harvard University--MIT Division of Health Sciences and Technology; Massachusetts Institute of Technology. Department of Biology
Contributors: Altshuler, David; Korn, Joshua M.
Dates: 2012-03; 2012-07 (issued); deposited 2012-08-29
Type: Journal Article (http://purl.org/eprint/type/JournalArticle)
License: Creative Commons Attribution (http://creativecommons.org/licenses/by/2.5/)
File format: application/pdf