Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as m...
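Main Authors: | Flannick, Jason; Korn, Joshua M.; Fontanillas, Pierre; Grant, George B.; Depristo, Mark A.; Altshuler, David; Banks, Eric, 1976- |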
---|---|
Other Authors: | Harvard University--MIT Division of Health Sciences and Technology |
Format: | Article |
Language: | en_US |
Published: | Public Library of Science 2012 |
Online Access: | http://hdl.handle.net/1721.1/72418 https://orcid.org/0000-0002-7250-4107 |
_version_ | 1826205027188867072 |
---|---|
author | Flannick, Jason Korn, Joshua M. Fontanillas, Pierre Grant, George B. Depristo, Mark A. Altshuler, David Banks, Eric, 1976- |
author2 | Harvard University--MIT Division of Health Sciences and Technology |
author_sort | Flannick, Jason |
collection | MIT |
description | High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF <5%), when low coverage sequence reads are added to dense genome-wide SNP arrays — the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling. |
first_indexed | 2024-09-23T13:05:54Z |
format | Article |
id | mit-1721.1/72418 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T13:05:54Z |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | dspace |
spelling | mit-1721.1/72418 2022-09-28T11:58:50Z Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation Flannick, Jason Korn, Joshua M. Fontanillas, Pierre Grant, George B. Depristo, Mark A. Altshuler, David Banks, Eric, 1976- Harvard University--MIT Division of Health Sciences and Technology Massachusetts Institute of Technology. Department of Biology Altshuler, David Korn, Joshua M. Altshuler, David High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF <5%), when low coverage sequence reads are added to dense genome-wide SNP arrays — the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling. 2012-08-29T15:25:17Z 2012-08-29T15:25:17Z 2012-07 2012-03 Article http://purl.org/eprint/type/JournalArticle 1553-734X 1553-7358 http://hdl.handle.net/1721.1/72418 Flannick, Jason et al. “Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation.” Ed. Jan Korbel. PLoS Computational Biology 8.7 (2012): e1002604. https://orcid.org/0000-0002-7250-4107 en_US http://dx.doi.org/10.1371/journal.pcbi.1002604 PLoS Computational Biology Creative Commons Attribution http://creativecommons.org/licenses/by/2.5/ application/pdf Public Library of Science PLoS |
title | Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation |
url | http://hdl.handle.net/1721.1/72418 https://orcid.org/0000-0002-7250-4107 |