A robust clustering algorithm for identifying problematic samples in genome-wide association studies
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequen...
Main Authors: | Bellenguez, C, Strange, A, Freeman, C, Wellcome Trust Case Control Consortium 2, Donnelly, P, Spencer, C |
---|---|
Other Authors: | The International Society for Computational Biology |
Format: | Journal article |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | Statistics (see also social sciences), Genetics (medical sciences) |
_version_ | 1797085847030333440 |
---|---|
author | Bellenguez, C Strange, A Freeman, C Wellcome Trust Case Control Consortium 2 Donnelly, P Spencer, C |
author2 | The International Society for Computational Biology |
collection | OXFORD |
description | High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental array can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differs from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. |
first_indexed | 2024-03-07T02:13:43Z |
format | Journal article |
id | oxford-uuid:a18401ce-7a9b-43b1-9cce-235e40300b2c |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T02:13:43Z |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | dspace |
title | A robust clustering algorithm for identifying problematic samples in genome-wide association studies |
topic | Statistics (see also social sciences) Genetics (medical sciences) |