Genotype imputation using the Positional Burrows Wheeler Transform.

Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor alle...


Bibliographic Details
Main Authors:	Simone Rubinacci, Olivier Delaneau, Jonathan Marchini
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-11-01
Series:	PLoS Genetics
Online Access:	https://doi.org/10.1371/journal.pgen.1009049

collection	DOAJ
description	Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best-matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both of these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
id	doaj.art-f8b8159f04d74309b2555712da98c901
institution	Directory Open Access Journal
issn	1553-7390 1553-7404
spelling	doaj.art-f8b8159f04d74309b2555712da98c901; 2022-12-21T22:40:12Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2020-11-01; 16(11): e1009049; 10.1371/journal.pgen.1009049

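
Illustrative sketch (not part of the catalogued record): the description above says that IMPUTE5 uses the PBWT at genotyped markers to find locally best-matching haplotypes. The Python below is only a rough illustration of that idea and is not IMPUTE5's code. It builds the positional prefix ordering with a stable zero/one partition at each marker, then collects the haplotypes that sit next to the target in that ordering. The function names, the binary-allele encoding, and the rule of taking a fixed number of neighbours on each side at every marker are all assumptions made for this example.

```python
from typing import List, Set


def pbwt_orders(haps: List[List[int]]) -> List[List[int]]:
    """For each marker k, return haplotype indices sorted by their
    reversed prefix ending at marker k (positional prefix order)."""
    n_markers = len(haps[0]) if haps else 0
    order = list(range(len(haps)))
    orders = []
    for k in range(n_markers):
        # Stable partition: haplotypes carrying allele 0 first, then allele 1.
        zeros = [h for h in order if haps[h][k] == 0]
        ones = [h for h in order if haps[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def select_neighbours(reference: List[List[int]],
                      target: List[int],
                      n_each_side: int = 1) -> Set[int]:
    """Collect reference haplotypes adjacent to the target in the
    prefix order at any marker (hypothetical selection rule)."""
    panel = reference + [target]      # target is appended as the last row
    target_idx = len(reference)
    selected: Set[int] = set()
    for order in pbwt_orders(panel):
        pos = order.index(target_idx)
        lo = max(0, pos - n_each_side)
        hi = min(len(order), pos + n_each_side + 1)
        selected.update(h for h in order[lo:hi] if h != target_idx)
    return selected


if __name__ == "__main__":
    reference = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
    target = [0, 1, 0, 1]
    print(sorted(select_neighbours(reference, target)))
```

Haplotypes that are adjacent in the prefix order at a marker share the longest match ending at that marker, which is why neighbours of the target are natural candidates for a per-individual subset. This naive version re-partitions the whole panel for every call and searches for the target linearly, so it does not reflect the scaling reported in the description.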

Similar Items