Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance

Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individual...

Bibliographic Details
Main Authors: Thibault Dekeyser, Emmanuelle Génin, Anthony F. Herzig
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Genes
Subjects: genotype imputation, population genetics, rare variants, reference panel, admixture
Online Access: https://www.mdpi.com/2073-4425/14/2/410
author Thibault Dekeyser
Emmanuelle Génin
Anthony F. Herzig
collection DOAJ
description Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individuals who require missing genotype imputation. However, it is broadly accepted that such an imputation panel will have an enhanced performance with the inclusion of diversity (haplotypes from many different populations). We investigate this observation by examining, in fine detail, exactly which reference haplotypes are contributing at different regions of the genome. This is achieved using a novel method of inserting synthetic genetic variation into the reference panel in order to track the performance of leading imputation algorithms. We show that while diversity may globally improve imputation accuracy, there can be occasions where incorrect genotypes are imputed following the inclusion of more diverse haplotypes in the reference panel. We, however, demonstrate a technique for retaining and benefitting from the diversity in the reference panel whilst avoiding the occasional adverse effects on imputation accuracy. What is more, our results more clearly elucidate the role of diversity in a reference panel than has been shown in previous studies.
first_indexed 2024-03-11T08:47:14Z
format Article
id doaj.art-c0266717648f48e29257aeb2c615a72c
institution Directory Open Access Journal
issn 2073-4425
language English
last_indexed 2024-03-11T08:47:14Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Genes
spelling doaj.art-c0266717648f48e29257aeb2c615a72c 2023-11-16T20:42:35Z eng MDPI AG Genes 2073-4425 2023-02-01 14 2 410 10.3390/genes14020410
Thibault Dekeyser (0), Emmanuelle Génin (1), Anthony F. Herzig (2)
(0) Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
(1) Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
(2) Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
title Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance
topic genotype imputation
population genetics
rare variants
reference panel
admixture
url https://www.mdpi.com/2073-4425/14/2/410