A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population


Bibliographic Details
Main Authors: John Mauleekoonphairoj, Sissades Tongsima, Apichai Khongphatthanayothin, Sean J. Jurgens, Dominic S. Zimmerman, Boosamas Sutjaporn, Pharawee Wandee, Connie R. Bezzina, Koonlawee Nademanee, Yong Poovorawan
Format: Article
Language: English
Published: Nature Portfolio 2023-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-39429-3

Abstract

Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available, varying in size, sequencing depth, and represented populations. However, limited data exist regarding the performance of public reference panels when used to impute populations underrepresented in the reference panel. Here, we compare the performance of various public reference panels, namely the 1000 Genomes Project, the Haplotype Reference Consortium, GenomeAsia 100K, and the recent Trans-Omics for Precision Medicine (TOPMed) program, when used to impute samples from the Thai population. Genotype yields were assessed, and imputation accuracies were examined by comparison with high-depth whole-genome sequencing data of the same samples. We found that imputation using the TOPMed panel yielded the largest number of variants (~271 million). Despite being the smallest in size, GenomeAsia 100K achieved the best imputation accuracy, with a median genotype concordance rate of 0.97. For rare variants, GenomeAsia 100K also offered the best accuracy, although rare variants were less accurately imputable than common variants (30.3% reduction in concordance rates). The high accuracy observed when using GenomeAsia 100K is likely attributable to the diverse representation of populations genetically similar to the study cohort, emphasizing the benefits of sequencing populations classically underrepresented in human genomics.
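
The accuracy measure named in the abstract, genotype concordance between imputed calls and high-depth WGS of the same individuals, can be illustrated with a short Python sketch. It assumes concordance is the fraction of overlapping, non-missing sites at which the imputed hard-call genotype (coded as 0, 1, or 2 alternate alleles) equals the WGS genotype; the study's exact definition, quality filters, and allele-frequency bins are not given in this record and may differ. The function names and toy genotypes are hypothetical.

```python
from statistics import median
from typing import Dict, Optional

# Genotypes coded as alternate-allele counts (0, 1, 2); None marks a missing call.
Genotypes = Dict[str, Optional[int]]


def concordance_rate(imputed: Genotypes, truth: Genotypes) -> float:
    """Fraction of shared, non-missing sites where the imputed hard call
    equals the WGS genotype (assumed definition, for illustration only)."""
    compared = 0
    matched = 0
    for site, true_gt in truth.items():
        imp_gt = imputed.get(site)
        if true_gt is None or imp_gt is None:
            continue  # skip sites missing in either call set
        compared += 1
        if imp_gt == true_gt:
            matched += 1
    if compared == 0:
        raise ValueError("no overlapping non-missing sites")
    return matched / compared


def relative_reduction(common_rate: float, rare_rate: float) -> float:
    """Percent drop of the rare-variant rate relative to the common-variant rate."""
    return 100.0 * (common_rate - rare_rate) / common_rate


# Hypothetical toy data (not from the study): two samples, four sites each.
wgs = {
    "s1": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1},
    "s2": {"rs1": 1, "rs2": 1, "rs3": 0, "rs4": None},
}
imputed = {
    "s1": {"rs1": 0, "rs2": 1, "rs3": 1, "rs4": 1},
    "s2": {"rs1": 1, "rs2": 1, "rs3": 0, "rs4": 2},
}

# Per-sample concordance, then the median across samples.
rates = [concordance_rate(imputed[s], wgs[s]) for s in wgs]
print(rates)          # [0.75, 1.0]
print(median(rates))  # 0.875

# Relative reduction between two hypothetical rates, rounded for display.
print(round(relative_reduction(0.8, 0.6), 1))  # 25.0
```

A plain concordance rate is dominated by homozygous-reference sites when variants are rare, so studies often complement it with non-reference concordance or imputation r²; those alternatives are not shown in this sketch.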

ISSN: 2045-2322
Affiliations:
Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University (John Mauleekoonphairoj, Apichai Khongphatthanayothin, Boosamas Sutjaporn, Pharawee Wandee, Koonlawee Nademanee)
National Biobank of Thailand, National Science and Technology Development Agency (Sissades Tongsima)
Heart Center, Department of Experimental Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centre, University of Amsterdam (Sean J. Jurgens, Dominic S. Zimmerman, Connie R. Bezzina)
Center of Excellence in Clinical Virology, Faculty of Medicine, Chulalongkorn University (Yong Poovorawan)