A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population
Abstract: Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available with variations in size, sequencing depth, and represented populations. Currently, limited data exist...
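Main Authors: | John Mauleekoonphairoj, Sissades Tongsima, Apichai Khongphatthanayothin, Sean J. Jurgens, Dominic S. Zimmerman, Boosamas Sutjaporn, Pharawee Wandee, Connie R. Bezzina, Koonlawee Nademanee, Yong Poovorawan |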
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-07-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-39429-3 |
_version_ | 1797752738715533312 |
---|---|
author | John Mauleekoonphairoj; Sissades Tongsima; Apichai Khongphatthanayothin; Sean J. Jurgens; Dominic S. Zimmerman; Boosamas Sutjaporn; Pharawee Wandee; Connie R. Bezzina; Koonlawee Nademanee; Yong Poovorawan |
author_sort | John Mauleekoonphairoj |
collection | DOAJ |
description | Abstract: Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available with variations in size, sequencing depth, and represented populations. Currently, limited data exist regarding the performance of public reference panels when used in an imputation of populations underrepresented in the reference panel. Here, we compare the performance of various public reference panels: 1000 Genomes Project, Haplotype Reference Consortium, GenomeAsia 100K, and the recent Trans-Omics for Precision Medicine (TOPMed) program, when used in an imputation of samples from the Thai population. Genotype yields were assessed, and imputation accuracies were examined by comparison with high-depth whole genome sequencing data of the same sample. We found that imputation using the TOPMed panel yielded the largest number of variants (~271 million). Despite being the smallest in size, GenomeAsia 100K achieved the best imputation accuracy with a median genotype concordance rate of 0.97. For rare variants, GenomeAsia 100K also offered the best accuracy, although rare variants were less accurately imputable than common variants (30.3% reduction in concordance rates). The high accuracy observed when using GenomeAsia 100K is likely attributable to the diverse representation of populations genetically similar to the study cohort, emphasizing the benefits of sequencing populations classically underrepresented in human genomics. |
first_indexed | 2024-03-12T17:07:48Z |
format | Article |
id | doaj.art-032ec8fe9d0c4691961a6891be021e8b |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-12T17:07:48Z |
publishDate | 2023-07-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-032ec8fe9d0c4691961a6891be021e8b; 2023-08-06T11:13:01Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-07-01; 13118; 10.1038/s41598-023-39429-3; A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population; John Mauleekoonphairoj (Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University); Sissades Tongsima (National Biobank of Thailand, National Science and Technology Development Agency); Apichai Khongphatthanayothin (Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University); Sean J. Jurgens (Heart Center, Department of Experimental Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University, Medical Centre, University of Amsterdam); Dominic S. Zimmerman (Heart Center, Department of Experimental Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University, Medical Centre, University of Amsterdam); Boosamas Sutjaporn (Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University); Pharawee Wandee (Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University); Connie R. Bezzina (Heart Center, Department of Experimental Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University, Medical Centre, University of Amsterdam); Koonlawee Nademanee (Center of Excellence in Arrhythmia Research, Department of Medicine, Faculty of Medicine, Chulalongkorn University); Yong Poovorawan (Center of Excellence in Clinical Virology, Faculty of Medicine, Chulalongkorn University); https://doi.org/10.1038/s41598-023-39429-3 |
title | A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population |
url | https://doi.org/10.1038/s41598-023-39429-3 |
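
The accuracy measure named in the abstract, the genotype concordance rate, can be illustrated with a small sketch. The record does not say how the authors computed it, so this assumes a simple definition: the fraction of sites where the imputed genotype equals the genotype from whole genome sequencing of the same sample. Genotypes are coded here as alternate-allele counts (0, 1, 2), missing calls as `None`, and the function name `genotype_concordance` and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def genotype_concordance(
    imputed: Sequence[Optional[int]],
    sequenced: Sequence[Optional[int]],
) -> float:
    """Fraction of comparable sites where imputed and sequenced genotypes agree.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    Sites missing in either input are skipped (an assumption of this sketch,
    not a detail taken from the article).
    """
    if len(imputed) != len(sequenced):
        raise ValueError("genotype vectors must cover the same sites")

    # Keep only sites that were called in both data sets.
    pairs = [
        (i, s)
        for i, s in zip(imputed, sequenced)
        if i is not None and s is not None
    ]
    if not pairs:
        raise ValueError("no comparable sites")

    matches = sum(1 for i, s in pairs if i == s)
    return matches / len(pairs)


# Toy genotypes for one sample (illustrative only, not data from the study).
imputed = [0, 1, 2, 1, 0, None, 2, 1]
sequenced = [0, 1, 2, 0, 0, 1, 2, 1]

rate = genotype_concordance(imputed, sequenced)
print(f"concordance = {rate:.3f}")
```

Computing the rate separately for rare and common variants, as the abstract describes, would amount to calling the same function on the subset of sites in each allele-frequency bin.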