A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations

Bibliographic Details
Main Authors: Dat Thanh Nguyen, Trang T. H. Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, Nam S. Vo
Format: Article
Language: English
Published: Nature Portfolio, 2022-10-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-022-22215-y

Abstract

Despite the widespread use of next-generation sequencing technologies, microarray-based genotyping combined with imputation of untyped variants remains a cost-effective means of interrogating genetic variation across the human genome. This technology is widely used in genome-wide association studies (GWAS) at biobank scale and, more recently, in polygenic score (PGS) analysis to predict and stratify disease risk. Over the last decade, human genotyping arrays have grown tremendously in both number and content, making a comprehensive evaluation of their performance increasingly important. Here, we performed a comprehensive performance assessment of 23 available human genotyping arrays in 6 ancestry groups using diverse public and in-house datasets. The analyses focus on estimating the performance of the derived imputation (in terms of accuracy and coverage) and of PGS (in terms of concordance with PGS estimated from whole-genome sequencing data) for three different traits and diseases. We found that arrays with a higher number of SNPs do not necessarily achieve higher imputation performance, whereas arrays that are well optimized for the target population can provide very good imputation performance. In addition, PGS estimated from imputed SNP array data is highly correlated with PGS estimated from whole-genome sequencing data in most cases. When optimal arrays are used, the correlations of PGS between the two types of data are higher than 0.97; interestingly, however, high-density arrays can result in lower PGS performance. Our results highlight the importance of selecting a suitable genotyping array for PGS applications. Finally, we developed a web tool that provides interactive analyses of tag SNP content and imputation performance based on population and genomic regions of interest. This study can serve as a practical guide for researchers designing genotyping array-based studies. The tool is available at https://genome.vinbigdata.org/tools/saa/.
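
The abstract reports PGS concordance as the correlation between scores computed from imputed array data and scores computed from whole-genome sequencing (WGS) data. The following is a minimal sketch of that calculation. It rests on two assumptions:

- The PGS follows the common additive model: the sum over variants of an effect weight times the effect-allele dosage (0 to 2).
- Pearson correlation is the concordance measure.

The function names and toy values are hypothetical and do not reproduce the authors' pipeline.

```python
import math
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PGS (assumed model): sum of effect weight x effect-allele dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pgs_concordance(
    wgs_genotypes: Sequence[Sequence[float]],
    imputed_dosages: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Hypothetical helper: correlate per-individual PGS from WGS vs. imputed data.

    Each input is a list of individuals; each individual is a list of dosages
    over the same variants, in the same order as `weights`.
    """
    wgs_scores: List[float] = [polygenic_score(g, weights) for g in wgs_genotypes]
    imputed_scores: List[float] = [polygenic_score(d, weights) for d in imputed_dosages]
    return pearson_r(wgs_scores, imputed_scores)


if __name__ == "__main__":
    # Made-up toy data: 4 individuals x 3 variants (not from the study).
    weights = [0.20, -0.10, 0.05]
    wgs = [[2, 1, 0], [0, 1, 2], [1, 1, 1], [2, 2, 2]]
    # Imputed dosages are continuous and may deviate slightly from WGS calls.
    imputed = [[1.9, 1.0, 0.1], [0.1, 1.1, 1.8], [1.0, 0.9, 1.0], [2.0, 1.8, 2.0]]
    r = pgs_concordance(wgs, imputed, weights)
    print(f"PGS concordance (Pearson r): {r:.3f}")
```

Only `pearson_r` would change if a different agreement measure were preferred. Real analyses typically compute scores from published scoring files with dedicated tools rather than with a toy function like this.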

Collection: Directory of Open Access Journals (DOAJ)
ISSN: 2045-2322

Affiliations
Dat Thanh Nguyen, Trang T. H. Tran, Mai Hoang Tran, Khai Tran, Nguyen Thuy Duong, Nam S. Vo: Center for Biomedical Informatics, Vingroup Big Data Institute
Duy Pham, Quan Nguyen: Institute for Molecular Bioscience, University of Queensland