A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations
Abstract Regardless of the overwhelming use of next-generation sequencing technologies, microarray-based genotyping combined with the imputation of untyped variants remains a cost-effective means to interrogate genetic variations across the human genome. This technology is widely used in genome-wide...
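Main Authors: | Dat Thanh Nguyen, Trang T. H. Tran, Mai Hoang Tran, Khai Tran, Duy Pham, Nguyen Thuy Duong, Quan Nguyen, Nam S. Vo |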
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2022-10-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-022-22215-y |
_version_ | 1811336559194013696 |
---|---|
author | Dat Thanh Nguyen Trang T. H. Tran Mai Hoang Tran Khai Tran Duy Pham Nguyen Thuy Duong Quan Nguyen Nam S. Vo |
author_sort | Dat Thanh Nguyen |
collection | DOAJ |
description | Abstract Despite the widespread use of next-generation sequencing technologies, microarray-based genotyping combined with the imputation of untyped variants remains a cost-effective means to interrogate genetic variation across the human genome. This technology is widely used in genome-wide association studies (GWAS) at biobank scale and, more recently, in polygenic score (PGS) analysis to predict and stratify disease risk. Over the last decade, human genotyping arrays have grown tremendously in both number and content, making a comprehensive evaluation of their performance increasingly important. Here, we performed a comprehensive performance assessment of 23 available human genotyping arrays in 6 ancestry groups using diverse public and in-house datasets. The analyses focus on estimating the performance of derived imputation (in terms of accuracy and coverage) and PGS (in terms of concordance with PGS estimated from whole-genome sequencing data) for three different traits and diseases. We found that arrays with a higher number of SNPs are not necessarily the ones with higher imputation performance, whereas arrays that are well optimized for the targeted population can provide very good imputation performance. In addition, PGS estimated from imputed SNP array data is highly correlated with PGS estimated from whole-genome sequencing data in most cases. When optimal arrays are used, the correlations of PGS between the two types of data are higher than 0.97, but interestingly, high-density arrays can result in lower PGS performance. Our results highlight the importance of selecting a suitable genotyping array for PGS applications. Finally, we developed a web tool that provides interactive analyses of tag SNP content and imputation performance based on population and genomic regions of interest. This study can serve as a practical guide for researchers designing genotyping array-based studies. The tool is available at: https://genome.vinbigdata.org/tools/saa/ . |
first_indexed | 2024-04-13T17:41:16Z |
format | Article |
id | doaj.art-347781593dec4bb9a4a049540e160dcc |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-13T17:41:16Z |
publishDate | 2022-10-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-347781593dec4bb9a4a049540e160dcc; 2022-12-22T02:37:10Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-10-01; 121113; 10.1038/s41598-022-22215-y; A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations; Dat Thanh Nguyen (Center for Biomedical Informatics, Vingroup Big Data Institute); Trang T. H. Tran (Center for Biomedical Informatics, Vingroup Big Data Institute); Mai Hoang Tran (Center for Biomedical Informatics, Vingroup Big Data Institute); Khai Tran (Center for Biomedical Informatics, Vingroup Big Data Institute); Duy Pham (Institute for Molecular Bioscience, University of Queensland); Nguyen Thuy Duong (Center for Biomedical Informatics, Vingroup Big Data Institute); Quan Nguyen (Institute for Molecular Bioscience, University of Queensland); Nam S. Vo (Center for Biomedical Informatics, Vingroup Big Data Institute); https://doi.org/10.1038/s41598-022-22215-y |
title | A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations |
url | https://doi.org/10.1038/s41598-022-22215-y |