An ensemble penalized regression method for multi-ancestry polygenic risk prediction
Abstract Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this artic...
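Main Authors: | Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O’Connell, Yunxuan Jiang, 23andMe Research Team, Bertram L. Koelsch, Haoyu Zhang, Nilanjan Chatterjee |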
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-04-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-024-47357-7 |
_version_ | 1827183724738904064 |
---|---|
author | Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O’Connell, Yunxuan Jiang, 23andMe Research Team, Bertram L. Koelsch, Haoyu Zhang, Nilanjan Chatterjee |
author_sort | Jingning Zhang |
collection | DOAJ |
description | Abstract Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of $\mathscr{L}_1$ (lasso) and $\mathscr{L}_2$ (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction $R^2$ for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations. |
first_indexed | 2024-04-24T07:14:36Z |
format | Article |
id | doaj.art-e9ada99664a547299a2de31c62d93bb4 |
institution | Directory Open Access Journal |
issn | 2041-1723 |
language | English |
last_indexed | 2025-03-21T06:23:09Z |
publishDate | 2024-04-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj.art-e9ada99664a547299a2de31c62d93bb4; 2024-07-21T11:26:50Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2024-04-01; 15; 1; 1; 14; 10.1038/s41467-024-47357-7; An ensemble penalized regression method for multi-ancestry polygenic risk prediction; Jingning Zhang [0]; Jianan Zhan [1]; Jin Jin [2]; Cheng Ma [3]; Ruzhang Zhao [4]; Jared O’Connell [5]; Yunxuan Jiang [6]; 23andMe Research Team; Bertram L. Koelsch [7]; Haoyu Zhang [8]; Nilanjan Chatterjee [9]; [0] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; [1] 23andMe Inc.; [2] Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania; [3] Department of Statistics, University of Michigan; [4] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; [5] 23andMe Inc.; [6] 23andMe Inc.; [7] 23andMe Inc.; [8] Division of Cancer Epidemiology and Genetics, National Cancer Institute; [9] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Abstract Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of $\mathscr{L}_1$ (lasso) and $\mathscr{L}_2$ (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction $R^2$ for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.; https://doi.org/10.1038/s41467-024-47357-7 |
title | An ensemble penalized regression method for multi-ancestry polygenic risk prediction |
url | https://doi.org/10.1038/s41467-024-47357-7 |
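The abstract describes two ingredients that a small example can make concrete: regression with combined lasso and ridge penalties, and an ensemble step that combines scores fitted under different penalty parameters. The sketch below illustrates only that general idea on simulated individual-level data. It is not PROSPER: the record states that PROSPER works from GWAS summary statistics across several populations, and none of that is modeled here. The function names `elastic_net_cd` and `ensemble_weights`, the penalty grid, the simulated data, and the use of ordinary least squares for the ensemble weights are all assumptions made for illustration.

```python
import numpy as np


def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2.

    Hypothetical helper for illustration; not the PROSPER solver.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-threshold (lasso part), then shrink (ridge part).
            new = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def ensemble_weights(scores, y):
    """Least-squares intercept and weights for combining candidate scores.

    Assumption: plain least squares stands in for the ensemble step.
    """
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated data (an assumption): 5 of 50 features carry signal.
rng = np.random.default_rng(0)
n, p = 600, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = 0.5
y = X @ true_beta + rng.standard_normal(n)

train, tune, test = slice(0, 300), slice(300, 450), slice(450, 600)

# One coefficient vector per (lasso, ridge) penalty pair; the grid is arbitrary.
grid = [(0.01, 0.0), (0.05, 0.1), (0.1, 0.5)]
betas = [elastic_net_cd(X[train], y[train], l1, l2) for l1, l2 in grid]

# Ensemble step: learn combination weights on held-out tuning data.
tune_scores = np.column_stack([X[tune] @ b for b in betas])
w = ensemble_weights(tune_scores, y[tune])

# Evaluate the combined score on separate test data.
test_scores = np.column_stack([X[test] @ b for b in betas])
pred = w[0] + test_scores @ w[1:]
r2 = np.corrcoef(pred, y[test])[0, 1] ** 2
print(f"out-of-sample R^2 of the combined score: {r2:.3f}")
```

Fitting the combination weights on a tuning split that is separate from both the training and test data keeps the reported $R^2$ out of sample, which matches the kind of evaluation the abstract refers to.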