An ensemble penalized regression method for multi-ancestry polygenic risk prediction

Abstract Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this artic...

Bibliographic Details
Main Authors: Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O’Connell, Yunxuan Jiang, 23andMe Research Team, Bertram L. Koelsch, Haoyu Zhang, Nilanjan Chatterjee
Format: Article
Language: English
Published: Nature Portfolio 2024-04-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-47357-7
author Jingning Zhang
Jianan Zhan
Jin Jin
Cheng Ma
Ruzhang Zhao
Jared O’Connell
Yunxuan Jiang
23andMe Research Team
Bertram L. Koelsch
Haoyu Zhang
Nilanjan Chatterjee
collection DOAJ
description Abstract Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of $\mathscr{L}_1$ (lasso) and $\mathscr{L}_2$ (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction $R^2$ for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
first_indexed 2024-04-24T07:14:36Z
format Article
id doaj.art-e9ada99664a547299a2de31c62d93bb4
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2025-03-21T06:23:09Z
publishDate 2024-04-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-e9ada99664a547299a2de31c62d93bb4, 2024-07-21T11:26:50Z, eng, Nature Portfolio, Nature Communications, 2041-1723, 2024-04-01, volume 15, issue 1, pages 1-14, 10.1038/s41467-024-47357-7
An ensemble penalized regression method for multi-ancestry polygenic risk prediction
Jingning Zhang (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health)
Jianan Zhan (23andMe Inc.)
Jin Jin (Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania)
Cheng Ma (Department of Statistics, University of Michigan)
Ruzhang Zhao (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health)
Jared O’Connell (23andMe Inc.)
Yunxuan Jiang (23andMe Inc.)
23andMe Research Team
Bertram L. Koelsch (23andMe Inc.)
Haoyu Zhang (Division of Cancer Epidemiology and Genetics, National Cancer Institute)
Nilanjan Chatterjee (Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health)
https://doi.org/10.1038/s41467-024-47357-7
title An ensemble penalized regression method for multi-ancestry polygenic risk prediction
url https://doi.org/10.1038/s41467-024-47357-7