Investigation of heteroscedasticity in polygenic risk scores across 15 quantitative traits

Bibliographic Details
Main Authors: Hyein Jung, Hae-Un Jung, Eun Ju Baek, Ju Yeon Chung, Shin Young Kwon, Ji-One Kang, Ji Eun Lim, Bermseok Oh
Format: Article
Language: English
Published: Frontiers Media S.A., 2023-05-01
Series: Frontiers in Genetics
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2023.1150889/full

Description

The polygenic risk score (PRS) could be used to stratify individuals at high risk of disease and to predict the complex traits of individuals in a population. Previous studies developed PRS-based prediction models using linear regression and evaluated their predictive performance using the R² value. One of the key assumptions of linear regression is that the variance of the residuals is constant at every level of the predictor variables, a property called homoscedasticity. However, some studies have shown that PRS models exhibit heteroscedasticity between PRSs and traits. This study analyzes whether heteroscedasticity exists in PRS models of diverse disease-related traits and, if so, whether it affects the accuracy of PRS-based prediction in 354,761 Europeans from the UK Biobank. We constructed PRSs for 15 quantitative traits using LDpred2 and tested for heteroscedasticity between the PRSs and the 15 traits using three different tests: the Breusch-Pagan (BP) test, the score test, and the F test. Thirteen of the fifteen traits showed significant heteroscedasticity. Replication using new PRSs from the PGS Catalog and independent samples (N = 23,620) from the UK Biobank confirmed heteroscedasticity in ten traits. Overall, ten of the fifteen quantitative traits showed statistically significant heteroscedasticity between the PRS and the trait. The variance of the residuals increased as the PRS increased, and prediction accuracy at each PRS level tended to decrease as the residual variance increased. In conclusion, heteroscedasticity was frequently observed in PRS-based prediction models of quantitative traits, and the accuracy of such models may differ according to PRS values. Therefore, prediction models using the PRS should be constructed with heteroscedasticity taken into account.
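
The homoscedasticity assumption and the Breusch-Pagan test described above can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' pipeline. It assumes a single PRS as the only predictor (no covariates such as age, sex, or genetic principal components), simulates a trait whose residual standard deviation is exp(0.25 × PRS) (a hypothetical setting), and uses Koenker's studentized form of the BP statistic, LM = n·R², where R² comes from regressing the squared OLS residuals on the PRS and LM is compared with a χ²(1) distribution. As a simple stand-in for accuracy at each PRS level, it reports the mean squared residual within PRS quintiles. All function names and parameter values are illustrative assumptions.

```python
import math
import random
from statistics import fmean


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares; return (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Coefficient of determination of the simple OLS fit of y on x."""
    a, b = ols_simple(x, y)
    my = fmean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Koenker's studentized Breusch-Pagan test with a single regressor.

    Squared OLS residuals are regressed on x; under homoscedasticity,
    LM = n * R^2 of that auxiliary regression is approximately chi-square(1).
    """
    a, b = ols_simple(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, e2), 0.0)  # guard against tiny negative rounding
    p = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) survival function at lm
    return lm, p


def simulate(n, heteroscedastic, seed=0):
    """Hypothetical data: standard-normal PRS, trait = 0.3*PRS + noise."""
    rng = random.Random(seed)
    prs, trait = [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        sd = math.exp(0.25 * s) if heteroscedastic else 1.0  # noise SD grows with PRS
        prs.append(s)
        trait.append(0.3 * s + rng.gauss(0.0, sd))
    return prs, trait


def mean_sq_residual_by_quantile(x, y, k=5):
    """Mean squared residual of the overall OLS fit within k PRS quantile groups."""
    a, b = ols_simple(x, y)
    pairs = sorted(zip(x, y))  # sort individuals by PRS
    size = len(pairs) // k
    out = []
    for i in range(k):
        # The last group absorbs any remainder when len(x) is not divisible by k.
        chunk = pairs[i * size:(i + 1) * size] if i < k - 1 else pairs[i * size:]
        out.append(fmean((yi - (a + b * xi)) ** 2 for xi, yi in chunk))
    return out


if __name__ == "__main__":
    for label, het in (("heteroscedastic", True), ("homoscedastic", False)):
        x, y = simulate(5000, het, seed=1)
        lm, p = breusch_pagan(x, y)
        msr = mean_sq_residual_by_quantile(x, y)
        print(f"{label}: R2={r_squared(x, y):.3f} BP LM={lm:.1f} p={p:.2e}")
        print("  mean squared residual by PRS quintile:", [round(v, 2) for v in msr])
```

Under these assumed settings, the heteroscedastic run is expected to yield a very small BP p-value and a mean squared residual that rises from the lowest to the highest PRS quintile, whereas the homoscedastic control typically shows neither. For real analyses, an established implementation such as `het_breuschpagan` in `statsmodels.stats.diagnostic` (which also reports an F-statistic version of the test) would be a safer choice than this hand-rolled version.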

Volume: 14
Article number: 1150889
DOI: 10.3389/fgene.2023.1150889
ISSN: 1664-8021
Indexed in: Directory of Open Access Journals (DOAJ)
Affiliations:
Department of Biomedical Science, Graduate School, Kyung Hee University, Seoul, Republic of Korea (Hyein Jung, Hae-Un Jung, Ju Yeon Chung, Shin Young Kwon, Bermseok Oh)
Department of Biochemistry and Molecular Biology, School of Medicine, Kyung Hee University, Seoul, Republic of Korea (Ji-One Kang, Ji Eun Lim, Bermseok Oh)
Mendel, Seoul, Republic of Korea (Eun Ju Baek, Bermseok Oh)
Keywords: polygenic risk score (PRS); linear regression model; quantitative trait; prediction accuracy; heteroscedasticity