Calculating polygenic risk scores (PRS) in UK Biobank: A practical guide for epidemiologists

Bibliographic Details
Main Authors: Collister, JA, Liu, X, Clifton, L
Format: Journal article
Language: English
Published: Frontiers Media 2022
collection OXFORD
description A polygenic risk score estimates the genetic risk of an individual for some disease or trait, calculated by aggregating the effects of many common variants associated with the condition. With the increasing availability of genetic data in large cohort studies such as the UK Biobank, inclusion of this genetic risk as a covariate in statistical analyses is becoming more widespread. Previously this required specialist knowledge, but as tooling and data availability have improved, it has become more feasible for statisticians and epidemiologists to calculate existing scores themselves for use in analyses. While tutorial resources exist for conducting genome-wide association studies and generating new polygenic risk scores, fewer guides exist for the simple calculation and application of existing genetic scores. This guide outlines the key steps of this process: selection of suitable polygenic risk scores from the literature, extraction of relevant genetic variants and verification of their quality, calculation of the risk score, and key considerations for its inclusion in statistical models, using the UK Biobank imputed data as a model dataset. Many of the techniques in this guide will generalize to other datasets; however, we also focus on some of the specific techniques required for using data in the formats UK Biobank have selected. This includes some of the challenges faced when working with large numbers of variants, where the computation time required by some tools is impractical. While we have focused on only a couple of tools, which may not be the best ones for every aspect of the process, one barrier to working with genetic data is the sheer volume of tools available and the difficulty a novice faces in assessing their viability. By discussing in depth a couple of tools that are adequate for the calculation even at large scale, we hope to make polygenic risk scores more accessible to a wider range of researchers.
first_indexed 2024-03-07T07:11:00Z
id oxford-uuid:8b4444be-96e5-4766-a41f-c9b1304b118f
institution University of Oxford
last_indexed 2024-03-07T07:11:00Z
record_format dspace