The effect of sample size on polygenic hazard models for prostate cancer

Bibliographic Details
Main Authors: Karunamuni, RA, Huynh-Le, M-P, Fan, CC, Key, TJ, Travis, RC, Neal, DE, Hamdy, FC, Mills, IG, The PRACTICAL Consortium
Format: Journal article
Language: English
Published: Springer Nature 2020

Abstract

We determined the effect of sample size on the performance of polygenic hazard score (PHS) models for prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject and was split into training and testing sets. Established-SNP models considered 65 SNPs that had previously been associated with prostate cancer, whereas Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for randomly sampled training sets of varying size, and the performance of a representative Established-SNP model was estimated for randomly sampled testing sets of varying size. Performance was summarized as HR98/50, the hazard ratio between men in the top 2% of PHS and men with average PHS in the test set. When the number of training samples was increased from 1,000 to 30,000, the mean HR98/50 of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43], and the corresponding HR98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. With testing sets of 600 and 6,000 observations, the HR98/50 of a representative Established-SNP model was 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20,000 men is required to develop Discovery-SNP PHS models, whereas 10,000 men should be sufficient for Established-SNP models.
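The HR98/50 metric summarized above can be illustrated with a minimal Python sketch that computes it from per-subject test-set scores. This is not the authors' code. It rests on assumptions that are not taken from the paper: each PHS value is treated as a Cox linear predictor (log relative hazard), "top 2%" is taken to mean the highest 2% of scores, and "average" is taken to mean subjects ranked between the 30th and 70th percentiles. The function name hr_98_50 and its parameters are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def hr_98_50(
    phs: Sequence[float],
    top_fraction: float = 0.02,
    average_band: Tuple[float, float] = (0.30, 0.70),
) -> float:
    """Hypothetical sketch of HR98/50 from per-subject polygenic hazard scores.

    ASSUMPTIONS (not from the paper): each PHS value is a Cox linear
    predictor (log relative hazard); "top 2%" is the highest top_fraction
    of scores; "average" is the rank band given by average_band
    (30th to 70th percentile by default).
    """
    if not phs:
        raise ValueError("need at least one score")
    ranked = sorted(phs)
    n = len(ranked)
    # Highest-scoring subjects: at least one, roughly top_fraction of n.
    n_top = max(1, round(n * top_fraction))
    top = ranked[n - n_top:]
    # Subjects whose rank falls inside the assumed "average" band.
    lo, hi = average_band
    middle = ranked[int(n * lo):int(n * hi)]
    if not middle:
        raise ValueError("average band is empty for this sample size")
    # Under proportional hazards, exp(difference in linear predictors) is a
    # hazard ratio; this sketch compares the group means of the predictor.
    return math.exp(statistics.fmean(top) - statistics.fmean(middle))
```

As an example, hr_98_50([i / 100 for i in range(100)]) compares the mean of the two highest scores (0.98 and 0.99) with the mean of the scores ranked 30 to 69 (0.30 to 0.69). The difference is 0.49, so the result is exp(0.49), about 1.63. Comparing group means of the linear predictor is one simple choice. A fitted survival model with a group indicator would be an alternative way to estimate the same contrast.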