A Prism Vote method for individualized risk prediction of traits in genotype data of Multi-population.

Bibliographic Details
Main Authors: Xiaoxuan Xia, Yexian Zhang, Rui Sun, Yingying Wei, Qi Li, Marc Ka Chun Chong, William Ka Kei Wu, Benny Chung-Ying Zee, Hua Tang, Maggie Haitian Wang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-10-01
Series: PLoS Genetics
Online Access: https://doi.org/10.1371/journal.pgen.1010443
_version_ 1811155422121295872
author Xiaoxuan Xia
Yexian Zhang
Rui Sun
Yingying Wei
Qi Li
Marc Ka Chun Chong
William Ka Kei Wu
Benny Chung-Ying Zee
Hua Tang
Maggie Haitian Wang
author_sort Xiaoxuan Xia
collection DOAJ
description Multi-population cohorts offer unprecedented opportunities for profiling disease risk in large samples; however, heterogeneous risk effects underlying complex traits across populations make integrative prediction challenging. In this study, we propose a novel Bayesian probability framework, the Prism Vote (PV), to construct risk predictions in heterogeneous genetic data. The PV views the trait of an individual as a composite risk from subpopulations, in which stratum-specific predictors can be formed in data of more homogeneous genetic structure. Because each individual is described by a composition of subpopulation memberships, the framework enables individualized risk characterization. Simulations demonstrated that the PV framework, applied with alternative prediction methods, significantly improved prediction accuracy in mixed and admixed populations. The advantage of PV grows as genetic heterogeneity and sample size increase. In two real genome-wide association datasets consisting of multiple populations, we showed that the framework considerably enhanced the prediction accuracy of the linear mixed model in five-group cross-validations. The proposed method offers a new perspective for analyzing an individual's disease risk and improves accuracy for predicting complex traits in genotype data.
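The description above says that an individual's trait is treated as a composite risk from subpopulations, with one predictor per stratum and each individual described by a composition of subpopulation memberships. One plausible reading is a membership-weighted average of stratum-specific predictions. The sketch below illustrates that reading only; it is not the authors' implementation. The function name `prism_vote_sketch`, the input arrays, and the assumption that memberships are probabilities summing to 1 per individual are all hypothetical and do not come from the record.

```python
import numpy as np


def prism_vote_sketch(memberships, stratum_predictions):
    """Combine stratum-specific risk predictions using membership weights.

    Hypothetical illustration, not the published method.

    memberships: array of shape (n_individuals, n_strata); each row is
        assumed to hold subpopulation membership proportions summing to 1.
    stratum_predictions: array of the same shape; entry [i, k] is the risk
        predicted for individual i by the predictor fitted in stratum k.
    """
    w = np.asarray(memberships, dtype=float)
    p = np.asarray(stratum_predictions, dtype=float)
    if w.shape != p.shape:
        raise ValueError("memberships and stratum_predictions must have the same shape")
    if not np.allclose(w.sum(axis=1), 1.0):
        raise ValueError("each row of memberships must sum to 1")
    # Weighted sum over strata gives one composite risk per individual.
    return (w * p).sum(axis=1)


# Toy inputs (made up for illustration): three individuals, two strata.
memberships = np.array([[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]])
stratum_predictions = np.array([[0.10, 0.40], [0.10, 0.40], [0.10, 0.40]])
composite_risk = prism_vote_sketch(memberships, stratum_predictions)
print(composite_risk)
```

An individual belonging entirely to one stratum receives that stratum's prediction unchanged, while an admixed individual receives a blend weighted by membership, which is the sense in which the risk is individualized under this reading.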
first_indexed 2024-04-10T04:33:45Z
format Article
id doaj.art-4b6ab098b18f487a892fbfc60d539e65
institution Directory Open Access Journal
issn 1553-7390
1553-7404
language English
last_indexed 2024-04-10T04:33:45Z
publishDate 2022-10-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj.art-4b6ab098b18f487a892fbfc60d539e65; 2023-03-10T05:31:53Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2022-10-01; volume 18, issue 10, e1010443; 10.1371/journal.pgen.1010443; A Prism Vote method for individualized risk prediction of traits in genotype data of Multi-population.; https://doi.org/10.1371/journal.pgen.1010443
title A Prism Vote method for individualized risk prediction of traits in genotype data of Multi-population.
title_sort prism vote method for individualized risk prediction of traits in genotype data of multi population
url https://doi.org/10.1371/journal.pgen.1010443