Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations



Bibliographic Details
Main Authors: Anna V. Mikhaylova, Timothy A. Thornton
Format: Article
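Language: English
Published: Frontiers Media S.A. 2019-04-01
Series: Frontiers in Genetics
Subjects: transcriptome; expression quantitative trait loci (eQTL); genetic diversity; genetic mapping; complex traits
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2019.00261/full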
author Anna V. Mikhaylova
Timothy A. Thornton
collection DOAJ
description Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10^-16), where prediction accuracy is lower in the Yoruban population from West Africa compared to the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10^-16). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
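
The evaluation described in the abstract (a per-gene Pearson correlation between predicted and observed expression, then an F-test across populations) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the F-test is a one-way ANOVA on per-gene correlations grouped by population, which the record does not confirm, and it runs on simulated data. The function names (pearson_r, one_way_f, simulate_accuracy) and the noise levels are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant input
    return sxy / math.sqrt(sxx * syy)


def one_way_f(groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def simulate_accuracy(noise_sd, n_genes=50, n_samples=80, seed=0):
    """Simulated per-gene accuracy: predicted = observed + Gaussian noise.

    Assumption for illustration only: a larger noise_sd stands in for a
    population in which the prediction models transfer less well.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_genes):
        observed = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        predicted = [o + rng.gauss(0.0, noise_sd) for o in observed]
        accuracies.append(pearson_r(predicted, observed))
    return accuracies


# Two hypothetical populations with different prediction noise.
pop_a = simulate_accuracy(noise_sd=0.5, seed=1)
pop_b = simulate_accuracy(noise_sd=2.0, seed=2)
f_stat, df_b, df_w = one_way_f([pop_a, pop_b])
print(f"F = {f_stat:.1f} on ({df_b}, {df_w}) degrees of freedom")
```

A p-value would come from the upper tail of the F distribution with (df_between, df_within) degrees of freedom, which is omitted here to keep the sketch free of third-party dependencies.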
first_indexed 2024-04-13T18:47:35Z
format Article
id doaj.art-d3ca2443a830457cb9639fb52b17793d
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-13T18:47:35Z
publishDate 2019-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
title Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations
topic transcriptome
expression quantitative trait loci (eQTL)
genetic diversity
genetic mapping
complex traits
url https://www.frontiersin.org/article/10.3389/fgene.2019.00261/full