Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations
Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relat...
Main Authors: | Anna V. Mikhaylova, Timothy A. Thornton |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2019-04-01 |
Series: | Frontiers in Genetics |
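Subjects: | transcriptome; expression quantitative trait loci (eQTL); genetic diversity; genetic mapping; complex traits |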
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2019.00261/full |
_version_ | 1811340816812081152 |
---|---|
author | Anna V. Mikhaylova; Timothy A. Thornton |
author_facet | Anna V. Mikhaylova; Timothy A. Thornton |
author_sort | Anna V. Mikhaylova |
collection | DOAJ |
description | Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10⁻¹⁶), where prediction accuracy is lower in the Yoruban population from West Africa compared to the European ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10⁻¹⁶). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases. |
first_indexed | 2024-04-13T18:47:35Z |
format | Article |
id | doaj.art-d3ca2443a830457cb9639fb52b17793d |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-04-13T18:47:35Z |
publishDate | 2019-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-d3ca2443a830457cb9639fb52b17793d; 2022-12-22T02:34:32Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2019-04-01; 10; 10.3389/fgene.2019.00261; 440307 |
title | Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations |
topic | transcriptome; expression quantitative trait loci (eQTL); genetic diversity; genetic mapping; complex traits |
url | https://www.frontiersin.org/article/10.3389/fgene.2019.00261/full |
work_keys_str_mv | AT annavmikhaylova accuracyofgeneexpressionpredictionfromgenotypedatawithpredixcanvariesacrossandwithincontinentalpopulations AT timothyathornton accuracyofgeneexpressionpredictionfromgenotypedatawithpredixcanvariesacrossandwithincontinentalpopulations |
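
The description above names two statistical steps: Pearson's correlation coefficient as the per-gene measure of prediction accuracy, and an F-test comparing accuracy among populations. The following is a minimal sketch of those two steps in plain Python. It is not the authors' code: the one-way ANOVA form of the F-test, the population labels, and all numbers are assumptions invented for illustration, and no value here is a result from the paper.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def one_way_f(groups):
    """One-way ANOVA F statistic plus (between, within) degrees of freedom.

    Assumption: the record does not say which F-test the study used;
    a one-way ANOVA across populations is used here only as an example.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data for one gene: predicted vs. observed expression
# across five individuals (made-up numbers).
predicted = [0.1, 0.4, 0.35, 0.8, 0.6]
observed = [0.2, 0.5, 0.30, 0.9, 0.4]
gene_accuracy = pearson_r(predicted, observed)

# Hypothetical per-gene accuracies (Pearson r) in three made-up populations.
accuracy_by_population = {
    "pop_A": [0.10, 0.15, 0.05, 0.12],
    "pop_B": [0.30, 0.28, 0.35, 0.25],
    "pop_C": [0.22, 0.27, 0.31, 0.20],
}
f_stat, df_between, df_within = one_way_f(list(accuracy_by_population.values()))
print(f"gene accuracy r = {gene_accuracy:.3f}")
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value needs the F distribution, which the standard library does not provide; a statistics package would be needed for that step.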