Direct inference and control of genetic population structure from RNA sequencing data
Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can b...
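Main Authors: | Muhamad Fachrul, Abhilasha Karkey, Mila Shakya, Louise M. Judd, Taylor Harshegyi, Kar Seng Sim, Susan Tonks, Sabina Dongol, Rajendra Shrestha, Agus Salim, STRATAA study group, Stephen Baker, Andrew J. Pollard, Chiea Chuen Khor, Christiane Dolecek, Buddha Basnyat, Sarah J. Dunstan, Kathryn E. Holt, Michael Inouye |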
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-08-01 |
Series: | Communications Biology |
Online Access: | https://doi.org/10.1038/s42003-023-05171-9 |
_version_ | 1797752608775995392 |
---|---|
author | Muhamad Fachrul; Abhilasha Karkey; Mila Shakya; Louise M. Judd; Taylor Harshegyi; Kar Seng Sim; Susan Tonks; Sabina Dongol; Rajendra Shrestha; Agus Salim; STRATAA study group; Stephen Baker; Andrew J. Pollard; Chiea Chuen Khor; Christiane Dolecek; Buddha Basnyat; Sarah J. Dunstan; Kathryn E. Holt; Michael Inouye |
author_sort | Muhamad Fachrul |
collection | DOAJ |
description | Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data. |
first_indexed | 2024-03-12T17:06:55Z |
format | Article |
id | doaj.art-390b9386454c4f749e67d7d8d78820c8 |
institution | Directory of Open Access Journals |
issn | 2399-3642 |
language | English |
last_indexed | 2024-03-12T17:06:55Z |
publishDate | 2023-08-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Communications Biology |
spelling | doaj.art-390b9386454c4f749e67d7d8d78820c8; 2023-08-06T11:22:34Z; eng; Nature Portfolio; Communications Biology; 2399-3642; 2023-08-01; 6; 1; 1; 9; 10.1038/s42003-023-05171-9; Direct inference and control of genetic population structure from RNA sequencing data; Authors: Muhamad Fachrul (0), Abhilasha Karkey (1), Mila Shakya (2), Louise M. Judd (3), Taylor Harshegyi (4), Kar Seng Sim (5), Susan Tonks (6), Sabina Dongol (7), Rajendra Shrestha (8), Agus Salim (9), STRATAA study group, Stephen Baker (10), Andrew J. Pollard (11), Chiea Chuen Khor (12), Christiane Dolecek (13), Buddha Basnyat (14), Sarah J. Dunstan (15), Kathryn E. Holt (16), Michael Inouye (17); Affiliations: (0) Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute; (1) Oxford University Clinical Research Unit, Patan Academy of Health Sciences; (2) Oxford University Clinical Research Unit, Patan Academy of Health Sciences; (3) Department of Infectious Diseases, Central Clinical School, Monash University; (4) Department of Infectious Diseases, Central Clinical School, Monash University; (5) Genome Institute of Singapore; (6) Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre; (7) Oxford University Clinical Research Unit, Patan Academy of Health Sciences; (8) Patan Academy of Health Sciences, Patan Hospital; (9) Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne; (10) Department of Medicine, University of Cambridge; (11) Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre; (12) Genome Institute of Singapore; (13) Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford; (14) Oxford University Clinical Research Unit, Patan Academy of Health Sciences; (15) The Peter Doherty Institute for Infection and Immunity, The University of Melbourne; (16) Department of Infectious Diseases, Central Clinical School, Monash University; (17) Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute; Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.; https://doi.org/10.1038/s42003-023-05171-9 |
title | Direct inference and control of genetic population structure from RNA sequencing data |
url | https://doi.org/10.1038/s42003-023-05171-9 |
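
The description in this record refers to genetic principal components that capture subpopulations and are then used as covariates. As a rough illustration only, the Python sketch below computes principal components from a simulated genotype dosage matrix. It is not the RGStraP pipeline, whose steps this record does not describe; the function name `genetic_pcs`, the simulated allele frequencies, and the sample sizes are all assumptions made for the example.

```python
import numpy as np


def genetic_pcs(genotypes, n_pcs=2):
    """Top principal components of a samples x variants dosage matrix (0/1/2).

    Hypothetical helper for illustration; not part of RGStraP.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: zero variance cannot be standardized.
    g = g[:, g.std(axis=0) > 0]
    # Standardize each variant to mean 0 and variance 1.
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # SVD of the standardized matrix; u * s gives the per-sample PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Simulated data (assumption): two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_variants = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_variants), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_variants))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_variants))
genotypes = np.vstack([pop_a, pop_b])
labels = np.repeat([0, 1], n_per_pop)

pcs = genetic_pcs(genotypes, n_pcs=2)
# In this simulation, PC1 is expected to track subpopulation membership.
pc1_label_corr = abs(np.corrcoef(pcs[:, 0], labels)[0, 1])
print(f"|corr(PC1, subpopulation)| = {pc1_label_corr:.2f}")
```

In a differential expression model, scores like these would typically enter as extra covariate columns in the design matrix, which is the role the description assigns to RG-PCs.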