Direct inference and control of genetic population structure from RNA sequencing data

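Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.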

Bibliographic Details
Main Authors: Muhamad Fachrul, Abhilasha Karkey, Mila Shakya, Louise M. Judd, Taylor Harshegyi, Kar Seng Sim, Susan Tonks, Sabina Dongol, Rajendra Shrestha, Agus Salim, STRATAA study group, Stephen Baker, Andrew J. Pollard, Chiea Chuen Khor, Christiane Dolecek, Buddha Basnyat, Sarah J. Dunstan, Kathryn E. Holt, Michael Inouye
Format: Article
Language: English
Published: Nature Portfolio 2023-08-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-023-05171-9
author Muhamad Fachrul
Abhilasha Karkey
Mila Shakya
Louise M. Judd
Taylor Harshegyi
Kar Seng Sim
Susan Tonks
Sabina Dongol
Rajendra Shrestha
Agus Salim
STRATAA study group
Stephen Baker
Andrew J. Pollard
Chiea Chuen Khor
Christiane Dolecek
Buddha Basnyat
Sarah J. Dunstan
Kathryn E. Holt
Michael Inouye
collection DOAJ
description Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.
first_indexed 2024-03-12T17:06:55Z
format Article
id doaj.art-390b9386454c4f749e67d7d8d78820c8
institution Directory Open Access Journal
issn 2399-3642
language English
last_indexed 2024-03-12T17:06:55Z
publishDate 2023-08-01
publisher Nature Portfolio
record_format Article
series Communications Biology
spelling doaj.art-390b9386454c4f749e67d7d8d78820c8
2023-08-06T11:22:34Z
eng
Nature Portfolio
Communications Biology
2399-3642
2023-08-01
6119
10.1038/s42003-023-05171-9
Direct inference and control of genetic population structure from RNA sequencing data
Muhamad Fachrul (0)
Abhilasha Karkey (1)
Mila Shakya (2)
Louise M. Judd (3)
Taylor Harshegyi (4)
Kar Seng Sim (5)
Susan Tonks (6)
Sabina Dongol (7)
Rajendra Shrestha (8)
Agus Salim (9)
STRATAA study group
Stephen Baker (10)
Andrew J. Pollard (11)
Chiea Chuen Khor (12)
Christiane Dolecek (13)
Buddha Basnyat (14)
Sarah J. Dunstan (15)
Kathryn E. Holt (16)
Michael Inouye (17)
(0) Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute
(1) Oxford University Clinical Research Unit, Patan Academy of Health Sciences
(2) Oxford University Clinical Research Unit, Patan Academy of Health Sciences
(3) Department of Infectious Diseases, Central Clinical School, Monash University
(4) Department of Infectious Diseases, Central Clinical School, Monash University
(5) Genome Institute of Singapore
(6) Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre
(7) Oxford University Clinical Research Unit, Patan Academy of Health Sciences
(8) Patan Academy of Health Sciences, Patan Hospital
(9) Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne
(10) Department of Medicine, University of Cambridge
(11) Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre
(12) Genome Institute of Singapore
(13) Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford
(14) Oxford University Clinical Research Unit, Patan Academy of Health Sciences
(15) The Peter Doherty Institute for Infection and Immunity, The University of Melbourne
(16) Department of Infectious Diseases, Central Clinical School, Monash University
(17) Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute
Abstract RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.
https://doi.org/10.1038/s42003-023-05171-9
title Direct inference and control of genetic population structure from RNA sequencing data
url https://doi.org/10.1038/s42003-023-05171-9