Reference genome and transcriptome informed by the sex chromosome complement of the sample increase ability to detect sex differences in gene expression from RNA-Seq data

Bibliographic Details
Main Authors: Kimberly C. Olney, Sarah M. Brotman, Jocelyn P. Andrews, Valeria A. Valverde-Vesling, Melissa A. Wilson
Format: Article
Language: English
Published: BMC, 2020-07-01
Series: Biology of Sex Differences
Online Access: http://link.springer.com/article/10.1186/s13293-020-00312-9

Abstract

Background: Human X and Y chromosomes share an evolutionary origin and, as a consequence, sequence similarity. We investigated whether the sequence homology between the X and Y chromosomes affects the alignment of RNA-Seq reads and estimates of differential expression. We tested the effects of using reference genomes and reference transcriptomes informed by the sex chromosome complement of the sample's genome on measurements of RNA-Seq abundance and of sex differences in expression.

Results: The default reference is the entire human reference genome (GRCh38), including the complete sequences of the X and Y chromosomes. We created two sex chromosome complement-informed reference genomes. The first, used for samples lacking a Y chromosome, has the entire Y chromosome hard-masked. The second, used for samples with a Y chromosome, has only the pseudoautosomal regions (PARs) of the Y chromosome hard-masked, because these regions are duplicated identically on the X chromosome in the reference genome. We analyzed transcript abundance in whole blood, brain cortex, breast, liver, and thyroid tissues from 20 genetic female (46,XX) and 20 genetic male (46,XY) samples. Each sample was aligned twice, once to the default reference genome and once, independently, to the reference genome informed by its sex chromosome complement; this was repeated with two read aligners, HISAT and STAR. We then quantified sex differences in gene expression, using featureCounts to obtain raw counts followed by limma/voom for normalization and differential expression testing. We additionally created sex chromosome complement-informed transcriptome references for pseudo-alignment with Salmon. Transcript abundance was quantified twice for each sample: once against the default target transcripts and once, independently, against the target transcripts informed by the sample's sex chromosome complement.

Conclusions: We show that, regardless of the read aligner, an alignment protocol informed by the sample's sex chromosome complement yields higher expression estimates in the pseudoautosomal regions of the X chromosome in both genetic male and genetic female samples, and increases the number of unique genes called as differentially expressed between the sexes. We additionally show that a pseudo-alignment approach informed by the sample's sex chromosome complement eliminates Y-linked expression in female XX samples.
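
The masking scheme described in the Results can be illustrated with a short Python sketch. It is not the authors' pipeline: the sequence name `chrY`, the helper names (`hard_mask`, `informed_genome`, `informed_transcripts`, `read_fasta`, `write_fasta`) and the transcript-filtering rule used for the Salmon target set are hypothetical, and the PAR coordinates are the commonly cited GRCh38 chrY values, which should be checked against the annotation of the assembly actually used. Hard-masking here means replacing bases with `N`, so that no read can align to those positions.

```python
"""Hypothetical sketch of sex-chromosome-complement-informed references.

Assumptions (not from the source): UCSC-style sequence name "chrY", the
commonly cited GRCh38 chrY PAR coordinates below, and a transcript filter
that simply drops masked transcripts from the pseudo-alignment target set.
"""
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive

# Commonly cited GRCh38 chrY pseudoautosomal regions; verify before use.
GRCH38_CHRY_PARS: List[Region] = [
    (10_001, 2_781_479),       # PAR1
    (56_887_903, 57_217_415),  # PAR2
]


def hard_mask(seq: str, regions: List[Region]) -> str:
    """Replace the bases in each 1-based inclusive region with 'N'."""
    chars = list(seq)
    for start, end in regions:
        lo, hi = max(start - 1, 0), min(end, len(chars))  # to 0-based slice
        if lo < hi:  # skip regions lying entirely outside the sequence
            chars[lo:hi] = ["N"] * (hi - lo)
    return "".join(chars)


def y_regions_to_mask(y_length: int, has_y: bool,
                      pars: List[Region]) -> List[Region]:
    """Whole chrY for samples without a Y; only the Y PARs otherwise."""
    return list(pars) if has_y else [(1, y_length)]


def informed_genome(genome: Dict[str, str], has_y: bool,
                    pars: List[Region] = GRCH38_CHRY_PARS) -> Dict[str, str]:
    """Return a masked copy of `genome` (name -> sequence); input is unchanged."""
    out = dict(genome)
    if "chrY" in out:
        y = out["chrY"]
        out["chrY"] = hard_mask(y, y_regions_to_mask(len(y), has_y, pars))
    return out


def informed_transcripts(transcripts: Dict[str, Tuple[str, int, int]],
                         has_y: bool,
                         pars: List[Region] = GRCH38_CHRY_PARS) -> List[str]:
    """IDs of transcripts (id -> (chrom, start, end)) kept in the target set.

    Hypothetical rule: drop every chrY transcript for samples without a Y,
    and only chrY transcripts overlapping a Y PAR for samples with a Y.
    """
    keep = []
    for tx_id, (chrom, start, end) in transcripts.items():
        if chrom == "chrY":
            if not has_y:
                continue
            if any(start <= p_end and end >= p_start for p_start, p_end in pars):
                continue
        keep.append(tx_id)
    return keep


def read_fasta(path: str) -> Dict[str, str]:
    """Minimal FASTA reader: sequence name is the first word of the header."""
    seqs: Dict[str, List[str]] = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None and line:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def write_fasta(genome: Dict[str, str], path: str, width: int = 60) -> None:
    """Write sequences wrapped at `width` characters per line."""
    with open(path, "w") as fh:
        for name, seq in genome.items():
            fh.write(f">{name}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")
```

For example, with a toy genome `{"chrX": "ACGTACGT", "chrY": "ACGTACGTAC"}` and a hypothetical PAR at positions 2 to 3, `informed_genome(toy, has_y=True, pars=[(2, 3)])["chrY"]` gives `"ANNTACGTAC"`, while `has_y=False` masks all ten Y bases. Masking only the Y copy of the PARs means PAR-derived reads see a single reference copy on chrX instead of two identical ones. Holding a full GRCh38 genome as Python strings needs several gigabytes of memory, so for real data a streaming approach or an existing tool such as `bedtools maskfasta` may be preferable; each masked genome would then be indexed separately for each aligner.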

ISSN: 2042-6410
Affiliation (all authors): School of Life Sciences, Arizona State University
DOI: 10.1186/s13293-020-00312-9
Keywords: RNA-Seq, Sex chromosomes, Differential expression, Transcriptome, Mapping, Alignment