Identification and control for the effects of bioinformatic globin depletion on human RNA-seq differential expression analysis

Bibliographic Details
Main Authors: Dylan Sheerin, Francisco Lakay, Hanif Esmail, Craig Kinnear, Bianca Sansom, Brigitte Glanzmann, Robert J. Wilkinson, Matthew E. Ritchie, Anna K. Coussens
Format: Article
Language: English
Published: Nature Portfolio 2023-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-28218-7
Abstract: When profiling blood samples by RNA-sequencing (RNA-seq), RNA from haemoglobin (Hgb) can account for up to 70% of the transcriptome. Due to considerations of sequencing depth and power to detect biological variation, Hgb RNA is typically depleted prior to sequencing by hybridisation-based methods; an alternative approach is to deplete reads arising from Hgb RNA bioinformatically. In the present study, we compared the impact of these two approaches on the outcome of differential gene expression analysis performed using RNA-seq data from 58 human tuberculosis (TB) patient or contact whole blood samples (29 globin kit-depleted and 29 matched non-depleted), a subset of which were taken at TB diagnosis and at six months post-TB treatment from the same patient. Bioinformatic depletion of Hgb genes from the non-depleted samples (bioinformatic-depleted) substantially reduced library sizes (median = 57.24%), and fewer long non-coding, micro, small nuclear and small nucleolar RNAs were captured in these libraries. Profiling published TB gene signatures across all samples revealed inferior correlation between kit-depleted and bioinformatic-depleted pairs when the proportion of reads mapping to Hgb genes was higher in the non-depleted sample, particularly at the TB diagnosis time point. A set of putative “globin-fingerprint” genes was identified by directly comparing kit-depleted and bioinformatic-depleted samples at each time point. Two TB treatment response signatures also showed decreased differential performance between TB diagnosis and six months post-TB treatment when profiled on the bioinformatic-depleted samples rather than on their kit-depleted counterparts. These results demonstrate that failure to deplete Hgb RNA prior to sequencing has a negative impact on the sensitivity to detect disease-relevant gene expression changes, even when bioinformatic removal is performed.
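The bioinformatic depletion step described in the abstract can be illustrated with a minimal sketch: drop the rows for haemoglobin genes from a gene-by-sample count table, then compare library sizes before and after. This is not the authors' pipeline. The gene list, the function names and the toy counts below are all assumptions made for illustration; the record does not state which genes were removed or which tools were used.

```python
# Assumed set of human haemoglobin gene symbols. The record does not list
# the genes the authors actually removed, so treat this as a placeholder.
GLOBIN_GENES = {
    "HBA1", "HBA2", "HBB", "HBD", "HBE1",
    "HBG1", "HBG2", "HBM", "HBQ1", "HBZ",
}


def deplete_globin(counts, globin_genes=GLOBIN_GENES):
    """Return a copy of a gene -> {sample: count} table without globin genes."""
    return {
        gene: dict(per_sample)
        for gene, per_sample in counts.items()
        if gene not in globin_genes
    }


def library_sizes(counts):
    """Sum counts over all genes to get the library size of each sample."""
    sizes = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            sizes[sample] = sizes.get(sample, 0) + n
    return sizes


def percent_reduction(before, after):
    """Percentage of each sample's library removed by the depletion step."""
    return {
        sample: 100.0 * (before[sample] - after.get(sample, 0)) / before[sample]
        for sample in before
        if before[sample] > 0
    }


# Toy counts, invented purely for illustration (not data from the study).
toy_counts = {
    "HBB": {"s1": 600, "s2": 300},
    "HBA1": {"s1": 100, "s2": 100},
    "ACTB": {"s1": 200, "s2": 400},
    "GAPDH": {"s1": 100, "s2": 200},
}

depleted = deplete_globin(toy_counts)
reduction = percent_reduction(library_sizes(toy_counts), library_sizes(depleted))
print(reduction)
```

In this toy table the sample with the larger share of globin reads loses more of its library, which mirrors the abstract's point that a higher Hgb read proportion leaves fewer reads for the remaining genes.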
ISSN: 2045-2322
Author Affiliations:
Dylan Sheerin: Infectious Diseases and Immune Defence Division, The Walter and Eliza Hall Institute of Medical Research
Francisco Lakay: Wellcome Centre for Infectious Diseases Research in Africa and Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory
Hanif Esmail: Wellcome Centre for Infectious Diseases Research in Africa and Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory
Craig Kinnear: South African Medical Research Council Genomics Centre
Bianca Sansom: South African Medical Research Council Genomics Centre
Brigitte Glanzmann: South African Medical Research Council Genomics Centre
Robert J. Wilkinson: Wellcome Centre for Infectious Diseases Research in Africa and Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory
Matthew E. Ritchie: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research
Anna K. Coussens: Infectious Diseases and Immune Defence Division, The Walter and Eliza Hall Institute of Medical Research