RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study

Abstract Background Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While adv...

Bibliographic Details
Main Authors: Bork A. Berghoff, Torgny Karlsson, Thomas Källman, E. Gerhart H. Wagner, Manfred G. Grabherr
Format: Article
Language: English
Published: BMC 2017-09-01
Series: BioData Mining
Subjects: RNA-seq, Transcriptomics, Normalization, Gene expression, DNA damage, Stress response
Online Access: http://link.springer.com/article/10.1186/s13040-017-0150-8
author Bork A. Berghoff
Torgny Karlsson
Thomas Källman
E. Gerhart H. Wagner
Manfred G. Grabherr
collection DOAJ
description Abstract
Background: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples are required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or assume that the effects of up- and down-regulated genes cancel each other out during the normalization.
Results: Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
Conclusions: The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
first_indexed 2024-12-20T13:43:36Z
format Article
id doaj.art-c6941cf90aba4d87863157e1b5b1fab5
institution Directory Open Access Journal
issn 1756-0381
language English
last_indexed 2024-12-20T13:43:36Z
publishDate 2017-09-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-c6941cf90aba4d87863157e1b5b1fab5; 2022-12-21T19:38:44Z; eng; BMC; BioData Mining; ISSN 1756-0381; 2017-09-01; volume 10, issue 1, pages 1-20; DOI 10.1186/s13040-017-0150-8
RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study
Bork A. Berghoff (Institut für Mikrobiologie und Molekularbiologie, Justus-Liebig-Universität)
Torgny Karlsson (Department of Immunology, Genetics and Pathology, Uppsala University)
Thomas Källman (Department of Medical Biochemistry and Microbiology, Uppsala University)
E. Gerhart H. Wagner (Department of Cell and Molecular Biology, Uppsala University)
Manfred G. Grabherr (Department of Medical Biochemistry and Microbiology, Uppsala University)
title RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study
topic RNA-seq
Transcriptomics
Normalization
Gene expression
DNA damage
Stress response
url http://link.springer.com/article/10.1186/s13040-017-0150-8