Automated identification of reference genes based on RNA-seq data

Abstract Background Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfil...


Bibliographic Details
Main Authors: Rosario Carmona, Macarena Arroyo, María José Jiménez-Quesada, Pedro Seoane, Adoración Zafra, Rafael Larrosa, Juan de Dios Alché, M. Gonzalo Claros
Format: Article
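Language: English
Published: BMC 2017-08-01
Series: BioMedical Engineering OnLine
Subjects: Reference genes; Normalization; Real-time PCR; Quantitative PCR; Olive (Olea europaea L.); Cancer
Online Access: http://link.springer.com/article/10.1186/s12938-017-0356-5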
_version_ 1818345407181225984
author Rosario Carmona
Macarena Arroyo
María José Jiménez-Quesada
Pedro Seoane
Adoración Zafra
Rafael Larrosa
Juan de Dios Alché
M. Gonzalo Claros
author_sort Rosario Carmona
collection DOAJ
description Abstract. Background: Gene expression analyses require appropriate reference genes (RGs) for normalization in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but such data are available for only a limited set of model species and conditions; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. Results: An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs, based on a normalized expression in reads per mapped million and on the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species, and many of them have been previously reported as RGs in the literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. Conclusion: Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs.
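The selection idea summarized in the abstract (normalize read counts to reads per million mapped reads, then keep genes that are both highly expressed and show a low coefficient of variation) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function names, the input layout (a dict of per-sample raw counts) and the thresholds `min_mean_rpm` and `max_cv` are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def reads_per_million(counts, mapped_totals):
    """Scale each sample's raw count to reads per million mapped reads."""
    return [c * 1_000_000 / total for c, total in zip(counts, mapped_totals)]


def candidate_reference_genes(count_table, mapped_totals,
                              min_mean_rpm=100.0, max_cv=0.2):
    """Return (gene, mean RPM, CV) tuples sorted by increasing CV.

    count_table maps a gene name to its raw read counts, one per sample.
    Both thresholds are hypothetical defaults, not values from the paper.
    """
    candidates = []
    for gene, counts in count_table.items():
        rpm = reads_per_million(counts, mapped_totals)
        avg = mean(rpm)
        # Skip weakly expressed genes (and avoid dividing by zero below).
        if avg == 0 or avg < min_mean_rpm:
            continue
        # Coefficient of variation: standard deviation relative to the mean.
        cv = pstdev(rpm) / avg
        if cv <= max_cv:
            candidates.append((gene, avg, cv))
    return sorted(candidates, key=lambda item: item[2])


if __name__ == "__main__":
    # Toy data: three samples with different library sizes.
    totals = [1_000_000, 2_000_000, 4_000_000]
    table = {
        "geneA": [500, 1000, 2000],  # constant after normalization
        "geneB": [500, 500, 500],    # varies once library size is considered
        "geneC": [10, 20, 40],       # stable but weakly expressed
        "geneD": [300, 620, 1180],   # nearly constant
    }
    for gene, avg, cv in candidate_reference_genes(table, totals):
        print(f"{gene}\tmean RPM={avg:.1f}\tCV={cv:.3f}")
```

In this toy example only geneA and geneD pass both filters. Any candidate found this way would still need experimental (PCR) validation, as the abstract notes.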
first_indexed 2024-12-13T17:01:53Z
format Article
id doaj.art-156bb23fb86748c7883f42d36b9423bd
institution Directory Open Access Journal
issn 1475-925X
language English
last_indexed 2024-12-13T17:01:53Z
publishDate 2017-08-01
publisher BMC
record_format Article
series BioMedical Engineering OnLine
spelling doaj.art-156bb23fb86748c7883f42d36b9423bd 2022-12-21T23:37:46Z eng BMC BioMedical Engineering OnLine 1475-925X 2017-08-01 16S1123 10.1186/s12938-017-0356-5 Automated identification of reference genes based on RNA-seq data
Rosario Carmona (0): Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC
Macarena Arroyo (1): Servicio de Neumología, Hospital Regional Universitario de Málaga
María José Jiménez-Quesada (2): Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC
Pedro Seoane (3): Departamento de Biología Molecular y Bioquímica, Universidad de Málaga
Adoración Zafra (4): Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC
Rafael Larrosa (5): Departamento de Arquitectura de Computadores, Universidad de Málaga
Juan de Dios Alché (6): Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC
M. Gonzalo Claros (7): Departamento de Biología Molecular y Bioquímica, Universidad de Málaga
title Automated identification of reference genes based on RNA-seq data
title_sort automated identification of reference genes based on rna seq data
topic Reference genes
Normalization
Real-time PCR
Quantitative PCR
Olive (Olea europaea L.)
Cancer
url http://link.springer.com/article/10.1186/s12938-017-0356-5