Normalization of gene expression data revisited: the three viewpoints of the transcriptome in human skeletal muscle undergoing load-induced hypertrophy and why they matter

Abstract Background The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is both due to its role in resolving and accounting for technical variation and errors, and its defining role in shaping the viewpoint of biological interpretations. S...


Bibliographic Details
Main Authors:	Yusuf Khan, Daniel Hammarström, Stian Ellefsen, Rafi Ahmad
Format:	Article
Language:	English
Published:	BMC 2022-06-01
Series:	BMC Bioinformatics
Subjects:	RNA-seq; Skeletal muscle; Normalization; Resistance training
Online Access:	https://doi.org/10.1186/s12859-022-04791-y

author	Yusuf Khan, Daniel Hammarström, Stian Ellefsen, Rafi Ahmad
collection	DOAJ
description	Abstract
Background: The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is due both to its role in resolving and accounting for technical variation and errors, and to its defining role in shaping the viewpoint of biological interpretations. Still, the choice of normalization method is often not explicitly motivated, although this choice may be particularly decisive for conclusions in studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants completed 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before training, after two weeks of training (week 2), and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as (1) per-library-size, (2) per-total-RNA, and (3) per-sample-size (per-mg-tissue).
Results: Initially, the three modes of transcript modeling led to the identification of three unique sets of stable genes, which displayed differential expression profiles. Specifically, genes showing stable expression across samples in the per-library-size dataset displayed training-associated increases in the per-total-RNA and per-sample-size datasets. These gene sets were then used for normalization of the entire dataset, providing transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The different normalization modes led to different conclusions, measured as training-associated changes in transcript expression. Briefly, for 27% and 20% of the transcripts, training was associated with changes in expression in per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 4% of the transcripts between per-library-size and per-sample-size datasets (↑ vs. ↓, respectively).
Conclusion: Scientists should be explicit about their choice of normalization strategy and should interpret the results of gene expression analyses with caution. This is particularly important for data sets involving a limited number of genes or involving growing or differentiating cellular models, where the risk of biased conclusions is pronounced.
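The abstract's point that one transcript can change in opposite directions under different viewpoints can be illustrated with a small toy calculation. The sketch below is not the authors' pipeline: the function names, the numbers, and the simplifying assumption that a transcript's share of the library can be scaled by total RNA yield per mg of tissue are all hypothetical, and only two of the three viewpoints (per-library-size and per-sample-size) are contrasted.

```python
def cpm(count, library_size):
    """Per-library-size view: counts per million reads in the library."""
    return count / library_size * 1e6


def per_mg_tissue(count, library_size, rna_ug_per_mg):
    """Toy per-sample-size view (assumption): library share scaled by the
    total RNA yield per mg of tissue. Units are arbitrary."""
    return cpm(count, library_size) * rna_ug_per_mg


def direction(before, after):
    """Report the direction of change between two values."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "flat"


# Hypothetical values for a single transcript; not taken from the study.
pre = {"count": 2000, "library_size": 20_000_000, "rna_ug_per_mg": 0.50}
wk2 = {"count": 1800, "library_size": 20_000_000, "rna_ug_per_mg": 0.65}

lib_view = direction(
    cpm(pre["count"], pre["library_size"]),
    cpm(wk2["count"], wk2["library_size"]),
)
tissue_view = direction(per_mg_tissue(**pre), per_mg_tissue(**wk2))

print("per-library-size:", lib_view)
print("per-sample-size: ", tissue_view)
```

In this toy case the transcript's share of the library falls while the assumed RNA yield per mg of tissue rises by a larger factor, so the two views disagree on the direction of change.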
first_indexed	2024-04-13T19:55:17Z
format	Article
id	doaj.art-1ba9eecb4e604dcba4dcb5d614f9ffa8
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-04-13T19:55:17Z
publishDate	2022-06-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-1ba9eecb4e604dcba4dcb5d614f9ffa8; 2022-12-22T02:32:22Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2022-06-01; 23119; 10.1186/s12859-022-04791-y; Yusuf Khan (0), Daniel Hammarström (1), Stian Ellefsen (2), Rafi Ahmad (3); (0) Department of Biotechnology, Inland Norway University of Applied Sciences; (1) Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences; (2) Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences; (3) Department of Biotechnology, Inland Norway University of Applied Sciences; https://doi.org/10.1186/s12859-022-04791-y; RNA-seq; Skeletal muscle; Normalization; Resistance training
title	Normalization of gene expression data revisited: the three viewpoints of the transcriptome in human skeletal muscle undergoing load-induced hypertrophy and why they matter
topic	RNA-seq; Skeletal muscle; Normalization; Resistance training
url	https://doi.org/10.1186/s12859-022-04791-y
