TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository

Bibliographic Details
Main Authors: Yingdong Zhao, Ming-Chung Li, Mariam M. Konaté, Li Chen, Biswajit Das, Chris Karlovich, P. Mickey Williams, Yvonne A. Evrard, James H. Doroshow, Lisa M. McShane
Format: Article
Language: English
Published: BMC 2021-06-01
Series: Journal of Translational Medicine
Online Access:https://doi.org/10.1186/s12967-021-02936-w

Abstract

Background: To correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression analysis between two or more conditions. Several methods have been proposed and continue to be used; however, no consensus has been reached on the best gene expression quantification method for RNA-seq data analysis.

Methods: We used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI Patient-Derived Models Repository (PDMR). We compared reproducibility across replicate samples based on TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts, using the coefficient of variation (CV), the intraclass correlation coefficient (ICC), and cluster analysis.

Results: Hierarchical clustering on normalized count data tended to group replicate samples from the same PDX model together more accurately than clustering on TPM or FPKM data. Furthermore, normalized count data were observed to have the lowest median CV and the highest ICC values across all replicate samples from the same model and for the same gene across all PDX models, compared with TPM and FPKM data.

Conclusion: We provide compelling evidence for a preferred quantification measure for downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. Our findings are consistent with what others have shown for human tumors and cell lines and add further support to the thesis that normalized counts are the best choice for the analysis of RNA-seq data across samples.
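
The abstract compares three quantification measures without spelling out how each is derived from read counts. The sketch below is illustrative only: it assumes a genes-by-samples matrix of fragment counts and a single annotated length per gene (real quantifiers typically use effective transcript lengths, and the abstract does not name the tool used for the PDMR data). It also assumes that "normalized counts" means DESeq2-style median-of-ratios normalization, which the abstract does not state. The function names `fpkm`, `tpm`, and `median_of_ratios` are hypothetical.

```python
import math
from statistics import median

# Rows are genes, columns are samples.
Matrix = list[list[float]]


def column_sums(m: Matrix) -> list[float]:
    """Sum each sample's column (e.g. total mapped fragments per sample)."""
    return [sum(col) for col in zip(*m)]


def fpkm(counts: Matrix, lengths_bp: list[float]) -> Matrix:
    """FPKM: count / (gene length in kb) / (sample's mapped fragments in millions)."""
    millions = [total / 1e6 for total in column_sums(counts)]
    return [
        [c / (length / 1e3) / m for c, m in zip(row, millions)]
        for row, length in zip(counts, lengths_bp)
    ]


def tpm(counts: Matrix, lengths_bp: list[float]) -> Matrix:
    """TPM: divide by gene length first, then rescale each sample to sum to 1e6."""
    rates = [[c / length for c in row] for row, length in zip(counts, lengths_bp)]
    totals = column_sums(rates)
    return [[r / t * 1e6 for r, t in zip(row, totals)] for row in rates]


def median_of_ratios(counts: Matrix) -> tuple[Matrix, list[float]]:
    """DESeq2-style size factors (assumed method) and the counts divided by them."""
    # Only genes with a nonzero count in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    size_factors = [
        # Median over genes of log(count / geometric mean), back-transformed.
        math.exp(median(math.log(row[j]) - g for row, g in zip(usable, log_geo_means)))
        for j in range(len(counts[0]))
    ]
    normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]
    return normalized, size_factors


if __name__ == "__main__":
    # Toy data (hypothetical): sample 2 is sample 1 sequenced exactly twice as deeply.
    counts = [[10, 20], [30, 60], [60, 120]]
    lengths = [1000, 2000, 3000]
    print(tpm(counts, lengths))
    print(fpkm(counts, lengths))
    print(median_of_ratios(counts))
```

Within a sample, TPM is simply FPKM rescaled so the column sums to one million, so the two measures differ only by a per-sample constant. Median-of-ratios size factors instead take a median across genes, which is designed to keep a small number of very highly expressed genes from dominating a sample's scaling factor.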
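
The two reproducibility metrics named in the abstract can also be sketched. This assumes the CV is computed per gene across the replicates of one model, and that the ICC is the one-way random-effects ICC(1) computed per gene with PDX models as groups; the abstract names neither the ICC form nor whether values are log-transformed first, so these functions simply operate on whatever values are passed in. The names `coefficient_of_variation` and `icc_oneway` are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV = sample standard deviation / mean, e.g. one gene across one model's replicates."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def icc_oneway(groups: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for one gene.

    groups: one inner list per PDX model holding the gene's value in each replicate.
    Group sizes may differ (unbalanced design).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and at least one replicated group")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-model and within-model sums of squares from a one-way ANOVA.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size; reduces to the common size n when every group has n replicates.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

Under these definitions, a lower CV means replicates of a model vary less relative to their mean, and a higher ICC means more of a gene's total variance lies between models than between replicates of the same model; both point toward more reproducible replicates.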

Author Affiliations
Yingdong Zhao, Ming-Chung Li, Mariam M. Konaté, Lisa M. McShane: Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute
Li Chen, Biswajit Das, Chris Karlovich, P. Mickey Williams, Yvonne A. Evrard: Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research
James H. Doroshow: Division of Cancer Treatment and Diagnosis, National Cancer Institute
Journal: Journal of Translational Medicine, volume 19, issue 1 (ISSN 1479-5876)

Keywords: RNA sequencing; Quantification measures; Normalization; TPM; FPKM; Count