Shambhala: a platform-agnostic data harmonizer for gene expression data


Bibliographic Details
Main Authors: Nicolas Borisov, Irina Shabalina, Victor Tkachev, Maxim Sorokin, Andrew Garazha, Andrey Pulin, Ilya I. Eremin, Anton Buzdin
Format: Article
Language: English
Published: BMC 2019-02-01
Series: BMC Bioinformatics
Online Access: http://link.springer.com/article/10.1186/s12859-019-2641-8
Collection: DOAJ
Abstract
Background: Harmonization techniques make different gene expression profiles, and sets of such profiles, compatible and ready for comparison. Here we present Shambhala, a new bioinformatic tool for harmonizing multiple human gene expression datasets obtained using different experimental methods and platforms for microarray hybridization and RNA sequencing.
Results: Unlike previously published methods, which achieve good-quality harmonization for only two datasets, Shambhala converts multiple datasets into a universal form suitable for further comparison. Shambhala harmonization is based on calibrating gene expression profiles against an auxiliary standardization dataset. Each profile is transformed to resemble the output of the Affymetrix Human Gene microarray hybridization platform, which was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples, taken in multiple replicates, were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering.
Conclusion: Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method that supported sample-specific, platform-independent, biologically meaningful clustering of data obtained from multiple experimental platforms.
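For orientation, quantile normalization, one of the baseline methods the abstract compares against, can be sketched as below. This is a generic illustration of that baseline only and is not the Shambhala algorithm, which the abstract does not specify in enough detail to reproduce. The function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by gene order as a simplifying assumption.

```python
def quantile_normalize(profiles):
    """Quantile-normalize a list of equal-length expression profiles.

    Each profile lists expression values for the same genes in the same
    order. Ties are broken by gene order (a simplification of this sketch).
    """
    n_genes = len(profiles[0])
    if any(len(p) != n_genes for p in profiles):
        raise ValueError("all profiles must cover the same genes")

    # Sort each profile, then average across profiles rank by rank.
    sorted_profiles = [sorted(p) for p in profiles]
    rank_means = [
        sum(sp[r] for sp in sorted_profiles) / len(profiles)
        for r in range(n_genes)
    ]

    normalized = []
    for p in profiles:
        # order[r] is the index of the gene holding rank r in this profile.
        order = sorted(range(n_genes), key=lambda g: p[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two platforms measuring the same three genes.
toy = [[5.0, 2.0, 3.0], [40.0, 10.0, 30.0]]
result = quantile_normalize(toy)
```

After this step every profile shares the same value distribution while each gene keeps its within-profile rank. According to the abstract, this kind of normalization did not yield platform-independent clustering in the authors' evaluation.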
Record ID: doaj.art-5eecde675783479d9739b2c4756b0a22
Institution: Directory of Open Access Journals
ISSN: 1471-2105
Author affiliations:
Nicolas Borisov: I.M. Sechenov First Moscow State Medical University, Sechenov University
Irina Shabalina: Faculty of Mathematics and Information Technologies, Petrozavodsk State University
Victor Tkachev: Department of bioinformatics and molecular networks, OmicsWay Corporation
Maxim Sorokin: I.M. Sechenov First Moscow State Medical University, Sechenov University
Andrew Garazha: Department of bioinformatics and molecular networks, OmicsWay Corporation
Andrey Pulin: Laboratory for Cell Biology and Developmental Pathology, Federal State Institution “Institute of General Pathology and Pathophysiology”, FSBSI “IGPP”
Ilya I. Eremin: Department for Regenerative Medicine, JSC Generium
Anton Buzdin: I.M. Sechenov First Moscow State Medical University, Sechenov University
Citation: BMC Bioinformatics, volume 20, issue 1, pages 1-10 (2019-02-01)
DOI: 10.1186/s12859-019-2641-8
Keywords: Transcriptome; Gene expression; Microarray hybridization; Next-generation sequencing; Harmonization of transcriptional profiles; Comparison of multiple datasets