Shambhala: a platform-agnostic data harmonizer for gene expression data
Abstract. Background: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental metho...
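Main Authors: | Nicolas Borisov, Irina Shabalina, Victor Tkachev, Maxim Sorokin, Andrew Garazha, Andrey Pulin, Ilya I. Eremin, Anton Buzdin |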
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-02-01 |
Series: | BMC Bioinformatics |
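Subjects: | Transcriptome, Gene expression, Microarray hybridization, Next-generation sequencing, Harmonization of transcriptional profiles, Comparison of multiple datasets |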
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2641-8 |
_version_ | 1811287116290719744 |
---|---|
author | Nicolas Borisov, Irina Shabalina, Victor Tkachev, Maxim Sorokin, Andrew Garazha, Andrey Pulin, Ilya I. Eremin, Anton Buzdin |
author_sort | Nicolas Borisov |
collection | DOAJ |
description | Abstract. Background: Harmonization techniques make gene expression profiles, and sets of such profiles, compatible and ready for comparison. Here we present Shambhala, a new bioinformatic tool for harmonizing multiple human gene expression datasets obtained with different experimental methods and platforms of microarray hybridization and RNA sequencing. Results: Unlike previously published methods, which provide good-quality harmonization for only two datasets, Shambhala converts multiple datasets into a universal form suitable for further comparison. Shambhala harmonization is based on calibrating gene expression profiles against an auxiliary standardization dataset. Each profile is transformed to resemble the output of the Affymetrix Human Gene microarray hybridization platform, which was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. Conclusion: Unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method that supported sample-specific, platform-independent, biologically meaningful clustering of data obtained from multiple experimental platforms. |
first_indexed | 2024-04-13T03:13:12Z |
format | Article |
id | doaj.art-5eecde675783479d9739b2c4756b0a22 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-13T03:13:12Z |
publishDate | 2019-02-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-5eecde675783479d9739b2c4756b0a22; 2022-12-22T03:04:59Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-02-01; volume 20, issue 1, pages 1-10; 10.1186/s12859-019-2641-8; Shambhala: a platform-agnostic data harmonizer for gene expression data. Authors and affiliations: Nicolas Borisov (I.M. Sechenov First Moscow State Medical University, Sechenov University); Irina Shabalina (Faculty of Mathematics and Information Technologies, Petrozavodsk State University); Victor Tkachev (Department of bioinformatics and molecular networks, OmicsWay Corporation); Maxim Sorokin (I.M. Sechenov First Moscow State Medical University, Sechenov University); Andrew Garazha (Department of bioinformatics and molecular networks, OmicsWay Corporation); Andrey Pulin (Laboratory for Cell Biology and Developmental Pathology, Federal State Institution “Institute of General Pathology and Pathophysiology”, FSBSI “IGPP”); Ilya I. Eremin (Department for Regenerative Medicine, JSC Generium); Anton Buzdin (I.M. Sechenov First Moscow State Medical University, Sechenov University) |
title | Shambhala: a platform-agnostic data harmonizer for gene expression data |
topic | Transcriptome; Gene expression; Microarray hybridization; Next-generation sequencing; Harmonization of transcriptional profiles; Comparison of multiple datasets |
url | http://link.springer.com/article/10.1186/s12859-019-2641-8 |
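
The abstract in this record names quantile normalization as one of the baseline methods against which Shambhala was compared. For orientation only, a minimal pure-Python sketch of that baseline is given here. It is not Shambhala's algorithm, and the function name `quantile_normalize` and the toy expression values are illustrative assumptions that do not come from the article.

```python
def quantile_normalize(profiles):
    """Quantile-normalize expression profiles (illustrative sketch).

    profiles: list of samples; each sample is a list of expression values
    for the same genes in the same order. Returns new lists in which every
    sample has the same value distribution. Ties are broken by gene order,
    a simplification compared with production implementations.
    """
    n_genes = len(profiles[0])
    if any(len(p) != n_genes for p in profiles):
        raise ValueError("all profiles must cover the same genes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_profiles = [sorted(p) for p in profiles]
    reference = [
        sum(s[k] for s in sorted_profiles) / len(profiles)
        for k in range(n_genes)
    ]

    normalized = []
    for p in profiles:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda g: p[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # A gene with a given rank receives the reference value of that rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two hypothetical samples measured on three genes.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(samples)
```

After this transformation every sample shares one value distribution while each gene keeps its within-sample rank. The abstract reports that this kind of normalization did not yield platform-independent clustering across multiple platforms.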