Semi-supervised integration of single-cell transcriptomics data

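Abstract: Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.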

Bibliographic Details
Main Authors:	Massimo Andreatta, Léonard Hérault, Paul Gueguen, David Gfeller, Ariel J. Berenstein, Santiago J. Carmona
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-01-01
Series:	Nature Communications
Online Access:	https://doi.org/10.1038/s41467-024-45240-z

_version_	1797274128718233600
author	Massimo Andreatta Léonard Hérault Paul Gueguen David Gfeller Ariel J. Berenstein Santiago J. Carmona
collection	DOAJ
description	Abstract Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction.
first_indexed	2024-03-07T14:53:56Z
format	Article
id	doaj.art-d35b657fcd2e4fe8aefe46b618827fe8
institution	Directory Open Access Journal
issn	2041-1723
language	English
last_indexed	2024-03-07T14:53:56Z
publishDate	2024-01-01
publisher	Nature Portfolio
record_format	Article
series	Nature Communications
spelling	doaj.art-d35b657fcd2e4fe8aefe46b618827fe8 | 2024-03-05T19:33:33Z | eng | Nature Portfolio | Nature Communications | 2041-1723 | 2024-01-01 | 15 | 1 | 1 | 13 | 10.1038/s41467-024-45240-z | Semi-supervised integration of single-cell transcriptomics data | Massimo Andreatta (0), Léonard Hérault (1), Paul Gueguen (2), David Gfeller (3), Ariel J. Berenstein (4), Santiago J. Carmona (5) | Affiliation of authors 0, 1, 2, 3 and 5: Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne | Affiliation of author 4: Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA | https://doi.org/10.1038/s41467-024-45240-z
title	Semi-supervised integration of single-cell transcriptomics data
url	https://doi.org/10.1038/s41467-024-45240-z

Similar Items