Semi-supervised integration of single-cell transcriptomics data
Main Authors: | Massimo Andreatta, Léonard Hérault, Paul Gueguen, David Gfeller, Ariel J. Berenstein, Santiago J. Carmona |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-01-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-024-45240-z |
author | Massimo Andreatta, Léonard Hérault, Paul Gueguen, David Gfeller, Ariel J. Berenstein, Santiago J. Carmona |
collection | DOAJ |
description | Batch effects in single-cell RNA-seq data pose a significant challenge for comparative analyses across samples, individuals, and conditions. Although batch effect correction methods are routinely applied, data integration often leads to overcorrection and can result in the loss of biological variability. In this work we present STACAS, a batch correction method for scRNA-seq that leverages prior knowledge on cell types to preserve biological variability upon integration. Through an open-source benchmark, we show that semi-supervised STACAS outperforms state-of-the-art unsupervised methods, as well as supervised methods such as scANVI and scGen. STACAS scales well to large datasets and is robust to incomplete and imprecise input cell type labels, which are commonly encountered in real-life integration tasks. We argue that the incorporation of prior cell type information should be a common practice in single-cell data integration, and we provide a flexible framework for semi-supervised batch effect correction. |
first_indexed | 2024-03-07T14:53:56Z |
format | Article |
id | doaj.art-d35b657fcd2e4fe8aefe46b618827fe8 |
institution | Directory of Open Access Journals |
issn | 2041-1723 |
language | English |
last_indexed | 2024-03-07T14:53:56Z |
publishDate | 2024-01-01 |
publisher | Nature Portfolio |
series | Nature Communications |
affiliations | Department of Oncology, Lausanne Branch, Ludwig Institute for Cancer Research, CHUV and University of Lausanne (Massimo Andreatta, Léonard Hérault, Paul Gueguen, David Gfeller, Santiago J. Carmona); Laboratorio de Biología Molecular, División Patología, Instituto Multidisciplinario de Investigaciones en Patologías Pediátricas (IMIPP), CONICET-GCBA (Ariel J. Berenstein) |
title | Semi-supervised integration of single-cell transcriptomics data |
url | https://doi.org/10.1038/s41467-024-45240-z |