Joint analysis of scATAC-seq datasets using epiConv



Bibliographic Details
Main Authors: Li Lin, Liye Zhang
Format: Article
Language: English
Published: BMC 2022-07-01
Series: BMC Bioinformatics
Subjects: scATAC-seq; Cell clustering; Batch effects; Data integration
Online Access: https://doi.org/10.1186/s12859-022-04858-w
_version_ 1828778212604248064
author Li Lin
Liye Zhang
author_facet Li Lin
Liye Zhang
author_sort Li Lin
collection DOAJ
description Abstract. Background: Technical improvements in ATAC-seq make it possible to profile the chromatin states of single cells at high throughput. However, data from multiple sources frequently show strong technical variation, referred to as batch effects. To perform joint analysis across multiple datasets, a specialized method is required that removes technical variation between datasets while preserving biological information. Results: Here we present an algorithm named epiConv to perform joint analyses on scATAC-seq datasets. We first show, on a collection of PBMC datasets, that epiConv corrects batch effects better and is less prone to over-fitting than existing methods. In a collection of mouse brain data, we show that epiConv can align low-depth scATAC-seq data from co-assays (simultaneous profiling of transcriptome and chromatin) onto a high-quality ATAC-seq reference and increase the resolution of the chromatin profiles of the co-assay data. Finally, we show that epiConv can be used to integrate cells from different biological conditions (T cells in normal vs. germ-free mice; normal vs. malignant hematopoiesis), revealing hidden cell populations that would otherwise be undetectable. Conclusions: In this study, we introduce epiConv to integrate multiple scATAC-seq datasets and perform joint analysis on them. Through several case studies, we show that epiConv removes batch effects while retaining the biological signal. Moreover, joint analysis across multiple datasets improves the performance of clustering and differentially accessible peak calling, especially when the biological signal is weak in a single dataset.
first_indexed 2024-12-11T16:37:43Z
format Article
id doaj.art-323453ab0cbe405aa01d03bdf90721e3
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-11T16:37:43Z
publishDate 2022-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-323453ab0cbe405aa01d03bdf90721e3 | 2022-12-22T00:58:24Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2022-07-01 | 231120 | 10.1186/s12859-022-04858-w | Joint analysis of scATAC-seq datasets using epiConv | Li Lin (0), Liye Zhang (1) | (0) School of Life Science and Technology, ShanghaiTech University; (1) School of Life Science and Technology, ShanghaiTech University | https://doi.org/10.1186/s12859-022-04858-w | scATAC-seq; Cell clustering; Batch effects; Data integration
spellingShingle Li Lin
Liye Zhang
Joint analysis of scATAC-seq datasets using epiConv
BMC Bioinformatics
scATAC-seq
Cell clustering
Batch effects
Data integration
title Joint analysis of scATAC-seq datasets using epiConv
title_full Joint analysis of scATAC-seq datasets using epiConv
title_fullStr Joint analysis of scATAC-seq datasets using epiConv
title_full_unstemmed Joint analysis of scATAC-seq datasets using epiConv
title_short Joint analysis of scATAC-seq datasets using epiConv
title_sort joint analysis of scatac seq datasets using epiconv
topic scATAC-seq
Cell clustering
Batch effects
Data integration
url https://doi.org/10.1186/s12859-022-04858-w
work_keys_str_mv AT lilin jointanalysisofscatacseqdatasetsusingepiconv
AT liyezhang jointanalysisofscatacseqdatasetsusingepiconv