Joint analysis of scATAC-seq datasets using epiConv
Abstract. Background: Technical improvements in ATAC-seq have made high-throughput profiling of the chromatin states of single cells possible. However, data from multiple sources frequently show strong technical variation, referred to as batch effects. In order to perform joint analysis acros...
Main Authors: | Li Lin, Liye Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-07-01 |
Series: | BMC Bioinformatics |
Subjects: | scATAC-seq; Cell clustering; Batch effects; Data integration |
Online Access: | https://doi.org/10.1186/s12859-022-04858-w |
_version_ | 1828778212604248064 |
---|---|
author | Li Lin Liye Zhang |
author_facet | Li Lin Liye Zhang |
author_sort | Li Lin |
collection | DOAJ |
description | Abstract. Background: Technical improvements in ATAC-seq have made high-throughput profiling of the chromatin states of single cells possible. However, data from multiple sources frequently show strong technical variation, referred to as batch effects. To perform joint analysis across multiple datasets, a specialized method is required that removes technical variation between datasets while keeping biological information. Results: Here we present an algorithm named epiConv to perform joint analyses on scATAC-seq datasets. We first show that epiConv corrects batch effects better and is less prone to over-fitting than existing methods on a collection of PBMC datasets. In a collection of mouse brain data, we show that epiConv is capable of aligning low-depth scATAC-seq data from co-assays (simultaneous profiling of transcriptome and chromatin) onto a high-quality ATAC-seq reference, increasing the resolution of the chromatin profiles of the co-assay data. Finally, we show that epiConv can be used to integrate cells from different biological conditions (T cells in normal vs. germ-free mice; normal vs. malignant hematopoiesis), which reveals hidden cell populations that would otherwise be undetectable. Conclusions: In this study, we introduce epiConv to integrate multiple scATAC-seq datasets and perform joint analysis on them. Through several case studies, we show that epiConv removes batch effects and retains the biological signal. Moreover, joint analysis across multiple datasets improves the performance of clustering and differentially accessible peak calling, especially when the biological signal is weak in a single dataset. |
first_indexed | 2024-12-11T16:37:43Z |
format | Article |
id | doaj.art-323453ab0cbe405aa01d03bdf90721e3 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-11T16:37:43Z |
publishDate | 2022-07-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-323453ab0cbe405aa01d03bdf90721e3 2022-12-22T00:58:24Z eng BMC BMC Bioinformatics 1471-2105 2022-07-01 231120 10.1186/s12859-022-04858-w Joint analysis of scATAC-seq datasets using epiConv Li Lin (School of Life Science and Technology, ShanghaiTech University) Liye Zhang (School of Life Science and Technology, ShanghaiTech University) https://doi.org/10.1186/s12859-022-04858-w scATAC-seq Cell clustering Batch effects Data integration |
spellingShingle | Li Lin Liye Zhang Joint analysis of scATAC-seq datasets using epiConv BMC Bioinformatics scATAC-seq Cell clustering Batch effects Data integration |
title | Joint analysis of scATAC-seq datasets using epiConv |
title_full | Joint analysis of scATAC-seq datasets using epiConv |
title_fullStr | Joint analysis of scATAC-seq datasets using epiConv |
title_full_unstemmed | Joint analysis of scATAC-seq datasets using epiConv |
title_short | Joint analysis of scATAC-seq datasets using epiConv |
title_sort | joint analysis of scatac seq datasets using epiconv |
topic | scATAC-seq Cell clustering Batch effects Data integration |
url | https://doi.org/10.1186/s12859-022-04858-w |
work_keys_str_mv | AT lilin jointanalysisofscatacseqdatasetsusingepiconv AT liyezhang jointanalysisofscatacseqdatasetsusingepiconv |