ATAC2GRN: optimized ATAC-seq and DNase1-seq pipelines for rapid and accurate genome regulatory network inference

Bibliographic Details
Main Authors: Thomas J. F. Pranzatelli, Drew G. Michael, John A. Chiorini
Format: Article
Language: English
Published: BMC 2018-07-01
Series: BMC Genomics
Online Access: http://link.springer.com/article/10.1186/s12864-018-4943-z
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background: Chromatin accessibility profiling assays such as ATAC-seq and DNase1-seq make it possible to rapidly characterize the regulatory state of the genome at single-nucleotide resolution. Optimized molecular protocols now allow next-generation sequencing libraries to be produced in several hours, leaving the analysis of the sequencing data as the primary obstacle to wide-scale deployment of accessibility profiling assays. To address this obstacle, we have developed an optimized and efficient pipeline for the analysis of ATAC-seq and DNase1-seq data.

Results: We executed a multi-dimensional grid search on the NIH Biowulf supercomputing cluster, analyzing 4560 pipeline configurations to assess the impact of parameter selection on biological reproducibility and ChIP-seq recovery. Parameter optimization improved ChIP-seq recovery by 15% for ATAC-seq and 3% for DNase1-seq, and PCR duplicate removal improved biological reproducibility by 36% without a significant cost to transcription factor footprinting. Analyses of downsampled reads identified a point of diminishing returns for increased library sequencing depth: 160 million reads recovered 95% of the ChIP-seq data recovered by a 200-million-read footprinting library.

Conclusions: We present optimized ATAC-seq and DNase1-seq pipelines in both Snakemake and bash formats, as well as optimal sequencing depths for ATAC-seq and DNase1-seq projects. The optimized analysis pipelines, parameters, and ground-truth ChIP-seq datasets have been made available for deployment and future algorithmic profiling.

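The downsampling result in the abstract can be read as a saturation criterion: the shallowest sequencing depth whose ChIP-seq recovery reaches a chosen fraction (95% in the abstract) of the recovery obtained at full depth. The Python sketch below illustrates only that criterion. It assumes recovery has already been measured for each downsampled depth, and the function name `saturation_depth` and its (depth, recovery) input format are hypothetical; they are not part of the published ATAC2GRN pipelines.

```python
from typing import Sequence, Tuple


def saturation_depth(curve: Sequence[Tuple[float, float]], fraction: float = 0.95) -> float:
    """Return the shallowest depth whose recovery reaches `fraction` of full-depth recovery.

    Hypothetical helper: `curve` holds (depth_in_reads, recovery) pairs, one per
    downsampled library, and the deepest point is treated as the full library.
    """
    if not curve:
        raise ValueError("curve must contain at least one (depth, recovery) point")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    points = sorted(curve)           # shallowest depth first
    full_recovery = points[-1][1]    # recovery of the deepest (full) library
    target = fraction * full_recovery

    for depth, recovery in points:
        if recovery >= target:       # first depth that meets the criterion
            return depth
    # Not reached for non-negative recovery values; fall back to full depth.
    return points[-1][0]
```

If no shallower depth meets the target, the function returns the full depth, which suggests the library has not yet reached the point of diminishing returns under the chosen fraction.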
ISSN: 1471-2164
Affiliation (all authors): National Institute of Dental and Craniofacial Research, National Institutes of Health
DOI: 10.1186/s12864-018-4943-z
Keywords: DNA footprinting; Pipeline; ATAC-seq; DNase1-seq; Regulation; Optimization