ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

Abstract. Background: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding prote...

Bibliographic Details
Main Authors: Pagès Hervé, Lawson Nathan D, Gazin Claude, Zhu Lihua J, Lin Simon M, Lapointe David S, Green Michael R
Format: Article
Language: English
Published: BMC 2010-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/237
collection DOAJ
description Abstract

Background: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome.

Results: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed as a table or a pie chart, or plotted as a histogram of the distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes.

Conclusions: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Users can pass their own annotation data, such as a different ChIP preparation or a dataset from the literature, or use existing annotation packages such as GenomicFeatures and BSgenome, which provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
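
The batch annotation idea described in the abstract (assigning each enriched region to its nearest gene and recording the distance) can be illustrated with a minimal Python sketch. This is not ChIPpeakAnno's implementation or API, which is an R package. The `Gene` and `Peak` types, the `annotate_nearest` function, and the coordinates are all hypothetical, and the sketch assumes distance is measured from the peak midpoint to the transcription start site while ignoring strand.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int  # transcription start site (hypothetical coordinate)


class Peak(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def annotate_nearest(peaks, genes):
    """Return (peak name, nearest gene name, signed distance) for each peak.

    Assumption: distance = peak midpoint - TSS, strand ignored.
    Peaks on a chromosome with no genes get (name, None, None).
    """
    # Group genes by chromosome and sort them by TSS for binary search.
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    positions = {}
    for chrom, lst in by_chrom.items():
        lst.sort(key=lambda g: g.tss)
        positions[chrom] = [g.tss for g in lst]

    results = []
    for p in peaks:
        candidates = by_chrom.get(p.chrom)
        if not candidates:
            results.append((p.name, None, None))
            continue
        mid = (p.start + p.end) // 2
        i = bisect_left(positions[p.chrom], mid)
        # Only the genes immediately left and right of the midpoint can be nearest.
        neighbours = candidates[max(i - 1, 0): i + 1]
        best = min(neighbours, key=lambda g: abs(mid - g.tss))
        results.append((p.name, best.name, mid - best.tss))
    return results


# Toy data, invented for illustration only.
genes = [Gene("geneA", "chr1", 1000), Gene("geneB", "chr1", 5000),
         Gene("geneC", "chr2", 300)]
peaks = [Peak("p1", "chr1", 1200, 1400), Peak("p2", "chr1", 4000, 4200),
         Peak("p3", "chr3", 10, 20)]

for row in annotate_nearest(peaks, genes):
    print(row)
```

The signed distances collected this way are the kind of values that could then be tabulated or plotted as a histogram per peak set, as the abstract describes.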
first_indexed 2024-12-20T11:54:11Z
id doaj.art-7237726074f34435b03929bdd35c847b
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-12-20T11:54:11Z
spelling doaj.art-7237726074f34435b03929bdd35c847b 2022-12-21T19:41:43Z eng BMC BMC Bioinformatics 1471-2105 2010-05-01 11 1 237 10.1186/1471-2105-11-237