Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis

Massive parallel DNA sequencing combined with chromatin immunoprecipitation and a large variety of DNA/RNA-enrichment methodologies is at the origin of data resources of major importance. Indeed these resources, available for multiple genomes, represent the most comprehensive catalogue of (i) cell,...

Bibliographic Details
Main Authors: Marco Antonio Mendoza-Parra, Hinrich Gronemeyer
Format: Article
Language: English
Published: Elsevier 2014-12-01
Series: Genomics Data
Subjects: ChIP sequencing; Massive parallel sequencing; Quality control; Omics data mining
Online Access: http://www.sciencedirect.com/science/article/pii/S2213596014000671
author Marco Antonio Mendoza-Parra
Hinrich Gronemeyer
collection DOAJ
description Massive parallel DNA sequencing combined with chromatin immunoprecipitation and a large variety of DNA/RNA-enrichment methodologies is at the origin of data resources of major importance. Indeed, these resources, available for multiple genomes, represent the most comprehensive catalogue of (i) cell, development and signal transduction-specified patterns of binding sites for transcription factors (‘cistromes’) and for transcription and chromatin-modifying machineries and (ii) the patterns of specific local post-translational modifications of histones and DNA (‘epigenome’) or of regulatory chromatin binding factors. In addition, (iii) resources specifying chromatin structure alterations are emerging. Importantly, these types of “omics” datasets increasingly populate public repositories and provide highly valuable resources for the exploration of general principles of cell function in a multi-dimensional genome–transcriptome–epigenome–chromatin structure context. However, data mining depends critically on data quality, an issue that, surprisingly, is still largely ignored by scientists and well-financed consortia, data repositories and scientific journals. So what determines the quality of ChIP-seq experiments and of the datasets generated from them, and what keeps scientists from associating quality criteria with their data? In this ‘opinion’ we trace the various parameters that influence the quality of this type of dataset, as well as the computational efforts made so far to assess that quality. Moreover, we describe a universal quality control (QC) certification approach that provides a quality rating for ChIP-seq and enrichment-related assays. The corresponding QC tool and a regularly updated database, from which at present the quality parameters of more than 8000 datasets can be retrieved, are freely accessible at www.ngs-qc.org.
first_indexed 2024-12-19T18:17:36Z
format Article
id doaj.art-7f2a8a441eea40bd960f91ade7b1f030
institution Directory Open Access Journal
issn 2213-5960
language English
last_indexed 2024-12-19T18:17:36Z
publishDate 2014-12-01
publisher Elsevier
record_format Article
series Genomics Data
spelling doaj.art-7f2a8a441eea40bd960f91ade7b1f030
2022-12-21T20:11:03Z
eng
Elsevier
Genomics Data
2213-5960
2014-12-01
2
C
268
273
10.1016/j.gdata.2014.08.002
title Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis
topic ChIP sequencing
Massive parallel sequencing
Quality control
Omics data mining
url http://www.sciencedirect.com/science/article/pii/S2213596014000671