Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance



Bibliographic Details
Main Authors: Bansal, Vikas; Bashir, Ali; Bafna, Vineet
Format: Article
Language: English
Published: BMC, 2010-06-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/11/385

Abstract

Background
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, and other applications. The multitude of sequencing platforms, each with its own characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular application. Here we describe a number of analytical and empirical results that address design questions for two applications: detection of structural variations from paired-end sequencing, and estimation of mRNA transcript abundance.

Results
For structural variation, our results provide explicit trade-offs between the detection and the resolution of rearrangement breakpoints, and identify the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert-length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short-read data, these predictions show good concordance with Illumina 200 bp and 2 kbp insert-length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. Using a pilot of only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect lowly expressed genes with greater than 95% probability.

Conclusions
Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools (http://bix.ucsd.edu/projects/NGS-DesignTools) to derive platform-independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next-generation sequencing.
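
The abstract does not state the paper's formulae, so the following is only a rough illustration of the two kinds of design question it describes, using textbook uniform-sampling approximations rather than the authors' method. Breakpoint detection is modelled as Poisson clone coverage (mean = pairs × insert length / genome length), and transcript detection as the chance of sampling at least one read from a transcript that makes up a fraction p of all reads. It does not implement the two-library optimization or the pilot-study corrections; the function names, genome length, and read counts are hypothetical.

```python
import math


def breakpoint_detection_prob(n_pairs: int, insert_len: int, genome_len: int,
                              min_support: int = 1) -> float:
    """Approximate probability that one breakpoint is spanned by at least
    `min_support` read pairs.

    Assumption (not from the paper): pairs are placed uniformly at random, so
    the number of inserts spanning a fixed position is Poisson with mean equal
    to the clone coverage n_pairs * insert_len / genome_len. Read lengths and
    mapping errors are ignored.
    """
    lam = n_pairs * insert_len / genome_len  # clone coverage
    # P(X >= k) = 1 - sum_{i=0}^{k-1} e^{-lam} * lam^i / i!
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                  for i in range(min_support))
    return 1.0 - p_below


def reads_to_detect(p: float, confidence: float = 0.95) -> int:
    """Smallest read count N with 1 - (1 - p)**N >= confidence, i.e. a
    transcript contributing a fraction p of all reads is sampled at least
    once with probability >= confidence.

    Assumption (not from the paper): reads are independent draws and p is
    treated as known; in practice it would be estimated from pilot data.
    """
    if not 0.0 < p < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p and confidence must lie in (0, 1)")
    # Solve (1 - p)**N <= 1 - confidence for N; log1p keeps precision for tiny p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed approximate human genome length, bp
    pairs = 10_000_000      # hypothetical number of read pairs
    for insert in (200, 2000):  # the two Illumina insert sizes named above
        prob = breakpoint_detection_prob(pairs, insert, genome, min_support=2)
        print(f"insert {insert} bp: P(>=2 spanning pairs) = {prob:.3f}")
    print("reads for p = 1e-6 at 95%:", reads_to_detect(1e-6))
```

Under these assumptions, at a fixed number of pairs a 2 kbp library gives ten times the clone coverage of a 200 bp library, which covers only the detection side of the trade-off. The cost, not modelled here, is that a breakpoint can only be localized to within roughly an insert length, which is why mixes of long and short inserts are worth studying. Likewise, `reads_to_detect` takes p as given, whereas the abstract's point is that estimates from a small pilot need correction before they are used this way.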

ISSN: 1471-2164
Citation: BMC Genomics 2010, 11:385
DOI: 10.1186/1471-2164-11-385