DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliora...

Bibliographic Details
Main Authors:	Ram Vinay Pandey, Christian Schlötterer
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2013-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC3751911?pdf=render

author	Ram Vinay Pandey Christian Schlötterer
collection	DOAJ
description	With the rapid and steady increase of next-generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck, we present a new tool, DistMap: a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in SAM/BAM format. DistMap supports both paired-end and single-end reads, thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
first_indexed	2024-12-12T15:04:34Z
format	Article
id	doaj.art-04391609688c45a5a76f4e8ee3e671f1
institution	Directory of Open Access Journals
issn	1932-6203
language	English
last_indexed	2024-12-12T15:04:34Z
publishDate	2013-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-04391609688c45a5a76f4e8ee3e671f1; 2022-12-22T00:20:45Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2013-01-01; 8; 8; e72614; 10.1371/journal.pone.0072614
title	DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.
url	http://europepmc.org/articles/PMC3751911?pdf=render

Similar Items