Geoseq: a tool for dissecting deep-sequencing datasets

Bibliographic Details
Main Authors: Homann Robert, George Ajish, Levovitz Chaya, Shah Hardik, Cancio Anthony, Gurtowski James, Sachidanandam Ravi
Format: Article
Language: English
Published: BMC 2010-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/506
collection DOAJ
description Abstract
Background: Datasets generated on deep-sequencing platforms have been deposited in public repositories such as the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), both hosted by the NCBI, and the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they remain underused because datasets of interest are difficult to locate and analyze.
Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and uses suffix arrays to measure the coverage of each tile in a sequence library. The user can upload custom target sequences or search by gene or miRNA name, and receives the results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing relevant libraries to be identified by organism, tissue, and type of experiment.
Conclusions: Geoseq simplifies the analysis of small sets of sequences against deep-sequencing datasets, as well as the identification of public datasets of interest. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, along with the mature and star sequences of those miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
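The tiling idea in the Results can be illustrated with a minimal Python sketch. It is not Geoseq's implementation: the function names, the `$` read separator, the naive suffix-array construction, and the choice to define coverage as the number of exact occurrences of a tile in the reads are all assumptions made for illustration.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by suffix text.
    # Fine for a toy example; real libraries need a linear-time builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    # Suffixes starting with `pattern` form one contiguous block in the
    # suffix array, so two binary searches give the block's size.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tile_coverage(reference, reads, tile_len, step=1):
    # Join reads with '$' (assumed absent from the sequences) so that a
    # tile can never match across the boundary between two reads.
    text = "$".join(reads) + "$"
    sa = build_suffix_array(text)
    return [
        (pos, count_occurrences(text, sa, reference[pos:pos + tile_len]))
        for pos in range(0, len(reference) - tile_len + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two short reads and a 7-base reference.
    for pos, count in tile_coverage("ACGTACG", ["ACGTAC", "GTACGT"], 4):
        print(pos, count)
```

In this sketch the suffix array is built once per library and each tile lookup is a pair of binary searches, which is consistent with the abstract's description of holding pre-computed data for public libraries and querying them with a small reference sequence.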
first_indexed 2024-12-21T21:10:50Z
id doaj.art-c8ca69155a5c463aba48d9a0b56c7e00
institution Directory Open Access Journal
issn 1471-2105
doi 10.1186/1471-2105-11-506