VARiD: A variation detection framework for color-space and letter-space platforms

Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each posit...

Bibliographic Details
Main Authors:	Dalca, Adrian Vasile, Rumble, Stephen M., Levy, Samuel, Brudno, Michael
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Oxford University Press 2012
Online Access:	http://hdl.handle.net/1721.1/73027 https://orcid.org/0000-0002-8422-0136

collection	MIT
description	Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. Results: We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined.
first_indexed	2024-09-23T08:00:19Z
format	Article
id	mit-1721.1/73027
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T08:00:19Z
publishDate	2012
publisher	Oxford University Press
record_format	dspace
sponsorship	Natural Sciences and Engineering Research Council of Canada (NSERC); Mathematics of Information Technology and Complex Systems (Network); Life Technologies, Inc.
record_timestamp	2021-09-09T17:21:02Z
deposited	2012-09-17T20:08:07Z
issued	2010-06
type	Article (http://purl.org/eprint/type/JournalArticle)
issn	1460-2059, 1367-4803
citation	Dalca, A. V. et al. “VARiD: A Variation Detection Framework for Color-space and Letter-space Platforms.” Bioinformatics 26.12 (2010): i343–i349. Web.
doi	http://dx.doi.org/10.1093/bioinformatics/btq184
journal	Bioinformatics
rights	Creative Commons Attribution Non-Commercial (http://creativecommons.org/licenses/by-nc/2.5)
file_format	application/pdf
title	VARiD: A variation detection framework for color-space and letter-space platforms
url	http://hdl.handle.net/1721.1/73027 https://orcid.org/0000-0002-8422-0136

Similar Items