SNP-PHAGE – High throughput SNP discovery pipeline


Bibliographic Details
Main Authors: Cregan Perry B, Choi Ik-Young, Hyten David L, Grefenstette John J, Matukumalli Lakshmi K, Van Tassell Curtis P
Format: Article
Language: English
Published: BMC 2006-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/468
author Cregan Perry B
Choi Ik-Young
Hyten David L
Grefenstette John J
Matukumalli Lakshmi K
Van Tassell Curtis P
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Single nucleotide polymorphisms (SNPs), as defined here, are single-base sequence changes or short insertions/deletions between or within individuals of a given species. Because of their abundance and the availability of high-throughput analysis technologies, SNP markers have begun to replace traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs), and simple sequence repeats (SSRs, or microsatellites) for fine mapping and association studies in several species. SNP discovery from chromatogram data requires combining several bioinformatics programs into an analysis pipeline. Results must be stored in a relational database so that they can be interrogated through queries or used to generate data for further analyses, such as determining linkage disequilibrium and identifying common haplotypes. Although several groups perform these tasks routinely, no integrated open source SNP discovery pipeline that new groups interested in SNP marker development can easily adapt is currently available.</p> <p>Results</p> <p>We developed SNP-PHAGE (<b>SNP </b>discovery <b>P</b>ipeline with additional features for identification of common haplotypes within a sequence tagged site (<b>H</b>aplotype <b>A</b>nalysis) and <b>Ge</b>nBank (-dbSNP) submissions). The tool was applied to sequence traces from diverse soybean genotypes and discovered over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl, and uses a MySQL database. Scripts that generate a user-friendly web interface are also provided, along with common queries for preliminary data analysis. A machine learning tool developed by this group to increase the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available as open source at <url>http://bfgl.anri.barc.usda.gov/ML/snp-phage/</url>.</p> <p>Conclusion</p> <p>SNP-PHAGE provides a bioinformatics solution for high-throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. A user-friendly web interface aids SNP selection and visualization. The tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and the software can serve as a starting point for groups interested in developing SNP markers.</p>
first_indexed 2024-04-12T15:01:08Z
format Article
id doaj.art-833980729a244be681ff10c9ba355446
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-12T15:01:08Z
publishDate 2006-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-833980729a244be681ff10c9ba355446; 2022-12-22T03:28:04Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-10-01; 7; 1; 468; 10.1186/1471-2105-7-468; SNP-PHAGE – High throughput SNP discovery pipeline; Cregan Perry B; Choi Ik-Young; Hyten David L; Grefenstette John J; Matukumalli Lakshmi K; Van Tassell Curtis P; http://www.biomedcentral.com/1471-2105/7/468
title SNP-PHAGE – High throughput SNP discovery pipeline
url http://www.biomedcentral.com/1471-2105/7/468