SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access

Abstract. Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and ar...


Bibliographic Details
Main Authors: Carracedo Ángel, Phillips Christopher, Salas Antonio, Amigo Jorge
Format: Article
Language: English
Published: BMC 2008-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/428
collection DOAJ
description Abstract
Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics.
Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested.
Conclusion: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
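As an illustration of the statistics named in the description above, the following minimal Python sketch shows how pre-summarized allele counts for one SNP could be merged into a user-defined population group and turned into allele frequencies, expected heterozygosity and a simple Fst estimate. The function names, the dictionary layout of the counts and the (HT - HS) / HT estimator are assumptions made for illustration only; the record does not state which schema or formulas SPSmart uses, and the In metric is not covered here.

```python
from typing import Dict, List

# Hypothetical layout: allele -> number of observed chromosomes in one population.
AlleleCounts = Dict[str, int]


def combine_populations(populations: List[AlleleCounts]) -> AlleleCounts:
    """Merge per-population allele counts into one user-defined group."""
    combined: AlleleCounts = {}
    for counts in populations:
        for allele, n in counts.items():
            combined[allele] = combined.get(allele, 0) + n
    return combined


def allele_frequencies(counts: AlleleCounts) -> Dict[str, float]:
    """Convert allele counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(counts: AlleleCounts) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies(counts).values())


def fst(groups: List[AlleleCounts]) -> float:
    """Simple Fst estimate (HT - HS) / HT, an assumed textbook form.

    HT is the expected heterozygosity of the pooled counts; HS is the mean
    within-group expected heterozygosity, weighted by group sample size.
    """
    ht = expected_heterozygosity(combine_populations(groups))
    if ht == 0.0:
        return 0.0  # monomorphic SNP: no variation to partition
    sizes = [sum(g.values()) for g in groups]
    hs = sum(n * expected_heterozygosity(g) for n, g in zip(sizes, groups)) / sum(sizes)
    return (ht - hs) / ht
```

Because only allele counts are needed, a group of populations can be re-formed at query time by summing stored counts rather than re-reading individual genotypes, which is one plausible reason for pre-processing summaries in a data mart.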
first_indexed 2024-12-12T10:36:17Z
format Article
id doaj.art-abb660aaf129441b90e95fe6eb0da0c0
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-12T10:36:17Z
publishDate 2008-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-abb660aaf129441b90e95fe6eb0da0c0 2022-12-22T00:27:11Z eng BMC BMC Bioinformatics 1471-2105 2008-10-01 9 1 428 10.1186/1471-2105-9-428 SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access Carracedo Ángel Phillips Christopher Salas Antonio Amigo Jorge http://www.biomedcentral.com/1471-2105/9/428