SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa

The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Applying Burrows-Wheeler Aligner (BWA) alignment and Genome Analysis Toolkit (GATK) variant calling to this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs)...

Bibliographic Details
Main Authors: Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Alexandre Poliakov, Inna Dubchak, Victor Solovyev, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon
Format: Article
Language: English
Published: Elsevier 2016-11-01
Series: Current Plant Biology
Subjects: Allele mining, Oryza, SNP, Indel, Genotype database, Genetic diversity
Online Access: http://www.sciencedirect.com/science/article/pii/S2214662816300780
collection DOAJ
description The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Applying Burrows-Wheeler Aligner (BWA) alignment and Genome Analysis Toolkit (GATK) variant calling to this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs). Five rice reference genomes representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93-11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through the web services of its application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties from the International Rice Genebank Collection Information System (IRGCIS), together with gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored in Hierarchical Data Format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, which allows controlled vocabularies from biological ontologies to be used as query constraints in SNP-Seek. In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.
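The split described in the abstract (variety metadata in a relational store queried by controlled-vocabulary terms, genotype calls in a separate matrix store read by position) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for the RDBMS, the three tables are only loosely modeled on Chado and are not the actual SNP-Seek schema, a flat in-memory array stands in for the HDF5 files, and the variety names and genotype values are invented toy data.

```python
import sqlite3
from array import array

# Hypothetical toy matrix: 4 varieties (rows) x 6 SNP positions (columns),
# stored row-major. Values are invented allele codes, not real 3K RG data.
# A flat array stands in for the HDF5 dataset mentioned in the abstract.
N_SNPS = 6
GENOTYPES = array("b", [
    0, 0, 2, 1, 0, 2,   # matrix row 0
    2, 2, 0, 0, 1, 0,   # matrix row 1
    0, 1, 2, 2, 0, 2,   # matrix row 2
    2, 2, 0, 1, 1, 0,   # matrix row 3
])


def genotype_slice(row, start, stop):
    """Return the calls of one variety for SNP columns [start, stop)."""
    base = row * N_SNPS
    return list(GENOTYPES[base + start: base + stop])


def build_metadata_db():
    """Create a simplified, Chado-inspired metadata store (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, name TEXT,
                            matrix_row INTEGER);
        CREATE TABLE stock_cvterm (stock_id INTEGER, cvterm_id INTEGER);
    """)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "indica"), (2, "aus")])
    conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                     [(10, "variety_A", 0), (11, "variety_B", 1),
                      (12, "variety_C", 2), (13, "variety_D", 3)])
    conn.executemany("INSERT INTO stock_cvterm VALUES (?, ?)",
                     [(10, 1), (11, 2), (12, 1), (13, 2)])
    return conn


def genotypes_for_term(conn, term, start, stop):
    """Select varieties by vocabulary term, then slice the genotype matrix."""
    rows = conn.execute("""
        SELECT s.name, s.matrix_row
        FROM stock s
        JOIN stock_cvterm sc ON sc.stock_id = s.stock_id
        JOIN cvterm t ON t.cvterm_id = sc.cvterm_id
        WHERE t.name = ?
        ORDER BY s.matrix_row
    """, (term,)).fetchall()
    return {name: genotype_slice(row, start, stop) for name, row in rows}


conn = build_metadata_db()
result = genotypes_for_term(conn, "indica", 2, 5)
print(result)
```

In this sketch the relational query only resolves which matrix rows are needed; the genotype values themselves never pass through SQL, which is the general motivation for keeping a large, dense matrix outside the relational database.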
first_indexed 2024-12-14T12:15:31Z
format Article
id doaj.art-0f453519967f4c62b512b95d4e3651c0
institution Directory Open Access Journal
issn 2214-6628
language English
last_indexed 2024-12-14T12:15:31Z
publishDate 2016-11-01
publisher Elsevier
record_format Article
series Current Plant Biology
spelling doaj.art-0f453519967f4c62b512b95d4e3651c0; 2022-12-21T23:01:38Z; eng; Elsevier; Current Plant Biology; ISSN 2214-6628; 2016-11-01; vol. 7C, pp. 16-25; doi:10.1016/j.cpb.2016.12.003
SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa
Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon: International Rice Research Institute, College, Los Baños, Laguna, 4031, Philippines
Alexandre Poliakov, Inna Dubchak: Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Victor Solovyev: Softberry, Inc., Mount Kisco, NY 10549, USA
title SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa
topic Allele mining
Oryza
SNP
Indel
Genotype database
Genetic diversity
url http://www.sciencedirect.com/science/article/pii/S2214662816300780