SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa
The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) for alignment and the Genome Analysis Toolkit (GATK) for variant calling on this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs)....
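Main Authors: | Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Alexandre Poliakov, Inna Dubchak, Victor Solovyev, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon |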
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2016-11-01 |
Series: | Current Plant Biology |
Subjects: | Allele mining, Oryza, SNP, Indel, Genotype database, Genetic diversity |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2214662816300780 |
_version_ | 1818417987493822464 |
---|---|
author | Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Alexandre Poliakov, Inna Dubchak, Victor Solovyev, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon |
author_facet | Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Alexandre Poliakov, Inna Dubchak, Victor Solovyev, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon |
author_sort | Locedie Mansueto |
collection | DOAJ |
description | The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) for alignment and the Genome Analysis Toolkit (GATK) for variant calling on this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs). Five reference genomes of rice representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93–11 (indica), DJ 123 (aus), and Kasalath (aus).
The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through web services of the application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties originating from the International Rice Genebank Collection Information System (IRGCIS) and gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored using hierarchical data format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, allowing the use of controlled vocabularies from biological ontologies as query constraints in SNP-Seek.
In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding. |
first_indexed | 2024-12-14T12:15:31Z |
format | Article |
id | doaj.art-0f453519967f4c62b512b95d4e3651c0 |
institution | Directory Open Access Journal |
issn | 2214-6628 |
language | English |
last_indexed | 2024-12-14T12:15:31Z |
publishDate | 2016-11-01 |
publisher | Elsevier |
record_format | Article |
series | Current Plant Biology |
spelling | doaj.art-0f453519967f4c62b512b95d4e3651c0; 2022-12-21T23:01:38Z; eng; Elsevier; Current Plant Biology; ISSN 2214-6628; 2016-11-01; volume 7C, pages 16-25; DOI 10.1016/j.cpb.2016.12.003; SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa; Authors: Locedie Mansueto [0], Roven Rommel Fuentes [1], Dmytro Chebotarov [2], Frances Nikki Borja [3], Jeffrey Detras [4], Juan Miguel Abriol-Santos [5], Kevin Palis [6], Alexandre Poliakov [7], Inna Dubchak [8], Victor Solovyev [9], Ruaraidh Sackville Hamilton [10], Kenneth L. McNally [11], Nickolai Alexandrov [12], Ramil Mauleon [13]; Affiliations: [0-6, 10-13] International Rice Research Institute, College, Los Baños, Laguna, 4031, Philippines; [7-8] Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; [9] Softberry, Inc., Mount Kisco, NY 10549, USA; Abstract: The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) for alignment and the Genome Analysis Toolkit (GATK) for variant calling on this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs). Five reference genomes of rice representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93–11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through web services of the application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties originating from the International Rice Genebank Collection Information System (IRGCIS) and gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored using hierarchical data format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, allowing the use of controlled vocabularies from biological ontologies as query constraints in SNP-Seek. In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.; http://www.sciencedirect.com/science/article/pii/S2214662816300780; Keywords: Allele mining, Oryza, SNP, Indel, Genotype database, Genetic diversity |
spellingShingle | Locedie Mansueto, Roven Rommel Fuentes, Dmytro Chebotarov, Frances Nikki Borja, Jeffrey Detras, Juan Miguel Abriol-Santos, Kevin Palis, Alexandre Poliakov, Inna Dubchak, Victor Solovyev, Ruaraidh Sackville Hamilton, Kenneth L. McNally, Nickolai Alexandrov, Ramil Mauleon; SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa; Current Plant Biology; Allele mining, Oryza, SNP, Indel, Genotype database, Genetic diversity |
title | SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa |
title_sort | snp seek ii a resource for allele mining and analysis of big genomic data in oryza sativa |
topic | Allele mining, Oryza, SNP, Indel, Genotype database, Genetic diversity |
url | http://www.sciencedirect.com/science/article/pii/S2214662816300780 |