GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing

The development of next-generation sequencing technologies has provided new opportunities for genotyping various organisms, including plants. Genotyping by sequencing (GBS) is used to identify genetic variability more rapidly, and is more cost-effective than whole-genome sequencing. GBS has demonstr...

Bibliographic Details
Main Authors: A. Y. Pronozin, E. A. Salina, D. A. Afonnikov
Format: Article
Language: English
Published: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders 2023-12-01
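Series: Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)
Subjects: genotyping by sequencing (GBS); bioinformatic pipeline; Hordeum
Online Access: https://vavilov.elpub.ru/jour/article/view/3973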
_version_ 1797213945466978304
author A. Y. Pronozin
E. A. Salina
D. A. Afonnikov
collection DOAJ
description The development of next-generation sequencing technologies has provided new opportunities for genotyping various organisms, including plants. Genotyping by sequencing (GBS) identifies genetic variability more rapidly and more cost-effectively than whole-genome sequencing. GBS has demonstrated its reliability and flexibility for a number of plant species and populations. It has been applied to genetic mapping, molecular marker discovery, genomic selection, genetic diversity studies, variety identification, conservation biology and evolutionary studies. However, the reduction in sequencing time and cost has created a need for efficient bioinformatics analyses of an ever-expanding amount of sequence data. Bioinformatics pipelines for GBS data analysis serve this purpose. Because the data processing steps are similar, existing pipelines differ mainly in their combination of software packages, selected to process data either for particular organisms or for any organism. However, despite their use of efficient software packages, these pipelines have some disadvantages. For example, they lack process automation (in some pipelines, each step must be started manually), which significantly slows the analysis. Most pipelines cannot automatically install all the necessary software packages, and most also cannot switch off unnecessary or already completed steps. In the present work, we have developed GBS-DP, a bioinformatics pipeline for GBS data analysis. The pipeline can be applied to various species. It is implemented using the Snakemake workflow engine, which allows both the computation and the installation of the necessary software packages to be fully automated. Our pipeline can analyse large datasets (more than 400 samples).
first_indexed 2024-03-07T16:04:05Z
format Article
id doaj.art-60b3953848684ccf8b34bd68d02f67ce
institution Directory of Open Access Journals
issn 2500-3259
language English
last_indexed 2024-04-24T11:06:20Z
publishDate 2023-12-01
publisher Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
record_format Article
series Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)
spelling doaj.art-60b3953848684ccf8b34bd68d02f67ce
2024-04-11T15:31:06Z
eng
Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
Вавиловский журнал генетики и селекции
ISSN 2500-3259
2023-12-01
volume 27, issue 7, pages 737-745
DOI 10.18699/VJGB-23-86
1401
GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing
A. Y. Pronozin (0), E. A. Salina (1), D. A. Afonnikov (2)
(0) Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS
(1) Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS; Novosibirsk State Agrarian University
(2) Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS; Novosibirsk State Agrarian University; Novosibirsk State University
https://vavilov.elpub.ru/jour/article/view/3973
genotyping by sequencing (gbs)
bioinformatic pipeline
hordeum
title GBS-DP: a bioinformatics pipeline for processing data coming from genotyping by sequencing
topic genotyping by sequencing (gbs)
bioinformatic pipeline
hordeum
url https://vavilov.elpub.ru/jour/article/view/3973