dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

Bibliographic Details
Main Authors: Jonathan B. Puritz, Christopher M. Hollenbeck, John R. Gold
Format: Article
Language: English
Published: PeerJ Inc., 2014-06-01
Series: PeerJ
Online Access: https://peerj.com/articles/431.pdf
Collection: DOAJ (Directory of Open Access Journals)
Description: Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that uses both reads of each pair in paired-end RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is because dDocent quality-trims reads instead of filtering them and incorporates both forward and reverse reads (including reads with indel polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.
ISSN: 2167-8359
Volume: 2
Article number: e431
DOI: 10.7717/peerj.431
Affiliation: Marine Genomics Laboratory, Harte Research Institute, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA (all three authors)
Subjects: RADseq, Population genomics, Bioinformatics, Molecular ecology, Next-generation sequencing
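
The description attributes part of dDocent's higher SNP yield to quality trimming instead of read filtering. The sketch below illustrates that difference in general terms; it is not dDocent's code. It assumes Phred+33 quality encoding, a hypothetical Q20 cutoff, and simple removal of low-quality bases from the 3' end, and all function names are hypothetical. Trimming keeps the usable prefix of a read with a poor tail, whereas a whole-read filter discards that read entirely.

```python
"""Contrast quality trimming with whole-read filtering (illustrative sketch).

Assumptions (not taken from dDocent itself): Phred+33 quality encoding,
a hypothetical Q20 threshold, and trimming only from the 3' end.
"""
from typing import List, Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(read: Read, min_q: int = 20, min_len: int = 1) -> Optional[Read]:
    """Drop trailing bases below min_q; return None if too few bases remain."""
    seq, qual = read
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


def passes_filter(read: Read, min_q: int = 20) -> bool:
    """Whole-read filter: reject the read if any base falls below min_q."""
    return all(q >= min_q for q in phred_scores(read[1]))


def compare(reads: List[Read], min_q: int = 20) -> Tuple[int, int]:
    """Return (bases kept by trimming, bases kept by filtering)."""
    trimmed = [trim_3prime(r, min_q) for r in reads]
    kept_trim = sum(len(r[0]) for r in trimmed if r is not None)
    kept_filter = sum(len(r[0]) for r in reads if passes_filter(r, min_q))
    return kept_trim, kept_filter


if __name__ == "__main__":
    # In Phred+33, 'I' encodes Q40 and '#' encodes Q2.
    reads = [
        ("ACGTACGTAC", "IIIIIIII##"),  # good read with a low-quality tail
        ("TTGACCGTAA", "IIIIIIIIII"),  # high quality throughout
    ]
    print(compare(reads))  # (18, 10)
```

On these two example reads, trimming retains 18 bases, while the whole-read filter keeps only the 10 bases of the clean read.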
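
The description mentions data reduction ahead of de novo assembly of RAD loci but gives no details. One way to do this is sketched here as an assumption, not as dDocent's documented procedure. Each read pair is joined into a single string, identical strings are counted within each individual, and a sequence is kept only if it reaches a minimum copy number in enough individuals. The spacer, `min_copies`, and `min_individuals` are hypothetical names and values.

```python
"""Reduce paired reads to candidate sequences before de novo assembly.

Illustrative sketch under stated assumptions: each read pair is joined as
forward + a spacer of Ns + reverse, identical joined strings are counted per
individual, and a sequence is kept if it occurs at least `min_copies` times
within an individual in at least `min_individuals` individuals.
All names and thresholds here are hypothetical.
"""
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (forward read, reverse read)
SPACER = "N" * 10  # hypothetical spacer joining the two reads of a pair


def join_pair(pair: Pair) -> str:
    """Concatenate forward and reverse reads into a single key."""
    forward, reverse = pair
    return forward + SPACER + reverse


def reduce_reads(
    reads_by_individual: Dict[str, List[Pair]],
    min_copies: int = 2,
    min_individuals: int = 2,
) -> Set[str]:
    """Return joined sequences that pass both coverage thresholds."""
    individuals_with_seq: Counter = Counter()
    for pairs in reads_by_individual.values():
        counts = Counter(join_pair(p) for p in pairs)
        # Credit each sequence once per individual in which it reaches
        # the within-individual copy threshold.
        for seq, n in counts.items():
            if n >= min_copies:
                individuals_with_seq[seq] += 1
    return {seq for seq, k in individuals_with_seq.items() if k >= min_individuals}


if __name__ == "__main__":
    data = {
        "ind1": [("AAAC", "GGTT")] * 3 + [("AAAG", "GGTA")],
        "ind2": [("AAAC", "GGTT")] * 2 + [("AAAG", "GGTA")] * 2,
        "ind3": [("CCCA", "TTTG")],
    }
    # Prints only the AAAC/GGTT pair, joined by the 10-N spacer.
    for seq in sorted(reduce_reads(data)):
        print(seq)
```

Raising either threshold shrinks the candidate set handed to the assembler, which reduces assembly work at the cost of discarding sequences seen in few individuals or at low copy number.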
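
The comparison with Stacks is framed in terms of SNPs shared across more individuals and with higher coverage, and the pipeline ends with baseline data filtering. A minimal sketch of such a filter follows. The input layout (a dict of per-individual read depths, with None marking a missing genotype) and the thresholds are hypothetical. They are not dDocent defaults and do not reflect how dDocent actually reads its variant calls.

```python
"""Filter SNPs by how many individuals share them and by coverage (sketch).

Assumptions for illustration only: per-SNP genotype depths are stored in a
dict mapping a SNP ID to one depth per individual (None = missing call);
thresholds are hypothetical, not dDocent defaults.
"""
from typing import Dict, List, Optional

DepthTable = Dict[str, List[Optional[int]]]


def call_rate(depths: List[Optional[int]]) -> float:
    """Fraction of individuals with a genotype call at this SNP."""
    if not depths:
        return 0.0
    called = sum(1 for d in depths if d is not None)
    return called / len(depths)


def mean_depth(depths: List[Optional[int]]) -> float:
    """Mean read depth over the individuals that were genotyped."""
    called = [d for d in depths if d is not None]
    return sum(called) / len(called) if called else 0.0


def baseline_filter(
    table: DepthTable, min_call_rate: float = 0.9, min_mean_depth: float = 10.0
) -> List[str]:
    """Return SNP IDs shared by enough individuals at adequate coverage."""
    return [
        snp
        for snp, depths in table.items()
        if call_rate(depths) >= min_call_rate
        and mean_depth(depths) >= min_mean_depth
    ]


if __name__ == "__main__":
    table = {
        "locus1_pos15": [12, 20, 15, 18],      # all individuals, good depth
        "locus2_pos40": [30, None, None, 25],  # called in only half
        "locus3_pos7": [3, 4, 2, 5],           # shared but shallow
    }
    print(baseline_filter(table, min_call_rate=0.75))  # ['locus1_pos15']
```

Only the first SNP passes: the second is called in too few individuals, and the third, although shared by all four, has a mean depth of 3.5.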