A pipeline for assembling low copy nuclear markers from plant genome skimming data for phylogenetic use

Background Genome skimming is a popular method in plant phylogenomics that does not include a biased enrichment step, relying on random shallow sequencing of total genomic DNA. From these data the plastome is usually readily assembled and constitutes the bulk of phylogenetic information generated in t...


Bibliographic Details
Main Author:	Marcelo Reginato
Format:	Article
Language:	English
Published:	PeerJ Inc. 2022-12-01
Series:	PeerJ
Subjects:	Genome skimming Low copy Mapping reads High-throughput sequencing Phylogenetics Pipeline
Online Access:	https://peerj.com/articles/14525.pdf

_version_	1797425590565863424
author	Marcelo Reginato
collection	DOAJ
description	Background Genome skimming is a popular method in plant phylogenomics that does not include a biased enrichment step, relying on random shallow sequencing of total genomic DNA. From these data the plastome is usually readily assembled and constitutes the bulk of phylogenetic information generated in these studies. Despite a few attempts to use genome skims to recover low copy nuclear loci for direct phylogenetic use, this approach remains neglected. Causes might include the trade-off between libraries with few reads and species with large genomes (i.e., missing data caused by low coverage), but might also relate to the lack of pipelines for data assembly. Methods A pipeline and its companion R package designed to automate the recovery of low copy nuclear markers from genome skimming libraries are presented. Additionally, a series of analyses aiming to evaluate the impact of key assembly parameters, reference selection and missing data are presented. Results A substantial number of putative low copy nuclear loci were assembled and proved useful as a basis for phylogenetic inference across the libraries tested (4 to 11 times more data than previously assembled plastomes from the same libraries). Discussion Critical aspects of assembling low copy nuclear markers from genome skims include the minimum coverage and depth of a sequence to be used. More stringent values of these parameters reduce the amount of assembled data and increase the relative amount of missing data, which can compromise phylogenetic inference; in turn, relaxing the same parameters might increase sequence error. These issues are discussed in the text, and parameter tuning through multiple comparisons tracking their effects on support and congruence is highly recommended when using this pipeline. The skimmingLoci pipeline (https://github.com/mreginato/skimmingLoci) might stimulate the use of genome skims to recover nuclear loci for direct phylogenetic use, increasing the power of genome skimming data to resolve phylogenetic relationships, while reducing the amount of sequenced DNA that is commonly wasted.
first_indexed	2024-03-09T08:18:17Z
format	Article
id	doaj.art-1a58c1ac6a704e27a7cd81add0cb66e4
institution	Directory of Open Access Journals
issn	2167-8359
language	English
last_indexed	2024-03-09T08:18:17Z
publishDate	2022-12-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ
spelling	doaj.art-1a58c1ac6a704e27a7cd81add0cb66e4 2023-12-02T21:55:21Z eng PeerJ Inc. PeerJ 2167-8359 2022-12-01 10 e14525 10.7717/peerj.14525 https://peerj.com/articles/14525.pdf
title	A pipeline for assembling low copy nuclear markers from plant genome skimming data for phylogenetic use
topic	Genome skimming Low copy Mapping reads High-throughput sequencing Phylogenetics Pipeline
url	https://peerj.com/articles/14525.pdf
work_keys_str_mv	AT marceloreginato apipelineforassemblinglowcopynuclearmarkersfromplantgenomeskimmingdataforphylogeneticuse AT marceloreginato pipelineforassemblinglowcopynuclearmarkersfromplantgenomeskimmingdataforphylogeneticuse
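
The description above names minimum coverage and depth as the critical parameters when assembling low copy nuclear loci from genome skims. The trade-off it describes can be illustrated with a minimal Python sketch. This is not the skimmingLoci code or its API (the pipeline is distributed with an R package); the function names `locus_stats` and `filter_loci`, the threshold values, the toy depth vectors, and the working definitions used here (coverage as the fraction of reference positions with at least one read, depth as the mean depth over covered positions) are all assumptions made for illustration.

```python
from statistics import mean


def locus_stats(depths):
    """Return (coverage, mean_depth) for one locus.

    depths: per-position read depth along the reference locus.
    coverage: fraction of reference positions covered by at least one read
              (an assumed definition, for illustration only).
    mean_depth: average depth over the covered positions only.
    """
    covered = [d for d in depths if d > 0]
    coverage = len(covered) / len(depths) if depths else 0.0
    mean_depth = mean(covered) if covered else 0.0
    return coverage, mean_depth


def filter_loci(loci, min_coverage=0.5, min_depth=5.0):
    """Keep loci whose coverage and mean depth both meet the thresholds."""
    kept = {}
    for name, depths in loci.items():
        coverage, mean_depth = locus_stats(depths)
        if coverage >= min_coverage and mean_depth >= min_depth:
            kept[name] = (coverage, mean_depth)
    return kept


# Toy per-position depths for three hypothetical loci.
loci = {
    "locus_A": [8, 9, 10, 9, 8, 10],  # fully covered, deep
    "locus_B": [3, 2, 0, 0, 0, 0],    # mostly missing, shallow
    "locus_C": [6, 6, 6, 0, 0, 6],    # partly covered, moderate depth
}

# Relaxed thresholds retain more loci; strict thresholds retain fewer.
relaxed = filter_loci(loci, min_coverage=0.3, min_depth=2.0)
strict = filter_loci(loci, min_coverage=0.9, min_depth=8.0)
print(sorted(relaxed))
print(sorted(strict))
```

With these toy values the relaxed thresholds keep all three loci, including the shallow and mostly missing one, while the strict thresholds keep only the fully covered, deep locus. Rerunning such a filter over a grid of thresholds and comparing the resulting trees is one way to approach the parameter tuning the description recommends.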


Similar Items