Evaluating bioinformatics pipelines for population‐level inference using environmental DNA

Abstract: Environmental DNA is mainly used at the interspecific level, to quantify species diversity in ecosystems, but it can also be used to quantify intraspecific genetic variability, thus avoiding the need to sample tissue from individuals. However, errors in the amplification and sequencing of e...


Bibliographic Details
Main Authors: Bastien Macé, Régis Hocdé, Virginie Marques, Pierre‐Edouard Guerin, Alice Valentini, Véronique Arnal, Loïc Pellissier, Stéphanie Manel
Format: Article
Language: English
Published: Wiley 2022-05-01
Series: Environmental DNA
Subjects: bioinformatics; environmental DNA; fish; genetic diversity; marine ecology
Online Access: https://doi.org/10.1002/edn3.269
collection DOAJ
description Abstract: Environmental DNA (eDNA) is mainly used at the interspecific level, to quantify species diversity in ecosystems, but it can also be used to quantify intraspecific genetic variability, thus avoiding the need to sample tissue from individuals. However, errors in the amplification and sequencing of eDNA samples can blur this intraspecific signal and lead to strongly overestimated genetic diversity. Existing bioinformatics pipelines therefore need to be tested to evaluate whether reliable levels of intraspecific genetic variability can be derived from eDNA samples. Here, we compare the ability of twelve metabarcoding pipelines, built by combining five programs, to detect intraspecific genetic variability. All pipelines share common pre‐processing steps, followed by a data processing step using programs among obiclean, DADA2, SWARM, and LULU. An additional chimera removal step, based on one of two programs (VSEARCH or DADA2), is also investigated. The case study was the natural intraspecific variation within Mullus surmuletus in experimental settings. We developed specific primers for this species, located on the mitochondrial D‐loop fragment (barcode MS‐DL06). Thirty‐nine individuals were collected from the Mediterranean Sea and placed into four aquariums, and their DNA was sequenced on this marker to build an intraspecific reference database. After the aquarium water was filtered, DNA was extracted, amplified, and sequenced using the primer pair developed. We then quantified the number of true haplotypes returned by each pipeline and its capacity to eliminate most of the erroneous sequences. We show that the program DADA2 with a two‐parent chimeric sequence removal step is the best tool among those tested to estimate intraspecific diversity from eDNA. Furthermore, our approach was also able to detect true M. surmuletus haplotypes in two eDNA samples collected in the Mediterranean Sea. We conclude that combining an appropriate intrapopulation barcode with a denoising pipeline like DADA2 and a chimeric sequence removal step is a promising way to make population‐level inference from environmental DNA possible.
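The evaluation criterion described in the abstract (how many true haplotypes each pipeline returns, and how many erroneous sequences it fails to eliminate) can be illustrated as a set comparison against the reference database. What follows is a minimal sketch under stated assumptions, not the authors' code: the names `PipelineScore` and `score_pipeline`, the pipeline labels, and the toy sequences are all hypothetical, and exact sequence identity is assumed as the test for a "true" haplotype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineScore:
    true_haplotypes: int      # reference haplotypes recovered by the pipeline
    missed_haplotypes: int    # reference haplotypes the pipeline did not return
    erroneous_sequences: int  # returned sequences absent from the reference


def score_pipeline(returned, reference):
    """Compare the sequences returned by one pipeline with the reference haplotypes.

    Assumption: a returned sequence counts as a true haplotype only if it is
    identical (case-insensitive) to a sequence in the reference database.
    """
    returned_set = {seq.upper() for seq in returned}
    reference_set = {seq.upper() for seq in reference}
    return PipelineScore(
        true_haplotypes=len(returned_set & reference_set),
        missed_haplotypes=len(reference_set - returned_set),
        erroneous_sequences=len(returned_set - reference_set),
    )


# Toy data (hypothetical, not taken from the study).
reference = ["ACGTAC", "ACGTAT", "ACGGAC"]
outputs = {
    "pipeline_A": ["ACGTAC", "ACGTAT", "ACGGAC", "ACGTAA", "TCGTAC"],
    "pipeline_B": ["ACGTAC", "ACGTAT", "ACGTAA"],
}

scores = {name: score_pipeline(seqs, reference) for name, seqs in outputs.items()}
for name, score in scores.items():
    print(name, score)
```

In this toy comparison, a pipeline that recovers every reference haplotype but lets erroneous sequences through (pipeline_A) can be contrasted with one that is stricter but misses a true haplotype (pipeline_B), which is the trade-off the abstract describes.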
first_indexed 2024-04-12T17:01:36Z
format Article
id doaj.art-a47f6655f3e74e64bcb0262be57bfa4d
institution Directory of Open Access Journals
issn 2637-4943
language English
last_indexed 2024-04-12T17:01:36Z
publishDate 2022-05-01
publisher Wiley
record_format Article
series Environmental DNA
spelling doaj.art-a47f6655f3e74e64bcb0262be57bfa4d; 2022-12-22T03:24:04Z; eng; Wiley; Environmental DNA; ISSN 2637-4943; 2022-05-01; volume 4, issue 3, pages 674-686; https://doi.org/10.1002/edn3.269
Evaluating bioinformatics pipelines for population‐level inference using environmental DNA
Bastien Macé (CEFE, Univ. Montpellier, CNRS EPHE‐PSL University, IRD Montpellier France)
Régis Hocdé (MARBEC Univ. Montpellier CNRS Ifremer IRD Montpellier France)
Virginie Marques (CEFE, Univ. Montpellier, CNRS EPHE‐PSL University, IRD Montpellier France)
Pierre‐Edouard Guerin (CEFE, Univ. Montpellier, CNRS EPHE‐PSL University, IRD Montpellier France)
Alice Valentini (SPYGEN Le Bourget‐du‐Lac France)
Véronique Arnal (CEFE, Univ. Montpellier, CNRS EPHE‐PSL University, IRD Montpellier France)
Loïc Pellissier (Landscape Ecology Department of Environmental Systems Science Institute of Terrestrial Ecosystems ETH Zürich Zürich Switzerland)
Stéphanie Manel (CEFE, Univ. Montpellier, CNRS EPHE‐PSL University, IRD Montpellier France)
topic bioinformatics
environmental DNA
fish
genetic diversity
marine ecology
url https://doi.org/10.1002/edn3.269