Evaluating bioinformatics pipelines for population‐level inference using environmental DNA
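Main Authors: | Bastien Macé, Régis Hocdé, Virginie Marques, Pierre‐Edouard Guerin, Alice Valentini, Véronique Arnal, Loïc Pellissier, Stéphanie Manel |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-05-01 |
Series: | Environmental DNA |
Online Access: | https://doi.org/10.1002/edn3.269 |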
author | Bastien Macé, Régis Hocdé, Virginie Marques, Pierre‐Edouard Guerin, Alice Valentini, Véronique Arnal, Loïc Pellissier, Stéphanie Manel |
---|---|
collection | DOAJ |
description | Environmental DNA (eDNA) is mainly used at the interspecific level, to quantify species diversity in ecosystems, but it can also be used to quantify intraspecific genetic variability, thus avoiding the need to sample individual tissue. However, errors in the amplification and sequencing of eDNA samples can blur this intraspecific signal and strongly overestimate genetic diversity. Existing bioinformatics pipelines therefore need to be tested to evaluate whether reliable levels of intraspecific genetic variability can be derived from eDNA samples. Here, we compare the ability of twelve metabarcoding pipelines, built by combining five programs, to detect intraspecific genetic variability. All pipelines share common pre‐processing steps, followed by a data‐processing step using programs drawn from obiclean, DADA2, SWARM, and LULU. An additional chimera‐removal step, based on either VSEARCH or DADA2, is also investigated. As a case study, we used the natural intraspecific variation of Mullus surmuletus in an experimental setting. We developed primers specific to this species, targeting a fragment of the mitochondrial D‐loop (barcode MS‐DL06). Thirty‐nine individuals were collected from the Mediterranean Sea and placed into four aquariums, and their DNA was sequenced on this marker to build an intraspecific reference database. After the aquarium water was filtered, DNA was extracted, amplified, and sequenced using the newly developed primer pair. We then quantified, for each pipeline, the number of true haplotypes it returned and its capacity to eliminate most of the erroneous sequences. We show that DADA2 combined with a two‐parent chimeric‐sequence removal step is the best of the tested tools for estimating intraspecific diversity from eDNA. Our approach also detected true M. surmuletus haplotypes in two eDNA samples collected in the Mediterranean Sea. We conclude that combining an appropriate intrapopulation barcode with a denoising pipeline such as DADA2 and a chimeric‐sequence removal step is a promising way to make population‐level inference from environmental DNA possible. |
institution | Directory of Open Access Journals |
issn | 2637-4943 |
volume | 4 |
issue | 3 |
pages | 674–686 |
affiliations | Bastien Macé, Virginie Marques, Pierre‐Edouard Guerin, Véronique Arnal, Stéphanie Manel: CEFE, Univ. Montpellier, CNRS, EPHE‐PSL University, IRD, Montpellier, France; Régis Hocdé: MARBEC, Univ. Montpellier, CNRS, Ifremer, IRD, Montpellier, France; Alice Valentini: SPYGEN, Le Bourget‐du‐Lac, France; Loïc Pellissier: Landscape Ecology, Department of Environmental Systems Science, Institute of Terrestrial Ecosystems, ETH Zürich, Zürich, Switzerland |
title | Evaluating bioinformatics pipelines for population‐level inference using environmental DNA |
topic | bioinformatics; environmental DNA; fish; genetic diversity; marine ecology |
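The abstract scores each pipeline by the number of true haplotypes it returns and by how many erroneous sequences it lets through, with an optional two‐parent chimera‐removal step. The Python below is a minimal sketch of that bookkeeping under stated assumptions. It is not the authors' code, and it is not the DADA2 or VSEARCH algorithm. Here, a "true haplotype" is an exact match to the reference database. The hypothetical `is_two_parent_chimera` helper flags a sequence only if it is an exact prefix of one more‐abundant parent joined to an exact suffix of a different one. The `min_fold=2.0` abundance ratio and the toy sequences are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineScore:
    true_haplotypes: int  # returned sequences that exactly match a reference haplotype
    erroneous: int        # returned sequences absent from the reference database
    missed: int           # reference haplotypes the pipeline did not return

    @property
    def precision(self) -> float:
        """Fraction of returned sequences that are true haplotypes."""
        total = self.true_haplotypes + self.erroneous
        return self.true_haplotypes / total if total else 0.0


def score_pipeline(returned: set[str], reference: set[str]) -> PipelineScore:
    """Compare the sequences one pipeline returned with the known haplotypes."""
    return PipelineScore(
        true_haplotypes=len(returned & reference),
        erroneous=len(returned - reference),
        missed=len(reference - returned),
    )


def is_two_parent_chimera(query: str, abundances: dict[str, int],
                          min_fold: float = 2.0) -> bool:
    """Hypothetical, simplified bimera test: True if `query` is an exact prefix of
    one parent joined to an exact suffix of a different parent, where both parents
    are at least `min_fold` times more abundant. No alignment, no mismatches."""
    threshold = min_fold * abundances[query]
    parents = [s for s, n in abundances.items() if s != query and n >= threshold]
    for k in range(1, len(query)):
        prefix, suffix = query[:k], query[k:]
        left = [p for p in parents if p.startswith(prefix)]
        right = [p for p in parents if p.endswith(suffix)]
        # Recombination needs two distinct parents, one per side of the breakpoint.
        if any(a != b for a in left for b in right):
            return True
    return False


def remove_chimeras(abundances: dict[str, int], min_fold: float = 2.0) -> set[str]:
    """Keep every sequence that is not flagged as a two-parent chimera."""
    return {s for s in abundances
            if not is_two_parent_chimera(s, abundances, min_fold)}


if __name__ == "__main__":
    # Toy data (invented for illustration): two true haplotypes, one chimera
    # of them, and one point-error variant of the first haplotype.
    reference = {"AAAACCCC", "GGGGTTTT"}
    abundances = {"AAAACCCC": 100, "GGGGTTTT": 80, "AAAATTTT": 5, "AAAACCCA": 3}

    print(score_pipeline(set(abundances), reference))
    # PipelineScore(true_haplotypes=2, erroneous=2, missed=0)
    print(score_pipeline(remove_chimeras(abundances), reference))
    # PipelineScore(true_haplotypes=2, erroneous=1, missed=0)
```

In the toy run, chimera removal discards `AAAATTTT` (the first half of one haplotype joined to the second half of the other). The point‐error variant `AAAACCCA` survives because this sketch only removes chimeras. In the pipelines compared in the article, that kind of error is handled by the processing step (obiclean, DADA2, SWARM, or LULU).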