THAPBI PICT—a fast, cautious, and accurate metabarcoding analysis pipeline

Bibliographic Details
Main Authors: Peter J. A. Cock, David E. L. Cooke, Peter Thorpe, Leighton Pritchard
Format: Article
Language: English
Published: PeerJ Inc. 2023-08-01
Series: PeerJ
Subjects: Amplicon; Environmental DNA; Biodiversity; Barcoding; Metabarcoding; Phytophthora
Online Access:https://peerj.com/articles/15648.pdf
collection DOAJ
description THAPBI PICT is an open source software pipeline for metabarcoding analysis of Illumina paired-end reads, including cases of multiplexing where more than one amplicon is amplified per DNA sample. It began as a Phytophthora ITS1 Classification Tool (PICT); using worked examples with our own and public data sets, we demonstrate how, with appropriate primer settings and a custom database, it can be applied to other amplicons and organisms, and used for reanalysis of existing datasets. The core dataflow of the implementation is (i) data reduction to unique marker sequences, often called amplicon sequence variants (ASVs), (ii) dynamic thresholds for discarding low-abundance sequences to remove noise and artifacts (rather than error correction by default), before (iii) classification using a curated reference database. The default classifier assigns a label to each query sequence based on a database match that is either perfect or a single base pair edit away (substitution, deletion, or insertion). Abundance thresholds for inclusion can be set by the user or determined automatically from per-batch negative or synthetic control samples. Output is designed for practical interpretation by non-specialists and includes a read report (ASVs with classification and counts per sample), a sample report (samples with counts per species classification), and a topological graph with ASVs as nodes and short edit distances as edges. Source code is available from https://github.com/peterjc/thapbi-pict/ with documentation including installation instructions.
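The three-stage dataflow and the default classifier summarised in the description can be illustrated with a minimal Python sketch. This is not THAPBI PICT's own code: every function name is hypothetical, and so are several rules it assumes, namely raising the absolute threshold to one above the highest ASV count seen in any negative control, an optional per-sample fractional threshold, joining multiple equally good one-edit matches with ";", the "unknown" fallback label, and restricting graph edges to a single edit rather than a range of short distances. Only the one-edit test itself follows the description directly: a perfect match or a single substitution, insertion, or deletion.

```python
from collections import Counter
from itertools import combinations


def dereplicate(reads):
    """Step (i): collapse identical reads into unique sequences (ASVs) with counts."""
    return Counter(reads)


def abundance_threshold(min_abundance, negative_controls=()):
    """Step (ii), hypothetical rule: start from a user-set minimum count and, if
    any negative control contains reads, raise it to one above the largest ASV
    count seen in those controls."""
    threshold = min_abundance
    for control_counts in negative_controls:
        if control_counts:
            threshold = max(threshold, max(control_counts.values()) + 1)
    return threshold


def filter_asvs(asv_counts, threshold, min_fraction=0.0):
    """Keep ASVs meeting both the absolute threshold and an (assumed) fraction
    of the sample's total reads."""
    total = sum(asv_counts.values())
    cutoff = max(threshold, min_fraction * total)
    return {seq: n for seq, n in asv_counts.items() if n >= cutoff}


def within_one_edit(a, b):
    """True if a equals b or differs by one substitution, insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # advance to the first mismatch
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substitution at position i
    return a[i:] == b[i + 1:]  # b has exactly one extra base at position i


def classify(asv, reference):
    """Step (iii): reference maps sequence -> species label (hypothetical format).
    A perfect match wins; otherwise report every label one edit away."""
    if asv in reference:
        return reference[asv]
    hits = {label for seq, label in reference.items() if within_one_edit(asv, seq)}
    return ";".join(sorted(hits)) if hits else "unknown"


def one_edit_edges(asvs):
    """Edges for an ASV graph, here limited to pairs exactly one edit apart."""
    return [(a, b) for a, b in combinations(sorted(asvs), 2) if within_one_edit(a, b)]


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 5 + ["ACGTACGA"] * 3 + ["TTTTTTTT"]
    threshold = abundance_threshold(2, [Counter({"ACGTACGA": 2})])  # -> 3
    asvs = filter_asvs(dereplicate(reads), threshold)
    reference = {"ACGTACGT": "species_A", "ACGTTCGT": "species_B"}  # toy data
    for seq, count in asvs.items():
        print(seq, count, classify(seq, reference))
    print(one_edit_edges(asvs))
```

Run as a script, the toy example drops the singleton read, labels both retained ASVs species_A (one by a perfect match, one by a single substitution), and reports one edge between them. The pairwise comparisons here are quadratic, which suits a sketch; a pipeline facing large reference databases would probably want an indexed search instead.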
first_indexed 2024-03-09T07:26:18Z
format Article
id doaj.art-8785f97214b64a08b5264d71a8827c00
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T07:26:18Z
publishDate 2023-08-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-8785f97214b64a08b5264d71a8827c00 (2023-12-03T07:01:04Z), eng, PeerJ Inc., PeerJ, ISSN 2167-8359, 2023-08-01, volume 11, article e15648, doi:10.7717/peerj.15648
Peter J. A. Cock: Information and Computational Sciences, The James Hutton Institute, Dundee, United Kingdom
David E. L. Cooke: Cell and Molecular Sciences, The James Hutton Institute, Dundee, United Kingdom
Peter Thorpe: Cell and Molecular Sciences, The James Hutton Institute, Dundee, United Kingdom
Leighton Pritchard: Information and Computational Sciences, The James Hutton Institute, Dundee, United Kingdom
title THAPBI PICT—a fast, cautious, and accurate metabarcoding analysis pipeline
topic Amplicon
Environmental DNA
Biodiversity
Barcoding
Metabarcoding
Phytophthora