Studying pathogens degrades BLAST-based pathogen identification

Abstract As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using...


Bibliographic Details
Main Authors: Jacob Beal, Adam Clore, Jeff Manthey
Format: Article
Language: English
Published: Nature Portfolio 2023-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-32481-z
collection DOAJ
description Abstract As synthetic biology becomes increasingly capable and accessible, it is likewise increasingly critical to be able to make accurate biosecurity determinations regarding the pathogenicity or toxicity of particular nucleic acid or amino acid sequences. At present, this is typically done using the BLAST algorithm to determine the best match with sequences in the NCBI nucleic acid and protein databases. Neither BLAST nor any of the NCBI databases, however, are actually designed for biosafety determination. Critically, taxonomic errors or ambiguities in the NCBI nucleic acid and protein databases can also cause errors in BLAST-based taxonomic categorization. With heavily studied taxa and frequently used biotechnology tools, even low frequency taxonomic categorization issues can lead to high rates of errors in biosecurity decision-making. Here we focus on the implications for false positives, finding that BLAST against NCBI’s protein database will now incorrectly categorize a number of commonly used biotechnology tool sequences as the pathogens or toxins with which they have been used. Paradoxically, this implies that problems are expected to be most acute for the pathogens and toxins of highest interest and for the most widely used biotechnology tools. We thus conclude that biosecurity tools should shift away from BLAST against general purpose databases and towards new methods that are specifically tailored for biosafety purposes.
first_indexed 2024-04-09T18:55:09Z
id doaj.art-a6a45a2c80124d63b21966c7389b2522
institution Directory Open Access Journal
issn 2045-2322
last_indexed 2024-04-09T18:55:09Z
spelling doaj.art-a6a45a2c80124d63b21966c7389b2522 | 2023-04-09T11:15:11Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2023-04-01 | 13116 | 10.1038/s41598-023-32481-z | Studying pathogens degrades BLAST-based pathogen identification | Jacob Beal (Raytheon BBN), Adam Clore (Integrated DNA Technologies), Jeff Manthey (Integrated DNA Technologies)