VirSorter: mining viral signal from microbial genomic data

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and r...

Bibliographic Details
Main Authors: Simon Roux, Francois Enault, Bonnie L. Hurwitz, Matthew B. Sullivan
Format: Article
Language: English
Published: PeerJ Inc. 2015-05-01
Series: PeerJ
Subjects: Virus, Bacteriophage, Prophage, Single-cell amplified genome, Metagenomics, Viral metagenomics
Online Access: https://peerj.com/articles/985.pdf
_version_ 1797420150050258944
author Simon Roux
Francois Enault
Bonnie L. Hurwitz
Matthew B. Sullivan
author_sort Simon Roux
collection DOAJ
description Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
first_indexed 2024-03-09T06:58:28Z
format Article
id doaj.art-bd322fadba0b40368a4bfdc916e88fcc
institution Directory of Open Access Journals
issn 2167-8359
language English
last_indexed 2024-03-09T06:58:28Z
publishDate 2015-05-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-bd322fadba0b40368a4bfdc916e88fcc 2023-12-03T10:01:53Z eng PeerJ Inc. PeerJ 2167-8359 2015-05-01 3 e985 10.7717/peerj.985 985
VirSorter: mining viral signal from microbial genomic data
Simon Roux (0), Francois Enault (1), Bonnie L. Hurwitz (2), Matthew B. Sullivan (3)
0: Ecology and Evolutionary Biology, University of Arizona, USA
1: Clermont Université, Université Blaise Pascal, Laboratoire “Microorganismes: Génome et Environnement”, Clermont-Ferrand, France
2: Department of Agricultural and Biosystems Engineering, University of Arizona, USA
3: Ecology and Evolutionary Biology, University of Arizona, USA
https://peerj.com/articles/985.pdf
Virus; Bacteriophage; Prophage; Single-cell amplified genome; Metagenomics; Viral metagenomics
title VirSorter: mining viral signal from microbial genomic data
topic Virus
Bacteriophage
Prophage
Single-cell amplified genome
Metagenomics
Viral metagenomics
url https://peerj.com/articles/985.pdf