Viral taxonomy derived from evolutionary genome relationships.

Bibliographic Details
Main Authors: Tyler J Dougan, Stephen R Quake
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0220440
Description: We describe a new genome alignment-based model for understanding the diversity of viruses based on evolutionary genetic relationships. This approach uses information theory and a physical model to determine the information shared by the genes in two genomes. Pairwise comparisons of genes from the viruses are created from alignments using NCBI BLAST, and their match scores are combined to produce a metric between genomes, which is in turn used to determine a global classification using the 5,817 viruses on RefSeq. In cases where there is no measurable alignment between any genes, the method falls back to a coarser measure of genome relationship: the mutual information of 4-mer frequency. This results in a principled model which depends only on the genome sequence, which captures many interesting relationships between viral families, and which creates clusters which correlate well with both the Baltimore and ICTV classifications. The incremental computational cost of classifying a novel virus is low and therefore newly discovered viruses can be quickly identified and classified. The model goes beyond alignment-free classifications by producing a full phylogeny similar to those constructed by virologists using qualitative features, while relying only on objective genes. These results bolster the case for mathematical models in microbiology which can characterize organisms using only their genetic material and provide an independent check for phylogenies constructed by humans, considerably faster and more cheaply than less modern approaches.
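
The fallback measure named in the description, mutual information of 4-mer frequency, can be illustrated with a short sketch. This record does not give the authors' formula, so everything below is an assumption: the function names are hypothetical, and the construction (mutual information between "which genome a 4-mer came from" and "which 4-mer it is") is only one plausible reading, not the published method. Under this reading, 0 bits means the two genomes have identical 4-mer composition and 1 bit means they share no 4-mers.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k=4):
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def genome_kmer_mutual_information(seq_a, seq_b, k=4):
    """Mutual information (bits) between genome label and k-mer identity.

    ASSUMPTION: an illustrative construction, not the paper's definition.
    All k-mers from both genomes are pooled; the result measures how much
    knowing the k-mer tells you about which genome it came from.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("each sequence must contain at least one valid k-mer")
    total = n_a + n_b

    mi = 0.0
    for kmer in set(counts_a) | set(counts_b):
        # Marginal probability of this k-mer in the pooled sample.
        p_kmer = (counts_a[kmer] + counts_b[kmer]) / total
        for count, n_genome in ((counts_a[kmer], n_a), (counts_b[kmer], n_b)):
            if count:
                p_joint = count / total      # P(genome, k-mer)
                p_genome = n_genome / total  # P(genome)
                mi += p_joint * log2(p_joint / (p_genome * p_kmer))
    return mi


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    print(genome_kmer_mutual_information("ACGTACGTAGCT", "ACGTTTGACGTA"))
```

Because the measure needs only k-mer counts, it can be computed for any pair of genomes, which is consistent with its role in the description as a fallback when no gene alignment is measurable.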
ISSN: 1932-6203