VirClust—A Tool for Hierarchical Clustering, Core Protein Detection and Annotation of (Prokaryotic) Viruses

Bibliographic Details
Main Author: Cristina Moraru
Format: Article
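Language: English
Published: MDPI AG 2023-04-01
Series: Viruses
Subjects: VirClust; virus genome clustering; virus protein clustering; virus protein annotation; virus classification; phage classification
Online Access: https://www.mdpi.com/1999-4915/15/4/1007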
author Cristina Moraru
collection DOAJ
description Recent years have seen major changes in the classification criteria and taxonomy of viruses. The current classification scheme, also called the “megataxonomy of viruses”, recognizes six viral realms, defined by the presence of viral hallmark genes (VHGs). Within the realms, viruses are classified into hierarchical taxa, ideally defined by the phylogeny of their shared genes. To enable the detection of shared genes, viruses must first be clustered, and there is currently a need for tools that assist with virus clustering and classification. Here, VirClust is presented. It is a novel, reference-free tool capable of performing: (i) protein clustering, based on BLASTp and Hidden Markov Model (HMM) similarities; (ii) hierarchical clustering of viruses based on intergenomic distances calculated from their shared protein content; (iii) identification of core proteins; and (iv) annotation of viral proteins. VirClust has flexible parameters both for protein clustering and for splitting the viral genome tree into smaller genome clusters corresponding to different taxonomic levels. Benchmarking on a phage dataset showed that the genome trees produced by VirClust match the current ICTV classification at the family, subfamily and genus levels. VirClust is freely available as a web service and as a stand-alone tool.
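Steps (ii) and (iii) of the description can be illustrated with a small sketch. This is not VirClust's implementation: the distance used here (1 minus the Jaccard similarity of protein-cluster sets), the average-linkage merging, the threshold value, and the toy genomes and protein-cluster IDs are all assumptions chosen only to show the general idea of clustering genomes by shared protein content and then reading off the proteins common to every member of a genome cluster.

```python
from itertools import combinations


def distance(a, b):
    """Assumed metric: 1 - Jaccard similarity of two sets of protein-cluster IDs."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_genomes(genomes, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    clusters = [[name] for name in sorted(genomes)]

    def avg_dist(c1, c2):
        total = sum(distance(genomes[x], genomes[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        if avg_dist(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def core_proteins(genomes, members):
    """Protein clusters present in every genome of one genome cluster."""
    return set.intersection(*(genomes[m] for m in members))


# Hypothetical input: each genome is the set of protein clusters it encodes.
genomes = {
    "phageA": {"PC1", "PC2", "PC3", "PC4"},
    "phageB": {"PC1", "PC2", "PC3", "PC5"},
    "phageC": {"PC7", "PC8", "PC9"},
    "phageD": {"PC7", "PC8", "PC10"},
}

clusters = cluster_genomes(genomes, threshold=0.6)
cores = {tuple(c): sorted(core_proteins(genomes, c)) for c in clusters}
print(clusters)
print(cores)
```

Lowering or raising the hypothetical `threshold` changes how finely the genomes are split, which loosely mirrors the idea of cutting a genome tree at different taxonomic levels.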
first_indexed 2024-03-11T04:26:21Z
format Article
id doaj.art-a1bf6749c0dd455ea4e3fdb42b657fb7
institution Directory Open Access Journal
issn 1999-4915
language English
last_indexed 2024-03-11T04:26:21Z
publishDate 2023-04-01
publisher MDPI AG
record_format Article
series Viruses
spelling doaj.art-a1bf6749c0dd455ea4e3fdb42b657fb7; 2023-11-17T21:46:53Z; eng; MDPI AG; Viruses; 1999-4915; 2023-04-01; volume 15, issue 4, article 1007; 10.3390/v15041007; Cristina Moraru, Institute for Chemistry and Biology of the Marine Environment, Carl-von-Ossietzky–Str. 9-11, 26111 Oldenburg, Germany
title VirClust—A Tool for Hierarchical Clustering, Core Protein Detection and Annotation of (Prokaryotic) Viruses
topic VirClust
virus genome clustering
virus protein clustering
virus protein annotation
virus classification
phage classification
url https://www.mdpi.com/1999-4915/15/4/1007