VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids
Main Authors: | Zhencheng Fang, Hongwei Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-02-01 |
Series: | Frontiers in Microbiology |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/fmicb.2021.615711/full |
_version_ | 1819284165918457856 |
---|---|
author | Zhencheng Fang, Hongwei Zhou |
author_sort | Zhencheng Fang |
collection | DOAJ |
description | Viruses are among the most abundant biological entities on Earth, and prokaryotic viruses are the dominant members of the viral community. Because prokaryotic viruses are so diverse, a large number of genes from newly discovered prokaryotic viruses cannot be functionally annotated by searching current databases; developing alignment-free algorithms for the functional annotation of prokaryotic virus proteins is therefore important for understanding the viral community. The identification of prokaryote virus virion proteins (PVVPs) is a critical step in many viral analyses, such as species classification, phylogenetic analysis, and the exploration of how prokaryotic viruses interact with their hosts. Although several PVVP prediction tools have been developed, their performance is still unsatisfactory. Moreover, viral metagenomic data contain fragmented sequences, so some genes are incomplete; a tool that can also identify partial PVVPs is therefore needed. In this work, we present a novel algorithm, called VirionFinder, that distinguishes complete and partial PVVPs from non-prokaryote virus virion proteins (non-PVVPs). VirionFinder encodes protein sequences using the sequence and biochemical properties of the 20 amino acids and uses a deep learning technique to determine whether a given protein is a PVVP. Compared with state-of-the-art tools on artificial benchmark datasets at the same specificity (Sp), VirionFinder achieves a sensitivity (Sn) approximately 10–34% higher than these tools on both complete and partial proteins. When the tools are evaluated on real virome data, VirionFinder also has a much higher recognition rate for PVVP-like sequences than the other tools.
We expect that VirionFinder will be a powerful tool for identifying novel virion proteins from both complete prokaryotic virus genomes and viral metagenomic data. VirionFinder is freely available at https://github.com/zhenchengfang/VirionFinder. |
first_indexed | 2024-12-24T01:43:03Z |
format | Article |
id | doaj.art-e8944cf1f9cb4c9694530e04c3849b6c |
institution | Directory Open Access Journal |
issn | 1664-302X |
language | English |
last_indexed | 2024-12-24T01:43:03Z |
publishDate | 2021-02-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Microbiology |
spelling | doaj.art-e8944cf1f9cb4c9694530e04c3849b6c (2022-12-21T17:21:57Z). Frontiers in Microbiology (Frontiers Media S.A.), ISSN 1664-302X, published 2021-02-01, vol. 12, article 615711, doi:10.3389/fmicb.2021.615711. Language: English. Affiliations: Zhencheng Fang (Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China; Center for Quantitative Biology, Peking University, Beijing, China); Hongwei Zhou (Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China; State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, China). Keywords: virome; metagenome; gene function annotation; deep learning; prokaryote virus virion protein. |
title | VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids |
topic | virome; metagenome; gene function annotation; deep learning; prokaryote virus virion protein |
url | https://www.frontiersin.org/articles/10.3389/fmicb.2021.615711/full |
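
The abstract states only that VirionFinder encodes each protein using the sequence and biochemical properties of the 20 amino acids before a deep learning model classifies it. The following is a minimal sketch of what such a per-residue encoding can look like; it is not the published feature set. It assumes a one-hot identity vector plus two example properties (the Kyte-Doolittle hydropathy index and a simplified side-chain charge) and a hypothetical fixed length `max_len`, so that complete and partial proteins yield matrices of the same shape. The names `encode_residue`, `encode_protein`, `HYDROPATHY`, and `CHARGE` are illustrative only.

```python
from typing import List

# The 20 standard amino acids in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy index: one example biochemical property (assumption,
# not necessarily a property VirionFinder uses).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (assumption: His treated as neutral).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}

FEATURES_PER_RESIDUE = len(AMINO_ACIDS) + 2  # one-hot + hydropathy + charge


def encode_residue(aa: str) -> List[float]:
    """Encode one residue as a one-hot identity vector plus two property values.

    Non-standard symbols (e.g. X, B, Z, or the stop symbol '*') map to all zeros.
    """
    vec = [0.0] * FEATURES_PER_RESIDUE
    idx = AA_INDEX.get(aa)
    if idx is None:
        return vec
    vec[idx] = 1.0
    # Scale hydropathy into [-1, 1]; the raw index spans -4.5..4.5.
    vec[len(AMINO_ACIDS)] = HYDROPATHY[aa] / 4.5
    vec[len(AMINO_ACIDS) + 1] = CHARGE.get(aa, 0.0)
    return vec


def encode_protein(seq: str, max_len: int = 1000) -> List[List[float]]:
    """Encode a (possibly partial) protein as a fixed-size max_len x 22 matrix.

    Longer sequences are truncated; shorter ones (such as fragments from
    metagenomic contigs) are zero-padded at the end.
    """
    seq = seq.upper()[:max_len]
    rows = [encode_residue(aa) for aa in seq]
    rows.extend([[0.0] * FEATURES_PER_RESIDUE for _ in range(max_len - len(rows))])
    return rows


if __name__ == "__main__":
    matrix = encode_protein("MKV*", max_len=6)
    print(len(matrix), len(matrix[0]))  # 6 22
    print(matrix[0][AA_INDEX["M"]], matrix[1][-1])  # 1.0 1.0
```

Zero-padding lets fragments from metagenomic contigs share the input shape of full-length proteins, so a single neural network could score both; the abstract does not specify which architecture VirionFinder uses.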
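
The benchmark comparison in the abstract reports sensitivity (Sn = TP / (TP + FN)) at a matched specificity (Sp = TN / (TN + FP)). One common way to compute this, assumed here rather than taken from the paper, is to choose the lowest score threshold whose Sp on the non-PVVP set reaches a target value and then measure Sn on the PVVP set at that threshold. The function `sensitivity_at_specificity` and its score inputs are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_at_specificity(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    target_sp: float,
) -> Tuple[float, float, float]:
    """Return (threshold, Sn, Sp) for the lowest threshold whose Sp >= target_sp.

    A protein is called a PVVP when its score is >= threshold.
    Sn = TP / (TP + FN) over positives; Sp = TN / (TN + FP) over negatives.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    # Candidate thresholds: every observed score, plus +inf (call nothing positive).
    candidates = sorted(set(pos_scores) | set(neg_scores)) + [float("inf")]
    # Sp never decreases and Sn never increases as the threshold rises, so the
    # first (lowest) threshold meeting the Sp target maximizes Sn.
    for t in candidates:
        tn = sum(1 for s in neg_scores if s < t)
        sp = tn / len(neg_scores)
        if sp >= target_sp:
            tp = sum(1 for s in pos_scores if s >= t)
            return t, tp / len(pos_scores), sp
    raise RuntimeError("unreachable: threshold +inf always gives Sp = 1")
```

Calling it with the same `target_sp` for each tool's scores gives an equal-Sp comparison of the kind the abstract describes. For example, with PVVP scores `[0.9, 0.8, 0.7, 0.4]`, non-PVVP scores `[0.6, 0.3, 0.2, 0.1, 0.05]`, and `target_sp=1.0`, it returns `(0.7, 0.75, 1.0)`.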