An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes.
For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are...
Main Authors: | Stephen Solis-Reyes, Mariano Avino, Art Poon, Lila Kari |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2018-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC6235296?pdf=render |
_version_ | 1818385989179015168 |
---|---|
author | Stephen Solis-Reyes Mariano Avino Art Poon Lila Kari |
author_facet | Stephen Solis-Reyes Mariano Avino Art Poon Lila Kari |
author_sort | Stephen Solis-Reyes |
collection | DOAJ |
description | For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are now many classification algorithms available for this purpose. Although several of these algorithms are similar in accuracy and speed, the majority are proprietary and require laboratories to transmit HIV-1 sequence data over the network to remote servers. This potentially exposes sensitive patient data to unauthorized access, and makes it impossible to determine how classifications are made and to maintain the data provenance of clinical bioinformatic workflows. We propose an open-source supervised and alignment-free subtyping method (Kameris) that operates on k-mer frequencies in HIV-1 sequences. We performed a detailed study of the accuracy and performance of subtype classification in comparison to four state-of-the-art programs. Based on our testing data set of manually curated real-world HIV-1 sequences (n = 2,784), Kameris obtained an overall accuracy of 97%, which matches or exceeds all other tested software, with a processing rate of over 1,500 sequences per second. Furthermore, our fully standalone general-purpose software provides key advantages in terms of data security and privacy, transparency and reproducibility. Finally, we show that our method is readily adaptable to subtype classification of other viruses including dengue, influenza A, and hepatitis B and C virus. |
first_indexed | 2024-12-14T03:46:55Z |
format | Article |
id | doaj.art-185bd99dfca4447aba78303b108dbe53 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-14T03:46:55Z |
publishDate | 2018-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-185bd99dfca4447aba78303b108dbe53; 2022-12-21T23:18:20Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2018-01-01; vol. 13, no. 11, e0206409; doi 10.1371/journal.pone.0206409; An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes; Stephen Solis-Reyes, Mariano Avino, Art Poon, Lila Kari; http://europepmc.org/articles/PMC6235296?pdf=render |
spellingShingle | Stephen Solis-Reyes Mariano Avino Art Poon Lila Kari An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. PLoS ONE |
title | An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. |
title_full | An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. |
title_fullStr | An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. |
title_full_unstemmed | An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. |
title_short | An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes. |
title_sort | open source k mer based machine learning tool for fast and accurate subtyping of hiv 1 genomes |
url | http://europepmc.org/articles/PMC6235296?pdf=render |
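
The description in this record states that Kameris operates on k-mer frequencies in HIV-1 sequences. As a rough illustration of what such a feature vector can look like, the sketch below counts overlapping k-mers in a nucleotide string and normalizes the counts to frequencies. It is not the Kameris implementation: the function name `kmer_frequencies`, the default `k=3`, the choice to skip k-mers containing ambiguous bases, and the normalization by total valid k-mers are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return a fixed-length vector of k-mer frequencies (hypothetical helper).

    The vector has len(alphabet)**k entries, ordered lexicographically by
    alphabet position. k-mers containing characters outside the alphabet
    (for example 'N') are skipped; this is an assumption of the sketch.
    """
    seq = sequence.upper()
    allowed = set(alphabet)

    # Count every overlapping window of length k that uses only allowed bases.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    )

    # Enumerate all possible k-mers so every sequence maps to the same layout.
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]

    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTNACGT", k=2)
    print(len(vector), round(sum(vector), 6))
```

Vectors of this kind have the same length for every input sequence, which is what allows a supervised classifier to be trained on them without first aligning the sequences.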