An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes.


Bibliographic Details
Main Authors: Stephen Solis-Reyes, Mariano Avino, Art Poon, Lila Kari
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access:http://europepmc.org/articles/PMC6235296?pdf=render
Description: For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are now many classification algorithms available for this purpose. Although several of these algorithms are similar in accuracy and speed, the majority are proprietary and require laboratories to transmit HIV-1 sequence data over the network to remote servers. This potentially exposes sensitive patient data to unauthorized access, and makes it impossible to determine how classifications are made and to maintain the data provenance of clinical bioinformatic workflows. We propose an open-source supervised and alignment-free subtyping method (Kameris) that operates on k-mer frequencies in HIV-1 sequences. We performed a detailed study of the accuracy and performance of subtype classification in comparison to four state-of-the-art programs. Based on our testing data set of manually curated real-world HIV-1 sequences (n = 2,784), Kameris obtained an overall accuracy of 97%, which matches or exceeds all other tested software, with a processing rate of over 1,500 sequences per second. Furthermore, our fully standalone general-purpose software provides key advantages in terms of data security and privacy, transparency and reproducibility. Finally, we show that our method is readily adaptable to subtype classification of other viruses including dengue, influenza A, and hepatitis B and C virus.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0206409 (PLoS ONE, vol. 13, no. 11, article e0206409)
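The description characterizes Kameris as a supervised, alignment-free method that operates on k-mer frequencies. The sketch below illustrates only that general idea and is not Kameris's implementation. Each sequence is turned into a vector of normalized k-mer frequencies without any alignment step. Then, as a deliberately simple stand-in for whatever classifier the tool actually uses, a query is assigned to the subtype whose mean training vector (centroid) is nearest. The function names (`kmer_frequencies`, `train_centroids`, `classify`), the choice of k, the decision to skip k-mers containing ambiguous bases, and the toy sequences are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers, in lexicographic (A<C<G<T) order.

    Assumption (illustrative only): k-mers containing any non-ACGT character
    (gaps, N, IUPAC ambiguity codes) are skipped rather than resolved.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if all(c in ALPHABET for c in w))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k, or no valid k-mers: return an all-zero vector.
        return [0.0] * (4 ** k)
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=k)]


def train_centroids(training: list[tuple[str, str]], k: int) -> dict[str, list[float]]:
    """Mean k-mer frequency vector per subtype label (a nearest-centroid model,
    used here as a hypothetical stand-in for the tool's real classifier)."""
    sums: dict[str, list[float]] = {}
    counts: Counter = Counter()
    for seq, label in training:
        vec = kmer_frequencies(seq, k)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(seq: str, centroids: dict[str, list[float]], k: int) -> str:
    """Return the label whose centroid is nearest in Euclidean distance."""
    vec = kmer_frequencies(seq, k)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical toy data: short made-up sequences, NOT real HIV-1 subtypes.
training = [
    ("ACGTACGTACGTACGT", "toy_A"),
    ("ACGTACGAACGTACGT", "toy_A"),
    ("GGGCCCGGGCCCGGGC", "toy_B"),
    ("GGGCCCGGACCCGGGC", "toy_B"),
]
centroids = train_centroids(training, k=3)
print(classify("ACGTACGTACGAACGT", centroids, k=3))  # toy_A
print(classify("GGGCCCGGGCCCGGAC", centroids, k=3))  # toy_B
```

Because no alignment is computed, each sequence is handled independently in a single pass over its bases. The feature vector has 4^k entries (4,096 at k = 6), so choosing a larger k trades memory and training time for finer sequence resolution.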