Machine learning models exploring characteristic single-nucleotide signatures in yellow fever virus.

Yellow fever virus (YFV) is the agent of the most severe mosquito-borne disease in the tropics. Recently, Brazil suffered major YFV outbreaks with a high fatality rate, affecting areas where the virus had not been reported for decades, consisting of urban areas where a large number of unvaccinated pe...

Bibliographic Details
Main Authors: Álvaro Salgado, Raquel C de Melo-Minardi, Marta Giovanetti, Adriano Veloso, Francielly Morais-Rodrigues, Talita Adelino, Ronaldo de Jesus, Stephane Tosta, Vasco Azevedo, José Lourenco, Luiz Carlos J Alcantara
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0278982
collection DOAJ
description Yellow fever virus (YFV) is the agent of the most severe mosquito-borne disease in the tropics. Recently, Brazil suffered major YFV outbreaks with a high fatality rate, affecting areas where the virus had not been reported for decades, consisting of urban areas where a large number of unvaccinated people live. We developed a machine learning framework combining three different algorithms (XGBoost, random forest and regularized logistic regression) to analyze YFV genomic sequences. This method was applied to 56 YFV sequences from human infections and 27 from non-human primate (NHP) infections to investigate the presence of genetic signatures possibly related to disease severity (in human-related sequences) and differences in PCR cycle threshold (Ct) values (in NHP-related sequences). Our analyses reveal four non-synonymous single nucleotide variations (SNVs) in sequences from human infections, in proteins NS3 (E614D), NS4a (I69V) and NS5 (R727G, V643A), and six non-synonymous SNVs in NHP sequences, in proteins E (L385F), NS1 (A171V), NS3 (I184V) and NS5 (N11S, I374V, E641D). We performed comparative protein structural analysis on these SNVs, describing possible impacts on protein function. Although the dataset is limited in size and this study does not consider virus-host interactions, our work highlights the use of machine learning as a versatile and fast initial approach to genomic data exploration.
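The substitution codes in the description follow the usual amino-acid notation: reference residue, position in the protein, variant residue (so E614D reads as glutamate replaced by aspartate at position 614). The sketch below is not part of the published framework; it only shows one way to parse the codes listed above and tally them per host. The names `REPORTED`, `parse_substitution` and `count_by_host` are hypothetical, chosen for this illustration.

```python
import re

# Substitutions exactly as listed in the description, keyed by host and protein.
REPORTED = {
    "human": {"NS3": ["E614D"], "NS4a": ["I69V"], "NS5": ["R727G", "V643A"]},
    "NHP": {
        "E": ["L385F"],
        "NS1": ["A171V"],
        "NS3": ["I184V"],
        "NS5": ["N11S", "I374V", "E641D"],
    },
}

# One uppercase residue letter, a position, one uppercase residue letter.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'E614D' into (reference, position, variant)."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def count_by_host(reported):
    """Count the listed substitutions for each host group."""
    return {
        host: sum(len(codes) for codes in proteins.values())
        for host, proteins in reported.items()
    }


if __name__ == "__main__":
    for host, proteins in REPORTED.items():
        for protein, codes in proteins.items():
            for code in codes:
                ref, pos, alt = parse_substitution(code)
                print(f"{host}\t{protein}\t{ref}->{alt} at {pos}")
    print(count_by_host(REPORTED))
```

The per-host totals produced by `count_by_host` can be checked against the counts stated in the description (four for human sequences, six for NHP sequences).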
first_indexed 2024-04-10T22:57:48Z
id doaj.art-32dbbe877581438daca310297309c136
institution Directory Open Access Journal
issn 1932-6203
last_indexed 2024-04-10T22:57:48Z