Rapidly identifying new coronavirus mutations of potential concern in the Omicron variant using an unsupervised learning strategy

Bibliographic Details
Main Authors: Lue Ping Zhao, Terry P. Lybrand, Peter B. Gilbert, Thomas H. Payne, Chul-Woo Pyo, Daniel E. Geraghty, Keith R. Jerome
Format: Article
Language: English
Published: Nature Portfolio 2022-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-23342-2
collection DOAJ
description Abstract: Extensive mutations in the Omicron spike protein appear to accelerate the transmission of SARS-CoV-2, and rapid infections increase the odds that additional mutants will emerge. To build an investigative framework, we have applied an unsupervised machine learning approach to 4296 Omicron viral genomes collected and deposited in GISAID as of December 14, 2021, and have identified a core haplotype of 28 polymutants (A67V, T95I, G339D, R346K, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, K796Y, N856K, Q954H, N69K, L981F) in the spike protein and a separate core haplotype of 17 polymutants in non-spike genes: (K38, A1892) in nsp3, T492 in nsp4, (P132, V247, T280, S284) in 3C-like proteinase, I189 in nsp6, P323 in RNA-dependent RNA polymerase, I42 in exonuclease, T9 in envelope protein, (D3, Q19, A63) in membrane glycoprotein, and (P13, R203, G204) in nucleocapsid phosphoprotein. Using these core haplotypes as a reference, we have identified four newly emerging polymutants (R346, A701, I1081, N1192) in the spike protein (p values = 9.37 × 10^−4, 1.0 × 10^−15, 4.76 × 10^−7, and 1.56 × 10^−4, respectively), and five additional polymutants in non-spike genes (D343G in nucleocapsid phosphoprotein, V1069I in nsp3, V94A in nsp4, F694Y in the RNA-dependent RNA polymerase, and L106L/F in ORF3a) that exhibit significantly increasing trajectories (all p values < 1.0 × 10^−15). In the absence of relevant clinical data for these newly emerging mutations, it is important to monitor them closely. Two emerging mutations may be of particular concern: the N1192S mutation in the spike protein is located in an extremely conserved region of all human coronaviruses that is integral to the viral fusion process, and the F694Y mutation in the RNA polymerase may induce conformational changes that could affect remdesivir binding.
first_indexed 2024-04-13T20:36:02Z
id doaj.art-6a6535001053414e9ccd574c6a479829
institution Directory Open Access Journal
issn 2045-2322
spelling doaj.art-6a6535001053414e9ccd574c6a479829
2022-12-22T02:31:02Z
eng
Nature Portfolio
Scientific Reports, ISSN 2045-2322, 2022-11-01, volume 12, issue 1, pages 1-16
https://doi.org/10.1038/s41598-022-23342-2
Lue Ping Zhao (0): Public Health Sciences Division, Fred Hutchinson Cancer Research Center
Terry P. Lybrand (1): Quintepa Computing LLC
Peter B. Gilbert (2): Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
Thomas H. Payne (3): Department of Medicine, University of Washington School of Medicine
Chul-Woo Pyo (4): Clinical Research Division, Fred Hutchinson Cancer Research Center
Daniel E. Geraghty (5): Clinical Research Division, Fred Hutchinson Cancer Research Center
Keith R. Jerome (6): Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center