Rapidly identifying new coronavirus mutations of potential concern in the Omicron variant using an unsupervised learning strategy
Abstract Extensive mutations in the Omicron spike protein appear to accelerate the transmission of SARS-CoV-2, and rapid infections increase the odds that additional mutants will emerge. To build an investigative framework, we have applied an unsupervised machine learning approach to 4296 Omicron vi...
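Main Authors: | Lue Ping Zhao, Terry P. Lybrand, Peter B. Gilbert, Thomas H. Payne, Chul-Woo Pyo, Daniel E. Geraghty, Keith R. Jerome |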
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2022-11-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-022-23342-2 |
_version_ | 1817970592607895552 |
---|---|
author | Lue Ping Zhao Terry P. Lybrand Peter B. Gilbert Thomas H. Payne Chul-Woo Pyo Daniel E. Geraghty Keith R. Jerome |
author_sort | Lue Ping Zhao |
collection | DOAJ |
description | Extensive mutations in the Omicron spike protein appear to accelerate the transmission of SARS-CoV-2, and rapid infections increase the odds that additional mutants will emerge. To build an investigative framework, we have applied an unsupervised machine learning approach to 4296 Omicron viral genomes collected and deposited to GISAID as of December 14, 2021, and have identified a core haplotype of 28 polymutants (A67V, T95I, G339D, R346K, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, K796Y, N856K, Q954H, N69K, L981F) in the spike protein and a separate core haplotype of 17 polymutants in non-spike genes: (K38, A1892) in nsp3, T492 in nsp4, (P132, V247, T280, S284) in 3C-like proteinase, I189 in nsp6, P323 in RNA-dependent RNA polymerase, I42 in Exonuclease, T9 in envelope protein, (D3, Q19, A63) in membrane glycoprotein, and (P13, R203, G204) in nucleocapsid phosphoprotein. Using these core haplotypes as reference, we have identified four newly emerging polymutants (R346, A701, I1081, N1192) in the spike protein (p values = 9.37 × 10^−4, 1.0 × 10^−15, 4.76 × 10^−7, and 1.56 × 10^−4, respectively), and five additional polymutants in non-spike genes (D343G in nucleocapsid phosphoprotein, V1069I in nsp3, V94A in nsp4, F694Y in the RNA-dependent RNA polymerase, and L106L/F of ORF3a) that exhibit significantly increasing trajectories (all p values < 1.0 × 10^−15). In the absence of relevant clinical data for these newly emerging mutations, it is important to monitor them closely. Two emerging mutations may be of particular concern: the N1192S mutation in the spike protein is located in a region that is highly conserved across all human coronaviruses and integral to the viral fusion process, and the F694Y mutation in the RNA polymerase may induce conformational changes that could affect remdesivir binding. |
first_indexed | 2024-04-13T20:36:02Z |
format | Article |
id | doaj.art-6a6535001053414e9ccd574c6a479829 |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-13T20:36:02Z |
publishDate | 2022-11-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-6a6535001053414e9ccd574c6a479829; 2022-12-22T02:31:02Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2022-11-01; 121116; 10.1038/s41598-022-23342-2; Rapidly identifying new coronavirus mutations of potential concern in the Omicron variant using an unsupervised learning strategy; Lue Ping Zhao (Public Health Sciences Division, Fred Hutchinson Cancer Research Center); Terry P. Lybrand (Quintepa Computing LLC); Peter B. Gilbert (Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center); Thomas H. Payne (Department of Medicine, University of Washington School of Medicine); Chul-Woo Pyo (Clinical Research Division, Fred Hutchinson Cancer Research Center); Daniel E. Geraghty (Clinical Research Division, Fred Hutchinson Cancer Research Center); Keith R. Jerome (Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center); https://doi.org/10.1038/s41598-022-23342-2 |
title | Rapidly identifying new coronavirus mutations of potential concern in the Omicron variant using an unsupervised learning strategy |
url | https://doi.org/10.1038/s41598-022-23342-2 |