Emergence of genomic diversity and recurrent mutations in SARS-CoV-2

Bibliographic Details
Main Authors: van Dorp, L, Acman, M, Richard, D, Shaw, LP, Ford, CE, Ormond, L, Owen, CJ, Pang, J, Tan, CCS, Boshier, FAT, Ortiz, AT, Balloux, F
Format: Journal article
Language: English
Published: Elsevier 2020
collection OXFORD
description SARS-CoV-2 is a SARS-like coronavirus of likely zoonotic origin first identified in December 2019 in Wuhan, the capital of China's Hubei province. The virus has since spread globally, resulting in the currently ongoing COVID-19 pandemic. The first whole genome sequence was published on January 5, 2020, and thousands of genomes have been sequenced since this date. This resource allows unprecedented insights into the past demography of SARS-CoV-2, as well as monitoring of how the virus is adapting to its novel human host, providing information to direct drug and vaccine design. We curated a dataset of 7666 public genome assemblies and analysed the emergence of genomic diversity over time. Our results are in line with previous estimates and point to all sequences sharing a common ancestor towards the end of 2019, supporting this as the period when SARS-CoV-2 jumped into its human host. Due to extensive transmission, the genetic diversity of the virus in several countries recapitulates a large fraction of its worldwide genetic diversity. We identify regions of the SARS-CoV-2 genome that have remained largely invariant to date, and others that have already accumulated diversity. By focusing on mutations which have emerged independently multiple times (homoplasies), we identify 198 filtered recurrent mutations in the SARS-CoV-2 genome. Nearly 80% of the recurrent mutations produced non-synonymous changes at the protein level, suggesting possible ongoing adaptation of SARS-CoV-2. Three sites in Orf1ab in the regions encoding Nsp6, Nsp11 and Nsp13, and one in the Spike protein, are characterised by a particularly large number of recurrent mutations (>15 events), which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. We additionally provide an interactive, user-friendly web application to query the alignment of the 7666 SARS-CoV-2 genomes.
id oxford-uuid:8a857e59-c095-4714-b441-e1b8747f8332
institution University of Oxford