SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecoviru...
Main Authors: | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Published: | Springer Science and Business Media LLC, 2021 |
Online Access: | https://hdl.handle.net/1721.1/130581 |
_version_ | 1826200476849274880 |
---|---|
author | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_sort | Jungreis, Irwin |
collection | MIT |
description | Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely-functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution. |
first_indexed | 2024-09-23T11:37:04Z |
format | Article |
id | mit-1721.1/130581 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T11:37:04Z |
publishDate | 2021 |
publisher | Springer Science and Business Media LLC |
record_format | dspace |
spelling | mit-1721.1/130581 2022-09-27T20:46:02Z 2021-05-12T19:37:41Z 2021-05-12T19:37:41Z 2021-05 2020-09 Article http://purl.org/eprint/type/JournalArticle 2041-1723 https://hdl.handle.net/1721.1/130581 Jungreis, Irwin et al. "SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes." Nature Communications 12, 1 (May 2021): 2642. © 2021 The Author(s) https://doi.org/10.1038/s41467-021-22905-7 Nature Communications Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/ application/pdf Springer Science and Business Media LLC Nature |
title | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
url | https://hdl.handle.net/1721.1/130581 |