Identifying featured indels associated with SARS-CoV-2 fitness

ABSTRACT As an RNA virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is known for frequent substitution mutations, and substitutions in important genome regions are often associated with viral fitness. However, whether indel mutations are related to viral fitness has largely been overlooked. Here...

Bibliographic Details
Main Authors: Xiang Li, Hongliang Yan, Gary Wong, Wanli Ouyang, Jie Cui
Format: Article
Language: English
Published: American Society for Microbiology 2023-10-01
Series: Microbiology Spectrum
Subjects: SARS-CoV-2; featured indels; machine learning; genomic epidemiology; evolution
Online Access: https://journals.asm.org/doi/10.1128/spectrum.02269-23
author Xiang Li
Hongliang Yan
Gary Wong
Wanli Ouyang
Jie Cui
author_facet Xiang Li
Hongliang Yan
Gary Wong
Wanli Ouyang
Jie Cui
author_sort Xiang Li
collection DOAJ
description ABSTRACT As an RNA virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is known for frequent substitution mutations, and substitutions in important genome regions are often associated with viral fitness. However, whether indel mutations are related to viral fitness has largely been overlooked. Here we developed a computational methodology to investigate fitness-linked indels in over 9 million SARS-CoV-2 genomes. By analyzing 31,642,404 deletion records and 1,981,308 insertion records, our pipeline identified 26,765 deletion types and 21,054 insertion types and discovered 65 indel types significantly associated with Pango lineages. We propose the concept of featured indels, which, like substitution mutations, characterize the populations of specific Pango lineages and variants, and we term these 65 indels featured indels. The selective pressure on all indel types was assessed using a Bayesian model to explore the importance of indels. Our results indicate that indels, like substitution mutations, are under elevated selective pressure, a finding that is important for assessing viral fitness and is consistent with previous in vitro studies. Evaluation of the growth rate of each viral lineage indicated that indels play key roles in SARS-CoV-2 evolution and deserve as much attention as substitution mutations. IMPORTANCE The fitness effects of indels in pathogen genome evolution have rarely been studied. We developed a computational methodology to investigate severe acute respiratory syndrome coronavirus 2 genomes and systematically analyzed over 33 million indel records, ultimately proposing the concept of featured indels that can represent specific Pango lineages and identifying 65 featured indels. A machine learning model based on Bayesian inference, together with an evaluation of viral lineage growth rates, suggests that these featured indels exhibit selection pressure comparable to that of replacement mutations. In conclusion, indels are not negligible when evaluating viral fitness.
first_indexed 2024-03-11T17:56:03Z
format Article
id doaj.art-59dd72b1404d426db9eed45153eadf74
institution Directory Open Access Journal
issn 2165-0497
language English
last_indexed 2024-03-11T17:56:03Z
publishDate 2023-10-01
publisher American Society for Microbiology
record_format Article
series Microbiology Spectrum
spelling doaj.art-59dd72b1404d426db9eed45153eadf74
2023-10-17T13:04:35Z
eng
American Society for Microbiology
Microbiology Spectrum
2165-0497
2023-10-01
volume 11, issue 5
10.1128/spectrum.02269-23
Identifying featured indels associated with SARS-CoV-2 fitness
Xiang Li (CAS Key Laboratory of Molecular Virology & Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China)
Hongliang Yan (AI for Science, Shanghai Artificial Intelligence Laboratory, Shanghai, China)
Gary Wong (CAS Key Laboratory of Molecular Virology & Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China)
Wanli Ouyang (AI for Science, Shanghai Artificial Intelligence Laboratory, Shanghai, China)
Jie Cui (CAS Key Laboratory of Molecular Virology & Immunology, Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China)
https://journals.asm.org/doi/10.1128/spectrum.02269-23
title Identifying featured indels associated with SARS-CoV-2 fitness
topic SARS-CoV-2
featured indels
machine learning
genomic epidemiology
evolution
url https://journals.asm.org/doi/10.1128/spectrum.02269-23