RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis.

An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural sele...

Bibliographic Details
Main Authors: Alexander G Lucaci, Jordan D Zehr, Stephen D Shank, Dave Bouvier, Alexander Ostrovsky, Han Mei, Anton Nekrutenko, Darren P Martin, Sergei L Kosakovsky Pond
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0275623
collection DOAJ
description An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural selection within modestly sized gene-sequence datasets, the computational complexity of these methods and their sensitivity to sequencing errors render them effectively inapplicable in large-scale genomic surveillance contexts. Motivated by the need to analyze new lineage evolution in near-real time using large numbers of genomes, we developed the Rapid Assessment of Selection within CLades (RASCL) pipeline. RASCL applies state-of-the-art phylogenetic comparative methods to evaluate selective processes acting at individual codon sites and across whole genes. RASCL is scalable and produces regular, automatically updated lineage-specific selection analysis reports, even for lineages that include tens or hundreds of thousands of sampled genome sequences. Key to this performance are (i) generation of automatically subsampled high-quality datasets of gene/ORF sequences drawn from a selected "query" viral lineage; (ii) contextualization of these query sequences in codon alignments that include high-quality "background" sequences representative of global SARS-CoV-2 diversity; and (iii) the extensive parallelization of a suite of computationally intensive selection analysis tests. Within hours of being deployed to analyze a novel, rapidly growing lineage of interest, RASCL will begin yielding JavaScript Object Notation (JSON)-formatted reports that can be either imported into third-party analysis software or explored in standard web browsers using the premade RASCL interactive data visualization dashboard. By enabling the rapid detection of genome sites evolving under different selective regimes, RASCL is well suited for near-real-time monitoring of the population-level selective processes that will likely underlie the emergence of future variants of concern in measurably evolving pathogens with extensive genomic surveillance.
id doaj.art-bca1a56d05294988afd66598cded39bb
institution Directory Open Access Journal
issn 1932-6203
citation PLoS ONE 17(11): e0275623 (2022)