CNCA aligns small annotated genomes

Bibliographic Details
Main Authors: Jean-Noël Lorenzi, François Graner, Virginie Courtier-Orgogozo, Guillaume Achaz
Format: Article
Language: English
Published: BMC 2024-02-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-024-05700-1

Abstract
Background: To explore the evolutionary history of sequences, a sequence alignment is a first and necessary step, and its quality is crucial. While studying the proximal origins of the SARS-CoV-2 coronavirus, we wanted to construct an alignment of genomes closely related to SARS-CoV-2 using both coding and non-coding sequences. To our knowledge, no existing tool could construct this type of alignment, which motivated the creation of CNCA.
Results: CNCA is a web tool that aligns annotated genomes from GenBank files. It generates a nucleotide alignment that is then updated based on the protein sequence alignment. The final nucleotide alignment it outputs matches the protein alignment and is guaranteed to be free of frameshifts. CNCA was designed to align closely related small genome sequences of up to 50 kb (typically viral genomes) whose gene order is conserved.
Conclusions: CNCA constructs multiple alignments of small genomes by integrating both coding and non-coding sequences. This preserves regions, such as non-coding regions, that conventional back-translation methods traditionally ignore.
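
The abstract contrasts CNCA with conventional back-translation, in which a nucleotide alignment of coding sequences is derived codon by codon from a protein alignment. The Python sketch below illustrates that general technique and then naively joins it with a non-coding segment in a conserved gene order. It is not CNCA's implementation and is simpler than the update scheme the abstract describes. The function names (`back_translate`, `back_translate_alignment`, `codon_gaps_in_frame`, `concatenate_segments`) and the toy sequences are hypothetical. The sketch also assumes that each CDS is supplied without its stop codon and that non-coding segments were already aligned at the nucleotide level by some other tool.

```python
"""Sketch of codon-aware back-translation (hypothetical, not CNCA's code)."""

from typing import Dict, List

GAP = "-"


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS onto its aligned protein sequence.

    Assumes `cds` excludes the stop codon, so it holds exactly one codon
    per residue. Each residue consumes the next codon; each protein gap
    becomes a codon-sized gap "---".
    """
    n_residues = sum(1 for aa in aligned_protein if aa != GAP)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length is not 3 x the number of residues")
    pieces: List[str] = []
    pos = 0
    for aa in aligned_protein:
        if aa == GAP:
            pieces.append(GAP * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


def back_translate_alignment(
    protein_aln: Dict[str, str], cds: Dict[str, str]
) -> Dict[str, str]:
    """Back-translate every row of a protein MSA (keyed by sequence name)."""
    nt_aln = {name: back_translate(row, cds[name]) for name, row in protein_aln.items()}
    # Equal-length protein rows give equal-length nucleotide rows, so a
    # mismatch here means the input protein rows were not a proper MSA.
    if len({len(row) for row in nt_aln.values()}) > 1:
        raise ValueError("protein alignment rows differ in length")
    return nt_aln


def codon_gaps_in_frame(row: str) -> bool:
    """True if every gap in a back-translated row fills whole codon slots."""
    if len(row) % 3:
        return False
    for i in range(0, len(row), 3):
        codon = row[i:i + 3]
        if GAP in codon and codon != GAP * 3:
            return False
    return True


def concatenate_segments(segments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-segment alignments in their shared (conserved) genomic order."""
    names = list(segments[0])
    return {name: "".join(seg[name] for seg in segments) for name in names}


if __name__ == "__main__":
    # Toy data (hypothetical): a 5' non-coding segment, already aligned at
    # the nucleotide level, followed by one coding segment.
    utr5 = {"seqA": "AC-GT", "seqB": "ACTGT"}
    coding = back_translate_alignment(
        {"seqA": "MK-L", "seqB": "MKQL"},
        {"seqA": "ATGAAACTG", "seqB": "ATGAAGCAGCTT"},
    )
    assert all(codon_gaps_in_frame(row) for row in coding.values())
    for name, row in concatenate_segments([utr5, coding]).items():
        print(name, row)
    # seqA AC-GTATGAAA---CTG
    # seqB ACTGTATGAAGCAGCTT
```

Because each residue contributes exactly one codon and each protein gap contributes exactly `---`, gaps in a back-translated coding block can only span whole codons. `codon_gaps_in_frame` checks that property, which is what rules out frameshifts in the coding part of such an alignment.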

Additional Details
ISSN: 1471-2105
Affiliations:
Jean-Noël Lorenzi: Université Paris Cité
François Graner: Université Paris Cité
Virginie Courtier-Orgogozo: Université Paris Cité
Guillaume Achaz: SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France
Subjects: Annotated genomes; Nucleotide alignment; Protein alignment
Source: Directory of Open Access Journals (DOAJ)