EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment

Abstract. Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequ...

Bibliographic Details
Main Authors: Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow
Format: Article
Language: English
Published: BMC 2023-12-01
Series: Algorithms for Molecular Biology
Subjects: Multiple sequence alignment; Constraint alignment; MAFFT
Online Access: https://doi.org/10.1186/s13015-023-00247-x
_version_ 1797398132879785984
author Chengze Shen
Baqiao Liu
Kelly P. Williams
Tandy Warnow
collection DOAJ
description Abstract. Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets (hundreds of thousands of sequences) with high accuracy. EMMA is available at https://github.com/c5shen/EMMA. Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
first_indexed 2024-03-09T01:21:18Z
format Article
id doaj.art-3ff599092538409eb4e3b3b0db256f89
institution Directory Open Access Journal
issn 1748-7188
language English
last_indexed 2024-03-09T01:21:18Z
publishDate 2023-12-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj.art-3ff599092538409eb4e3b3b0db256f89 2023-12-10T12:07:38Z eng BMC Algorithms for Molecular Biology 1748-7188 2023-12-01 18 1 1 14 10.1186/s13015-023-00247-x
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment
Chengze Shen (Computer Science, University of Illinois, Urbana-Champaign)
Baqiao Liu (Computer Science, University of Illinois, Urbana-Champaign)
Kelly P. Williams (Sandia National Laboratories)
Tandy Warnow (Computer Science, University of Illinois, Urbana-Champaign)
https://doi.org/10.1186/s13015-023-00247-x
Multiple sequence alignment
Constraint alignment
MAFFT
title EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment
topic Multiple sequence alignment
Constraint alignment
MAFFT
url https://doi.org/10.1186/s13015-023-00247-x