Evaluating the Performance of Multiple Sequence Alignment Programs with Application to Genotyping SARS-CoV-2 in the Saudi Population

Bibliographic Details
Main Authors: Aminah Alqahtani, Meznah Almutairy
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Computation
Subjects:
Online Access:https://www.mdpi.com/2079-3197/11/11/212
author Aminah Alqahtani
Meznah Almutairy
collection DOAJ
description This study explores the accuracy and efficiency of multiple sequence alignment (MSA) programs, focusing on ClustalΩ, MAFFT, and MUSCLE, in the context of genotyping SARS-CoV-2 for the Saudi population. Our results indicate that MAFFT outperforms the others, making it an ideal choice for large-scale genomic analyses. The comparative performance of MSAs assembled using MergeAlign demonstrates that MAFFT and MUSCLE consistently exhibit higher accuracy than ClustalΩ in both reference-based and consensus-based approaches. The evaluation of genotyping effectiveness reveals that adding a reference sequence, such as the SARS-CoV-2 Wuhan-Hu-1 isolate, does not significantly affect the alignment process, suggesting that consensus sequences derived from the individual MSA alignments may yield comparable genotyping outcomes. Investigating single-nucleotide polymorphisms (SNPs) and mutations highlights distinctive features of the MSA programs: ClustalΩ and MAFFT show similar SNP counts, while MUSCLE displays the highest SNP count. High-frequency SNP analysis identifies MAFFT as the most accurate MSA program, emphasizing its reliability. Comparisons between Saudi and global SARS-CoV-2 populations underscore regional genetic variations. Saudi sequences exhibit consistently higher frequencies of high-frequency SNPs, attributed to genetic similarity within the population. Transmission dynamics analysis reveals a higher frequency of co-mutations in the Saudi dataset, suggesting shared evolutionary patterns. These findings emphasize the importance of considering regional diversity in genetic analyses.
first_indexed 2024-03-09T16:54:41Z
format Article
id doaj.art-5b2ca1f546f548c0ac97f844cddb992c
institution Directory Open Access Journal
issn 2079-3197
language English
last_indexed 2024-03-09T16:54:41Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Computation
doi 10.3390/computation11110212
volume 11
issue 11
article_number 212
affiliation Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia (Aminah Alqahtani; Meznah Almutairy)
record_timestamp 2023-11-24T14:36:21Z
title Evaluating the Performance of Multiple Sequence Alignment Programs with Application to Genotyping SARS-CoV-2 in the Saudi Population
topic multiple sequence alignment (MSA)
consensus sequence
assembled MSA
genotyping
SARS-CoV-2
Saudi Arabia
url https://www.mdpi.com/2079-3197/11/11/212