Evaluating the Performance of Multiple Sequence Alignment Programs with Application to Genotyping SARS-CoV-2 in the Saudi Population
Main Authors: | Aminah Alqahtani, Meznah Almutairy |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Computation |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-3197/11/11/212 |
author | Aminah Alqahtani, Meznah Almutairy |
collection | DOAJ |
description | This study explores the accuracy and efficiency of multiple sequence alignment (MSA) programs, focusing on Clustal Ω, MAFFT, and MUSCLE in the context of genotyping SARS-CoV-2 for the Saudi population. Our results indicate that MAFFT outperforms the other two programs, making it an ideal choice for large-scale genomic analyses. The comparative performance of MSAs assembled using MergeAlign demonstrates that MAFFT and MUSCLE consistently exhibit higher accuracy than Clustal Ω in both reference-based and consensus-based approaches. The evaluation of genotyping effectiveness reveals that adding a reference sequence, such as the SARS-CoV-2 Wuhan-Hu-1 isolate, does not significantly affect the alignment process, suggesting that consensus sequences derived from the individual MSAs may yield comparable genotyping outcomes. Investigating single-nucleotide polymorphisms (SNPs) and mutations highlights distinctive features of each MSA program. Clustal Ω and MAFFT show similar SNP counts, while MUSCLE reports the highest SNP count. High-frequency SNP analysis identifies MAFFT as the most accurate MSA program, emphasizing its reliability. Comparisons between Saudi and global SARS-CoV-2 populations underscore regional genetic variations. Saudi sequences consistently exhibit higher frequencies of high-frequency SNPs, attributed to genetic similarity within the population. Transmission dynamics analysis reveals a higher frequency of co-mutations in the Saudi dataset, suggesting shared evolutionary patterns. These findings emphasize the importance of considering regional diversity in genetic analyses. |
id | doaj.art-5b2ca1f546f548c0ac97f844cddb992c |
institution | Directory of Open Access Journals |
issn | 2079-3197 |
doi | 10.3390/computation11110212 |
citation | Computation, vol. 11, no. 11, art. 212 (2023) |
affiliation | Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia (both authors) |
title | Evaluating the Performance of Multiple Sequence Alignment Programs with Application to Genotyping SARS-CoV-2 in the Saudi Population |
topic | multiple sequence alignment (MSA); consensus sequence; assembled MSA; genotyping; SARS-CoV-2; Saudi Arabia |