CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments

Bibliographic Details
Main Authors: Charlotte Tumescheit, Andrew E. Firth, Katherine Brown
Format: Article
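Language: English
Published: PeerJ Inc. 2022-03-01
Series: PeerJ
Subjects: Multiple sequence alignment, Alignment quality, Python tool, Comparative genomics, Transcriptomics, Phylogenetics
Online Access: https://peerj.com/articles/12983.pdf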
author Charlotte Tumescheit
Andrew E. Firth
Katherine Brown
collection DOAJ
description Background: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce.
Results: We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user.
Conclusion: CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.
first_indexed 2024-03-09T06:24:44Z
format Article
id doaj.art-4a8b6c6a0f834d21a614de48c42ef951
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T06:24:44Z
publishDate 2022-03-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-4a8b6c6a0f834d21a614de48c42ef951
2023-12-03T11:22:54Z
eng
PeerJ Inc.
PeerJ
2167-8359
2022-03-01
10
e12983
10.7717/peerj.12983
CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments
Charlotte Tumescheit (0), Andrew E. Firth (1), Katherine Brown (2)
(0) Department of Pathology, University of Cambridge, Cambridge, United Kingdom
(1) Department of Pathology, University of Cambridge, Cambridge, United Kingdom
(2) Department of Pathology, University of Cambridge, Cambridge, United Kingdom
https://peerj.com/articles/12983.pdf
Multiple sequence alignment
Alignment quality
Python tool
Comparative genomics
Transcriptomics
Phylogenetics
title CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments
topic Multiple sequence alignment
Alignment quality
Python tool
Comparative genomics
Transcriptomics
Phylogenetics
url https://peerj.com/articles/12983.pdf
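
Two of the cleaning steps named in the description, removing sequences that are too short and removing gaps, can be illustrated with a minimal Python sketch. This is not CIAlign's code or its interface: the function names, the `min_length` threshold, the use of `-` as the gap character and the toy alignment are all assumptions made for illustration, and "removing gaps" is interpreted here as dropping columns that contain only gaps.

```python
def remove_short_sequences(seqs, min_length, gap_chars="-"):
    """Keep only sequences with at least min_length non-gap residues.

    Hypothetical helper: seqs maps sequence names to aligned strings.
    """
    return {
        name: seq
        for name, seq in seqs.items()
        if sum(ch not in gap_chars for ch in seq) >= min_length
    }


def remove_gap_only_columns(seqs, gap_chars="-"):
    """Drop alignment columns in which every sequence has a gap.

    Hypothetical helper: assumes all sequences have the same aligned length.
    """
    if not seqs:
        return {}
    length = len(next(iter(seqs.values())))
    keep = [
        i
        for i in range(length)
        if any(seq[i] not in gap_chars for seq in seqs.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


# Toy alignment (made up for this example).
alignment = {
    "seq_a": "AC--GT",
    "seq_b": "A---GT",
    "seq_c": "----G-",
}

# Short sequences are removed first, because dropping a sequence can
# leave behind columns that now contain only gaps.
filtered = remove_short_sequences(alignment, min_length=2)
cleaned = remove_gap_only_columns(filtered)

for name, seq in cleaned.items():
    print(name, seq)
```

In this sketch the order of the two steps matters: `seq_c` is the only sequence with a residue in neither of the two middle columns being non-gap, so filtering it out first lets the column step shorten the alignment. A real tool would also need to handle file parsing and user-defined thresholds per function, as the description notes.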