AMAS: a fast tool for alignment manipulation and computing of summary statistics

The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of subsets...

Bibliographic Details
Main Author: Marek L. Borowiec
Format: Article
Language: English
Published: PeerJ Inc. 2016-01-01
Series: PeerJ
Subjects: Phylogenetics; Phylogenomics; Bioinformatics; Alignment properties; Concatenation
Online Access: https://peerj.com/articles/1660.pdf
author Marek L. Borowiec
collection DOAJ
description The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools are therefore needed for handling thousands of single-locus or large concatenated alignments and computing their properties. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines sequence manipulation capabilities with a function that calculates basic statistics. The manipulation functions include conversion among popular formats, concatenation, extraction of sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under the GNU General Public License.
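The summary statistics listed in the description can be illustrated with a minimal Python sketch that uses only the standard library. This is not AMAS's own code or API: the function name `summarize`, the dictionary input format, the choice to treat only `?` and `-` as undetermined characters, and the choice to report AT and GC content as fractions of the A, C, G and T characters are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: only '?' and '-' count as undetermined characters.
MISSING = set("?-")


def summarize(alignment):
    """Compute basic statistics for a DNA alignment.

    `alignment` is a hypothetical input format: a dict mapping
    taxon names to aligned sequences of equal length.
    """
    seqs = [seq.upper() for seq in alignment.values()]
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences differ in length; not an alignment")

    n_taxa = len(seqs)
    cells = n_taxa * length
    char_counts = Counter("".join(seqs))
    missing = sum(char_counts[c] for c in MISSING)

    variable = 0
    informative = 0
    for column in zip(*seqs):  # iterate over alignment sites
        states = Counter(c for c in column if c not in MISSING)
        # Variable site: at least two different states are present.
        if len(states) > 1:
            variable += 1
        # Parsimony informative site: at least two states,
        # each occurring in at least two taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1

    at = char_counts["A"] + char_counts["T"]
    gc = char_counts["G"] + char_counts["C"]
    acgt = at + gc
    return {
        "taxa": n_taxa,
        "length": length,
        "cells": cells,
        "missing": missing,
        "missing_percent": 100.0 * missing / cells if cells else 0.0,
        "at_content": at / acgt if acgt else 0.0,
        "gc_content": gc / acgt if acgt else 0.0,
        "variable_sites": variable,
        "variable_proportion": variable / length if length else 0.0,
        "informative_sites": informative,
        "informative_proportion": informative / length if length else 0.0,
        "char_counts": dict(char_counts),
    }


# Toy data invented for this example.
example = {
    "taxon_a": "ACGTAC-A",
    "taxon_b": "ACGTTC?A",
    "taxon_c": "ACCTTCGA",
    "taxon_d": "ATCTACGA",
}
stats = summarize(example)
```

Walking the alignment column by column keeps the sketch short, but it is written for clarity rather than for the speed on very large alignments that the description attributes to AMAS.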
first_indexed 2024-03-09T08:00:36Z
format Article
id doaj.art-77154b09f0a14c908d7d96e9d29bc03c
institution Directory of Open Access Journals
issn 2167-8359
language English
last_indexed 2024-03-09T08:00:36Z
publishDate 2016-01-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-77154b09f0a14c908d7d96e9d29bc03c | 2023-12-03T00:46:38Z | eng | PeerJ Inc. | PeerJ | 2167-8359 | 2016-01-01 | 4 | e1660 | 10.7717/peerj.1660 | AMAS: a fast tool for alignment manipulation and computing of summary statistics | Marek L. Borowiec | Department of Entomology and Nematology, UC Davis, Davis, United States | https://peerj.com/articles/1660.pdf | Phylogenetics | Phylogenomics | Bioinformatics | Alignment properties | Concatenation
title AMAS: a fast tool for alignment manipulation and computing of summary statistics
topic Phylogenetics
Phylogenomics
Bioinformatics
Alignment properties
Concatenation
url https://peerj.com/articles/1660.pdf