SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files

Sequence file formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, prompting the development of dedicated software...

Bibliographic Details
Main Authors:	Andrea Telatin, Piero Fariselli, Giovanni Birolo
Format:	Article
Language:	English
Published:	MDPI AG 2021-05-01
Series:	Bioengineering
Subjects:	bioinformatics, FASTQ, FASTA, software, next-generation sequencing
Online Access:	https://www.mdpi.com/2306-5354/8/5/59

_version_	1827693186466709504
author	Andrea Telatin, Piero Fariselli, Giovanni Birolo
author_facet	Andrea Telatin, Piero Fariselli, Giovanni Birolo
author_sort	Andrea Telatin
collection	DOAJ
description	Sequence file formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, prompting the development of dedicated software to handle, parse, and manipulate such files efficiently. Several bioinformatics packages are available to filter and manipulate FASTA and FASTQ files, yet some essential tasks remain poorly supported, leaving gaps that any workflow analysis of NGS datasets must fill with custom scripts. This can introduce harmful variability and performance bottlenecks in pivotal steps. Here we present a suite of tools, called SeqFu (Sequence Fastx utilities), that provides a broad range of commands to perform both common and specialist operations with ease and is designed to be easily implemented in high-performance analytical pipelines. SeqFu includes high-performance implementations of algorithms to interleave and deinterleave FASTQ files, merge Illumina lanes, and perform various quality controls (identification of degenerate primers, analysis of length statistics, extraction of portions of the datasets). SeqFu dereplicates sequences from multiple files while keeping track of their provenance. SeqFu is developed in Nim for high-performance processing, is freely available, and can be installed with the popular package manager Miniconda.
first_indexed	2024-03-10T11:39:10Z
format	Article
id	doaj.art-27a186886ee24c789c80f629d1fb8b74
institution	Directory of Open Access Journals
issn	2306-5354
language	English
last_indexed	2024-03-10T11:39:10Z
publishDate	2021-05-01
publisher	MDPI AG
record_format	Article
series	Bioengineering
spelling	doaj.art-27a186886ee24c789c80f629d1fb8b74; 2023-11-21T18:37:52Z; eng; MDPI AG; Bioengineering; 2306-5354; 2021-05-01; 8; 5; 59; 10.3390/bioengineering8050059; SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files; Andrea Telatin (0), Piero Fariselli (1), Giovanni Birolo (2); (0) Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich NR4 7UQ, UK; (1) Department of Medical Sciences, University of Turin, 10126 Torino, Italy; (2) Department of Medical Sciences, University of Turin, 10126 Torino, Italy
spellingShingle	Andrea Telatin Piero Fariselli Giovanni Birolo SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files Bioengineering bioinformatics FASTQ FASTA software next-generation sequencing
title	SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
title_full	SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
title_fullStr	SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
title_full_unstemmed	SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
title_short	SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
title_sort	seqfu a suite of utilities for the robust and reproducible manipulation of sequence files
topic	bioinformatics, FASTQ, FASTA, software, next-generation sequencing
url	https://www.mdpi.com/2306-5354/8/5/59
work_keys_str_mv	AT andreatelatin seqfuasuiteofutilitiesfortherobustandreproduciblemanipulationofsequencefiles AT pierofariselli seqfuasuiteofutilitiesfortherobustandreproduciblemanipulationofsequencefiles AT giovannibirolo seqfuasuiteofutilitiesfortherobustandreproduciblemanipulationofsequencefiles

Similar Items