Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.

The recent decrease in the cost and time needed to sequence and assemble complete genomes has created an increased demand for data storage. As a consequence, several strategies for compressing assembled biological data have been developed. Vertical compression tools implement strategies that take advantage of the high level of similarity between multiple assembled genomic sequences to achieve better compression. However, current reviews on vertical compression do not compare the execution flow of each tool, which consists of preprocessing, transformation, and data encoding phases. We performed a systematic literature review to identify and compare existing tools for vertical compression of assembled genomic sequences. The review was centered on PubMed and Scopus, in which 45,726 distinct papers were considered. Next, 32 papers were selected according to the following criteria: to present a lossless vertical compression tool; to use the information contained in other sequences for the compression; to be able to handle genomic sequences in FASTA format; and to require no prior knowledge. Although we extracted compression performance results, we did not compare them because the tools did not use a standardized evaluation protocol. We therefore conclude that an evaluation protocol to be applied by every tool has yet to be defined.

Bibliographic Details
Main Authors: Kelvin V Kredens, Juliano V Martins, Osmar B Dordal, Mauri Ferrandin, Roberto H Herai, Edson E Scalabrin, Bráulio C Ávila
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0232942
ISSN: 1932-6203
Volume: 15, Issue: 5, Article: e0232942
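To make the three phases named in the abstract (preprocessing, transformation, and data encoding) more concrete, the following is a minimal Python sketch of reference-based vertical compression. It is a generic illustration under stated assumptions, not the implementation of any tool covered by the review. The function names (`preprocess`, `transform`, `encode`, `decode`), the k-mer size of 8, the greedy first-occurrence k-mer index, and the varint byte layout are all hypothetical choices. `zlib` is a plainly swapped-in stand-in for whatever entropy coder a dedicated tool might use. The sketch keeps only the sequence characters, so it is lossless for the sequence but not for FASTA headers or line wrapping.

```python
import zlib

# Hypothetical sketch of vertical (reference-based) compression; not any published tool.

def preprocess(fasta_text: str) -> str:
    """Preprocessing: drop '>' header lines and join sequence lines.
    Assumption: headers and line wrapping are discarded for brevity."""
    return "".join(line.strip() for line in fasta_text.splitlines()
                   if not line.startswith(">"))

def transform(target: str, reference: str, k: int = 8) -> list:
    """Transformation: rewrite target as ("M", ref_pos, length) matches
    against the reference plus ("L", bases) literal runs."""
    # Index the first occurrence of every k-mer in the reference.
    index = {}
    for p in range(len(reference) - k + 1):
        index.setdefault(reference[p:p + k], p)
    ops, literal, i = [], [], 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            # No seed match at this position: keep the base as a literal.
            literal.append(target[i])
            i += 1
            continue
        # Greedily extend the k-mer seed as far as both sequences agree.
        length = k
        while (i + length < len(target) and pos + length < len(reference)
               and target[i + length] == reference[pos + length]):
            length += 1
        if literal:
            ops.append(("L", "".join(literal)))
            literal = []
        ops.append(("M", pos, length))
        i += length
    if literal:
        ops.append(("L", "".join(literal)))
    return ops

def _write_varint(n: int, out: bytearray) -> None:
    # Base-128 varint: 7 data bits per byte, high bit set means "more bytes follow".
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return

def _read_varint(data: bytes, i: int) -> tuple:
    n, shift = 0, 0
    while True:
        b = data[i]
        i += 1
        n |= (b & 0x7F) << shift
        if not b & 0x80:
            return n, i
        shift += 7

def encode(ops: list) -> bytes:
    """Data encoding: serialize ops (tag 0 = match, tag 1 = literal), then zlib."""
    out = bytearray()
    for op in ops:
        if op[0] == "M":
            out.append(0)
            _write_varint(op[1], out)
            _write_varint(op[2], out)
        else:
            out.append(1)
            _write_varint(len(op[1]), out)
            out += op[1].encode("ascii")
    return zlib.compress(bytes(out), 9)

def decode(blob: bytes, reference: str) -> str:
    """Invert encode() and transform(); requires the same reference."""
    data = zlib.decompress(blob)
    parts, i = [], 0
    while i < len(data):
        tag = data[i]
        i += 1
        if tag == 0:
            pos, i = _read_varint(data, i)
            length, i = _read_varint(data, i)
            parts.append(reference[pos:pos + length])
        else:
            n, i = _read_varint(data, i)
            parts.append(data[i:i + n].decode("ascii"))
            i += n
    return "".join(parts)

if __name__ == "__main__":
    reference = preprocess(">ref\nACGTACGTTAGCCGATAGGCTTACGATCG\nATCGGATCCTAGGACTTAGCAATCGGCTA\n")
    target = preprocess(">tgt\nACGTACGTTAGCCGATAGGCTTACGTTCG\nATCGGATCCTAGGACTTAGCAATCGGCTA\n")
    blob = encode(transform(target, reference))
    assert decode(blob, reference) == target  # lossless round trip for the sequence
    print(transform(target, reference), len(blob), "bytes")
```

The design choice here is that matches are stored only as (reference position, length) pairs, so the more a target resembles the reference, the fewer and longer those pairs become. A tool that is lossless at the file level would also have to store the FASTA headers and line structure that this sketch discards.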