Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review.
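Main Authors: | Kelvin V Kredens, Juliano V Martins, Osmar B Dordal, Mauri Ferrandin, Roberto H Herai, Edson E Scalabrin, Bráulio C Ávila |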
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2020-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0232942 |
author | Kelvin V Kredens, Juliano V Martins, Osmar B Dordal, Mauri Ferrandin, Roberto H Herai, Edson E Scalabrin, Bráulio C Ávila |
---|---|
collection | DOAJ |
description | The recent decrease in the cost and time needed to sequence and assemble complete genomes has created an increased demand for data storage. As a consequence, several strategies for compressing assembled biological data have been developed. Vertical compression tools implement strategies that exploit the high level of similarity between multiple assembled genomic sequences to achieve better compression. However, current reviews on vertical compression do not compare the execution flow of each tool, which consists of preprocessing, transformation, and data encoding phases. We performed a systematic literature review to identify and compare existing tools for vertical compression of assembled genomic sequences. The review was centered on PubMed and Scopus, in which 45,726 distinct papers were considered. Next, 32 papers were selected according to the following criteria: the paper presents a lossless vertical compression tool; the tool uses the information contained in other sequences for compression; the tool can handle genomic sequences in FASTA format; and the tool requires no prior knowledge. Although we extracted compression performance results, we did not compare them, because the tools did not use a standardized evaluation protocol. We therefore conclude that there is a lack of a defined evaluation protocol that each tool should apply. |
first_indexed | 2024-12-22T13:34:24Z |
id | doaj.art-cf313749a90b46938105c0c7ad86271d |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
last_indexed | 2024-12-22T13:34:24Z |
title | Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review. |
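
The description divides a vertical compressor's execution flow into preprocessing, transformation, and data encoding phases. The Python sketch below is a minimal illustration of those three phases for one FASTA record compressed against a single reference sequence. It is not one of the 32 reviewed tools. The following choices are assumptions made for brevity: greedy k-mer seed matching, the seed length `K = 12`, JSON serialization of the operations, and zlib as the final coder. All function names (`preprocess`, `transform`, `encode`, `decompress`, `synthetic_sequence`) are hypothetical. The sketch restores the header and the bases exactly, but it does not preserve the original FASTA line width.

```python
import json
import zlib

K = 12  # hypothetical seed (k-mer) length used to find matches in the reference


def preprocess(fasta: str) -> tuple[str, str]:
    """Preprocessing (assumed): split one FASTA record into header and sequence.

    Sequence line breaks are dropped, so the original line width is not kept;
    the header and the bases themselves are restored exactly.
    """
    lines = fasta.strip().splitlines()
    return lines[0], "".join(line.strip() for line in lines[1:])


def index_reference(ref: str, k: int = K) -> dict[str, int]:
    """Map every k-mer of the reference to its first position."""
    index: dict[str, int] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], i)
    return index


def transform(target: str, ref: str, k: int = K) -> list:
    """Transformation: rewrite target as ["M", pos, len] copies and ["L", text] literals."""
    index = index_reference(ref, k)
    ops: list = []
    literal: list[str] = []
    i = 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            literal.append(target[i])  # no seed match: keep this base verbatim
            i += 1
            continue
        length = k  # the seed already matches exactly; extend it greedily
        while (i + length < len(target) and pos + length < len(ref)
               and target[i + length] == ref[pos + length]):
            length += 1
        if literal:
            ops.append(["L", "".join(literal)])
            literal = []
        ops.append(["M", pos, length])
        i += length
    if literal:
        ops.append(["L", "".join(literal)])
    return ops


def encode(header: str, ops: list) -> bytes:
    """Data encoding: serialize header and operations, then compress with zlib."""
    return zlib.compress(json.dumps([header, ops]).encode("utf-8"), 9)


def decompress(blob: bytes, ref: str) -> tuple[str, str]:
    """Invert encode() and transform() using the same reference sequence."""
    header, ops = json.loads(zlib.decompress(blob).decode("utf-8"))
    parts = []
    for op in ops:
        if op[0] == "M":
            parts.append(ref[op[1]:op[1] + op[2]])  # copy a block from the reference
        else:
            parts.append(op[1])  # literal bases stored verbatim
    return header, "".join(parts)


def synthetic_sequence(n: int, seed: int = 12345) -> str:
    """Deterministic pseudo-random DNA from a linear congruential generator (demo data only)."""
    bases, state = [], seed
    for _ in range(n):
        state = (1103515245 * state + 12345) % 2**31
        bases.append("ACGT"[(state >> 16) & 3])
    return "".join(bases)


# Usage: the target overwrites position 700 with "T" and inserts "GATTACA" at 1500.
reference = synthetic_sequence(2000)
target = reference[:700] + "T" + reference[701:1500] + "GATTACA" + reference[1500:]
fasta = ">sample_1\n" + "\n".join(target[i:i + 60] for i in range(0, len(target), 60))
header, seq = preprocess(fasta)
blob = encode(header, transform(seq, reference))
print(len(blob), "bytes vs", len(zlib.compress(seq.encode(), 9)), "bytes for zlib alone")
```

Because the target is almost entirely a copy of the reference, the transformation phase reduces it to a few copy and literal operations. This is the inter-sequence similarity that vertical tools exploit. A dedicated tool would likely replace the JSON-plus-zlib step with a purpose-built entropy coder; zlib is used here only because it ships with the Python standard library.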