IonCRAM: a reference-based compression tool for ion torrent sequence files


Bibliographic Details
Main Authors: Moustafa Shokrof, Mohamed Abouelhoda
Format: Article
Language: English
Published: BMC 2020-09-01
Series: BMC Bioinformatics
Online Access: http://link.springer.com/article/10.1186/s12859-020-03726-9
collection DOAJ
description Abstract. Background: Ion Torrent is one of the major next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software of the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store the NGS results. However, the tools for generating CRAM files are not designed to handle the flow signals. This missing feature motivated us to develop a new program that improves the compression of Ion Torrent files for long-term archiving. Results: In this paper, we present IonCRAM, the first reference-based compression tool for Ion Torrent BAM files intended for long-term archiving. For the BAM files, IonCRAM could achieve a space saving of about 43%. This space saving is superior to what is achieved with the CRAM format by about 8–9%. Conclusions: Reducing the space consumption of NGS data reduces the cost of storage and data transfer. Developing efficient compression software for clinical NGS data is therefore of more than computational interest, as it ultimately contributes to reducing the overall cost of the clinical test. The space saving achieved by our tool is a practical step in this direction. The tool is open source and available at Code Ocean, GitHub, and http://ioncram.saudigenomeproject.com.
first_indexed 2024-12-10T22:31:09Z
id doaj.art-6546fc605dc84a37a9e760eb876d149d
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-12-10T22:31:09Z
spelling doaj.art-6546fc605dc84a37a9e760eb876d149d | 2022-12-22T01:31:03Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-09-01 | vol. 21, no. 1, pp. 1–16 | 10.1186/s12859-020-03726-9 | IonCRAM: a reference-based compression tool for ion torrent sequence files | Moustafa Shokrof (Faculty of Computer Science, University of California at Davis) | Mohamed Abouelhoda (King Faisal Specialist Hospital and Research Center) | http://link.springer.com/article/10.1186/s12859-020-03726-9