An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence

Compression techniques have recently seen encouraging uptake across many fields of data management. DNA data sets are growing large, which creates problems for storage and data transfer. The common approach is to keep the data on a server, which adds to the cost of data management. Furthermore, th...

Bibliographic Details
Main Author: Ahmad, Nor Azhar
Format: Thesis
Published: 2010
Subjects: Q Science (General); QA76 Computer software
_version_ 1796855899497693184
author Ahmad, Nor Azhar
collection ePrints
description Compression techniques have recently seen encouraging uptake across many fields of data management. DNA data sets are growing large, which creates problems for storage and data transfer. The common approach is to keep the data on a server, which adds to the cost of data management. Furthermore, transferring the data online is no longer the best solution: for a research center with a slow Internet connection, the transfer is almost impossible to carry out. This study proposes an enhancement of the LZ77 algorithm, a common non-greedy, dictionary-type method that uses the sliding window concept to compress alphabetical data. By introducing sectioned sliding windows with a hash table approach, the proposed compression algorithm addresses the storage problem of large DNA sequences. This implementation can shorten processing time and improve data compression rates. Two formats of DNA data (binary and FASTA) were tested and analysed. Simulations showed promising compression rates with proportional increases in the size of the DNA data, with compression reaching a rate of 56% per bit. Compared with BioCompress, an LZ77-based DNA compression algorithm with a compression rate of 44%, the proposed algorithm performs better by 12 percentage points. The results of this study will allow cost reduction in handling large-scale DNA data.
first_indexed 2024-03-05T18:35:23Z
format Thesis
id utm.eprints-21289
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T18:35:23Z
publishDate 2010
record_format dspace
spelling utm.eprints-21289 2020-03-03T07:29:51Z http://eprints.utm.my/21289/ Q Science (General) QA76 Computer software 2010 Thesis NonPeerReviewed Ahmad, Nor Azhar (2010) An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.
spellingShingle Q Science (General)
QA76 Computer software
Ahmad, Nor Azhar
An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence
title An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence
topic Q Science (General)
QA76 Computer software
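
The description in this record mentions LZ77 matching over a sliding window, accelerated with a hash table. As a rough illustration only, and not the thesis's actual algorithm, the Python sketch below indexes fixed-length substrings in a hash table so that earlier occurrences can be looked up directly instead of scanning the whole window. The function names compress and decompress, the constants MIN_MATCH, WINDOW, MAX_MATCH and MAX_CANDIDATES, and the token layout are all assumptions made for this example; the sectioned sliding windows described in the thesis are not reproduced here.

```python
from collections import defaultdict, deque

# All constants below are illustrative assumptions, not values from the thesis.
MIN_MATCH = 4        # length of the substring used as the hash key
WINDOW = 4096        # how far back a match may point
MAX_MATCH = 255      # cap on the length of a single match
MAX_CANDIDATES = 16  # positions remembered per hash key


def compress(seq):
    """Encode seq as tokens: ('L', char) literals or ('M', offset, length) matches."""
    table = defaultdict(deque)  # substring -> recent start positions, oldest first
    tokens = []
    n = len(seq)
    i = 0
    while i < n:
        best_len, best_off = 0, 0
        key = seq[i:i + MIN_MATCH]
        if len(key) == MIN_MATCH:
            # Try the most recent candidates first; stop once outside the window.
            for pos in reversed(table[key]):
                if i - pos > WINDOW:
                    break
                length = MIN_MATCH
                while (length < MAX_MATCH and i + length < n
                       and seq[pos + length] == seq[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - pos
        if best_len >= MIN_MATCH:
            tokens.append(('M', best_off, best_len))
            step = best_len
        else:
            tokens.append(('L', seq[i]))
            step = 1
        # Index every position that was just consumed.
        for j in range(i, i + step):
            k = seq[j:j + MIN_MATCH]
            if len(k) == MIN_MATCH:
                bucket = table[k]
                bucket.append(j)
                if len(bucket) > MAX_CANDIDATES:
                    bucket.popleft()
        i += step
    return tokens


def decompress(tokens):
    """Rebuild the original string from the tokens produced by compress."""
    out = []
    for tok in tokens:
        if tok[0] == 'L':
            out.append(tok[1])
        else:
            _, off, length = tok
            start = len(out) - off
            # Copy one symbol at a time so overlapping matches work.
            for j in range(length):
                out.append(out[start + j])
    return ''.join(out)
```

For a repetitive input such as "ACGTACGTACGTTTGACA" repeated several times, decompress(compress(s)) returns s and the token list is shorter than the input. Turning tokens into bits (for example, packing each base into two bits) is a separate step that this sketch leaves out.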