An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence
Compression techniques are increasingly used across many areas of data management. DNA data sets have grown large, which creates problems for both storage and data transfer. The common approach is to place the data on a server, which adds to the cost of data management. Furthermore, th...
Main Author: | Ahmad, Nor Azhar |
---|---|
Format: | Thesis |
Published: | 2010 |
Subjects: | Q Science (General); QA76 Computer software |
_version_ | 1796855899497693184 |
---|---|
author | Ahmad, Nor Azhar |
collection | ePrints |
description | Compression techniques are increasingly used across many areas of data management. DNA data sets have grown large, which creates problems for both storage and data transfer. The common approach is to place the data on a server, which adds to the cost of data management. Furthermore, online transfer is no longer the best solution: for a research center with a low-speed Internet connection, the transfer is almost impossible to carry out. This study proposes an enhancement of the LZ77 algorithm, a common non-greedy, dictionary-type method that uses the sliding-window concept to compress alphabetical data. By introducing sectioned sliding windows with a hash table, the proposed compression algorithm addresses the storage problem of large DNA sequences. The implementation reduces processing time and improves the compression rate. Two formats of DNA data (binary and FASTA) are tested and analysed. Simulation shows promising compression rates that scale in proportion to the size of the DNA data, with compression at a rate of 56% per bit. Compared with BioCompress, an LZ77-based DNA compression algorithm with a compression rate of 44%, the proposed algorithm performs better by 12 percentage points. The findings of this study will help reduce the cost of handling large-scale DNA data. |
first_indexed | 2024-03-05T18:35:23Z |
format | Thesis |
id | utm.eprints-21289 |
institution | Universiti Teknologi Malaysia - ePrints |
last_indexed | 2024-03-05T18:35:23Z |
publishDate | 2010 |
record_format | dspace |
spelling | utm.eprints-21289 2020-03-03T07:29:51Z http://eprints.utm.my/21289/ Ahmad, Nor Azhar (2010) An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems. Thesis. NonPeerReviewed. |
title | An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence |
topic | Q Science (General) QA76 Computer software |
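
The description above mentions LZ77 sliding-window compression combined with a hash table. The record does not give the thesis's actual algorithm, so the following is only a minimal Python sketch of the general idea, not the author's method: a hash table maps each short substring (k-mer) to the positions where it occurred, so match candidates inside the window are found by lookup rather than by scanning the whole window. The function names, the token format, and the parameters `window`, `min_match`, `max_match`, and `max_chain` are all assumptions made for illustration, and the "sectioning" of windows described in the abstract is not modelled here.

```python
def lz77_compress(data, window=4096, min_match=4, max_match=255, max_chain=32):
    """Illustrative LZ77 with hash-table match finding (not the thesis's algorithm).

    Returns a list of tokens: ("lit", char) or ("ref", distance, length).
    """
    table = {}    # k-mer of length min_match -> ascending list of start positions
    tokens = []
    n = len(data)
    i = 0
    while i < n:
        best_len, best_dist = 0, 0
        key = data[i:i + min_match]
        if len(key) == min_match:
            checked = 0
            # Try the most recent occurrences first; stop once outside the window.
            for j in reversed(table.get(key, [])):
                if i - j > window or checked >= max_chain:
                    break
                checked += 1
                length = min_match
                # Extend the match; it may overlap the current position.
                while (length < max_match and i + length < n
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            step = best_len
        else:
            tokens.append(("lit", data[i]))
            step = 1
        # Record every consumed position in the hash table.
        for p in range(i, i + step):
            k = data[p:p + min_match]
            if len(k) == min_match:
                table.setdefault(k, []).append(p)
        i += step
    return tokens


def lz77_decompress(tokens):
    """Rebuild the original string from the tokens produced above."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            start = len(out) - dist
            # Copy one symbol at a time so overlapping references work.
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTACGTACGTTTGACGTACGA" * 3
    toks = lz77_compress(seq)
    print(len(seq), "symbols ->", len(toks), "tokens")
    assert lz77_decompress(toks) == seq
```

The sketch works on a plain string of bases; reading FASTA files, packing bases into a binary form, and encoding the tokens as bits are left out, so it says nothing about the compression rates reported in the abstract.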