Exploiting Redundancy to Achieve Lossy Text Compression
Regardless of the source language, text documents contain a significant amount of redundancy. Data compression exploits this redundancy to improve transmission efficiency and/or save storage space. Conventionally, various lossless text compression algorithms have been introduced for critical applicati...
Main Authors: | Ebru CELIKEL CANKAYA, Venka PALANIAPPAN, Shahram LATIFI |
---|---|
Format: | Article |
Language: | English |
Published: | Pamukkale University, 2010-03-01 |
Series: | Pamukkale University Journal of Engineering Sciences |
Subjects: | Lossy text compression, Letter mapping, Dropped vowels, Replacement of characters |
Online Access: | http://dergipark.ulakbim.gov.tr/pajes/article/view/5000088748 |
_version_ | 1828014997681209344 |
---|---|
author | Ebru CELIKEL CANKAYA Venka PALANIAPPAN Shahram LATIFI |
author_facet | Ebru CELIKEL CANKAYA Venka PALANIAPPAN Shahram LATIFI |
author_sort | Ebru CELIKEL CANKAYA |
collection | DOAJ |
description | Regardless of the source language, text documents contain a significant amount of redundancy. Data compression exploits this redundancy to improve transmission efficiency and/or save storage space. Conventionally, various lossless text compression algorithms have been introduced for critical applications, where any loss after recovery is intolerable. For non-critical applications, i.e., where some data loss is acceptable, one may employ lossy compression to achieve superior efficiency. We use three recent techniques to achieve character-oriented lossy text compression: Letter Mapping (LM), Dropped Vowels (DV), and Replacement of Characters (RC), applying them as a front end with the aim of improving the compression performance of conventional compression algorithms. We implement the scheme on English and Turkish sample texts and compare the results. Additionally, we report performance improvement rates for these models when used as a front end to the Huffman and Arithmetic Coding algorithms. As future work, we propose several ideas to further improve the performance of each model. |
first_indexed | 2024-04-10T10:11:23Z |
format | Article |
id | doaj.art-8d0cdea7ddf14f3493c2d3e1d6a61fa4 |
institution | Directory Open Access Journal |
issn | 1300-7009 2147-5881 |
language | English |
last_indexed | 2024-04-10T10:11:23Z |
publishDate | 2010-03-01 |
publisher | Pamukkale University |
record_format | Article |
series | Pamukkale University Journal of Engineering Sciences |
spelling | doaj.art-8d0cdea7ddf14f3493c2d3e1d6a61fa42023-02-15T16:22:09ZengPamukkale UniversityPamukkale University Journal of Engineering Sciences1300-70092147-58812010-03-011632352455000082817Exploiting Redundancy to Achieve Lossy Text CompressionEbru CELIKEL CANKAYAVenka PALANIAPPANShahram LATIFIRegardless of the source language, text documents contain significant amount of redundancy. Data compression exploits this redundancy to improve transmission efficiency and/or save storage space. Conventionally, various lossless text compression algorithms have been introduced for critical applications, where any loss after recovery is intolerable. For non-critical applications, i.e. where data loss to some extent is acceptable, one may employ lossy compression to acquire superior efficiency. We use three recent techniques to achieve character-oriented lossy text compression: Letter Mapping (LM), Dropped Vowels (DV), and Replacement of Characters (RC), and use them as a front end anticipating to improve compression performance of conventional compression algorithms. We implement the scheme on English and Turkish sample texts and compare the results. Additionally, we include performance improvement rates for these models when used as a front end to Huffman and Arithmetic Coding algorithms. As for the future work, we propose several ideas to further improve the current performance of each model.http://dergipark.ulakbim.gov.tr/pajes/article/view/5000088748Kayıplı metin sıkıştırma, Harf eşleme, Düşürülen sesliler, Karakterlerin değiştirilmesi. |
title | Exploiting Redundancy to Achieve Lossy Text Compression |
topic | Lossy text compression, Letter mapping, Dropped vowels, Replacement of characters (Kayıplı metin sıkıştırma, Harf eşleme, Düşürülen sesliler, Karakterlerin değiştirilmesi). |
url | http://dergipark.ulakbim.gov.tr/pajes/article/view/5000088748 |
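
The description names Dropped Vowels (DV) as one of the lossy front ends placed before Huffman coding. This record does not give the article's exact rules, so the following is only a minimal Python sketch under stated assumptions: the vowel set `ENGLISH_VOWELS`, the rule "drop every vowel", and the helper names `drop_vowels`, `huffman_code_lengths`, and `huffman_bits` are all hypothetical and chosen for illustration. It compares the Huffman-coded size of a text before and after the lossy step.

```python
import heapq
from collections import Counter

# Assumption: the article's actual vowel set and dropping rule are not
# stated in this record; this set covers plain English vowels only.
ENGLISH_VOWELS = set("aeiouAEIOU")


def drop_vowels(text: str) -> str:
    """Lossy front end (illustrative): remove every vowel from the text."""
    return "".join(ch for ch in text if ch not in ENGLISH_VOWELS)


def huffman_code_lengths(text: str) -> dict:
    """Return the Huffman code length in bits for each distinct character."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def huffman_bits(text: str) -> int:
    """Total payload size in bits when text is Huffman coded per character."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


if __name__ == "__main__":
    sample = "data compression exploits redundancy"
    reduced = drop_vowels(sample)
    print(reduced)
    print(huffman_bits(sample), "bits before,", huffman_bits(reduced), "bits after")
```

The bit counts exclude the cost of storing the code table, and the sketch makes no attempt to recover the dropped vowels, which a real decoder for such a scheme would need to address.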