Exploiting Redundancy to Achieve Lossy Text Compression

Bibliographic Details
Main Authors: Ebru CELIKEL CANKAYA, Venka PALANIAPPAN, Shahram LATIFI
Format: Article
Language: English
Published: Pamukkale University, 2010-03-01
Series: Pamukkale University Journal of Engineering Sciences
Subjects: Lossy text compression, Letter mapping, Dropped vowels, Replacement of characters
Online Access:http://dergipark.ulakbim.gov.tr/pajes/article/view/5000088748
ISSN: 1300-7009, 2147-5881
Collection: Directory of Open Access Journals (DOAJ)

Abstract
Regardless of the source language, text documents contain a significant amount of redundancy. Data compression exploits this redundancy to improve transmission efficiency and/or save storage space. Conventionally, various lossless text compression algorithms have been introduced for critical applications, where any loss after recovery is intolerable. For non-critical applications, i.e., where some data loss is acceptable, one may employ lossy compression to achieve superior efficiency. We use three recent techniques for character-oriented lossy text compression, namely Letter Mapping (LM), Dropped Vowels (DV), and Replacement of Characters (RC), as a front end intended to improve the compression performance of conventional compression algorithms. We implement the scheme on English and Turkish sample texts and compare the results. Additionally, we report the performance improvement rates of these models when used as a front end to the Huffman and Arithmetic Coding algorithms. As future work, we propose several ideas to further improve the performance of each model.
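
The record does not say exactly how the Dropped Vowels (DV) front end decides which vowels to remove, so the Python sketch below assumes one plausible rule: delete every vowel that directly follows a letter, keeping word-initial vowels. It then measures the effect on a static order-0 Huffman code. As a simplified stand-in for arithmetic coding, it reports the order-0 empirical entropy, which an arithmetic coder using the same static symbol model approaches. The names `drop_vowels`, `huffman_code_lengths`, `huffman_bits` and `entropy_bits` are hypothetical, and neither Letter Mapping (LM) nor Replacement of Characters (RC) is modeled.

```python
import heapq
import math
from collections import Counter

# Assumption: English vowels only. For Turkish text one would also add
# the vowels "ı", "ö", "ü", "İ", "Ö" and "Ü".
VOWELS = set("aeiouAEIOU")


def drop_vowels(text: str) -> str:
    """Hypothetical Dropped Vowels (DV) front end.

    Assumed rule: delete every vowel that directly follows a letter, so
    word-initial vowels (and one-letter words such as "a") are kept.
    """
    out = []
    prev_is_letter = False
    for ch in text:
        if ch in VOWELS and prev_is_letter:
            continue  # non-initial vowel: dropped; prev_is_letter stays True
        out.append(ch)
        prev_is_letter = ch.isalpha()
    return "".join(out)


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the codeword length of each symbol in a static Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a 1-bit codeword.
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, unique tiebreak, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_bits(text: str) -> int:
    """Payload size in bits under a Huffman code (code table not counted)."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


def entropy_bits(text: str) -> float:
    """Order-0 empirical entropy in bits: the size an arithmetic coder using
    the same static symbol model approaches (model cost not counted)."""
    n = len(text)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(text).values())


if __name__ == "__main__":
    sample = "text documents contain a significant amount of redundancy"
    reduced = drop_vowels(sample)
    print(reduced)  # txt dcmnts cntn a sgnfcnt amnt of rdndncy
    for label, s in (("original", sample), ("dropped vowels", reduced)):
        print(f"{label:15} chars={len(s):3} "
              f"huffman_bits={huffman_bits(s):4} entropy_bits={entropy_bits(s):6.1f}")
```

Because the dropped vowels are never stored, the decoder returns only the reduced text and a reader must restore the missing vowels from context; that irreversibility is what makes DV lossy. Printing both the Huffman size and the entropy bound for the original and reduced strings lets one compare how much the front end helps each back-end coder under these simplified assumptions.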