Sublinear Algorithms for Approximating String Compressibility

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear alg...

Full description

Bibliographic Details
Main Authors: Raskhodnikova, Sofya, Ron, Dana, Rubinfeld, Ronitt, Smith, Adam
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language:en_US
Published: Springer-Verlag 2012
Online Access:http://hdl.handle.net/1721.1/73520
https://orcid.org/0000-0002-4353-7639
_version_ 1826213022202331136
author Raskhodnikova, Sofya
Ron, Dana
Rubinfeld, Ronitt
Smith, Adam
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Raskhodnikova, Sofya
Ron, Dana
Rubinfeld, Ronitt
Smith, Adam
author_sort Raskhodnikova, Sofya
collection MIT
description We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity , for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution.
first_indexed 2024-09-23T15:42:05Z
format Article
id mit-1721.1/73520
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T15:42:05Z
publishDate 2012
publisher Springer-Verlag
record_format dspace
spelling mit-1721.1/735202022-09-29T15:34:36Z Sublinear Algorithms for Approximating String Compressibility Raskhodnikova, Sofya Ron, Dana Rubinfeld, Ronitt Smith, Adam Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Rubinfeld, Ronitt We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it (its ℓth subword complexity , for small ℓ). In addition, we show that approximating the compressibility with respect to LZ77 is related to approximating the support size of a distribution. National Science Foundation (U.S.) (Award CCF-1065125) National Science Foundation (U.S.) (Award CCF-0728645) Marie Curie International Reintegration Grant PIRG03-GA-2008-231077 Israel Science Foundation (Grant 1147/09) Israel Science Foundation (Grant 1675/09) 2012-10-01T17:05:04Z 2012-10-01T17:05:04Z 2012-02 2011-03 Article http://purl.org/eprint/type/JournalArticle 0178-4617 1432-0541 http://hdl.handle.net/1721.1/73520 Raskhodnikova, Sofya et al. “Sublinear Algorithms for Approximating String Compressibility.” Algorithmica (2012). https://orcid.org/0000-0002-4353-7639 en_US http://dx.doi.org/10.1007/s00453-012-9618-6 Algorithmica Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf Springer-Verlag Other University Web Domain
spellingShingle Raskhodnikova, Sofya
Ron, Dana
Rubinfeld, Ronitt
Smith, Adam
Sublinear Algorithms for Approximating String Compressibility
title Sublinear Algorithms for Approximating String Compressibility
title_full Sublinear Algorithms for Approximating String Compressibility
title_fullStr Sublinear Algorithms for Approximating String Compressibility
title_full_unstemmed Sublinear Algorithms for Approximating String Compressibility
title_short Sublinear Algorithms for Approximating String Compressibility
title_sort sublinear algorithms for approximating string compressibility
url http://hdl.handle.net/1721.1/73520
https://orcid.org/0000-0002-4353-7639
work_keys_str_mv AT raskhodnikovasofya sublinearalgorithmsforapproximatingstringcompressibility
AT rondana sublinearalgorithmsforapproximatingstringcompressibility
AT rubinfeldronitt sublinearalgorithmsforapproximatingstringcompressibility
AT smithadam sublinearalgorithmsforapproximatingstringcompressibility