A formal treatment of lossless data compression algorithms
<p>Since its inception, data compression has been practised mostly as an experimental science. Although this thesis continues that trend to some extent, its main emphasis is on formal derivations of compression algorithms and proofs of their correctness. Such a mathematical approach has not been taken before, and we have found that it has yielded significant dividends in the form of faster compression.</p>
<p>Modern compression schemes can be viewed as a combination of a statistical model and an entropy coder. The method developed in this thesis consists of a PPM (Prediction by Partial Matching) model with arithmetic coding. Other methods, including dictionary-based algorithms and the Burrows-Wheeler Transform, are discussed briefly.</p>
<p>Arithmetic coding can be seen as a generalisation of Huffman coding, with fewer restrictions and better compression. However, the algorithm is quite difficult to analyse and understand, and it is very easy to make a mistake that would render the program incorrect. Our formal approach simplifies the explanation, and gives us confidence in the final software.</p>
<p>PPM represents the state of the art in statistical modelling, and many of its variants outperform all other known methods. Its major drawback is that it is very slow and requires temporary storage linear in the size of the input. A number of its design decisions, while intuitively sensible, are not backed up by any theory. We aimed to justify our version of PPM using Bayesian statistics. Although this approach did not entirely succeed, there was significant progress towards the target.</p>
<p>Our derivations are carried out using notation drawn from the functional programming language Haskell. Haskell provides a number of advantages over the more traditional imperative languages, although all programs are given in C in an appendix.</p>
Main Author: | Stratford, B |
Other Authors: | Bird, R |
Format: | Thesis |
Published: | 2005 |
author | Stratford, B |
author2 | Bird, R |
collection | OXFORD |
description | <p>Since its inception, data compression has been practised mostly as an experimental science. Although this thesis continues that trend to some extent, its main emphasis is on formal derivations of compression algorithms and proofs of their correctness. Such a mathematical approach has not been taken before, and we have found that it has yielded significant dividends in the form of faster compression.</p> <p>Modern compression schemes can be viewed as a combination of a statistical model and an entropy coder. The method developed in this thesis consists of a PPM (Prediction by Partial Matching) model with arithmetic coding. Other methods, including dictionary-based algorithms and the Burrows-Wheeler Transform, are discussed briefly.</p> <p>Arithmetic coding can be seen as a generalisation of Huffman coding, with fewer restrictions and better compression. However, the algorithm is quite difficult to analyse and understand, and it’s very easy to make a mistake that would render the program incorrect. Our formal approach simplifies the explanation, and gives us confidence in the final software.</p> <p>Prediction by Partial Matching (PPM) represents the state of the art in statistical modelling. Many of its variants outperform all other known methods. Its major drawback is that it is very slow, and requires temporary storage space linear in the size of the input. A number of the design decisions, while intuitively sensible, are not backed up by any theory. We aimed to justify our version of PPM by using Bayesian statistics. Although this approach did not entirely succeed, there was some significant progress towards the target.</p> <p>Our derivations are carried out using notation drawn from the functional programming language Haskell. Haskell provides a number of advantages over the more traditional imperative languages, although all programs are given in C in an appendix.</p> |
format | Thesis |
id | oxford-uuid:84bbc5f3-1581-458f-af7c-0cbd4191caa8 |
institution | University of Oxford |
publishDate | 2005 |
title | A formal treatment of lossless data compression algorithms |