A formal treatment of lossless data compression algorithms


Bibliographic Details
Main Author: Stratford, B
Other Authors: Bird, R
Format: Thesis
Published: 2005
description

Since its inception, data compression has been practised mostly as an experimental science. Although this thesis continues that trend to some extent, its main emphasis is on formal derivations of compression algorithms and proofs of their correctness. Such a mathematical approach has not been taken before, and we have found that it has yielded significant dividends in the form of faster compression.

Modern compression schemes can be viewed as a combination of a statistical model and an entropy coder. The method developed in this thesis consists of a PPM (Prediction by Partial Matching) model with arithmetic coding. Other methods, including dictionary-based algorithms and the Burrows-Wheeler Transform, are discussed briefly.

Arithmetic coding can be seen as a generalisation of Huffman coding, with fewer restrictions and better compression. However, the algorithm is quite difficult to analyse and understand, and it is very easy to make a mistake that would render the program incorrect. Our formal approach simplifies the explanation and gives us confidence in the final software.

Prediction by Partial Matching (PPM) represents the state of the art in statistical modelling. Many of its variants outperform all other known methods. Its major drawback is that it is very slow, and it requires temporary storage space linear in the size of the input. A number of the design decisions, while intuitively sensible, are not backed up by any theory. We aimed to justify our version of PPM using Bayesian statistics. Although this approach did not entirely succeed, there was significant progress towards the target.

Our derivations are carried out using notation drawn from the functional programming language Haskell. Haskell provides a number of advantages over the more traditional imperative languages, although all programs are given in C in an appendix.
id oxford-uuid:84bbc5f3-1581-458f-af7c-0cbd4191caa8
institution University of Oxford