A formal treatment of lossless data compression algorithms


Bibliographic Details
Main Author: Stratford, B
Other Authors: Bird, R
Format: Thesis
Published: 2005
description

Since its inception, data compression has been practised mostly as an experimental science. Although this thesis continues that trend to some extent, its main emphasis is on formal derivations of compression algorithms and proofs of their correctness. Such a mathematical approach has not been taken before, and we have found that it has yielded significant dividends in the form of faster compression.

Modern compression schemes can be viewed as a combination of a statistical model and an entropy coder. The method developed in this thesis consists of a PPM (Prediction by Partial Matching) model with arithmetic coding. Other methods, including dictionary-based algorithms and the Burrows-Wheeler Transform, are discussed briefly.

Arithmetic coding can be seen as a generalisation of Huffman coding, with fewer restrictions and better compression. However, the algorithm is quite difficult to analyse and understand, and it is very easy to make a mistake that would render the program incorrect. Our formal approach simplifies the explanation and gives us confidence in the final software.

Prediction by Partial Matching (PPM) represents the state of the art in statistical modelling. Many of its variants outperform all other known methods. Its major drawback is that it is very slow, and it requires temporary storage space linear in the size of the input. A number of the design decisions, while intuitively sensible, are not backed up by any theory. We aimed to justify our version of PPM using Bayesian statistics. Although this approach did not entirely succeed, there was significant progress towards the target.

Our derivations are carried out using notation drawn from the functional programming language Haskell. Haskell provides a number of advantages over the more traditional imperative languages, although all programs are given in C in an appendix.
id oxford-uuid:84bbc5f3-1581-458f-af7c-0cbd4191caa8
institution University of Oxford