Grammar-Based Specification and Parsing of Binary File Formats

The capability to validate and view or play binary file formats, as well as to convert binary file formats to standard or current file formats, is critically important to the preservation of digital data and records. This paper describes the extension of context-free grammars from strings to binary...

Bibliographic Details
Main Author: William Underwood
Format: Article
Language: English
Published: University of Edinburgh 2012-03-01
Series: International Journal of Digital Curation
Online Access: https://ijdc.net/index.php/ijdc/article/view/217
author William Underwood
collection DOAJ
description The capability to validate and view or play binary file formats, as well as to convert binary file formats to standard or current file formats, is critically important to the preservation of digital data and records. This paper describes the extension of context-free grammars from strings to binary files. Binary files are arrays of data types, such as long and short integers, floating-point numbers and pointers, as well as characters. The concept of an attribute grammar is extended to these context-free array grammars. This attribute grammar has been used to define a number of chunk-based and directory-based binary file formats. A parser generator has been used with some of these grammars to generate syntax checkers (recognizers) for validating binary file formats. Among the potential benefits of an attribute grammar-based approach to the specification and parsing of binary file formats is that attribute grammars support not only format validation, but also generation of error messages during format validation, validation of semantic constraints, attribute value extraction (characterization), generation of viewers or players for file formats, and conversion to current or standard file formats. The significance of these results is that, with these extensions to core computer science concepts, traditional parser/compiler technologies can potentially be used as part of a general, cost-effective curation strategy for binary file formats.
first_indexed 2024-03-08T05:34:33Z
format Article
id doaj.art-03f6ec0e9f8443dcb68341b2f9bd1bb2
institution Directory Open Access Journal
issn 1746-8256
language English
last_indexed 2024-03-08T05:34:33Z
publishDate 2012-03-01
publisher University of Edinburgh
record_format Article
series International Journal of Digital Curation
title Grammar-Based Specification and Parsing of Binary File Formats
url https://ijdc.net/index.php/ijdc/article/view/217