Towards the Development of a Test Corpus of Digital Objects for the Evaluation of File Format Identification Tools and Signatures

The digital preservation community currently utilises a number of tools and automated processes to identify and validate digital objects. The identification of digital objects is a vital first step in their long-term preservation, but the results returned by tools used for this purpose are lacking i...

Full description

Bibliographic Details
Main Authors: Andrew Fetherston, Tim Gollins
Format: Article
Language:English
Published: University of Edinburgh 2012-03-01
Series:International Journal of Digital Curation
Online Access:https://129.215.67.1/ijdc/article/view/211
Description
Summary:The digital preservation community currently utilises a number of tools and automated processes to identify and validate digital objects. The identification of digital objects is a vital first step in their long-term preservation, but the results returned by tools used for this purpose are lacking in transparency, and are not easily tested or verified. This paper suggests that a test corpus of digital objects is one way of providing this verification and validation, ultimately improving trust in the tools, and providing further stimulus to their development. Issues to be considered are outlined, and attention is drawn to particular examples of existing digital corpora which could conceivably provide a useable framework or starting point for our own communities needs. This paper does not seek to answer all questions in this area, but merely attempts to set out areas for consideration in any next step that is taken.
ISSN:1746-8256