Towards the Development of a Test Corpus of Digital Objects for the Evaluation of File Format Identification Tools and Signatures

Bibliographic Details
Main Authors: Andrew Fetherston, Tim Gollins
Format: Article
Language: English
Published: University of Edinburgh, 2012-03-01
Series: International Journal of Digital Curation
Online Access: https://129.215.67.1/ijdc/article/view/211

Description

The digital preservation community currently utilises a number of tools and automated processes to identify and validate digital objects. The identification of digital objects is a vital first step in their long-term preservation, but the results returned by tools used for this purpose lack transparency and are not easily tested or verified. This paper suggests that a test corpus of digital objects is one way of providing this verification and validation, ultimately improving trust in the tools and providing further stimulus to their development. Issues to be considered are outlined, and attention is drawn to particular examples of existing digital corpora which could conceivably provide a usable framework or starting point for our own community's needs. This paper does not seek to answer all questions in this area, but merely attempts to set out areas for consideration in any next step that is taken.

ISSN: 1746-8256