Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures



Bibliographic Details
Main Author: Ross Spencer
Format: Article
Language: English
Published: University of Edinburgh 2013-06-01
Series: International Journal of Digital Curation
Online Access: http://129.215.67.233/ijdc/article/view/249
collection DOAJ
description To preserve digital information, it is vital that the format of that information can be identified in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to help further develop tools that serve this purpose. Following that call, an attempt has been made to develop the suite. This paper initially outlines a methodology to generate a skeleton corpus using simple user-generated digital objects. It then explores the lessons learnt in generating a corpus, using scripting language techniques, from the file format signatures described in The National Archives' PRONOM technical registry. It also discusses the use of the digital signature for this purpose and the benefits of developing a test corpus using this technique. Finally, the paper outlines a methodology for future research before exploring how the community can best make use of the output of this project and how the project needs to be taken forward to completion.
first_indexed 2024-03-09T10:41:51Z
id doaj.art-af6cf7ed76034bef9e0c7e2f5a096126
institution Directory Open Access Journal
issn 1746-8256
last_indexed 2024-03-09T10:41:51Z
spelling Ross Spencer; Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures; International Journal of Digital Curation, volume 8, issue 1; University of Edinburgh; 2013-06-01; ISSN 1746-8256; http://129.215.67.233/ijdc/article/view/249