Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures
To preserve digital information it is vital that the format of that information can be identified in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to dev...
Main Author: | Ross Spencer |
---|---|
Format: | Article |
Language: | English |
Published: | University of Edinburgh, 2013-06-01 |
Series: | International Journal of Digital Curation |
Online Access: | http://129.215.67.233/ijdc/article/view/249 |
_version_ | 1797434995018563584 |
---|---|
author | Ross Spencer |
collection | DOAJ |
description |
To preserve digital information it is vital that the format of that information can be identified in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to support the further development of format identification tools. Following that call, an attempt has been made to develop such a suite.
This paper first outlines a methodology for generating a skeleton corpus from simple user-generated digital objects. It then explores the lessons learnt in generating a corpus, using scripting language techniques, from the file format signatures described in The National Archives' PRONOM technical registry. It also discusses the use of the digital signature for this purpose and the benefits of developing a test corpus using this technique. Finally, the paper outlines a methodology for future research before exploring how the community can best make use of the output of this project and how the project needs to be taken forward to completion.
|
first_indexed | 2024-03-09T10:41:51Z |
format | Article |
id | doaj.art-af6cf7ed76034bef9e0c7e2f5a096126 |
institution | Directory Open Access Journal |
issn | 1746-8256 |
language | English |
last_indexed | 2024-03-09T10:41:51Z |
publishDate | 2013-06-01 |
publisher | University of Edinburgh |
record_format | Article |
series | International Journal of Digital Curation |
title | Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures |
url | http://129.215.67.233/ijdc/article/view/249 |
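
The description in this record mentions generating skeleton files from file format signatures. As a rough illustration of that general idea, and not the paper's own implementation, the Python sketch below builds a minimal byte string that contains fixed byte sequences at given offsets from the beginning or end of a file. The `ByteSequence` class and `build_skeleton` function are hypothetical names introduced here; the model is deliberately simplified (fixed bytes only, no wildcards or variable offsets) and does not reproduce PRONOM's actual signature syntax. The example bytes are the well-known PNG header and trailer, used only as sample data.

```python
from dataclasses import dataclass


@dataclass
class ByteSequence:
    """Hypothetical, simplified signature part: fixed bytes at a fixed offset."""
    hex_bytes: str        # fixed byte values as hex, e.g. "89504E470D0A1A0A"
    offset: int = 0       # distance in bytes from the anchor
    anchor: str = "BOF"   # "BOF" (beginning of file) or "EOF" (end of file)


def build_skeleton(sequences, filler=b"\x00"):
    """Return the smallest byte string satisfying every sequence.

    Gaps between the anchor and a sequence are padded with `filler`.
    Assumption: sequences are fixed bytes and do not overlap each other.
    """
    bof = bytearray()  # bytes laid out from the start of the file
    eof = bytearray()  # bytes laid out back from the end of the file
    for seq in sequences:
        data = bytes.fromhex(seq.hex_bytes)
        end = seq.offset + len(data)
        if seq.anchor == "BOF":
            if len(bof) < end:
                bof.extend(filler * (end - len(bof)))
            bof[seq.offset:end] = data
        elif seq.anchor == "EOF":
            if len(eof) < end:
                # Grow the tail buffer at its front so EOF offsets stay valid.
                eof[:0] = filler * (end - len(eof))
            start = len(eof) - end
            eof[start:start + len(data)] = data
        else:
            raise ValueError(f"unknown anchor: {seq.anchor!r}")
    return bytes(bof + eof)


# Sample data: the PNG magic number at BOF and the IEND chunk tail at EOF.
png_like = build_skeleton([
    ByteSequence("89504E470D0A1A0A", offset=0, anchor="BOF"),
    ByteSequence("49454E44AE426082", offset=0, anchor="EOF"),
])
```

A skeleton file produced this way is not a valid PNG; it only carries the byte patterns an identification tool would look for. The result could be written to disk with `pathlib.Path(...).write_bytes(png_like)` and then passed to an identification tool to check whether the signature matches as expected.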