Kicking and Screaming: Challenges and advantages of bringing TCP texts into line with the Text Encoding Initiative



Bibliographic Details
Main Authors: Rahtz, S; Cummings, J
Format: Conference item
Published: Bodleian Libraries, University of Oxford, 2012
collection OXFORD
description <p>This paper addresses some of the practical problems of working with the underlying digital files prepared by the Text Creation Partnership (TCP) using tools developed for standard TEI.</p><p>The 40,000 files of the TCP EEBO collection have been built up over the last decade using the markup technology of the 1990s, namely SGML and variations on the third edition of the Text Encoding Initiative Guidelines, to a gradually increasing standard of consistency. Delivering the texts through the conventional web site works well, but problems arise if we want to take advantage of the tools now commonly used to process digital files, particularly those based on the current TEI recommendations. This involves transforming the SGML markup to XML, and then to the latest edition of the TEI (P5). We will investigate some of the problems involved in this type of conversion, such as:</p><ul><li>changes needed to the TEI Guidelines themselves to cover textual phenomena identified in EEBO which cannot adequately be described by current TEI recommendations</li><li>decisions needed to map some of the variants adopted by TCP back onto canonical TEI P5 markup</li><li>testing whether the conversion has lost any content along the way</li></ul><p>We will present some of the software we have developed for the TCP conversion, and show how it can be delivered in a production environment. The exercise of transformation gives an interesting opportunity to examine some of the encoding of TCP texts, analyze the range of textual phenomena which are recorded, and predict which structures will be amenable to discovery by future scholars.</p><p>The 40,000-text corpus of TCP also provides a good test of general TEI tools. For this paper we describe some tools and the results we found when using them on TCP texts. As a case study we examine the generation of ebook editions (ePub format) of the TCP texts from the converted TEI. The results of such conversions will be assessed for their usefulness to contemporary readers and for any failures in representing the intellectual content of the original text.</p>
first_indexed 2024-03-07T06:41:23Z
id oxford-uuid:f9667884-220b-4ec9-bb2f-c79044302399
institution University of Oxford
last_indexed 2024-03-07T06:41:23Z
record_format dspace