Preprocessing Greek Papyri for Linguistic Annotation
Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables transforming the...
Main Authors: | Marja Vierros, Erik Henriksson |
---|---|
Format: | Article |
Language: | English |
Published: | Nicolas Turenne, 2017-06-01 |
Series: | Journal of Data Mining and Digital Humanities |
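Subjects: | javascript; python; mysql; tei epidoc xml; greek; papyri; linguistic annotation; treebank; dependency grammar |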
Online Access: | https://jdmdh.episciences.org/1385/pdf |
_version_ | 1797268128429768704 |
---|---|
author | Marja Vierros; Erik Henriksson |
author_facet | Marja Vierros; Erik Henriksson |
author_sort | Marja Vierros |
collection | DOAJ |
description | Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables transforming the digital texts available in TEI EpiDoc XML format to a format which can be morphologically and syntactically annotated (treebanked), and where the user can add new metadata concerning the text type, writer and handwriting of each act of writing. An important aspect in this process is to take into account the original surviving writing vs. the standardization of language and supplements made by the editors. This is performed by creating two different layers of the same text. The platform is in its early development phase. Ongoing and future developments, such as tagging linguistic variation phenomena as well as queries performed within Sematia, are discussed at the end of the article. |
first_indexed | 2024-03-11T21:05:17Z |
format | Article |
id | doaj.art-043b6b3b2e31469c8df02c2986b8173f |
institution | Directory of Open Access Journals |
issn | 2416-5999 |
language | English |
last_indexed | 2024-04-25T01:27:33Z |
publishDate | 2017-06-01 |
publisher | Nicolas Turenne |
record_format | Article |
series | Journal of Data Mining and Digital Humanities |
spelling | doaj.art-043b6b3b2e31469c8df02c2986b8173f; 2024-03-08T15:27:53Z; eng; Nicolas Turenne; Journal of Data Mining and Digital Humanities; 2416-5999; 2017-06-01; 2016; Towards a Digital Ecosystem:...; 10.46298/jdmdh.1385; 1385; Preprocessing Greek Papyri for Linguistic Annotation; Marja Vierros; 0; https://orcid.org/0000-0001-8531-7055; Erik Henriksson; 1; Department of World Cultures; Department of World Cultures; Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables transforming the digital texts available in TEI EpiDoc XML format to a format which can be morphologically and syntactically annotated (treebanked), and where the user can add new metadata concerning the text type, writer and handwriting of each act of writing. An important aspect in this process is to take into account the original surviving writing vs. the standardization of language and supplements made by the editors. This is performed by creating two different layers of the same text. The platform is in its early development phase. Ongoing and future developments, such as tagging linguistic variation phenomena as well as queries performed within Sematia, are discussed at the end of the article.; https://jdmdh.episciences.org/1385/pdf; javascript; python; mysql; tei epidoc xml; greek; papyri; linguistic annotation; treebank; dependency grammar; [shs.class] humanities and social sciences/classical studies; [shs.langue] humanities and social sciences/linguistics; [shs.stat] humanities and social sciences/methods and statistics |
title | Preprocessing Greek Papyri for Linguistic Annotation |
topic | javascript; python; mysql; tei epidoc xml; greek; papyri; linguistic annotation; treebank; dependency grammar; [shs.class] humanities and social sciences/classical studies; [shs.langue] humanities and social sciences/linguistics; [shs.stat] humanities and social sciences/methods and statistics |
url | https://jdmdh.episciences.org/1385/pdf |