Preprocessing Greek Papyri for Linguistic Annotation

Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables transforming the...

Bibliographic Details
Main Authors: Marja Vierros, Erik Henriksson
Format: Article
Language: English
Published: Nicolas Turenne 2017-06-01
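Series: Journal of Data Mining and Digital Humanities
Subjects: javascript; python; mysql; tei epidoc xml; greek; papyri; linguistic annotation; treebank; dependency grammar; [shs.class] humanities and social sciences/classical studies; [shs.langue] humanities and social sciences/linguistics; [shs.stat] humanities and social sciences/methods and statistics
Online Access: https://jdmdh.episciences.org/1385/pdf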
author Marja Vierros
Erik Henriksson
collection DOAJ
description Greek documentary papyri form an important direct source for Ancient Greek. It has been exploited surprisingly little in Greek linguistics due to a lack of good tools for searching linguistic structures. This article presents a new tool and digital platform, “Sematia”, which enables transforming the digital texts available in TEI EpiDoc XML format to a format which can be morphologically and syntactically annotated (treebanked), and where the user can add new metadata concerning the text type, writer and handwriting of each act of writing. An important aspect in this process is to take into account the original surviving writing vs. the standardization of language and supplements made by the editors. This is performed by creating two different layers of the same text. The platform is in its early development phase. Ongoing and future developments, such as tagging linguistic variation phenomena as well as queries performed within Sematia, are discussed at the end of the article.
first_indexed 2024-03-11T21:05:17Z
format Article
id doaj.art-043b6b3b2e31469c8df02c2986b8173f
institution Directory Open Access Journal
issn 2416-5999
language English
last_indexed 2024-04-25T01:27:33Z
publishDate 2017-06-01
publisher Nicolas Turenne
series Journal of Data Mining and Digital Humanities
spelling doaj.art-043b6b3b2e31469c8df02c2986b8173f | 2024-03-08T15:27:53Z | eng | Nicolas Turenne | Journal of Data Mining and Digital Humanities | 2416-5999 | 2017-06-01 | 2016 | Towards a Digital Ecosystem:... | 10.46298/jdmdh.1385 | 1385 | Preprocessing Greek Papyri for Linguistic Annotation | Marja Vierros (https://orcid.org/0000-0001-8531-7055), Department of World Cultures | Erik Henriksson, Department of World Cultures
title Preprocessing Greek Papyri for Linguistic Annotation
topic javascript
python
mysql
tei epidoc xml
greek
papyri
linguistic annotation
treebank
dependency grammar
[shs.class] humanities and social sciences/classical studies
[shs.langue] humanities and social sciences/linguistics
[shs.stat] humanities and social sciences/methods and statistics
url https://jdmdh.episciences.org/1385/pdf
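
The description above says that the original surviving writing and the editors' standardization are kept apart "by creating two different layers of the same text". The following is a minimal Python sketch of that idea, not Sematia's actual code. It assumes that the input uses the TEI elements <choice>, <orig>, <reg> and <supplied>, that the original layer leaves out <reg> and <supplied> content, and that the standardized layer leaves out <orig> content. The SKIP table, the function names and the one-line SAMPLE are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Elements left out of each layer (an assumption made for this sketch).
SKIP = {
    "original": {"reg", "supplied"},  # drop editorial regularizations and supplements
    "standard": {"orig"},             # drop the scribe's non-standard form
}

# Invented sample line, not taken from any edition.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    'χαίρειν <choice><reg>ὑμῖν</reg><orig>ἡμῖν</orig></choice>'
    ' <supplied reason="lost">καὶ</supplied> ἔρρωσο'
    '</ab>'
)


def local_name(elem):
    """Return the tag name without its XML namespace part."""
    return elem.tag.rsplit("}", 1)[-1]


def layer_text(elem, layer):
    """Collect the text inside elem, skipping children excluded from this layer."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child) not in SKIP[layer]:
            parts.append(layer_text(child, layer))
        # Text following a child belongs to the parent, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def split_layers(xml_string):
    """Return a dict mapping each layer name to its whitespace-normalized text."""
    root = ET.fromstring(xml_string)
    return {layer: " ".join(layer_text(root, layer).split()) for layer in SKIP}


layers = split_layers(SAMPLE)
for name, text in layers.items():
    print(f"{name}: {text}")
```

In this sketch the original layer simply omits supplemented text. A real pipeline would more likely mark such gaps explicitly so that token positions in the two layers can still be aligned.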