A Named Entity-Annotated Corpus of 19th Century Classical Commentaries

We release a multilingual named entity (NE) corpus of 19th century commentaries to Sophocles’ Ajax. Selected commentaries are written in English, German and French, but are also replete with Latin and Greek quotes. Bibliographic entities were annotated along traditional named entities following our guidelines (Romanello & Najem-Meyer, 2022).

Bibliographic Details
Main Authors: Matteo Romanello, Sven Najem-Meyer
Format: Article
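Language: English
Published: Ubiquity Press 2024-01-01
Series: Journal of Open Humanities Data
Subjects: historical commentaries; classics; named entity recognition; entity linking; bibliographic reference extraction
Online Access: https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/150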
author Matteo Romanello
Sven Najem-Meyer
collection DOAJ
description We release a multilingual named entity (NE) corpus of 19th century commentaries to Sophocles’ Ajax. Selected commentaries are written in English, German and French, but are also replete with Latin and Greek quotes. Bibliographic entities were annotated along traditional named entities following our guidelines (Romanello & Najem-Meyer, 2022). The corpus contains about 300 annotated pages, 111,216 tokens and 7,334 entity mentions and was featured in the HIPE-2022 shared task. Although named entity recognition (NER) showed reassuring results, optical character recognition (OCR) mistakes and extensive use of abbreviation kept entity linking (EL) a challenging task. With such characteristics, this corpus offers an excellent way to assess the adaptability of information extraction systems to noisy, domain-specific multilingual and multiscript environments.
first_indexed 2024-03-08T03:09:00Z
format Article
id doaj.art-87a29015dad14af2ba0da81f7fec182c
institution Directory Open Access Journal
issn 2059-481X
language English
last_indexed 2024-03-08T03:09:00Z
publishDate 2024-01-01
publisher Ubiquity Press
record_format Article
series Journal of Open Humanities Data
spelling doaj.art-87a29015dad14af2ba0da81f7fec182c 2024-02-13T07:38:06Z
Language: eng
Publisher: Ubiquity Press
Journal: Journal of Open Humanities Data (ISSN 2059-481X)
Published: 2024-01-01, volume 10, issue 1
DOI: 10.5334/johd.150
Authors: Matteo Romanello (https://orcid.org/0000-0002-7406-6286), Institute of Archeology and Classical Studies, University of Lausanne, Lausanne; Sven Najem-Meyer (https://orcid.org/0000-0002-3661-4579), Digital Humanities Laboratory, Swiss Federal Institute of Technology Lausanne, Lausanne
title A Named Entity-Annotated Corpus of 19th Century Classical Commentaries
topic historical commentaries
classics
named entity recognition
entity linking
bibliographic reference extraction
url https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/150