Automatic Normalization of Temporal Expressions

Bibliographic Details
Main Authors: Ceri Binding, Douglas Tudhope
Affiliation: Hypermedia Research Group, University of South Wales
Format: Article
Language: English
Published: Ubiquity Press, 2023-03-01
Series: Journal of Computer Applications in Archaeology
ISSN: 2514-8362
Subjects: temporal expressions; dating; time periods; semantic integration; software; multilingual
Online Access: https://journal.caa-international.org/articles/105

Abstract

Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns, each with myriad variations, which makes direct automated comparison difficult. The problem can occur even within records from the same dataset, and it is further compounded when integrating multilingual data, particularly where dates are expressed in words rather than numbers. The same problem arises in temporal metadata, whether entered manually or generated from reports and grey literature via Natural Language Processing (NLP) techniques. Resolving and normalizing dates and periods to internationally agreed standard formats enables efficient data integration, interchange, search, comparison and visualization.

This paper reports on the design and implementation of a tool that normalizes temporal expressions to a numerical time axis, and reflects on key issues. Textual patterns for seven categories of temporal expression have been normalized: ordinal named or numbered centuries; year spans; single year (with tolerance); decades; century spans; single year with prefix; named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish and Welsh. The methods are described together with an open-source normalization tool developed in Python, and four applications of the method are discussed, along with limitations and future work. Results are presented for diverse datasets and languages.

The input is a temporal text string and a language code (ISO 639-1). The output is a tab-delimited text file with start and end years (in ISO 8601 format) relative to the Common Era (CE). The normalized outputs are provided as additional attributes alongside the original text expression, for consuming software to use in end-user applications.
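The normalization step summarized in the abstract (a text string plus a language code in, start and end years out) can be sketched roughly as follows. This is not the authors' open-source tool and does not reproduce its interface. It is a minimal, English-only sketch in which the regular expressions, the `ORDINALS` and `PERIODS` tables, `CIRCA_TOLERANCE`, the century convention and the function names `normalize` and `to_tsv` are all hypothetical. It gives each of the seven categories a deliberately simple pattern, reads "single year with prefix" as "c."/"circa" (an assumption), treats named periods as a plain dictionary lookup, and does not handle BCE dates or the other supported languages.

```python
import csv
import io
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]

# Hypothetical English-only ordinal table; a real tool needs one per language.
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20,
}
# Illustrative named period only (reign of Queen Victoria); real use would
# draw on a curated period authority rather than a hard-coded dict.
PERIODS = {"victorian": (1837, 1901)}
CIRCA_TOLERANCE = 25  # assumed +/- years applied to "c." / "circa" prefixes


def century(n: int) -> Span:
    """Assumed convention: the Nth century CE runs from (N-1)*100+1 to N*100."""
    return (n - 1) * 100 + 1, n * 100


def ordinal(token: str) -> Optional[int]:
    """Parse '5th' or 'fifth' as 5; return None if unrecognised."""
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)", token)
    return int(m.group(1)) if m else ORDINALS.get(token)


def normalize(text: str, lang: str = "en") -> Optional[Span]:
    """Map one temporal expression (CE dates only) to (start, end) years."""
    if lang != "en":
        raise NotImplementedError("this sketch only handles English ('en')")
    t = " ".join(text.lower().split())  # case-fold and collapse whitespace

    # Century span: "12th-14th century", "twelfth to fourteenth centuries"
    m = re.fullmatch(r"([a-z0-9]+)\s*(?:[-\u2013]|to)\s*([a-z0-9]+) centur(?:y|ies)", t)
    if m:
        a, b = ordinal(m.group(1)), ordinal(m.group(2))
        if a is not None and b is not None:
            return century(a)[0], century(b)[1]

    # Ordinal numbered or named century: "5th century", "fifth century"
    m = re.fullmatch(r"([a-z0-9]+) century", t)
    n = ordinal(m.group(1)) if m else None
    if n is not None:
        return century(n)

    # Decade: "1850s" -> 1850..1859
    m = re.fullmatch(r"(\d{3})0s", t)
    if m:
        start = int(m.group(1)) * 10
        return start, start + 9

    # Year span: "1850-1875", "1850 to 1875" (hyphen or en dash)
    m = re.fullmatch(r"(\d{3,4})\s*(?:[-\u2013]|to)\s*(\d{3,4})", t)
    if m:
        return int(m.group(1)), int(m.group(2))

    # Single year with explicit tolerance: "1850 +/- 10", "1850 \u00b1 10"
    m = re.fullmatch(r"(\d{3,4})\s*(?:\+/-|\u00b1)\s*(\d{1,3})", t)
    if m:
        year, tol = int(m.group(1)), int(m.group(2))
        return year - tol, year + tol

    # Single year with prefix: "c. 1850", "ca. 1850", "circa 1850"
    m = re.fullmatch(r"(?:c\.|ca\.|circa)\s*(\d{3,4})", t)
    if m:
        year = int(m.group(1))
        return year - CIRCA_TOLERANCE, year + CIRCA_TOLERANCE

    # Named period: plain lookup; returns None when nothing matched
    return PERIODS.get(t)


def to_tsv(texts: List[str], lang: str = "en") -> str:
    """Emit the original text plus four-digit ISO 8601 start/end years as TSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["input", "start", "end"])
    for text in texts:
        span = normalize(text, lang)
        start, end = (f"{span[0]:04d}", f"{span[1]:04d}") if span else ("", "")
        writer.writerow([text, start, end])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(["5th century", "1850s", "c. 1850", "Bronze Age"]), end="")
```

Run as a script, it prints a header row followed by `0401`/`0500`, `1850`/`1859` and `1825`/`1875`, with empty start and end columns for "Bronze Age", which the sketch does not recognize. Keeping the original text as the first column mirrors the abstract's point that normalized values are supplied as additional attributes alongside the source expression rather than replacing it.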