Automatic Normalization of Temporal Expressions
Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns for which myriad variations exist, rendering direct automated comparison difficult. The issue can occur even within records from the same dataset and is further compounded when attempti...
Main Authors: | Ceri Binding, Douglas Tudhope |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2023-03-01 |
Series: | Journal of Computer Applications in Archaeology |
Subjects: | temporal expressions; dating; time periods; semantic integration; software; multilingual |
Online Access: | https://journal.caa-international.org/articles/105 |
_version_ | 1797845459028410368 |
---|---|
author | Ceri Binding; Douglas Tudhope |
author_facet | Ceri Binding; Douglas Tudhope |
author_sort | Ceri Binding |
collection | DOAJ |
description | Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns for which myriad variations exist, rendering direct automated comparison difficult. The issue can occur even within records from the same dataset and is further compounded when attempting to integrate multilingual data – particularly where dates may be expressed in words rather than numbers. The same problem can be found in temporal metadata, whether manually entered or generated via Natural Language Processing (NLP) techniques from reports and grey literature. Resolving and normalizing dates and periods to internationally agreed standard formats enables efficient data integration, interchange, search, comparison and visualization. This paper reports on the design and implementation of a tool to normalize temporal expressions to a numerical time axis and reflects on key issues. Textual patterns for seven categories of temporal expression have been normalized: Ordinal named or numbered centuries; Year spans; Single year (with tolerance); Decades; Century spans; Single year with prefix; Named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish, Welsh. Methods are described together with an (open source) normalization tool developed in Python and four applications of the method are discussed, together with limitations and future work. Results are presented from diverse datasets and languages. The input is a temporal text string and a language code (ISO 639-1). The output is a tab-delimited text file with start/end years (in ISO 8601 format), relative to Common Era (CE). The normalized outputs are provided as additional attributes along with the original text expression for consuming software to employ in end-user applications. |
first_indexed | 2024-04-09T17:39:22Z |
format | Article |
id | doaj.art-36232ef6f1e845abb785d8a4653914c0 |
institution | Directory of Open Access Journals |
issn | 2514-8362 |
language | English |
last_indexed | 2024-04-09T17:39:22Z |
publishDate | 2023-03-01 |
publisher | Ubiquity Press |
record_format | Article |
series | Journal of Computer Applications in Archaeology |
spelling | doaj.art-36232ef6f1e845abb785d8a4653914c0; 2023-04-17T07:09:19Z; eng; Ubiquity Press; Journal of Computer Applications in Archaeology; 2514-8362; 2023-03-01; 61; 10.5334/jcaa.105; 81; Automatic Normalization of Temporal Expressions; Ceri Binding (Hypermedia Research Group, University of South Wales); Douglas Tudhope (Hypermedia Research Group, University of South Wales); https://journal.caa-international.org/articles/105; temporal expressions; dating; time periods; semantic integration; software; multilingual |
title | Automatic Normalization of Temporal Expressions |
topic | temporal expressions; dating; time periods; semantic integration; software; multilingual |
url | https://journal.caa-international.org/articles/105 |
work_keys_str_mv | AT ceribinding automaticnormalizationoftemporalexpressions AT douglastudhope automaticnormalizationoftemporalexpressions |
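
The description in this record states that the tool takes a temporal text string plus an ISO 639-1 language code and emits tab-delimited start/end years relative to CE. The following is a minimal sketch of that input/output shape only, not the authors' implementation. It assumes English input, covers four of the seven listed categories (year spans, single year with tolerance, decades, numbered centuries), and assumes a century convention of 1401 to 1500 for "15th century". The function names `normalize` and `to_tsv_row` and all regular expressions are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical English-only patterns; illustrative assumptions, not the
# patterns used by the published tool.
YEAR_SPAN = re.compile(r"^\s*(\d{3,4})\s*[-\u2013]\s*(\d{3,4})\s*$")
TOLERANCE = re.compile(r"^\s*(\d{3,4})\s*(?:±|\+/-)\s*(\d+)\s*$")
DECADE = re.compile(r"^\s*(\d{3}0)s\s*$")
CENTURY = re.compile(r"^\s*(\d{1,2})(?:st|nd|rd|th)\s+century\s*$", re.IGNORECASE)


def normalize(text: str, lang: str = "en") -> Optional[Tuple[int, int]]:
    """Return (start_year, end_year) in CE, or None if no pattern matches."""
    if lang != "en":
        return None  # this sketch only covers English
    m = YEAR_SPAN.match(text)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = TOLERANCE.match(text)
    if m:
        year, tol = int(m.group(1)), int(m.group(2))
        return year - tol, year + tol
    m = DECADE.match(text)
    if m:
        start = int(m.group(1))
        return start, start + 9
    m = CENTURY.match(text)
    if m:
        n = int(m.group(1))
        # Assumed convention: the Nth century runs from (N-1)*100+1 to N*100.
        return (n - 1) * 100 + 1, n * 100
    return None


def to_tsv_row(text: str, lang: str = "en") -> str:
    """Build one tab-delimited row: original text, language, start, end."""
    span = normalize(text, lang)
    if span is None:
        start, end = "", ""  # unmatched expressions keep empty year columns
    else:
        start, end = f"{span[0]:04d}", f"{span[1]:04d}"  # four-digit years
    return "\t".join([text, lang, start, end])


if __name__ == "__main__":
    for expr in ["15th century", "1920s", "1450-1460", "1540 ± 10", "Iron Age"]:
        print(to_tsv_row(expr))
```

Keeping the original expression in the first column mirrors the record's note that normalized values are supplied alongside the original text. Named periods such as "Iron Age" are left unmatched here because resolving them would need a period lookup that this sketch does not attempt.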