How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?

Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambig...

Bibliographic Details
Main Authors: Xuke Hu, Yeran Sun, Jens Kersten, Zhiyong Zhou, Friederike Klan, Hongchao Fan
Format: Article
Language: English
Published: Elsevier 2023-03-01
Series: International Journal of Applied Earth Observations and Geoinformation
Subjects: Toponym disambiguation, Toponym resolution, Geocoding, Geoparsing, Entity linking, Voting
Online Access: http://www.sciencedirect.com/science/article/pii/S1569843223000134
_version_ 1811163481393594368
author Xuke Hu
Yeran Sun
Jens Kersten
Zhiyong Zhou
Friederike Klan
Hongchao Fan
author_facet Xuke Hu
Yeran Sun
Jens Kersten
Zhiyong Zhou
Friederike Klan
Hongchao Fan
author_sort Xuke Hu
collection DOAJ
description Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambiguation, which can be approached by toponym resolution and entity linking. Recently, many novel approaches, especially deep learning-based, have been proposed, such as CamCoder, GENRE, and BLINK. However, these approaches were not compared on the same and large datasets. Moreover, there is still a need and space to improve their robustness and generalizability further. To mitigate the two research gaps, in this paper, we propose a spatial clustering-based voting approach combining several individual approaches and compare a voting ensemble with 20 latest and commonly-used approaches based on 12 public datasets, including several highly challenging datasets (e.g., WikToR). They are in six types: tweets, historical documents, news, web pages, scientific articles, and Wikipedia articles, containing 98,300 toponyms. Experimental results show that the voting ensemble performs the best on all the datasets, achieving an average Accuracy@161km of 0.86, proving its generalizability and robustness. It also drastically improves the performance of resolving fine-grained places, i.e., POIs, natural features, and traffic ways. The detailed evaluation results can inform future methodological developments and guide the selection of proper approaches based on application needs.
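The description above reports results as Accuracy@161km and credits them to a spatial clustering-based vote over several individual approaches. The following is a minimal sketch of both ideas, not the authors' implementation: the record does not specify the clustering algorithm, so `vote` uses a simple neighbour count within a hypothetical `radius_km`, and all function names and sample coordinates are assumptions made for illustration. Accuracy@161km is taken here as the share of toponyms whose predicted coordinates fall within 161 km (about 100 miles) of the gold coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def vote(predictions, radius_km=50.0):
    """Pick the prediction supported by the most predictions within radius_km.

    Assumption: a stand-in for the paper's spatial clustering step.
    Ties are resolved in favour of the earliest prediction.
    """
    best, best_support = None, -1
    for p in predictions:
        support = sum(haversine_km(p, q) <= radius_km for q in predictions)
        if support > best_support:
            best, best_support = p, support
    return best


def accuracy_at_k(predicted, gold, k_km=161.0):
    """Fraction of predictions lying within k_km of the gold coordinates."""
    hits = sum(haversine_km(p, g) <= k_km for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical outputs of three geocoders for the toponym "Paris".
paris_fr = (48.8566, 2.3522)
paris_fr_alt = (48.86, 2.35)
paris_tx = (33.6609, -95.5555)

winner = vote([paris_tx, paris_fr, paris_fr_alt])
score = accuracy_at_k([winner, paris_tx], [paris_fr, paris_fr])
```

In this toy case the two predictions near Paris, France outvote the single prediction in Texas, which is the kind of agreement among independent geocoders that a voting ensemble relies on.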
first_indexed 2024-04-10T15:06:18Z
format Article
id doaj.art-55d0e21395a4439ab77655a6a50e9bc2
institution Directory Open Access Journal
issn 1569-8432
language English
last_indexed 2024-04-10T15:06:18Z
publishDate 2023-03-01
publisher Elsevier
record_format Article
series International Journal of Applied Earth Observations and Geoinformation
spelling doaj.art-55d0e21395a4439ab77655a6a50e9bc2
2023-02-15T04:27:28Z
eng
Elsevier
International Journal of Applied Earth Observations and Geoinformation
1569-8432
2023-03-01
117
103191
How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
Xuke Hu (0)
Yeran Sun (1)
Jens Kersten (2)
Zhiyong Zhou (3)
Friederike Klan (4)
Hongchao Fan (5)
(0) Institute of Data Science, German Aerospace Center, Germany; Corresponding author.
(1) Department of Geography, University of Lincoln, UK
(2) Institute of Data Science, German Aerospace Center, Germany
(3) Department of Geography, University of Zurich, Switzerland
(4) Institute of Data Science, German Aerospace Center, Germany
(5) Department of Civil and Environmental Engineering, Norwegian University of Science and Technology, Norway
Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambiguation, which can be approached by toponym resolution and entity linking. Recently, many novel approaches, especially deep learning-based, have been proposed, such as CamCoder, GENRE, and BLINK. However, these approaches were not compared on the same and large datasets. Moreover, there is still a need and space to improve their robustness and generalizability further. To mitigate the two research gaps, in this paper, we propose a spatial clustering-based voting approach combining several individual approaches and compare a voting ensemble with 20 latest and commonly-used approaches based on 12 public datasets, including several highly challenging datasets (e.g., WikToR). They are in six types: tweets, historical documents, news, web pages, scientific articles, and Wikipedia articles, containing 98,300 toponyms. Experimental results show that the voting ensemble performs the best on all the datasets, achieving an average Accuracy@161km of 0.86, proving its generalizability and robustness. It also drastically improves the performance of resolving fine-grained places, i.e., POIs, natural features, and traffic ways. The detailed evaluation results can inform future methodological developments and guide the selection of proper approaches based on application needs.
http://www.sciencedirect.com/science/article/pii/S1569843223000134
Toponym disambiguation
Toponym resolution
Geocoding
Geoparsing
Entity linking
Voting
spellingShingle Xuke Hu
Yeran Sun
Jens Kersten
Zhiyong Zhou
Friederike Klan
Hongchao Fan
How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
International Journal of Applied Earth Observations and Geoinformation
Toponym disambiguation
Toponym resolution
Geocoding
Geoparsing
Entity linking
Voting
title How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
title_full How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
title_fullStr How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
title_full_unstemmed How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
title_short How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
title_sort how can voting mechanisms improve the robustness and generalizability of toponym disambiguation
topic Toponym disambiguation
Toponym resolution
Geocoding
Geoparsing
Entity linking
Voting
url http://www.sciencedirect.com/science/article/pii/S1569843223000134
work_keys_str_mv AT xukehu howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation
AT yeransun howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation
AT jenskersten howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation
AT zhiyongzhou howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation
AT friederikeklan howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation
AT hongchaofan howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation