How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?
Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambig...
Main Authors: | Xuke Hu, Yeran Sun, Jens Kersten, Zhiyong Zhou, Friederike Klan, Hongchao Fan |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-03-01 |
Series: | International Journal of Applied Earth Observations and Geoinformation |
Subjects: | Toponym disambiguation; Toponym resolution; Geocoding; Geoparsing; Entity linking; Voting |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1569843223000134 |
_version_ | 1811163481393594368 |
---|---|
author | Xuke Hu, Yeran Sun, Jens Kersten, Zhiyong Zhou, Friederike Klan, Hongchao Fan |
author_sort | Xuke Hu |
collection | DOAJ |
description | Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambiguation, which can be approached through toponym resolution and entity linking. Many novel approaches, especially deep learning-based ones such as CamCoder, GENRE, and BLINK, have been proposed recently. However, these approaches have not been compared on the same large datasets, and their robustness and generalizability can still be improved. To address these two research gaps, we propose a spatial clustering-based voting approach that combines several individual approaches, and we compare the voting ensemble with 20 recent and commonly used approaches on 12 public datasets, including several highly challenging ones (e.g., WikToR). The datasets are of six types (tweets, historical documents, news, web pages, scientific articles, and Wikipedia articles) and contain 98,300 toponyms. Experimental results show that the voting ensemble performs best on all the datasets, achieving an average Accuracy@161km of 0.86, which demonstrates its generalizability and robustness. It also drastically improves the resolution of fine-grained places, i.e., POIs, natural features, and traffic ways. The detailed evaluation results can inform future methodological developments and guide the selection of suitable approaches based on application needs. |
first_indexed | 2024-04-10T15:06:18Z |
format | Article |
id | doaj.art-55d0e21395a4439ab77655a6a50e9bc2 |
institution | Directory Open Access Journal |
issn | 1569-8432 |
language | English |
last_indexed | 2024-04-10T15:06:18Z |
publishDate | 2023-03-01 |
publisher | Elsevier |
record_format | Article |
series | International Journal of Applied Earth Observations and Geoinformation |
spelling | doaj.art-55d0e21395a4439ab77655a6a50e9bc2 / 2023-02-15T04:27:28Z / eng / Elsevier / International Journal of Applied Earth Observations and Geoinformation / 1569-8432 / 2023-03-01 / 117 / 103191 / How can voting mechanisms improve the robustness and generalizability of toponym disambiguation? / Xuke Hu (Institute of Data Science, German Aerospace Center, Germany; corresponding author), Yeran Sun (Department of Geography, University of Lincoln, UK), Jens Kersten (Institute of Data Science, German Aerospace Center, Germany), Zhiyong Zhou (Department of Geography, University of Zurich, Switzerland), Friederike Klan (Institute of Data Science, German Aerospace Center, Germany), Hongchao Fan (Department of Civil and Environmental Engineering, Norwegian University of Science and Technology, Norway) / http://www.sciencedirect.com/science/article/pii/S1569843223000134 / Toponym disambiguation; Toponym resolution; Geocoding; Geoparsing; Entity linking; Voting |
title | How can voting mechanisms improve the robustness and generalizability of toponym disambiguation? |
title_sort | how can voting mechanisms improve the robustness and generalizability of toponym disambiguation |
topic | Toponym disambiguation; Toponym resolution; Geocoding; Geoparsing; Entity linking; Voting |
url | http://www.sciencedirect.com/science/article/pii/S1569843223000134 |
work_keys_str_mv | AT xukehu howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation AT yeransun howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation AT jenskersten howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation AT zhiyongzhou howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation AT friederikeklan howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation AT hongchaofan howcanvotingmechanismsimprovetherobustnessandgeneralizabilityoftoponymdisambiguation |
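
The description above mentions two technical ideas: voting over the outputs of several disambiguation approaches by spatial clustering, and the Accuracy@161km metric (the share of toponyms whose predicted location lies within 161 km, about 100 miles, of the gold location). The Python sketch below illustrates both under loud assumptions. This record does not describe the article's actual clustering algorithm, so the neighbour-count vote, the `radius_km` default of 10 km, and all function names here are hypothetical choices made for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def vote(predictions: Sequence[Coord], radius_km: float = 10.0) -> Optional[Coord]:
    """Pick the prediction supported by the most predictions within radius_km.

    Hypothetical stand-in for a spatial clustering-based vote: each candidate's
    support is the number of predictions (including itself) that fall within
    radius_km of it. Ties go to the earliest candidate.
    """
    if not predictions:
        return None

    def support(p: Coord) -> int:
        return sum(haversine_km(p, q) <= radius_km for q in predictions)

    return max(predictions, key=support)


def accuracy_at_km(predicted: Sequence[Optional[Coord]],
                   gold: Sequence[Coord],
                   threshold_km: float = 161.0) -> float:
    """Fraction of toponyms resolved to within threshold_km of the gold location."""
    hits = sum(
        1 for p, g in zip(predicted, gold)
        if p is not None and haversine_km(p, g) <= threshold_km
    )
    return hits / len(gold)


# Illustrative coordinates: two approaches resolve "Paris" to Paris, France,
# and one resolves it to Paris, Texas.
paris_fr: Coord = (48.8566, 2.3522)
paris_tx: Coord = (33.6609, -95.5555)
outputs: List[Coord] = [paris_fr, (48.86, 2.35), paris_tx]

winner = vote(outputs)
score = accuracy_at_km([winner, paris_tx], [paris_fr, paris_fr])
```

In this toy case the two nearby predictions support each other, so the vote selects the Paris, France candidate; scoring that result alongside one wrong prediction against the same gold location gives an accuracy of 0.5.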