Geographic Named Entity Recognition by Employing Natural Language Processing and an Improved BERT Model
Toponym recognition, or the challenge of detecting place names that have a similar referent, is involved in a number of activities connected to geographical information retrieval and geographical information sciences. This research focuses on recognizing Chinese toponyms from social media communicat...
Main Authors: | Liufeng Tao, Zhong Xie, Dexin Xu, Kai Ma, Qinjun Qiu, Shengyong Pan, Bo Huang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | ISPRS International Journal of Geo-Information |
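Subjects: | geographic named entity recognition; social media message; natural language processing; BERT; toponyms recognition |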
Online Access: | https://www.mdpi.com/2220-9964/11/12/598 |
_version_ | 1797457345132888064 |
---|---|
author | Liufeng Tao; Zhong Xie; Dexin Xu; Kai Ma; Qinjun Qiu; Shengyong Pan; Bo Huang |
author_sort | Liufeng Tao |
collection | DOAJ |
description | Toponym recognition, or the challenge of detecting place names that have a similar referent, is involved in a number of activities connected to geographical information retrieval and geographical information sciences. This research focuses on recognizing Chinese toponyms from social media communications. While broad named entity recognition methods are frequently used to locate places, their accuracy is hampered by the many linguistic abnormalities seen in social media posts, such as informal sentence constructions, name abbreviations, and misspellings. In this study, we describe a Chinese toponym identification model based on a hybrid neural network that was created with these linguistic inconsistencies in mind. Our method adds a number of improvements to a standard bidirectional recurrent neural network model to help with location detection in social media messages. We demonstrate the results of a wide-ranging evaluation of the performance of different supervised machine learning methods, which have the natural advantage of avoiding human design features. A set of controlled experiments with four test datasets (one constructed and three public datasets) demonstrates the performance of supervised machine learning that can achieve good results on the task, significantly outperforming seven baseline models. |
first_indexed | 2024-03-09T16:20:55Z |
format | Article |
id | doaj.art-a2cc9de7da8b46a78e4186da90812189 |
institution | Directory Open Access Journal |
issn | 2220-9964 |
language | English |
last_indexed | 2024-03-09T16:20:55Z |
publishDate | 2022-11-01 |
publisher | MDPI AG |
record_format | Article |
series | ISPRS International Journal of Geo-Information |
spelling | doaj.art-a2cc9de7da8b46a78e4186da90812189; 2023-11-24T15:20:58Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; ISSN 2220-9964; 2022-11-01; vol. 11, no. 12, art. 598; DOI 10.3390/ijgi11120598; Geographic Named Entity Recognition by Employing Natural Language Processing and an Improved BERT Model; Liufeng Tao (School of Computer Science, China University of Geosciences, Wuhan 430074, China); Zhong Xie (School of Computer Science, China University of Geosciences, Wuhan 430074, China); Dexin Xu (Wuhan Geomatics Institute, Wuhan 430074, China); Kai Ma (Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China); Qinjun Qiu (School of Computer Science, China University of Geosciences, Wuhan 430074, China); Shengyong Pan (Wuhan Zondy Cyber Science & Technology Co., Ltd., Wuhan 430074, China); Bo Huang (Wuhan Zondy Cyber Science & Technology Co., Ltd., Wuhan 430074, China); https://www.mdpi.com/2220-9964/11/12/598; keywords: geographic named entity recognition; social media message; natural language processing; BERT; toponyms recognition |
title | Geographic Named Entity Recognition by Employing Natural Language Processing and an Improved BERT Model |
topic | geographic named entity recognition; social media message; natural language processing; BERT; toponyms recognition |
url | https://www.mdpi.com/2220-9964/11/12/598 |