Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data

Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places via their toponyms. Query by toponym...

Bibliographic Details
Main Authors: Xi Kuai, Renzhong Guo, Zhijun Zhang, Biao He, Zhigang Zhao, Han Guo
Format: Article
Language: English
Published: MDPI AG 2020-03-01
Series: ISPRS International Journal of Geo-Information
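Subjects: chinese textual address; chinese word segmentation; digital gazetteer; location; precision; poi; segmentation; spatial clustering; toponym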
Online Access: https://www.mdpi.com/2220-9964/9/3/147
_version_ 1818083394330820608
author Xi Kuai
Renzhong Guo
Zhijun Zhang
Biao He
Zhigang Zhao
Han Guo
author_sort Xi Kuai
collection DOAJ
description Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places via their toponyms. Query by toponym is an effective way to find information about a geographic area. However, segmenting and parsing textual addresses to extract local toponyms is a difficult task in the geocoding field, especially in China. In this paper, a local spatial context-based framework is proposed to extract local toponyms and segment Chinese textual addresses. We collect urban points of interest (POIs) as an input data source; in this dataset, the textual address and geospatial position coordinates correspond on a one-to-one basis and can be easily used to explore the spatial distribution of local toponyms. The proposed framework involves two steps: address element identification and local toponym extraction. The first step identifies as many address element candidates as possible from a continuous string of textual addresses for each urban POI. The second step focuses on merging neighboring candidate pairs into local toponyms. A series of experiments is conducted to determine the thresholds for local toponym extraction based on precision-recall curves. Finally, we evaluate our framework by comparing its performance with three well-known Chinese word segmentation models. The comparative experimental results demonstrate that our framework achieves better performance than the other models.
first_indexed 2024-12-10T19:37:18Z
format Article
id doaj.art-785e50f81754435294081bf66924fd8d
institution Directory of Open Access Journals
issn 2220-9964
language English
last_indexed 2024-12-10T19:37:18Z
publishDate 2020-03-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
spelling doaj.art-785e50f81754435294081bf66924fd8d
2022-12-22T01:36:06Z
eng
MDPI AG
ISPRS International Journal of Geo-Information
2220-9964
2020-03-01
9(3), 147
10.3390/ijgi9030147
ijgi9030147
Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data
Xi Kuai (0)
Renzhong Guo (1)
Zhijun Zhang (2)
Biao He (3)
Zhigang Zhao (4)
Han Guo (5)
(0) Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China
(1) Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China
(2) Tianjin Institute of Surveying and Mapping, Changling Road, Tianjin 300381, China
(3) Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China
(4) Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China
(5) Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen 518034, China
https://www.mdpi.com/2220-9964/9/3/147
chinese textual address
chinese word segmentation
digital gazetteer
location
precision
poi
segmentation
spatial clustering
toponym
title Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data
topic chinese textual address
chinese word segmentation
digital gazetteer
location
precision
poi
segmentation
spatial clustering
toponym
url https://www.mdpi.com/2220-9964/9/3/147