Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data
Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places via their toponyms. Query by toponym...
Main Authors: | Xi Kuai, Renzhong Guo, Zhijun Zhang, Biao He, Zhigang Zhao, Han Guo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-03-01 |
Series: | ISPRS International Journal of Geo-Information |
Subjects: | |
Online Access: | https://www.mdpi.com/2220-9964/9/3/147 |
_version_ | 1818083394330820608 |
---|---|
author | Xi Kuai Renzhong Guo Zhijun Zhang Biao He Zhigang Zhao Han Guo |
author_sort | Xi Kuai |
collection | DOAJ |
description | Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places by their toponyms. Querying by toponym is an effective way to find information about a geographic area. However, segmenting and parsing textual addresses to extract local toponyms is a difficult task in the geocoding field, especially in China. In this paper, a local spatial context-based framework is proposed to extract local toponyms and segment Chinese textual addresses. We collect urban points of interest (POIs) as the input data source; in this dataset, each textual address corresponds one-to-one with a pair of geospatial coordinates, so the data can readily be used to explore the spatial distribution of local toponyms. The proposed framework involves two steps: address element identification and local toponym extraction. The first step identifies as many address element candidates as possible from the continuous textual address string of each urban POI. The second step merges neighboring candidate pairs into local toponyms. A series of experiments is conducted to determine the thresholds for local toponym extraction based on precision-recall curves. Finally, we evaluate our framework by comparing its performance with that of three well-known Chinese word segmentation models. The comparative experimental results demonstrate that our framework performs better than the other models. |
first_indexed | 2024-12-10T19:37:18Z |
format | Article |
id | doaj.art-785e50f81754435294081bf66924fd8d |
institution | Directory Open Access Journal |
issn | 2220-9964 |
language | English |
last_indexed | 2024-12-10T19:37:18Z |
publishDate | 2020-03-01 |
publisher | MDPI AG |
record_format | Article |
series | ISPRS International Journal of Geo-Information |
spelling | doaj.art-785e50f81754435294081bf66924fd8d; 2022-12-22T01:36:06Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; ISSN 2220-9964; 2020-03-01; vol. 9, no. 3, article 147; DOI 10.3390/ijgi9030147; ijgi9030147; Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data; Xi Kuai (Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China); Renzhong Guo (Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China); Zhijun Zhang (Tianjin Institute of Surveying and Mapping, Changling Road, Tianjin 300381, China); Biao He (Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China); Zhigang Zhao (Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518061, China); Han Guo (Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen 518034, China); https://www.mdpi.com/2220-9964/9/3/147 |
title | Spatial Context-Based Local Toponym Extraction and Chinese Textual Address Segmentation from Urban POI Data |
topic | chinese textual address chinese word segmentation digital gazetteer location precision poi segmentation spatial clustering toponym |
url | https://www.mdpi.com/2220-9964/9/3/147 |
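
The description above says that thresholds for local toponym extraction are determined from precision-recall curves, but the record does not state the selection rule. The following is only a minimal sketch of one common way to do this, under assumptions that are not from the paper: each neighboring candidate pair has a hypothetical numeric merge score, a hand-labeled sample says whether the pair truly forms a toponym, and the threshold is chosen to maximize F1. The function names `precision_recall_at` and `best_threshold` are illustrative.

```python
from typing import List, Optional, Tuple


def precision_recall_at(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every pair scoring >= threshold is merged."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Conventions for empty denominators: no predictions -> precision 1.0,
    # no positive labels -> recall 0.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(
    scores: List[float], labels: List[bool]
) -> Optional[Tuple[float, float, float, float]]:
    """Scan the observed scores as candidate thresholds and return
    (threshold, precision, recall, f1) for the highest F1.

    Maximizing F1 is an assumption made for this sketch; the paper may
    pick its operating point on the curve differently.
    """
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if best is None or f1 > best[3]:
            best = (t, p, r, f1)
    return best


# Hypothetical merge scores for six neighboring candidate pairs, with
# made-up labels saying whether each pair really is a local toponym.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [True, True, False, True, False, False]
result = best_threshold(example_scores, example_labels)
```

Each point the loop visits is one point on the precision-recall curve, so the same scan could be used to plot the curve and choose a different operating point, for example one that favors precision over recall.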