Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques

Address matching is a crucial step in geocoding, but it is also a bottleneck for geocoding accuracy, because perfect matches depend on precise input. Matches still have to be established despite inevitable errors in address input, such as misspellings,...


Bibliographic Details
Main Authors: Kangjae Lee, Alexis Richard C. Claridades, Jiyeong Lee
Format: Article
Language: English
Published: MDPI AG 2020-08-01
Series: Applied Sciences
Subjects: geocoding, machine learning, address, alias
Online Access: https://www.mdpi.com/2076-3417/10/16/5628
author Kangjae Lee
Alexis Richard C. Claridades
Jiyeong Lee
collection DOAJ
description Address matching is a crucial step in geocoding, but it is also a bottleneck for geocoding accuracy, because perfect matches depend on precise input. Matches still have to be established despite inevitable errors in address input, such as misspellings, abbreviations, informal and non-standard names, slang, or coded terms. This study therefore proposes an address geocoding system that uses machine learning to improve address matching for street-based addresses. Three kinds of machine learning methods are tested to find the one with the highest accuracy. The performance of address matching with the machine learning models is compared with that of multiple text similarity metrics, which are commonly used for word matching. Extreme gradient boosting with optimal hyper-parameters was the most accurate machine learning method in the address matching process, and it outperformed the similarity metrics whether training data or input data were used. The address matching process using machine learning achieved high accuracy and can be applied to any geocoding system to convert addresses precisely into geographic coordinates for research and applications, including car navigation.
first_indexed 2024-03-10T17:28:56Z
format Article
id doaj.art-eadb3e8e41314bbf8d77e01c4bf86cd0
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T17:28:56Z
publishDate 2020-08-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-eadb3e8e41314bbf8d77e01c4bf86cd0; 2023-11-20T10:06:05Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-08-01; 10; 16; 5628; 10.3390/app10165628; Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques; Kangjae Lee (0), Alexis Richard C. Claridades (1), Jiyeong Lee (2); affiliation (0, 1, 2): Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Korea; https://www.mdpi.com/2076-3417/10/16/5628; geocoding; machine learning; address; alias
title Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques
topic geocoding
machine learning
address
alias
url https://www.mdpi.com/2076-3417/10/16/5628
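
The description above compares machine learning models against text similarity metrics for address matching. This record does not say which metrics the article used, so the following is only a minimal sketch of a similarity-based baseline, assuming two common choices: Levenshtein edit distance and the ratio from Python's standard difflib.SequenceMatcher. The function names (levenshtein, normalize, best_match) and the reference addresses other than Seoulsiripdae-ro are hypothetical and serve illustration only.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(address: str) -> str:
    """Lowercase, drop commas, and collapse whitespace (assumed preprocessing)."""
    return " ".join(address.lower().replace(",", " ").split())


def best_match(query: str, candidates: list[str]) -> tuple[float, str]:
    """Return (similarity, candidate) for the candidate most similar to query."""
    q = normalize(query)
    scored = [(SequenceMatcher(None, q, normalize(c)).ratio(), c)
              for c in candidates]
    return max(scored)


# Hypothetical reference table; only the first street name appears in this record.
reference = ["163 Seoulsiripdae-ro", "27 Example-ro", "100 Sample-daero"]
score, match = best_match("163 Seoulsiripdea-ro", reference)  # misspelled input
print(match, round(score, 2))
```

A baseline like this picks the closest string without learning from labelled examples; according to the description, the article's machine learning models, with extreme gradient boosting the most accurate, outperformed metrics of this kind.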