Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques
Address matching is a crucial step in geocoding; however, this step forms a bottleneck for geocoding accuracy, as precise input is the biggest challenge for establishing perfect matches. Matches still have to be established despite the inevitability of incorrect address inputs such as misspellings,...
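
As a rough illustration of the matching problem described above, the sketch below scores a misspelled street name against a small reference list using a normalized edit-distance similarity, the kind of text similarity baseline the record's description says the machine learning models were compared against. This is not the authors' method: the record does not name the specific metrics used, so the choice of Levenshtein distance, the function names, and the reference list are all assumptions made for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,          # deletion
                current[j - 1] + 1,       # insertion
                previous[j - 1] + cost,   # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; case and outer whitespace are ignored."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(query: str, reference: List[str]) -> Tuple[str, float]:
    """Return the reference name most similar to the query, with its score."""
    scored = ((name, similarity(query, name)) for name in reference)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference list, assumed for this example only.
REFERENCE = ["Seoulsiripdae-ro", "Dongdaemun-ro", "Cheonho-daero"]

if __name__ == "__main__":
    # Input with a missing hyphen, standing in for a non-standard spelling.
    print(best_match("Seoulsiripdaero", REFERENCE))
```

A fixed similarity score like this treats every character edit equally, which is one plausible reason a learned model could do better on abbreviations and informal names; that reading is an inference, not a claim made in the record.
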
Main Authors: | Kangjae Lee, Alexis Richard C. Claridades, Jiyeong Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-08-01 |
Series: | Applied Sciences |
Subjects: | geocoding; machine learning; address; alias |
Online Access: | https://www.mdpi.com/2076-3417/10/16/5628 |
_version_ | 1797558271806013440 |
---|---|
author | Kangjae Lee; Alexis Richard C. Claridades; Jiyeong Lee |
author_facet | Kangjae Lee; Alexis Richard C. Claridades; Jiyeong Lee |
author_sort | Kangjae Lee |
collection | DOAJ |
description | Address matching is a crucial step in geocoding; however, this step forms a bottleneck for geocoding accuracy, as precise input is the biggest challenge for establishing perfect matches. Matches still have to be established despite the inevitability of incorrect address inputs such as misspellings, abbreviations, informal and non-standard names, slang, or coded terms. Thus, this study proposes an address geocoding system that uses machine learning to improve address matching for street-based addresses. Three kinds of machine learning methods are tested to find the one with the highest accuracy. The performance of address matching using machine learning models is compared with multiple text similarity metrics, which are generally used for word matching. The results showed that extreme gradient boosting with optimal hyper-parameters was the most accurate machine learning method for address matching, and that it outperformed the similarity metrics whether evaluated on the training data or on the input data. The address matching process using machine learning achieved high accuracy and can be applied to any geocoding system to precisely convert addresses into geographic coordinates for various research and applications, including car navigation. |
first_indexed | 2024-03-10T17:28:56Z |
format | Article |
id | doaj.art-eadb3e8e41314bbf8d77e01c4bf86cd0 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T17:28:56Z |
publishDate | 2020-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-eadb3e8e41314bbf8d77e01c4bf86cd0; 2023-11-20T10:06:05Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-08-01; 10; 16; 5628; 10.3390/app10165628; Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques; Kangjae Lee (0); Alexis Richard C. Claridades (1); Jiyeong Lee (2); Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Korea; Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Korea; Department of Geoinformatics, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Korea; Address matching is a crucial step in geocoding; however, this step forms a bottleneck for geocoding accuracy, as precise input is the biggest challenge for establishing perfect matches. Matches still have to be established despite the inevitability of incorrect address inputs such as misspellings, abbreviations, informal and non-standard names, slang, or coded terms. Thus, this study proposes an address geocoding system that uses machine learning to improve address matching for street-based addresses. Three kinds of machine learning methods are tested to find the one with the highest accuracy. The performance of address matching using machine learning models is compared with multiple text similarity metrics, which are generally used for word matching. The results showed that extreme gradient boosting with optimal hyper-parameters was the most accurate machine learning method for address matching, and that it outperformed the similarity metrics whether evaluated on the training data or on the input data. The address matching process using machine learning achieved high accuracy and can be applied to any geocoding system to precisely convert addresses into geographic coordinates for various research and applications, including car navigation.; https://www.mdpi.com/2076-3417/10/16/5628; geocoding; machine learning; address; alias |
spellingShingle | Kangjae Lee Alexis Richard C. Claridades Jiyeong Lee Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques Applied Sciences geocoding machine learning address alias |
title | Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques |
title_full | Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques |
title_fullStr | Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques |
title_full_unstemmed | Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques |
title_short | Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques |
title_sort | improving a street based geocoding algorithm using machine learning techniques |
topic | geocoding; machine learning; address; alias |
url | https://www.mdpi.com/2076-3417/10/16/5628 |
work_keys_str_mv | AT kangjaelee improvingastreetbasedgeocodingalgorithmusingmachinelearningtechniques AT alexisrichardcclaridades improvingastreetbasedgeocodingalgorithmusingmachinelearningtechniques AT jiyeonglee improvingastreetbasedgeocodingalgorithmusingmachinelearningtechniques |