An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Bibliographic Details
Main Authors: Junsheng Huang, Baohua Mao, Yun Bai, Tong Zhang, Changjun Miao
Format: Article
Language: English
Published: MDPI AG 2020-04-01
Series: Sensors
Subjects: Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm
Online Access: https://www.mdpi.com/1424-8220/20/7/1992
collection DOAJ
description Various traffic-sensing technologies have been employed to support traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets, which degrades data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experimental test on taxi global positioning system (GPS) data from Manhattan, New York City, demonstrates the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), the correlation coefficient (R), and the relative accuracy (RA), are used to assess the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, respectively, the highest among the compared methods, indicating that the integrated method outperforms both the historical imputation method and the conventional FCM method. In addition, clustering imputation performs better with the Euclidean distance than with the Manhattan distance. The proposed integrated imputation method can therefore be used to estimate missing values in daily traffic management.
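The abstract names FCM-based imputation and three evaluation criteria without showing how they fit together. The sketch below is a hypothetical illustration, not the authors' code. It clusters the complete rows with standard FCM (Euclidean distance only), then fills each missing entry with the membership-weighted average of the centroids, a common FCM imputation scheme that is assumed here. The RA definition (share of estimates whose relative error is within the threshold) is also an assumption. The GA search over the membership parameter `m` and cluster count `c` is not shown: both are fixed by hand. The function names and the synthetic data are illustrative only and are not the Manhattan taxi GPS data.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means with Euclidean distance.

    X: (n, d) complete rows; c: number of clusters; m > 1: fuzzifier
    (weighting exponent on the membership degrees).
    Returns centroids V (c, d) and memberships U (n, c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted centroids
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)  # avoid division by zero when a point sits on a centroid
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        U_new = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return V, U


def fcm_impute(X, c, m=2.0, seed=0):
    """Fill NaNs with membership-weighted centroid values (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    complete = ~miss.any(axis=1)
    V, _ = fcm(X[complete], c, m, seed=seed)  # cluster only the complete rows
    X_hat = X.copy()
    for i in np.flatnonzero(~complete):
        obs = ~miss[i]
        # partial distance: compare only the features observed in row i
        d = np.fmax(np.linalg.norm(V[:, obs] - X[i, obs], axis=1), 1e-12)
        u = 1.0 / ((d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        X_hat[i, miss[i]] = u @ V[:, miss[i]]
    return X_hat


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def pearson_r(y, y_hat):
    return float(np.corrcoef(y, y_hat)[0, 1])


def relative_accuracy(y, y_hat, threshold):
    """Assumed RA: share of estimates within ±threshold relative error of the truth."""
    return float(np.mean(np.abs(y_hat - y) <= threshold * np.abs(y)))


# Synthetic stand-in for traffic features (NOT the Manhattan taxi data).
rng = np.random.default_rng(42)
X_true = np.vstack([rng.normal(10.0, 0.3, (30, 3)), rng.normal(50.0, 0.3, (30, 3))])
X_obs = X_true.copy()
X_obs[::5, 2] = np.nan  # hide every fifth value of the last feature
X_hat = fcm_impute(X_obs, c=2)
mask = np.isnan(X_obs)
y, y_hat = X_true[mask], X_hat[mask]
print(f"RMSE={rmse(y, y_hat):.3f}  R={pearson_r(y, y_hat):.3f}  "
      f"RA(±5%)={relative_accuracy(y, y_hat, 0.05):.2f}  "
      f"RA(±10%)={relative_accuracy(y, y_hat, 0.10):.2f}")
```

In this sketch `c` and `m` are chosen by hand. The paper tunes them with a GA, and any search could take that role here. One plausible fitness for such a search, which is an assumption and not taken from the paper, is RMSE on known values that are masked on purpose.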
id doaj.art-54553bc7a56046f2bf8622242bb71e0b
institution Directory Open Access Journal
issn 1424-8220
DOI: 10.3390/s20071992 (Sensors, vol. 20, no. 7, article 1992)
Affiliations: Junsheng Huang, Baohua Mao, Yun Bai, Tong Zhang: School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China. Changjun Miao: Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China.
topic Intelligent Transportation System
missing values imputation
fuzzy C-means
genetic algorithm