An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets, resulting in a decrease in the data qua...
Main Authors: | Junsheng Huang, Baohua Mao, Yun Bai, Tong Zhang, Changjun Miao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | Sensors |
Subjects: | Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm |
Online Access: | https://www.mdpi.com/1424-8220/20/7/1992 |
_version_ | 1797571588699193344 |
---|---|
author | Junsheng Huang Baohua Mao Yun Bai Tong Zhang Changjun Miao |
author_sort | Junsheng Huang |
collection | DOAJ |
description | Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets and degrade data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experiment on taxi global positioning system (GPS) data from Manhattan, New York City, demonstrates the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, the highest among the compared methods, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. In addition, clustering imputation with the Euclidean distance performs better than with the Manhattan distance. Thus, the proposed integrated imputation method can be employed to estimate missing values in daily traffic management. |
first_indexed | 2024-03-10T20:42:51Z |
format | Article |
id | doaj.art-54553bc7a56046f2bf8622242bb71e0b |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-10T20:42:51Z |
publishDate | 2020-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-54553bc7a56046f2bf8622242bb71e0b; 2023-11-19T20:32:18Z; eng; MDPI AG; Sensors; 1424-8220; 2020-04-01; vol. 20, no. 7, art. 1992; 10.3390/s20071992; An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data; Junsheng Huang (0), Baohua Mao (1), Yun Bai (2), Tong Zhang (3), Changjun Miao (4); (0)-(3) School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China; (4) Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; https://www.mdpi.com/1424-8220/20/7/1992; Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm |
title | An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data |
topic | Intelligent Transportation System missing values imputation fuzzy C-means genetic algorithm |
url | https://www.mdpi.com/1424-8220/20/7/1992 |
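
The description above names fuzzy C-means (FCM) imputation and three evaluation criteria (RMSE, R, RA) without giving formulas. The following is a minimal Python sketch of how such a pipeline could look, not the authors' implementation. Assumptions not taken from the record: the function names are hypothetical, centroids are fitted on complete rows only, missing entries are filled with the membership-weighted average of the centroids, distances are Euclidean, and RA is read as the share of estimates whose relative error falls within the threshold. The GA search over the fuzzifier `m` and the cluster count `c` is not shown; both are plain arguments here.

```python
import numpy as np


def fcm(X, c, m=2.0, n_iter=300, tol=1e-8, seed=0):
    """Standard fuzzy C-means on complete rows X of shape (n, d).

    Returns centroids V of shape (c, d) and memberships U of shape (n, c).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted centroid update
        # Euclidean distance from every point to every centroid
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / D ** (2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def fcm_impute(X_missing, V, m=2.0):
    """Fill NaN entries using centroids V (assumed imputation rule).

    Memberships are computed from the observed features only; rows with
    no observed feature are left unchanged.
    """
    X = X_missing.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any() or miss.all():
            continue
        d = np.linalg.norm(X[i, ~miss] - V[:, ~miss], axis=1) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum()
        X[i, miss] = u @ V[:, miss]             # membership-weighted fill
    return X


def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def corr_coef(y, y_hat):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(y, y_hat)[0, 1])


def relative_accuracy(y, y_hat, threshold):
    """Share of estimates with relative error <= threshold (assumed RA).

    Actual values y are assumed to be nonzero.
    """
    return float(np.mean(np.abs(y_hat - y) / np.abs(y) <= threshold))
```

In use, `fcm` would be fitted on the fully observed records, `fcm_impute` applied to the records containing gaps, and the three metrics computed on entries whose true values were deliberately masked, for example with `threshold=0.05` and `threshold=0.10` to mirror the ±5% and ±10% settings quoted in the description.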