A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation...


Bibliographic Details
Main Authors: Dawen Xia, Huaqing Li, Binfeng Wang, Yantao Li, Zili Zhang
Format: Article
Language: English
Published: IEEE 2016-01-01
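Series: IEEE Access
Subjects: Big data analytics; traffic flow prediction; correlation analysis; parallel classifier; Hadoop MapReduce
Online Access: https://ieeexplore.ieee.org/document/7499912/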
_version_ 1818876942393278464
author Dawen Xia
Huaqing Li
Binfeng Wang
Yantao Li
Zili Zhang
collection DOAJ
description In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system including two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow prediction in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which can be improved 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
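The description above names two ideas that a small example may help make concrete: nearest-neighbor prediction that uses correlation between traffic flows, and the mean absolute percent error used to report accuracy. The following is a minimal single-machine sketch, not the authors' TFPC method and not a MapReduce job. The function names (`mape`, `pearson`, `knn_predict`), the window/next-value data layout, and the correlation-weighted averaging rule are all assumptions made for illustration.

```python
import math


def mape(actual, predicted):
    """Mean absolute percent error, in percent (assumes no zero in `actual`)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def knn_predict(history, current, k=2):
    """Hypothetical correlation-weighted k-NN prediction.

    `history` is a list of (window, next_value) pairs taken from past data;
    `current` is the most recent observed window. The k historical windows
    most correlated with `current` vote, weighted by their correlation.
    """
    scored = sorted(
        ((pearson(window, current), nxt) for window, nxt in history),
        reverse=True,
    )[:k]
    weights = [max(corr, 0.0) for corr, _ in scored]  # ignore negative correlation
    total = sum(weights)
    if total == 0:
        # No positively correlated neighbor: fall back to a plain average.
        return sum(nxt for _, nxt in scored) / len(scored)
    return sum(w * nxt for w, (_, nxt) in zip(weights, scored)) / total


# Toy data: two historical windows rise like the current one, one falls.
history = [([1, 2, 3], 40.0), ([3, 2, 1], 10.0), ([2, 4, 6], 60.0)]
prediction = knn_predict(history, [10, 20, 30], k=2)
error = mape([100.0, 200.0], [110.0, 180.0])
```

In a distributed setting, the scoring of each historical window would presumably be the part spread across mappers, with the top-k selection and weighting done in a reduce step; that split is a guess about a typical design, not a statement about the paper.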
first_indexed 2024-12-19T13:50:24Z
format Article
id doaj.art-69d6934a622f4f14ac02fdb01b639de8
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T13:50:24Z
publishDate 2016-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-69d6934a622f4f14ac02fdb01b639de8; 2022-12-21T20:18:45Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2016-01-01; vol. 4, pp. 2920-2934; DOI 10.1109/ACCESS.2016.2570021; document 7499912
Dawen Xia (https://orcid.org/0000-0002-0151-9643), School of Computer and Information Science, Southwest University, Chongqing, China
Huaqing Li, School of Electronics and Information Engineering, Southwest University, Chongqing, China
Binfeng Wang, School of Computer and Information Science, Southwest University, Chongqing, China
Yantao Li, School of Computer and Information Science, Southwest University, Chongqing, China
Zili Zhang, School of Computer and Information Science, Southwest University, Chongqing, China
title A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction
topic Big data analytics
traffic flow prediction
correlation analysis
parallel classifier
Hadoop MapReduce
url https://ieeexplore.ieee.org/document/7499912/