Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery

Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This work is the first to utilize roadside urban imagery to aid transit agencies and practitioners in improving travel time prediction. We propose and evaluate an en...


Bibliographic Details
Main Authors:	Abdelhalim, Awad; Zhao, Jinhua
Other Authors:	Massachusetts Institute of Technology. Department of Urban Studies and Planning
Format:	Article
Language:	English
Published:	Springer Science and Business Media LLC 2024
Subjects:	Management Science and Operations Research; Mechanical Engineering; Transportation; Information Systems
Online Access:	https://hdl.handle.net/1721.1/153640

_version_	1826194555902361600
author	Abdelhalim, Awad; Zhao, Jinhua
author2	Massachusetts Institute of Technology. Department of Urban Studies and Planning
author_facet	Massachusetts Institute of Technology. Department of Urban Studies and Planning; Abdelhalim, Awad; Zhao, Jinhua
author_sort	Abdelhalim, Awad
collection	MIT
description	Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This work is the first to utilize roadside urban imagery to aid transit agencies and practitioners in improving travel time prediction. We propose and evaluate an end-to-end framework integrating traditional transit data sources with a roadside camera for automated image data acquisition, labeling, and model training to predict transit travel times across a segment of interest. First, we show how the General Transit Feed Specification real-time data can be utilized as an efficient activation mechanism for a roadside camera unit monitoring a segment of interest. Second, automated vehicle location data is utilized to generate ground truth labels for the acquired images based on the observed transit travel time percentiles across the camera-monitored segment during the time of image acquisition. Finally, the generated labeled image dataset is used to train and thoroughly evaluate a Vision Transformer (ViT) model to predict a discrete transit travel time range (band). The results of this exploratory study illustrate that the ViT model is able to learn image features and contents that best help it deduce the expected travel time range with an average validation accuracy ranging between 80 and 85%. We assess the interpretability of the ViT model’s predictions and showcase how this discrete travel time band prediction can subsequently improve continuous transit travel time estimation. The workflow and results presented in this study provide an end-to-end, scalable, automated, and highly efficient approach for integrating traditional transit data sources and roadside imagery to improve the estimation of transit travel duration. This work also demonstrates the added value of incorporating real-time information from computer-vision sources, which are becoming increasingly accessible and can have major implications for improving transit operations and passenger real-time information.
first_indexed	2024-09-23T09:58:04Z
format	Article
id	mit-1721.1/153640
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T09:58:04Z
publishDate	2024
publisher	Springer Science and Business Media LLC
record_format	dspace
spelling	mit-1721.1/153640 2024-09-20T19:22:48Z 2024-03-07T19:23:44Z 2024-03-07T19:23:44Z 2024-02-27 2024-03-03T04:10:34Z Article http://purl.org/eprint/type/JournalArticle 1866-749X 1613-7159 https://hdl.handle.net/1721.1/153640 Abdelhalim, A., Zhao, J. Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery. Public Transp (2024). PUBLISHER_CC en 10.1007/s12469-023-00346-3 Public Transport Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer Science and Business Media LLC Springer Berlin Heidelberg
title	Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery
topic	Management Science and Operations Research; Mechanical Engineering; Transportation; Information Systems
url	https://hdl.handle.net/1721.1/153640

