Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery

Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This work is the first to utilize roadside urban imagery to aid transit agencies and practitioners in improving travel time prediction. We propose and evaluate an en...


Bibliographic Details
Main Authors: Abdelhalim, Awad; Zhao, Jinhua
Other Authors: Massachusetts Institute of Technology. Department of Urban Studies and Planning
Format: Article
Language: English
Published: Springer Science and Business Media LLC 2024
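Subjects: Management Science and Operations Research; Mechanical Engineering; Transportation; Information Systems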
Online Access: https://hdl.handle.net/1721.1/153640
author Abdelhalim, Awad
Zhao, Jinhua
author2 Massachusetts Institute of Technology. Department of Urban Studies and Planning
collection MIT
description Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This work is the first to utilize roadside urban imagery to aid transit agencies and practitioners in improving travel time prediction. We propose and evaluate an end-to-end framework integrating traditional transit data sources with a roadside camera for automated image data acquisition, labeling, and model training to predict transit travel times across a segment of interest. First, we show how the General Transit Feed Specification real-time data can be utilized as an efficient activation mechanism for a roadside camera unit monitoring a segment of interest. Second, automated vehicle location data is utilized to generate ground truth labels for the acquired images based on the observed transit travel time percentiles across the camera-monitored segment during the time of image acquisition. Finally, the generated labeled image dataset is used to train and thoroughly evaluate a Vision Transformer (ViT) model to predict a discrete transit travel time range (band). The results of this exploratory study illustrate that the ViT model is able to learn image features and contents that best help it deduce the expected travel time range with an average validation accuracy ranging between 80 and 85%. We assess the interpretability of the ViT model’s predictions and showcase how this discrete travel time band prediction can subsequently improve continuous transit travel time estimation. The workflow and results presented in this study provide an end-to-end, scalable, automated, and highly efficient approach for integrating traditional transit data sources and roadside imagery to improve the estimation of transit travel duration. 
This work also demonstrates the added value of incorporating real-time information from computer-vision sources, which are becoming increasingly accessible and can have major implications for improving transit operations and passenger real-time information.
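The percentile-based labeling step described in the abstract can be illustrated with a minimal sketch. It assumes quartile bands, travel times in seconds, and made-up file names and values; the abstract does not state the number of bands or the cut points used, so none of this should be read as the authors' implementation.

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_cut_points(travel_times, n_bands=4):
    """Return the n_bands - 1 cut points that split observed travel times into bands."""
    # n_bands=4 (quartiles) is an assumption made for illustration only.
    return quantiles(travel_times, n=n_bands, method="inclusive")


def band_label(travel_time, cut_points):
    """Map one travel time to a band index in 0 .. len(cut_points)."""
    return bisect_right(cut_points, travel_time)


# Hypothetical vehicle-location travel times (seconds) across the monitored segment.
observed = [60, 70, 80, 90, 100, 110, 120, 130, 140]
cuts = percentile_cut_points(observed)

# Hypothetical images, each paired with the travel time observed at capture time.
images = {"img_001.jpg": 65, "img_002.jpg": 105, "img_003.jpg": 138}
labels = {name: band_label(t, cuts) for name, t in images.items()}
print(cuts, labels)
```

Each image ends up with a discrete band index, which is the kind of target a classifier such as a ViT could be trained on.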
first_indexed 2024-09-23T09:58:04Z
format Article
id mit-1721.1/153640
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T09:58:04Z
publishDate 2024
publisher Springer Science and Business Media LLC
record_format dspace
spelling mit-1721.1/153640
2024-09-20T19:22:48Z
2024-03-07T19:23:44Z
2024-03-07T19:23:44Z
2024-02-27
2024-03-03T04:10:34Z
Article
http://purl.org/eprint/type/JournalArticle
1866-749X
1613-7159
https://hdl.handle.net/1721.1/153640
Abdelhalim, A., Zhao, J. Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery. Public Transp (2024).
PUBLISHER_CC
en
10.1007/s12469-023-00346-3
Public Transport
Creative Commons Attribution
https://creativecommons.org/licenses/by/4.0/
The Author(s)
application/pdf
Springer Science and Business Media LLC
Springer Berlin Heidelberg
title Computer vision for transit travel time prediction: an end-to-end framework using roadside urban imagery
topic Management Science and Operations Research
Mechanical Engineering
Transportation
Information Systems
url https://hdl.handle.net/1721.1/153640