Efficient Road Traffic Video Congestion Classification Based on the Multi-Head Self-Attention Vision Transformer Model

Due to rapid population growth, traffic congestion has become one of the major issues in urban areas. The utilization of technology may help to address this issue. This paper proposes a new Multi-head Self-attention Vision Transformer (MSViT)-based macroscopic approach for road traffic congestion c...

Bibliographic Details
Main Authors: Khalladi Sofiane Abdelkrim, Ouessai Asmâa, Benamara Nadir Kamel, Keche Mokhtar
Format: Article
Language: English
Published: Sciendo 2024-02-01
Series: Transport and Telecommunication
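Subjects: road traffic classification, macroscopic approach, vision transformers, multi-head self-attention, deep learning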
Online Access: https://doi.org/10.2478/ttj-2024-0003
author Khalladi Sofiane Abdelkrim
Ouessai Asmâa
Benamara Nadir Kamel
Keche Mokhtar
collection DOAJ
description Due to rapid population growth, traffic congestion has become one of the major issues in urban areas, and technology may help to address it. This paper proposes a new macroscopic approach to road traffic congestion classification based on a Multi-head Self-attention Vision Transformer (MSViT). To evaluate this approach, we use the UCSD (University of California San Diego) dataset, which includes different weather conditions (clear, overcast and rainy) and different traffic scenarios (light, medium and heavy). The classification accuracy reaches 99.76% on this dataset and 99.37% when night-mode frames are added to it. The proposed MSViT-based method outperforms the state-of-the-art macroscopic and microscopic methods that have been evaluated on the same UCSD dataset, which makes it an efficient solution for traffic congestion prediction.
first_indexed 2024-03-07T23:48:05Z
format Article
id doaj.art-57809967deb243b58cd15b8340c24944
institution Directory Open Access Journal
issn 1407-6179
language English
last_indexed 2024-03-07T23:48:05Z
publishDate 2024-02-01
publisher Sciendo
record_format Article
series Transport and Telecommunication
spelling doaj.art-57809967deb243b58cd15b8340c24944
2024-02-19T09:04:01Z
eng
Sciendo
Transport and Telecommunication
1407-6179
2024-02-01
vol. 25, no. 1, pp. 20-30
10.2478/ttj-2024-0003
Efficient Road Traffic Video Congestion Classification Based on the Multi-Head Self-Attention Vision Transformer Model
Khalladi Sofiane Abdelkrim (0), Ouessai Asmâa (1), Benamara Nadir Kamel (2), Keche Mokhtar (3)
(0) Signals and images laboratory, Faculty of Electrical Engineering, Department of Electronics, University of Sciences and Technology of Oran Mohamed Boudiaf USTO-MB, B.P. 1505, El Mnaouar-Bir el Djir-Oran, Algeria
(1) Faculty of Technology, Department of Telecommunications, Dr. Tahar Moulay University, Saida, Algeria
(2) Signals and images laboratory, Faculty of Electrical Engineering, Department of Electronics, University of Sciences and Technology of Oran Mohamed Boudiaf USTO-MB, B.P. 1505, El Mnaouar-Bir el Djir-Oran, Algeria
(3) Signals and images laboratory, Faculty of Electrical Engineering, Department of Electronics, University of Sciences and Technology of Oran Mohamed Boudiaf USTO-MB, B.P. 1505, El Mnaouar-Bir el Djir-Oran, Algeria
https://doi.org/10.2478/ttj-2024-0003
road traffic classification; macroscopic approach; vision transformers; multi-head self-attention; deep learning
title Efficient Road Traffic Video Congestion Classification Based on the Multi-Head Self-Attention Vision Transformer Model
topic road traffic classification
macroscopic approach
vision transformers
multi-head self-attention
deep learning
url https://doi.org/10.2478/ttj-2024-0003