METFormer: a motion enhanced transformer for multiple object tracking

Multiple object tracking (MOT) is an important task in computer vision, especially for video analytics. Transformer-based methods are emerging approaches that use both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces METFormer, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique that mitigates the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first performs difference-guided global motion learning to obtain temporal information from adjacent frames. Building on the global motion, context-aware local object motion modeling then studies per-object motion patterns and enhances the feature representation of individual objects. Experimental results on the MOT17 benchmark show that the proposed method surpasses the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under the public detection setting.
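For readers who want a concrete picture of the idea summarized above, the following is a minimal, hypothetical sketch in PyTorch of global-local motion context learning: a global motion context is derived from the difference of adjacent-frame features, and per-object queries attend to it to enhance their representations. The module names, feature dimensions, and the attention-based fusion are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of the global-local motion context idea; NOT the
# authors' implementation. Shapes, module names, and the fusion mechanism
# are assumptions made for illustration.
import torch
import torch.nn as nn


class GlobalLocalMotionSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Difference-guided global motion: encode the difference between
        # adjacent frame feature maps into a single global motion token.
        self.diff_encoder = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Context-aware local object motion: per-object (track/detection)
        # queries attend to the global motion token to refine their features.
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_prev, feat_curr, object_queries):
        # feat_prev, feat_curr: (B, C, H, W) backbone features of adjacent frames
        # object_queries:       (B, N, C) per-object embeddings
        global_motion = self.diff_encoder(feat_curr - feat_prev)   # (B, C, 1, 1)
        global_motion = global_motion.flatten(2).transpose(1, 2)   # (B, 1, C)
        enhanced, _ = self.local_attn(object_queries, global_motion, global_motion)
        return self.norm(object_queries + enhanced)                # (B, N, C)


if __name__ == "__main__":
    model = GlobalLocalMotionSketch()
    prev_feat = torch.randn(2, 256, 32, 32)
    curr_feat = torch.randn(2, 256, 32, 32)
    queries = torch.randn(2, 10, 256)
    print(model(prev_feat, curr_feat, queries).shape)  # torch.Size([2, 10, 256])
```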

Bibliographic Details
Main Authors: Gao, Jianjun; Yap, Kim-Hui; Wang, Yi; Garg, Kratika; Han, Boon Siew
Other Authors: School of Electrical and Electronic Engineering
Format: Conference Paper
Language: English
Published: 2025
Subjects: Computer and Information Science; Multiple object tracking; Motion modeling
Online Access: https://hdl.handle.net/10356/182093
author Gao, Jianjun
Yap, Kim-Hui
Wang, Yi
Garg, Kratika
Han, Boon Siew
author2 School of Electrical and Electronic Engineering
collection NTU
description Multiple object tracking (MOT) is an important task in computer vision, especially for video analytics. Transformer-based methods are emerging approaches that use both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces METFormer, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique that mitigates the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first performs difference-guided global motion learning to obtain temporal information from adjacent frames. Building on the global motion, context-aware local object motion modeling then studies per-object motion patterns and enhances the feature representation of individual objects. Experimental results on the MOT17 benchmark show that the proposed method surpasses the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under the public detection setting.
format Conference Paper
id ntu-10356/182093
institution Nanyang Technological University
language English
publishDate 2025
record_format dspace
conference 2023 IEEE International Symposium on Circuits and Systems (ISCAS)
research_unit Schaeffler Hub for Advanced REsearch (SHARE) Lab
funder Agency for Science, Technology and Research (A*STAR)
grant I2001E0067
funding_note This research is supported by the Agency for Science, Technology and Research (A*STAR) under its IAF-ICP Programme I2001E0067 and the Schaeffler Hub for Advanced Research at NTU.
version Submitted/Accepted version
date_issued 2023
citation Gao, J., Yap, K., Wang, Y., Garg, K. & Han, B. S. (2023). METFormer: a motion enhanced transformer for multiple object tracking. 2023 IEEE International Symposium on Circuits and Systems (ISCAS). https://dx.doi.org/10.1109/ISCAS46773.2023.10182032
isbn 9781665451093
doi 10.1109/ISCAS46773.2023.10182032
scopus 2-s2.0-85167728762
rights © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ISCAS46773.2023.10182032.
file_format application/pdf
title METFormer: a motion enhanced transformer for multiple object tracking
topic Computer and Information Science
Multiple object tracking
Motion modeling
url https://hdl.handle.net/10356/182093