METFormer: a motion enhanced transformer for multiple object tracking

Multiple object tracking (MOT) is an important task in computer vision, especially for video analytics. Transformer-based methods are emerging approaches that use both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces METFormer, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique that mitigates the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first performs difference-guided global motion learning to obtain temporal information from adjacent frames. Building on the global motion, context-aware local object motion modeling then studies per-object motion patterns and enhances the feature representation of individual objects. Experimental results on the MOT17 benchmark show that the proposed method surpasses the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under the public detection setting.
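For readers who want a concrete picture of the idea summarized above, the following is a minimal, hypothetical sketch in PyTorch of global-local motion context learning: a global motion context is derived from the difference of adjacent-frame features, and per-object queries attend to it to enhance their representations. The module names, feature dimensions, and the attention-based fusion are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of the global-local motion context idea; NOT the
# authors' implementation. Shapes, module names, and the fusion mechanism
# are assumptions made for illustration.
import torch
import torch.nn as nn


class GlobalLocalMotionSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Difference-guided global motion: encode the difference between
        # adjacent frame feature maps into a single global motion token.
        self.diff_encoder = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Context-aware local object motion: per-object (track/detection)
        # queries attend to the global motion token to refine their features.
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_prev, feat_curr, object_queries):
        # feat_prev, feat_curr: (B, C, H, W) backbone features of adjacent frames
        # object_queries:       (B, N, C) per-object embeddings
        global_motion = self.diff_encoder(feat_curr - feat_prev)   # (B, C, 1, 1)
        global_motion = global_motion.flatten(2).transpose(1, 2)   # (B, 1, C)
        enhanced, _ = self.local_attn(object_queries, global_motion, global_motion)
        return self.norm(object_queries + enhanced)                # (B, N, C)


if __name__ == "__main__":
    model = GlobalLocalMotionSketch()
    prev_feat = torch.randn(2, 256, 32, 32)
    curr_feat = torch.randn(2, 256, 32, 32)
    queries = torch.randn(2, 10, 256)
    print(model(prev_feat, curr_feat, queries).shape)  # torch.Size([2, 10, 256])
```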

Bibliographic Details
Main Authors: Gao, Jianjun; Yap, Kim-Hui; Wang, Yi; Garg, Kratika; Han, Boon Siew
Other Authors: School of Electrical and Electronic Engineering
Format: Conference Paper
Language: English
Published: 2025
Subjects: Computer and Information Science; Multiple object tracking; Motion modeling
Online Access: https://hdl.handle.net/10356/182093
author Gao, Jianjun
Yap, Kim-Hui
Wang, Yi
Garg, Kratika
Han, Boon Siew
author2 School of Electrical and Electronic Engineering
collection NTU
description Multiple object tracking (MOT) is an important task in computer vision, especially for video analytics. Transformer-based methods are emerging approaches that use both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces METFormer, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique that mitigates the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first performs difference-guided global motion learning to obtain temporal information from adjacent frames. Building on the global motion, context-aware local object motion modeling then studies per-object motion patterns and enhances the feature representation of individual objects. Experimental results on the MOT17 benchmark show that the proposed method surpasses the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under the public detection setting.
format Conference Paper
id ntu-10356/182093
institution Nanyang Technological University
language English
publishDate 2025
record_format dspace
conference 2023 IEEE International Symposium on Circuits and Systems (ISCAS)
research_unit Schaeffler Hub for Advanced REsearch (SHARE) Lab
funder Agency for Science, Technology and Research (A*STAR)
grant I2001E0067
funding_note This research is supported by the Agency for Science, Technology and Research (A*STAR) under its IAF-ICP Programme I2001E0067 and the Schaeffler Hub for Advanced Research at NTU.
version Submitted/Accepted version
date_issued 2023
citation Gao, J., Yap, K., Wang, Y., Garg, K. & Han, B. S. (2023). METFormer: a motion enhanced transformer for multiple object tracking. 2023 IEEE International Symposium on Circuits and Systems (ISCAS). https://dx.doi.org/10.1109/ISCAS46773.2023.10182032
isbn 9781665451093
doi 10.1109/ISCAS46773.2023.10182032
scopus 2-s2.0-85167728762
rights © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ISCAS46773.2023.10182032.
file_format application/pdf
title METFormer: a motion enhanced transformer for multiple object tracking
topic Computer and Information Science
Multiple object tracking
Motion modeling
url https://hdl.handle.net/10356/182093