METFormer: a motion enhanced transformer for multiple object tracking
Multiple object tracking (MOT) is an important task in computer vision, especially video analytics. Transformer-based methods are emerging approaches using both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability.
Main Authors: | Gao, Jianjun; Yap, Kim-Hui; Wang, Yi; Garg, Kratika; Han, Boon Siew |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Conference Paper |
Language: | English |
Published: | 2025 |
Subjects: | Computer and Information Science; Multiple object tracking; Motion modeling |
Online Access: | https://hdl.handle.net/10356/182093 |
author | Gao, Jianjun; Yap, Kim-Hui; Wang, Yi; Garg, Kratika; Han, Boon Siew
author2 | School of Electrical and Electronic Engineering |
author_sort | Gao, Jianjun |
collection | NTU |
description | Multiple object tracking (MOT) is an important task in computer vision, especially video analytics. Transformer-based methods are emerging approaches using both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces a new METFormer model, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique to mitigate the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first centers on difference-guided global motion learning to obtain temporal information from adjacent frames. Based on global motion, we leverage context-aware local object motion modelling to study motion patterns and enhance the feature representation for individual objects. Experimental results on the benchmark MOT17 dataset show that our proposed method can surpass the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under public detection settings. |
format | Conference Paper |
id | ntu-10356/182093 |
institution | Nanyang Technological University |
language | English |
publishDate | 2025 |
record_format | dspace |
spelling | ntu-10356/182093, 2025-01-10T15:42:27Z
Title: METFormer: a motion enhanced transformer for multiple object tracking
Authors: Gao, Jianjun; Yap, Kim-Hui; Wang, Yi; Garg, Kratika; Han, Boon Siew
School: School of Electrical and Electronic Engineering
Conference: 2023 IEEE International Symposium on Circuits and Systems (ISCAS)
Research centre: Schaeffler Hub for Advanced REsearch (SHARE) Lab
Subjects: Computer and Information Science; Multiple object tracking; Motion modeling
Abstract: as given in the description field above.
Funding: Agency for Science, Technology and Research (A*STAR). This research is supported by the Agency for Science, Technology and Research (A*STAR) under its IAF-ICP Programme I2001E0067 and the Schaeffler Hub for Advanced Research at NTU.
Version: Submitted/Accepted version
Dates: 2025-01-09T06:18:07Z (accessioned); 2025-01-09T06:18:07Z (available); 2023 (issued)
Type: Conference Paper
Citation: Gao, J., Yap, K., Wang, Y., Garg, K. & Han, B. S. (2023). METFormer: a motion enhanced transformer for multiple object tracking. 2023 IEEE International Symposium on Circuits and Systems (ISCAS). https://dx.doi.org/10.1109/ISCAS46773.2023.10182032
ISBN: 9781665451093
Handle: https://hdl.handle.net/10356/182093
DOI: 10.1109/ISCAS46773.2023.10182032
Scopus EID: 2-s2.0-85167728762
Language: en
Grant: I2001E0067
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ISCAS46773.2023.10182032.
File format: application/pdf
title | METFormer: a motion enhanced transformer for multiple object tracking |
topic | Computer and Information Science Multiple object tracking Motion modeling |
url | https://hdl.handle.net/10356/182093 |
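The description field above summarizes the paper's global-local motion context learning technique: difference-guided global motion learning extracts temporal information from adjacent frames, and context-aware local object motion modelling then enhances the feature representation of individual objects. Below is a minimal PyTorch-style sketch of how such a two-stage motion module could be wired. It is purely illustrative: the module names, the frame-differencing encoder, the attention-based fusion, and all shapes and hyperparameters are assumptions for illustration and do not reproduce the actual METFormer implementation.

```python
# Illustrative sketch only: NOT the authors' released code. Assumes a
# difference-guided global motion encoder (frame differencing -> conv features)
# and a context-aware local module in which per-object track/detection queries
# attend to the global motion map. All names, shapes, and hyperparameters are
# invented for illustration.
import torch
import torch.nn as nn


class GlobalMotionEncoder(nn.Module):
    """Encodes the difference between adjacent frames into a global motion map."""

    def __init__(self, in_channels=3, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, frame_t, frame_t_prev):
        # Difference-guided: temporal change between adjacent frames drives the features.
        diff = frame_t - frame_t_prev                 # (B, 3, H, W)
        return self.encoder(diff)                     # (B, dim, H/4, W/4)


class LocalObjectMotionModule(nn.Module):
    """Enhances per-object (track/detection) queries with global motion context."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, object_queries, global_motion):
        # Flatten the spatial motion map into a sequence of motion tokens.
        b, c, h, w = global_motion.shape
        motion_tokens = global_motion.flatten(2).transpose(1, 2)    # (B, H*W, dim)
        # Each object query attends to the global motion context.
        attended, _ = self.cross_attn(object_queries, motion_tokens, motion_tokens)
        return self.norm(object_queries + attended)                 # residual enhancement


if __name__ == "__main__":
    frame_t, frame_prev = torch.randn(2, 3, 256, 256), torch.randn(2, 3, 256, 256)
    queries = torch.randn(2, 50, 256)                # 50 track/detection queries
    motion = GlobalMotionEncoder()(frame_t, frame_prev)
    enhanced = LocalObjectMotionModule()(queries, motion)
    print(enhanced.shape)                            # torch.Size([2, 50, 256])
```

In this sketch the global motion map is treated as a set of context tokens that every track/detection query attends to, which is one straightforward way to realize "context-aware" per-object enhancement on top of a frame-difference global motion signal.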