Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes
Recent Transformer-based object detectors have achieved remarkable performance on benchmark datasets, but few have addressed the real-world challenge of object detection in crowded scenes using transformers. This limitation stems from the fixed query set size of the transformer decoder, which restricts the model’s inference capacity.
Main Authors: | Hyeong Kyu Choi, Chong Keun Paik, Hyun Woo Ko, Min-Chul Park, Hyunwoo J. Kim |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Computer vision; object detection; detection transformers; dynamic computation |
Online Access: | https://ieeexplore.ieee.org/document/10177153/ |
_version_ | 1797746150153912320 |
---|---|
author | Hyeong Kyu Choi; Chong Keun Paik; Hyun Woo Ko; Min-Chul Park; Hyunwoo J. Kim
author_facet | Hyeong Kyu Choi; Chong Keun Paik; Hyun Woo Ko; Min-Chul Park; Hyunwoo J. Kim
author_sort | Hyeong Kyu Choi |
collection | DOAJ |
description | Recent Transformer-based object detectors have achieved remarkable performance on benchmark datasets, but few have addressed the real-world challenge of object detection in crowded scenes using transformers. This limitation stems from the fixed query set size of the transformer decoder, which restricts the model’s inference capacity. To overcome this challenge, we propose Recurrent Detection Transformer (Recurrent DETR), an object detector that iterates the decoder block to render more predictions with a finite number of query tokens. Recurrent DETR can adaptively control the number of decoder block iterations based on the image’s crowdedness or complexity, resulting in a variable-size prediction set. This is enabled by our novel Pondering Hungarian Loss, which helps the model to learn when additional computation is required to identify all the objects in a crowded scene. We demonstrate the effectiveness of Recurrent DETR on two datasets: COCO 2017, which represents a standard setting, and CrowdHuman, which features a crowded setting. Our experiments on both datasets show that Recurrent DETR achieves significant performance gains of 0.8 AP and 0.4 AP, respectively, over its base architectures. Moreover, we conduct comprehensive analyses under different query set size constraints to provide a thorough evaluation of our proposed method. |
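The abstract describes the mechanism only at a high level. The sketch below is a minimal, hedged illustration of one plausible reading of it, assuming a PyTorch-style implementation: a single shared decoder block applied recurrently to a fixed query set, with a learned halting head that decides per image whether another iteration is warranted. All names here (RecurrentDecoderSketch, halt_head, max_iters, halt_threshold) and the pooling/halting details are assumptions introduced for illustration; they are not the paper's actual code.

```python
# Illustrative sketch only: a shared decoder block iterated with adaptive halting,
# loosely mirroring the idea in the abstract. Module names and hyperparameters are
# hypothetical, not taken from the Recurrent DETR implementation.
import torch
import torch.nn as nn


class RecurrentDecoderSketch(nn.Module):
    """Shared transformer decoder block applied recurrently, with a per-image halting head."""

    def __init__(self, d_model=256, nhead=8, num_queries=100,
                 num_classes=91, max_iters=4, halt_threshold=0.5):
        super().__init__()
        self.decoder_block = nn.TransformerDecoderLayer(
            d_model, nhead, dim_feedforward=2048, batch_first=True)
        self.query_embed = nn.Embedding(num_queries, d_model)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for the "no object" class
        self.box_head = nn.Linear(d_model, 4)                  # (cx, cy, w, h), normalized
        self.halt_head = nn.Linear(d_model, 1)                 # per-image halting score
        self.max_iters = max_iters
        self.halt_threshold = halt_threshold

    def forward(self, memory):
        # memory: encoder output features, shape (batch, num_tokens, d_model)
        bsz = memory.size(0)
        queries = self.query_embed.weight.unsqueeze(0).expand(bsz, -1, -1)
        all_logits, all_boxes, halt_probs = [], [], []
        for _ in range(self.max_iters):
            # Reuse the same decoder block; its output becomes the next iteration's input.
            queries = self.decoder_block(queries, memory)
            # Each iteration emits its own prediction set, so the total number of
            # predictions grows with the number of iterations actually executed.
            all_logits.append(self.class_head(queries))
            all_boxes.append(self.box_head(queries).sigmoid())
            # Pool the query features into a single halting probability per image.
            halt_probs.append(torch.sigmoid(self.halt_head(queries.mean(dim=1))))
            # At inference time, stop once every image in the batch has voted to halt.
            if not self.training and (halt_probs[-1] > self.halt_threshold).all():
                break
        return all_logits, all_boxes, halt_probs


# Hypothetical usage with random tensors standing in for encoder features.
if __name__ == "__main__":
    model = RecurrentDecoderSketch().eval()
    with torch.no_grad():
        logits, boxes, halts = model(torch.randn(2, 600, 256))
    print(len(logits), logits[0].shape, boxes[0].shape)  # iterations run, (2, 100, 92), (2, 100, 4)
```

In the actual model, per the abstract, training uses the proposed Pondering Hungarian Loss, which teaches the model when additional decoder iterations are required to cover all objects in a crowded scene; those training details go beyond this sketch and are not reproduced here.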
first_indexed | 2024-03-12T15:32:55Z |
format | Article |
id | doaj.art-e07a3bf8ea284797839bbe62ed40a841 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T15:32:55Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-e07a3bf8ea284797839bbe62ed40a841; 2023-08-09T23:00:39Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; Vol. 11, pp. 78623-78643; doi:10.1109/ACCESS.2023.3293532; IEEE document 10177153; Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes; Hyeong Kyu Choi (https://orcid.org/0000-0003-2090-9273), Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea; Chong Keun Paik, Samsung Electro-Mechanics, Suwon, Republic of Korea; Hyun Woo Ko, Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea; Min-Chul Park (https://orcid.org/0000-0002-8575-085X), Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea; Hyunwoo J. Kim (https://orcid.org/0000-0002-2181-9264), Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea; Recent Transformer-based object detectors have achieved remarkable performance on benchmark datasets, but few have addressed the real-world challenge of object detection in crowded scenes using transformers. This limitation stems from the fixed query set size of the transformer decoder, which restricts the model’s inference capacity. To overcome this challenge, we propose Recurrent Detection Transformer (Recurrent DETR), an object detector that iterates the decoder block to render more predictions with a finite number of query tokens. Recurrent DETR can adaptively control the number of decoder block iterations based on the image’s crowdedness or complexity, resulting in a variable-size prediction set. This is enabled by our novel Pondering Hungarian Loss, which helps the model to learn when additional computation is required to identify all the objects in a crowded scene. We demonstrate the effectiveness of Recurrent DETR on two datasets: COCO 2017, which represents a standard setting, and CrowdHuman, which features a crowded setting. Our experiments on both datasets show that Recurrent DETR achieves significant performance gains of 0.8 AP and 0.4 AP, respectively, over its base architectures. Moreover, we conduct comprehensive analyses under different query set size constraints to provide a thorough evaluation of our proposed method. https://ieeexplore.ieee.org/document/10177153/; Computer vision; object detection; detection transformers; dynamic computation
spellingShingle | Hyeong Kyu Choi; Chong Keun Paik; Hyun Woo Ko; Min-Chul Park; Hyunwoo J. Kim; Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes; IEEE Access; Computer vision; object detection; detection transformers; dynamic computation
title | Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes |
title_full | Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes |
title_fullStr | Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes |
title_full_unstemmed | Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes |
title_short | Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes |
title_sort | recurrent detr transformer based object detection for crowded scenes |
topic | Computer vision; object detection; detection transformers; dynamic computation
url | https://ieeexplore.ieee.org/document/10177153/ |
work_keys_str_mv | AT hyeongkyuchoi recurrentdetrtransformerbasedobjectdetectionforcrowdedscenes AT chongkeunpaik recurrentdetrtransformerbasedobjectdetectionforcrowdedscenes AT hyunwooko recurrentdetrtransformerbasedobjectdetectionforcrowdedscenes AT minchulpark recurrentdetrtransformerbasedobjectdetectionforcrowdedscenes AT hyunwoojkim recurrentdetrtransformerbasedobjectdetectionforcrowdedscenes |