DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection

Bibliographic Details
Main Authors: Si-Dong Roh, Ki-Seok Chung
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Attention mechanism; diversity-aware; neural networks; spatio-temporal; video object detection
Online Access: https://ieeexplore.ieee.org/document/9874741/
_version_ 1798033392878485504
author Si-Dong Roh
Ki-Seok Chung
author_facet Si-Dong Roh
Ki-Seok Chung
author_sort Si-Dong Roh
collection DOAJ
description We present a framework for attention-based video object detection using a simple yet effective external memory management algorithm. An attention mechanism has been adopted in the video object detection task to enrich the features of key frames using adjacent frames. Although several recent studies utilized frame-level first-in-first-out (FIFO) memory to collect global video information, such a memory structure suffers from collection inefficiency, which results in low attention performance and high computational cost. To address this issue, we developed a novel scheme called diversity-aware feature aggregation (DAFA). Whereas other methods do not store sufficient feature information without expanding memory capacity, DAFA efficiently collects diverse features while avoiding redundancy using a simple Euclidean distance-based metric. Experimental results on the ImageNet VID dataset demonstrate that our lightweight model with global attention achieves 83.5 mAP on the ResNet-101 backbone, which exceeds the accuracy levels of most existing methods with minimal runtime. Our method with global and local attention stages obtains 84.5 and 85.9 mAP on ResNet-101 and ResNeXt-101, respectively, thus achieving state-of-the-art performance without requiring additional post-processing methods.
first_indexed 2024-04-11T20:29:41Z
format Article
id doaj.art-c21154e315254c2a8aabc00c4f8467aa
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T20:29:41Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-c21154e315254c2a8aabc00c4f8467aa
2022-12-22T04:04:33Z
eng
IEEE
IEEE Access
2169-3536
2022-01-01
10
93453
93463
10.1109/ACCESS.2022.3203399
9874741
DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection
Si-Dong Roh 0 https://orcid.org/0000-0001-5961-948X
Ki-Seok Chung 1 https://orcid.org/0000-0002-2908-8443
Department of Electronic Engineering, Hanyang University, Seoul, South Korea
Department of Electronic Engineering, Hanyang University, Seoul, South Korea
We present a framework for attention-based video object detection using a simple yet effective external memory management algorithm. An attention mechanism has been adopted in the video object detection task to enrich the features of key frames using adjacent frames. Although several recent studies utilized frame-level first-in-first-out (FIFO) memory to collect global video information, such a memory structure suffers from collection inefficiency, which results in low attention performance and high computational cost. To address this issue, we developed a novel scheme called diversity-aware feature aggregation (DAFA). Whereas other methods do not store sufficient feature information without expanding memory capacity, DAFA efficiently collects diverse features while avoiding redundancy using a simple Euclidean distance-based metric. Experimental results on the ImageNet VID dataset demonstrate that our lightweight model with global attention achieves 83.5 mAP on the ResNet-101 backbone, which exceeds the accuracy levels of most existing methods with minimal runtime. Our method with global and local attention stages obtains 84.5 and 85.9 mAP on ResNet-101 and ResNeXt-101, respectively, thus achieving state-of-the-art performance without requiring additional post-processing methods.
https://ieeexplore.ieee.org/document/9874741/
Attention mechanism
diversity-aware
neural networks
spatio-temporal
video object detection
title DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection
topic Attention mechanism
diversity-aware
neural networks
spatio-temporal
video object detection
url https://ieeexplore.ieee.org/document/9874741/