DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection
We present a framework for attention-based video object detection using a simple yet effective external memory management algorithm. Attention mechanisms have been adopted in video object detection tasks to enrich the features of key frames using adjacent frames. Although several recent studies utilized frame-level first-in-first-out (FIFO) memory to collect global video information, such a memory structure suffers from collection inefficiency, which results in low attention performance and high computational cost. To address this issue, we developed a novel scheme called diversity-aware feature aggregation (DAFA). Whereas other methods cannot store sufficient feature information without expanding memory capacity, DAFA efficiently collects diverse features while avoiding redundancy using a simple Euclidean distance-based metric. Experimental results on the ImageNet VID dataset demonstrate that our lightweight model with global attention achieves 83.5 mAP on the ResNet-101 backbone, which exceeds the accuracy of most existing methods while requiring minimal runtime. Our method with global and local attention stages obtains 84.5 and 85.9 mAP on ResNet-101 and ResNeXt-101, respectively, thus achieving state-of-the-art performance without requiring additional post-processing methods.
Main Authors: | Si-Dong Roh, Ki-Seok Chung |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Attention mechanism; diversity-aware; neural networks; spatio-temporal; video object detection |
Online Access: | https://ieeexplore.ieee.org/document/9874741/ |
_version_ | 1798033392878485504 |
---|---|
author | Si-Dong Roh; Ki-Seok Chung |
author_facet | Si-Dong Roh; Ki-Seok Chung |
author_sort | Si-Dong Roh |
collection | DOAJ |
description | We present a framework for attention-based video object detection using a simple yet effective external memory management algorithm. Attention mechanisms have been adopted in video object detection tasks to enrich the features of key frames using adjacent frames. Although several recent studies utilized frame-level first-in-first-out (FIFO) memory to collect global video information, such a memory structure suffers from collection inefficiency, which results in low attention performance and high computational cost. To address this issue, we developed a novel scheme called diversity-aware feature aggregation (DAFA). Whereas other methods cannot store sufficient feature information without expanding memory capacity, DAFA efficiently collects diverse features while avoiding redundancy using a simple Euclidean distance-based metric. Experimental results on the ImageNet VID dataset demonstrate that our lightweight model with global attention achieves 83.5 mAP on the ResNet-101 backbone, which exceeds the accuracy of most existing methods while requiring minimal runtime. Our method with global and local attention stages obtains 84.5 and 85.9 mAP on ResNet-101 and ResNeXt-101, respectively, thus achieving state-of-the-art performance without requiring additional post-processing methods. |
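The abstract above only sketches how DAFA manages its external memory, so the following minimal Python sketch illustrates one way a diversity-aware, Euclidean distance-based memory update could look. The class name `DiversityAwareMemory`, the `min_distance` threshold, and the eviction rule are illustrative assumptions for this sketch, not the exact procedure published in the paper.

```python
# Minimal sketch only: a fixed-capacity feature memory that keeps its
# entries mutually diverse under a Euclidean-distance criterion.
# DiversityAwareMemory, min_distance, and the eviction rule are
# illustrative assumptions, not the DAFA paper's exact procedure.
import numpy as np


class DiversityAwareMemory:
    def __init__(self, capacity: int, min_distance: float):
        self.capacity = capacity          # maximum number of stored feature vectors
        self.min_distance = min_distance  # redundancy threshold (Euclidean distance)
        self.features = []                # stored frame-level feature vectors

    def update(self, feature: np.ndarray) -> None:
        """Insert a frame-level feature only if it differs enough from
        every stored entry; otherwise discard it as redundant."""
        if not self.features:
            self.features.append(feature)
            return
        dists = np.linalg.norm(np.stack(self.features) - feature, axis=1)
        if dists.min() < self.min_distance:
            return  # too similar to an existing entry: skip to avoid redundancy
        if len(self.features) < self.capacity:
            self.features.append(feature)
        else:
            # Memory is full: overwrite the stored entry nearest to the
            # newcomer, i.e. the one it would make most redundant.
            self.features[int(dists.argmin())] = feature

    def aggregate(self) -> np.ndarray:
        """Stack stored features, e.g. for use as attention keys/values."""
        return np.stack(self.features)


# Example usage with random 256-D frame features.
memory = DiversityAwareMemory(capacity=8, min_distance=1.0)
for _ in range(100):
    memory.update(np.random.randn(256).astype(np.float32))
print(memory.aggregate().shape)  # (k, 256) with k <= 8
```

In contrast to a FIFO buffer, which always overwrites the oldest entry regardless of content, this update skips near-duplicate features and only evicts an entry that is already close to the incoming one, which is the general behavior the abstract attributes to DAFA.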
first_indexed | 2024-04-11T20:29:41Z |
format | Article |
id | doaj.art-c21154e315254c2a8aabc00c4f8467aa |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T20:29:41Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c21154e315254c2a8aabc00c4f8467aa; 2022-12-22T04:04:33Z; eng; IEEE; IEEE Access; 2169-3536; 2022-01-01; Vol. 10, pp. 93453-93463; doi:10.1109/ACCESS.2022.3203399; article 9874741; DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection; Si-Dong Roh (https://orcid.org/0000-0001-5961-948X), Department of Electronic Engineering, Hanyang University, Seoul, South Korea; Ki-Seok Chung (https://orcid.org/0000-0002-2908-8443), Department of Electronic Engineering, Hanyang University, Seoul, South Korea; We present a framework for attention-based video object detection using a simple yet effective external memory management algorithm. Attention mechanisms have been adopted in video object detection tasks to enrich the features of key frames using adjacent frames. Although several recent studies utilized frame-level first-in-first-out (FIFO) memory to collect global video information, such a memory structure suffers from collection inefficiency, which results in low attention performance and high computational cost. To address this issue, we developed a novel scheme called diversity-aware feature aggregation (DAFA). Whereas other methods cannot store sufficient feature information without expanding memory capacity, DAFA efficiently collects diverse features while avoiding redundancy using a simple Euclidean distance-based metric. Experimental results on the ImageNet VID dataset demonstrate that our lightweight model with global attention achieves 83.5 mAP on the ResNet-101 backbone, which exceeds the accuracy of most existing methods while requiring minimal runtime. Our method with global and local attention stages obtains 84.5 and 85.9 mAP on ResNet-101 and ResNeXt-101, respectively, thus achieving state-of-the-art performance without requiring additional post-processing methods.; https://ieeexplore.ieee.org/document/9874741/; Attention mechanism; diversity-aware; neural networks; spatio-temporal; video object detection |
spellingShingle | Si-Dong Roh; Ki-Seok Chung; DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection; IEEE Access; Attention mechanism; diversity-aware; neural networks; spatio-temporal; video object detection |
title | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection |
title_full | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection |
title_fullStr | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection |
title_full_unstemmed | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection |
title_short | DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection |
title_sort | dafa diversity aware feature aggregation for attention based video object detection |
topic | Attention mechanism; diversity-aware; neural networks; spatio-temporal; video object detection |
url | https://ieeexplore.ieee.org/document/9874741/ |
work_keys_str_mv | AT sidongroh dafadiversityawarefeatureaggregationforattentionbasedvideoobjectdetection AT kiseokchung dafadiversityawarefeatureaggregationforattentionbasedvideoobjectdetection |