Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection

Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supportive information in correlation frames boosts model performance in VOD tasks. In this paper, we not only improve the method of finding supportive information from correl...

Bibliographic Details
Main Authors: Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Electronics
Subjects: video object detection; feature refinement; feature aggregation; Taylor series
Online Access: https://www.mdpi.com/2079-9292/12/20/4305
author Liule Chen
Jianqiang Li
Yunyu Li
Qing Zhao
collection DOAJ
description Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supportive information in correlation frames boosts model performance in VOD tasks. In this paper, we not only improve the method of finding supportive information from correlation frames but also improve the quality of the features extracted from them, which strengthens the fusion of correlation frames and lets the model achieve better performance. The feature refinement module (FRM) in our model refines the features through a key–value encoding dictionary based on the even-order Taylor series, and the refined features guide the fusion of features at different stages. In the correlation frame fusion stage, a generative MLP is applied in the feature aggregation module (DFAM) to fuse the refined features extracted from the correlation frames. Experiments demonstrate the effectiveness of the proposed approach. Our YOLOX-based model achieves 83.3% AP50 on the ImageNet VID dataset.
first_indexed 2024-03-10T21:17:30Z
format Article
id doaj.art-5b14c9c9108b47379517042e3292414f
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T21:17:30Z
publishDate 2023-10-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-5b14c9c9108b47379517042e3292414f
2023-11-19T16:19:46Z
eng
MDPI AG
Electronics, ISSN 2079-9292, 2023-10-01, vol. 12, no. 20, article 4305
DOI 10.3390/electronics12204305
Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao (all four authors: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
title Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection
topic video object detection
feature refinement
feature aggregation
Taylor series
url https://www.mdpi.com/2079-9292/12/20/4305