Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection
Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supporting information in correlation frames boosts model performance in VOD tasks. In this paper, we not only improve the method of finding supporting information from correl...
Main Authors: | Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-10-01 |
Series: | Electronics |
Subjects: | video object detection; feature refinement; feature aggregation; Taylor series |
Online Access: | https://www.mdpi.com/2079-9292/12/20/4305 |
_version_ | 1797574034985058304 |
---|---|
author | Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao |
author_facet | Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao |
author_sort | Liule Chen |
collection | DOAJ |
description | Video object detection (VOD) is a sophisticated visual task. It is widely agreed that finding effective supporting information in correlation frames boosts model performance in VOD tasks. In this paper, we not only improve the method of finding supporting information in correlation frames but also improve the quality of the features extracted from those frames, which strengthens their fusion and lets the model achieve better performance. The feature refinement module (FRM) in our model refines features through a key–value encoding dictionary based on the even-order Taylor series, and the refined features guide the fusion of features at different stages. In the correlation frame fusion stage, a generative MLP is applied in the feature aggregation module (DFAM) to fuse the refined features extracted from the correlation frames. Experiments demonstrate the effectiveness of the proposed approach: our YOLOX-based model achieves 83.3% AP50 on the ImageNet VID dataset. |
first_indexed | 2024-03-10T21:17:30Z |
format | Article |
id | doaj.art-5b14c9c9108b47379517042e3292414f |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T21:17:30Z |
publishDate | 2023-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-5b14c9c9108b47379517042e3292414f; 2023-11-19T16:19:46Z; eng; MDPI AG; Electronics; ISSN 2079-9292; 2023-10-01; vol. 12, no. 20, art. 4305; DOI 10.3390/electronics12204305; Liule Chen, Jianqiang Li, Yunyu Li, Qing Zhao (all: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China); https://www.mdpi.com/2079-9292/12/20/4305 |
title | Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection |
topic | video object detection; feature refinement; feature aggregation; Taylor series |
url | https://www.mdpi.com/2079-9292/12/20/4305 |