Transformed Dynamic Feature Pyramid for Small Object Detection

The low resolution and less feature information of small targets make it difficult to recognize and locate, which greatly hinders the improvement of object detection accuracy. In this paper, an object detection model (TDFP) based on CNN and transformer was established, which combines local and globa...

Full description

Bibliographic Details
Main Authors: Hong Liang, Ying Yang, Qian Zhang, Linxia Feng, Jie Ren, Qiyao Liang
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9551884/
Description
Summary:The low resolution and less feature information of small targets make it difficult to recognize and locate, which greatly hinders the improvement of object detection accuracy. In this paper, an object detection model (TDFP) based on CNN and transformer was established, which combines local and global context to establish the connection between features. In the proposed transformed dynamic feature pyramid network, a transformer module was designed to dynamically transform and fuse the multi-scale features generated by the backbone to generate a transformed feature pyramid with richer multi-scale features and context information. In this transformation process, gate block is used to dynamically select single-scale transformation or cross-scale transformation to achieve an optimal style of transformation and fusion of multi-scale features. The experimental results show that the model improves the small targets detection accuracy based on CNN and transformer. Based on the backbone ResNeXt-101, TDFP achieves 46.2&#x0025; AP and 26.3&#x0025; AP<sub>S</sub> on MS COCO, and takes the amount of computation as a loss constraint to achieve a better balance between detection accuracy and computational complexity.
ISSN:2169-3536