Combining transformer and CNN for object detection in UAV imagery



Bibliographic Details
Main Authors: Willy Fitra Hendria, Quang Thinh Phan, Fikriansyah Adzaka, Cheol Jeong
Format: Article
Language: English
Published: Elsevier 2023-04-01
Series: ICT Express
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2405959521001715
Description
Summary: Combining multiple models is a well-known technique to improve predictive performance in challenging tasks such as object detection in UAV imagery. In this paper, we propose fusion of transformer-based and convolutional neural network-based (CNN) models with two approaches. First, we ensemble Swin Transformer and DetectoRS with ResNet backbone, and conduct performance comparison on four typical methods for combining predictions of multiple object detection models. Second, we design a hybrid architecture by combining Swin Transformer backbone with a neck of DetectoRS. We show that the fusion of the transformer and the CNN-based models performs better compared to the respective baseline model.
ISSN: 2405-9595