MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images

Bibliographic Details
Main Authors: Danilo Avola, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Alessio Mecca, Daniele Pannone, Claudio Piciarelli
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Remote Sensing
Subjects: UAV, object detection, tracking, deep learning, aerial images
Online Access: https://www.mdpi.com/2072-4292/13/9/1670
_version_ 1827694152109785088
author Danilo Avola
Luigi Cinque
Anxhelo Diko
Alessio Fagioli
Gian Luca Foresti
Alessio Mecca
Daniele Pannone
Claudio Piciarelli
author_sort Danilo Avola
collection DOAJ
description Tracking objects across multiple video frames is challenging because occlusions, background clutter, lighting changes, and variations in object and camera viewpoint directly affect object detection. These issues are even more pronounced in images acquired by unmanned aerial vehicles (UAVs), where vehicle movement can also degrade image quality. A common strategy to address them is to analyze the input images at different scales, extracting as much information as possible to correctly detect and track objects across video sequences. Following this rationale, in this paper we introduce a simple yet effective multi-stream (MS) architecture in which the streams use different kernel sizes to simulate multi-scale image analysis. The proposed architecture is then used as the backbone of the well-known Faster R-CNN pipeline, defining an MS-Faster R-CNN object detector that consistently detects objects in video sequences. Subsequently, this detector is used jointly with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.
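The multi-stream idea in the description can be illustrated with a minimal, dependency-free sketch: the same image is filtered by parallel streams whose kernels differ in size, and the outputs are stacked as feature channels. The kernel sizes (3, 5, 7), the averaging weights, and all function names below are assumptions chosen for illustration; the record does not state the configuration used in the paper, and a real backbone would use learned convolutional layers.

```python
# Illustrative sketch only: kernel sizes, averaging weights, and function
# names are assumptions, not the configuration published in the paper.

def filter2d_same(image, kernel):
    """Cross-correlate a 2D list with a square, odd-sized kernel.

    Borders are zero-padded so the output has the same size as the input.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Averaging kernel standing in for learned weights."""
    value = 1.0 / (size * size)
    return [[value] * size for _ in range(size)]


def multi_stream_features(image, kernel_sizes=(3, 5, 7)):
    """Run one stream per kernel size and return the outputs as channels."""
    return [filter2d_same(image, box_kernel(k)) for k in kernel_sizes]


# A single bright pixel, e.g. a small object seen from above.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
features = multi_stream_features(image)
```

Each stream responds to the same pixel over a different neighborhood: the 3x3 stream keeps the response local, while the 7x7 stream spreads it over a wider area, which is the sense in which different kernel sizes approximate looking at the image at several scales.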
first_indexed 2024-03-10T11:58:12Z
format Article
id doaj.art-ab78694468e540bd9bace0bb6c9b5484
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-10T11:58:12Z
publishDate 2021-04-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-ab78694468e540bd9bace0bb6c9b5484 | 2023-11-21T17:08:26Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2021-04-01 | vol. 13, no. 9, art. 1670 | 10.3390/rs13091670
Danilo Avola, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Luigi Cinque, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Anxhelo Diko, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Alessio Fagioli, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Gian Luca Foresti, Department of Mathematics, Computer Science and Physics, University of Udine, 33100 Udine, Italy
Alessio Mecca, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Daniele Pannone, Department of Computer Science, Sapienza University, 00198 Rome, Italy
Claudio Piciarelli, Department of Mathematics, Computer Science and Physics, University of Udine, 33100 Udine, Italy
title MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images
topic UAV
object detection
tracking
deep learning
aerial images
url https://www.mdpi.com/2072-4292/13/9/1670