Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches
Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN depends heavily on both the raw sensor data and its associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale...
| Main Authors: | Jose L. Gómez, Gabriel Villalonga, Antonio M. López |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2021-05-01 |
| Series: | Sensors |
| Online Access: | https://www.mdpi.com/1424-8220/21/9/3185 |
Similar Items
- Co-Training for On-Board Deep Object Detection
  by: Gabriel Villalonga, et al.
  Published: (2020-01-01)
- Bi-Att3DDet: Attention-Based Bi-Directional Fusion for Multi-Modal 3D Object Detection
  by: Xu Gao, et al.
  Published: (2025-01-01)
- A Survey of Vision and Language Related Multi-Modal Task
  by: Lanxiao Wang, et al.
  Published: (2022-12-01)
- Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network
  by: Jinpeng Wang, et al.
  Published: (2024-10-01)
- Single-Stage Extensive Semantic Fusion for multi-modal sarcasm detection
  by: Hong Fang, et al.
  Published: (2024-07-01)