Semantic-Aligned Cross-Modal Visual Grounding Network with Transformers

Multi-modal deep learning methods have achieved great improvements in visual grounding; their objective is to localize text-specified objects in images. Most of the existing methods can localize and classify objects with significant appearance differences but suffer from the misclassification proble...

Bibliographic Details
Main Authors:	Qianjun Zhang, Jin Yuan
Format:	Article
Language:	English
Published:	MDPI AG 2023-05-01
Series:	Applied Sciences
Subjects:	fine-grained visual grounding; contrastive learning; multi-modal feature; cross-modal fusion
Online Access:	https://www.mdpi.com/2076-3417/13/9/5649

Similar Items