On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval
Visual-semantic embedding (VSE) networks map images and texts into a shared embedding space, producing joint representations that enable information-retrieval tasks such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art...
| Main Authors: | Yan Gong, Georgina Cosma, Hui Fang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2021-07-01 |
| Series: | Journal of Imaging |
| Online Access: | https://www.mdpi.com/2313-433X/7/8/125 |
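The abstract frames retrieval as a nearest-neighbour search over a shared embedding space. As a rough illustration only, not the paper's implementation, the NumPy sketch below ranks candidate captions for an image query by cosine similarity; the embeddings are random placeholders standing in for the outputs of trained encoders, and the dimensionality is assumed.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each embedding to unit length so a dot product
    equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder vectors standing in for the outputs of trained image and
# text encoders; a real VSE network would learn these jointly.
rng = np.random.default_rng(0)
dim = 512                                                        # assumed shared dimensionality
caption_embeddings = l2_normalize(rng.normal(size=(1000, dim)))  # candidate texts
image_query = l2_normalize(rng.normal(size=(dim,)))              # one image query

# Image-to-text retrieval then reduces to nearest-neighbour search in
# the shared space: score every caption against the query and rank.
scores = caption_embeddings @ image_query
top_k = np.argsort(-scores)[:5]  # indices of the 5 highest-scoring captions
print(top_k, scores[top_k])
```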
Similar Items
- Deep Semantic Cross Modal Hashing Based on Graph Similarity of Modal-Specific
  by: Junzheng Li
  Published: (2021-01-01)
- Learning Adequate Alignment and Interaction for Cross-Modal Retrieval
  by: MingKang Wang, et al.
  Published: (2023-12-01)
- Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning
  by: Sanghyun Seo, et al.
  Published: (2019-08-01)
- MESH: A Flexible Manifold-Embedded Semantic Hashing for Cross-Modal Retrieval
  by: Fangming Zhong, et al.
  Published: (2020-01-01)
- Image–Text Cross-Modal Retrieval with Instance Contrastive Embedding
  by: Ruigeng Zeng, et al.
  Published: (2024-01-01)