On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval
Visual-semantic embedding (VSE) networks create joint image–text representations to map images and texts in a shared embedding space to enable various information retrieval-related tasks, such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2021-07-01
|
Series: | Journal of Imaging |
Subjects: | |
Online Access: | https://www.mdpi.com/2313-433X/7/8/125 |