Spatial guided image captioning: Guiding attention with object's spatial interaction

Abstract: Nowadays relational position embedding is widely used in many large multi‐modal models. It begins with relational captioning (a branch of image captioning) and contains two procedures: geometric modelling and prior attention. However, there are some problems that remain unsolved in the conv...

Bibliographic Details
Main Authors: Runyan Du, Wenkai Zhang, Shuoke Li, Jialiang Chen, Zhi Guo
Format: Article
Language: English
Published: Wiley 2024-10-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.13124