Spatial guided image captioning: Guiding attention with object's spatial interaction
Abstract: Nowadays, relational position embedding is widely used in many large multi‐modal models. It originated in relational captioning (a branch of image captioning) and comprises two procedures: geometric modelling and prior attention. However, some problems remain unsolved in the conv...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-10-01 |
Series: | IET Image Processing |
Subjects: | |
Online Access: | https://doi.org/10.1049/ipr2.13124 |