Stack-VS: stacked visual-semantic attention for image caption generation
Recently, automatic image caption generation has become an important focus of work on the multimodal translation task. Existing approaches can be roughly categorized into two classes, top-down and bottom-up: the former transfers the image information (called the visual-level feature) directly into a ca...
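The abstract's idea of attending over both visual-level features and semantic-level attributes, stacked so one attention conditions the next, can be illustrated with a toy sketch. This is not the paper's actual Stack-VS architecture; the shapes, the dot-product scoring, and the concatenation fusion below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attend(query, keys):
    """Soft attention: weight each key by the softmax of its dot product with the query."""
    scores = keys @ query                       # one score per key, shape (n,)
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys                       # context vector, shape (d,)

d = 8
hidden = rng.normal(size=d)                     # decoder hidden state (assumed)
visual = rng.normal(size=(5, d))                # 5 region features (visual level)
semantic = rng.normal(size=(3, d))              # 3 attribute embeddings (semantic level)

# Stack the two attentions: attend over visual features first, then use the
# resulting visual context to re-query the semantic attributes.
v_ctx = attend(hidden, visual)
s_ctx = attend(v_ctx, semantic)
fused = np.concatenate([v_ctx, s_ctx])          # fed to the word decoder at each step
print(fused.shape)                              # (16,)
```

A real captioner would use learned projection matrices and an LSTM decoder; the sketch only shows how a visual attention result can serve as the query for a second, semantic attention.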
Main Authors: Cheng, Ling; Wei, Wei; Mao, Xianling; Liu, Yong; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2021
Online Access: https://hdl.handle.net/10356/148460
Similar Items
- Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
  by: Ling Cheng, et al. Published: (2020-01-01)
- Video captioning with stacked attention and semantic hard pull
  by: Md. Mushfiqur Rahman, et al. Published: (2021-08-01)
- Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
  by: Zhou Lei, et al. Published: (2021-02-01)
- Novel Object Captioning with Semantic Match from External Knowledge
  by: Sen Du, et al. Published: (2023-07-01)
- VAA: Visual Aligning Attention Model for Remote Sensing Image Captioning
  by: Zhengyuan Zhang, et al. Published: (2019-01-01)