Stack-VS : stacked visual-semantic attention for image caption generation
Recently, automatic image caption generation has been an important focus of the work on multimodal translation task. Existing approaches can be roughly categorized into two classes, top-down and bottom-up, the former transfers the image information (called as visual-level feature) directly into a ca...
Main Authors: | , , , , |
---|---|
Other Authors: | |
Format: | Journal Article |
Language: | English |
Published: |
2021
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/148460 |