Spatial Encoding and Multi-layer Joint Encoding Enhanced Transformer for Image Captioning
Image captioning is an active research topic in computer vision. It is a cross-media data analysis task that combines computer vision and natural language processing. It describes an image by understanding its content and generating captions that are both semantically...
Main Authors: | FANG Zhong-jun, ZHANG Jing, LI Dong-dong |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial office of Computer Science, 2022-10-01 |
Series: | Jisuanji kexue |
Subjects: | |
Online Access: | https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-151.pdf |
Similar Items
- Exploring Spatial-Based Position Encoding for Image Captioning
  by: Xiaobao Yang, et al.
  Published: (2023-11-01)
- Review of Image Captioning Methods Based on Encoding-Decoding Technology
  by: GENG Yaogang, MEI Hongyan, ZHANG Xing, LI Xiaohui
  Published: (2022-10-01)
- Switching Text-Based Image Encoders for Captioning Images With Text
  by: Arisa Ueda, et al.
  Published: (2023-01-01)
- Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism
  by: Dakshata Argade, et al.
  Published: (2024-02-01)
- Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning
  by: Xiangrong Zhang, et al.
  Published: (2023-01-01)