Spatial Encoding and Multi-layer Joint Encoding Enhanced Transformer for Image Captioning

Image captioning is an active research topic in computer vision. It is a cross-media data analysis task that combines computer vision and natural language processing. It describes an image by understanding its content and generating captions that are both semantically...

Bibliographic Details
Main Authors: FANG Zhong-jun, ZHANG Jing, LI Dong-dong
Format: Article
Language: zho
Published: Editorial office of Computer Science 2022-10-01
Series: Jisuanji kexue
Online Access: https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-10-151.pdf