An Attentive Fourier-Augmented Image-Captioning Transformer

Many vision–language models that output natural language, such as image-captioning models, usually use image features merely for grounding the captions and most of the good performance of the model can be attributed to the language model, which does all the heavy lifting, a phenomenon that has persi...

Bibliographic Details
Main Authors:	Raymond Ian Osolo, Zhan Yang, Jun Long
Format:	Article
Language:	English
Published:	MDPI AG 2021-09-01
Series:	Applied Sciences
Subjects:	image-captioning; deep learning; transformers
Online Access:	https://www.mdpi.com/2076-3417/11/18/8354
