Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning
Transformer-based image captioning models have recently achieved remarkable performance by using new fully attentive paradigms. However, existing models generally follow the conventional language model of predicting the next word conditioned on the visual features and partially generated words. They...
Main Authors: | Dong Wang, Bing Liu, Yong Zhou, Mingming Liu, Peng Liu, Rui Yao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-11-01 |
Series: | Applied Sciences |
Online Access: | https://www.mdpi.com/2076-3417/12/23/11875 |
Similar Items
- Image Captioning Model Using Part-of-Speech Guidance Module for Description With Diverse Vocabulary
  by: Ju-Won Bae, et al.
  Published: (2022-01-01)
- Learn and Tell: Learning Priors for Image Caption Generation
  by: Pei Liu, et al.
  Published: (2020-10-01)
- Full-Memory Transformer for Image Captioning
  by: Tongwei Lu, et al.
  Published: (2023-01-01)
- An Attentive Fourier-Augmented Image-Captioning Transformer
  by: Raymond Ian Osolo, et al.
  Published: (2021-09-01)
- Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning
  by: Deema Abdal Hafeth, et al.
  Published: (2024-03-01)