Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning
Transformer-based image captioning models have recently achieved remarkable performance by using new fully attentive paradigms. However, existing models generally follow the conventional language model of predicting the next word conditioned on the visual features and partially generated words. They...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2022-11-01
|
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/12/23/11875 |