Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning

Unsupervised image captioning often grapples with challenges such as image–text mismatches and modality gaps, resulting in suboptimal captions. This paper introduces a semantic-enhanced cross-modal fusion model (SCFM) to address these issues. The SCFM integrates three innovative components: a text s...

Bibliographic Details
Main Authors: Nan Xiang, Ling Chen, Leiyan Liang, Xingdi Rao, Zehao Gong
Format: Article
Language: English
Published: MDPI AG 2023-08-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/17/3549