Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning
Unsupervised image captioning often grapples with challenges such as image–text mismatches and modality gaps, resulting in suboptimal captions. This paper introduces a semantic-enhanced cross-modal fusion model (SCFM) to address these issues. The SCFM integrates three innovative components: a text s...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-08-01
|
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/12/17/3549 |