Image-Caption Model Based on Fusion Feature

The encoder–decoder framework is the mainstream architecture for image captioning. A convolutional neural network (CNN) is usually used to extract grid-level features of the image, and a graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor...

Bibliographic Details
Main Authors:	Yaogang Geng, Hongyan Mei, Xiaorong Xue, Xing Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2022-09-01
Series:	Applied Sciences
Subjects:	image caption encoder–decoder framework multi-modal fusion features
Online Access:	https://www.mdpi.com/2076-3417/12/19/9861

Similar Items