Image-Caption Model Based on Fusion Feature

The encoder–decoder framework is the main frame of image captioning. The convolutional neural network (CNN) is usually used to extract grid-level features of the image, and the graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor...

Full description

Bibliographic Details
Main Authors: Yaogang Geng, Hongyan Mei, Xiaorong Xue, Xing Zhang
Format: Article
Language:English
Published: MDPI AG 2022-09-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/12/19/9861