Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning

Image captioning is a technique used to generate descriptive captions for images. Typically, it involves employing a Convolutional Neural Network (CNN) as the encoder to extract visual features, and a decoder model, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently,...

Bibliographic Details
Main Authors:	Deema Abdal Hafeth, Stefanos Kollias
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Sensors
Subjects:	image captioning; deep learning; transformers; attention; vision language
Online Access:	https://www.mdpi.com/1424-8220/24/6/1796

Similar Items