Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning

Image captioning is a technique used to generate descriptive captions for images. Typically, it involves employing a Convolutional Neural Network (CNN) as the encoder to extract visual features, and a decoder model, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently,...

Bibliographic Details
Main Authors: Deema Abdal Hafeth, Stefanos Kollias
Format: Article
Language: English
Published: MDPI AG 2024-03-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/24/6/1796