Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning

Image captioning is a technique used to generate descriptive captions for images. Typically, it involves employing a Convolutional Neural Network (CNN) as the encoder to extract visual features, and a decoder model, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently,...

Bibliographic Details
Main Authors:	Deema Abdal Hafeth, Stefanos Kollias
Format:	Article
Language:	English
Published:	MDPI AG 2024-03-01
Series:	Sensors
Subjects:	image captioning; deep learning; transformers; attention; vision language
Online Access:	https://www.mdpi.com/1424-8220/24/6/1796

Similar Items

Semantic Representations With Attention Networks for Boosting Image Captioning
by: Deema Abdal Hafeth, et al.
Published: (2023-01-01)

An Analysis of the Use of Feed-Forward Sub-Modules for Transformer-Based Image Captioning Tasks
by: Raymond Ian Osolo, et al.
Published: (2021-12-01)

A Context Semantic Auxiliary Network for Image Captioning
by: Jianying Li, et al.
Published: (2023-07-01)

An Attentive Fourier-Augmented Image-Captioning Transformer
by: Raymond Ian Osolo, et al.
Published: (2021-09-01)

Structure Preserving Convolutional Attention for Image Captioning
by: Shichen Lu, et al.
Published: (2019-07-01)

Variational Autoencoder-Based Multiple Image Captioning Using a Caption Attention Map
by: Boeun Kim, et al.
Published: (2019-07-01)

Captioning Transformer with Stacked Attention Modules
by: Xinxin Zhu, et al.
Published: (2018-05-01)

Cascade Semantic Fusion for Image Captioning
by: Shiwei Wang, et al.
Published: (2019-01-01)

Novel Object Captioning with Semantic Match from External Knowledge
by: Sen Du, et al.
Published: (2023-07-01)

From Plane to Hierarchy: Deformable Transformer for Remote Sensing Image Captioning
by: Runyan Du, et al.
Published: (2023-01-01)

Multi-Gate Attention Network for Image Captioning
by: Weitao Jiang, et al.
Published: (2021-01-01)

Novel Advance Image Caption Generation Utilizing Vision Transformer and Generative Adversarial Networks
by: Shourya Tyagi, et al.
Published: (2024-11-01)

Image Captioning Based on Semantic Scenes
by: Fengzhi Zhao, et al.
Published: (2024-10-01)

Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning
by: Dong Wang, et al.
Published: (2022-11-01)

Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning
by: Hojun Lee, et al.
Published: (2022-02-01)

Folk Games Image Captioning using Object Attention
by: Saiful Akbar, et al.
Published: (2023-08-01)

Deep Learning Approaches Based on Transformer Architectures for Image Captioning Tasks
by: Roberto Castro, et al.
Published: (2022-01-01)

Hybrid Attention Distribution and Factorized Embedding Matrix in Image Captioning
by: Jian Wang, et al.
Published: (2020-01-01)

Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption
by: Agus Nursikuwagus, et al.
Published: (2024-01-01)

Generalized Image Captioning for Multilingual Support
by: Suhyun Cho, et al.
Published: (2023-02-01)

UAT: Universal Attention Transformer for Video Captioning
by: Heeju Im, et al.
Published: (2022-06-01)

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates
by: Nicholas Moratelli, et al.
Published: (2023-01-01)

Cross-Lingual Image Caption Generation Based on Visual Attention Model
by: Bin Wang, et al.
Published: (2020-01-01)

A performance analysis of transformer-based deep learning models for Arabic image captioning
by: Ashwaq Alsayed, et al.
Published: (2023-10-01)

Image Captioning with Word Gate and Adaptive Self-Critical Learning
by: Xinxin Zhu, et al.
Published: (2018-06-01)

Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
by: Zhengxin Li, et al.
Published: (2024-01-01)

Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture
by: Asmaa A. E. Osman, et al.
Published: (2024-09-01)

Region-guided transformer for remote sensing image captioning
by: Kai Zhao, et al.
Published: (2024-12-01)

Video captioning based on vision transformer and reinforcement learning
by: Hong Zhao, et al.
Published: (2022-03-01)

Video captioning with stacked attention and semantic hard pull
by: Md. Mushfiqur Rahman, et al.
Published: (2021-08-01)

A Lightweight Sparse Focus Transformer for Remote Sensing Image Change Captioning
by: Dongwei Sun, et al.
Published: (2024-01-01)

A novel image captioning model with visual-semantic similarities and visual representations re-weighting
by: Alaa Thobhani, et al.
Published: (2024-09-01)

Style-Enhanced Transformer for Image Captioning in Construction Scenes
by: Kani Song, et al.
Published: (2024-03-01)

VAA: Visual Aligning Attention Model for Remote Sensing Image Captioning
by: Zhengyuan Zhang, et al.
Published: (2019-01-01)

Panoptic Segmentation-Based Attention for Image Captioning
by: Wenjie Cai, et al.
Published: (2020-01-01)

Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
by: Zhou Lei, et al.
Published: (2021-02-01)

A Sparse Transformer-Based Approach for Image Captioning
by: Zhou Lei, et al.
Published: (2020-01-01)

An Image Captioning Model Based on Bidirectional Depth Residuals and its Application
by: Ziwei Zhou, et al.
Published: (2021-01-01)

VSAM-Based Visual Keyword Generation for Image Caption
by: Suya Zhang, et al.
Published: (2021-01-01)

Exploring Spatial-Based Position Encoding for Image Captioning
by: Xiaobao Yang, et al.
Published: (2023-11-01)