Context-Adaptive-Based Image Captioning by Bi-CARU

Bibliographic Details
Main Authors:	Sio-Kei Im, Ka-Hou Chan
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	CNN, RNN, NLP, image captioning, Bi-CARU, context-adaptive
Online Access:	https://ieeexplore.ieee.org/document/10210039/

author	Sio-Kei Im, Ka-Hou Chan
collection	DOAJ
description	Image captions express the content of an image abstractly as text sentences, helping readers better understand and analyse information across different media. By taking advantage of encoder-decoder neural networks, captioning can follow a rational structure for tasks such as image coding and caption prediction. This work introduces a Convolutional Neural Network (CNN) to Bidirectional Content-Adaptive Recurrent Unit (Bi-CARU) model, CNN-to-Bi-CARU, which uses a bidirectional structure to consider contextual features and captures the major features of the image. The features encoded from the image are passed into the forward and backward CARU layers, respectively, to refine word prediction and provide contextual text output for captioning. An attention layer is also introduced to collect the features produced by the context-adaptive gate in CARU, in order to compute weighting information for relationship extraction and determination. In experiments, the proposed CNN-to-Bi-CARU model outperforms other advanced models in the field, achieving better extraction of contextual information and more detailed image captions. The model obtains scores of 41.28 on BLEU@4, 31.23 on METEOR, 61.07 on ROUGE-L, and 133.20 on CIDEr-D, making it competitive for image captioning on the MSCOCO dataset.
first_indexed	2024-04-24T18:56:22Z
format	Article
id	doaj.art-9316f0de14fb48eeb0a1d66f10e309e2
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-04-24T18:56:22Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-9316f0de14fb48eeb0a1d66f10e309e2 | 2024-03-26T17:34:32Z | eng | IEEE | IEEE Access | 2169-3536 | 2023-01-01 | vol. 11 | pp. 84934–84943 | doi:10.1109/ACCESS.2023.3302512 | IEEE document 10210039 | Context-Adaptive-Based Image Captioning by Bi-CARU | Sio-Kei Im (https://orcid.org/0000-0002-5599-4300), Ka-Hou Chan (https://orcid.org/0000-0002-0183-0685) | both: Faculty of Applied Sciences, Macao Polytechnic University, Macau, China | https://ieeexplore.ieee.org/document/10210039/ | CNN, RNN, NLP, image captioning, Bi-CARU, context-adaptive
title	Context-Adaptive-Based Image Captioning by Bi-CARU
url	https://ieeexplore.ieee.org/document/10210039/
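The description outlines a CNN encoder whose image feature feeds forward and backward recurrent layers. An attention layer then weights the resulting features. The Python sketch below illustrates only that general pattern, under loudly stated assumptions. It is not the authors' implementation:

* The gated cell is a simplified element-wise stand-in, not the published CARU equations.
* The image feature is assumed to initialise both directions.
* Attention is plain dot-product scoring against a hypothetical query vector.
* All names (`gated_step`, `bidirectional`, `attention_pool`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(h_prev: Vector, x: Vector, w_gate: float, w_cand: float) -> Vector:
    """One step of a toy element-wise gated recurrent cell.

    Simplified stand-in, NOT the published CARU equations:
    z = sigmoid(w_gate * x), n = tanh(w_cand * h_prev + x),
    h = (1 - z) * h_prev + z * n.
    """
    h_new = []
    for hp, xi in zip(h_prev, x):
        z = sigmoid(w_gate * xi)          # input-dependent update gate
        n = math.tanh(w_cand * hp + xi)   # candidate state
        h_new.append((1.0 - z) * hp + z * n)
    return h_new


def run_direction(seq: List[Vector], h0: Vector,
                  w_gate: float, w_cand: float) -> List[Vector]:
    """Run the cell over seq in the given order, returning one state per step."""
    states, h = [], h0
    for x in seq:
        h = gated_step(h, x, w_gate, w_cand)
        states.append(h)
    return states


def bidirectional(seq: List[Vector], image_feat: Vector,
                  w_gate: float = 1.0, w_cand: float = 0.5) -> List[Vector]:
    """Forward and backward passes, both initialised with the image feature (assumption).

    Backward states are reversed back into time order, then each step's forward
    and backward states are concatenated into one 2*d vector.
    """
    fwd = run_direction(seq, image_feat, w_gate, w_cand)
    bwd = run_direction(seq[::-1], image_feat, w_gate, w_cand)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]  # list '+' concatenates


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Dot-product attention: score each step against query, softmax, weighted sum."""
    scores = [sum(q * s for q, s in zip(query, st)) for st in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * st[i] for w, st in zip(weights, states)) for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    image_feat = [0.2, -0.1]                          # hypothetical 2-d image feature
    words = [[0.5, 0.1], [-0.3, 0.4], [0.0, 0.9]]     # hypothetical word embeddings
    states = bidirectional(words, image_feat)
    weights, context = attention_pool(states, query=[1.0, 0.0, 0.0, 1.0])
    print(len(states), len(context), round(sum(weights), 6))
```

After realignment, each position's backward state is concatenated with its forward state, so every position carries both left and right context. This reflects the motivation the description gives for the bidirectional structure. The attention weights then decide how much each position contributes to the pooled context vector.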
