Automatic image captioning combining natural language processing and deep neural networks

An image contains a lot of information that humans can detect in a very short time. Image captioning aims to detect this information by describing the image content through image and text processing techniques. One of the peculiarities of the proposed approach is the combination of multiple networks...

Bibliographic Details
Main Authors: Antonio M. Rinaldi, Cristiano Russo, Cristian Tommasino
Format: Article
Language: English
Published: Elsevier 2023-06-01
Series: Results in Engineering
Subjects: Object detection; Image captioning; Deep neural networks; Semantic-instance segmentation
Online Access: http://www.sciencedirect.com/science/article/pii/S2590123023002347
author Antonio M. Rinaldi
Cristiano Russo
Cristian Tommasino
collection DOAJ
description An image contains a great deal of information that humans can detect in a very short time. Image captioning aims to detect this information by describing the image content through image and text processing techniques. One of the peculiarities of the proposed approach is the combination of multiple networks to capture as many distinct features as possible from a semantic point of view. In this work, our goal is to prove that a strategy combining existing methods can efficiently improve performance in object detection tasks compared with the performance achieved by each method tested individually. The approach uses different deep neural networks that perform two levels of hierarchical object detection in an image. The results are combined and used by a captioning module that generates image captions through natural language processing techniques. Several experimental results are reported and discussed to show the effectiveness of our framework. The combination strategy also improves results, showing a gain in precision over single models.
first_indexed 2024-03-13T05:10:54Z
format Article
id doaj.art-65b31a8317c443afb3fd361065480f87
institution Directory Open Access Journal
issn 2590-1230
language English
last_indexed 2024-03-13T05:10:54Z
publishDate 2023-06-01
publisher Elsevier
record_format Article
series Results in Engineering
spelling doaj.art-65b31a8317c443afb3fd361065480f87 2023-06-16T05:10:56Z eng Elsevier Results in Engineering 2590-1230 2023-06-01 18 101107
Automatic image captioning combining natural language processing and deep neural networks
Antonio M. Rinaldi (0), Cristiano Russo (1), Cristian Tommasino (2)
0: Corresponding author. Department of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, Italy
1: Department of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, Italy
2: Department of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, Italy
title Automatic image captioning combining natural language processing and deep neural networks
topic Object detection
Image captioning
Deep neural networks
Semantic-instance segmentation
url http://www.sciencedirect.com/science/article/pii/S2590123023002347