Automatic image captioning combining natural language processing and deep neural networks
An image contains a lot of information that humans can detect in a very short time. Image captioning aims to detect this information by describing the image content through image and text processing techniques. One of the peculiarities of the proposed approach is the combination of multiple networks...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Elsevier
2023-06-01
|
Series: | Results in Engineering |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2590123023002347 |
_version_ | 1797802785340653568 |
---|---|
author | Antonio M. Rinaldi Cristiano Russo Cristian Tommasino |
author_facet | Antonio M. Rinaldi Cristiano Russo Cristian Tommasino |
author_sort | Antonio M. Rinaldi |
collection | DOAJ |
description | An image contains a lot of information that humans can detect in a very short time. Image captioning aims to detect this information by describing the image content through image and text processing techniques. One of the peculiarities of the proposed approach is the combination of multiple networks to catch as many distinct features as possible from a semantic point of view. In this work, our goal is to prove that a combination strategy of existing methods can efficiently improve the performance in the object detection tasks concerning the performance achieved by each tested individually. This approach involves using different deep neural networks that perform two levels of hierarchical object detection in an image. The results are combined and used by a captioning module that generates image captions through natural language processing techniques. Several experimental results are reported and discussed to show the effectiveness of our framework. The combination strategy has also improved, showing a gain in precision over single models. |
first_indexed | 2024-03-13T05:10:54Z |
format | Article |
id | doaj.art-65b31a8317c443afb3fd361065480f87 |
institution | Directory Open Access Journal |
issn | 2590-1230 |
language | English |
last_indexed | 2024-03-13T05:10:54Z |
publishDate | 2023-06-01 |
publisher | Elsevier |
record_format | Article |
series | Results in Engineering |
spelling | doaj.art-65b31a8317c443afb3fd361065480f872023-06-16T05:10:56ZengElsevierResults in Engineering2590-12302023-06-0118101107Automatic image captioning combining natural language processing and deep neural networksAntonio M. Rinaldi0Cristiano Russo1Cristian Tommasino2Corresponding author.; Department of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, ItalyDepartment of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, ItalyDepartment of Electrical Engineering and Information Technologies, IKNOS-LAB Intelligent and Knowledge Systems (LUPT), University of Naples Federico II, 80125 Via Claudio, 21, Napoli, ItalyAn image contains a lot of information that humans can detect in a very short time. Image captioning aims to detect this information by describing the image content through image and text processing techniques. One of the peculiarities of the proposed approach is the combination of multiple networks to catch as many distinct features as possible from a semantic point of view. In this work, our goal is to prove that a combination strategy of existing methods can efficiently improve the performance in the object detection tasks concerning the performance achieved by each tested individually. This approach involves using different deep neural networks that perform two levels of hierarchical object detection in an image. The results are combined and used by a captioning module that generates image captions through natural language processing techniques. Several experimental results are reported and discussed to show the effectiveness of our framework. The combination strategy has also improved, showing a gain in precision over single models.http://www.sciencedirect.com/science/article/pii/S2590123023002347Object detectionImage captioningDeep neural networksSemantic-instance segmentation |
spellingShingle | Antonio M. Rinaldi Cristiano Russo Cristian Tommasino Automatic image captioning combining natural language processing and deep neural networks Results in Engineering Object detection Image captioning Deep neural networks Semantic-instance segmentation |
title | Automatic image captioning combining natural language processing and deep neural networks |
title_full | Automatic image captioning combining natural language processing and deep neural networks |
title_fullStr | Automatic image captioning combining natural language processing and deep neural networks |
title_full_unstemmed | Automatic image captioning combining natural language processing and deep neural networks |
title_short | Automatic image captioning combining natural language processing and deep neural networks |
title_sort | automatic image captioning combining natural language processing and deep neural networks |
topic | Object detection Image captioning Deep neural networks Semantic-instance segmentation |
url | http://www.sciencedirect.com/science/article/pii/S2590123023002347 |
work_keys_str_mv | AT antoniomrinaldi automaticimagecaptioningcombiningnaturallanguageprocessinganddeepneuralnetworks AT cristianorusso automaticimagecaptioningcombiningnaturallanguageprocessinganddeepneuralnetworks AT cristiantommasino automaticimagecaptioningcombiningnaturallanguageprocessinganddeepneuralnetworks |