Generalized Image Captioning for Multilingual Support

Image captioning is the task of viewing an image and describing it in natural language. It is an important problem that requires understanding the image and combines the two fields of image processing and natural language processing. Image captioning research to date has aimed at generating general descriptive captions that reflect the training data. For practical use, however, the varied environments found in reality must be considered, along with image descriptions that suit the purpose of use. Generating descriptive captions for a specific purpose normally requires preparing new training data, which takes considerable time and effort. In this study, we propose a method to address this problem. Image captioning can help visually impaired people understand their surroundings by automatically recognizing images, describing them in text, and converting the text to speech, and it can be applied in many settings such as image search, art therapy, sports commentary, and real-time traffic information commentary. With the domain object dictionary method proposed in this study, image captions can be generated without processing new training data, simply by adjusting the object dictionary for each domain application. Rather than reprocessing the training data, the proposed method shifts the focus to a domain object dictionary, producing varied image captions that intensively describe the objects required in each domain. To this end, we propose a filter captioning model that induces the generation of image captions for various domains while maintaining the performance of existing models.
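
As a rough illustration of the domain object dictionary idea described in the abstract, the sketch below re-weights a hypothetical detector's object list with a per-domain dictionary before it would be handed to a caption generator. The detections, dictionaries, and boost parameter are illustrative assumptions for this sketch, not the authors' implementation or data.

# Minimal sketch only: re-rank detected objects with a per-domain object
# dictionary so a downstream captioner emphasizes domain-relevant objects.
# All names and values here are hypothetical, not from the paper.
from typing import Dict, List, Set, Tuple

# Hypothetical detector output: (object label, confidence score)
detections: List[Tuple[str, float]] = [
    ("person", 0.95),
    ("bicycle", 0.88),
    ("traffic light", 0.72),
    ("dog", 0.40),
]

# Per-domain object dictionaries: labels the caption should focus on.
domain_dictionaries: Dict[str, Set[str]] = {
    "traffic": {"traffic light", "bicycle", "car", "bus"},
    "art": {"painting", "person", "frame"},
}

def rerank_for_domain(dets: List[Tuple[str, float]],
                      domain: str,
                      boost: float = 0.3) -> List[Tuple[str, float]]:
    # Boost detections listed in the domain's object dictionary, then sort,
    # so the objects passed to the caption generator reflect the domain.
    focus = domain_dictionaries.get(domain, set())
    rescored = [(label, score + boost if label in focus else score)
                for label, score in dets]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for label, score in rerank_for_domain(detections, "traffic"):
        print(f"{label}: {score:.2f}")

Swapping the dictionary (for example, "traffic" for "art") changes which objects a caption would emphasize without touching any training data, which is the effect the paper's filter captioning model aims for.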


Bibliographic Details
Main Authors: Suhyun Cho, Hayoung Oh
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app13042446
Subjects: image captioning, multimodal, vision, NLP
Online Access: https://www.mdpi.com/2076-3417/13/4/2446
Author Affiliations: Suhyun Cho (Applied Artificial Intelligence Convergence Department, Sungkyunkwan University, Seoul 03063, Republic of Korea); Hayoung Oh (College of Computing & Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea)