Generalized Image Captioning for Multilingual Support
Image captioning is the task of viewing an image and describing it in natural language. It is an important problem that requires understanding the image and combines two fields, image processing and natural language processing, into one. The aim of image captioning research to date has be...
Main Authors: | Suhyun Cho, Hayoung Oh
---|---
Format: | Article
Language: | English
Published: | MDPI AG, 2023-02-01
Series: | Applied Sciences
Subjects: | image captioning, multimodal, vision, NLP
Online Access: | https://www.mdpi.com/2076-3417/13/4/2446
_version_ | 1797622590364188672 |
author | Suhyun Cho; Hayoung Oh
author_facet | Suhyun Cho; Hayoung Oh
author_sort | Suhyun Cho |
collection | DOAJ |
description | Image captioning is the task of viewing an image and describing it in natural language. It is an important problem that requires understanding the image and combines two fields, image processing and natural language processing, into one. The aim of image captioning research to date has been to generate general descriptive captions from the training data. For practical use, however, the diverse environments found in reality must be considered, along with image descriptions suited to the purpose at hand. Generating descriptive captions for a specific purpose normally requires preparing new training data, and creating learnable data takes considerable time and effort. In this study, we propose a method to solve this problem. Image captioning can help visually impaired people understand their surroundings by automatically recognizing images and describing them as text and then as speech, and it can be applied in many settings, such as image search, art therapy, sports commentary, and real-time traffic commentary. The domain object dictionary method proposed in this study generates image captions without processing new training data: the object dictionary is adjusted for each target domain. Rather than reprocessing training data, the method changes the object dictionary so that the model focuses on the domain object dictionary, producing varied image captions that intensively describe the objects each domain requires. We propose a filter captioning model that induces caption generation across various domains while maintaining the performance of existing models. |
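The core idea in the abstract, re-weighting detected objects through a per-domain object dictionary instead of retraining on new data, can be sketched as follows. This is a minimal illustration only: the dictionaries, function names, weights, and threshold are hypothetical assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of a "domain object dictionary": detected objects
# are re-scored by per-domain weights before caption generation, so the
# same trained captioning model can emphasize different objects per domain.
# DOMAIN_DICTS, filter_objects, and all weights/thresholds are illustrative.

DOMAIN_DICTS = {
    # weight > 1.0 emphasizes an object class for this domain,
    # weight < 1.0 suppresses it
    "traffic": {"car": 2.0, "traffic light": 2.0, "person": 1.5, "tree": 0.2},
    "sports":  {"ball": 2.0, "person": 1.8, "racket": 1.5, "bench": 0.3},
}

def filter_objects(detections, domain, keep_threshold=0.5):
    """Re-score (label, confidence) detections with the domain dictionary
    and drop objects whose adjusted score falls below the threshold."""
    weights = DOMAIN_DICTS.get(domain, {})
    filtered = []
    for label, score in detections:
        adjusted = score * weights.get(label, 1.0)
        if adjusted >= keep_threshold:
            filtered.append((label, adjusted))
    # most relevant objects first, to steer the caption decoder
    return sorted(filtered, key=lambda item: -item[1])

# The same detections yield different object sets per domain:
detections = [("car", 0.9), ("tree", 0.8), ("person", 0.6)]
print(filter_objects(detections, "traffic"))  # tree is suppressed
print(filter_objects(detections, "sports"))   # person is emphasized
```

Only the dictionary changes between domains; the detector and caption decoder stay fixed, which is what lets this approach avoid preparing new training data.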
first_indexed | 2024-03-11T09:12:28Z |
format | Article |
id | doaj.art-e10152957bd1462896ff77aac05d8624 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T09:12:28Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-e10152957bd1462896ff77aac05d8624; indexed 2023-11-16T18:55:51Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2023-02-01; vol. 13, iss. 4, art. 2446; DOI 10.3390/app13042446; Generalized Image Captioning for Multilingual Support; Suhyun Cho (Applied Artificial Intelligence Convergence Department, Sungkyunkwan University, Seoul 03063, Republic of Korea); Hayoung Oh (College of Computing & Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea); https://www.mdpi.com/2076-3417/13/4/2446; image captioning; multimodal; vision; NLP |
spellingShingle | Suhyun Cho; Hayoung Oh; Generalized Image Captioning for Multilingual Support; Applied Sciences; image captioning; multimodal; vision; NLP
title | Generalized Image Captioning for Multilingual Support |
title_full | Generalized Image Captioning for Multilingual Support |
title_fullStr | Generalized Image Captioning for Multilingual Support |
title_full_unstemmed | Generalized Image Captioning for Multilingual Support |
title_short | Generalized Image Captioning for Multilingual Support |
title_sort | generalized image captioning for multilingual support |
topic | image captioning; multimodal; vision; NLP
url | https://www.mdpi.com/2076-3417/13/4/2446 |
work_keys_str_mv | AT suhyuncho generalizedimagecaptioningformultilingualsupport AT hayoungoh generalizedimagecaptioningformultilingualsupport |