Text Augmentation Using BERT for Image Captioning

Bibliographic Details
Main Authors:	Viktar Atliha, Dmitrij Šešok
Format:	Article
Language:	English
Published:	MDPI AG 2020-08-01
Series:	Applied Sciences
Subjects:	image captioning, augmentation, BERT
Online Access:	https://www.mdpi.com/2076-3417/10/17/5978

_version_	1797555177677389824
author	Viktar Atliha; Dmitrij Šešok
collection	DOAJ
description	Image captioning is an important task for improving human-computer interaction and for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly, and a number of impressive results have been achieved. Typical models are based on neural networks: convolutional networks encode images, and recurrent networks decode them into text. In addition, attention mechanisms and transformers are widely used to boost performance. However, even the best models are limited in quality when training data are scarce. Generating varied descriptions of objects in different situations requires a large training set. The commonly used datasets, although rather large in terms of the number of images, are quite small in terms of the number of different captions per image. We expanded the training dataset using text augmentation methods: augmentation with synonyms as a baseline, and augmentation with the state-of-the-art language model Bidirectional Encoder Representations from Transformers (BERT). As a result, models trained on the augmented datasets show better results than models trained on the dataset without augmentation.
first_indexed	2024-03-10T16:43:45Z
format	Article
id	doaj.art-48c40096dc64458aaf107e175d87134c
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T16:43:45Z
publishDate	2020-08-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-48c40096dc64458aaf107e175d87134c; 2023-11-20T11:46:26Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2020-08-01; vol. 10, no. 17, article 5978; DOI 10.3390/app10175978; Text Augmentation Using BERT for Image Captioning; Viktar Atliha, Dmitrij Šešok (both: Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania); https://www.mdpi.com/2076-3417/10/17/5978; image captioning, augmentation, BERT
title	Text Augmentation Using BERT for Image Captioning
topic	image captioning, augmentation, BERT
url	https://www.mdpi.com/2076-3417/10/17/5978
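
The record does not describe how the synonym baseline expands the caption set. As an illustration only, here is a minimal Python sketch of synonym-based caption augmentation. It assumes a hypothetical toy synonym table (`SYNONYMS`) and hypothetical helpers `augment_caption` and `expand_captions`. It is not the authors' implementation, and their synonym source and replacement policy may differ. The sketch shows how the few reference captions of one image can be multiplied into additional training captions.

```python
import random

# Hypothetical toy synonym table; the synonym source used in the paper is
# not stated in this record (a real setup might use a lexical database).
SYNONYMS = {
    "man": ["person", "guy"],
    "riding": ["on"],
    "bike": ["bicycle"],
    "street": ["road"],
}


def augment_caption(caption, synonyms, p=0.5, rng=None):
    """Return a copy of `caption` where each word that has synonyms is
    replaced, with probability p, by a randomly chosen synonym."""
    rng = rng or random.Random()
    out = []
    for word in caption.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def expand_captions(captions, synonyms, n_new=2, seed=0):
    """Expand one image's caption list with up to n_new augmented variants
    per original caption, keeping the originals first and skipping duplicates."""
    rng = random.Random(seed)
    expanded = list(captions)
    seen = set(captions)
    for cap in captions:
        for _ in range(n_new):
            new = augment_caption(cap, synonyms, rng=rng)
            if new not in seen:
                seen.add(new)
                expanded.append(new)
    return expanded


if __name__ == "__main__":
    caps = ["a man riding a bike down the street"]
    for c in expand_captions(caps, SYNONYMS, n_new=3):
        print(c)
```

A BERT-based variant would typically mask a word in the caption and take the replacement proposed by the masked language model instead of a fixed table. The record does not specify the authors' exact procedure, so that variant is not sketched here.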

Similar Items