Image Captioning with Style Using Generative Adversarial Networks

Image captioning research, which initially focused on describing images factually, is now moving toward incorporating sentiments or styles to produce natural captions that resemble human-written ones. The problem this research addresses is that captions...

Bibliographic Details
Main Authors:	Dennis Setiawan, Maria Astrid Coenradina Saffachrissa, Shintia Tamara, Derwin Suhartono
Format:	Article
Language:	English
Published:	Politeknik Negeri Padang 2022-03-01
Series:	JOIV: International Journal on Informatics Visualization
Subjects:	stylized image captioning; SeqCapsGAN; sentiments or styles; generative adversarial network (GAN); capsule; discriminator; generator
Online Access:	https://joiv.org/index.php/joiv/article/view/709

_version_	1827998749968826368
author	Dennis Setiawan, Maria Astrid Coenradina Saffachrissa, Shintia Tamara, Derwin Suhartono
author_facet	Dennis Setiawan, Maria Astrid Coenradina Saffachrissa, Shintia Tamara, Derwin Suhartono
author_sort	Dennis Setiawan
collection	DOAJ
description	Image captioning research, which initially focused on describing images factually, is now moving toward incorporating sentiments or styles to produce natural captions that resemble human-written ones. The problem this research addresses is that captions produced by existing models are rigid and unnatural because they lack sentiment. The purpose of this research is to design a reliable image captioning model that incorporates style, based on the state-of-the-art SeqCapsGAN architecture. The datasets used are MS COCO and SentiCaps. The research methods are literature study and experiments. While many previous studies compare their work without accounting for differences in the components and parameters used, this research proposes a different approach to find more reliable configurations and provide more detailed insight into the models’ behavior. It also runs further experiments on the generator, which has not been thoroughly investigated. A grid search is run over combinations of feature extractor (VGG-19 and ResNet-50), discriminator model (CNN and Capsule), optimizer (Adam, Nadam, and SGD), batch size (8, 16, 32, and 64), and learning rate (0.001 and 0.0001). In conclusion, more insight into the models’ behavior can be drawn, and a better configuration and result than the baseline can be achieved. Our results imply that comparative studies of image recognition models in the image captioning context, automated metrics, and larger datasets suited to stylized image captioning might be needed to advance research in this field.
first_indexed	2024-04-10T05:47:29Z
format	Article
id	doaj.art-6decb5a8330e4bd29ae54c0644bd4f31
institution	Directory of Open Access Journals
issn	2549-9610 2549-9904
language	English
last_indexed	2024-04-10T05:47:29Z
publishDate	2022-03-01
publisher	Politeknik Negeri Padang
record_format	Article
series	JOIV: International Journal on Informatics Visualization
spelling	doaj.art-6decb5a8330e4bd29ae54c0644bd4f31; 2023-03-05T10:28:40Z; eng; Politeknik Negeri Padang; JOIV: International Journal on Informatics Visualization; 2549-9610; 2549-9904; 2022-03-01; 6; 1; 26; 32; 10.30630/joiv.6.1.709; 311; Image Captioning with Style Using Generative Adversarial Networks; Dennis Setiawan (0), Maria Astrid Coenradina Saffachrissa (1), Shintia Tamara (2), Derwin Suhartono (3); affiliation (all four authors): Computer Science Department, School of Computer Science, Bina Nusantara University, Palmerah, Jakarta 11480, Indonesia; https://joiv.org/index.php/joiv/article/view/709
title	Image Captioning with Style Using Generative Adversarial Networks
topic	stylized image captioning; SeqCapsGAN; sentiments or styles; generative adversarial network (GAN); capsule; discriminator; generator
url	https://joiv.org/index.php/joiv/article/view/709
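
The grid search named in the description can be illustrated by enumerating its search space. The following is a minimal sketch, assuming the full Cartesian product of the listed options was evaluated (the abstract does not say whether every combination was run); the names SEARCH_SPACE and enumerate_configs are hypothetical and not taken from the article.

```python
from itertools import product

# Option values as listed in the abstract. The dictionary keys are
# illustrative labels, not identifiers from the authors' code.
SEARCH_SPACE = {
    "feature_extractor": ["VGG-19", "ResNet-50"],
    "discriminator": ["CNN", "Capsule"],
    "optimizer": ["Adam", "Nadam", "SGD"],
    "batch_size": [8, 16, 32, 64],
    "learning_rate": [0.001, 0.0001],
}


def enumerate_configs(space):
    """Return every combination of the search space as a list of dicts."""
    keys = list(space)
    value_lists = [space[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


configs = enumerate_configs(SEARCH_SPACE)
print(len(configs))  # 2 * 2 * 3 * 4 * 2 = 96 candidate configurations
print(configs[0])
```

Under that assumption the grid holds 96 configurations, each of which would require its own training run, which gives a sense of the cost of comparing components under matched settings.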

Similar Items