Panoptic Segmentation-Based Attention for Image Captioning

Image captioning is the task of generating textual descriptions of images. In order to obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, the rectangular attention regions are not fine-g...

Bibliographic Details
Main Authors:	Wenjie Cai, Zheng Xiong, Xianfang Sun, Paul L. Rosin, Longcun Jin, Xinyi Peng
Format:	Article
Language:	English
Published:	MDPI AG 2020-01-01
Series:	Applied Sciences
Subjects:	image captioning; attention mechanism; panoptic segmentation
Online Access:	https://www.mdpi.com/2076-3417/10/1/391

_version_	1831806592256835584
author	Wenjie Cai; Zheng Xiong; Xianfang Sun; Paul L. Rosin; Longcun Jin; Xinyi Peng
author_sort	Wenjie Cai
collection	DOAJ
description	Image captioning is the task of generating textual descriptions of images. To obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, the rectangular attention regions are not fine-grained: they contain irrelevant regions (e.g., background or overlapped regions) around the object, which leads the model to generate inaccurate captions. To address this issue, we propose panoptic segmentation-based attention, which performs attention at the mask level (i.e., the shape of the main part of an instance). Our approach extracts feature vectors from the corresponding segmentation regions, which makes it more fine-grained than current attention mechanisms. Moreover, to process features of different classes independently, we propose a dual-attention module that is generic and can be applied to other frameworks. Experimental results showed that our model could recognize overlapped objects and understand the scene better. Our approach achieved competitive performance against state-of-the-art methods. We made our code available.
first_indexed	2024-12-22T19:45:05Z
format	Article
id	doaj.art-775265b2a97549f4a2f9173e25b3829a
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-12-22T19:45:05Z
publishDate	2020-01-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-775265b2a97549f4a2f9173e25b3829a; 2022-12-21T18:14:42Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-01-01; 10; 1; 391; 10.3390/app10010391; app10010391; Panoptic Segmentation-Based Attention for Image Captioning; Wenjie Cai (0); Zheng Xiong (1); Xianfang Sun (2); Paul L. Rosin (3); Longcun Jin (4); Xinyi Peng (5); (0) School of Software Engineering, South China University of Technology, Guangzhou 510006, China; (1) School of Software Engineering, South China University of Technology, Guangzhou 510006, China; (2) School of Computer Science and Informatics, Cardiff University, Cardiff CF10 3AT, UK; (3) School of Computer Science and Informatics, Cardiff University, Cardiff CF10 3AT, UK; (4) School of Software Engineering, South China University of Technology, Guangzhou 510006, China; (5) School of Software Engineering, South China University of Technology, Guangzhou 510006, China; https://www.mdpi.com/2076-3417/10/1/391; image captioning; attention mechanism; panoptic segmentation
title	Panoptic Segmentation-Based Attention for Image Captioning
title_sort	panoptic segmentation based attention for image captioning
topic	image captioning; attention mechanism; panoptic segmentation
url	https://www.mdpi.com/2076-3417/10/1/391

Similar Items