Middle-Level Attribute-Based Language Retouching for Image Caption Generation

Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods are m...

Bibliographic Details
Main Authors:	Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji
Format:	Article
Language:	English
Published:	MDPI AG 2018-10-01
Series:	Applied Sciences
Subjects:	image captioning; middle-level attributes; language retouching; MLALR
Online Access:	http://www.mdpi.com/2076-3417/8/10/1850

collection	DOAJ
description	Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly focus on generating the final image caption directly, which may lose significant identifying information about the objects contained in the raw image. Therefore, we propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. The MLALR method uses middle-level attributes predicted from the object regions to retouch the intermediate image description generated by our language generation model. Its advantage is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K, using the metrics BLEU, METEOR, ROUGE-L, CIDEr, and SPICE, validated the performance of the MLALR method.
first_indexed	2024-12-22T16:05:03Z
format	Article
id	doaj.art-9741aa83a7b342019c2043ed4e3c3b5e
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-12-22T16:05:03Z
publishDate	2018-10-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-9741aa83a7b342019c2043ed4e3c3b5e; 2022-12-21T18:20:36Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2018-10-01; 8; 10; 1850; 10.3390/app8101850; app8101850; Middle-Level Attribute-Based Language Retouching for Image Caption Generation; Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji (School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China); http://www.mdpi.com/2076-3417/8/10/1850; image captioning; middle-level attributes; language retouching; MLALR
title	Middle-Level Attribute-Based Language Retouching for Image Caption Generation
topic	image captioning; middle-level attributes; language retouching; MLALR
url	http://www.mdpi.com/2076-3417/8/10/1850

Similar Items