Middle-Level Attribute-Based Language Retouching for Image Caption Generation

Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods are m...


Bibliographic Details
Main Authors: Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji
Format: Article
Language: English
Published: MDPI AG 2018-10-01
Series: Applied Sciences
Subjects: image captioning, middle-level attributes, language retouching, MLALR
Online Access: http://www.mdpi.com/2076-3417/8/10/1850
author Zhibin Guan
Kang Liu
Yan Ma
Xu Qian
Tongkai Ji
collection DOAJ
description Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly focus on generating the final image caption directly, which may lose significant identifying information about the objects contained in the raw image. To address this problem, we propose a new middle-level attribute-based language retouching (MLALR) method. MLALR uses middle-level attributes predicted from object regions to retouch the intermediate image description generated by our language generation model. Its advantage is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K, using the metrics BLEU, METEOR, ROUGE-L, CIDEr, and SPICE, validated the performance of the MLALR method.
first_indexed 2024-12-22T16:05:03Z
format Article
id doaj.art-9741aa83a7b342019c2043ed4e3c3b5e
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-12-22T16:05:03Z
publishDate 2018-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-9741aa83a7b342019c2043ed4e3c3b5e
2022-12-21T18:20:36Z
eng
MDPI AG
Applied Sciences
2076-3417
2018-10-01
Volume 8, Issue 10, Article 1850
10.3390/app8101850
app8101850
Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji
School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China (affiliation of all five authors)
title Middle-Level Attribute-Based Language Retouching for Image Caption Generation
topic image captioning
middle-level attributes
language retouching
MLALR
url http://www.mdpi.com/2076-3417/8/10/1850