Middle-Level Attribute-Based Language Retouching for Image Caption Generation
Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods are m...
Main Authors: | Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-10-01 |
Series: | Applied Sciences |
Subjects: | image captioning; middle-level attributes; language retouching; MLALR |
Online Access: | http://www.mdpi.com/2076-3417/8/10/1850 |
_version_ | 1819157204653047808 |
---|---|
author | Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji |
author_facet | Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji |
author_sort | Zhibin Guan |
collection | DOAJ |
description | Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods mainly focus on generating the final image caption directly, which may lose important identifying information about the objects contained in the raw image. Therefore, we propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. Our proposed MLALR method uses the middle-level attributes predicted from the object regions to retouch the intermediate image description, which is generated by our language generation model. The advantage of our MLALR method is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Moreover, evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K validated the strong performance of our MLALR method under the BLEU, METEOR, ROUGE-L, CIDEr, and SPICE metrics. |
first_indexed | 2024-12-22T16:05:03Z |
format | Article |
id | doaj.art-9741aa83a7b342019c2043ed4e3c3b5e |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-12-22T16:05:03Z |
publishDate | 2018-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-9741aa83a7b342019c2043ed4e3c3b5e; 2022-12-21T18:20:36Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2018-10-01; volume 8, issue 10, article 1850; 10.3390/app8101850; app8101850; Middle-Level Attribute-Based Language Retouching for Image Caption Generation; Zhibin Guan (0), Kang Liu (1), Yan Ma (2), Xu Qian (3), Tongkai Ji (4); all five authors: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology (Beijing), Beijing 100083, China; Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods mainly focus on generating the final image caption directly, which may lose important identifying information about the objects contained in the raw image. Therefore, we propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. Our proposed MLALR method uses the middle-level attributes predicted from the object regions to retouch the intermediate image description, which is generated by our language generation model. The advantage of our MLALR method is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Moreover, evaluation on the benchmark datasets MSCOCO, Flickr8K, and Flickr30K validated the strong performance of our MLALR method under the BLEU, METEOR, ROUGE-L, CIDEr, and SPICE metrics.; http://www.mdpi.com/2076-3417/8/10/1850; image captioning; middle-level attributes; language retouching; MLALR |
spellingShingle | Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji; Middle-Level Attribute-Based Language Retouching for Image Caption Generation; Applied Sciences; image captioning; middle-level attributes; language retouching; MLALR |
title | Middle-Level Attribute-Based Language Retouching for Image Caption Generation |
title_sort | middle level attribute based language retouching for image caption generation |
topic | image captioning; middle-level attributes; language retouching; MLALR |
url | http://www.mdpi.com/2076-3417/8/10/1850 |