Image-Captioning Model Compression

Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are of high enough quality for practical use, but they require both substantial computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated model-size compression to prepare such models for use on mobile devices. Furthermore, these works usually investigate only decoder compression in a typical encoder–decoder architecture, although the encoder traditionally occupies most of the space. We applied the most effective model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by no less than 91% in memory footprint (encoder included), while losing no more than 2% in CIDEr and 4.5% in SPICE. At the same time, the best model achieved 127.4 CIDEr and 21.4 SPICE at a size of only 34.8 MB, which sets a strong baseline for image-captioning model compression and is suitable for practical applications.

Bibliographic Details
Main Authors: Viktar Atliha, Dmitrij Šešok
Affiliation: Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio Al. 11, LT-10223 Vilnius, Lithuania
Format: Article
Language: English
Published: MDPI AG, 2022-02-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app12031638
Collection: Directory of Open Access Journals (DOAJ)
Subjects: image captioning; model compression; pruning; quantization
Online Access: https://www.mdpi.com/2076-3417/12/3/1638
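The abstract names pruning and quantization as two of the core compression techniques. As a rough illustration only (this is not the authors' code; the decoder below is a hypothetical stand-in, and the 90% pruning ratio is chosen merely to echo the roughly 91% size reduction reported), here is how both steps are commonly applied with standard PyTorch utilities:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Hypothetical stand-in for a captioning decoder; the paper's actual
    # models are full encoder-decoder captioning architectures.
    decoder = nn.Sequential(
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10000),  # vocabulary-sized output projection
    )

    # Step 1 -- unstructured magnitude pruning: zero out the 90% of weights
    # with the smallest absolute value in every Linear layer.
    for module in decoder.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)
            prune.remove(module, "weight")  # bake the sparsity into the tensor

    # Step 2 -- post-training dynamic quantization: store Linear weights
    # as int8 instead of float32, shrinking them roughly fourfold.
    quantized_decoder = torch.quantization.quantize_dynamic(
        decoder, {nn.Linear}, dtype=torch.qint8
    )

The actual mix of architectural changes, pruning ratios, and quantization settings used in the paper may differ; see the article at the Online Access URL above for the authors' exact procedure.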