Image-Captioning Model Compression

Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are of high enough quality for practical use, but they require both substantial computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated model-size compression to prepare such models for use on mobile devices. Furthermore, these works usually investigate only decoder compression in a typical encoder–decoder architecture, although the encoder traditionally occupies most of the space. We applied the most effective model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by no less than 91% in memory footprint (encoder included), while losing no more than 2% in CIDEr and 4.5% in SPICE. At the same time, the best model achieved 127.4 CIDEr and 21.4 SPICE at a size of only 34.8 MB, which sets a strong baseline for image-captioning model compression and is suitable for practical applications.

Bibliographic Details
Main Authors: Viktar Atliha, Dmitrij Šešok
Affiliation: Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio Al. 11, LT-10223 Vilnius, Lithuania
Format: Article
Language: English
Published: MDPI AG, 2022-02-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app12031638
Collection: Directory of Open Access Journals (DOAJ)
Subjects: image captioning; model compression; pruning; quantization
Online Access: https://www.mdpi.com/2076-3417/12/3/1638
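The abstract names pruning and quantization as two of the core compression techniques. As a rough illustration only (this is not the authors' code; the decoder below is a hypothetical stand-in, and the 90% pruning ratio is chosen merely to echo the roughly 91% size reduction reported), here is how both steps are commonly applied with standard PyTorch utilities:

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Hypothetical stand-in for a captioning decoder; the paper's actual
    # models are full encoder-decoder captioning architectures.
    decoder = nn.Sequential(
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10000),  # vocabulary-sized output projection
    )

    # Step 1 -- unstructured magnitude pruning: zero out the 90% of weights
    # with the smallest absolute value in every Linear layer.
    for module in decoder.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)
            prune.remove(module, "weight")  # bake the sparsity into the tensor

    # Step 2 -- post-training dynamic quantization: store Linear weights
    # as int8 instead of float32, shrinking them roughly fourfold.
    quantized_decoder = torch.quantization.quantize_dynamic(
        decoder, {nn.Linear}, dtype=torch.qint8
    )

The actual mix of architectural changes, pruning ratios, and quantization settings used in the paper may differ; see the article at the Online Access URL above for the authors' exact procedure.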