Image-Captioning Model Compression
Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). The current quality of captioning models allows them to be used for practical tasks, but they require both large computational power and considerable storage space...
Main Authors: | Viktar Atliha, Dmitrij Šešok |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Applied Sciences |
Subjects: | image captioning, model compression, pruning, quantization |
Online Access: | https://www.mdpi.com/2076-3417/12/3/1638 |
_version_ | 1797489010750259200 |
---|---|
author | Viktar Atliha Dmitrij Šešok |
author_facet | Viktar Atliha Dmitrij Šešok |
author_sort | Viktar Atliha |
collection | DOAJ |
description | Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). The current quality of captioning models allows them to be used for practical tasks, but they require both large computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated model-size compression in order to prepare such models for use on mobile devices. Furthermore, these works usually investigate only decoder compression in a typical encoder–decoder architecture, while the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by no less than 91% in terms of memory (including the encoder), while losing no more than 2% in CIDEr and 4.5% in SPICE. At the same time, the best model achieved 127.4 CIDEr and 21.4 SPICE with a size of only 34.8 MB, which sets a strong baseline for image-captioning model compression and could be used in practical applications. |
first_indexed | 2024-03-10T00:10:25Z |
format | Article |
id | doaj.art-bcb81a741b124700b1f946e7f6165889 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T00:10:25Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-bcb81a741b124700b1f946e7f6165889; DOI: 10.3390/app12031638; Affiliations: Viktar Atliha and Dmitrij Šešok, Department of Information Technologies, Vilnius Gediminas Technical University, Saulėtekio Al. 11, LT-10223 Vilnius, Lithuania |
spellingShingle | Viktar Atliha Dmitrij Šešok Image-Captioning Model Compression Applied Sciences image captioning model compression pruning quantization |
title | Image-Captioning Model Compression |
title_full | Image-Captioning Model Compression |
title_fullStr | Image-Captioning Model Compression |
title_full_unstemmed | Image-Captioning Model Compression |
title_short | Image-Captioning Model Compression |
title_sort | image captioning model compression |
topic | image captioning model compression pruning quantization |
url | https://www.mdpi.com/2076-3417/12/3/1638 |
work_keys_str_mv | AT viktaratliha imagecaptioningmodelcompression AT dmitrijsesok imagecaptioningmodelcompression |
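Two of the compression techniques named in the abstract, unstructured magnitude pruning and 8-bit quantization, can be sketched in a few lines of NumPy. This is a hypothetical illustration of the general techniques only, not the paper's actual pipeline; the function names and the affine quantization scheme shown here are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction of weights, keeping the array shape unchanged."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_uint8(weights):
    """Affine 8-bit quantization: map the float range [min, max]
    linearly onto integer codes 0..255."""
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + lo
```

Quantizing float32 weights to uint8 alone gives a 4x storage reduction (32 bits down to 8 per weight), and high sparsity from pruning compresses further under sparse or entropy-coded storage; combinations of this kind, applied to both encoder and decoder, are what make the reported ~91% memory reduction plausible.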