A performance analysis of transformer-based deep learning models for Arabic image captioning
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the image captioning task in Arabic. This paper...
Main Authors: | Ashwaq Alsayed, Thamir M. Qadah, Muhammad Arif |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-10-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S131915782300304X |
_version_ | 1797629778836062208 |
---|---|
author | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif |
author_facet | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif |
author_sort | Ashwaq Alsayed |
collection | DOAJ |
description | Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the image captioning task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves AIC performance by up to 34–92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU-score performance by 9–11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196–379% in the BLEU-4 score. |
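The BLEU-4 metric cited throughout the abstract scores a candidate caption against reference captions using clipped n-gram precisions (n = 1..4) combined with a brevity penalty. The following is a minimal self-contained sketch of that computation, not the authors' actual evaluation pipeline; the `bleu4` helper name and the add-one smoothing on higher-order n-grams are illustrative assumptions:

```python
# Illustrative sentence-level BLEU-4: clipped n-gram precision + brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(references, candidate):
    """BLEU-4 for one whitespace-tokenized candidate against reference strings."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing for n > 1 keeps short captions from scoring zero.
        p = clipped / total if n == 1 else (clipped + 1) / (total + 1)
        if p == 0:
            return 0.0
        log_precisions.append(math.log(p))
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / 4)
```

An identical candidate and reference yields 1.0; a caption sharing few n-grams with its references scores close to 0. The paper's reported gains (e.g. 34–92% from CAMeL Tools preprocessing) are relative improvements in this score.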
first_indexed | 2024-03-11T10:58:44Z |
format | Article |
id | doaj.art-f81fbb81f4e5478c8db380d06b2c4daf |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-03-11T10:58:44Z |
publishDate | 2023-10-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-f81fbb81f4e5478c8db380d06b2c4daf2023-11-13T04:08:58ZengElsevierJournal of King Saud University: Computer and Information Sciences1319-15782023-10-01359101750A performance analysis of transformer-based deep learning models for Arabic image captioningAshwaq Alsayed0Thamir M. Qadah1Muhammad Arif2Computer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaCorresponding author.; Computer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaComputer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaImage captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work focused on performing the image captioning task in English, and only a few proposals exist that address the image captioning task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves the AIC performance by up to 34–92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores performance by 9–11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. 
Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196–379% in the BLEU-4 score.http://www.sciencedirect.com/science/article/pii/S131915782300304XImage captioningArabic image captioningTransformer modelPerformance analysis and evaluationDeep learningMachine learning |
spellingShingle | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif A performance analysis of transformer-based deep learning models for Arabic image captioning Journal of King Saud University: Computer and Information Sciences Image captioning Arabic image captioning Transformer model Performance analysis and evaluation Deep learning Machine learning |
title | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_full | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_fullStr | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_full_unstemmed | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_short | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_sort | performance analysis of transformer based deep learning models for arabic image captioning |
topic | Image captioning Arabic image captioning Transformer model Performance analysis and evaluation Deep learning Machine learning |
url | http://www.sciencedirect.com/science/article/pii/S131915782300304X |
work_keys_str_mv | AT ashwaqalsayed aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT thamirmqadah aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT muhammadarif aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT ashwaqalsayed performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT thamirmqadah performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT muhammadarif performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning |