A performance analysis of transformer-based deep learning models for Arabic image captioning

Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on image captioning in English, and only a few proposals address the task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves AIC performance by 34–92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 outperforms EfficientNet-B0, improving BLEU scores by 9–11%. Furthermore, we investigate the impact of different datasets on overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing these results, we build a model that significantly outperforms state-of-the-art AIC proposals by 196–379% in the BLEU-4 score.
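As a point of reference for the tools and metric named in the abstract, the following is a minimal, illustrative sketch (not the authors' code) of how Arabic captions might be normalized with CAMeL Tools and scored with BLEU-4 using NLTK's sentence_bleu. The example caption strings and the particular normalization steps are assumptions for illustration only; the paper's exact preprocessing pipeline is not specified here.

```python
# Illustrative sketch only: normalize Arabic captions with CAMeL Tools and
# compute a BLEU-4 score with NLTK. Requires: pip install camel-tools nltk
from camel_tools.utils.dediac import dediac_ar
from camel_tools.utils.normalize import (
    normalize_alef_ar,
    normalize_alef_maksura_ar,
    normalize_teh_marbuta_ar,
)
from camel_tools.tokenizers.word import simple_word_tokenize
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def preprocess_caption(text: str) -> list[str]:
    """Dediacritize, normalize common Arabic letter variants, and tokenize."""
    text = dediac_ar(text)                  # strip diacritics
    text = normalize_alef_ar(text)          # unify alef variants (أ/إ/آ -> ا)
    text = normalize_alef_maksura_ar(text)  # ى -> ي
    text = normalize_teh_marbuta_ar(text)   # ة -> ه
    return simple_word_tokenize(text)


# Hypothetical reference and generated captions, for illustration only.
references = [preprocess_caption("كلب بني يجري على العشب الأخضر")]
hypothesis = preprocess_caption("كلب يجري على العشب")

# BLEU-4: uniform weights over 1- to 4-grams, with smoothing for short captions.
bleu4 = sentence_bleu(
    references,
    hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {bleu4:.3f}")
```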

Bibliographic Details
Main Authors: Ashwaq Alsayed, Thamir M. Qadah, Muhammad Arif
Affiliation: Computer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia (Thamir M. Qadah is the corresponding author)
Format: Article
Language: English
Published: Elsevier, 2023-10-01
Series: Journal of King Saud University: Computer and Information Sciences, Vol. 35, No. 9, Article 101750
ISSN: 1319-1578
Collection: Directory of Open Access Journals (DOAJ), record doaj.art-f81fbb81f4e5478c8db380d06b2c4daf
Subjects: Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S131915782300304X