A performance analysis of transformer-based deep learning models for Arabic image captioning
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the image captioning task in Arabic. This paper...
Main Authors: | Ashwaq Alsayed, Thamir M. Qadah, Muhammad Arif |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-10-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S131915782300304X |
_version_ | 1797629778836062208 |
---|---|
author | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif |
author_facet | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif |
author_sort | Ashwaq Alsayed |
collection | DOAJ |
description | Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the image captioning task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves AIC performance by up to 34–92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU-score performance by 9–11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196–379% in the BLEU-4 score. |
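The BLEU-4 metric cited throughout the abstract scores a candidate caption against reference captions using clipped n-gram precisions (n = 1..4) combined with a brevity penalty. The following is a minimal self-contained sketch of that computation, not the authors' actual evaluation pipeline; the `bleu4` helper name and the add-one smoothing on higher-order n-grams are illustrative assumptions:

```python
# Illustrative sentence-level BLEU-4: clipped n-gram precision + brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(references, candidate):
    """BLEU-4 for one whitespace-tokenized candidate against reference strings."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing for n > 1 keeps short captions from scoring zero.
        p = clipped / total if n == 1 else (clipped + 1) / (total + 1)
        if p == 0:
            return 0.0
        log_precisions.append(math.log(p))
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / 4)
```

An identical candidate and reference yields 1.0; a caption sharing few n-grams with its references scores close to 0. The paper's reported gains (e.g. 34–92% from CAMeL Tools preprocessing) are relative improvements in this score.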
first_indexed | 2024-03-11T10:58:44Z |
format | Article |
id | doaj.art-f81fbb81f4e5478c8db380d06b2c4daf |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-03-11T10:58:44Z |
publishDate | 2023-10-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-f81fbb81f4e5478c8db380d06b2c4daf2023-11-13T04:08:58ZengElsevierJournal of King Saud University: Computer and Information Sciences1319-15782023-10-01359101750A performance analysis of transformer-based deep learning models for Arabic image captioningAshwaq Alsayed0Thamir M. Qadah1Muhammad Arif2Computer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaCorresponding author.; Computer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaComputer Science Department, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi ArabiaImage captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work focused on performing the image captioning task in English, and only a few proposals exist that address the image captioning task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves the AIC performance by up to 34–92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores performance by 9–11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. 
Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196–379% in the BLEU-4 score.http://www.sciencedirect.com/science/article/pii/S131915782300304XImage captioningArabic image captioningTransformer modelPerformance analysis and evaluationDeep learningMachine learning |
spellingShingle | Ashwaq Alsayed Thamir M. Qadah Muhammad Arif A performance analysis of transformer-based deep learning models for Arabic image captioning Journal of King Saud University: Computer and Information Sciences Image captioning Arabic image captioning Transformer model Performance analysis and evaluation Deep learning Machine learning |
title | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_full | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_fullStr | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_full_unstemmed | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_short | A performance analysis of transformer-based deep learning models for Arabic image captioning |
title_sort | performance analysis of transformer based deep learning models for arabic image captioning |
topic | Image captioning Arabic image captioning Transformer model Performance analysis and evaluation Deep learning Machine learning |
url | http://www.sciencedirect.com/science/article/pii/S131915782300304X |
work_keys_str_mv | AT ashwaqalsayed aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT thamirmqadah aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT muhammadarif aperformanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT ashwaqalsayed performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT thamirmqadah performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning AT muhammadarif performanceanalysisoftransformerbaseddeeplearningmodelsforarabicimagecaptioning |