Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review

Recent improvements formulated in the area of video captioning have brought rapid revolutions in its methods and the performance of its models. Machine learning and deep learning techniques are both employed in this regard. However, there is a lack of tracing the latest studies and their remarkable...

Bibliographic Details
Main Authors:	Tanzila Kehkashan, Abdullah Alsaeedi, Wael M. S. Yafooz, Nor Azman Ismail, Arafat Al-Dhaqm
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Deep learning; machine learning; performance evaluation metrics; video analysis
Online Access:	https://ieeexplore.ieee.org/document/10413461/

_version_	1827131555633430528
author	Tanzila Kehkashan Abdullah Alsaeedi Wael M. S. Yafooz Nor Azman Ismail Arafat Al-Dhaqm
author_facet	Tanzila Kehkashan Abdullah Alsaeedi Wael M. S. Yafooz Nor Azman Ismail Arafat Al-Dhaqm
author_sort	Tanzila Kehkashan
collection	DOAJ
description	Recent improvements formulated in the area of video captioning have brought rapid revolutions in its methods and the performance of its models. Machine learning and deep learning techniques are both employed in this regard. However, there is a lack of tracing the latest studies and their remarkable results. Although several studies have been proposed employing the ML and DL algorithms in different other areas, there is no systematic review utilizing the video captioning task. This study aims to examine, evaluate, and synthesize the primary studies into a thorough Systematic Literature Review (SLR) that provides a general overview of the methods used for video captioning. We performed the SLR to determine the research problems under which machine learning models were preferred over the deep learning models and vice versa. We collected a total of 1,656 studies retrieved from four electronic databases; Scopus, WoS, IEEE Xplore, and ACM, based on our search string from which 162 published studies passed the selection criteria related to one primary and two secondary research questions after a systematic process. Moreover, insufficient data collection and inefficient comparison of results are common issues identified during the review process. We conclude that the 2D/3D CNN for video feature extraction and LSTM for caption generation, METEOR and BLEU performance evaluation tools, and MSVD dataset are most frequently employed for video captioning. Our study is the pioneer in comparing the implementation of ML and DL algorithms employing the video captioning area. Thus, our study will accelerate the critical assessment of the state-of-the-art in other research fields of video analysis and human-computer interaction.
first_indexed	2024-04-24T18:53:52Z
format	Article
id	doaj.art-47887a2ee10d4187b349211668af235d
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2025-03-20T16:33:57Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-47887a2ee10d4187b349211668af235d 2024-08-29T23:00:50Z eng IEEE IEEE Access 2169-3536 2024-01-01 12 35048 35080 10.1109/ACCESS.2024.3357980 10413461 Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review Tanzila Kehkashan 0 https://orcid.org/0000-0002-6325-4409 Abdullah Alsaeedi 1 https://orcid.org/0000-0002-7974-7638 Wael M. S. Yafooz 2 https://orcid.org/0000-0002-2842-9736 Nor Azman Ismail 3 https://orcid.org/0000-0003-1785-008X Arafat Al-Dhaqm 4 Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia Computer Science Department, College of Computer Science and Engineering, Taibah University, Madina, Saudi Arabia Computer Science Department, College of Computer Science and Engineering, Taibah University, Madina, Saudi Arabia Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Malaysia Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak, Malaysia Recent improvements formulated in the area of video captioning have brought rapid revolutions in its methods and the performance of its models. Machine learning and deep learning techniques are both employed in this regard. However, there is a lack of tracing the latest studies and their remarkable results. Although several studies have been proposed employing the ML and DL algorithms in different other areas, there is no systematic review utilizing the video captioning task. This study aims to examine, evaluate, and synthesize the primary studies into a thorough Systematic Literature Review (SLR) that provides a general overview of the methods used for video captioning. We performed the SLR to determine the research problems under which machine learning models were preferred over the deep learning models and vice versa. We collected a total of 1,656 studies retrieved from four electronic databases; Scopus, WoS, IEEE Xplore, and ACM, based on our search string from which 162 published studies passed the selection criteria related to one primary and two secondary research questions after a systematic process. Moreover, insufficient data collection and inefficient comparison of results are common issues identified during the review process. We conclude that the 2D/3D CNN for video feature extraction and LSTM for caption generation, METEOR and BLEU performance evaluation tools, and MSVD dataset are most frequently employed for video captioning. Our study is the pioneer in comparing the implementation of ML and DL algorithms employing the video captioning area. Thus, our study will accelerate the critical assessment of the state-of-the-art in other research fields of video analysis and human-computer interaction. https://ieeexplore.ieee.org/document/10413461/ Deep learning machine learning performance evaluation metrics video analysis
spellingShingle	Tanzila Kehkashan Abdullah Alsaeedi Wael M. S. Yafooz Nor Azman Ismail Arafat Al-Dhaqm Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review IEEE Access Deep learning machine learning performance evaluation metrics video analysis
title	Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review
title_full	Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review
title_fullStr	Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review
title_full_unstemmed	Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review
title_short	Combinatorial Analysis of Deep Learning and Machine Learning Video Captioning Studies: A Systematic Literature Review
title_sort	combinatorial analysis of deep learning and machine learning video captioning studies a systematic literature review
topic	Deep learning machine learning performance evaluation metrics video analysis
url	https://ieeexplore.ieee.org/document/10413461/
work_keys_str_mv	AT tanzilakehkashan combinatorialanalysisofdeeplearningandmachinelearningvideocaptioningstudiesasystematicliteraturereview AT abdullahalsaeedi combinatorialanalysisofdeeplearningandmachinelearningvideocaptioningstudiesasystematicliteraturereview AT waelmsyafooz combinatorialanalysisofdeeplearningandmachinelearningvideocaptioningstudiesasystematicliteraturereview AT norazmanismail combinatorialanalysisofdeeplearningandmachinelearningvideocaptioningstudiesasystematicliteraturereview AT arafataldhaqm combinatorialanalysisofdeeplearningandmachinelearningvideocaptioningstudiesasystematicliteraturereview

Similar Items