A Comprehensive Review of Arabic Text Summarization

The explosion of online and offline data has changed how we gather, evaluate, and understand data. It is frequently difficult and time-consuming to comprehend large text documents and extract crucial information from them. Text summarization techniques address the mentioned problems by compressing l...

Bibliographic Details
Main Authors: Asmaa Elsaid, Ammar Mohammed, Lamiaa Fattouh Ibrahim, Mohammed M. Sakre
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9745159/
_version_ 1818060179328991232
author Asmaa Elsaid
Ammar Mohammed
Lamiaa Fattouh Ibrahim
Mohammed M. Sakre
collection DOAJ
description The explosion of online and offline data has changed how we gather, evaluate, and understand data. Comprehending large text documents and extracting crucial information from them is often difficult and time-consuming. Text summarization techniques address these problems by compressing long texts while retaining their essential content. These techniques rely on the fast delivery of filtered, high-quality content to their users. Because of the massive amounts of data generated by technology and various sources, automated text summarization of large-scale data is challenging. There are three types of automatic text summarization techniques: extractive, abstractive, and hybrid. Regardless of the technique used, the generated summaries remain far from those produced by human experts. Although Arabic is a widely spoken language that is frequently used for content sharing on the web, summarization of Arabic content is limited and still immature because of several problems, including the Arabic language’s morphological structure, the variety of dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and recent deep learning models for them. It also reviews existing datasets for these approaches, along with their characteristics and limitations. The most often used metrics for summarization quality evaluation are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The challenges encountered in Arabic text summarization methods and approaches, and the solutions proposed in each approach, are analyzed. Many Arabic text summarization methods have problems, such as the lack of golden tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language. Finally, providing the required corpora, improving evaluation using semantic representations, addressing the lack of ROUGE metrics in abstractive text summarization, and adopting recent deep learning models in Arabic summarization studies are essential demands.
first_indexed 2024-12-10T13:28:18Z
format Article
id doaj.art-3d57ae145e164d3f9b048fecba547463
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-10T13:28:18Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-3d57ae145e164d3f9b048fecba547463
2022-12-22T01:47:05Z
eng
IEEE
IEEE Access
2169-3536
2022-01-01
10
38012
38030
10.1109/ACCESS.2022.3163292
9745159
A Comprehensive Review of Arabic Text Summarization
Asmaa Elsaid (0), https://orcid.org/0000-0003-0514-1278
Ammar Mohammed (1), https://orcid.org/0000-0001-6844-9451
Lamiaa Fattouh Ibrahim (2), https://orcid.org/0000-0001-5671-8941
Mohammed M. Sakre (3)
(0) Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
(1) Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
(2) Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
(3) Higher Institute of Computer Science and Information Technology, El Shorouk, Egypt
https://ieeexplore.ieee.org/document/9745159/
Text summarization
Arabic natural language processing
machine learning
extractive text summarization
abstractive text summarization
deep learning models
title A Comprehensive Review of Arabic Text Summarization
topic Text summarization
Arabic natural language processing
machine learning
extractive text summarization
abstractive text summarization
deep learning models
url https://ieeexplore.ieee.org/document/9745159/