Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach

To cope with the overload of redundant textual information that is growing exponentially on the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). The idea of ATS is to give readers, e.g., online readers, a simplified...

Bibliographic Details
Main Authors: Abdullah Alshanqiti, Abdallah Namoun, Aeshah Alsughayyir, Aisha Mousa Mashraqi, Abdul Rehman Gilal, Sami Saad Albouq
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
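Subjects: NLP; natural language understanding; automatic text summarization; machine learning; transfer learning; pre-trained model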
Online Access: https://ieeexplore.ieee.org/document/9539169/
_version_ 1798001928839364608
author Abdullah Alshanqiti
Abdallah Namoun
Aeshah Alsughayyir
Aisha Mousa Mashraqi
Abdul Rehman Gilal
Sami Saad Albouq
collection DOAJ
description To cope with the overload of redundant textual information that is growing exponentially on the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). The idea of ATS is to give readers, e.g., online readers, a simplified version of a text, saving the time and effort needed to skim a large body of text. However, ATS is regarded as one of the most complex NLP applications, particularly for Arabic, which has received less NLP development than Indo-European languages. Thus, we present an extractive summarizer (ArDBertSum) for text written in Arabic, relying on the DistilBERT model. In addition, we propose a domain-specific sentence-clause segmenter (SCSAR) that supports ArDBertSum by further shortening long or complex sentences. Our experiments show that ArDBertSum yields the best performance, compared with non-heuristic Arabic summarizers, in producing candidate summaries of acceptable quality. The experiments were conducted on the EASC dataset (along with our proposed dataset) and report (1) a statistical evaluation using ROUGE metrics and (2) a dedicated human evaluation. The human evaluation results were promising; however, further work is needed to improve the coherence and punctuation of the automatic summaries.
first_indexed 2024-04-11T11:44:03Z
format Article
id doaj.art-0fce151e800044d0bde95e3055f61f22
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T11:44:03Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0fce151e800044d0bde95e3055f61f22; 2022-12-22T04:25:40Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 135594-135607; DOI 10.1109/ACCESS.2021.3113256; document 9539169
Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach
Abdullah Alshanqiti (https://orcid.org/0000-0002-6080-5236), Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
Abdallah Namoun (https://orcid.org/0000-0002-7050-0532), Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
Aeshah Alsughayyir (https://orcid.org/0000-0003-3710-7103), College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia
Aisha Mousa Mashraqi (https://orcid.org/0000-0003-0449-8910), College of Computer Science, Najran University (NU), Najran, Saudi Arabia
Abdul Rehman Gilal (https://orcid.org/0000-0002-1904-1588), Department of Computer Science, Sukkur IBA University, Sukkur, Pakistan
Sami Saad Albouq (https://orcid.org/0000-0002-1549-7334), Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia
title Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach
topic NLP
natural language understanding
automatic text summarization
machine learning
transfer learning
pre-trained model
url https://ieeexplore.ieee.org/document/9539169/