Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach
To tackle the textual information overload that is growing exponentially, and with much redundancy, across the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). The idea of ATS is to assist, e.g., online readers, in getting a simplified...
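Main Authors: | Abdullah Alshanqiti, Abdallah Namoun, Aeshah Alsughayyir, Aisha Mousa Mashraqi, Abdul Rehman Gilal, Sami Saad Albouq |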
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
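Subjects: | NLP, natural language understanding, automatic text summarization, machine learning, transfer learning, pre-trained model |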
Online Access: | https://ieeexplore.ieee.org/document/9539169/ |
_version_ | 1798001928839364608 |
---|---|
author | Abdullah Alshanqiti; Abdallah Namoun; Aeshah Alsughayyir; Aisha Mousa Mashraqi; Abdul Rehman Gilal; Sami Saad Albouq |
author_sort | Abdullah Alshanqiti |
collection | DOAJ |
description | To tackle the textual information overload that is growing exponentially, and with much redundancy, across the Internet, this paper investigates a solution based on Automatic Text Summarization (ATS). The idea of ATS is to give readers, e.g., online readers, a simplified version of a text, saving the time and effort required to skim a large body of text. However, ATS is regarded as one of the most complex NLP applications, particularly for Arabic, whose language-processing tools are less developed than those available for Indo-European languages. We therefore present an extractive summarizer (ArDBertSum) for text written in Arabic, built on the DistilBERT model. In addition, we propose a domain-specific sentence-clause segmenter (SCSAR) that supports ArDBertSum by further shortening long or complex sentences. Our experiments show that ArDBertSum outperforms the non-heuristic Arabic summarizers it was compared with and produces candidate summaries of acceptable quality. The experiments were conducted on the EASC dataset (along with our proposed dataset) and report (1) a statistical evaluation using ROUGE metrics and (2) a dedicated human evaluation. The human evaluation results were promising; however, further work is needed to improve the coherence and punctuation of the automatic summaries. |
first_indexed | 2024-04-11T11:44:03Z |
format | Article |
id | doaj.art-0fce151e800044d0bde95e3055f61f22 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T11:44:03Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-0fce151e800044d0bde95e3055f61f22; 2022-12-22T04:25:40Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9; pp. 135594-135607; DOI 10.1109/ACCESS.2021.3113256; article 9539169; Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach; Abdullah Alshanqiti (https://orcid.org/0000-0002-6080-5236; Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia); Abdallah Namoun (https://orcid.org/0000-0002-7050-0532; Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia); Aeshah Alsughayyir (https://orcid.org/0000-0003-3710-7103; College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia); Aisha Mousa Mashraqi (https://orcid.org/0000-0003-0449-8910; College of Computer Science, Najran University (NU), Najran, Saudi Arabia); Abdul Rehman Gilal (https://orcid.org/0000-0002-1904-1588; Department of Computer Science, Sukkur IBA University, Sukkur, Pakistan); Sami Saad Albouq (https://orcid.org/0000-0002-1549-7334; Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, Saudi Arabia); https://ieeexplore.ieee.org/document/9539169/; NLP; natural language understanding; automatic text summarization; machine learning; transfer learning; pre-trained model |
title | Leveraging DistilBERT for Summarizing Arabic Text: An Extractive Dual-Stage Approach |
topic | NLP; natural language understanding; automatic text summarization; machine learning; transfer learning; pre-trained model |
url | https://ieeexplore.ieee.org/document/9539169/ |