End to End Urdu Abstractive Text Summarization With Dataset and Improvement in Evaluation Metric

Bibliographic Details
Main Authors:	Hassan Raza, Waseem Shahzad
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Datasets, neural networks, CA-RoBERTa score, text summarization
Online Access:	https://ieeexplore.ieee.org/document/10472483/

author	Hassan Raza, Waseem Shahzad
collection	DOAJ
description	Urdu, although widely spoken in South Asia, has received comparatively little attention in natural language processing (NLP) relative to better-resourced languages. Text summarization is an important NLP task because it requires understanding a text and producing a concise summary of it. Summarization can be either extractive or abstractive. Considerable effort has gone into advancing extractive summarization techniques, and the paper examines the limitations of that approach in detail. Abstractive summarization of Urdu, by contrast, remains largely unexplored, and the paper also discusses the challenges and underlying factors that have impeded progress in this area. The paper focuses on abstractive summarization of Urdu using supervised learning. Because supervised learning requires labeled data, a dataset of Urdu texts paired with abstractive summaries was prepared. A Transformer encoder-decoder network was used to generate abstractive Urdu summaries; evaluated with ROUGE, it achieves a ROUGE-1 score of 25.18. Moreover, a novel measure called the “disconnection rate” is introduced to make summary evaluation context-aware; the resulting metric is called the Context Aware RoBERTa Score (CA-RoBERTa score).
first_indexed	2024-04-24T18:52:52Z
format	Article
id	doaj.art-736a577ad67b4fe58bf354331c507c90
institution	Directory of Open Access Journals
issn	2169-3536
language	English
last_indexed	2024-04-24T18:52:52Z
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
doi	10.1109/ACCESS.2024.3377463
volume	12
pages	40311-40324
author_orcid	Hassan Raza: https://orcid.org/0009-0001-4857-4818; Waseem Shahzad: https://orcid.org/0000-0002-9491-3761
affiliation	FAST School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan (both authors)
title	End to End Urdu Abstractive Text Summarization With Dataset and Improvement in Evaluation Metric
topic	Datasets, neural networks, CA-RoBERTa score, text summarization
url	https://ieeexplore.ieee.org/document/10472483/
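
The abstract reports summary quality as a ROUGE-1 score of 25.18. ROUGE-1 measures unigram (single-word) overlap between a generated summary and a reference summary. The following is a minimal sketch of ROUGE-1 precision, recall and F1, assuming plain whitespace tokenization; this record does not state which ROUGE implementation, tokenizer, or variant (F1 or recall) the paper used, so the sketch illustrates the metric rather than reproducing the reported number. The function name `rouge1` is hypothetical.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """Hypothetical helper: ROUGE-1 precision, recall and F1.

    Assumption: tokens are whitespace-separated words; real evaluations
    may normalize or tokenize Urdu text differently.
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clipped overlap: a unigram counts at most min(count in candidate, count in reference) times.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    # F1 is the harmonic mean; it is defined as 0 when nothing overlaps.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy English example; scores are fractions, often reported scaled by 100.
    scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
    print({k: round(100 * v, 2) for k, v in scores.items()})
    # → {'precision': 83.33, 'recall': 83.33, 'f1': 83.33}
```

Because overlap is clipped with multiset intersection, repeating a word in the generated summary cannot inflate the score beyond how often that word appears in the reference.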

Similar Items