Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization

Text summarization aims to shorten a text by removing less useful information so that information can be obtained quickly and precisely. In Indonesian abstractive text summarization, research mostly focuses on multi-document summarization, whose methods do not work optimally for single-document summarization. A...


Bibliographic Details
Main Authors: Lucky, Henry; Suhartono, Derwin
Format: Article
Language: English
Published: Universiti Utara Malaysia Press 2022
Subjects: QA75 Electronic computers. Computer science
Online Access: https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf
collection UUM
description Text summarization aims to shorten a text by removing less useful information so that information can be obtained quickly and precisely. In Indonesian abstractive text summarization, research mostly focuses on multi-document summarization, whose methods do not work optimally for single-document summarization. As the public summarization datasets and studies in English focus on single-document summarization, this study concentrated on Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT in abstractive text summarization on the IndoSum dataset using the BERTSum model. The investigation used various combinations of model encoders, model embedding sizes, and model decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder could improve the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model results.
id uum-28753
institution Universiti Utara Malaysia
record_format eprints
spelling uum-28753 2023-02-09T03:05:34Z https://repo.uum.edu.my/id/eprint/28753/
Lucky, Henry and Suhartono, Derwin (2022) Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21 (01). pp. 71-94. ISSN 2180-3862
QA75 Electronic computers. Computer science
Universiti Utara Malaysia Press 2022 Article PeerReviewed application/pdf en cc4_by
https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf
https://e-journal.uum.edu.my/index.php/jict/article/view/13548
topic QA75 Electronic computers. Computer science