Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization
Recent advances in low-resource abstractive summarization have largely been achieved through the adoption of specialized pre-training, pseudo-summarization, which instills content selection knowledge through various centrality-based sentence recovery tasks. However, despite these substantial results, there...
Main Authors: | Daniil Chernyshev, Boris Dobrov
---|---
Format: | Article
Language: | English
Published: | IEEE, 2024-01-01
Series: | IEEE Access
Subjects: | Abstractive summarization; attention mechanism; low-resource text processing; pre-trained language models; model probing
Online Access: | https://ieeexplore.ieee.org/document/10474365/
_version_ | 1797221994343694336 |
---|---|
author | Daniil Chernyshev; Boris Dobrov
author_facet | Daniil Chernyshev; Boris Dobrov
author_sort | Daniil Chernyshev |
collection | DOAJ |
description | Recent advances in low-resource abstractive summarization have largely been achieved through the adoption of specialized pre-training, pseudo-summarization, which instills content selection knowledge through various centrality-based sentence recovery tasks. However, despite these substantial results, there are several cases where the predecessor general-purpose pre-trained language model BART outperforms its summarization-specialized counterparts in both few-shot and fine-tuned scenarios. In this work, we investigate these performance irregularities and shed light on the effect of pseudo-summarization pre-training in low-resource settings. We benchmark five pre-trained abstractive summarization models on five datasets from diverse domains and analyze their behavior in terms of extractive intuition and attention patterns. Although all models exhibit extractive behavior, some lack the prediction confidence to copy longer text fragments and have attention distributions misaligned with the structure of real-world texts. The latter turns out to be the major factor behind underperformance in the fiction, news, and scientific article domains, as BART's better initial attention alignment leads to the best benchmark results in all few-shot settings. Further examination reveals that BART's summarization capabilities are a side effect of the combination of the sentence permutation task and the specifics of its pre-training dataset. Based on this discovery, we introduce Pegasus-SP, an improved pre-trained abstractive summarization model that unifies pseudo-summarization with sentence permutation. The new model outperforms existing counterparts in low-resource settings and demonstrates superior adaptability. Additionally, we show that all pre-trained summarization models benefit from data-wise attention correction, achieving up to 10% relative ROUGE improvement on model-data pairs with the largest distribution discrepancies. |
first_indexed | 2024-04-24T13:14:16Z |
format | Article |
id | doaj.art-b870b80545c042528dd2d77cb21fc0f7 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-24T13:14:16Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-b870b80545c042528dd2d77cb21fc0f7; 2024-04-04T23:00:42Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 47219-47230; DOI 10.1109/ACCESS.2024.3379139; IEEE document 10474365; Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization; Daniil Chernyshev (https://orcid.org/0009-0001-6847-2122) and Boris Dobrov, Research Computing Center, Lomonosov Moscow State University, Moscow, Russia; https://ieeexplore.ieee.org/document/10474365/; Abstractive summarization; attention mechanism; low-resource text processing; pre-trained language models; model probing |
spellingShingle | Daniil Chernyshev; Boris Dobrov; Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization; IEEE Access; Abstractive summarization; attention mechanism; low-resource text processing; pre-trained language models; model probing
title | Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization |
title_full | Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization |
title_fullStr | Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization |
title_full_unstemmed | Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization |
title_short | Investigating the Pre-Training Bias in Low-Resource Abstractive Summarization |
title_sort | investigating the pre training bias in low resource abstractive summarization |
topic | Abstractive summarization; attention mechanism; low-resource text processing; pre-trained language models; model probing
url | https://ieeexplore.ieee.org/document/10474365/ |
work_keys_str_mv | AT daniilchernyshev investigatingthepretrainingbiasinlowresourceabstractivesummarization AT borisdobrov investigatingthepretrainingbiasinlowresourceabstractivesummarization |
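The record above says that Pegasus-SP "unifies pseudo-summarization with sentence permutation" but does not describe the training objective in detail. As a rough, hedged illustration only, the sketch below builds one pre-training example by combining a PEGASUS-style gap-sentence target (here selected with a simple unigram-overlap centrality proxy rather than full ROUGE scoring) with BART-style shuffling of the remaining input sentences. The function names, the masking token, and the selection heuristic are assumptions for illustration, not the authors' implementation.

```python
import random
import re

def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-F1-style unigram overlap, used as a cheap centrality proxy."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    ref_counts = {}
    for tok in ref:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    overlap = 0
    for tok in cand:
        if ref_counts.get(tok, 0) > 0:
            overlap += 1
            ref_counts[tok] -= 1
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def make_pretraining_example(document: str, gap_ratio: float = 0.3, seed: int = 0):
    """Illustrative sketch: the actual Pegasus-SP objective may differ.
    Combines gap-sentence selection (pseudo-summarization) with sentence permutation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s.strip()]
    n_gap = max(1, int(len(sentences) * gap_ratio))
    # Score each sentence by its overlap with the rest of the document (centrality proxy).
    scores = [
        unigram_f1(sent, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, sent in enumerate(sentences)
    ]
    gap_idx = set(sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_gap])
    # Target: the selected "pseudo-summary" sentences, kept in original order.
    target = " ".join(sentences[i] for i in sorted(gap_idx))
    # Input: mask the selected sentences, then permute the sentence order (BART-style).
    masked = ["<mask>" if i in gap_idx else s for i, s in enumerate(sentences)]
    random.Random(seed).shuffle(masked)
    return " ".join(masked), target

doc = ("The reactor was shut down on Monday. Engineers traced the fault to a coolant valve. "
       "The plant operator confirmed the shutdown in a statement. Repairs are expected to take a week. "
       "Regulators will review the incident report next month.")
source, summary = make_pretraining_example(doc)
print("INPUT :", source)
print("TARGET:", summary)
```

In this sketch the model would be trained to reconstruct the centrality-selected sentences in their original order from a masked and shuffled input, so it sees both content selection pressure (which sentences matter) and document-structure pressure (where they belong), the two signals the abstract attributes to pseudo-summarization and sentence permutation respectively.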