Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation


Bibliographic Details
Main Authors: Di Wu, Peng Cheng, Yuying Zheng
Format: Article
Language: English
Published: MDPI AG 2024-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/14/6/2435
_version_ 1797242198782115840
author Di Wu
Peng Cheng
Yuying Zheng
author_facet Di Wu
Peng Cheng
Yuying Zheng
author_sort Di Wu
collection DOAJ
description Summary generation is an important research direction in natural language processing. To address the difficulties that existing summary generation models have in handling redundant information and in producing high-quality summaries from long text, BART is adopted as the backbone model, an <i>N</i> + 1 coarse–fine-grained multistage summary generation framework is constructed, and a multistage mixed-attention unsupervised keyword extraction summary generation model (multistage mixed-attention unsupervised keyword extraction for summary generation, MSMAUKE-S<span style="font-variant: small-caps;">umm</span><i><sup>N</sup></i>) is proposed. In the <i>N</i> coarse-grained summary generation stages, a sentence filtering layer (PureText) is constructed to remove redundant information from long text, and a mixed-attention unsupervised approach iteratively extracts keywords, assisting summary inference and enriching the global semantic information of the coarse-grained summaries. In the single fine-grained summary generation stage, a self-attentive keyword selection module (KeywordSelect) is designed to obtain the keywords with higher weights and to enhance the local semantic representation of the fine-grained summary. The <i>N</i> coarse-grained stages and the single fine-grained stage are run in tandem to produce long-text summaries through this multistage generation approach. The experimental results show that the model improves the ROUGE-1, ROUGE-2, and ROUGE-L metrics by at least 0.75%, 1.48%, and 1.25%, respectively, over the HMNET, TextRank, HAT-BART, DDAMS, and S<span style="font-variant: small-caps;">umm</span><i><sup>N</sup></i> models on summarization datasets such as AMI, ICSI, and QMSum.
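The <i>N</i> + 1 coarse-to-fine pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: <code>pure_text</code>, <code>extract_keywords</code>, and <code>summarize</code> are hypothetical stand-ins, using simple length and frequency heuristics in place of the paper's PureText layer, mixed-attention keyword scoring, and BART decoding.

```python
from collections import Counter

def pure_text(chunk, max_sentences=20):
    """Sentence filtering stand-in for the paper's PureText layer:
    keep the longest sentences as a crude proxy for redundancy removal."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    return ". ".join(sorted(sentences, key=len, reverse=True)[:max_sentences])

def extract_keywords(text, top_k=5):
    """Unsupervised keyword extraction stand-in: rank words by frequency
    (the paper instead scores candidates with mixed attention)."""
    words = [w.lower() for w in text.split() if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(top_k)]

def summarize(text, keywords):
    """Placeholder for a BART generation call conditioned on keywords."""
    return " ".join(keywords) + " | " + text[:200]

def multistage_summarize(long_text, chunk_size=1000):
    """N coarse-grained stages followed by 1 fine-grained stage."""
    text = long_text
    # N coarse stages: split, filter, extract keywords, summarize each
    # chunk, concatenate, and repeat until the text fits a single stage.
    while len(text) > chunk_size:
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        coarse = []
        for chunk in chunks:
            filtered = pure_text(chunk)
            coarse.append(summarize(filtered, extract_keywords(filtered)))
        text = " ".join(coarse)
    # 1 fine stage: pick the highest-weight keywords (the paper's
    # KeywordSelect module) and generate the final summary.
    return summarize(text, extract_keywords(text, top_k=3))
```

Each coarse pass compresses every chunk well below the chunk size, so the loop is guaranteed to converge on a text short enough for the single fine-grained pass.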
first_indexed 2024-04-24T18:35:25Z
format Article
id doaj.art-3f3ac65d379c49e79d46dc3d77b42ddf
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-04-24T18:35:25Z
publishDate 2024-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-3f3ac65d379c49e79d46dc3d77b42ddf2024-03-27T13:19:41ZengMDPI AGApplied Sciences2076-34172024-03-01146243510.3390/app14062435Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary GenerationDi Wu0Peng Cheng1Yuying Zheng2School of Information and Electronic Engineering, Hebei University of Engineering, No. 19 Taiji Road, Handan 056000, ChinaSchool of Information and Electronic Engineering, Hebei University of Engineering, No. 19 Taiji Road, Handan 056000, ChinaSchool of Information and Electronic Engineering, Hebei University of Engineering, No. 19 Taiji Road, Handan 056000, ChinaSummary generation is an important research direction in natural language processing. To address the difficulties that existing summary generation models have in handling redundant information and in producing high-quality summaries from long text, BART is adopted as the backbone model, an <i>N</i> + 1 coarse–fine-grained multistage summary generation framework is constructed, and a multistage mixed-attention unsupervised keyword extraction summary generation model (multistage mixed-attention unsupervised keyword extraction for summary generation, MSMAUKE-S<span style="font-variant: small-caps;">umm</span><i><sup>N</sup></i>) is proposed. In the <i>N</i> coarse-grained summary generation stages, a sentence filtering layer (PureText) is constructed to remove redundant information from long text, and a mixed-attention unsupervised approach iteratively extracts keywords, assisting summary inference and enriching the global semantic information of the coarse-grained summaries. In the single fine-grained summary generation stage, a self-attentive keyword selection module (KeywordSelect) is designed to obtain the keywords with higher weights and to enhance the local semantic representation of the fine-grained summary. The <i>N</i> coarse-grained stages and the single fine-grained stage are run in tandem to produce long-text summaries through this multistage generation approach.
The experimental results show that the model improves the ROUGE-1, ROUGE-2, and ROUGE-L metrics by at least 0.75%, 1.48%, and 1.25%, respectively, over the HMNET, TextRank, HAT-BART, DDAMS, and S<span style="font-variant: small-caps;">umm</span><i><sup>N</sup></i> models on summarization datasets such as AMI, ICSI, and QMSum.https://www.mdpi.com/2076-3417/14/6/2435summary generationmultistagemixed-attentionunsupervisedkeyword extraction
spellingShingle Di Wu
Peng Cheng
Yuying Zheng
Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
Applied Sciences
summary generation
multistage
mixed-attention
unsupervised
keyword extraction
title Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
title_full Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
title_fullStr Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
title_full_unstemmed Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
title_short Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
title_sort multistage mixed attention unsupervised keyword extraction for summary generation
topic summary generation
multistage
mixed-attention
unsupervised
keyword extraction
url https://www.mdpi.com/2076-3417/14/6/2435
work_keys_str_mv AT diwu multistagemixedattentionunsupervisedkeywordextractionforsummarygeneration
AT pengcheng multistagemixedattentionunsupervisedkeywordextractionforsummarygeneration
AT yuyingzheng multistagemixedattentionunsupervisedkeywordextractionforsummarygeneration