Abstractive summarization model considering hybrid lexical features
Main Authors: | Yuehua JIANG, Lei DING, Jiaoe LI, Haoxuan DU, Kai GAO |
---|---|
Format: | Article |
Language: | zho |
Published: | Hebei University of Science and Technology, 2019-04-01 |
Series: | Journal of Hebei University of Science and Technology |
Subjects: | natural language processing; text summarization; attention mechanism; LSTM; CNN |
Online Access: | http://xuebao.hebust.edu.cn/hbkjdx/ch/reader/create_pdf.aspx?file_no=b201902009&flag=1&journal_ |
_version_ | 1828848488594538496 |
---|---|
author | Yuehua JIANG, Lei DING, Jiaoe LI, Haoxuan DU, Kai GAO |
collection | DOAJ |
description | To exploit lexical features (including n-gram and part-of-speech information) for identifying more key vocabulary during summary generation, and thereby further improve summary quality, an algorithm is proposed that combines lexical features with a sequence-to-sequence (Seq2Seq) architecture and an attention mechanism. The input layer combines each word's part-of-speech vector with its word vector, and the result serves as the input to the encoder. The encoder is a bi-directional LSTM, and the context vector is formed from the encoder output together with a lexical feature vector extracted by a convolutional neural network (CNN). Within the model, the CNN layer handles lexical information, the bi-directional LSTM handles sentence-level information, and the decoder, a unidirectional LSTM, decodes the context vector to generate the summary. Experiments on a public dataset and a self-collected dataset show that the model incorporating lexical features outperforms the comparison model; on the public dataset, its ROUGE-1, ROUGE-2 and ROUGE-L scores improve by 0.024, 0.033 and 0.030, respectively. These results indicate that summary generation depends not only on the semantics and topics of an article but also on its lexical features. The proposed model may serve as a reference for research on summarization that integrates key information. |
first_indexed | 2024-12-12T22:31:07Z |
format | Article |
id | doaj.art-e54d53649e3742658c532cf055698970 |
institution | Directory of Open Access Journals |
issn | 1008-1542 |
language | zho |
last_indexed | 2024-12-12T22:31:07Z |
publishDate | 2019-04-01 |
publisher | Hebei University of Science and Technology |
record_format | Article |
series | Journal of Hebei University of Science and Technology |
spelling | doaj.art-e54d53649e3742658c532cf055698970; 2022-12-22T00:09:36Z; zho; Hebei University of Science and Technology; Journal of Hebei University of Science and Technology; ISSN 1008-1542; 2019-04-01; Vol. 40, No. 2, pp. 152-158; DOI 10.7535/hbkd.2019yx02009; b201902009; Abstractive summarization model considering hybrid lexical features; Yuehua JIANG (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China), Lei DING (Information Center of Shijiazhuang Public Security Bureau, Shijiazhuang, Hebei 050021, China), Jiaoe LI (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China), Haoxuan DU (Xidian University, Xi'an, Shaanxi 710126, China), Kai GAO (School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China); http://xuebao.hebust.edu.cn/hbkjdx/ch/reader/create_pdf.aspx?file_no=b201902009&flag=1&journal_; keywords: natural language processing, text summarization, attention mechanism, LSTM, CNN |
title | Abstractive summarization model considering hybrid lexical features |
topic | natural language processing; text summarization; attention mechanism; LSTM; CNN |
url | http://xuebao.hebust.edu.cn/hbkjdx/ch/reader/create_pdf.aspx?file_no=b201902009&flag=1&journal_ |
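The description field reports gains in ROUGE-1, ROUGE-2 and ROUGE-L. As a rough illustration of what these metrics measure, the minimal Python sketch below computes ROUGE-N from clipped n-gram overlap and ROUGE-L from the longest common subsequence, each reported as an F1 score. The record does not say which ROUGE variant (recall, precision or F-score) or which tokenization (words or Chinese characters) the authors used, so the F1 choice and the whitespace tokenization in the usage lines are assumptions. The function names are hypothetical, and this is not the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as F1 (assumed variant): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps min counts (clipping)
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as F1 (assumed variant) based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy usage with whitespace tokenization (an assumption for illustration only).
ref = "the cat sat on the mat".split()
cand = "the cat lay on the mat".split()
print(round(rouge_n(cand, ref, 1), 4))  # 0.8333
print(round(rouge_n(cand, ref, 2), 4))  # 0.6
print(round(rouge_l(cand, ref), 4))     # 0.8333
```

Because each score lies between 0 and 1, the reported gains (for example 0.024 for ROUGE-1) are presumably absolute differences on this scale.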
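The description field outlines how the model builds its hybrid representation: part-of-speech vectors are combined with word vectors at the input, a CNN extracts a lexical feature vector, and the context vector is formed from the encoder output and that lexical vector. The pure-Python sketch below traces this data flow under several assumptions that the record does not state: both "combinations" are plain concatenation, the CNN slides a window of `width` tokens over the combined inputs (so `width = n` acts as an n-gram detector) with ReLU and max-over-time pooling, and the attention is simple dot-product attention. The bi-directional LSTM encoder and LSTM decoder are not implemented; their states are passed in as toy vectors. All function names are hypothetical, and this is not the authors' implementation.

```python
import math


def combine_inputs(word_vecs, pos_vecs):
    """Input layer. ASSUMPTION: word vector and POS vector are concatenated."""
    return [w + p for w, p in zip(word_vecs, pos_vecs)]


def conv1d_max(inputs, filters, width):
    """Lexical features. ASSUMPTION: each filter spans `width` consecutive tokens
    (an n-gram window), followed by ReLU and max-over-time pooling."""
    features = []
    for weights, bias in filters:
        best = 0.0  # ReLU outputs are >= 0; also the result if no window fits
        for start in range(len(inputs) - width + 1):
            window = [x for token in inputs[start:start + width] for x in token]
            if len(window) != len(weights):
                raise ValueError("filter size must equal width * token dimension")
            score = sum(wi * xi for wi, xi in zip(weights, window)) + bias
            best = max(best, max(0.0, score))
        features.append(best)
    return features


def attention_context(decoder_state, encoder_states):
    """ASSUMPTION: dot-product attention; softmax-weighted sum of encoder states."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs)) for hs in encoder_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    return [sum(w * hs[k] for w, hs in zip(weights, encoder_states)) for k in range(dim)]


def hybrid_context(decoder_state, encoder_states, lexical_features):
    """ASSUMPTION: context = attention over encoder outputs, concatenated with CNN features."""
    return attention_context(decoder_state, encoder_states) + lexical_features


# Toy usage: 3 tokens, 2-dim word vectors, 1-dim POS vectors, bigram (width=2) filters.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pos_vecs = [[1.0], [0.0], [1.0]]
x = combine_inputs(word_vecs, pos_vecs)
filters = [([1, 0, 0, 0, 1, 0], 0.0), ([-1] * 6, 0.0)]
lex = conv1d_max(x, filters, width=2)
print(lex)  # [2.0, 0.0]
# Stand-ins for BiLSTM encoder outputs and the decoder state (not computed here).
print(hybrid_context([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], lex))  # [0.5, 0.5, 2.0, 0.0]
```

In a full model these pieces would be trainable layers (embeddings, convolution, recurrent encoder and decoder); the sketch only shows how the lexical vector and the attention context could be merged into one context vector.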