STSG: A Short Text Semantic Graph Model for Similarity Computing Based on Dependency Parsing and Pre-trained Language Models

ABSTRACT: Short text semantic similarity is a crucial research area in natural language processing, which aims to predict the similarity between two sentences. Because short texts are sparse, words are treated in isolation and the correlations among words are ignored, which makes it very diffic...

Bibliographic Details
Main Authors: Hai Liao, Yan Liang, Song Chen, Lingyun Xiang, Zhimin Chang, Yun Xiao
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: Applied Artificial Intelligence
Online Access: https://www.tandfonline.com/doi/10.1080/08839514.2024.2321552
_version_ 1797279036994486272
author Hai Liao
Yan Liang
Song Chen
Lingyun Xiang
Zhimin Chang
Yun Xiao
author_sort Hai Liao
collection DOAJ
description ABSTRACT: Short text semantic similarity is a crucial research area in natural language processing, which aims to predict the similarity between two sentences. Because short texts are sparse, words are treated in isolation and the correlations among words are ignored, which makes it very difficult to compute global semantic information. To address this, the paper proposes a short text semantic graph (STSG) model based on dependency parsing and pre-trained language models. The model uses syntactic information to obtain word dependency relationships and incorporates them into pre-trained language models to enhance the global semantic information of sentences, and can thus address semantic sparsity more effectively. A text semantic graph layer based on the graph attention network (GAT) is also implemented, which treats word vectors as node features and word dependencies as edge features. The attention mechanism of GAT can identify the importance of different word correlations and model word dependencies effectively. On the challenging short text semantic benchmark dataset MRPC, the STSG model achieves an F1-score of 0.946, an improvement of 2.16% over previous state-of-the-art (SOTA) approaches. At the time of writing, STSG achieves new SOTA performance on the MRPC dataset.
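
The graph attention idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic single-head GAT-style layer in NumPy, the dependency edges are written by hand rather than produced by a parser, the word vectors are random stand-ins for pre-trained language model embeddings, and dependencies are used only as graph connectivity (no edge features). All names (`gat_layer`, `tokens`, `edges`) are hypothetical.

```python
import numpy as np

def gat_layer(h, edges, W, a, slope=0.2):
    """Single-head graph attention over dependency edges (plus self-loops)."""
    n = h.shape[0]
    z = h @ W                                   # project node features: (n, d_out)
    d = z.shape[1]
    # adj[i, j] is True when word i may attend to word j
    adj = np.eye(n, dtype=bool)
    for head, dep in edges:
        adj[head, dep] = adj[dep, head] = True  # assumption: treat edges as undirected
    # a^T [z_i || z_j] splits into a term for i plus a term for j
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    e = np.where(adj, e, -np.inf)               # mask out unconnected word pairs
    e = e - e.max(axis=1, keepdims=True)        # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha                     # updated word vectors, attention weights

# Hand-written example: (head, dependent) token indices, not parser output.
tokens = ["She", "reads", "short", "texts"]
edges = [(1, 0), (1, 3), (3, 2)]                # reads->She, reads->texts, texts->short

rng = np.random.default_rng(0)
h = rng.normal(size=(len(tokens), 8))           # stand-in for PLM word vectors
W = rng.normal(size=(8, 3))                     # projection weights (random here)
a = rng.normal(size=(6,))                       # attention vector over [z_i || z_j]

out, alpha = gat_layer(h, edges, W, a)
print(out.shape)
print(np.round(alpha, 3))
```

Each row of `alpha` is a probability distribution over the words that are linked to the given word in the dependency graph, which is how attention can weight some word correlations more heavily than others. In a trained model, `W` and `a` would be learned rather than random.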
first_indexed 2024-03-07T16:17:38Z
format Article
id doaj.art-2e0b2d0c0e6a4cfdbf704bbc290efa2c
institution Directory Open Access Journal
issn 0883-9514
1087-6545
language English
last_indexed 2024-03-07T16:17:38Z
publishDate 2024-12-01
publisher Taylor & Francis Group
record_format Article
series Applied Artificial Intelligence
spelling doaj.art-2e0b2d0c0e6a4cfdbf704bbc290efa2c
2024-03-04T09:05:44Z
eng
Taylor & Francis Group
Applied Artificial Intelligence
0883-9514
1087-6545
2024-12-01
38
1
10.1080/08839514.2024.2321552
STSG: A Short Text Semantic Graph Model for Similarity Computing Based on Dependency Parsing and Pre-trained Language Models
Hai Liao [0]
Yan Liang [1]
Song Chen [2]
Lingyun Xiang [3]
Zhimin Chang [4]
Yun Xiao [5]
[0] School of Computer Engineering, Chengdu Technological University, Chengdu, China
[1] School of Computer Engineering, Chengdu Technological University, Chengdu, China
[2] School of Computer Engineering, Chengdu Technological University, Chengdu, China
[3] Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
[4] The Research and Development Department, HAN Networks Corporation Limited, Beijing, China
[5] School of Software, Sichuan Vocational College of Information Technology, Guangyuan, China
ABSTRACT: Short text semantic similarity is a crucial research area in natural language processing, which aims to predict the similarity between two sentences. Because short texts are sparse, words are treated in isolation and the correlations among words are ignored, which makes it very difficult to compute global semantic information. To address this, the paper proposes a short text semantic graph (STSG) model based on dependency parsing and pre-trained language models. The model uses syntactic information to obtain word dependency relationships and incorporates them into pre-trained language models to enhance the global semantic information of sentences, and can thus address semantic sparsity more effectively. A text semantic graph layer based on the graph attention network (GAT) is also implemented, which treats word vectors as node features and word dependencies as edge features. The attention mechanism of GAT can identify the importance of different word correlations and model word dependencies effectively. On the challenging short text semantic benchmark dataset MRPC, the STSG model achieves an F1-score of 0.946, an improvement of 2.16% over previous state-of-the-art (SOTA) approaches. At the time of writing, STSG achieves new SOTA performance on the MRPC dataset.
https://www.tandfonline.com/doi/10.1080/08839514.2024.2321552
title STSG: A Short Text Semantic Graph Model for Similarity Computing Based on Dependency Parsing and Pre-trained Language Models
url https://www.tandfonline.com/doi/10.1080/08839514.2024.2321552