Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis
Main Authors: | Shenghan Zhou, Xingxing Xu, Yinglai Liu, Runfeng Chang, Yiyong Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/8784295/ |
author | Shenghan Zhou, Xingxing Xu, Yinglai Liu, Runfeng Chang, Yiyong Xiao |
collection | DOAJ |
description | Text similarity measurement, a basic task in natural language processing, is widely used in text information mining, news classification and clustering, artificial intelligence, and other fields. This paper proposes a text similarity measurement method named word vector distance decentralization (WVDD), which can handle complex semantic relations in Chinese, including sentence components, word order, and weights. Clustering analysis is then performed on the resulting similarity values, using a K-means algorithm parallelized on the Spark architecture to accelerate clustering. For experimental verification, the test sets consist of a large number of customer comments posted on the Jingdong website, a comprehensive online shopping mall. The F-measure is used to evaluate the accuracy of the results obtained by the proposed method. The superiority of the proposed method is verified by comparison with the sentence vector model (Doc2vec) and the bag-of-words model. The proposed method can be applied to analyze network language, such as online customer comments and web chat data. |
first_indexed | 2024-12-16T15:32:16Z |
format | Article |
id | doaj.art-c5d8e6ce276045bba9501a2ec36bc633 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T15:32:16Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-c5d8e6ce276045bba9501a2ec36bc633 · 2022-12-21T22:26:19Z · eng · IEEE · IEEE Access · ISSN 2169-3536 · 2019-01-01 · vol. 7 · pp. 107247–107258 · DOI 10.1109/ACCESS.2019.2932334 · IEEE document 8784295 · Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis · Shenghan Zhou (https://orcid.org/0000-0001-7979-4912), School of Reliability and Systems Engineering, Beihang University, Beijing, China; Xingxing Xu (https://orcid.org/0000-0001-9296-0284), School of Reliability and Systems Engineering, Beihang University, Beijing, China; Yinglai Liu, School of Reliability and Systems Engineering, Beihang University, Beijing, China; Runfeng Chang, School of Information Science and Technology, North China University of Technology, Beijing, China; Yiyong Xiao, School of Reliability and Systems Engineering, Beihang University, Beijing, China · https://ieeexplore.ieee.org/document/8784295/ · Text similarity; clustering analysis; parallel computing; semantic cognition |
title | Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis |
topic | Text similarity; clustering analysis; parallel computing; semantic cognition |
url | https://ieeexplore.ieee.org/document/8784295/ |
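The abstract states that the F-measure is used to evaluate the clustering results, but this record does not give the formula. One widely used clustering F-measure pairs each reference class i with its best-matching cluster j, using precision P = n_ij / n_j and recall R = n_ij / n_i, and averages the per-class best F = 2PR / (P + R) weighted by class size n_i / n. The sketch below assumes that definition; the paper may use a different variant, and `clustering_f_measure` is a hypothetical helper name, not code from the authors.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Overall clustering F-measure (assumed class-weighted best-match variant).

    For each reference class i: best F over all clusters j, where
    P = n_ij / n_j and R = n_ij / n_i. Result is weighted by n_i / n.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("at least one item is required")
    class_sizes = Counter(labels_true)               # n_i per reference class
    cluster_sizes = Counter(labels_pred)             # n_j per predicted cluster
    joint = Counter(zip(labels_true, labels_pred))   # n_ij per (class, cluster)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap: F would be 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

Because the score only counts overlaps, cluster IDs need not match class names: a clustering that reproduces the reference grouping under any relabeling scores 1.0, and putting everything into a single cluster is penalized through low precision.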
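The abstract says clustering uses a K-means algorithm run on Spark for parallel speed-up. The sketch below is a single-machine Lloyd's K-means in plain Python, not Spark code and not the authors' implementation. It only separates the assignment ("map") step from the per-cluster summation ("reduce") step, which are the two phases a distributed version would parallelize. This record does not describe how WVDD similarities are turned into clustering input, so the points here are assumed to be generic numeric vectors; `kmeans`, `nearest`, `squared_distance`, and the `max_iter`, `tol`, and `seed` parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, max_iter=100, tol=1e-12, seed=0):
    """Hypothetical single-machine Lloyd's K-means sketch (not Spark).

    Each iteration has a "map" step (assign every point to its nearest
    centroid) and a "reduce" step (sum the points per cluster).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    dim = len(centroids[0])
    for _ in range(max_iter):
        # Map step: pair each point with its nearest centroid index.
        assignments = [nearest(p, centroids) for p in points]
        # Reduce step: per-cluster vector sums and member counts.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, assignments):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        new_centroids = [
            [s / counts[c] for s in sums[c]] if counts[c] else centroids[c]
            for c in range(k)
        ]
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break  # centroids stopped moving
    # Final labels computed against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

In Spark itself, MLlib provides a distributed K-means (`KMeans` in `pyspark.ml.clustering`); whether the authors used it or their own parallel implementation is not stated in this record.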