Text Similarity Measurement of Semantic Cognition Based on Word Vector Distance Decentralization With Clustering Analysis

Bibliographic Details
Main Authors: Shenghan Zhou, Xingxing Xu, Yinglai Liu, Runfeng Chang, Yiyong Xiao
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Text similarity, clustering analysis, parallel computing, semantic cognition
Online Access: https://ieeexplore.ieee.org/document/8784295/
collection DOAJ
description Text similarity measurement, a basic task in natural language processing, is widely used in text information mining, news classification and clustering, artificial intelligence, and other fields. This paper proposes a text similarity measurement method named word vector distance decentralization (WVDD), which can handle complex semantic relations in Chinese text, including sentence components, word order, and weights. Clustering analysis is then performed on the resulting similarity values, using a K-means algorithm built on the Spark parallel-computing framework to accelerate clustering. For experimental verification, the test set consists of a large number of customer comments posted on the Jingdong website, a comprehensive online shopping mall. The F-measure is used to evaluate the accuracy of the results obtained by the proposed method, which is compared with the sentence vector model (Doc2vec) and the bag-of-words model to verify its superiority. The proposed method can be applied to analyzing network language, such as online customer comments and web chat data.
first_indexed 2024-12-16T15:32:16Z
id doaj.art-c5d8e6ce276045bba9501a2ec36bc633
institution Directory Open Access Journal
issn 2169-3536
last_indexed 2024-12-16T15:32:16Z
DOI: 10.1109/ACCESS.2019.2932334
Volume: 7, pages 107247-107258
Affiliations:
Shenghan Zhou (ORCID https://orcid.org/0000-0001-7979-4912), School of Reliability and Systems Engineering, Beihang University, Beijing, China
Xingxing Xu (ORCID https://orcid.org/0000-0001-9296-0284), School of Reliability and Systems Engineering, Beihang University, Beijing, China
Yinglai Liu, School of Reliability and Systems Engineering, Beihang University, Beijing, China
Runfeng Chang, School of Information Science and Technology, North China University of Technology, Beijing, China
Yiyong Xiao, School of Reliability and Systems Engineering, Beihang University, Beijing, China
topic Text similarity
clustering analysis
parallel computing
semantic cognition
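The abstract in the description field names two standard reference points that exist independently of this paper: the bag-of-words baseline and the F-measure used for evaluation. As a hedged illustration only (this is not the authors' WVDD method or their evaluation code), the sketch below computes bag-of-words cosine similarity and one common clustering F-measure, F = sum_i (n_i / n) * max_j F(i, j), where F(i, j) is the harmonic mean of precision n_ij / n_j and recall n_ij / n_i. Whether the paper uses exactly this F-measure variant is an assumption, and the function names `bow_cosine` and `clustering_f_measure` are hypothetical. The toy inputs are whitespace-split English tokens for brevity; Chinese comments such as those in the Jingdong test set would first need word segmentation.

```python
from collections import Counter
import math


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity of two bag-of-words count vectors (word order is ignored)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    # Dot product only over words present in both texts.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def clustering_f_measure(labels_true, labels_pred):
    """Overall clustering F-measure: sum_i (n_i / n) * max_j F(i, j).

    i ranges over reference classes, j over predicted clusters, and
    F(i, j) is the harmonic mean of precision n_ij / n_j and recall n_ij / n_i.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))  # n_ij
    class_sizes = Counter(labels_true)              # n_i
    cluster_sizes = Counter(labels_pred)            # n_j
    total = 0.0
    for i, n_i in class_sizes.items():
        best = 0.0
        for j, n_j in cluster_sizes.items():
            n_ij = joint[(i, j)]
            if n_ij == 0:
                continue  # no overlap, so F(i, j) = 0
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    print(bow_cosine("good phone fast delivery".split(),
                     "fast delivery good service".split()))
    print(clustering_f_measure(["pos", "pos", "neg", "neg"], [0, 0, 0, 1]))
```

On the toy inputs, the bag-of-words similarity is 0.75 (three shared words, each text having four distinct words), and the F-measure is about 0.73. Note that the bag-of-words score ignores word order entirely, which is one of the semantic relations the abstract says WVDD is designed to account for.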