Topic Discovery and Hotspot Analysis of Sentiment Analysis of Chinese Text Using Information-Theoretic Method

Bibliographic Details
Main Authors: Changlu Zhang, Haojie Fan, Jian Zhang, Qiong Yang, Liqian Tang
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Entropy
Subjects: sentiment analysis; topic discovery; FastText; information gain; hierarchical clustering
Online Access: https://www.mdpi.com/1099-4300/25/6/935
ISSN: 1099-4300
DOI: 10.3390/e25060935
Volume/Issue: Volume 25, Issue 6, Article 935
Affiliations: School of Economics & Management, Beijing Information Science & Technology University, Beijing 100192, China; Beijing Key Lab of Green Development Decision Based on Big Data, Beijing 100192, China

Abstract
Sentiment analysis is currently a research hotspot in many fields, such as computer science and statistical science. Topic discovery over the literature on text sentiment analysis aims to give scholars a quick and effective understanding of the field's research trends. In this paper, we propose a new model for topic discovery in the literature. First, the FastText model is used to compute word vectors for the keywords of the literature, and the cosine similarity between these vectors is used to measure keyword similarity so that synonymous keywords can be merged. Second, hierarchical clustering based on the Jaccard coefficient is used to cluster the domain literature, and the number of articles in each topic is counted. Third, the information gain method is applied to extract the characteristic words with high information gain for each topic, from which the meaning of each topic is summarized. Finally, a time series analysis of the literature is used to construct a four-quadrant topic distribution matrix that compares the research trends of each topic across different stages. The 1186 articles on text sentiment analysis published from 2012 to 2022 are divided into 12 categories. Comparing the topic distribution matrices of the two phases, 2012 to 2016 and 2017 to 2022, shows that the research development of each topic category changes noticeably between the phases. The results show that:
① Among the 12 categories, online opinion analysis of social media comments, exemplified by microblogs, is one of the current hot topics.
② The integration and application of methods such as sentiment lexicons, traditional machine learning, and deep learning should be strengthened.
③ Semantic disambiguation in aspect-level sentiment analysis is one of the difficult problems the field currently faces.
④ Research on multimodal and cross-modal sentiment analysis should be promoted.
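The first three steps of the topic-discovery pipeline summarized in the abstract (synonym merging by cosine similarity, Jaccard-based hierarchical clustering, and information gain of characteristic words) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: keyword vectors are passed in as plain lists (in practice they would come from a trained FastText model), and the 0.9 similarity threshold, the greedy merging rule, the average-linkage criterion, and the fixed cluster count `k` are all hypothetical choices, since the abstract does not specify them. The helper names (`merge_synonyms`, `cluster`, `information_gain`) are illustrative, and the toy data is made up.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def merge_synonyms(vectors, threshold=0.9):
    """Map each keyword to a canonical keyword.

    Hypothetical greedy rule: a keyword joins the first earlier representative
    whose vector has cosine similarity >= threshold; otherwise it becomes a
    new representative. The 0.9 threshold is an assumption.
    """
    canon, reps = {}, []
    for kw, vec in vectors.items():
        for rep in reps:
            if cosine(vec, vectors[rep]) >= threshold:
                canon[kw] = rep
                break
        else:
            reps.append(kw)
            canon[kw] = kw
    return canon

def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, k):
    """Agglomerative clustering of articles given as keyword sets.

    Assumes average linkage on Jaccard similarity and stops at k clusters;
    both are illustrative choices. Returns lists of article indices.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [jaccard(docs[a], docs[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if score > best:
                    best, pair = score, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    return clusters

def entropy(labels):
    """Shannon entropy (bits) of a list of topic labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(word, docs, labels):
    """IG(w) = H(C) - [P(w) H(C | w) + P(not w) H(C | not w)]."""
    n = len(labels)
    with_w = [lab for d, lab in zip(docs, labels) if word in d]
    without_w = [lab for d, lab in zip(docs, labels) if word not in d]
    conditional = ((len(with_w) / n) * entropy(with_w)
                   + (len(without_w) / n) * entropy(without_w))
    return entropy(labels) - conditional

# Toy usage with made-up data (not from the paper).
vectors = {"microblog": [1.0, 0.0], "weibo": [0.99, 0.05], "lexicon": [0.0, 1.0]}
canon = merge_synonyms(vectors)  # "weibo" is merged into "microblog"
docs = [{"microblog", "opinion"}, {"microblog", "opinion", "lexicon"},
        {"aspect", "disambiguation"}, {"aspect", "lexicon"}]
topics = cluster(docs, k=2)      # [[0, 1], [2, 3]]
labels = [0, 0, 1, 1]            # topic label of each article
ig_microblog = information_gain("microblog", docs, labels)  # 1.0
ig_lexicon = information_gain("lexicon", docs, labels)      # 0.0
```

On the toy data, "microblog" separates the two topics perfectly and gets the maximum gain of one bit, while "lexicon" appears equally in both topics and gets zero gain. The sketch omits the four-quadrant topic distribution matrix because the abstract does not state its axes. The quadratic pure-Python clustering loop is chosen for transparency; at the scale of 1186 articles, a library implementation of hierarchical clustering would be the more practical choice.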