Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics

Sentiment analysis, or opinion mining, is the task of identifying, extracting, and quantifying sentiment orientations or affective states. The task draws on a synthesis of techniques from natural language processing, computational linguistics, text mining, and so forth. Under its big umbrella,...

Bibliographic Details
Main Author: Peng, Haiyun
Other Authors: Erik Cambria
Format: Thesis
Language: English
Published: 2019
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/84297
http://hdl.handle.net/10220/48173
author Peng, Haiyun
author2 Erik Cambria
author_facet Erik Cambria
Peng, Haiyun
author_sort Peng, Haiyun
collection NTU
description Sentiment analysis, or opinion mining, is the task of identifying, extracting, and quantifying sentiment orientations or affective states. The task draws on a synthesis of techniques from natural language processing, computational linguistics, text mining, and so forth. Under its big umbrella, various sub-tasks exist, such as subjectivity detection, sentiment classification, named entity recognition, and sarcasm detection. Much of the research on these tasks has been conducted on English, owing to the popularity of English on the international platform and, thus, the abundance of English language resources. Although this research can be applied to other Indo-European languages, it performs poorly on many oriental languages, especially Chinese, because of the specific characteristics of the Chinese language. Inspired by linguistics, this thesis discusses the situations and features that make the Chinese language different from English and proposes corresponding approaches to exploit them. We first reviewed the literature on Chinese sentiment analysis and noticed that existing Chinese sentiment resources were relatively scarce compared to those for other languages. This was reflected in two aspects: no semantic connection between words and no measure of (fine-grained) sentiment intensity. Thus, we proposed an unsupervised method to construct a semantically connected valence Chinese sentiment resource. The mapping-based method leveraged multiple multilingual and sentiment resources, such as WordNet. Next, we found that Chinese word segmentation could be a source of errors in sentiment analysis, especially in non-general domains such as finance or medicine. In addition, we observed that the intra-character components (radicals) of Chinese text carry semantics because of the script's pictographic (or ideographic) origin.
To this end, we proposed a radical-based hierarchical character embedding that skips the word segmentation step and injects intra-character semantics into the text representation. The new text representation outperformed word-level representations by a considerable margin in the sentiment classification task. When we tried to extend the hierarchical embedding to the aspect-based sentiment analysis (ABSA) task, we realized that existing methods all tend to average the embeddings of a multi-word aspect target to represent that target. This assumption works in English provided that the proportion of multi-word aspect targets is relatively low. However, almost all Chinese aspect targets are multi-character targets. Thus, we introduced an aspect target sequence modeling (ATSM) network to learn adaptive aspect target representations based on sentence context, and an ATSM-Fusion network to account for the multi-granularity of Chinese text. The ATSM model alone achieved state-of-the-art performance in English ABSA, and ATSM-Fusion pushed Chinese ABSA performance higher. In addition to addressing Chinese sentiment analysis from the textual modality, we proposed incorporating phonetic information into textual sentiment analysis. We introduced two effective features to encode phonetic information. Then, we developed a disambiguate intonation for sentiment analysis (DISA) network using a reinforcement network. It disambiguates the intonation of each Chinese character (pinyin), so that a precise phonetic representation of Chinese is learned. Furthermore, we fused phonetic features with textual and visual features in order to mimic the way humans read and understand Chinese text.
Experimental results show that the inclusion of phonetic features significantly and consistently improves the performance of textual and visual representations. In summary, this thesis introduces several approaches to Chinese sentiment analysis, addressing and utilizing the linguistic characteristics (e.g., compositionality, multi-granularity, phonology) that distinguish Chinese from other languages.
first_indexed 2024-10-01T05:29:24Z
format Thesis
id ntu-10356/84297
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:29:24Z
publishDate 2019
record_format dspace
spelling ntu-10356/84297 2020-07-01T05:17:27Z Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics Peng, Haiyun Erik Cambria School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering Doctor of Philosophy 2019-05-13T05:44:42Z 2019-12-06T15:42:20Z 2019 Thesis Peng, H. (2019). Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/84297 http://hdl.handle.net/10220/48173 10.32657/10220/48173 en 143 p. application/pdf
title Linguistic-inspired Chinese sentiment analysis : from characters to radicals and phonetics
topic DRNTU::Engineering::Computer science and engineering
url https://hdl.handle.net/10356/84297
http://hdl.handle.net/10220/48173