Sentiment analysis based on statistical machine learning and deep learning

Bibliographic Details
Main Author: Zhu, Wen
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science; Engineering; Sentiment analysis; Machine learning; Deep learning
Online Access: https://hdl.handle.net/10356/179725
_version_ 1811682522225967104
author Zhu, Wen
author2 Mao Kezhi
author_facet Mao Kezhi
Zhu, Wen
author_sort Zhu, Wen
collection NTU
description Sentiment analysis is a fundamental yet challenging task in natural language processing that aims to identify and extract subjective information from text. It is widely applied in areas such as business intelligence and social media monitoring. This dissertation explores text sentiment analysis methods, focusing on capturing and classifying subjective emotions in short social media texts. Through a systematic literature review, the study traces the evolution of sentiment analysis from rule-based methods using sentiment lexicons to modern deep learning techniques. It compares the performance of statistical machine learning methods (logistic regression and Naive Bayes) with deep learning techniques (LSTM and BERT) on text sentiment recognition tasks. The dissertation discusses key steps in text preprocessing, including data cleaning, tokenization, stopword removal, and stemming, as well as the datasets chosen for each type of experiment. It explores feature extraction techniques, particularly frequency-based methods and TF-IDF. Model implementation, training, and testing are described in detail, with a focus on how different preprocessing techniques and model parameter settings affect sentiment classification performance. Finally, experiments compare the performance of statistical machine learning methods and deep learning methods in understanding text sentiment, with a detailed analysis of the results. The experiments show that TF-IDF feature extraction has significant advantages over traditional frequency-based feature extraction in statistical machine learning models. The results highlight the effectiveness of TF-IDF in weighting the importance of keywords in texts, particularly when handling large volumes of text data.
Moreover, the BERT model demonstrated superior performance across all test datasets, significantly surpassing the LSTM model in accuracy, especially on complex and noisy datasets. BERT's robustness is attributed to its deep bidirectional contextual understanding, which enables it to capture more nuanced emotional expressions in text. Because BERT is pre-trained on large-scale text data, it learns rich language patterns that provide a solid foundation for fine-tuning on specific tasks. This dissertation provides a theoretical foundation and practical guidance for selecting sentiment analysis techniques, and offers insights into choosing the most suitable model based on data characteristics and task requirements. The research also discusses hyperparameter tuning, strategies for handling imbalanced datasets, and how transfer learning can enhance model generalization. These findings and recommendations are intended to help advance the application and development of sentiment analysis technology in business intelligence, social media monitoring, and related fields.
first_indexed 2024-10-01T03:58:10Z
format Thesis-Master by Coursework
id ntu-10356/179725
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:58:10Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/179725 2024-08-23T15:43:55Z Sentiment analysis based on statistical machine learning and deep learning Zhu, Wen Mao Kezhi School of Electrical and Electronic Engineering EKZMao@ntu.edu.sg Computer and Information Science Engineering Sentiment analysis Machine learning Deep learning Master's degree 2024-08-20T01:47:09Z 2024-08-20T01:47:09Z 2024 Thesis-Master by Coursework Zhu, W. (2024). Sentiment analysis based on statistical machine learning and deep learning. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179725 https://hdl.handle.net/10356/179725 en ISM-DISS-03925 application/pdf Nanyang Technological University
spellingShingle Computer and Information Science
Engineering
Sentiment analysis
Machine learning
Deep learning
Zhu, Wen
Sentiment analysis based on statistical machine learning and deep learning
title Sentiment analysis based on statistical machine learning and deep learning
title_full Sentiment analysis based on statistical machine learning and deep learning
title_fullStr Sentiment analysis based on statistical machine learning and deep learning
title_full_unstemmed Sentiment analysis based on statistical machine learning and deep learning
title_short Sentiment analysis based on statistical machine learning and deep learning
title_sort sentiment analysis based on statistical machine learning and deep learning
topic Computer and Information Science
Engineering
Sentiment analysis
Machine learning
Deep learning
url https://hdl.handle.net/10356/179725
work_keys_str_mv AT zhuwen sentimentanalysisbasedonstatisticalmachinelearninganddeeplearning