Emotion Detection on Indonesian Tweets Using CNN and Contextualized Word Embedding

Bibliographic Details
Main Authors: Heldiansyah, Muhammad Fikri, Winarko, Edi
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2022
Online Access: https://repository.ugm.ac.id/278888/1/Heldiansyah_TK.pdf
Description
Summary: Twitter is a popular social media platform for sharing information through short text posts, called tweets, which are limited to 280 characters. Many tweets express the emotions of their users, and many studies have been conducted to detect emotions in tweets. In this paper, we train machine learning models to classify Indonesian tweets into five emotion class labels: happy, angry, fear, sadness, and love. The models are based on a Convolutional Neural Network (CNN) combined with ELMo (Embeddings from Language Models), BERT (Bidirectional Encoder Representations from Transformers), or Word2Vec. Based on the experimental results, we conclude that the best CNN model on tweet data without stemming is BERT-CNN, with a macro-averaged F1-score of 72.83%, compared with 65.57% for Word2Vec-CNN and 55.69% for ELMo-CNN.
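The macro-averaged F1-score reported in the summary gives each of the five emotion classes equal weight, regardless of how many tweets belong to it. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper's code or data.

```python
from collections import Counter

# The five emotion labels named in the summary.
LABELS = ["happy", "angry", "fear", "sadness", "love"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example (invented labels, not the paper's data).
y_true = ["happy", "angry", "fear", "sadness", "love", "happy"]
y_pred = ["happy", "angry", "fear", "love", "love", "angry"]
score = macro_f1(y_true, y_pred)
print(f"macro-averaged F1: {score:.4f}")
```

In this toy run the "sadness" class is never predicted correctly, so its per-class F1 of 0 pulls the macro average down as much as a miss on any larger class would. That equal weighting is why the metric suits multi-class emotion data where some labels may be less frequent than others.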