Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis

Abstract Twitter is a microblog-based social media site launched on July 13, 2006. In March 2020, 476,696 tweets about government policy on COVID-19 that spread on Twitter were captured by the Institute for Development of Economics and Finance (Indef). Government policy has a standard meaning, nam...

Bibliographic Details
Main Authors: Naufal Adi Nugroho, Erwin Budi Setiawan
Format: Article
Language: English
Published: Ikatan Ahli Informatika Indonesia 2021-10-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects: sentiment analysis, svm, ann, word2vec, tf-idf
Online Access: http://jurnal.iaii.or.id/index.php/RESTI/article/view/3325
author Naufal Adi Nugroho
Erwin Budi Setiawan
collection DOAJ
description Abstract Twitter is a microblog-based social media site launched on July 13, 2006. In March 2020, 476,696 tweets about government policy on COVID-19 that spread on Twitter were captured by the Institute for Development of Economics and Finance (Indef). Government policy has a standard meaning, namely a decision made systematically by the government with specific goals and objectives relating to the public interest, whether carried out directly or indirectly. Sentiment analysis examines people's opinions, sentiments, evaluations, attitudes, and emotions as expressed in written language. In this decade, sentiment analysis has become a popular research area. This paper focuses on how to implement Word2Vec, using word similarity as a feature expansion, to minimize vocabulary mismatch in Twitter sentiment analysis with word embeddings. The dataset contains 11,395 tweets and is used with two classifiers: the Support Vector Machine (SVM) algorithm and the Artificial Neural Network (ANN) algorithm. The output of Word2Vec is used for feature expansion: the expansion algorithm checks each row in the corpus for words that have a similar vector and, where a word's value is 0, replaces that word with its similar word. The feature expansion corpus is built from 142,545 articles from Indonesian media. The results show that ANN performs better than SVM: ANN reaches an accuracy of 68.89% without feature expansion and 72.58% with feature expansion. For SVM, the final accuracy is 63.95% without feature expansion and 68.56% with feature expansion. This research shows that feature expansion can improve the final accuracy.
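The expansion step described in the abstract can be read in more than one way, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each tweet is a row of per-word feature values (for example TF-IDF weights) and that a similarity dictionary has already been derived from a trained Word2Vec model. The function name `expand_features`, the `similar_words` dictionary, and the toy Indonesian words and weights are all hypothetical.

```python
from typing import Dict, List


def expand_features(
    row: Dict[str, float],
    vocabulary: List[str],
    similar_words: Dict[str, List[str]],
) -> Dict[str, float]:
    """Return a copy of `row` in which each zero-valued feature takes the
    value of the first similar word that is present in the same row."""
    expanded = {term: row.get(term, 0.0) for term in vocabulary}
    for term in vocabulary:
        if expanded[term] != 0.0:
            continue  # the word already occurs in this tweet
        # Look up similar words in the ORIGINAL row so that one
        # substitution never triggers another.
        for candidate in similar_words.get(term, []):
            value = row.get(candidate, 0.0)
            if value != 0.0:
                expanded[term] = value
                break
    return expanded


# Hypothetical toy data: in practice `similar_words` would be built from
# the nearest neighbours of each word in a trained Word2Vec model.
vocabulary = ["bagus", "baik", "buruk"]
similar_words = {"bagus": ["baik"], "baik": ["bagus"], "buruk": ["jelek"]}
tweet_row = {"bagus": 0.0, "baik": 0.7, "buruk": 0.0}

expanded_row = expand_features(tweet_row, vocabulary, similar_words)
print(expanded_row)
```

In this toy row, "bagus" is absent (value 0) but its similar word "baik" is present, so "bagus" takes the value 0.7. "buruk" stays at 0 because its only similar word does not occur in the row. This shows how expansion could reduce vocabulary mismatch between tweets that use different words with a similar meaning.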
first_indexed 2024-03-08T07:01:11Z
format Article
id doaj.art-d37332f11904492b8e6359c707b42069
institution Directory Open Access Journal
issn 2580-0760
language English
last_indexed 2024-03-08T07:01:11Z
publishDate 2021-10-01
publisher Ikatan Ahli Informatika Indonesia
record_format Article
series Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
spelling doaj.art-d37332f11904492b8e6359c707b42069 2024-02-03T05:47:33Z eng Ikatan Ahli Informatika Indonesia. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), ISSN 2580-0760, 2021-10-01, 5(5), 837-842. doi:10.29207/resti.v5i5.3325. Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis. Naufal Adi Nugroho (Telkom University), Erwin Budi Setiawan (Telkom University). http://jurnal.iaii.or.id/index.php/RESTI/article/view/3325. sentiment analysis, svm, ann, word2vec, tf-idf
title Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis
topic sentiment analysis, svm, ann, word2vec, tf-idf
url http://jurnal.iaii.or.id/index.php/RESTI/article/view/3325