Lexicon-Based Indonesian Local Language Abusive Words Dictionary to Detect Hate Speech in Social Media

Bibliographic Details
Main Authors:	Mardhiya Hayaty, Sumarni Adi, Anggit Dwi Hartanto
Format:	Article
Language:	English
Published:	Universitas Airlangga 2020-04-01
Series:	Journal of Information Systems Engineering and Business Intelligence
Online Access:	https://e-journal.unair.ac.id/JISEBI/article/view/17372

Description
Summary:	Background: Hate speech is an expression directed at a person or group that conveys hatred and/or anger toward them. On social media, users are free to express themselves, including by writing harsh words and sharing them with groups of people, which can trigger division and conflict between groups. Existing research on detecting hate speech in social media follows two approaches, machine learning-based and lexicon-based. The machine learning approach has a weakness: the manual labelling process, in which an annotator separates positive, negative, and neutral opinions, is time-consuming and tiring. Objective: This study aims to produce a dictionary of abusive words from local languages in Indonesia. The lexicon-based approach depends heavily on the language of the words in its dictionary. Indonesia has thousands of tribes with 2500 local languages, and 80% of the population uses local languages in communication, which makes detecting hate speech on social media a significant challenge. Methods: Surveys of abusive words were conducted using proportionate stratified random sampling in 4 major tribes on the island of Java: Betawi, Sundanese, Javanese, and Madurese. Results: The study produced a dictionary of 250 abusive words from 4 major Indonesian tribes for detecting hate speech in Indonesian social media with the lexicon-based approach. Conclusion: Stratified random sampling across 4 major Indonesian tribes yielded 250 abusive words for hate speech detection using the lexicon-based approach.
ISSN:	2598-6333 2443-2555
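
The lexicon-based approach described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tokenization rule, and the lexicon entries are assumptions made for illustration, and the entries are neutral placeholders because this record does not list the dictionary's 250 words.

```python
import re

# Hypothetical placeholder entries mapping a word to its source language.
# The actual dictionary contents are not given in this record.
LEXICON = {
    "kata_kasar_1": "Javanese",
    "kata_kasar_2": "Sundanese",
    "kata_kasar_3": "Betawi",
    "kata_kasar_4": "Madurese",
}

# Assumed tokenization: runs of letters, digits, and underscores.
TOKEN_RE = re.compile(r"\w+")


def find_abusive_words(text, lexicon=LEXICON):
    """Return (token, language) pairs for every token found in the lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]


def is_flagged(text, lexicon=LEXICON):
    """Flag a post if it contains at least one lexicon entry."""
    return bool(find_abusive_words(text, lexicon))
```

A lookup like this needs no labelled training data, which is the advantage the summary attributes to the lexicon-based approach, but it only recognizes words that are present in the dictionary.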
