Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

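The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not manageable by humans. The interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset. Having classified them into three classes, abusive, hateful or neither, we create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real-time and uses the same feedback to re-train our model. We prove the competitive performance of our multilingual model in two languages, English and Hindi. This leads to comparable or superior performance to most monolingual models.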

Bibliographic Details
Main Authors: Neeraj Vashistha, Arkaitz Zubiaga
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Information
Subjects: social media, hate speech, text classification
Online Access: https://www.mdpi.com/2078-2489/12/1/5
_version_ 1797543869405986816
author Neeraj Vashistha
Arkaitz Zubiaga
author_sort Neeraj Vashistha
collection DOAJ
description The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not manageable by humans. The interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset. Having classified them into three classes, abusive, hateful or neither, we create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real-time and uses the same feedback to re-train our model. We prove the competitive performance of our multilingual model in two languages, English and Hindi. This leads to comparable or superior performance to most monolingual models.
first_indexed 2024-03-10T13:51:43Z
format Article
id doaj.art-b496dc96a5b646b4b68607648e9b14f8
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-10T13:51:43Z
publishDate 2020-12-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-b496dc96a5b646b4b68607648e9b14f8 | 2023-11-21T02:07:15Z | eng | MDPI AG | Information | 2078-2489 | 2020-12-01 | 12 | 1 | 5 | 10.3390/info12010005 | Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media | Neeraj Vashistha (0), Arkaitz Zubiaga (1) | (0) School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK | (1) School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK | The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not manageable by humans. The interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset. Having classified them into three classes, abusive, hateful or neither, we create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real-time and uses the same feedback to re-train our model. We prove the competitive performance of our multilingual model in two languages, English and Hindi. This leads to comparable or superior performance to most monolingual models. | https://www.mdpi.com/2078-2489/12/1/5 | social media | hate speech | text classification
spellingShingle Neeraj Vashistha
Arkaitz Zubiaga
Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media
Information
social media
hate speech
text classification
title Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media
title_sort online multilingual hate speech detection experimenting with hindi and english social media
topic social media
hate speech
text classification
url https://www.mdpi.com/2078-2489/12/1/5
work_keys_str_mv AT neerajvashistha onlinemultilingualhatespeechdetectionexperimentingwithhindiandenglishsocialmedia
AT arkaitzzubiaga onlinemultilingualhatespeechdetectionexperimentingwithhindiandenglishsocialmedia