Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not m...


Bibliographic Details
Main Authors:	Neeraj Vashistha, Arkaitz Zubiaga
Format:	Article
Language:	English
Published:	MDPI AG 2020-12-01
Series:	Information
Subjects:	social media; hate speech; text classification
Online Access:	https://www.mdpi.com/2078-2489/12/1/5

_version_	1797543869405986816
author	Neeraj Vashistha; Arkaitz Zubiaga
collection	DOAJ
description	The last two decades have seen an exponential increase in the use of the Internet and social media, which has changed basic human interaction. This has led to many positive outcomes. At the same time, it has brought risks and harms. The volume of harmful content online, such as hate speech, is not manageable by humans. The interest in the academic community to investigate automated means for hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset. Having classified them into three classes, abusive, hateful or neither, we create a baseline model and improve model performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool that identifies and scores a page with an effective metric in near-real-time and uses the same feedback to re-train our model. We prove the competitive performance of our multilingual model in two languages, English and Hindi. This leads to comparable or superior performance to most monolingual models.
first_indexed	2024-03-10T13:51:43Z
format	Article
id	doaj.art-b496dc96a5b646b4b68607648e9b14f8
institution	Directory of Open Access Journals
issn	2078-2489
language	English
last_indexed	2024-03-10T13:51:43Z
publishDate	2020-12-01
publisher	MDPI AG
record_format	Article
series	Information
spelling	doaj.art-b496dc96a5b646b4b68607648e9b14f8; 2023-11-21T02:07:15Z; eng; MDPI AG; Information; 2078-2489; 2020-12-01; 12; 1; 5; 10.3390/info12010005; Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media; Neeraj Vashistha (School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK); Arkaitz Zubiaga (School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK); https://www.mdpi.com/2078-2489/12/1/5; social media; hate speech; text classification
title	Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media
topic	social media; hate speech; text classification
url	https://www.mdpi.com/2078-2489/12/1/5


Similar Items