Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media


Bibliographic Details
Main Authors:	Safa Alsafari, Samira Sadaoui
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2021-12-01
Series:	Applied Artificial Intelligence
Online Access:	http://dx.doi.org/10.1080/08839514.2021.1988443

collection	DOAJ
description	Improving the performance of Offensive and Hate Speech (OHS) classifiers requires a large, confidently labeled textual training dataset. Our study devises a semi-supervised classification approach with self-training to leverage the abundant social media content and develop a robust OHS classifier. The classifier is self-trained iteratively using the most confidently predicted labels obtained from an unlabeled Twitter corpus of 5 million tweets. Hence, we produce the largest supervised Arabic OHS dataset. To this end, we first select the best classifier to conduct the semi-supervised learning by assessing multiple heterogeneous pairs of text vectorization algorithms (such as N-Grams, Word2Vec Skip-Gram, AraBERT and DistilBERT) and machine learning algorithms (such as SVM, CNN and BiLSTM). Then, based on the best text classifier, we perform six groups of experiments to demonstrate our approach’s feasibility and efficacy over several self-training iterations.
id	doaj.art-b733397d0e934e43bd40aa7315decb79
institution	Directory of Open Access Journals
issn	0883-9514 1087-6545
citation	Safa Alsafari (University of Regina), Samira Sadaoui (University of Regina). Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media. Applied Artificial Intelligence 35(15), 1621-1645. Taylor & Francis Group, 2021-12-01. ISSN 0883-9514, 1087-6545. http://dx.doi.org/10.1080/08839514.2021.1988443
