Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media
Main Authors: | Safa Alsafari, Samira Sadaoui |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2021-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2021.1988443 |
_version_ | 1797684832234373120 |
---|---|
author | Safa Alsafari, Samira Sadaoui |
collection | DOAJ |
description | Improving the performance of Offensive and Hate Speech (OHS) classifiers requires a large, confidently labeled textual training dataset. Our study devises a semi-supervised classification approach with self-training that leverages abundant social media content to develop a robust OHS classifier. The classifier is self-trained iteratively on the most confidently predicted labels obtained from an unlabeled Twitter corpus of 5 million tweets. In doing so, we produce the largest labeled Arabic OHS dataset. To this end, we first select the best classifier for semi-supervised learning by assessing multiple heterogeneous pairs of text vectorization algorithms (such as N-Grams, Word2Vec Skip-Gram, AraBERT and DistilBERT) and machine learning algorithms (such as SVM, CNN and BiLSTM). Then, using the best text classifier, we perform six groups of experiments over several self-training iterations to demonstrate the approach's feasibility and efficacy. |
first_indexed | 2024-03-12T00:36:28Z |
format | Article |
id | doaj.art-b733397d0e934e43bd40aa7315decb79 |
institution | Directory Open Access Journal |
issn | 0883-9514, 1087-6545 |
language | English |
last_indexed | 2024-03-12T00:36:28Z |
publishDate | 2021-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Applied Artificial Intelligence |
spelling | doaj.art-b733397d0e934e43bd40aa7315decb79; 2023-09-15T09:33:59Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2021-12-01; vol. 35, no. 15, pp. 1621-1645; 10.1080/08839514.2021.1988443; 1988443; Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media; Safa Alsafari (University of Regina); Samira Sadaoui (University of Regina); http://dx.doi.org/10.1080/08839514.2021.1988443 |
title | Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media |
url | http://dx.doi.org/10.1080/08839514.2021.1988443 |
work_keys_str_mv | AT safaalsafari semisupervisedselftrainingofhateandoffensivespeechfromsocialmedia AT samirasadaoui semisupervisedselftrainingofhateandoffensivespeechfromsocialmedia |
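The iterative self-training loop summarized in the description can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: a toy multinomial Naive Bayes over whitespace unigrams stands in for the classifier the study actually selected, "most confidently predicted" is assumed to mean a fixed probability `threshold`, and `self_train`, `threshold` and `max_iter` are hypothetical names and settings not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-cased whitespace unigrams: a toy stand-in for the N-gram features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in,
    not the classifier selected in the paper)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        # Score each class in log space, then normalize with log-sum-exp.
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c] / self.n)
            for w in tokenize(text):
                s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return {c: math.exp(s - top) / z for c, s in scores.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_iter=5):
    """Iteratively move confidently pseudo-labeled texts into the training set.

    labeled: list of (text, label) pairs; unlabeled: list of texts.
    threshold and max_iter are hypothetical settings, not values from the paper.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
        confident, remaining = [], []
        for text in pool:
            label, p = max(model.predict_proba(text).items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new is confident enough: stop early
            break
        labeled.extend(confident)
        pool = remaining
    # Retrain once on the final gold + pseudo-labeled set.
    model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])
    return model, labeled, pool


seed = [
    ("you are a stupid idiot", "offensive"),
    ("stupid idiot go away", "offensive"),
    ("have a nice day", "clean"),
    ("thanks have a great day", "clean"),
]
unlabeled = ["what a stupid idiot", "nice day thanks", "the weather"]
model, augmented, pool = self_train(seed, unlabeled, threshold=0.8)
print(len(augmented), pool)  # 6 ['the weather']
```

Retraining from scratch on the gold plus pseudo-labeled set each round keeps the loop simple. Texts whose top-class probability never reaches the threshold (here "the weather") stay unlabeled rather than being forced into a class, which limits how much labeling noise feeds back into later iterations; a pipeline over millions of tweets would also need batched prediction.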