arHateDetector: detection of hate speech from standard and dialectal Arabic Tweets

Abstract Hate speech has become a phenomenon on social media platforms such as Twitter. These websites and apps, which were initially designed to facilitate free expression, are sometimes used to spread hate. In the Arab region, Twitter is a very popular social...

Bibliographic Details
Main Authors:	Ramzi Khezzar, Abdelrahman Moursi, Zaher Al Aghbari
Format:	Article
Language:	English
Published:	Springer 2023-03-01
Series:	Discover Internet of Things
Subjects:	Hate speech; Arabic; Twitter; Machine learning; Deep learning
Online Access:	https://doi.org/10.1007/s43926-023-00030-9

_version_	1797863813263917056
author	Ramzi Khezzar, Abdelrahman Moursi, Zaher Al Aghbari
author_facet	Ramzi Khezzar, Abdelrahman Moursi, Zaher Al Aghbari
author_sort	Ramzi Khezzar
collection	DOAJ
description	Abstract Hate speech has become a phenomenon on social media platforms such as Twitter. These websites and apps, which were initially designed to facilitate free expression, are sometimes used to spread hate. In the Arab region, Twitter is a very popular social media platform, and the number of tweets that contain hate speech is increasing rapidly. Many tweets are written in standard Arabic, dialectal Arabic, or a mix of the two. Existing work on Arabic hate speech targets either standard Arabic or a single dialect, but not both. To fight hate speech more efficiently, we conducted extensive experiments to investigate Arabic hate speech in tweets, and we propose a framework, called arHateDetector, that detects hate speech in the Arabic text of tweets. arHateDetector supports both standard Arabic and several Arabic dialects. A large Arabic hate speech dataset, called arHateDataset, was compiled from standard and dialectal Arabic tweets. The tweets are preprocessed to remove unwanted content. We investigated the use of recent machine learning and deep learning models, such as AraBERT, to detect hate speech. All classification models used in the investigation were trained on the compiled dataset. Our experiments show that AraBERT outperformed the other models, producing the best performance across seven different datasets, including the compiled arHateDataset, with an accuracy of 93%. CNN and LinearSVC achieved 88% and 89%, respectively.
first_indexed	2024-04-09T22:41:40Z
format	Article
id	doaj.art-7f9f68b56118484cb9c78e1ade711d60
institution	Directory of Open Access Journals
issn	2730-7239
language	English
last_indexed	2024-04-09T22:41:40Z
publishDate	2023-03-01
publisher	Springer
record_format	Article
series	Discover Internet of Things
spelling	doaj.art-7f9f68b56118484cb9c78e1ade711d60; 2023-03-22T12:06:22Z; eng; Springer; Discover Internet of Things; 2730-7239; 2023-03-01; 31113; 10.1007/s43926-023-00030-9; arHateDetector: detection of hate speech from standard and dialectal Arabic Tweets; Ramzi Khezzar (0), Abdelrahman Moursi (1), Zaher Al Aghbari (2); (0, 1, 2) Department of Computer Science, University of Sharjah; https://doi.org/10.1007/s43926-023-00030-9; Hate speech; Arabic; Twitter; Machine learning; Deep learning
title	arHateDetector: detection of hate speech from standard and dialectal Arabic Tweets
topic	Hate speech; Arabic; Twitter; Machine learning; Deep learning
url	https://doi.org/10.1007/s43926-023-00030-9
work_keys_str_mv	AT ramzikhezzar arhatedetectordetectionofhatespeechfromstandardanddialectalarabictweets AT abdelrahmanmoursi arhatedetectordetectionofhatespeechfromstandardanddialectalarabictweets AT zaheralaghbari arhatedetectordetectionofhatespeechfromstandardanddialectalarabictweets
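
The abstract in this record states that tweets are preprocessed to remove unwanted content, but it does not list the individual steps. The sketch below is only an illustration of what such a cleanup might look like for Arabic tweets. The function name `clean_tweet` and every rule in it (dropping URLs, @mentions, diacritics, tatweel, and non-Arabic characters) are assumptions made for this example, not the authors' documented pipeline.

```python
import re

# All rules below are assumed for illustration; the record does not
# specify which content the authors actually removed.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Arabic diacritics (U+064B..U+0652) and the tatweel elongation mark (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither a basic Arabic letter (U+0621..U+064A) nor whitespace.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return the Arabic words of a tweet with common noise stripped."""
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = DIACRITICS_RE.sub("", text)  # delete marks without splitting words
    text = NON_ARABIC_RE.sub(" ", text) # drop '#', digits, Latin text, emoji
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "@user \u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627!! https://t.co/abc"
    print(clean_tweet(raw))
```

Diacritics are deleted before the non-Arabic filter runs so that a vocalized word is not broken into fragments. Hashtag words are kept and only the `#` symbol is dropped, which is one possible design choice among several.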

Similar Items