Detecting Arabic sexual harassment using bidirectional long-short-term memory and a temporal convolutional network

Due to advances in technology, social media has become the most popular medium for spreading news. Many messages are published on social media sites such as Facebook, Twitter, Instagram, etc. Social media platforms also provide opportunities to express opinions and social phenomena such as hate, off...


Bibliographic Details
Main Authors: Noor Amer Hamzah, Ban N. Dhannoon
Format: Article
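Language: English
Published: Elsevier 2023-07-01
Series: Egyptian Informatics Journal
Subjects: Sexual harassments; Arabic text classification; Sentiment Analysis; Natural Language Processing; Word embedding; Deep learning; XGBoost
Online Access: http://www.sciencedirect.com/science/article/pii/S1110866523000300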
_version_ 1797812071728939008
author Noor Amer Hamzah
Ban N. Dhannoon
author_facet Noor Amer Hamzah
Ban N. Dhannoon
author_sort Noor Amer Hamzah
collection DOAJ
description Due to advances in technology, social media has become the most popular medium for spreading news. Many messages are published on social media sites such as Facebook, Twitter, and Instagram. Social media platforms also provide opportunities to express opinions, as well as social phenomena such as hate, offensive language, racism, sexual content, and all forms of verbal violence, which have increased sharply. These behaviors do not affect only specific countries, groups, or societies but extend into people's daily lives. This study examines sexual content and harassment discourse in Arabic social media in order to build an accurate system for detecting expressions of sexual harassment. The dataset used for classification was collected from Twitter posts. A deep learning classifier was developed to identify sexual speech using Bidirectional Long Short-Term Memory (BiLSTM) and a Temporal Convolutional Network (TCN), with word embeddings from a FastText model pretrained on Arabic. The proposed TCN-BiLSTM model was compared with Extreme Gradient Boosting (XGBoost). On the CASH dataset, the TCN-BiLSTM model obtained an accuracy of 96.65% and an F0.5 value of 0.969, while XGBoost with word embeddings obtained an accuracy of 92.56% and an F0.5 value of 0.925. The findings and manual interpretation showed that combining different text representation methods with various deep learning algorithms readily yields higher classification performance on complex sentences. This strategy is helpful for languages whose morphology is difficult to analyze, such as Arabic, Turkish, and Lithuanian.
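The description reports results as accuracy and an F0.5 value. As a reminder of what that second figure measures, the following is a minimal Python sketch of the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall. The helper names `precision_recall` and `f_beta` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the article, its code, or its dataset.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (precision, recall) for the chosen positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is empty (no predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


if __name__ == "__main__":
    # Hypothetical toy labels (1 = harassment, 0 = not), not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    p, r = precision_recall(y_true, y_pred)
    print(f"precision={p:.3f} recall={r:.3f} F0.5={f_beta(p, r):.3f}")
```

Choosing beta = 0.5 penalizes false positives more than false negatives, which is one plausible reason to report it for a detector whose flags may trigger moderation; the record itself does not state the authors' motivation.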
first_indexed 2024-03-13T07:33:09Z
format Article
id doaj.art-0d3c184cebb14d4fb3bd5d6f0c38da77
institution Directory of Open Access Journals
issn 1110-8665
language English
last_indexed 2024-03-13T07:33:09Z
publishDate 2023-07-01
publisher Elsevier
record_format Article
series Egyptian Informatics Journal
spelling doaj.art-0d3c184cebb14d4fb3bd5d6f0c38da77 | 2023-06-04T04:23:21Z | eng | Elsevier | Egyptian Informatics Journal | 1110-8665 | 2023-07-01 | 24 | 2 | 365 | 373 | Detecting Arabic sexual harassment using bidirectional long-short-term memory and a temporal convolutional network | Noor Amer Hamzah [0] | Ban N. Dhannoon [1] | [0] Corresponding authors.; Al-Nahrain University, Baghdad, Iraq | [1] Corresponding authors.; Al-Nahrain University, Baghdad, Iraq | http://www.sciencedirect.com/science/article/pii/S1110866523000300 | Sexual harassments | Arabic text classification | Sentiment Analysis | Natural Language Processing | Word embedding | Deep learning | XGBoost
title Detecting Arabic sexual harassment using bidirectional long-short-term memory and a temporal convolutional network
title_sort detecting arabic sexual harassment using bidirectional long short term memory and a temporal convolutional network
topic Sexual harassments
Arabic text classification
Sentiment Analysis
Natural Language Processing
Word embedding
Deep learning
XGBoost
url http://www.sciencedirect.com/science/article/pii/S1110866523000300