Cyberbullying Text Identification based on Deep Learning and Transformer-based Language Models

In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly c...

Bibliographic Details
Main Authors:	Khalid Saifullah, Muhammad Ibrahim Khan, Suhaima Jamal, Iqbal H. Sarker
Format:	Article
Language:	English
Published:	European Alliance for Innovation (EAI) 2024-02-01
Series:	EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
Subjects:	Cyberbullying; large language modeling; deep learning; transformers models; natural language processing; NLP
Online Access:	https://publications.eai.eu/index.php/inis/article/view/4703

_version_	1797299901463265280
author	Khalid Saifullah; Muhammad Ibrahim Khan; Suhaima Jamal; Iqbal H. Sarker
author_facet	Khalid Saifullah; Muhammad Ibrahim Khan; Suhaima Jamal; Iqbal H. Sarker
author_sort	Khalid Saifullah
collection	DOAJ
description	In the contemporary digital age, social media platforms like Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. Despite fostering increased connectivity, these platforms have inadvertently given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly in terms of language modeling. This study addresses this gap by developing a cyberbullying text identification system called BullyFilterNeT, tailored for social media texts, with Bengali as a test case. The BullyFilterNeT system overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To facilitate a comprehensive understanding, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used in classification models comprising three statistical models (SVM, SGD, Libsvm) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness in cyberbullying text identification in the Bengali language.
first_indexed	2024-03-07T22:58:24Z
format	Article
id	doaj.art-8fa7c8b0dd03480c8c9af5a364287c1d
institution	Directory of Open Access Journals
issn	2410-0218
language	English
last_indexed	2024-03-07T22:58:24Z
publishDate	2024-02-01
publisher	European Alliance for Innovation (EAI)
record_format	Article
series	EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
spelling	doaj.art-8fa7c8b0dd03480c8c9af5a364287c1d; 2024-02-22T18:57:22Z; eng; European Alliance for Innovation (EAI); EAI Endorsed Transactions on Industrial Networks and Intelligent Systems; 2410-0218; 2024-02-01; 11; 1; 10.4108/eetinis.v11i1.4703; Cyberbullying Text Identification based on Deep Learning and Transformer-based Language Models; Khalid Saifullah (Chittagong University of Engineering & Technology); Muhammad Ibrahim Khan (Chittagong University of Engineering & Technology); Suhaima Jamal (Georgia Southern University); Iqbal H. Sarker (Edith Cowan University); https://publications.eai.eu/index.php/inis/article/view/4703; Cyberbullying; large language modeling; deep learning; transformers models; natural language processing; NLP
title	Cyberbullying Text Identification based on Deep Learning and Transformer-based Language Models
title_sort	cyberbullying text identification based on deep learning and transformer based language models
topic	Cyberbullying; large language modeling; deep learning; transformers models; natural language processing; NLP
url	https://publications.eai.eu/index.php/inis/article/view/4703

Similar Items