Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques

The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit...

Bibliographic Details
Main Authors: Chahat Raj, Ayush Agarwal, Gnana Bharathy, Bhuva Narayan, Mukesh Prasad
Format: Article
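Language: English
Published: MDPI AG 2021-11-01
Series: Electronics
Subjects: cyberbullying; hate speech; offensive language; machine learning; neural networks; deep learning
Online Access: https://www.mdpi.com/2079-9292/10/22/2810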
_version_ 1797510535065894912
author Chahat Raj
Ayush Agarwal
Gnana Bharathy
Bhuva Narayan
Mukesh Prasad
collection DOAJ
description The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards using deep neural networks in detecting cyberbullying due to the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and an algorithmic comparative study of eleven classification methods: four traditional machine learning and seven shallow neural networks on two real-world cyberbullying datasets. In addition, this paper examines the effect of feature extraction and word-embedding-based natural language processing techniques on algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models provide high classification results. Logistic Regression was observed to be the best among the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) demonstrates consistently high accuracies with traditional machine learning techniques. Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM worked best amongst the neural networks used. The extensive experiments performed on the two datasets establish the importance of this work by comparing eleven classification methods and seven feature extraction techniques. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
first_indexed 2024-03-10T05:33:35Z
format Article
id doaj.art-b7ac124425df485898e1232de3eb6bf3
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-10T05:33:35Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-b7ac124425df485898e1232de3eb6bf3
2023-11-22T23:07:26Z
eng
MDPI AG
Electronics
2079-9292
2021-11-01
Volume 10, Issue 22, Article 2810
10.3390/electronics10222810
Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques
Chahat Raj (School of Computer Science, FEIT, University of Technology Sydney, Sydney, NSW 2007, Australia)
Ayush Agarwal (Department of Information Technology, Delhi Technological University, Delhi 110042, India)
Gnana Bharathy (School of Computer Science, FEIT, University of Technology Sydney, Sydney, NSW 2007, Australia)
Bhuva Narayan (School of Communication, FASS, University of Technology Sydney, Sydney, NSW 2007, Australia)
Mukesh Prasad (School of Computer Science, FEIT, University of Technology Sydney, Sydney, NSW 2007, Australia)
https://www.mdpi.com/2079-9292/10/22/2810
cyberbullying
hate speech
offensive language
machine learning
neural networks
deep learning
title Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques
topic cyberbullying
hate speech
offensive language
machine learning
neural networks
deep learning
url https://www.mdpi.com/2079-9292/10/22/2810