Innovative deep learning techniques for monitoring aggressive behavior in social media posts

Abstract The study aims to evaluate and compare the performance of various machine learning (ML) classifiers in the context of detecting cyber-trolling behaviors. With the rising prevalence of online harassment, developing effective automated tools for aggression detection in digital communications...

Full description

Bibliographic Details
Main Authors: Huimin Han, Muhammad Asif, Emad Mahrous Awwad, Nadia Sarhan, Yazeed Yasid Ghadi, Bo Xu
Format: Article
Language:English
Published: SpringerOpen 2024-01-01
Series:Journal of Cloud Computing: Advances, Systems and Applications
Subjects:
Online Access:https://doi.org/10.1186/s13677-023-00577-6
Description
Summary:Abstract The study aims to evaluate and compare the performance of various machine learning (ML) classifiers in the context of detecting cyber-trolling behaviors. With the rising prevalence of online harassment, developing effective automated tools for aggression detection in digital communications has become imperative. This research assesses the efficacy of Random Forest, Light Gradient Boosting Machine (LightGBM), Logistic Regression, Support Vector Machine (SVM), and Naive Bayes classifiers in identifying cyber troll posts within a publicly available dataset. Each ML classifier was trained and tested on a dataset curated for the detection of cyber trolls. The performance of the classifiers was gauged using confusion matrices, which provide detailed counts of true positives, true negatives, false positives, and false negatives. These metrics were then utilized to calculate the accuracy, precision, recall, and F1 scores to better understand each model’s predictive capabilities. The Random Forest classifier outperformed other models, exhibiting the highest accuracy and balanced precision-recall trade-off, as indicated by the highest true positive and true negative rates, alongside the lowest false positive and false negative rates. LightGBM, while effective, showed a tendency towards higher false predictions. Logistic Regression, SVM, and Naive Bayes displayed identical confusion matrix results, an anomaly suggesting potential data handling or model application issues that warrant further investigation. The findings underscore the effectiveness of ensemble methods, with Random Forest leading in the cyber troll detection task. The study highlights the importance of selecting appropriate ML algorithms for text classification tasks in social media contexts and emphasizes the need for further scrutiny into the anomaly observed among the Logistic Regression, SVM, and Naive Bayes results. Future work will focus on exploring the reasons behind this occurrence and the potential of deep learning techniques in enhancing detection performance.
ISSN:2192-113X