Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter
Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages resulting in cyberbullying. Previous research has focused on whether cyberbullying behavior exists or not in a tweet (binary...
Main Authors: | Bandeh Ali Talpur, Declan O’Sullivan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2020-11-01 |
Series: | Informatics |
Subjects: | cyberbullying, Twitter, social networks, algorithms |
Online Access: | https://www.mdpi.com/2227-9709/7/4/52 |
_version_ | 1797547844384587776 |
---|---|
author | Bandeh Ali Talpur Declan O’Sullivan |
author_facet | Bandeh Ali Talpur Declan O’Sullivan |
author_sort | Bandeh Ali Talpur |
collection | DOAJ |
description | Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages resulting in cyberbullying. Previous research has focused on whether cyberbullying behavior exists or not in a tweet (binary classification). In this research, we developed a model for detecting the severity of cyberbullying in a tweet. The developed model is a feature-based model that uses features from the content of a tweet to develop a machine learning classifier for classifying tweets as non-cyberbullied, or as low, medium, or high-level cyberbullied tweets. In this study, we introduced pointwise semantic orientation as a new input feature, along with predicted features (gender, age, and personality type) and Twitter API features. Results from experiments with our proposed framework in a multi-class setting are promising with respect to Kappa (84%), classifier accuracy (93%), and F-measure (92%). Overall, 40% of the classifiers increased performance in comparison with baseline approaches. Our analysis shows that the features with the highest odds ratios are: for low-level severity, the 19–22 age group and users with <1 year since Twitter account activation; for medium-level severity, neuroticism, the 23–29 age group, and being a Twitter user for one to two years; and for high-level severity, neuroticism, extraversion, and the number of times the tweet has been favorited by other users. We believe that this research, using a multi-class classification approach, provides a step forward in identifying severity at different levels (low, medium, high) when the content of a tweet is classified as cyberbullied. Lastly, the current study focused only on the Twitter platform; other social network platforms can be investigated using the same approach to detect cyberbullying severity patterns. |
first_indexed | 2024-03-10T14:50:00Z |
format | Article |
id | doaj.art-aafd51f9165d4138a9c096814cd53f6f |
institution | Directory of Open Access Journals |
issn | 2227-9709 |
language | English |
last_indexed | 2024-03-10T14:50:00Z |
publishDate | 2020-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Informatics |
spelling | doaj.art-aafd51f9165d4138a9c096814cd53f6f; 2023-11-20T21:02:48Z; eng; MDPI AG; Informatics; 2227-9709; 2020-11-01; 7; 4; 52; 10.3390/informatics7040052; Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter; Bandeh Ali Talpur (0); Declan O’Sullivan (1); School of Computer Science and Statistics, Trinity College Dublin, D02 PN40 Dublin, Ireland (0); School of Computer Science and Statistics, Trinity College Dublin, D02 PN40 Dublin, Ireland (1); Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages resulting in cyberbullying. Previous research has focused on whether cyberbullying behavior exists or not in a tweet (binary classification). In this research, we developed a model for detecting the severity of cyberbullying in a tweet. The developed model is a feature-based model that uses features from the content of a tweet to develop a machine learning classifier for classifying tweets as non-cyberbullied, or as low, medium, or high-level cyberbullied tweets. In this study, we introduced pointwise semantic orientation as a new input feature, along with predicted features (gender, age, and personality type) and Twitter API features. Results from experiments with our proposed framework in a multi-class setting are promising with respect to Kappa (84%), classifier accuracy (93%), and F-measure (92%). Overall, 40% of the classifiers increased performance in comparison with baseline approaches. Our analysis shows that the features with the highest odds ratios are: for low-level severity, the 19–22 age group and users with <1 year since Twitter account activation; for medium-level severity, neuroticism, the 23–29 age group, and being a Twitter user for one to two years; and for high-level severity, neuroticism, extraversion, and the number of times the tweet has been favorited by other users. We believe that this research, using a multi-class classification approach, provides a step forward in identifying severity at different levels (low, medium, high) when the content of a tweet is classified as cyberbullied. Lastly, the current study focused only on the Twitter platform; other social network platforms can be investigated using the same approach to detect cyberbullying severity patterns.; https://www.mdpi.com/2227-9709/7/4/52; cyberbullying; Twitter; social networks; algorithms |
spellingShingle | Bandeh Ali Talpur Declan O’Sullivan Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter Informatics cyberbullying social networks algorithms |
title | Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter |
title_full | Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter |
title_fullStr | Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter |
title_full_unstemmed | Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter |
title_short | Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter |
title_sort | multi class imbalance in text classification a feature engineering approach to detect cyberbullying in twitter |
topic | cyberbullying social networks algorithms |
url | https://www.mdpi.com/2227-9709/7/4/52 |
work_keys_str_mv | AT bandehalitalpur multiclassimbalanceintextclassificationafeatureengineeringapproachtodetectcyberbullyingintwitter AT declanosullivan multiclassimbalanceintextclassificationafeatureengineeringapproachtodetectcyberbullyingintwitter |
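
The description above reports Kappa, classifier accuracy, and F-measure for a four-class severity problem, and ranks features by odds ratio. As a rough illustration of how such figures are typically computed, the sketch below derives them from a confusion matrix in plain Python. It is not the authors' code: the function names, the label list `LABELS`, and the counts in `TOY_CM` are invented for this example, and macro-averaging the F-measure is an assumption, since the record does not say which averaging the paper used.

```python
from typing import List

# Severity classes named in the abstract; the short label strings are illustrative.
LABELS = ["none", "low", "medium", "high"]

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# These counts are invented toy numbers, NOT results from the paper.
TOY_CM = [
    [50, 2, 1, 0],
    [3, 20, 2, 0],
    [1, 2, 10, 1],
    [0, 0, 1, 7],
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a: feature present, in class     b: feature present, not in class
    c: feature absent, in class      d: feature absent, not in class
    """
    return (a * d) / (b * c)


if __name__ == "__main__":
    print("accuracy:", accuracy(TOY_CM))
    print("kappa:", cohen_kappa(TOY_CM))
    print("macro F1:", macro_f1(TOY_CM))
    print("odds ratio:", odds_ratio(30, 10, 20, 40))
```

Kappa is reported alongside accuracy because, under class imbalance, a classifier that mostly predicts the majority class can reach high accuracy while agreeing with the true labels little better than chance.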