Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter

Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages, resulting in cyberbullying. Previous research has focused on whether or not cyberbullying behavior exists in a tweet (binary classification). In this research, we developed a model for detecting the severity of cyberbullying in a tweet. The model is feature-based: it uses features derived from the content of a tweet to train a machine learning classifier that labels each tweet as non-cyberbullied or as low-, medium-, or high-level cyberbullied. We introduced pointwise semantic orientation as a new input feature, alongside predicted features (gender, age, and personality type) and Twitter API features. Experiments with the proposed framework in a multi-class setting gave promising results for Kappa (84%), classifier accuracy (93%), and F-measure (92%). Overall, 40% of the classifiers improved in performance compared with baseline approaches. Our analysis shows that the features with the highest odds ratio are as follows: for low-level severity, the 19–22 age group and users whose Twitter account has been active for less than one year; for medium-level severity, neuroticism, the 23–29 age group, and having been a Twitter user for one to two years; and for high-level severity, neuroticism, extraversion, and the number of times the tweet has been favorited by other users. We believe that this multi-class classification approach is a step forward in identifying the severity level (low, medium, or high) of tweets whose content is classified as cyberbullied. Lastly, the current study focused only on Twitter; other social network platforms could be investigated with the same approach to detect cyberbullying severity patterns.

Bibliographic Details
Main Authors:	Bandeh Ali Talpur, Declan O’Sullivan
Affiliation:	School of Computer Science and Statistics, Trinity College Dublin, D02 PN40 Dublin, Ireland (both authors)
Format:	Article
Language:	English
Published:	MDPI AG 2020-11-01
Series:	Informatics
Volume/Issue:	Vol. 7, No. 4, Article 52
ISSN:	2227-9709
DOI:	10.3390/informatics7040052
Subjects:	cyberbullying; Twitter; social networks; algorithms
Collection:	Directory of Open Access Journals (DOAJ)
Online Access:	https://www.mdpi.com/2227-9709/7/4/52
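The abstract reports Kappa, accuracy, and F-measure for a four-class problem (non-cyberbullied, low, medium, high). The sketch below shows one way these three metrics can be computed from a confusion matrix in plain Python. It assumes the reported Kappa is Cohen's kappa between predicted and true labels, and it shows both macro-averaged and support-weighted F-measure because this record does not say which averaging the authors used. The label names, function names, and toy predictions are hypothetical illustrations, not the authors' data or code.

```python
"""Sketch: multi-class accuracy, Cohen's kappa, and F-measure from a confusion matrix."""

# Hypothetical label names for the four classes described in the abstract.
LABELS = ("none", "low", "medium", "high")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of items on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix):
    """Observed agreement corrected for the agreement expected by chance."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]  # true-class counts
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]  # predicted-class counts
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when every label and prediction is the same class")
    return (observed - expected) / (1.0 - expected)


def per_class_f1(matrix):
    """F-measure for each class; 0.0 when precision and recall are both zero."""
    k = len(matrix)
    scores = []
    for c in range(k):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(k))
        actual = sum(matrix[c])
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F-measures: every class counts equally."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


def weighted_f1(matrix):
    """Per-class F-measures weighted by each class's true support."""
    scores = per_class_f1(matrix)
    supports = [sum(row) for row in matrix]
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced toy data (not from the paper).
    y_true = ["none"] * 6 + ["low"] * 2 + ["medium"] + ["high"]
    y_pred = ["none"] * 5 + ["low"] + ["low", "none"] + ["medium"] + ["medium"]
    m = confusion_matrix(y_true, y_pred)
    print(f"accuracy    {accuracy(m):.3f}")
    print(f"kappa       {cohen_kappa(m):.3f}")
    print(f"macro F1    {macro_f1(m):.3f}")
    print(f"weighted F1 {weighted_f1(m):.3f}")
```

On the toy data this prints an accuracy of 0.700 but a kappa of 0.483 and a macro F-measure of 0.500, because the single high-level tweet is never detected. That gap is why chance-corrected and per-class metrics are worth reporting alongside accuracy when severity classes are imbalanced.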

Similar Items