Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model

Bibliographic Details
Main Authors:	Shifeng Chen, Jialin Wang, Ketai He
Format:	Article
Language:	English
Published:	MDPI AG 2024-02-01
Series:	Information
Subjects:	social media; cyberbullying detection; deep learning; language model
Online Access:	https://www.mdpi.com/2078-2489/15/2/93

author	Shifeng Chen, Jialin Wang, Ketai He
collection	DOAJ
description	The popularization of the internet and the widespread use of smartphones have led to rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a serious negative impact. Because the identity of online users is hidden, supervision is lacking, and the relevant laws and policies are imperfect, cyberbullying occurs from time to time, causing serious mental harm and psychological trauma to victims. The pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) has achieved good results in natural language processing and can be used for cyberbullying detection. In this research, we construct a variety of traditional machine learning, deep learning, and Chinese pre-trained language models as baselines, and propose a hybrid model for Chinese cyberbullying detection that combines XLNet, a variant of BERT, with a deep Bi-LSTM. In addition, real cyberbullying remarks are collected to expand the Chinese offensive language dataset COLDATASET. The proposed model outperforms all baseline models on this dataset, improving by 4.29% over SVM (the best-performing traditional machine learning method), by 1.49% over GRU (the best-performing deep learning method), and by 1.13% over BERT.
first_indexed	2024-03-07T22:27:33Z
format	Article
id	doaj.art-551418771f454570b52c912ca7998054
institution	Directory of Open Access Journals
issn	2078-2489
language	English
last_indexed	2024-03-07T22:27:33Z
publishDate	2024-02-01
publisher	MDPI AG
record_format	Article
series	Information
title	Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model
topic	social media; cyberbullying detection; deep learning; language model
url	https://www.mdpi.com/2078-2489/15/2/93

Similar Items