Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model

The popularization of the internet and the widespread use of smartphones have led to a rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a serious negative impact. The identity of onlin...


Bibliographic Details
Main Authors: Shifeng Chen, Jialin Wang, Ketai He
Format: Article
Language: English
Published: MDPI AG 2024-02-01
Series: Information
Subjects: social media; cyberbullying detection; deep learning; language model
Online Access: https://www.mdpi.com/2078-2489/15/2/93
collection DOAJ
description The popularization of the internet and the widespread use of smartphones have led to a rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a serious negative impact. The identity of online users is hidden, and due to the lack of supervision and the imperfections of relevant laws and policies, cyberbullying occurs from time to time, bringing serious mental harm and psychological trauma to the victims. The pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) has achieved good results in natural language processing and can be used for cyberbullying detection. In this research, we construct a variety of traditional machine learning, deep learning, and Chinese pre-trained language models as baselines, and propose a hybrid model for Chinese cyberbullying detection that combines XLNet, a variant of BERT, with a deep Bi-LSTM. In addition, real cyberbullying remarks are collected to expand the Chinese offensive language dataset COLDATASET. The proposed model outperforms all baseline models on this dataset, improving by 4.29% over SVM (the best-performing traditional machine learning method), by 1.49% over GRU (the best-performing deep learning method), and by 1.13% over BERT.
first_indexed 2024-03-07T22:27:33Z
format Article
id doaj.art-551418771f454570b52c912ca7998054
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-07T22:27:33Z
publishDate 2024-02-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-551418771f454570b52c912ca7998054; 2024-02-23T15:21:05Z; eng; MDPI AG; Information; ISSN 2078-2489; 2024-02-01; vol. 15, no. 2, article 93; DOI 10.3390/info15020093
Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model
Shifeng Chen (0), Jialin Wang (1), Ketai He (2)
(0), (1), (2): School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China
https://www.mdpi.com/2078-2489/15/2/93
social media; cyberbullying detection; deep learning; language model
title Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model
topic social media
cyberbullying detection
deep learning
language model
url https://www.mdpi.com/2078-2489/15/2/93