Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model
Main Authors: | Shifeng Chen, Jialin Wang, Ketai He |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-02-01 |
Series: | Information |
Subjects: | social media; cyberbullying detection; deep learning; language model |
Online Access: | https://www.mdpi.com/2078-2489/15/2/93 |
author | Shifeng Chen, Jialin Wang, Ketai He |
collection | DOAJ |
description | The popularization of the internet and the widespread use of smartphones have led to rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a serious negative impact. Online users' identities are hidden, and because of the lack of supervision and the imperfections of relevant laws and policies, cyberbullying occurs from time to time, causing serious mental harm and psychological trauma to victims. The pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) has achieved good results in natural language processing and can be applied to cyberbullying detection. In this research, we construct a variety of traditional machine learning, deep learning, and Chinese pre-trained language models as baselines, and propose a hybrid model for Chinese cyberbullying detection that combines XLNet, a variant of BERT, with a deep Bi-LSTM. In addition, real cyberbullying remarks are collected to expand the Chinese offensive language dataset COLDATASET. The proposed model outperforms all baseline models on this dataset, improving by 4.29% over SVM (the best-performing traditional machine learning method), by 1.49% over GRU (the best-performing deep learning method), and by 1.13% over BERT. |
id | doaj.art-551418771f454570b52c912ca7998054 |
institution | Directory of Open Access Journals |
issn | 2078-2489 |
doi | 10.3390/info15020093 |
citation | Information, vol. 15, no. 2, article 93 (2024) |
affiliation | School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China (all three authors) |
title | Chinese Cyberbullying Detection Using XLNet and Deep Bi-LSTM Hybrid Model |
topic | social media; cyberbullying detection; deep learning; language model |
url | https://www.mdpi.com/2078-2489/15/2/93 |