A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service
Main Authors: | Inez Okulska, Anna Kołos |
---|---|
Format: | Article |
Language: | English |
Published: | Cracow Tertium Society for the Promotion of Language Studies, 2024-02-01 |
Series: | Półrocznik Językoznawczy Tertium |
Subjects: | cyberbullying, hate speech, user-generated online content, automated detection, stylometry |
Online Access: | https://journal.tertium.edu.pl/JaK/article/view/245 |
_version_ | 1797303429739053056 |
---|---|
author | Inez Okulska; Anna Kołos |
author_facet | Inez Okulska; Anna Kołos |
author_sort | Inez Okulska |
collection | DOAJ |
description | The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, our research collaborates with the Wykop.pl web service to fine-tune a model using genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, presenting our stylometric analysis of Wykop.pl content to identify morpho-syntactic structures that are commonly applied in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content. |
first_indexed | 2024-03-07T23:52:40Z |
format | Article |
id | doaj.art-1304275ae99e4881b0aee8a348671d3e |
institution | Directory of Open Access Journals |
issn | 2543-7844 |
language | English |
last_indexed | 2024-03-07T23:52:40Z |
publishDate | 2024-02-01 |
publisher | Cracow Tertium Society for the Promotion of Language Studies |
record_format | Article |
series | Półrocznik Językoznawczy Tertium |
spelling | doaj.art-1304275ae99e4881b0aee8a348671d3e; 2024-02-18T21:34:57Z; eng; Cracow Tertium Society for the Promotion of Language Studies; Półrocznik Językoznawczy Tertium; 2543-7844; 2024-02-01; 8; 2; 10.7592/Tertium.2023.8.2.245; A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service; Inez Okulska (0); Anna Kołos (1); (0) NASK National Research Institute, Warszawa, Poland; (1) NASK National Research Institute, Warszawa, Poland; The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, our research collaborates with the Wykop.pl web service to fine-tune a model using genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, presenting our stylometric analysis of Wykop.pl content to identify morpho-syntactic structures that are commonly applied in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.; https://journal.tertium.edu.pl/JaK/article/view/245; cyberbullying; hate speech; user-generated online content; automated detection; stylometry |
spellingShingle | Inez Okulska; Anna Kołos; A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service; Półrocznik Językoznawczy Tertium; cyberbullying; hate speech; user-generated online content; automated detection; stylometry |
title | A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service |
title_sort | morpho syntactic analysis of human moderated hate speech samples from wykop pl web service |
topic | cyberbullying; hate speech; user-generated online content; automated detection; stylometry |
url | https://journal.tertium.edu.pl/JaK/article/view/245 |
work_keys_str_mv | AT inezokulska amorphosyntacticanalysisofhumanmoderatedhatespeechsamplesfromwykopplwebservice AT annakołos amorphosyntacticanalysisofhumanmoderatedhatespeechsamplesfromwykopplwebservice AT inezokulska morphosyntacticanalysisofhumanmoderatedhatespeechsamplesfromwykopplwebservice AT annakołos morphosyntacticanalysisofhumanmoderatedhatespeechsamplesfromwykopplwebservice |