A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service

Bibliographic Details
Main Authors: Inez Okulska, Anna Kołos
Format: Article
Language: English
Published: Cracow Tertium Society for the Promotion of Language Studies, 2024-02-01
Series: Półrocznik Językoznawczy Tertium
Subjects: cyberbullying; hate speech; user-generated online content; automated detection; stylometry
Online Access: https://journal.tertium.edu.pl/JaK/article/view/245
ISSN: 2543-7844
Volume: 8
Issue: 2
DOI: 10.7592/Tertium.2023.8.2.245
Affiliations: NASK National Research Institute, Warszawa, Poland (both authors)
Collection: DOAJ (Directory of Open Access Journals)

Description

The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, our research collaborates with the Wykop.pl web service to fine-tune a model using genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, presenting our stylometric analysis of Wykop.pl content to identify morpho-syntactic structures that are commonly applied in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.