A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service

Bibliographic Details
Main Authors:	Inez Okulska, Anna Kołos
Format:	Article
Language:	English
Published:	Cracow Tertium Society for the Promotion of Language Studies, 2024-02-01
Series:	Półrocznik Językoznawczy Tertium
Subjects:	cyberbullying; hate speech; user-generated online content; automated detection; stylometry
Online Access:	https://journal.tertium.edu.pl/JaK/article/view/245
ISSN:	2543-7844
DOI:	10.7592/Tertium.2023.8.2.245
Affiliation:	NASK National Research Institute, Warszawa, Poland (both authors)
Collection:	DOAJ (Directory of Open Access Journals)

Abstract:	The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, our research collaborates with the Wykop.pl web service to fine-tune a model using genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, presenting our stylometric analysis of Wykop.pl content to identify morpho-syntactic structures that are commonly applied in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.

Similar Items