Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate

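Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.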

Bibliographic Details
Main Authors:	Kirk, H; Vidgen, B; Röttger, P; Hale, SA
Format:	Working paper
Language:	English
Published:	2021

_version_	1797051361502691328
author	Kirk, H Vidgen, B Röttger, P Hale, SA
author_facet	Kirk, H Vidgen, B Röttger, P Hale, SA
author_sort	Kirk, H
collection	OXFORD
description	Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.
first_indexed	2024-03-06T18:18:26Z
format	Working paper
id	oxford-uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3
institution	University of Oxford
language	English
last_indexed	2024-03-06T18:18:26Z
publishDate	2021
record_format	dspace
spelling	oxford-uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3 2022-03-26T08:57:14Z Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate Working paper http://purl.org/coar/resource_type/c_8042 uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3 English Symplectic Elements 2021 Kirk, H Vidgen, B Röttger, P Hale, SA Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.
spellingShingle	Kirk, H Vidgen, B Röttger, P Hale, SA Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title	Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title_full	Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title_fullStr	Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title_full_unstemmed	Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title_short	Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
title_sort	hatemoji a test suite and adversarially generated dataset for benchmarking and detecting emoji based hate
work_keys_str_mv	AT kirkh hatemojiatestsuiteandadversariallygenerateddatasetforbenchmarkinganddetectingemojibasedhate AT vidgenb hatemojiatestsuiteandadversariallygenerateddatasetforbenchmarkinganddetectingemojibasedhate AT rottgerp hatemojiatestsuiteandadversariallygenerateddatasetforbenchmarkinganddetectingemojibasedhate AT halesa hatemojiatestsuiteandadversariallygenerateddatasetforbenchmarkinganddetectingemojibasedhate