Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate

Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form s...

Bibliographic Details
Main Authors: Kirk, H, Vidgen, B, Röttger, P, Hale, SA
Format: Working paper
Language: English
Published: 2021
author Kirk, H
Vidgen, B
Röttger, P
Hale, SA
collection OXFORD
description Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.
first_indexed 2024-03-06T18:18:26Z
format Working paper
id oxford-uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3
institution University of Oxford
language English
last_indexed 2024-03-06T18:18:26Z
publishDate 2021
record_format dspace
spelling oxford-uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3; 2022-03-26T08:57:14Z; Working paper; http://purl.org/coar/resource_type/c_8042; uuid:0570eaf5-e729-4ef5-b27a-b6d511abcdc3; English; Symplectic Elements; 2021
title Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate