HateCheck: functional tests for hate speech detection models

Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck’s utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.

Bibliographic Details
Main Authors: Röttger, P, Vidgen, B, Dong, N, Waseem, Z, Margetts, H, Pierrehumbert, JB
Format: Conference item
Language: English
Published: Association for Computational Linguistics 2021
Institution: University of Oxford
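To make the abstract's evaluation idea concrete, here is a minimal sketch, not the authors' released code, of how a HateCheck-style functional evaluation might be run: score a classifier on labelled test cases grouped by functionality and report per-functionality accuracy. The CSV file name, the column names (functionality, test_case, label_gold), and the model identifier are assumptions made for illustration.

```python
# Sketch of a HateCheck-style functional evaluation (assumptions noted below).
import pandas as pd
from transformers import pipeline

# Hypothetical local copy of the test suite, assumed to have columns
# "functionality", "test_case", and "label_gold".
cases = pd.read_csv("hatecheck_test_cases.csv")

# Any binary hate / non-hate classifier can be slotted in here;
# the model name is purely illustrative.
classifier = pipeline("text-classification", model="some-hate-speech-model")

def predict(text: str) -> str:
    """Map the classifier's raw label onto a 'hateful' / 'non-hateful' scheme."""
    label = classifier(text)[0]["label"]
    return "hateful" if label.lower() in {"hate", "hateful", "label_1"} else "non-hateful"

cases["pred"] = cases["test_case"].apply(predict)
cases["correct"] = cases["pred"] == cases["label_gold"]

# Per-functionality accuracy surfaces specific weaknesses that a single
# held-out-test-set F1 score would hide.
print(cases.groupby("functionality")["correct"].mean().sort_values())
```

The grouping step is the point of the exercise: instead of one aggregate metric, each of the 29 functionalities gets its own score, so systematic failure modes show up directly.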