Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

Abstract: ⚠ This paper contains prompts and model outputs that are offensive in nature. When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, vi...

Bibliographic Details
Main Authors:	Timo Schick, Sahana Udupa, Hinrich Schütze
Format:	Article
Language:	English
Published:	The MIT Press 2021-01-01
Series:	Transactions of the Association for Computational Linguistics
Online Access:	https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00434/108865/Self-Diagnosis-and-Self-Debiasing-A-Proposal-for

_version_	1811338137011486720
author	Timo Schick Sahana Udupa Hinrich Schütze
author_facet	Timo Schick Sahana Udupa Hinrich Schütze
author_sort	Timo Schick
collection	DOAJ
description	Abstract: ⚠ This paper contains prompts and model outputs that are offensive in nature. When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesired behavior, reduces the probability of a language model producing problematic text. We refer to this approach as self-debiasing. Self-debiasing does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While we by no means eliminate the issue of language models generating biased text, we believe our approach to be an important step in this direction.
first_indexed	2024-04-13T18:06:34Z
format	Article
id	doaj.art-7865d581bc554481bb1d3d28fe5f98e4
institution	Directory Open Access Journal
issn	2307-387X
language	English
last_indexed	2024-04-13T18:06:34Z
publishDate	2021-01-01
publisher	The MIT Press
record_format	Article
series	Transactions of the Association for Computational Linguistics
spelling	doaj.art-7865d581bc554481bb1d3d28fe5f98e4; 2022-12-22T02:36:04Z; eng; The MIT Press; Transactions of the Association for Computational Linguistics; ISSN 2307-387X; 2021-01-01; vol. 9, pp. 1408-1424; DOI 10.1162/tacl_a_00434; Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP; Timo Schick (Center for Information and Language Processing (CIS), LMU Munich, Germany. schickt@cis.lmu.de); Sahana Udupa (Institute of Social and Cultural Anthropology, LMU Munich, Germany. sahana.udupa@lmu.de); Hinrich Schütze (Center for Information and Language Processing (CIS), LMU Munich, Germany. inquiries@cislmu.org); Abstract: ⚠ This paper contains prompts and model outputs that are offensive in nature. When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesired behavior, reduces the probability of a language model producing problematic text. We refer to this approach as self-debiasing. Self-debiasing does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While we by no means eliminate the issue of language models generating biased text, we believe our approach to be an important step in this direction.; https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00434/108865/Self-Diagnosis-and-Self-Debiasing-A-Proposal-for
spellingShingle	Timo Schick Sahana Udupa Hinrich Schütze Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP Transactions of the Association for Computational Linguistics
title	Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
title_full	Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
title_fullStr	Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
title_full_unstemmed	Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
title_short	Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
title_sort	self diagnosis and self debiasing a proposal for reducing corpus based bias in nlp
url	https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00434/108865/Self-Diagnosis-and-Self-Debiasing-A-Proposal-for
work_keys_str_mv	AT timoschick selfdiagnosisandselfdebiasingaproposalforreducingcorpusbasedbiasinnlp AT sahanaudupa selfdiagnosisandselfdebiasingaproposalforreducingcorpusbasedbiasinnlp AT hinrichschutze selfdiagnosisandselfdebiasingaproposalforreducingcorpusbasedbiasinnlp

Similar Items