Reducing Bias in Sentiment Analysis Models Through Causal Mediation Analysis and Targeted Counterfactual Training


Bibliographic Details
Main Authors: Yifei Da, Matias Nicolas Bossa, Abel Diaz Berenguer, Hichem Sahli
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects: BERT; bias; causal mediation analysis; counterfactual training; large language models; sentiment analysis
Online Access: https://ieeexplore.ieee.org/document/10388308/
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-0bf6b4166d434aabb73d28394f148412
ISSN: 2169-3536
Description: Large language models provide high-accuracy solutions to many natural language processing tasks. In particular, they are used as word embeddings in sentiment analysis models. However, these models pick up on and amplify biases and social stereotypes in the data. Causality theory has recently driven the development of effective algorithms to evaluate and mitigate these biases. Causal mediation analysis was used to detect biases, while counterfactual training was proposed to mitigate them. In both cases, counterfactual sentences are created by changing an attribute, such as the gender of a noun, for which no change in the model output is expected. Biases are detected, and subsequently corrected, whenever the model's behavior differs between the original and the counterfactual sentence. We propose a new method for de-biasing sentiment analysis models that leverages causal mediation analysis to identify the parts of the model primarily responsible for the bias and applies targeted counterfactual training to de-bias the model. We validated the methodology by fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for sentiment prediction. We trained two sentiment analysis models using the Stanford Sentiment Treebank dataset and the Amazon Product Reviews dataset, respectively, and we evaluated fairness and prediction performance using the Equity Evaluation Corpus. We illustrated the causal patterns in the network and showed that our method achieves both higher fairness and more accurate sentiment analysis than the state-of-the-art approach. Contrary to state-of-the-art models, we achieved a noticeable improvement in gender fairness without hindering sentiment prediction accuracy.
Citation: IEEE Access, vol. 12, pp. 10120-10134, 2024. DOI: 10.1109/ACCESS.2024.3353056 (IEEE Xplore document 10388308)
Authors (all at the Department of Electronics and Informatics, Vrije Universiteit Brussel, Brussels, Belgium):
Yifei Da (ORCID: 0000-0002-9442-612X)
Matias Nicolas Bossa (ORCID: 0000-0001-5127-2573)
Abel Diaz Berenguer (ORCID: 0000-0003-4970-6517)
Hichem Sahli (ORCID: 0000-0002-1774-2970)
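The bias-probing recipe described in the abstract (build a counterfactual sentence by swapping a gendered attribute, then compare the model's outputs, which should not change) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the word-swap table is a hypothetical example, and `predict_sentiment` is a toy lexicon scorer standing in for a fine-tuned BERT sentiment classifier.

```python
# Sketch of counterfactual bias probing: swap gendered words in a sentence
# and flag cases where the sentiment score changes, since no change is expected.

GENDER_SWAPS = {
    "he": "she", "she": "he", "him": "her",
    "his": "her", "man": "woman", "woman": "man",
    "boy": "girl", "girl": "boy",
}

def make_counterfactual(sentence: str) -> str:
    """Replace each gendered token with its counterpart."""
    tokens = sentence.lower().split()
    return " ".join(GENDER_SWAPS.get(t, t) for t in tokens)

def predict_sentiment(sentence: str) -> float:
    """Toy lexicon scorer; a real setup would query a fine-tuned BERT model."""
    lexicon = {"great": 1.0, "happy": 0.5, "angry": -0.5, "terrible": -1.0}
    return sum(lexicon.get(t, 0.0) for t in sentence.lower().split())

def gender_bias_gap(sentence: str) -> float:
    """Absolute change in sentiment under the gender swap (ideally zero)."""
    return abs(predict_sentiment(sentence)
               - predict_sentiment(make_counterfactual(sentence)))
```

A nonzero `gender_bias_gap` on a sentence marks a biased behavior; in the paper's targeted variant, causal mediation analysis first localizes which model components mediate that gap, and counterfactual training is then focused on them.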