CausaLM: Causal Model Explanation Through Counterfactual Language Models
| Main Authors: | Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart |
| --- | --- |
| Format: | Article |
| Language: | English |
| Published: | The MIT Press, 2021-07-01 |
| Series: | Computational Linguistics |
| Online Access: | https://direct.mit.edu/coli/article/47/2/333/98518/CausaLM-Causal-Model-Explanation-Through |
author | Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart |
collection | DOAJ |
description |
Abstract: Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. Like all machine learning–based methods, they are only as good as their training data, and can also capture unwanted biases. While there are tools that can help detect whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem in estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data. (An illustrative sketch of this adversarial fine-tuning idea follows the record fields below.) |
format | Article |
id | doaj.art-695c189dcd6c464bb43626363309264e |
institution | Directory Open Access Journal |
issn | 0891-2017 1530-9312 |
language | English |
publishDate | 2021-07-01 |
publisher | The MIT Press |
record_format | Article |
series | Computational Linguistics |
publication details | Computational Linguistics, vol. 47, no. 2, pp. 333–386, 2021-07-01. The MIT Press. DOI: 10.1162/coli_a_00404. |
author affiliations | Amir Feder (feder@campus.technion.ac.il), Nadav Oved (nadavo@campus.technion.ac.il), Uri Shalit (urishalit@technion.ac.il), and Roi Reichart (roiri@technion.ac.il), all with the Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology. |
title | CausaLM: Causal Model Explanation Through Counterfactual Language Models |
url | https://direct.mit.edu/coli/article/47/2/333/98518/CausaLM-Causal-Model-Explanation-Through |
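The abstract above describes the method only at a high level. As a rough illustration, the sketch below shows one way its core ingredients could be wired together in PyTorch with Hugging Face Transformers: an adversarial concept head behind a gradient-reversal layer that encourages a BERT encoder to "forget" a treated concept, plus a simple measure of how much downstream predictions change when the original encoder is swapped for the counterfactual one. The class names, the default model, and the effect measure here are assumptions for illustration; they are not the authors' released implementation or their exact estimator.

```python
# Hedged sketch only: NOT the authors' code. Assumes PyTorch + Hugging Face
# Transformers; CounterfactualEncoder and estimated_concept_effect are
# illustrative placeholders for the idea described in the abstract.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on the
    backward pass, so the encoder is trained to make the concept head fail."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class CounterfactualEncoder(nn.Module):
    """BERT-style encoder with a downstream task head and an adversarial head
    that tries to predict the treated concept from the [CLS] representation.
    Training on task_loss + concept_loss (with the reversal layer in between)
    pushes the representation to forget the concept."""

    def __init__(self, model_name="bert-base-uncased", num_task_labels=2,
                 num_concept_labels=2, lambd=1.0):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.task_head = nn.Linear(hidden, num_task_labels)
        self.concept_head = nn.Linear(hidden, num_concept_labels)
        self.lambd = lambd

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        task_logits = self.task_head(cls)
        concept_logits = self.concept_head(GradientReversal.apply(cls, self.lambd))
        return task_logits, concept_logits


@torch.no_grad()
def estimated_concept_effect(original_model, counterfactual_model, tokenizer, texts):
    """Crude, assumed effect measure: average total change in predicted class
    probabilities when the concept-aware encoder is replaced by the
    concept-forgetting (counterfactual) one."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    ids, mask = batch["input_ids"], batch["attention_mask"]
    p_orig = torch.softmax(original_model(ids, mask)[0], dim=-1)
    p_cf = torch.softmax(counterfactual_model(ids, mask)[0], dim=-1)
    return (p_orig - p_cf).abs().sum(dim=-1).mean().item()
```

In a fuller setup, following the abstract's description, the adversarial step would be run as an additional pre-training stage (the "auxiliary adversarial pre-training tasks") before the downstream classifier is trained on top of the resulting counterfactual representation, and the effect would be estimated by comparing that classifier's behavior under the original and counterfactual encoders.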