CausaLM: Causal Model Explanation Through Counterfactual Language Models

Bibliographic Details
Main Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart
Affiliation: Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology
Format: Article
Language: English
Published: The MIT Press, 2021-07-01
Series: Computational Linguistics, Vol. 47, No. 2, pp. 333–386
ISSN: 0891-2017, 1530-9312
DOI: 10.1162/coli_a_00404
Online Access: https://direct.mit.edu/coli/article/47/2/333/98518/CausaLM-Causal-Model-Explanation-Through

Abstract
Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. Like all machine learning–based methods, these networks are only as good as their training data, and can also capture unwanted biases. While there are tools that can help determine whether such biases exist, they do not distinguish between correlation and causation, and may be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem in estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.
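
The abstract's central mechanism (an auxiliary adversarial task that teaches a BERT-style encoder to "forget" a concept of interest) can be illustrated concretely. The sketch below is not the authors' released implementation: it assumes a gradient-reversal formulation of the adversarial task, a bert-base-uncased backbone, and hypothetical names (GradientReversal, CounterfactualEncoder, concept_head) introduced purely for illustration.

# A minimal sketch (not the paper's code): adversarial concept removal
# on top of BERT via gradient reversal.
import torch
import torch.nn as nn
from transformers import BertModel


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambd on
    the backward pass, so the encoder is trained to make the adversarial
    head fail, i.e., to discard the treated concept."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: reversed grad for x, None for lambd.
        return -ctx.lambd * grad_output, None


class CounterfactualEncoder(nn.Module):
    """BERT plus an adversarial head that tries to predict the treated
    concept from the pooled [CLS] vector through gradient reversal."""

    def __init__(self, num_concept_labels, lambd=1.0):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.concept_head = nn.Linear(self.bert.config.hidden_size,
                                      num_concept_labels)
        self.lambd = lambd

    def forward(self, input_ids, attention_mask, concept_labels=None):
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        adv_logits = self.concept_head(
            GradientReversal.apply(pooled, self.lambd))
        adv_loss = None
        if concept_labels is not None:
            # The head minimizes this loss; the reversed gradient makes the
            # encoder maximize it, stripping the concept from `pooled`.
            adv_loss = nn.functional.cross_entropy(adv_logits, concept_labels)
        return pooled, adv_loss

Trained this way (the abstract notes the auxiliary adversarial tasks are applied during pre-training and derived from the causal graph of the problem), the encoder yields a representation unaffected by the tested concept; the concept's causal effect on model performance can then be estimated by comparing a downstream classifier's predictions under the original versus the counterfactual representation.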