An Analysis of Neural Rationale Models and Influence Functions for Interpretable Machine Learning
In recent years, increasingly powerful machine learning models have shown remarkable performance on a wide variety of tasks, and their use is becoming more prevalent, including deployment in high-stakes settings such as medical and legal applications. Because these models are complex, their decision process is hard to understand, suggesting a need for model interpretability. Interpretability can be deceptively challenging. First, explanations of a model's decisions on example inputs may appear understandable; however, if the underlying explanation method is not itself interpretable, more care must be taken before making claims about its interpretability. Second, it can be difficult to apply interpretability techniques efficiently to large models with many parameters.

Through the lens of the first challenge, we examine neural rationale models, which are popular for interpretable prediction on natural language processing (NLP) tasks. In these models, a selector extracts segments of the input text, called rationales, and passes them to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined to be the explanation. However, through both philosophical perspectives and empirical studies, we argue that rationale models may be less interpretable than expected, and we call for more rigorous evaluations of these models to ensure the desired properties of interpretability are indeed achieved. Through the lens of the second challenge, we study influence functions, which explain a model's output by tracing its decision back to the training data. Given a test point, influence functions compute an influence score for each training point, representing how influential that point is on the model's decision for the test input. Because influence functions are expensive to compute for large models with many parameters, we aim to build intuition about them in low-dimensional settings and to develop simple, cheap-to-compute heuristics that are competitive with influence functions.
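To make the selector-classifier pipeline described in the abstract concrete, here is a minimal, illustrative PyTorch sketch. It is not the architecture studied in the thesis: the module names, embedding sizes, hard top-k token selection, and mean-pooled classifier are all simplifying assumptions made for clarity.

```python
# Illustrative rationale-model sketch only; not the thesis's implementation.
import torch
import torch.nn as nn


class RationaleModel(nn.Module):
    """A selector scores tokens; the classifier sees only the selected tokens (the rationale)."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, num_classes: int = 2, k: int = 5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.selector = nn.Linear(embed_dim, 1)   # per-token relevance score
        self.classifier = nn.Sequential(          # predicts from the rationale alone
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )
        self.k = k  # number of tokens kept as the rationale

    def forward(self, token_ids: torch.Tensor):
        emb = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        scores = self.selector(emb).squeeze(-1)     # (batch, seq_len)
        # Hard top-k selection for clarity; real rationale models usually use
        # sampling or a relaxed mask so the selector is trainable end to end.
        mask = torch.zeros_like(scores)
        mask.scatter_(1, scores.topk(self.k, dim=-1).indices, 1.0)
        rationale = emb * mask.unsqueeze(-1)        # zero out unselected tokens
        pooled = rationale.sum(dim=1) / mask.sum(dim=1, keepdim=True)
        return self.classifier(pooled), mask        # mask marks the extracted rationale


# Toy usage: a batch of 2 sequences of length 12 over a 100-token vocabulary.
model = RationaleModel(vocab_size=100)
logits, rationale_mask = model(torch.randint(0, 100, (2, 12)))
print(logits.shape, rationale_mask.shape)  # torch.Size([2, 2]) torch.Size([2, 12])
```

Because the rationale mask is the only path from the input text to the classifier, the selected tokens are what such models present as the explanation; the abstract's argument is that this alone does not guarantee interpretability.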
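For the second thread, the influence score referred to in the abstract is commonly computed with the formulation of Koh and Liang (2017); whether the thesis uses this exact form is not stated in the record, but the standard definition for a model with empirical-risk-minimizing parameters $\hat{\theta}$ trained on $n$ points is

$$
\mathcal{I}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta}),
$$

where $z$ is a training point, $z_{\text{test}}$ is the test point, and $H_{\hat{\theta}}$ is the Hessian of the training loss. Forming or inverting $H_{\hat{\theta}}$ is what makes influence functions expensive for models with many parameters; one plausible cheap baseline of the kind the abstract alludes to (not necessarily the thesis's heuristic) is to drop the Hessian entirely and score training points by the gradient dot product $-\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} \nabla_\theta L(z, \hat{\theta})$.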
Main Author: | Zheng, Yiming |
---|---|
Other Authors: | Shah, Julie A. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151413 |
author | Zheng, Yiming |
author2 | Shah, Julie A. |
collection | MIT |
description | In recent years, increasingly powerful machine learning models have shown remarkable performance on a wide variety of tasks, and their use is becoming more prevalent, including deployment in high-stakes settings such as medical and legal applications. Because these models are complex, their decision process is hard to understand, suggesting a need for model interpretability. Interpretability can be deceptively challenging. First, explanations of a model's decisions on example inputs may appear understandable; however, if the underlying explanation method is not itself interpretable, more care must be taken before making claims about its interpretability. Second, it can be difficult to apply interpretability techniques efficiently to large models with many parameters.
Through the lens of the first challenge, we examine neural rationale models, which are popular for interpretable prediction on natural language processing (NLP) tasks. In these models, a selector extracts segments of the input text, called rationales, and passes them to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined to be the explanation. However, through both philosophical perspectives and empirical studies, we argue that rationale models may be less interpretable than expected, and we call for more rigorous evaluations of these models to ensure the desired properties of interpretability are indeed achieved. Through the lens of the second challenge, we study influence functions, which explain a model's output by tracing its decision back to the training data. Given a test point, influence functions compute an influence score for each training point, representing how influential that point is on the model's decision for the test input. Because influence functions are expensive to compute for large models with many parameters, we aim to build intuition about them in low-dimensional settings and to develop simple, cheap-to-compute heuristics that are competitive with influence functions. |
format | Thesis |
id | mit-1721.1/151413 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
degree | M.Eng.
date issued | 2023-06
rights | In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
file format | application/pdf
title | An Analysis of Neural Rationale Models and Influence Functions for Interpretable Machine Learning
url | https://hdl.handle.net/1721.1/151413 |