Building Transparent Models

Bibliographic Details
Main Author: Lee, Guang-He
Other Authors: Jaakkola, Tommi S.
Format: Thesis
Published: Massachusetts Institute of Technology, 2022
Online Access: https://hdl.handle.net/1721.1/139573
https://orcid.org/0000-0001-6561-0692
Description
Summary: Transparency has become a key desideratum of machine learning. Properties such as interpretability or robustness are indispensable when model predictions are fed into mission-critical applications or those dealing with sensitive or controversial topics (e.g., social, legal, financial, medical, or security tasks). While the desired notion of transparency can vary widely across scenarios, modern predictors (such as deep neural networks) often lack any semblance of this concept, primarily due to their inherent complexity. In this thesis, we focus on a set of formal properties of transparency and design a series of algorithms to build models with these specified properties. In particular, these properties include: (i) the model class (of oblique decision trees), effectively represented and trained via a new family of neural models; (ii) local model classes (e.g., locally linear models), induced from and estimated jointly with a black-box predictor, possibly over structured objects; and (iii) local certificates of robustness, derived for ensembles of any black-box predictors in continuous or discrete spaces. The contributions of this thesis are mainly methodological and theoretical, with an emphasis on scalability to large-scale settings. Compared to a human-centric approach to interpretability, our methods are particularly suited to scenarios that require factual verification, or to cases where explanations are difficult for humans to judge subjectively (e.g., for superhuman models).
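
For readers unfamiliar with the first model class named above, the sketch below illustrates what an oblique decision tree is: a tree whose internal nodes split on a linear combination of features (w · x + b > 0) rather than on a single feature threshold. This is a minimal illustrative toy under that standard definition, not the neural representation or training procedure developed in the thesis; the names ObliqueNode, w, b, and predict are hypothetical.

```
import numpy as np

class ObliqueNode:
    """One node of a toy oblique decision tree (illustrative only)."""

    def __init__(self, w=None, b=0.0, left=None, right=None, value=None):
        self.w, self.b = w, b            # oblique split: go right if w . x + b > 0
        self.left, self.right = left, right
        self.value = value               # leaf prediction (None for internal nodes)

    def predict(self, x):
        if self.value is not None:       # reached a leaf
            return self.value
        branch = self.right if np.dot(self.w, x) + self.b > 0 else self.left
        return branch.predict(x)

# Example: a depth-1 oblique tree over 2-D inputs that splits on x1 - x2 > 0.
tree = ObliqueNode(
    w=np.array([1.0, -1.0]), b=0.0,
    left=ObliqueNode(value=0), right=ObliqueNode(value=1),
)
print(tree.predict(np.array([2.0, 0.5])))  # prints 1, since 2.0 - 0.5 > 0
```

Unlike axis-aligned trees, each split here involves all features at once, which is what makes a differentiable (neural) parameterization of the split weights a natural fit for the training approach summarized in item (i).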