Interpreting neural network judgments via minimal, stable, and symbolic corrections

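© 2018 Curran Associates Inc. All rights reserved. We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.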
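
As an illustration only (this is not the authors' algorithm, whose details this record does not give): once the on/off pattern of every ReLU unit is fixed, a ReLU network is an affine function of its input, so both "the output changes" and "the pattern stays the same" become linear constraints on the correction. The Python sketch below assumes a hypothetical one-hidden-layer network with a scalar output, where a nonnegative output counts as the desired judgment, and looks for the smallest change to a single input feature. All function names and weights are invented for the example.

```python
def relu_pattern(W1, b1, x):
    """Return the hidden pre-activations at x and which ReLU units are on."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    return pre, [p > 0 for p in pre]


def affine_in_region(W1, b1, w2, b2, pattern):
    """With the ReLU pattern fixed, the network equals a . x + c."""
    hidden = range(len(W1))
    a = [sum(w2[j] * W1[j][i] for j in hidden if pattern[j])
         for i in range(len(W1[0]))]
    c = b2 + sum(w2[j] * b1[j] for j in hidden if pattern[j])
    return a, c


def forward(W1, b1, w2, b2, x):
    """Evaluate the network directly (used to double-check a correction)."""
    pre, _ = relu_pattern(W1, b1, x)
    return sum(w * max(p, 0.0) for w, p in zip(w2, pre)) + b2


def single_feature_correction(W1, b1, w2, b2, x, i):
    """Smallest change t to feature i that brings the output up to 0 while
    keeping the same ReLU pattern; None if no such t exists in this region."""
    _, pattern = relu_pattern(W1, b1, x)
    a, c = affine_in_region(W1, b1, w2, b2, pattern)
    fx = sum(ai * xi for ai, xi in zip(a, x)) + c
    if fx >= 0.0:
        return 0.0          # output is already the desired one
    if a[i] == 0.0:
        return None         # feature i has no effect in this linear region
    t = -fx / a[i]          # solves fx + a[i] * t = 0
    shifted = list(x)
    shifted[i] += t
    _, new_pattern = relu_pattern(W1, b1, shifted)
    # The affine formula is only valid if the pattern did not change.
    return t if new_pattern == pattern else None


# Hypothetical weights: two inputs, two hidden units, one output.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = -3.0
x = [1.0, 1.0]              # forward(...) is -1.0 here, i.e. rejected

t = single_feature_correction(W1, b1, w2, b2, x, 0)
print(t)                    # 1.0: raising feature 0 from 1.0 to 2.0 suffices
```

A search over several features at once, or a correction that holds for a whole range of inputs, would need a linear programming solver rather than this closed-form single-feature step.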


Bibliographic Details
Main Authors: Solar Lezama, Armando; Singh, Rishabh; Zhang, Xin
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: English
Published: 2021
Online Access: https://hdl.handle.net/1721.1/137906
author Solar Lezama, Armando
Singh, Rishabh
Zhang, Xin
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
collection MIT
description © 2018 Curran Associates Inc. All rights reserved. We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.
first_indexed 2024-09-23T10:49:48Z
format Article
id mit-1721.1/137906
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T10:49:48Z
publishDate 2021
record_format dspace
spelling mit-1721.1/137906 2023-02-01T21:51:38Z Interpreting neural network judgments via minimal, stable, and symbolic corrections Solar Lezama, Armando Singh, Rishabh Zhang, Xin Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory © 2018 Curran Associates Inc. All rights reserved. We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat. 2021-11-09T15:09:50Z 2021-11-09T15:09:50Z 2018 2019-07-10T13:22:05Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137906 Solar Lezama, Armando, Singh, Rishabh and Zhang, Xin. 2018. "Interpreting neural network judgments via minimal, stable, and symbolic corrections." en https://papers.nips.cc/paper/7736-interpreting-neural-network-judgments-via-minimal-stable-and-symbolic-corrections Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Neural Information Processing Systems (NIPS)
title Interpreting neural network judgments via minimal, stable, and symbolic corrections
url https://hdl.handle.net/1721.1/137906