Equitability and dependence

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.

Bibliographic Details
Main Author: Reshef, David N
Other Authors: Tommi S. Jaakkola and Joshua B. Tenenbaum.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2017
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/112024
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 205-209). 209 pages, application/pdf.
Identifiers: http://hdl.handle.net/1721.1/112024; 1006379187
Record timestamps: 2019-04-10T14:03:17Z; 2017-10-30T15:28:10Z
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract:

Given a high-dimensional data set, we often wish to find the strongest relationships within it. Increasingly, and particularly at modern sample sizes, screening for all non-trivial relationships using an independence test can yield too many results to be useful. What is needed is a way of identifying a smaller set of "strongest" relationships, independent of relationship type (e.g., linear, exponential, etc.).

The first goal of this work is to formally present and characterize equitability, a property of measures of dependence that aims to overcome this challenge. We formalize equitability in terms of interval estimates of relationship strength, and then show that under moderate assumptions it is equivalent to requiring that a measure of dependence yield well-powered tests not only for distinguishing non-trivial relationships from trivial ones but also for distinguishing stronger relationships from weaker ones. We then show that equitability, to the extent it is achieved, implies that a statistic will be well powered to detect all relationships of a certain minimal strength, across different relationship types. Thus, equitability is a strengthening of power against independence that enables exploration of data sets with a small number of strong, interesting relationships and a large number of weaker, less interesting ones.

The second goal of this thesis is to define and theoretically characterize two new statistics that together yield an efficient approach for obtaining both power and equitability. To do this, we first introduce a new population measure of dependence whose goal is equitability and show three equivalent ways that it can be viewed, including as a canonical "smoothing" of mutual information. We then introduce and characterize an efficiently computable consistent estimator of our population measure of dependence, MICe, and we empirically establish its equitability on a large class of noisy functional relationships. Next, we derive a second, related statistic, TICe, whose computation is a trivial side-product of our algorithm and whose goal is powerful independence testing rather than equitability. We prove that this statistic yields a consistent independence test and show in simulations that the test has good power against independence.

The third and final goal of this thesis is to present an extensive empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence, including MICe and TICe. Our analysis finds that MICe and TICe achieve state-of-the-art equitability on functional relationships and power against independence, respectively. We also show evidence for a trade-off between power against independence and equitability consistent with our theoretical findings. In the high-dimensional setting, our results suggest that an efficient and practical strategy for achieving a combination of power against independence and equitability is to filter the large set of candidate relationships by TICe and then to rank the remaining ones using MICe.
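The filter-then-rank strategy described at the end of the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function `filter_then_rank`, its parameters, and the `threshold` value are hypothetical names chosen for illustration, and the absolute Pearson correlation is used as a plain stand-in for both statistics because it needs only the standard library. In practice the screening statistic would be TICe and the ranking statistic MICe.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Stand-in statistic (NOT TICe or MICe): absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear signal
    return abs(sxy) / sqrt(sxx * syy)


def filter_then_rank(columns, screen_stat, rank_stat, threshold):
    """Screen all variable pairs, then rank the survivors.

    columns:     dict mapping variable name -> list of samples
    screen_stat: statistic used to discard weak pairs (TICe's role)
    rank_stat:   statistic used to order remaining pairs (MICe's role)
    threshold:   hypothetical cutoff; a real analysis would derive it
                 from the screening test's null distribution
    """
    kept = []
    for a, b in combinations(sorted(columns), 2):
        if screen_stat(columns[a], columns[b]) >= threshold:
            kept.append((a, b, rank_stat(columns[a], columns[b])))
    # Strongest relationships first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],   # exactly linear in x
    "w": [1.0, 2.0, 2.0, 4.0, 5.0],    # noisy version of x
    "z": [1.0, -1.0, 1.0, -1.0, 1.0],  # unrelated to the others
}

ranked = filter_then_rank(data, abs_pearson, abs_pearson, threshold=0.5)
for a, b, score in ranked:
    print(f"{a}-{b}: {score:.3f}")
```

Passing the two statistics as separate arguments mirrors the division of labor in the abstract: one statistic decides which pairs survive, and a different one decides their order.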