Detecting Novel Associations in Large Data Sets
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and fo...
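Main Authors: | Reshef, David N.; Reshef, Yakir; Grossman, Sharon Rachel; Finucane, Hilary Kiyo; McVean, Gilean; Turnbaugh, Peter J.; Mitzenmacher, Michael; Sabeti, Pardis C.; Lander, Eric Steven |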
---|---|
Other Authors: | Whitaker College of Health Sciences and Technology |
Format: | Article |
Language: | en_US |
Published: | American Association for the Advancement of Science (AAAS) 2014 |
Online Access: | http://hdl.handle.net/1721.1/84636 https://orcid.org/0000-0001-6463-4203 https://orcid.org/0000-0001-5410-7274 https://orcid.org/0000-0002-3355-6983 |
_version_ | 1826196591189426176 |
---|---|
author | Reshef, David N. Reshef, Yakir Grossman, Sharon Rachel Finucane, Hilary Kiyo McVean, Gilean Turnbaugh, Peter J. Mitzenmacher, Michael Sabeti, Pardis C. Lander, Eric Steven |
author2 | Whitaker College of Health Sciences and Technology |
author_facet | Whitaker College of Health Sciences and Technology Reshef, David N. Reshef, Yakir Grossman, Sharon Rachel Finucane, Hilary Kiyo McVean, Gilean Turnbaugh, Peter J. Mitzenmacher, Michael Sabeti, Pardis C. Lander, Eric Steven |
author_sort | Reshef, David N. |
collection | MIT |
description | Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. |
first_indexed | 2024-09-23T10:29:44Z |
format | Article |
id | mit-1721.1/84636 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T10:29:44Z |
publishDate | 2014 |
publisher | American Association for the Advancement of Science (AAAS) |
record_format | dspace |
spelling | mit-1721.1/84636 2022-09-30T21:28:33Z Detecting Novel Associations in Large Data Sets Reshef, David N. Reshef, Yakir Grossman, Sharon Rachel Finucane, Hilary Kiyo McVean, Gilean Turnbaugh, Peter J. Mitzenmacher, Michael Sabeti, Pardis C. Lander, Eric Steven Whitaker College of Health Sciences and Technology Massachusetts Institute of Technology. Department of Biology Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Reshef, David N. Reshef, Yakir Grossman, Sharon Rachel Lander, Eric S. Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. National Institute of General Medical Sciences (U.S.) (Medical Scientist Training Program) 2014-02-03T13:18:52Z 2014-02-03T13:18:52Z 2011-12 2011-03 Article http://purl.org/eprint/type/JournalArticle 0036-8075 1095-9203 http://hdl.handle.net/1721.1/84636 Reshef, D. N., Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. “Detecting Novel Associations in Large Data Sets.” Science 334, no. 6062 (December 15, 2011): 1518-1524. https://orcid.org/0000-0001-6463-4203 https://orcid.org/0000-0001-5410-7274 https://orcid.org/0000-0002-3355-6983 en_US http://dx.doi.org/10.1126/science.1205438 Science Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf American Association for the Advancement of Science (AAAS) PMC |
title | Detecting Novel Associations in Large Data Sets |
url | http://hdl.handle.net/1721.1/84636 https://orcid.org/0000-0001-6463-4203 https://orcid.org/0000-0001-5410-7274 https://orcid.org/0000-0002-3355-6983 |