Detecting Novel Associations in Large Data Sets
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and fo...
Main Authors: | , , , , , , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
American Association for the Advancement of Science (AAAS)
2014
|
Online Access: | http://hdl.handle.net/1721.1/84636 https://orcid.org/0000-0001-6463-4203 https://orcid.org/0000-0001-5410-7274 https://orcid.org/0000-0002-3355-6983 |
Summary: | Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R[superscript 2]) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. |
---|