Network Maximal Correlation

Identifying nonlinear relationships in large datasets is a daunting task particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without the knowledge of underlying nonlinea...

Full description

Bibliographic Details
Main Authors: Feizi, Soheil, Makhdoumi, Ali, Duffy, Ken, Kellis, Manolis, Medard, Muriel
Other Authors: Manolis Kellis
Published: 2015
Online Access:http://hdl.handle.net/1721.1/98878
_version_ 1811091956325941248
author Feizi, Soheil
Makhdoumi, Ali
Duffy, Ken
Kellis, Manolis
Medard, Muriel
author2 Manolis Kellis
author_facet Manolis Kellis
Feizi, Soheil
Makhdoumi, Ali
Duffy, Ken
Kellis, Manolis
Medard, Muriel
author_sort Feizi, Soheil
collection MIT
description Identifying nonlinear relationships in large datasets is a daunting task particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without the knowledge of underlying nonlinearity shapes. NMC infers, possibly nonlinear, transformations of variables with zero means and unit variances by maximizing total nonlinear correlation over the underlying network. For the case of having two variables, NMC is equivalent to the standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its global optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) which converges to a local NMC optimum. For this algorithm, we provide guidelines for choosing appropriate starting points to jump out of local maximizers. We also propose a distributed algorithm to compute a 1-$\epsilon$ approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, where we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables. These functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real world applications by showing its continuity with respect to small perturbations of joint distributions. We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets including breast, kidney and liver cancers, and show that NMC infers gene modules that are significantly associated with survival times of individuals while they are not detected using linear association measures.
first_indexed 2024-09-23T15:10:37Z
id mit-1721.1/98878
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:10:37Z
publishDate 2015
record_format dspace
spelling mit-1721.1/988782019-04-11T12:34:19Z Network Maximal Correlation Feizi, Soheil Makhdoumi, Ali Duffy, Ken Kellis, Manolis Medard, Muriel Manolis Kellis Computational Biology (Kellis) Identifying nonlinear relationships in large datasets is a daunting task particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without the knowledge of underlying nonlinearity shapes. NMC infers, possibly nonlinear, transformations of variables with zero means and unit variances by maximizing total nonlinear correlation over the underlying network. For the case of having two variables, NMC is equivalent to the standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its global optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) which converges to a local NMC optimum. For this algorithm, we provide guidelines for choosing appropriate starting points to jump out of local maximizers. We also propose a distributed algorithm to compute a 1-$\epsilon$ approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, where we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables. These functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real world applications by showing its continuity with respect to small perturbations of joint distributions. We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets including breast, kidney and liver cancers, and show that NMC infers gene modules that are significantly associated with survival times of individuals while they are not detected using linear association measures. 2015-09-23T19:00:07Z 2015-09-23T19:00:07Z 2015-09-21 2015-09-23T19:00:07Z http://hdl.handle.net/1721.1/98878 MIT-CSAIL-TR-2015-028 Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ 48 p. application/pdf
spellingShingle Feizi, Soheil
Makhdoumi, Ali
Duffy, Ken
Kellis, Manolis
Medard, Muriel
Network Maximal Correlation
title Network Maximal Correlation
title_full Network Maximal Correlation
title_fullStr Network Maximal Correlation
title_full_unstemmed Network Maximal Correlation
title_short Network Maximal Correlation
title_sort network maximal correlation
url http://hdl.handle.net/1721.1/98878
work_keys_str_mv AT feizisoheil networkmaximalcorrelation
AT makhdoumiali networkmaximalcorrelation
AT duffyken networkmaximalcorrelation
AT kellismanolis networkmaximalcorrelation
AT medardmuriel networkmaximalcorrelation