Flexible Low-Rank Statistical Modeling with Missing Data and Side Information

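We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion. While the above problems can be naturally posed as rank-constrained optimization problems, which are nonconvex and computationally difficult, we show how to relax them via generalized nuclear norm regularization to obtain convex optimization problems. We discuss algorithms drawing inspiration from modern convex optimization methods to address these large scale convex optimization computational tasks. Finally, we illustrate our flexible approach in problems arising in functional data reconstruction and ecological species distribution modeling.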

Bibliographic Details
Main Authors: Fithian, William, Mazumder, Rahul
Other Authors: Sloan School of Management
Format: Article
Published: Institute of Mathematical Statistics 2019
Online Access: http://hdl.handle.net/1721.1/120549
https://orcid.org/0000-0003-1384-9743
author Fithian, William
Mazumder, Rahul
author2 Sloan School of Management
author_facet Sloan School of Management
Fithian, William
Mazumder, Rahul
author_sort Fithian, William
collection MIT
description We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion. While the above problems can be naturally posed as rank-constrained optimization problems, which are nonconvex and computationally difficult, we show how to relax them via generalized nuclear norm regularization to obtain convex optimization problems. We discuss algorithms drawing inspiration from modern convex optimization methods to address these large scale convex optimization computational tasks. Finally, we illustrate our flexible approach in problems arising in functional data reconstruction and ecological species distribution modeling.
first_indexed 2024-09-23T13:48:19Z
format Article
id mit-1721.1/120549
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T13:48:19Z
publishDate 2019
publisher Institute of Mathematical Statistics
record_format dspace
spelling mit-1721.1/120549 2022-10-01T17:17:00Z
Flexible Low-Rank Statistical Modeling with Missing Data and Side Information
Fithian, William
Mazumder, Rahul
Sloan School of Management
Mazumder, Rahul
United States. Office of Naval Research (Grant N000141512342)
2019-02-26T20:20:49Z 2019-02-26T20:20:49Z 2017-08 2019-02-25T21:17:42Z
Article
http://purl.org/eprint/type/JournalArticle
0883-4237
http://hdl.handle.net/1721.1/120549
Fithian, William and Rahul Mazumder. “Flexible Low-Rank Statistical Modeling with Missing Data and Side Information.” Statistical Science 33, 2 (May 2018): 238–260 © 2018 Institute of Mathematical Statistics
https://orcid.org/0000-0003-1384-9743
http://dx.doi.org/10.1214/18-STS642
Statistical Science
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
application/pdf
Institute of Mathematical Statistics
arXiv
title Flexible Low-Rank Statistical Modeling with Missing Data and Side Information