Linear regression with partially mismatched data: local search with theoretical guarantees

Abstract Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the...

Full description

Bibliographic Details
Main Authors: Mazumder, Rahul, Wang, Haoyue
Other Authors: Sloan School of Management
Format: Article
Language:English
Published: Springer Berlin Heidelberg 2022
Online Access:https://hdl.handle.net/1721.1/144400
_version_ 1811085878118842368
author Mazumder, Rahul
Wang, Haoyue
author2 Sloan School of Management
author_facet Sloan School of Management
Mazumder, Rahul
Wang, Haoyue
author_sort Mazumder, Rahul
collection MIT
description Abstract Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges. We propose and study a simple greedy local search algorithm for this optimization problem that enjoys strong theoretical guarantees and appealing computational performance. We prove that under a suitable scaling of the number of mismatched pairs compared to the number of samples and features, and certain assumptions on problem data; our local search algorithm converges to a nearly-optimal solution at a linear rate. In particular, in the noiseless case, our algorithm converges to the global optimal solution with a linear convergence rate. Based on this result, we prove an upper bound for the estimation error of the parameter. We also propose an approximate local search step that allows us to scale our approach to much larger instances. We conduct numerical experiments to gather further insights into our theoretical results, and show promising performance gains compared to existing approaches.
first_indexed 2024-09-23T13:16:59Z
format Article
id mit-1721.1/144400
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T13:16:59Z
publishDate 2022
publisher Springer Berlin Heidelberg
record_format dspace
spelling mit-1721.1/1444002023-06-22T13:41:05Z Linear regression with partially mismatched data: local search with theoretical guarantees Mazumder, Rahul Wang, Haoyue Sloan School of Management Abstract Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges. We propose and study a simple greedy local search algorithm for this optimization problem that enjoys strong theoretical guarantees and appealing computational performance. We prove that under a suitable scaling of the number of mismatched pairs compared to the number of samples and features, and certain assumptions on problem data; our local search algorithm converges to a nearly-optimal solution at a linear rate. In particular, in the noiseless case, our algorithm converges to the global optimal solution with a linear convergence rate. Based on this result, we prove an upper bound for the estimation error of the parameter. We also propose an approximate local search step that allows us to scale our approach to much larger instances. We conduct numerical experiments to gather further insights into our theoretical results, and show promising performance gains compared to existing approaches. 2022-08-22T13:03:39Z 2022-08-22T13:03:39Z 2022-08-17 2022-08-21T03:10:42Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/144400 Mazumder, Rahul and Wang, Haoyue. 2022. "Linear regression with partially mismatched data: local search with theoretical guarantees." PUBLISHER_CC en https://doi.org/10.1007/s10107-022-01863-y Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer Berlin Heidelberg Springer Berlin Heidelberg
spellingShingle Mazumder, Rahul
Wang, Haoyue
Linear regression with partially mismatched data: local search with theoretical guarantees
title Linear regression with partially mismatched data: local search with theoretical guarantees
title_full Linear regression with partially mismatched data: local search with theoretical guarantees
title_fullStr Linear regression with partially mismatched data: local search with theoretical guarantees
title_full_unstemmed Linear regression with partially mismatched data: local search with theoretical guarantees
title_short Linear regression with partially mismatched data: local search with theoretical guarantees
title_sort linear regression with partially mismatched data local search with theoretical guarantees
url https://hdl.handle.net/1721.1/144400
work_keys_str_mv AT mazumderrahul linearregressionwithpartiallymismatcheddatalocalsearchwiththeoreticalguarantees
AT wanghaoyue linearregressionwithpartiallymismatcheddatalocalsearchwiththeoreticalguarantees