Efficiently learning structured distributions from untrusted batches

© 2020 ACM. We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over 1, ..., n. Each user sends a batch of k i.i.d. samples from this distribution; however an "-fraction of...

Full description

Bibliographic Details
Main Authors:	Chen, Sitan, Li, Jerry, Moitra, Ankur
Format:	Article
Language:	English
Published:	ACM 2021
Online Access:	https://hdl.handle.net/1721.1/137507

_version_	1826196929911980032
author	Chen, Sitan Li, Jerry Moitra, Ankur
author_facet	Chen, Sitan Li, Jerry Moitra, Ankur
author_sort	Chen, Sitan
collection	MIT
description	© 2020 ACM. We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over 1, ..., n. Each user sends a batch of k i.i.d. samples from this distribution; however an "-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-understood that (") error is unavoidable. Suprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time which is exponential in either n or k. We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions. It turns out that this abstraction is quite powerful, and can be generalized to incorporate additional problem specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest.
first_indexed	2024-09-23T10:39:55Z
format	Article
id	mit-1721.1/137507
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T10:39:55Z
publishDate	2021
publisher	ACM
record_format	dspace
spelling	mit-1721.1/1375072022-09-27T14:07:11Z Efficiently learning structured distributions from untrusted batches Chen, Sitan Li, Jerry Moitra, Ankur © 2020 ACM. We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over 1, ..., n. Each user sends a batch of k i.i.d. samples from this distribution; however an "-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-understood that (") error is unavoidable. Suprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time which is exponential in either n or k. We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions. It turns out that this abstraction is quite powerful, and can be generalized to incorporate additional problem specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest. 2021-11-05T15:09:02Z 2021-11-05T15:09:02Z 2020 2021-05-24T18:23:48Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/137507 Chen, Sitan, Li, Jerry and Moitra, Ankur. 2020. "Efficiently learning structured distributions from untrusted batches." Proceedings of the Annual ACM Symposium on Theory of Computing. en 10.1145/3357713.3384337 Proceedings of the Annual ACM Symposium on Theory of Computing Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf ACM arXiv
spellingShingle	Chen, Sitan Li, Jerry Moitra, Ankur Efficiently learning structured distributions from untrusted batches
title	Efficiently learning structured distributions from untrusted batches
title_full	Efficiently learning structured distributions from untrusted batches
title_fullStr	Efficiently learning structured distributions from untrusted batches
title_full_unstemmed	Efficiently learning structured distributions from untrusted batches
title_short	Efficiently learning structured distributions from untrusted batches
title_sort	efficiently learning structured distributions from untrusted batches
url	https://hdl.handle.net/1721.1/137507
work_keys_str_mv	AT chensitan efficientlylearningstructureddistributionsfromuntrustedbatches AT lijerry efficientlylearningstructureddistributionsfromuntrustedbatches AT moitraankur efficientlylearningstructureddistributionsfromuntrustedbatches

Efficiently learning structured distributions from untrusted batches

Similar Items