Optimal testing for properties of distributions

Given samples from an unknown discrete distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysi...

Bibliographic Details
Main Authors: Acharya, Jayadev; Daskalakis, Konstantinos; Kamath, Gautam Chetan
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Neural Information Processing Systems Foundation 2017
Online Access: http://hdl.handle.net/1721.1/110838
https://orcid.org/0000-0001-6416-2904
https://orcid.org/0000-0002-5451-0490
https://orcid.org/0000-0003-0048-2559
collection MIT
description Given samples from an unknown discrete distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, as well as in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of discrete distributions such as monotonicity, independence, log-concavity, unimodality, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem: Given samples from an unknown distribution p, and a known distribution q, are p and q close in χ²-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds, up to constant factors. Finally, a necessary building block for our testers and an important byproduct of our work are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.
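The core problem named in the description (is p close to a known q in chi-squared distance, or far in total variation distance?) can be illustrated with a small sketch. This is not taken from the record. It assumes Poissonized sampling, where the count N_i of symbol i is Poisson with mean m*p_i, and uses the chi-squared-type statistic Z = sum_i ((N_i - m*q_i)^2 - N_i) / (m*q_i), whose expectation under that assumption is m times the chi-squared distance between p and q. The function names, the handling of symbols with q_i = 0, and the threshold constant 0.1 are hypothetical choices for illustration only.

```python
from typing import Sequence


def chi2_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Chi-squared distance sum_i (p_i - q_i)^2 / q_i over the support of q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q) if qi > 0)


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def chi2_statistic(counts: Sequence[int], q: Sequence[float], m: float) -> float:
    """Z = sum_i ((N_i - m*q_i)^2 - N_i) / (m*q_i).

    With N_i ~ Poisson(m * p_i), each numerator has mean m^2 * (p_i - q_i)^2,
    so E[Z] = m * chi2_distance(p, q).
    """
    if len(counts) != len(q):
        raise ValueError("counts and q must have the same length")
    z = 0.0
    for n_i, q_i in zip(counts, q):
        if q_i <= 0.0:
            continue  # assumption: symbols outside q's support are ignored
        expected = m * q_i
        z += ((n_i - expected) ** 2 - n_i) / expected
    return z


def looks_far(counts: Sequence[int], q: Sequence[float], m: float,
              eps: float, threshold_factor: float = 0.1) -> bool:
    """Report 'far' when Z exceeds threshold_factor * m * eps^2.

    threshold_factor is an illustrative constant, not a tuned value.
    """
    return chi2_statistic(counts, q, m) > threshold_factor * m * eps ** 2


if __name__ == "__main__":
    q = [0.5, 0.5]
    print(chi2_statistic([50, 50], q, m=100))      # counts that match q
    print(chi2_statistic([90, 10], q, m=100))      # counts skewed away from q
    print(looks_far([90, 10], q, m=100, eps=0.5))
```

Subtracting N_i in the numerator removes the Poisson variance term, which is why the statistic is centered at zero when p equals q; a plain Pearson statistic would instead be centered near the domain size.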
first_indexed 2024-09-23T14:10:21Z
format Article
id mit-1721.1/110838
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T14:10:21Z
publishDate 2017
publisher Neural Information Processing Systems Foundation
record_format dspace
spelling mit-1721.1/110838 2022-09-28T19:01:25Z 2017-07-25T17:38:21Z 2017-07-25T17:38:21Z 2015-12 Article http://purl.org/eprint/type/ConferencePaper 1049-5258 http://hdl.handle.net/1721.1/110838
Acharya, Jayadev, Constantinos Daskalakis, and Gautam Kamath. "Optimal Testing for Properties of Distributions." Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, Canada, 7-12 December, 2015. NIPS 2015.
https://papers.nips.cc/paper/5839-optimal-testing-for-properties-of-distributions
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
application/pdf