Optimal stopping rules for sequential hypothesis testing

Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible, when p ≠q, while we never want to reject the null, when...

Full description

Bibliographic Details
Main Authors:	Daskalakis, C, Kawase, Y
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	English
Published:	2022
Online Access:	https://hdl.handle.net/1721.1/143459

_version_	1811097695499059200
author	Daskalakis, C Kawase, Y
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Daskalakis, C Kawase, Y
author_sort	Daskalakis, C
collection	MIT
description	Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible, when p ≠q, while we never want to reject the null, when p = q. Well-known results show that ϵ(√ n/ϵ2) samples are necessary and sufficient for distinguishing whether p equals q versus p is ϵ -far from q in total variation distance. However, this requires the distinguishing radius ϵ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p = q, without being given a lower bound on the distance between p and q, when p ≠q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p = q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in Statistics. We show that, when n = 2, any sequential hypothesis test must see Ω (1 /dtv(p,q)2 log log 1 dtv(p,q) ) samples, with high (constant) probability, before it rejects p = q, where dtv(p, q) is the-unknown to the tester-Total variation distance between p and q. We match the dependence of this lower bound on dtv(p, q) by proposing a sequential tester that rejects p = q from at most O √n/dtv(p,q)2 log log 1/dtv(p,q) samples with high (constant) probability. The Ω (√ n) dependence on the support size n is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing.
first_indexed	2024-09-23T17:03:27Z
format	Article
id	mit-1721.1/143459
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T17:03:27Z
publishDate	2022
record_format	dspace
spelling	mit-1721.1/1434592023-01-10T17:02:14Z Optimal stopping rules for sequential hypothesis testing Daskalakis, C Kawase, Y Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible, when p ≠q, while we never want to reject the null, when p = q. Well-known results show that ϵ(√ n/ϵ2) samples are necessary and sufficient for distinguishing whether p equals q versus p is ϵ -far from q in total variation distance. However, this requires the distinguishing radius ϵ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p = q, without being given a lower bound on the distance between p and q, when p ≠q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p = q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in Statistics. We show that, when n = 2, any sequential hypothesis test must see Ω (1 /dtv(p,q)2 log log 1 dtv(p,q) ) samples, with high (constant) probability, before it rejects p = q, where dtv(p, q) is the-unknown to the tester-Total variation distance between p and q. We match the dependence of this lower bound on dtv(p, q) by proposing a sequential tester that rejects p = q from at most O √n/dtv(p,q)2 log log 1/dtv(p,q) samples with high (constant) probability. The Ω (√ n) dependence on the support size n is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing. 2022-06-17T14:20:50Z 2022-06-17T14:20:50Z 2017-09-01 2022-06-17T14:15:30Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/143459 Daskalakis, C and Kawase, Y. 2017. "Optimal stopping rules for sequential hypothesis testing." Leibniz International Proceedings in Informatics, LIPIcs, 87. en 10.4230/LIPIcs.ESA.2017.32 Leibniz International Proceedings in Informatics, LIPIcs Creative Commons Attribution 3.0 unported license https://creativecommons.org/licenses/by/3.0/ application/pdf DROPS
spellingShingle	Daskalakis, C Kawase, Y Optimal stopping rules for sequential hypothesis testing
title	Optimal stopping rules for sequential hypothesis testing
title_full	Optimal stopping rules for sequential hypothesis testing
title_fullStr	Optimal stopping rules for sequential hypothesis testing
title_full_unstemmed	Optimal stopping rules for sequential hypothesis testing
title_short	Optimal stopping rules for sequential hypothesis testing
title_sort	optimal stopping rules for sequential hypothesis testing
url	https://hdl.handle.net/1721.1/143459
work_keys_str_mv	AT daskalakisc optimalstoppingrulesforsequentialhypothesistesting AT kawasey optimalstoppingrulesforsequentialhypothesistesting

Optimal stopping rules for sequential hypothesis testing

Similar Items