External Sampling

36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I

Bibliographic Details
Main Authors: Andoni, Alexandr; Indyk, Piotr; Onak, Krzysztof; Rubinfeld, Ronitt
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Springer Berlin / Heidelberg 2012
Online Access: http://hdl.handle.net/1721.1/73886
https://orcid.org/0000-0002-4353-7639
https://orcid.org/0000-0002-7983-9524
Summary: We initiate the study of sublinear-time algorithms in the external memory model [1]. In this model, the data is stored in blocks of a certain size B, and the algorithm is charged a unit cost for each block access. This model is well-studied, since it reflects the computational issues occurring when the (massive) input is stored on a disk. Since each block access operates on B data elements in parallel, many problems have external memory algorithms whose number of block accesses is only a small fraction (e.g. 1/B) of their main memory complexity. However, to the best of our knowledge, no such reduction in complexity is known for any sublinear-time algorithm. One plausible explanation is that the vast majority of sublinear-time algorithms use random sampling and thus exhibit no locality of reference. This state of affairs is quite unfortunate, since both sublinear-time algorithms and the external memory model are important approaches to dealing with massive data sets, and ideally they should be combined to achieve best performance. In this paper we show that such combination is indeed possible. In particular, we consider three well-studied problems: testing of distinctness, uniformity and identity of an empirical distribution induced by data. For these problems we show random-sampling-based algorithms whose number of block accesses is up to a factor of 1/√B smaller than the main memory complexity of those problems. We also show that this improvement is optimal for those problems. Since these problems are natural primitives for a number of sampling-based algorithms for other problems, our tools improve the external memory complexity of other problems as well.
Sponsorship: David & Lucile Packard Foundation (Fellowship); Center for Massive Data Algorithmics (MADALGO); Marie Curie (International Reintegration Grant 231077); National Science Foundation (U.S.) (Grant 0514771); National Science Foundation (U.S.) (Grant 0728645); National Science Foundation (U.S.) (Grant 0732334); Symantec Research Labs (Research Fellowship)
Date Issued: 2009-07
Repository Date: 2012-10-11T18:22:51Z
Record Timestamp: 2022-09-29T21:14:11Z
Type: Article (http://purl.org/eprint/type/JournalArticle)
ISBN: 978-3-642-02926-4
ISSN: 0302-9743; 1611-3349
Citation: Andoni, Alexandr et al. “External Sampling.” Automata, Languages and Programming. Ed. Susanne Albers et al. LNCS Vol. 5555. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. 83–94.
DOI: http://dx.doi.org/10.1007/978-3-642-02927-1_9
Published In: Automata, Languages and Programming
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
File Format: application/pdf
Source: MIT web domain
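Illustration (not part of the catalog record, and not the authors' algorithm): the cost model described in the summary, where each block access costs one unit and returns B elements, can be sketched with a toy in-memory simulation. The names BlockStore, sample_elements, sample_blocks and has_collision are hypothetical, introduced here only for this sketch. It contrasts element-wise sampling, which pays one block access per sampled element, with block-wise sampling, which keeps every element of each accessed block.

```python
import random


class BlockStore:
    """Toy external memory: data is split into blocks of size B.

    Every call to read_block counts as one unit of cost, mirroring the
    "unit cost for each block access" charge in the summary.
    """

    def __init__(self, data, block_size):
        self.n = len(data)
        self.block_size = block_size
        self.blocks = [data[i:i + block_size]
                       for i in range(0, len(data), block_size)]
        self.accesses = 0

    def read_block(self, index):
        self.accesses += 1
        return self.blocks[index]


def sample_elements(store, k, rng):
    """Sample k distinct positions; each one costs a full block access."""
    positions = rng.sample(range(store.n), k)
    return [store.read_block(p // store.block_size)[p % store.block_size]
            for p in positions]


def sample_blocks(store, k, rng):
    """Sample k distinct blocks and keep all of their elements."""
    values = []
    for b in rng.sample(range(len(store.blocks)), k):
        values.extend(store.read_block(b))
    return values


def has_collision(values):
    """Return True if some value appears more than once."""
    return len(set(values)) < len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(1024))  # all-distinct toy input (assumed for the demo)
    by_element = BlockStore(data, block_size=16)
    by_block = BlockStore(data, block_size=16)
    a = sample_elements(by_element, 8, rng)
    b = sample_blocks(by_block, 8, rng)
    print(len(a), "elements for", by_element.accesses, "block accesses")
    print(len(b), "elements for", by_block.accesses, "block accesses")
```

For the same number of block accesses, block-wise sampling sees B times as many elements, but elements from one block are not independent samples. That dependence is the locality problem the summary points to; how the paper turns it into a provable saving is not reproduced here.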