Super-sampling with a reservoir

We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset that provides the best overall ap...

Ful tanımlama

Detaylı Bibliyografya
Asıl Yazarlar:	Paige, B, Sejdinovic, D, Wood, F
Materyal Türü:	Conference item
Baskı/Yayın Bilgisi:	AUAI Press 2016

_version_	1826275714638282752
author	Paige, B Sejdinovic, D Wood, F
author_facet	Paige, B Sejdinovic, D Wood, F
author_sort	Paige, B
collection	OXFORD
description	We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset that provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree. The resulting algorithm runs in a single pass through the whole data set, and has a per-iteration computational complexity logarithmic in the size of the subset. These “supersamples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.
first_indexed	2024-03-06T23:02:59Z
format	Conference item
id	oxford-uuid:62cf0b18-df18-4729-bd1f-21a9e834e2b9
institution	University of Oxford
last_indexed	2024-03-06T23:02:59Z
publishDate	2016
publisher	AUAI Press
record_format	dspace
spelling	oxford-uuid:62cf0b18-df18-4729-bd1f-21a9e834e2b92022-03-26T18:08:40ZSuper-sampling with a reservoirConference itemhttp://purl.org/coar/resource_type/c_5794uuid:62cf0b18-df18-4729-bd1f-21a9e834e2b9Symplectic Elements at OxfordAUAI Press2016Paige, BSejdinovic, DWood, FWe introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset that provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree. The resulting algorithm runs in a single pass through the whole data set, and has a per-iteration computational complexity logarithmic in the size of the subset. These “supersamples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.
spellingShingle	Paige, B Sejdinovic, D Wood, F Super-sampling with a reservoir
title	Super-sampling with a reservoir
title_full	Super-sampling with a reservoir
title_fullStr	Super-sampling with a reservoir
title_full_unstemmed	Super-sampling with a reservoir
title_short	Super-sampling with a reservoir
title_sort	super sampling with a reservoir
work_keys_str_mv	AT paigeb supersamplingwithareservoir AT sejdinovicd supersamplingwithareservoir AT woodf supersamplingwithareservoir

Super-sampling with a reservoir

Benzer Materyaller