Learning Big (Image) Data via Coresets for Dictionaries

Signal and image processing have seen an explosion of interest in the last few years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g. s...

Full description

Bibliographic Details
Main Authors:	Sochen, Nir, Feldman, Dan, Feigin-Almon, Micha
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	English
Published:	Springer US 2016
Online Access:	http://hdl.handle.net/1721.1/105528 https://orcid.org/0000-0001-7649-9539

_version_	1811094948126130176
author	Sochen, Nir Feldman, Dan Feigin-Almon, Micha
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Sochen, Nir Feldman, Dan Feigin-Almon, Micha
author_sort	Sochen, Nir
collection	MIT
description	Signal and image processing have seen an explosion of interest in the last few years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g. signals or images) as linear combinations of only few vectors (patterns). To further reduce the size of the representation, the combinations are usually required to be sparse, i.e., each signal is a linear combination of only a small number of patterns. This paper suggests a new computational approach to the problem of dictionary learning, known in computational geometry as coresets. A coreset for dictionary learning is a small smart non-uniform sample from the input signals such that the quality of any given dictionary with respect to the input can be approximated via the coreset. In particular, the optimal dictionary for the input can be approximated by learning the coreset. Since the coreset is small, the learning is faster. Moreover, using merge-and-reduce, the coreset can be constructed for streaming signals that do not fit in memory and can also be computed in parallel. We apply our coresets for dictionary learning of images using the K-SVD algorithm and bound their size and approximation error analytically. Our simulations demonstrate gain factors of up to 60 in computational time with the same, and even better, performance. We also demonstrate our ability to perform computations on larger patches and high-definition images, where the traditional approach breaks down.
first_indexed	2024-09-23T16:08:04Z
format	Article
id	mit-1721.1/105528
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T16:08:04Z
publishDate	2016
publisher	Springer US
record_format	dspace
spelling	mit-1721.1/1055282022-09-29T18:27:24Z Learning Big (Image) Data via Coresets for Dictionaries Sochen, Nir Feldman, Dan Feigin-Almon, Micha Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Media Laboratory Feldman, Dan Feigin-Almon, Micha Signal and image processing have seen an explosion of interest in the last few years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g. signals or images) as linear combinations of only few vectors (patterns). To further reduce the size of the representation, the combinations are usually required to be sparse, i.e., each signal is a linear combination of only a small number of patterns. This paper suggests a new computational approach to the problem of dictionary learning, known in computational geometry as coresets. A coreset for dictionary learning is a small smart non-uniform sample from the input signals such that the quality of any given dictionary with respect to the input can be approximated via the coreset. In particular, the optimal dictionary for the input can be approximated by learning the coreset. Since the coreset is small, the learning is faster. Moreover, using merge-and-reduce, the coreset can be constructed for streaming signals that do not fit in memory and can also be computed in parallel. We apply our coresets for dictionary learning of images using the K-SVD algorithm and bound their size and approximation error analytically. Our simulations demonstrate gain factors of up to 60 in computational time with the same, and even better, performance. We also demonstrate our ability to perform computations on larger patches and high-definition images, where the traditional approach breaks down. 2016-12-02T17:45:23Z 2016-12-02T17:45:23Z 2013-03 2016-08-18T15:43:40Z Article http://purl.org/eprint/type/JournalArticle 0924-9907 1573-7683 http://hdl.handle.net/1721.1/105528 Feldman, Dan, Micha Feigin, and Nir Sochen. “Learning Big (Image) Data via Coresets for Dictionaries.” Journal of Mathematical Imaging and Vision 46, no. 3 (March 20, 2013): 276–291. https://orcid.org/0000-0001-7649-9539 en http://dx.doi.org/10.1007/s10851-013-0431-x Journal of Mathematical Imaging and Vision Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Springer Science+Business Media New York application/pdf Springer US Springer US
spellingShingle	Sochen, Nir Feldman, Dan Feigin-Almon, Micha Learning Big (Image) Data via Coresets for Dictionaries
title	Learning Big (Image) Data via Coresets for Dictionaries
title_full	Learning Big (Image) Data via Coresets for Dictionaries
title_fullStr	Learning Big (Image) Data via Coresets for Dictionaries
title_full_unstemmed	Learning Big (Image) Data via Coresets for Dictionaries
title_short	Learning Big (Image) Data via Coresets for Dictionaries
title_sort	learning big image data via coresets for dictionaries
url	http://hdl.handle.net/1721.1/105528 https://orcid.org/0000-0001-7649-9539
work_keys_str_mv	AT sochennir learningbigimagedataviacoresetsfordictionaries AT feldmandan learningbigimagedataviacoresetsfordictionaries AT feiginalmonmicha learningbigimagedataviacoresetsfordictionaries

Learning Big (Image) Data via Coresets for Dictionaries

Similar Items