Restructuring Sparse High Dimensional Data for Effective Retrieval

Bibliographic Details
Main Authors:	Isbell, Charles; Viola, Paul
Language:	en_US
Published:	2004
Online Access:	http://hdl.handle.net/1721.1/6674

author	Isbell, Charles; Viola, Paul
collection	MIT
description	The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, documents and queries are represented as vectors of word counts. In its simplest form, relevance is defined to be the dot product between a document vector and a query vector, a measure of the number of common terms. A central difficulty in text retrieval is that the presence or absence of a word is not sufficient to determine relevance to a query. Linear dimensionality reduction has been proposed as a technique for extracting underlying structure from the document collection. In some domains (such as vision), dimensionality reduction reduces computational complexity; in text retrieval, it is more often used to improve retrieval performance. We propose an alternative and novel technique that produces sparse representations constructed from sets of highly related words. Documents and queries are represented by their distance to these sets, and relevance is measured by the number of common clusters. This technique significantly improves retrieval performance, is efficient to compute, and shares properties with the optimal linear projection operator and the independent components of documents.
first_indexed	2024-09-23T16:59:29Z
id	mit-1721.1/6674
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T16:59:29Z
publishDate	2004
record_format	dspace
spelling	mit-1721.1/6674; 2019-04-12T08:31:45Z; 2004-10-08T20:37:10Z; 1998-05-01; AIM-1636; http://hdl.handle.net/1721.1/6674; en_US; 5435006 bytes; 502542 bytes; application/postscript; application/pdf
title	Restructuring Sparse High Dimensional Data for Effective Retrieval
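
The sketch below (Python, illustrative only) contrasts the two relevance measures named in the description. dot_relevance follows the classical definition directly: the dot product of two word-count vectors. cluster_relevance is only a loose, assumption-laden stand-in for the proposed technique: the description does not say how the word sets are built or how "distance to these sets" is computed, so this sketch takes the sets as given and counts a text as belonging to a set when it contains at least min_hits distinct words from it. The names word_counts, cluster_profile, cluster_relevance and min_hits, and the example word sets, are hypothetical; this is not the authors' method.

```python
from collections import Counter


def word_counts(text):
    """Bag-of-words vector: lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def dot_relevance(doc, query):
    """Classical relevance: the dot product of two word-count vectors.

    Counter returns 0 for missing keys, so only terms present in both
    vectors contribute; a score of 0 means no literal terms are shared.
    """
    return sum(count * query[term] for term, count in doc.items())


def cluster_profile(counts, clusters, min_hits=1):
    """HYPOTHETICAL stand-in for "distance to a word set".

    Returns the indices of the sets from which the text contains at least
    `min_hits` distinct words. The paper's actual distance is not given here.
    """
    return {
        i
        for i, words in enumerate(clusters)
        if sum(1 for w in words if counts[w] > 0) >= min_hits
    }


def cluster_relevance(doc, query, clusters, min_hits=1):
    """HYPOTHETICAL: relevance as the number of word sets shared by document and query."""
    shared = cluster_profile(doc, clusters, min_hits) & cluster_profile(query, clusters, min_hits)
    return len(shared)


if __name__ == "__main__":
    # Hand-picked word sets for illustration only; how the paper builds its sets is not described here.
    clusters = [{"car", "automobile", "vehicle"}, {"engine", "motor"}]
    doc = word_counts("the automobile has a powerful motor")
    query = word_counts("car engine")
    print(dot_relevance(doc, query))                # 0: no literal terms in common
    print(cluster_relevance(doc, query, clusters))  # 2: both word sets are shared
```

The example mirrors the difficulty the description names: the document and query share no words, so their dot product is 0, yet under this sketch's assumptions they overlap on both word sets.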
