Tuning Database-Friendly Random Projection Matrices for Improved Distance Preservation on Specific Data

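Random Projection is one of the most popular and successful dimensionality reduction algorithms for large volumes of data. However, given its stochastic nature, different initializations of the projection matrix can lead to very different levels of performance. This paper presents a guided random search algorithm to mitigate this problem. The proposed method uses a small number of training data samples to iteratively adjust a projection matrix, improving its performance on similarly distributed data. Experimental results show that projection matrices generated with the proposed method result in a better preservation of distances between data samples. Conveniently, this is achieved while preserving the database-friendliness of the projection matrix, as it remains sparse and comprised exclusively of integers after being tuned with our algorithm. Moreover, running the proposed algorithm on a consumer-grade CPU requires only a few seconds.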
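
As a rough illustration of the idea summarized above, the following Python sketch builds a sparse integer projection matrix and tunes it on a few training samples. It is not the authors' algorithm, whose details this record does not give. The initialization follows the well-known Achlioptas scheme (entries +1, 0, -1 with probabilities 1/6, 2/3, 1/6, scaled by sqrt(3/k)). The search step (swapping two entries within a row and keeping the swap only if it lowers the error), the error measure, and all function names are assumptions made for this example.

```python
import math
import random


def achlioptas_matrix(d, k, rng):
    """k x d matrix with entries +1, 0, -1 drawn with probabilities 1/6, 2/3, 1/6."""
    return [[rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Project x with the integer matrix R, then apply the sqrt(3/k) scale."""
    scale = math.sqrt(3.0 / len(R))
    return [scale * sum(r * v for r, v in zip(row, x)) for row in R]


def distortion(R, X):
    """Mean squared difference between pairwise squared distances
    before and after projection (an assumed error measure)."""
    Y = [project(R, x) for x in X]
    err, pairs = 0.0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d_orig = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            d_proj = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
            err += (d_proj - d_orig) ** 2
            pairs += 1
    return err / pairs


def tune(R, X, steps, rng):
    """Hypothetical guided random search: swap two entries in a random row and
    keep the swap only if the training distortion strictly decreases.
    Swapping keeps every entry an integer and leaves row sparsity unchanged."""
    R = [row[:] for row in R]  # work on a copy; the input matrix is untouched
    best = distortion(R, X)
    for _ in range(steps):
        row = rng.choice(R)
        a, b = rng.randrange(len(row)), rng.randrange(len(row))
        row[a], row[b] = row[b], row[a]
        candidate = distortion(R, X)
        if candidate < best:
            best = candidate
        else:
            row[a], row[b] = row[b], row[a]  # undo the swap
    return R, best


rng = random.Random(0)
# Toy training sample: 12 points in 30 dimensions, reduced to 8 dimensions.
X = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(12)]
R0 = achlioptas_matrix(30, 8, rng)
err0 = distortion(R0, X)
R1, err1 = tune(R0, X, 200, rng)
print("distortion before:", err0, "after:", err1)
```

Because a swap is kept only when it reduces the error on the training sample, the tuned matrix is never worse than the initial one on that sample. Whether the gain carries over to new data depends on how similarly that data is distributed, which is the condition the description itself states.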


Bibliographic Details
Main Authors: López-Sánchez, Daniel; de Bodt, Cyril; Lee, John A.; Arrieta, Angélica G.; Corchado, Juan M.
Other Authors: Massachusetts Institute of Technology. Media Laboratory
Format: Article
Language: English
Published: Springer US 2021
Online Access: https://hdl.handle.net/1721.1/133016
collection MIT
description Random Projection is one of the most popular and successful dimensionality reduction algorithms for large volumes of data. However, given its stochastic nature, different initializations of the projection matrix can lead to very different levels of performance. This paper presents a guided random search algorithm to mitigate this problem. The proposed method uses a small number of training data samples to iteratively adjust a projection matrix, improving its performance on similarly distributed data. Experimental results show that projection matrices generated with the proposed method result in a better preservation of distances between data samples. Conveniently, this is achieved while preserving the database-friendliness of the projection matrix, as it remains sparse and comprised exclusively of integers after being tuned with our algorithm. Moreover, running the proposed algorithm on a consumer-grade CPU requires only a few seconds.
first_indexed 2024-09-23T17:10:45Z
id mit-1721.1/133016
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T17:10:45Z
record_format dspace
spelling mit-1721.1/133016
2024-06-05T23:44:20Z
2021-10-18T13:53:52Z
2021-10-18T13:53:52Z
2021-07
2021-10-17T03:14:38Z
Article
http://purl.org/eprint/type/JournalArticle
1573-7497
0924-669X
https://hdl.handle.net/1721.1/133016
López-Sánchez, D., de Bodt, C., Lee, J.A. et al. Tuning Database-Friendly Random Projection Matrices for Improved Distance Preservation on Specific Data. Appl Intell (2021)
en
https://doi.org/10.1007/s10489-021-02626-6
Applied Intelligence
Creative Commons Attribution
https://creativecommons.org/licenses/by/4.0/
The Author(s)
application/pdf
Springer US