New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation

We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a n...

Bibliographic Details
Main Authors: Robert Gove, Lucas Cadalzo, Nicholas Leiby, Jedediah M. Singer, Alexander Zaitzeff
Format: Article
Language: English
Published: Elsevier 2022-06-01
Series: Visual Informatics
Subjects: Dimensionality reduction, Machine learning, t-SNE
Online Access: http://www.sciencedirect.com/science/article/pii/S2468502X22000201
_version_ 1828438673413111808
author Robert Gove
Lucas Cadalzo
Nicholas Leiby
Jedediah M. Singer
Alexander Zaitzeff
collection DOAJ
description We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for a given data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. On 68 data sets, we evaluate our neural network-derived and empirically optimum hyperparameters against several other t-SNE hyperparameter guidelines from the literature. The hyperparameters predicted by our neural network yield embeddings with accuracy similar to that of the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines and yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters, we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.
first_indexed 2024-12-10T20:08:32Z
format Article
id doaj.art-594babfbfc27491ca8f51a4869de6823
institution Directory Open Access Journal
issn 2468-502X
language English
last_indexed 2024-12-10T20:08:32Z
publishDate 2022-06-01
publisher Elsevier
record_format Article
series Visual Informatics
spelling doaj.art-594babfbfc27491ca8f51a4869de6823; 2022-12-22T01:35:20Z; eng; Elsevier; Visual Informatics; 2468-502X; 2022-06-01; volume 6, issue 2, pages 87-97; New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation; Robert Gove (corresponding author), Lucas Cadalzo, Nicholas Leiby, Jedediah M. Singer, Alexander Zaitzeff (all Two Six Technologies, USA); http://www.sciencedirect.com/science/article/pii/S2468502X22000201; Dimensionality reduction; Machine learning; t-SNE
title New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation
topic Dimensionality reduction
Machine learning
t-SNE
url http://www.sciencedirect.com/science/article/pii/S2468502X22000201
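The description above mentions two quantities derived from a hyperparameter grid search: a single "empirically optimum" setting shared across data sets, and the headroom an ideal per-data-set selection method would have over it. The following is a minimal sketch of that bookkeeping only, not the authors' procedure. The score table, the data set names, and the choice of (perplexity, learning rate) as the hyperparameter tuple are all hypothetical placeholders; the record does not state which hyperparameters or accuracy metric the paper uses.

```python
from statistics import mean

# Hypothetical grid-search results: accuracy per data set and per setting.
# Keys are (perplexity, learning_rate) tuples; every number is made up
# purely for illustration and is NOT taken from the paper.
toy_scores = {
    "dataset_a": {(5, 100): 0.80, (30, 200): 0.90, (50, 500): 0.85},
    "dataset_b": {(5, 100): 0.70, (30, 200): 0.75, (50, 500): 0.80},
    "dataset_c": {(5, 100): 0.95, (30, 200): 0.90, (50, 500): 0.85},
}


def best_shared_setting(scores):
    """Return the setting with the highest mean accuracy across all data sets."""
    # Only consider settings that were evaluated on every data set.
    common = set.intersection(*(set(per_set) for per_set in scores.values()))
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(common), key=lambda h: mean(scores[d][h] for d in scores))


def oracle_headroom(scores, shared):
    """Mean accuracy gap between each data set's own best setting and `shared`."""
    return mean(max(per_set.values()) - per_set[shared] for per_set in scores.values())


shared = best_shared_setting(toy_scores)
print("shared setting:", shared)
print("oracle headroom:", oracle_headroom(toy_scores, shared))
```

In this framing, a learned predictor such as the neural network described in the abstract would aim to close part of the gap between the shared setting and the per-data-set oracle.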