Partition Maps

Tree ensembles, notably Random Forests, have been shown to deliver very accurate predictions on a wide range of regression and classification tasks. A common, yet maybe unjustified, criticism is that they operate as black boxes and provide very little understanding of the data beyond accurate predic...

Bibliographic Details
Main Author: Meinshausen, N
Format: Journal article
Language: English
Published: 2011
collection OXFORD
description Tree ensembles, notably Random Forests, have been shown to deliver very accurate predictions on a wide range of regression and classification tasks. A common, yet maybe unjustified, criticism is that they operate as black boxes and provide very little understanding of the data beyond accurate predictions. We focus on multiclass classification and show that Homogeneity Analysis, a technique mostly used in psychometrics, can be leveraged to provide interesting and meaningful visualizations of tree ensemble predictions. Observations and nodes of the tree ensemble are placed in a bipartite graph, connecting each observation to all nodes it belongs to. The graph layout is then chosen by minimizing the sum of the squared edge lengths under certain constraints. We propose a variation of Homogeneity Analysis, called Partition Maps, and analyze advantages and shortcomings compared with multidimensional scaling of proximity matrices. Partition Maps have as potential advantages that (a) the influence of the original nodes and variables is visible in the low-dimensional embedding, similar to biplots, (b) new test observations can be added very easily, and (c) the test error is very similar to the original tree ensemble when using simple nearest-neighbor classification in the two-dimensional Partition Map embedding. Class boundaries, as found by the original tree ensemble algorithm, are thus reflected accurately in Partition Maps. Subgroups and outliers can furthermore be identified in the low-dimensional visualizations, allowing meaningful exploratory analysis of tree ensembles. An R-package partitionMap is provided as a supplementary material. © 2011 American Statistical Association.
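The layout step mentioned in the description (observations and nodes joined in a bipartite graph, positions chosen to minimize the sum of squared edge lengths under constraints) can be illustrated with a minimal sketch of classical Homogeneity Analysis. This is not the Partition Maps variation proposed in the article and not the partitionMap R package. The function name `homogeneity_layout`, the toy indicator matrix, and the normalization (observation coordinates centered and scaled so that X^T X = n I) are assumptions made for illustration. The toy data also assumes each observation falls into exactly one leaf per tree and that no node is empty.

```python
import numpy as np

def homogeneity_layout(G, dim=2, n_iter=200, seed=0):
    """Alternating-averaging sketch of classical homogeneity analysis.

    G is an (n_obs, n_nodes) 0/1 matrix with G[i, j] = 1 when observation i
    belongs to node j. Returns observation positions X and node positions Y.
    Assumes every row and every column of G has at least one nonzero entry.
    """
    G = np.asarray(G, dtype=float)
    n_obs = G.shape[0]
    row_deg = G.sum(axis=1, keepdims=True)   # nodes per observation
    col_deg = G.sum(axis=0)[:, None]         # observations per node
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, dim))

    def normalize(X):
        # Assumed constraint: centered columns with X^T X = n_obs * I.
        X = X - X.mean(axis=0)
        Q, _ = np.linalg.qr(X)
        return np.sqrt(n_obs) * Q

    for _ in range(n_iter):
        X = normalize(X)
        Y = (G.T @ X) / col_deg   # each node moves to the mean of its observations
        X = (G @ Y) / row_deg     # each observation moves to the mean of its nodes
    X = normalize(X)
    Y = (G.T @ X) / col_deg
    return X, Y

# Hypothetical toy ensemble: 6 observations, 2 trees.
# Tree 1 leaves: {0,1,2}, {3,4,5}; tree 2 leaves: {0,1}, {2,3}, {4,5}.
G = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
])
X, Y = homogeneity_layout(G)
print(np.round(X, 3))  # 2-D coordinates of the observations
print(np.round(Y, 3))  # 2-D coordinates of the nodes
```

Observations that share all their nodes (here 0 and 1, and 4 and 5) end up at the same position, which is the sense in which such a layout reflects the partitions induced by the trees.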
first_indexed 2024-03-07T04:40:24Z
id oxford-uuid:d16d7015-482a-4194-8714-1c2a83e18abd
institution University of Oxford
last_indexed 2024-03-07T04:40:24Z
spelling oxford-uuid:d16d7015-482a-4194-8714-1c2a83e18abd 2022-03-27T07:56:53Z Partition Maps Journal article http://purl.org/coar/resource_type/c_dcae04bc uuid:d16d7015-482a-4194-8714-1c2a83e18abd English Symplectic Elements at Oxford 2011 Meinshausen, N