Partition Maps
Main Author: | Meinshausen, N |
---|---|
Format: | Journal article |
Language: | English |
Published: | 2011 |
_version_ | 1797096338910871552 |
---|---|
author | Meinshausen, N |
collection | OXFORD |
description | Tree ensembles, notably Random Forests, have been shown to deliver very accurate predictions on a wide range of regression and classification tasks. A common, though perhaps unjustified, criticism is that they operate as black boxes and provide very little understanding of the data beyond accurate predictions. We focus on multiclass classification and show that Homogeneity Analysis, a technique used mostly in psychometrics, can be leveraged to provide interesting and meaningful visualizations of tree ensemble predictions. Observations and nodes of the tree ensemble are placed in a bipartite graph that connects each observation to every node it belongs to. The graph layout is then chosen by minimizing the sum of the squared edge lengths under certain constraints. We propose a variation of Homogeneity Analysis, called Partition Maps, and analyze its advantages and shortcomings compared with multidimensional scaling of proximity matrices. Potential advantages of Partition Maps are that (a) the influence of the original nodes and variables is visible in the low-dimensional embedding, similar to biplots, (b) new test observations can be added very easily, and (c) the test error of simple nearest-neighbor classification in the two-dimensional Partition Map embedding is very similar to that of the original tree ensemble. Class boundaries, as found by the original tree ensemble algorithm, are thus reflected accurately in Partition Maps. Subgroups and outliers can furthermore be identified in the low-dimensional visualizations, allowing meaningful exploratory analysis of tree ensembles. An R package, partitionMap, is provided as supplementary material. © 2011 American Statistical Association. |
first_indexed | 2024-03-07T04:40:24Z |
format | Journal article |
id | oxford-uuid:d16d7015-482a-4194-8714-1c2a83e18abd |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T04:40:24Z |
publishDate | 2011 |
record_format | dspace |
title | Partition Maps |
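
The layout step summarized in the description (observations and tree nodes joined in a bipartite graph, positions chosen to minimize the sum of squared edge lengths) can be sketched in Python with NumPy. This is a minimal sketch of classical Homogeneity Analysis, not the Partition Maps variant or the partitionMap R package. It assumes the constraints are that the observation coordinates `X` are centred and satisfy `X.T @ X = n * I`, and the function names `partition_embedding`, `embed_new` and `nearest_neighbour_class` are hypothetical. For fixed `X`, each node is optimally placed at the centroid of its observations. Substituting this back turns the loss into a graph-Laplacian quadratic form, whose smallest non-trivial eigenvectors give `X`. A new observation is placed at the centroid of the nodes it falls into (the minimizer of its own squared edge lengths with the nodes held fixed) and is classified by its nearest training neighbour in the embedding.

```python
import numpy as np


def partition_embedding(G, dim=2):
    """Embed observations and nodes of a bipartite membership graph.

    G[i, j] = 1 if observation i falls into node j (any tree, any depth).
    Minimises sum_ij G[i, j] * ||X[i] - Y[j]||**2 under the assumed
    constraints mean(X) = 0 and X.T @ X = n * I.
    Assumes every node contains at least one observation.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    node_size = G.sum(axis=0)    # observations per node
    obs_degree = G.sum(axis=1)   # nodes per observation
    # With optimal node centroids Y = D^-1 G^T X, the loss equals
    # trace(X^T L X) for the graph Laplacian L below.
    L = np.diag(obs_degree) - (G / node_size) @ G.T
    # Push the trivial constant solution (eigenvalue 0) past all other
    # eigenvalues so the selected coordinates are exactly centred.
    shift = np.trace(L) + 1.0
    M = L + shift * np.ones((n, n)) / n
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    X = vecs[:, :dim] * np.sqrt(n)
    Y = (G.T @ X) / node_size[:, None]
    return X, Y


def embed_new(g, Y):
    """Place a new observation at the centroid of the nodes it falls into."""
    g = np.asarray(g, dtype=float)
    return (g @ Y) / g.sum()


def nearest_neighbour_class(x_new, X, labels):
    """1-nearest-neighbour class label in the embedding."""
    d2 = ((X - x_new) ** 2).sum(axis=1)
    return labels[int(np.argmin(d2))]


# Hypothetical toy ensemble: two depth-1 trees on six observations, both
# splitting class 0 from class 1.
# Columns: [root1, left1, right1, root2, left2, right2].
G = np.array([[1, 1, 0, 1, 1, 0]] * 3 + [[1, 0, 1, 1, 0, 1]] * 3)
labels = np.array([0, 0, 0, 1, 1, 1])
X, Y = partition_embedding(G, dim=2)
x_new = embed_new(G[0], Y)  # a test point following observation 0's path
print(nearest_neighbour_class(x_new, X, labels))
```

With scikit-learn, `RandomForestClassifier.decision_path(X)` returns a sparse observation-by-node indicator of this kind (covering the internal nodes of every tree), which can be passed in after `.toarray()`. The dense eigendecomposition costs O(n³), so this sketch only suits small samples.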