GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models.
Main Authors: | Rory Mitchell, Eibe Frank, Geoffrey Holmes |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2022-04-01 |
Series: | PeerJ Computer Science |
Subjects: | Shapley values; GPU computing; Interpretability |
Online Access: | https://peerj.com/articles/cs-880.pdf |
_version_ | 1811313240409374720 |
author | Rory Mitchell Eibe Frank Geoffrey Holmes |
author_facet | Rory Mitchell Eibe Frank Geoffrey Holmes |
author_sort | Rory Mitchell |
collection | DOAJ |
description | SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores. |
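The description above outlines a three-stage pipeline: preprocess each tree into variable-sized sub-problems, solve a bin packing problem, and map the packed sub-problems onto SIMT warps. As a hedged illustration of the bin-packing stage only, the sketch below packs hypothetical root-to-leaf path lengths into warp-sized bins of 32 slots using first-fit decreasing; the function name, the example path lengths, and the choice of heuristic are assumptions for illustration and are not taken from the paper, which does not specify its packing strategy in this record.

```python
WARP_SIZE = 32  # threads per SIMT warp on NVIDIA GPUs

def first_fit_decreasing(item_sizes, capacity=WARP_SIZE):
    """Pack items (each of size <= capacity) into as few bins as possible.

    First-fit decreasing: sort items largest-first, then place each item
    into the first bin with enough remaining capacity, opening a new bin
    only when none fits. A generic heuristic, used here as a stand-in for
    whatever packing GPUTreeShap actually performs.
    """
    bins = []  # each bin: [remaining_capacity, [item_sizes...]]
    for size in sorted(item_sizes, reverse=True):
        for b in bins:
            if b[0] >= size:
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([capacity - size, [size]])
    return [items for _, items in bins]

# Hypothetical path lengths for ten decision-tree leaves
paths = [5, 9, 3, 12, 7, 7, 4, 6, 11, 2]
packed = first_fit_decreasing(paths)
```

Each resulting bin then corresponds to one warp's worth of work, so threads within a warp stay busy on sub-problems of similar total size rather than idling behind one long path.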
first_indexed | 2024-04-13T10:50:16Z |
format | Article |
id | doaj.art-91dba1df65cb4e54b4de0685e14215a9 |
institution | Directory Open Access Journal |
issn | 2376-5992 |
language | English |
last_indexed | 2024-04-13T10:50:16Z |
publishDate | 2022-04-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj.art-91dba1df65cb4e54b4de0685e14215a9 | 2022-12-22T02:49:41Z | eng | PeerJ Inc. | PeerJ Computer Science | 2376-5992 | 2022-04-01 | vol. 8, e880 | 10.7717/peerj-cs.880 | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles | Rory Mitchell (Nvidia, Santa Clara, United States); Eibe Frank (University of Waikato, Hamilton, New Zealand); Geoffrey Holmes (University of Waikato, Hamilton, New Zealand) | https://peerj.com/articles/cs-880.pdf | Shapley values; GPU computing; Interpretability |
spellingShingle | Rory Mitchell Eibe Frank Geoffrey Holmes GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles PeerJ Computer Science Shapley values GPU computing Interpretability |
title | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles |
title_full | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles |
title_fullStr | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles |
title_full_unstemmed | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles |
title_short | GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles |
title_sort | gputreeshap massively parallel exact calculation of shap scores for tree ensembles |
topic | Shapley values GPU computing Interpretability |
url | https://peerj.com/articles/cs-880.pdf |
work_keys_str_mv | AT rorymitchell gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles AT eibefrank gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles AT geoffreyholmes gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles |