Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees...

Full description

Bibliographic Details
Main Authors: Afek Ilay Adler, Amichai Painsky
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Entropy
Subjects:
Online Access: https://www.mdpi.com/1099-4300/24/5/687
_version_ 1797500058823819264
author Afek Ilay Adler
Amichai Painsky
author_facet Afek Ilay Adler
Amichai Painsky
author_sort Afek Ilay Adler
collection DOAJ
description Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We demonstrate that although these implementations achieve highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.
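The cardinality bias the abstract describes can be reproduced in a few lines. The sketch below is illustrative only and is not the authors' framework: it uses scikit-learn's GradientBoostingRegressor as a stand-in for a standard (biased) implementation, and the feature names and data-generating process are assumptions. A pure-noise, label-encoded categorical with many distinct values competes with a genuinely informative numeric feature for impurity-based feature importance, because its many candidate split points let the trees overfit it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Pure-noise categorical with 500 levels, label-encoded as integers.
high_card_noise = rng.integers(0, 500, size=n)
# Weakly informative numeric feature that actually drives the target.
informative = rng.normal(size=n)
y = 0.3 * informative + rng.normal(size=n)

X = np.column_stack([high_card_noise, informative]).astype(float)
model = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0)
model.fit(X, y)

# Impurity-based FI typically assigns non-trivial weight to the noise
# column despite it carrying no signal -- the bias the paper targets.
fi = model.feature_importances_
print({"high_card_noise": fi[0], "informative": fi[1]})
```

The paper's proposed remedy replaces such biased split selection with cross-validated base learners; a model-agnostic alternative available off the shelf is `sklearn.inspection.permutation_importance`, which largely avoids the split-count artifact.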
first_indexed 2024-03-10T03:56:23Z
format Article
id doaj.art-b26417123f93416ea45431cdcb803ddb
institution Directory Open Access Journal
issn 1099-4300
language English
last_indexed 2024-03-10T03:56:23Z
publishDate 2022-05-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj.art-b26417123f93416ea45431cdcb803ddb
lastUpdated 2023-11-23T10:55:42Z
language eng
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2022-05-01
volume 24
issue 5
article 687
doi 10.3390/e24050687
title Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
authors Afek Ilay Adler; Amichai Painsky
affiliation The Industrial Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel (both authors)
url https://www.mdpi.com/1099-4300/24/5/687
keywords gradient boosting; feature importance; tree-based methods; classification and regression trees
spellingShingle Afek Ilay Adler
Amichai Painsky
Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
Entropy
gradient boosting
feature importance
tree-based methods
classification and regression trees
title Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
title_full Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
title_fullStr Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
title_full_unstemmed Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
title_short Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
title_sort feature importance in gradient boosting trees with cross validation feature selection
topic gradient boosting
feature importance
tree-based methods
classification and regression trees
url https://www.mdpi.com/1099-4300/24/5/687
work_keys_str_mv AT afekilayadler featureimportanceingradientboostingtreeswithcrossvalidationfeatureselection
AT amichaipainsky featureimportanceingradientboostingtreeswithcrossvalidationfeatureselection