Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We demonstrate that although these implementations achieve highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.
Main Authors: | Afek Ilay Adler, Amichai Painsky |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-05-01 |
Series: | Entropy |
Subjects: | gradient boosting; feature importance; tree-based methods; classification and regression trees |
Online Access: | https://www.mdpi.com/1099-4300/24/5/687 |
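The cardinality bias described in the abstract is easy to observe directly. The sketch below (hypothetical illustration, not the authors' code) trains scikit-learn's `GradientBoostingClassifier` on one truly informative feature plus two pure-noise features: one with 500 distinct values and one binary. The impurity-based importance of the high-cardinality noise feature comes out inflated relative to the binary noise feature, even though neither carries any signal.

```python
# Hypothetical sketch: reproduce the cardinality bias in impurity-based
# feature importance with a standard GBM implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000

informative = rng.normal(size=n)           # truly predictive feature
noise_cat = rng.integers(0, 500, size=n)   # 500-value pure-noise feature
binary_noise = rng.integers(0, 2, size=n)  # binary pure-noise feature

# The label depends only on the informative feature.
y = (informative + 0.5 * rng.normal(size=n) > 0).astype(int)

X = np.column_stack([informative, noise_cat, binary_noise])
model = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=0
).fit(X, y)

for name, imp in zip(["informative", "noise_cat", "binary_noise"],
                     model.feature_importances_):
    print(f"{name:>13}: {imp:.3f}")
```

Because the 500-value feature offers many candidate split points, the trees split on it far more often than on the binary noise feature, which is the bias in FI measures that the paper's cross-validated base learners are designed to remove.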
citation | Afek Ilay Adler, Amichai Painsky. "Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection." Entropy (ISSN 1099-4300) 24, no. 5 (2022): 687. https://doi.org/10.3390/e24050687 |
affiliation | The Industrial Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel (both authors) |
keywords | gradient boosting; feature importance; tree-based methods; classification and regression trees |