Research and application of XGBoost in imbalanced data

Bibliographic Details
Main Authors: Ping Zhang, Yiqiao Jia, Youlin Shang
Format: Article
Language: English
Published: Hindawi - SAGE Publishing 2022-06-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1177/15501329221106935
collection DOAJ
description As a new and efficient ensemble learning algorithm, XGBoost has been widely applied for its many advantages, but its classification performance on imbalanced data is often unsatisfactory. To address this problem, an attempt is made to optimize the regularization term of XGBoost, and a classification algorithm based on mixed sampling and ensemble learning is proposed. The main idea is to combine SVM-SMOTE over-sampling and EasyEnsemble under-sampling for data processing, and then to obtain the final XGBoost-based model through training and ensembling. The optimal parameters are searched and tuned automatically by a Bayesian optimization algorithm to carry out classification prediction. In the experiments, the G-mean and the area under the curve (AUC) are used as evaluation indicators to compare the classification performance of different sampling methods and algorithm models. The experimental results on the public data set support the feasibility and effectiveness of the proposed algorithm.
first_indexed 2024-03-12T07:32:15Z
id doaj.art-d237e5f4e2dc4342b637598e53bbfe7a
institution Directory Open Access Journal
issn 1550-1477
last_indexed 2024-03-12T07:32:15Z
spelling doaj.art-d237e5f4e2dc4342b637598e53bbfe7a | 2023-09-02T21:44:24Z | eng | Hindawi - SAGE Publishing | International Journal of Distributed Sensor Networks | 1550-1477 | 2022-06-01 | 18 | 10.1177/15501329221106935 | Research and application of XGBoost in imbalanced data | Ping Zhang; Yiqiao Jia; Youlin Shang | https://doi.org/10.1177/15501329221106935
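
The description names G-mean and AUC as the evaluation indicators. The following is a minimal sketch of how these two metrics are commonly computed for a binary problem, using only the Python standard library. It is not taken from the article: the function names `g_mean` and `auc` and the toy labels are assumptions made for illustration, and the minority class is assumed to be labeled 1.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical imbalanced toy labels: 8 majority (0) and 2 minority (1) samples.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("G-mean:", g_mean(y_true, y_pred))
```

On imbalanced data, accuracy can stay high while half of the minority class is missed. G-mean drops in that case because it multiplies the per-class recalls, which is one reason it is preferred for this kind of comparison.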