Research and application of XGBoost in imbalanced data
Main Authors: | Ping Zhang, Yiqiao Jia, Youlin Shang |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi - SAGE Publishing, 2022-06-01 |
Series: | International Journal of Distributed Sensor Networks |
Online Access: | https://doi.org/10.1177/15501329221106935 |
Field | Value |
---|---|
author | Ping Zhang, Yiqiao Jia, Youlin Shang |
collection | DOAJ |
description | As a new and efficient ensemble learning algorithm, XGBoost has been widely applied thanks to its many advantages, but its classification performance on imbalanced data is often unsatisfactory. To address this problem, the authors attempt to optimize the regularization term of XGBoost and propose a classification algorithm based on mixed sampling and ensemble learning. The main idea is to preprocess the data by combining SVM-SMOTE over-sampling with EasyEnsemble under-sampling, and then to obtain the final XGBoost-based model through training and ensembling. The model parameters are searched and tuned automatically with Bayesian optimization to produce the classification predictions. In the experiments, G-mean and the area under the curve (AUC) are used as evaluation metrics to compare the classification performance of different sampling methods and models. Experimental results on the public data set verify the feasibility and effectiveness of the proposed algorithm. |
first_indexed | 2024-03-12T07:32:15Z |
format | Article |
id | doaj.art-d237e5f4e2dc4342b637598e53bbfe7a |
institution | Directory of Open Access Journals |
issn | 1550-1477 |
language | English |
last_indexed | 2024-03-12T07:32:15Z |
publishDate | 2022-06-01 |
publisher | Hindawi - SAGE Publishing |
record_format | Article |
series | International Journal of Distributed Sensor Networks |
title | Research and application of XGBoost in imbalanced data |
url | https://doi.org/10.1177/15501329221106935 |
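
The abstract describes an EasyEnsemble under-sampling step followed by training and ensembling XGBoost-based models. The sketch below illustrates only that under-sampling and combination idea, under stated assumptions: the function names `easy_ensemble_subsets` and `ensemble_predict_proba` are hypothetical, the members are combined by simple probability averaging (soft voting, an assumed combiner), and SVM-SMOTE, XGBoost training, and Bayesian optimization are omitted because the abstract does not say how they are interleaved with this step.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A labelled sample: (feature vector, class label).
Sample = Tuple[Sequence[float], int]


def easy_ensemble_subsets(
    data: Sequence[Sample],
    n_subsets: int,
    minority_label: int = 1,
    seed: int = 0,
) -> List[List[Sample]]:
    """Build balanced training subsets in the spirit of EasyEnsemble.

    Every subset contains all minority samples plus an independent random
    draw (without replacement) of the same number of majority samples.
    """
    rng = random.Random(seed)
    minority = [s for s in data if s[1] == minority_label]
    majority = [s for s in data if s[1] != minority_label]
    k = min(len(minority), len(majority))
    subsets: List[List[Sample]] = []
    for _ in range(n_subsets):
        subset = minority + rng.sample(majority, k)
        rng.shuffle(subset)  # mix classes so the subset is not class-sorted
        subsets.append(subset)
    return subsets


def ensemble_predict_proba(
    models: Sequence[Callable[[Sequence[float]], float]],
    x: Sequence[float],
) -> float:
    """Soft voting: average the positive-class probabilities of the members."""
    if not models:
        raise ValueError("need at least one model")
    return sum(m(x) for m in models) / len(models)
```

In the paper's setting each balanced subset would train one XGBoost-based member; here the members are plain callables returning a probability, so the sketch runs without third-party packages. Drawing each subset independently lets the ensemble see most of the majority class overall while every member trains on balanced data.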
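
The abstract names G-mean and AUC as the evaluation metrics. As a minimal, dependency-free sketch (the names `g_mean` and `roc_auc` are illustrative, not taken from the paper or any library, and label 1 is assumed to be the positive minority class), G-mean is the square root of sensitivity times specificity, and AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half.

```python
import math
from typing import Sequence


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (TPR) and specificity (TNR); positive label is 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p != 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:  # O(P * N) pairwise comparison, fine for illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For a set with 2 positives and 8 negatives, a classifier that labels every sample negative reaches 80% accuracy but a G-mean of 0, which illustrates why imbalanced-data studies report G-mean and AUC rather than accuracy. A rank-based AUC computation scales better than the pairwise loop for large data sets.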