A Credit Risk Model with Small Sample Data Based on G-XGBoost

Bibliographic Details
Main Authors: Jian Li, Haibin Liu, Zhijun Yang, Lei Han
Affiliations: Beijing University of Technology (Jian Li, Haibin Liu); Middlesex University (Zhijun Yang); China Aerospace Academy of Systems Science and Engineering (Lei Han)
Format: Article
Language: English
Published: Taylor & Francis Group, 2021-12-01
Series: Applied Artificial Intelligence, Vol. 35, No. 15, pp. 1550-1566
ISSN: 0883-9514, 1087-6545
Online Access: http://dx.doi.org/10.1080/08839514.2021.1987707

Abstract
Existing credit risk models, such as scorecards and Extreme Gradient Boosting (XGBoost), usually require a sufficiently large modeling sample. When the sample is small, the trained models may neither achieve the expected accuracy nor distinguish well between risk levels. At the same time, additional data can be difficult to acquire and may be restricted by data protection regulations. To address this dilemma, this paper applies Generative Adversarial Nets (GAN) to building a credit risk model for small and micro enterprises (SMEs) and proposes a novel training method based on XGBoost, called G-XGBoost. A few batches of real data are selected to train the GAN. Once the adversarial training reaches Nash equilibrium, the generator network is used to produce pseudo data with the same distribution as the real data. The pseudo data are then combined with the real data to form an amplified sample set, which is used to train XGBoost for credit risk prediction. A comparison with the XGBoost model demonstrates the feasibility and advantages of G-XGBoost.
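The abstract outlines a three-stage pipeline: train a GAN on a few batches of real data, sample pseudo data from the generator, and append that pseudo data to the real data before fitting XGBoost. The sketch below illustrates only the amplification stage. It is not the authors' implementation. Several details are assumptions: training one GAN per class label, the function names `train_toy_gan` and `amplify` (both hypothetical), and the deliberately tiny GAN itself. That GAN uses a per-feature affine generator and a logistic discriminator on linear and squared features. It is trained with alternating, gradient-clipped updates for a fixed number of steps instead of a real equilibrium test. The sketch needs only NumPy.

```python
import numpy as np


def _sigmoid(t):
    # tanh form of the logistic function; avoids overflow warnings for large |t|
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def train_toy_gan(real, steps=1000, lr=0.05, batch=64, seed=0):
    """Fit a deliberately tiny GAN to `real` (n_samples x n_features).

    Generator:     G(z) = m + s * z, z ~ N(0, I)   (per-feature affine map)
    Discriminator: D(x) = sigmoid(x @ w1 + (x**2) @ w2 + c)
    Returns a function that draws pseudo rows in the original feature scale.
    Assumption: a fixed step count stands in for checking Nash equilibrium.
    """
    rng = np.random.default_rng(seed)
    mu, sd = real.mean(axis=0), real.std(axis=0) + 1e-8
    xr_all = (real - mu) / sd                      # standardized real data
    d = real.shape[1]
    m, s = rng.normal(size=d), np.full(d, 0.1)     # random offsets, narrow spread
    w1, w2, c = np.zeros(d), np.zeros(d), 0.0      # discriminator parameters

    for _ in range(steps):
        # Discriminator step: minimize -log D(real) - log(1 - D(fake))
        xr = xr_all[rng.integers(0, len(xr_all), batch)]
        xf = m + s * rng.standard_normal((batch, d))
        dr = _sigmoid(xr @ w1 + (xr**2) @ w2 + c)
        df = _sigmoid(xf @ w1 + (xf**2) @ w2 + c)
        gr, gf = -(1.0 - dr), df                   # dLoss/dlogit per sample
        w1 -= lr * np.clip((gr @ xr + gf @ xf) / batch, -1, 1)
        w2 -= lr * np.clip((gr @ xr**2 + gf @ xf**2) / batch, -1, 1)
        c -= lr * float(np.clip((gr.sum() + gf.sum()) / batch, -1, 1))

        # Generator step (non-saturating loss): minimize -log D(G(z))
        z = rng.standard_normal((batch, d))
        xf = m + s * z
        df = _sigmoid(xf @ w1 + (xf**2) @ w2 + c)
        g_x = -(1.0 - df)[:, None] * (w1 + 2.0 * w2 * xf)  # dLoss/dx_fake
        m -= lr * np.clip(g_x.mean(axis=0), -1, 1)
        s -= lr * np.clip((g_x * z).mean(axis=0), -1, 1)

    def sample(n, sample_rng):
        z = sample_rng.standard_normal((n, d))
        return mu + sd * (m + s * z)               # map back to original scale

    return sample


def amplify(X, y, n_pseudo_per_class, seed=0):
    """Hypothetical amplification step: one toy GAN per class label,
    pseudo rows appended after the untouched real rows."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for label in np.unique(y):
        sample = train_toy_gan(X[y == label], seed=seed)
        X_parts.append(sample(n_pseudo_per_class, rng))
        y_parts.append(np.full(n_pseudo_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    # Synthetic stand-in for a small SME dataset: 30 non-default, 10 default rows
    y = np.repeat([0, 1], [30, 10])
    X = data_rng.normal(size=(40, 3)) + y[:, None]
    X_aug, y_aug = amplify(X, y, n_pseudo_per_class=100)
    print(X_aug.shape, y_aug.shape)  # (240, 3) (240,)
```

The amplified arrays could then be passed to a gradient-boosted classifier, for example `xgboost.XGBClassifier().fit(X_aug, y_aug)`. Because this toy generator only models independent Gaussian marginals, a practical version would swap in neural-network generator and discriminator networks and would evaluate the final classifier on held-out real data only.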