A Credit Risk Model with Small Sample Data Based on G-XGBoost
Existing credit risk models, e.g., Scoring Card and Extreme Gradient Boosting (XGBoost), usually require a sufficiently large modeling sample. A small sample size can degrade the trained models, which may neither achieve the expected accuracy nor disti...
Main Authors: | Jian Li, Haibin Liu, Zhijun Yang, Lei Han |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2021-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2021.1987707 |
author | Jian Li Haibin Liu Zhijun Yang Lei Han |
collection | DOAJ |
description | Existing credit risk models, e.g., Scoring Card and Extreme Gradient Boosting (XGBoost), usually require a sufficiently large modeling sample. A small sample size can degrade the trained models, which may neither achieve the expected accuracy nor distinguish risks well. At the same time, data acquisition can be difficult and restricted by data protection regulations. To address this dilemma, this paper applies Generative Adversarial Nets (GAN) to the construction of a credit risk model for small and micro enterprises (SMEs) and proposes a novel training method, G-XGBoost, based on the XGBoost model. A few batches of real data are selected to train the GAN. When the generative network reaches Nash equilibrium, it is used to generate pseudo data with the same distribution as the real data. The pseudo data are then combined with the real data to form an amplified sample set, which is used to train XGBoost for credit risk prediction. The feasibility and advantages of the G-XGBoost model are demonstrated by comparison with the XGBoost model. |
first_indexed | 2024-03-12T00:35:42Z |
format | Article |
id | doaj.art-07c55606468c49728e90b64451ad7d62 |
institution | Directory of Open Access Journals |
issn | 0883-9514 1087-6545 |
language | English |
last_indexed | 2024-03-12T00:35:42Z |
publishDate | 2021-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Applied Artificial Intelligence |
spelling | doaj.art-07c55606468c49728e90b64451ad7d62; 2023-09-15T09:33:59Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; ISSN 0883-9514, 1087-6545; 2021-12-01; vol. 35, no. 15, pp. 1550-1566; doi:10.1080/08839514.2021.1987707; A Credit Risk Model with Small Sample Data Based on G-XGBoost; Jian Li (Beijing University of Technology), Haibin Liu (Beijing University of Technology), Zhijun Yang (Middlesex University), Lei Han (China Aerospace Academy of Systems Science and Engineering) |
title | A Credit Risk Model with Small Sample Data Based on G-XGBoost |
url | http://dx.doi.org/10.1080/08839514.2021.1987707 |
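
The sample-amplification step summarized in the description field can be sketched roughly as follows. This is only an illustration of the data flow (fit a generator on a small real sample, draw pseudo data, merge it with the real data), not the authors' implementation. In particular, a per-class Gaussian sampler is used here as a simple stand-in for a trained GAN generator, and all function names, the toy data, and the sample sizes are hypothetical.

```python
import numpy as np


def fit_stand_in_generator(X_real, y_real):
    """Fit one diagonal Gaussian per class.

    ASSUMPTION: this replaces the trained GAN generator from the abstract
    with a much simpler sampler, purely to show the data flow.
    """
    params = {}
    for label in np.unique(y_real):
        rows = X_real[y_real == label]
        # Per-feature mean and std; the small constant avoids a zero std.
        params[int(label)] = (rows.mean(axis=0), rows.std(axis=0) + 1e-6)
    return params


def generate_pseudo(params, n_per_class, rng):
    """Draw n_per_class pseudo samples for every class seen in the real data."""
    X_parts, y_parts = [], []
    for label, (mean, std) in params.items():
        X_parts.append(rng.normal(mean, std, size=(n_per_class, mean.shape[0])))
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def amplify(X_real, y_real, n_per_class, seed=0):
    """Combine real and pseudo data into one amplified sample set."""
    rng = np.random.default_rng(seed)
    params = fit_stand_in_generator(X_real, y_real)
    X_fake, y_fake = generate_pseudo(params, n_per_class, rng)
    return np.vstack([X_real, X_fake]), np.concatenate([y_real, y_fake])


# Hypothetical toy data: 40 "real" samples with 5 features and a binary label.
toy_rng = np.random.default_rng(42)
X_real = toy_rng.normal(size=(40, 5))
y_real = (X_real[:, 0] > 0).astype(int)

X_amp, y_amp = amplify(X_real, y_real, n_per_class=100)
print(X_amp.shape, y_amp.shape)
```

The amplified arrays could then be passed to a gradient-boosting classifier, for example `xgboost.XGBClassifier().fit(X_amp, y_amp)`, and compared against the same classifier fitted on `X_real, y_real` alone. Evaluation should use held-out real data only, so that pseudo samples do not leak into the test set.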