Alternative credit scoring for the un(der)banked
In this thesis, I address credit scoring for the un- and underbanked using machine learning and, most notably, social graph theory, and test the corresponding models on a unique set of anonymized alternative data from an Egypt-based digital lending partner (Company X). My work...
Main Author: | Elrakabawy, S |
---|---|
Other Authors: | Hale, S |
Format: | Thesis |
Language: | English |
Published: | 2024 |
Subjects: | Credit scoring systems; Machine learning |
_version_ | 1824458880380305408 |
---|---|
author | Elrakabawy, S |
author2 | Hale, S |
author_facet | Hale, S; Elrakabawy, S |
author_sort | Elrakabawy, S |
collection | OXFORD |
description | In this thesis, I address credit scoring for the un- and underbanked using machine learning and, most notably, social graph theory, and test the corresponding models on a unique set of anonymized alternative data from an Egypt-based digital lending partner (Company X). My work contributes socially by improving access to finance for the un- and underbanked, and academically by filling gaps in research. A comprehensive empirical comparative demographic analysis of Company X's loan applicant base highlights social aspects (such as gender inequality) and emphasizes absent or limited access to finance. To address this problem, I use machine learning for algorithmic credit risk prediction and conduct a detailed empirical correlation study of features versus dependent variables. I show that machine learning estimates credit risk at substantially above-chance levels, and I reduce the feature set dimensionality (i.e. the number of credit request steps) by 96 percent at near-identical prediction performance, which translates to a quasi-single-question credit request. A major finding of my research is a relationship between homophily and credit risk: loan applicants with similar credit risk profiles tend to be socially similar. I put this finding to use by defining various graph-theoretical structural social similarity metrics and testing their applicability to credit risk prediction. Based on the results, I extend graph-theoretical structural similarity with loan applicant features and introduce SMBP and SMHP, two novel credit risk prediction algorithms based on social graph theory. SMBP and SMHP slightly outperform the best-performing machine learning model, XGBoost, in terms of prediction, at a negligible fraction (as low as 0.001 percent) of its computational time. Overall, the findings, if adopted in the real world, would considerably improve the adoption, scalability, and cost of loan granting while using a fraction of the resources. |
first_indexed | 2025-02-19T04:32:55Z |
format | Thesis |
id | oxford-uuid:490f52bb-8abf-4421-818e-64436a7f9784 |
institution | University of Oxford |
language | English |
last_indexed | 2025-02-19T04:32:55Z |
publishDate | 2024 |
record_format | dspace |
spelling | oxford-uuid:490f52bb-8abf-4421-818e-64436a7f9784; 2025-01-20T10:34:40Z; Alternative credit scoring for the un(der)banked; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:490f52bb-8abf-4421-818e-64436a7f9784; Credit scoring systems; Machine learning; English; Hyrax Deposit; 2024; Elrakabawy, S; Hale, S; Vendres, B |
spellingShingle | Credit scoring systems; Machine learning; Elrakabawy, S; Alternative credit scoring for the un(der)banked |
title | Alternative credit scoring for the un(der)banked |
title_sort | alternative credit scoring for the un der banked |
topic | Credit scoring systems; Machine learning |
work_keys_str_mv | AT elrakabawys alternativecreditscoringfortheunderbanked |