Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art
In the past, several single classifiers, homogeneous ensembles, and heterogeneous ensembles have been proposed to detect the customers who are most likely to churn. Despite the popularity and accuracy of heterogeneous ensembles in various domains, they have not yet been picked up in customer churn prediction. Mo...
Main Authors: | Matthias Bogaert, Lex Delaere |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Mathematics |
Subjects: | churn prediction, ensemble methods, machine learning, data mining, CRM |
Online Access: | https://www.mdpi.com/2227-7390/11/5/1137 |
_version_ | 1827752522878550016 |
---|---|
author | Matthias Bogaert Lex Delaere |
author_facet | Matthias Bogaert Lex Delaere |
author_sort | Matthias Bogaert |
collection | DOAJ |
description | In the past, several single classifiers, homogeneous ensembles, and heterogeneous ensembles have been proposed to detect the customers who are most likely to churn. Despite the popularity and accuracy of heterogeneous ensembles in various domains, they have not yet been picked up in customer churn prediction. Moreover, other developments in performance evaluation and model comparison have not been introduced in a systematic way. Therefore, the aim of this study is to perform a large-scale benchmark study in customer churn prediction that implements these novel methods. To do so, we benchmark 33 classifiers, including 6 single classifiers, 14 homogeneous ensembles, and 13 heterogeneous ensembles, across 11 datasets. Our findings indicate that heterogeneous ensembles are consistently ranked higher than homogeneous ensembles and single classifiers. A heterogeneous ensemble with simulated annealing classifier selection is ranked the highest in terms of AUC and expected maximum profits. For accuracy, F1 measure and top-decile lift, a heterogeneous ensemble optimized by non-negative binomial likelihood, and a stacked heterogeneous ensemble are, respectively, the top-ranked classifiers. Our study contributes to the literature by being the first to include such an extensive set of classifiers, performance metrics, and statistical tests in a benchmark study of customer churn. |
first_indexed | 2024-03-11T07:18:32Z |
format | Article |
id | doaj.art-6623e9203578403aa1ea1dbef9399bc4 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-11T07:18:32Z |
publishDate | 2023-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-6623e9203578403aa1ea1dbef9399bc4; 2023-11-17T08:08:43Z; eng; MDPI AG; Mathematics; 2227-7390; 2023-02-01; 11; 5; 1137; 10.3390/math11051137; Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art; Matthias Bogaert (0); Lex Delaere (1); Department of Marketing, Innovation and Organization, Ghent University, 9000 Ghent, Belgium; Department of Marketing, Innovation and Organization, Ghent University, 9000 Ghent, Belgium; In the past, several single classifiers, homogeneous ensembles, and heterogeneous ensembles have been proposed to detect the customers who are most likely to churn. Despite the popularity and accuracy of heterogeneous ensembles in various domains, they have not yet been picked up in customer churn prediction. Moreover, other developments in performance evaluation and model comparison have not been introduced in a systematic way. Therefore, the aim of this study is to perform a large-scale benchmark study in customer churn prediction that implements these novel methods. To do so, we benchmark 33 classifiers, including 6 single classifiers, 14 homogeneous ensembles, and 13 heterogeneous ensembles, across 11 datasets. Our findings indicate that heterogeneous ensembles are consistently ranked higher than homogeneous ensembles and single classifiers. A heterogeneous ensemble with simulated annealing classifier selection is ranked the highest in terms of AUC and expected maximum profits. For accuracy, F1 measure and top-decile lift, a heterogeneous ensemble optimized by non-negative binomial likelihood, and a stacked heterogeneous ensemble are, respectively, the top-ranked classifiers. Our study contributes to the literature by being the first to include such an extensive set of classifiers, performance metrics, and statistical tests in a benchmark study of customer churn.; https://www.mdpi.com/2227-7390/11/5/1137; churn prediction; ensemble methods; machine learning; data mining; CRM |
spellingShingle | Matthias Bogaert Lex Delaere Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art Mathematics churn prediction ensemble methods machine learning data mining CRM |
title | Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art |
title_full | Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art |
title_fullStr | Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art |
title_full_unstemmed | Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art |
title_short | Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art |
title_sort | ensemble methods in customer churn prediction a comparative analysis of the state of the art |
topic | churn prediction ensemble methods machine learning data mining CRM |
url | https://www.mdpi.com/2227-7390/11/5/1137 |
work_keys_str_mv | AT matthiasbogaert ensemblemethodsincustomerchurnpredictionacomparativeanalysisofthestateoftheart AT lexdelaere ensemblemethodsincustomerchurnpredictionacomparativeanalysisofthestateoftheart |