A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction


Bibliographic Details
Main Authors: Soumi De, P. Prabu
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects: Churn prediction; ensemble classifiers; over-sampling; under-sampling; ensemble stack
Online Access: https://ieeexplore.ieee.org/document/9803037/
author Soumi De
P. Prabu
collection DOAJ
description Churn prediction is gaining popularity in the research community as a powerful paradigm that supports data-driven operational decisions. Datasets related to churn prediction are often skewed, with an imbalanced class distribution. Data-level solutions, like over-sampling and under-sampling, have been commonly used by researchers to address this problem. Only a limited number of case studies attempt to evolve these data-level solutions by integrating them with computationally advanced frameworks, like ensembles. Ensembles primarily employ algorithmic diversity using a fixed set of training instances to achieve superior performance. This study aims to introduce algorithmic diversity in ensembles by modifying the fixed set of training instances using diverse sampling strategies to increase predictive performance in imbalanced learning. Data is acquired from the world’s largest open hotel commerce platform company. A four-part series of experiments is conducted to analyze the effectiveness of sampling techniques and ensemble solutions on model performance. A new sampling-based stack framework called “Stacking of Samplers for Imbalanced Learning” is proposed. The framework combines the prediction capabilities of sampling solutions to raise the information gain of the meta features in the ensemble. The proposed framework improves model performance, reaching an AUC of 86.4% and a top-decile lift of 4.7 for customers of the hotel technology provider. Additionally, results show that the framework records a higher information gain for the meta features used in a stack, compared to commonly used stack frameworks.
first_indexed 2024-04-13T19:06:42Z
format Article
id doaj.art-361285820c4f4e328aafa4bdbd83a948
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-13T19:06:42Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-361285820c4f4e328aafa4bdbd83a948
2022-12-22T02:33:57Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2022-01-01, vol. 10, pp. 68017-68028
DOI: 10.1109/ACCESS.2022.3185227 (IEEE Xplore document 9803037)
A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction
Soumi De (https://orcid.org/0000-0002-4606-796X), Department of Data Science, CHRIST (Deemed to be University), Bengaluru, India
P. Prabu, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India
https://ieeexplore.ieee.org/document/9803037/
title A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction
topic Churn prediction
ensemble classifiers
over-sampling
under-sampling
ensemble stack
url https://ieeexplore.ieee.org/document/9803037/