Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study

Customer retention is a major issue for various service-based organizations, particularly in the telecom industry, where predictive models of customer behavior are among the most useful instruments for the customer retention process and for inferring customers' future behavior. However, the...

Bibliographic Details
Main Authors:	Adnan Amin, Sajid Anwar, Awais Adnan, Muhammad Nawaz, Newton Howard, Junaid Qadir, Ahmad Hawalah, Amir Hussain
Format:	Article
Language:	English
Published:	IEEE 2016-01-01
Series:	IEEE Access
Subjects:	SMOTE; ADASYN; mega trend diffusion function; class imbalance; rough set; customer churn
Online Access:	https://ieeexplore.ieee.org/document/7707454/

_version_	1828966594714271744
author	Adnan Amin; Sajid Anwar; Awais Adnan; Muhammad Nawaz; Newton Howard; Junaid Qadir; Ahmad Hawalah; Amir Hussain
author_sort	Adnan Amin
collection	DOAJ
description	Customer retention is a major issue for various service-based organizations, particularly in the telecom industry, where predictive models of customer behavior are among the most useful instruments for the customer retention process and for inferring customers' future behavior. However, the performance of predictive models degrades considerably when the real-world data set is highly imbalanced. A data set is called imbalanced if the sample size of one class is much smaller or much larger than that of the other classes. Over/under sampling is the most commonly used technique for handling the class-imbalance problem (CIP) in various domains. In this paper, we survey six well-known sampling techniques and compare their performance: mega-trend diffusion function (MTDF), synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling approach (ADASYN), couples top-N reverse k-nearest neighbor, majority weighted minority oversampling technique, and immune centroids oversampling technique. Moreover, this paper evaluates four rules-generation algorithms (the learning from example module, version 2 (LEM2), covering, exhaustive, and genetic algorithms) using publicly available data sets. The empirical results demonstrate that MTDF and rules-generation based on genetic algorithms achieved the best overall predictive performance among the evaluated oversampling methods and rules-generation algorithms.
first_indexed	2024-12-14T11:34:28Z
format	Article
id	doaj.art-b0201a5a2f8c4d6aa4f5edaa7d9f6d8b
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-14T11:34:28Z
publishDate	2016-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-b0201a5a2f8c4d6aa4f5edaa7d9f6d8b | 2022-12-21T23:03:08Z | eng | IEEE | IEEE Access | 2169-3536 | 2016-01-01 | 4 | 7940 | 7957 | 10.1109/ACCESS.2016.2619719 | 7707454 | Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study | Adnan Amin (0) https://orcid.org/0000-0002-0852-8833 | Sajid Anwar (1) | Awais Adnan (2) | Muhammad Nawaz (3) | Newton Howard (4) | Junaid Qadir (5) https://orcid.org/0000-0001-9466-2475 | Ahmad Hawalah (6) | Amir Hussain (7) | Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan | Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan | Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan | Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar, Pakistan | Nuffield Department of Surgical Sciences, University of Oxford, Oxford, U.K. | Arfa Software Technology Park, Information Technology University, Lahore, Pakistan | College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia | Division of Computing Science and Maths, University of Stirling, Stirling, U.K. | https://ieeexplore.ieee.org/document/7707454/ | SMOTE | ADASYN | mega trend diffusion function | class imbalance | rough set | customer churn
title	Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study
topic	SMOTE; ADASYN; mega trend diffusion function; class imbalance; rough set; customer churn
url	https://ieeexplore.ieee.org/document/7707454/

Similar Items