Summary: | As the telecommunication market becomes saturated, with most telcos applying the
same pricing model, it is very easy for customers to leave one telco and join a
competitor. Churn prediction is a data mining technique for predicting the probability
that a customer will leave the service.
In this project, a churn prediction classifier is implemented with data from an
anonymous telecommunication company. The classifier is binary, with two labels:
churn and non-churn. We aggregate the data mining features from Call Detail
Records (CDR), starting with basic features such as the number of messages in a month, the
total duration of incoming/outgoing calls in a month, etc. Besides these basic features, graph
theory features (Label Propagation and PageRank) are also incorporated in the feature
selection method.
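
As an illustration of the basic features described above, the following is a minimal sketch of aggregating one month of CDR rows into per-subscriber counts and durations. The record layout (caller, callee, kind, duration in seconds) and the feature names are assumptions made for this example; the actual CDR schema of the data set is not given here.

```python
from collections import defaultdict


def aggregate_cdr_features(records):
    """Aggregate one month of CDR rows into per-subscriber basic features.

    Each record is a hypothetical (caller, callee, kind, duration_seconds)
    tuple, where kind is "call" or "sms". Real CDR schemas differ.
    """
    features = defaultdict(
        lambda: {"sms_sent": 0, "out_call_secs": 0, "in_call_secs": 0}
    )
    for caller, callee, kind, duration in records:
        if kind == "sms":
            # Count messages sent by the caller in the month.
            features[caller]["sms_sent"] += 1
        elif kind == "call":
            # A call adds outgoing time for the caller, incoming for the callee.
            features[caller]["out_call_secs"] += duration
            features[callee]["in_call_secs"] += duration
    return dict(features)


# Toy records, invented for illustration only.
sample_records = [
    ("A", "B", "call", 120),
    ("B", "A", "call", 30),
    ("A", "C", "sms", 0),
    ("C", "A", "call", 60),
]
monthly_features = aggregate_cdr_features(sample_records)
print(monthly_features["A"])
```

Each subscriber's dictionary can then be joined with graph-based scores to form one feature vector per customer.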
Given the large volume of data, MapReduce is used to partition and parallelize the graph
computation so that graphs of 600,000 nodes or more can be processed comfortably
on a personal computer.
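
To show how a graph computation such as PageRank fits the MapReduce pattern, here is a small single-process sketch. It is not the project's implementation: the function names, the damping factor of 0.85, and the toy call graph are assumptions, and a real job would run the map and reduce phases over partitions of the edge list rather than over an in-memory dictionary.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not from the source


def map_phase(graph, ranks):
    """Map step: each node splits its current rank evenly over its out-links."""
    for node, neighbours in graph.items():
        if neighbours:
            share = ranks[node] / len(neighbours)
            for target in neighbours:
                yield target, share


def reduce_phase(graph, contributions):
    """Reduce step: sum the contributions per node and apply damping."""
    n = len(graph)
    sums = defaultdict(float)
    for target, share in contributions:
        sums[target] += share
    return {node: (1 - DAMPING) / n + DAMPING * sums[node] for node in graph}


def pagerank(graph, iterations=20):
    """Run a fixed number of map/reduce rounds starting from uniform ranks.

    This sketch assumes every node has at least one out-link; dangling
    nodes would need their rank mass redistributed.
    """
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        ranks = reduce_phase(graph, map_phase(graph, ranks))
    return ranks


# Toy directed call graph (who calls whom), invented for illustration.
call_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(call_graph)
print(ranks)
```

Because each round only needs a node's current rank and its out-links, the map phase can process partitions of the graph independently, which is what makes large graphs tractable on modest hardware.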
We achieve commendable classification results, with all classifiers returning
around 90% accuracy or more. The classifiers used are Naïve Bayes, Logistic KNN,
Logistic Regression, Decision Tree, Random Forest and Bagging. Logistic Regression
consistently outperforms the other classifiers, with its best result at 96.9% accuracy
and an AUC score of 0.988. We are confident that the telco will profit in the long
run if it offers the potential churners identified by these highly accurate predictions
attractive packages to keep them in the service.
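
For readers unfamiliar with the two metrics quoted above, the sketch below computes accuracy and AUC for a binary churn classifier from scratch. The labels and scores are toy values invented for the example, not project data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen churner (label 1)
    receives a higher score than a randomly chosen non-churner (label 0).
    Tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = churn, 0 = non-churn) and predicted churn probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, auc)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank churners above non-churners across all thresholds, which is why the two are reported together.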