A Representation-Based Query Strategy to Derive Qualitative Features for Improved Churn Prediction

The effectiveness of any Machine Learning process depends on the accuracy of annotated data that is used to train a learner. However, manual annotation is expensive. Hence, researchers adopt a semi-supervised approach called active learning that aims to achieve state-of-the-art performance using min...


Bibliographic Details
Main Authors: Soumi De, P. Prabu
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Active learning, churn prediction, query strategy, entropy, topic classification
Online Access: https://ieeexplore.ieee.org/document/10004958/
author Soumi De
P. Prabu
collection DOAJ
description The effectiveness of any machine learning process depends on the accuracy of the annotated data used to train a learner. However, manual annotation is expensive. Hence, researchers adopt a semi-supervised approach called active learning, which aims to achieve state-of-the-art performance using a minimal number of samples. Although it boosts classifier performance, the underlying query strategies are unable to eliminate redundancy in the selected samples. Redundant samples lead to increased cost and sub-optimal performance of the learner. Motivated by this challenge, the study proposes a new representation-based query strategy that selects highly informative and representative subsets of samples for manual annotation. The data comprise messages sent by a set of customers to a service provider. A series of experiments is conducted to analyze the effectiveness of the proposed query strategy, called "Entropy-based Min Max Similarity" (E-MMSIM), in the context of topic classification for churn prediction. The foundation of E-MMSIM is an algorithm that is widely used to sequence proteins in protein databases. The algorithm is modified and used to select the most representative and informative samples. Performance is evaluated using F1-score, AUC and accuracy. E-MMSIM is observed to outperform popular query strategies and to improve the performance of topic classifiers for each of the 4 topics of churn prediction. The trained topic classifiers are used to derive qualitative features. These features are then integrated with structured variables for the same group of customers to predict churn. Experiments provide evidence that the inclusion of qualitative features derived using E-MMSIM enhances the performance of churn classifiers by 5%.
first_indexed 2024-04-11T00:34:27Z
format Article
id doaj.art-0700b561fa2d491c9bc4be1a75117dfa
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T00:34:27Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-0700b561fa2d491c9bc4be1a75117dfa; 2023-01-07T00:00:33Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 1213-1223; DOI 10.1109/ACCESS.2022.3233768; 10004958
Soumi De (https://orcid.org/0000-0002-4606-796X), Department of Data Science, CHRIST (Deemed to be University), Bengaluru, India
P. Prabu, Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India
title A Representation-Based Query Strategy to Derive Qualitative Features for Improved Churn Prediction
topic Active learning
churn prediction
query strategy
entropy
topic classification
url https://ieeexplore.ieee.org/document/10004958/
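
The description above speaks of selecting samples that are both informative (high entropy) and non-redundant. The following is a minimal, generic sketch of that idea in Python with NumPy, not the authors' E-MMSIM algorithm: the function names, the cosine-similarity measure, and the scoring rule `entropy * (1 - max_sim)` are all assumptions made for illustration and do not come from the record.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_batch(probs, feats, k):
    """Greedily pick up to k pool indices that are uncertain and mutually dissimilar.

    Hypothetical illustration only; this is not the published E-MMSIM procedure.
    """
    entropy = predictive_entropy(probs)
    # Cosine similarity between every pair of pool samples.
    norms = np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-12)
    unit = feats / norms
    sim = unit @ unit.T

    selected = [int(np.argmax(entropy))]  # start from the most uncertain sample
    while len(selected) < min(k, len(entropy)):
        # Highest similarity of each candidate to anything already selected.
        max_sim = sim[:, selected].max(axis=1)
        # Assumed scoring rule: reward entropy, penalise redundancy.
        score = entropy * (1.0 - max_sim)
        score[selected] = -np.inf  # never pick the same sample twice
        selected.append(int(np.argmax(score)))
    return selected


# Toy pool: samples 0 and 1 are identical, so only one of them should be chosen.
probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]])
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(select_batch(probs, feats, k=2))  # [0, 3]
```

In the toy pool, sample 1 is as uncertain as sample 0 but is an exact duplicate of it, so the sketch skips it in favour of sample 3, which illustrates the kind of redundancy the description says conventional query strategies fail to remove.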