A Representation-Based Query Strategy to Derive Qualitative Features for Improved Churn Prediction
Main Authors: | Soumi De, P. Prabu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/10004958/ |
author | Soumi De; P. Prabu |
---|---|
collection | DOAJ |
description | The effectiveness of any machine learning process depends on the accuracy of the annotated data used to train a learner. However, manual annotation is expensive. Hence, researchers adopt a semi-supervised approach called active learning, which aims to achieve state-of-the-art performance using a minimal number of samples. Although active learning boosts classifier performance, the underlying query strategies are unable to eliminate redundancy in the selected samples. Redundant samples increase annotation cost and lead to sub-optimal learner performance. Inspired by this challenge, the study proposes a new representation-based query strategy that selects highly informative and representative subsets of samples for manual annotation. The data comprises messages sent by a set of customers to a service provider. A series of experiments is conducted to analyze the effectiveness of the proposed query strategy, called “Entropy-based Min Max Similarity” (E-MMSIM), in the context of topic classification for churn prediction. E-MMSIM is founded on an algorithm widely used to sequence proteins in protein databases; this algorithm is modified and used to select the most representative and informative samples. Performance is evaluated using F1-score, AUC and accuracy. E-MMSIM outperforms popular query strategies and improves the performance of the topic classifiers for each of the four churn-prediction topics. The trained topic classifiers are then used to derive qualitative features, which are integrated with structured variables for the same group of customers to predict churn. Experiments provide evidence that including the qualitative features derived using E-MMSIM enhances the performance of the churn classifiers by 5%. |
issn | 2169-3536 |
doi | 10.1109/ACCESS.2022.3233768 |
volume | 11 |
pages | 1213-1223 |
affiliation | Soumi De (ORCID 0000-0002-4606-796X): Department of Data Science, CHRIST (Deemed to be University), Bengaluru, India; P. Prabu: Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India |
topic | Active learning; churn prediction; query strategy; entropy; topic classification |
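The abstract describes E-MMSIM only at a high level: select samples that are both informative (high entropy) and representative (not redundant with one another). The Python sketch below illustrates that general idea under stated assumptions. It is a hypothetical illustration, not the authors' algorithm: the function names, the use of cosine similarity over message embeddings, the shortlist size (`pool_factor`) and the greedy min-max selection rule are all assumptions, and the protein-database algorithm the paper modifies is not reproduced here.

```python
import numpy as np


def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (natural log) of each row of an (n_samples, n_classes) matrix."""
    eps = 1e-12  # guards against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def entropy_min_max_similarity_query(probs, embeddings, batch_size, pool_factor=3):
    """Hypothetical entropy + min-max similarity batch selection (NOT the paper's E-MMSIM).

    probs:      (n, k) predicted class probabilities for the unlabeled pool
    embeddings: (n, d) vector representation of each unlabeled message (assumed input)
    Returns the indices of up to `batch_size` samples to send for annotation.
    """
    n = probs.shape[0]
    batch_size = min(batch_size, n)
    if batch_size <= 0:
        return np.array([], dtype=int)

    # Step 1: shortlist the most uncertain samples by predictive entropy.
    pool_size = min(n, max(batch_size, pool_factor * batch_size))
    candidates = np.argsort(-entropy(probs), kind="stable")[:pool_size]
    sim = cosine_similarity_matrix(embeddings[candidates])

    # Step 2: greedy min-max selection, seeded with the most uncertain candidate.
    selected = [0]                 # positions within `candidates`
    max_sim = sim[0].copy()        # each candidate's highest similarity to the selection
    while len(selected) < batch_size:
        scores = max_sim.copy()
        scores[selected] = np.inf  # never pick the same candidate twice
        nxt = int(np.argmin(scores))  # least similar to everything chosen so far
        selected.append(nxt)
        max_sim = np.maximum(max_sim, sim[nxt])
    return candidates[selected]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=200)  # random stand-in classifier outputs
    embeddings = rng.normal(size=(200, 16))      # random stand-in message embeddings
    print(entropy_min_max_similarity_query(probs, embeddings, batch_size=10))
```

The entropy filter keeps the batch informative, while the min-max step discourages near-duplicates, which is the redundancy problem the abstract raises. For example, if two identical messages are both maximally uncertain, a pure entropy ranking would query both, whereas this sketch queries one of them and then moves on to a dissimilar sample.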