Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation
To stay competitive and obtain valuable information, decision-makers must conduct in-depth machine learning or data mining for data analytics. Traditionally, clustering and classification are two common methods used in data mining. For clustering, data are divided into various groups according to...
Main Authors: | Zne-Jung Lee, Chou-Yuan Lee, Li-Yun Chang, Natsuki Sano |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-08-01 |
Series: | Symmetry |
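Subjects: | clustering; classification; automatic feature engineering; machine learning; improved fuzzy decision tree; Apache Spark |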
Online Access: | https://www.mdpi.com/2073-8994/13/9/1557 |
_version_ | 1797517109696135168 |
---|---|
author | Zne-Jung Lee, Chou-Yuan Lee, Li-Yun Chang, Natsuki Sano |
collection | DOAJ |
description | To stay competitive and extract valuable information, decision-makers must apply machine learning or data mining in depth to their data. Clustering and classification are two common data mining methods. Clustering divides data into groups according to similarity or shared features. Classification builds a model from given training data and then predicts the target class or label for test data. In recent years, many researchers have focused on hybrids of clustering and classification. These techniques have achieved good results, but there is still room to improve performance, for example through distributed processing. This paper therefore proposes clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses artificial bee colony (ABC) to select valuable features of the input data, and RFM (recency, frequency, monetary) then provides the basic data analytics. AFE first initializes the number of clusters k. The clustering methods k-means, Ward's method, and fuzzy c-means (FCM) then cluster the examples into different groups. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain the detailed situations. AFE also determines the split number in the improved fuzzy decision tree to increase classification accuracy. The proposed approach is distributed and runs on the Apache Spark platform. The paper addresses clustering and classification problems in machine learning. According to the results, its classification accuracy outperforms that of other approaches. The paper also provides useful strategies and decision rules from the data analytics for decision-makers. |
first_indexed | 2024-03-10T07:10:14Z |
format | Article |
id | doaj.art-08f2380a866346abb7a1b07ffd69d1db |
institution | Directory of Open Access Journals |
issn | 2073-8994 |
language | English |
last_indexed | 2024-03-10T07:10:14Z |
publishDate | 2021-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
title | Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation |
topic | clustering; classification; automatic feature engineering; machine learning; improved fuzzy decision tree; Apache Spark |
url | https://www.mdpi.com/2073-8994/13/9/1557 |
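The description in this record mentions RFM analytics followed by clustering of customers into k groups. The following is a minimal, single-machine sketch of that idea and not the authors' implementation: it leaves out the ABC feature selection, Ward's method, FCM, the improved fuzzy decision tree, and the Apache Spark distribution. The transaction data, the function names (`rfm_table`, `min_max_scale`, `kmeans`), and the choice of min-max scaling are all assumptions made for illustration.

```python
from datetime import date
import random

# Hypothetical transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("a", date(2021, 7, 30), 120.0),
    ("a", date(2021, 7, 10), 80.0),
    ("a", date(2021, 6, 25), 100.0),
    ("b", date(2021, 7, 28), 95.0),
    ("b", date(2021, 7, 5), 110.0),
    ("b", date(2021, 6, 20), 90.0),
    ("c", date(2021, 1, 15), 20.0),
    ("d", date(2021, 2, 1), 15.0),
]


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in transactions:
        recency, freq, money = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, freq + 1, money + amount)
    return table


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
            for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


rfm = rfm_table(TRANSACTIONS, today=date(2021, 8, 1))
customers = sorted(rfm)
labels = kmeans(min_max_scale([rfm[c] for c in customers]), k=2)
segments = dict(zip(customers, labels))
print(rfm)
print(segments)
```

With this toy data the two recent, frequent buyers (`a` and `b`) fall into one segment and the two lapsed one-time buyers (`c` and `d`) into the other. The numeric cluster labels themselves carry no meaning and depend on the initial centers.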