Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation

Bibliographic Details
Main Authors: Zne-Jung Lee, Chou-Yuan Lee, Li-Yun Chang, Natsuki Sano
Format: Article
Language: English
Published: MDPI AG, 2021-08-01
Series: Symmetry
Online Access: https://www.mdpi.com/2073-8994/13/9/1557

Abstract

To stay ahead of the competition and extract valuable information, decision-makers must carry out in-depth machine learning or data mining for data analytics. Traditionally, clustering and classification are two common methods in data mining. Clustering divides data into groups according to similarity or shared features, whereas classification builds a model from labeled training data and uses it to predict the target class or label of test data. In recent years, many researchers have focused on hybrids of clustering and classification. These hybrid techniques have achieved good results, but their performance can still be improved, for example through distributed processing. This paper therefore proposes clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses the artificial bee colony (ABC) algorithm to select valuable features of the input data, and RFM (recency, frequency, monetary) analysis then provides the basic data analytics. AFE first initializes the number of clusters, k. The k-means, Ward's method, and fuzzy c-means (FCM) clustering methods then group the examples into clusters. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain the results in detail. AFE also determines the split number of the improved fuzzy decision tree to increase classification accuracy. The whole approach runs in a distributed manner on the Apache Spark platform. In short, the paper addresses the combined problem of clustering and classification in machine learning. The results show that the classification accuracy of the proposed approach is higher than that of the other approaches compared. The paper also derives practical strategies and decision rules for decision-makers from the data analytics.
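The abstract names RFM analysis and k-means as two building blocks of the pipeline. As a rough illustration only, the sketch below computes RFM values from hypothetical transaction records and then segments customers with plain k-means. It is not the authors' method: it omits the ABC-based feature selection, the other clustering methods, the improved fuzzy decision tree, and Spark distribution. The record layout `(customer_id, purchase_date, amount)`, the function names `rfm_table`, `min_max_scale`, and `kmeans`, the min-max scaling, and the first-k center seeding are all assumptions made for this example.

```python
from datetime import date
from math import dist  # Euclidean distance (Python 3.8+)

# ASSUMPTION: hypothetical records of the form (customer_id, purchase_date, amount).
# This is an illustrative sketch, not the paper's AFE/ABC/Spark implementation.


def rfm_table(transactions, today):
    """Raw RFM per customer: recency = days since last purchase,
    frequency = number of purchases, monetary = total amount spent."""
    table = {}
    for cust, day, amount in transactions:
        last, freq, total = table.get(cust, (None, 0, 0.0))
        last = day if last is None or day > last else last
        table[cust] = (last, freq + 1, total + amount)
    return {c: ((today - last).days, freq, total)
            for c, (last, freq, total) in table.items()}


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no RFM dimension dominates the distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means, deterministically seeded with the first k points."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else centers[j])
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return labels, centers


# Usage with made-up data: four customers split into two segments.
txns = [
    ("A", date(2021, 7, 30), 120.0),
    ("A", date(2021, 7, 15), 80.0),
    ("B", date(2021, 1, 5), 20.0),
    ("C", date(2021, 7, 28), 300.0),
    ("C", date(2021, 6, 1), 250.0),
    ("C", date(2021, 7, 1), 90.0),
    ("D", date(2021, 2, 10), 15.0),
]
rfm = rfm_table(txns, today=date(2021, 8, 1))
customers = sorted(rfm)
labels, _ = kmeans(min_max_scale([rfm[c] for c in customers]), k=2)
print(dict(zip(customers, labels)))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
```

Here recent, frequent, high-spending customers (A and C) land in one segment and lapsed, low-value customers (B and D) in the other. Scaling matters because raw monetary totals would otherwise swamp the recency and frequency columns in the Euclidean distance.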

Affiliations:
Zne-Jung Lee: School of Intelligent Construction, Fuzhou University of International Studies and Trade, No. 28, Yuhuan Road, Shouzhan New District, Changle, Fuzhou 350202, China
Chou-Yuan Lee: School of Big Data, Fuzhou University of International Studies and Trade, No. 28, Yuhuan Road, Shouzhan New District, Changle, Fuzhou 350202, China
Li-Yun Chang: Department of Information Management, National Taipei University of Nursing and Health Sciences, No. 365, Ming-Te Road, Peitou District, Taipei City 11219, Taiwan
Natsuki Sano: Department of Informatics, Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-ku, Chiba 265-8501, Japan
Citation: Symmetry, vol. 13, no. 9, article 1557 (2021)
ISSN: 2073-8994
DOI: 10.3390/sym13091557
Keywords: clustering; classification; automatic feature engineering; machine learning; improved fuzzy decision tree; Apache Spark