Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation


Bibliographic Details
Main Authors:	Zne-Jung Lee, Chou-Yuan Lee, Li-Yun Chang, Natsuki Sano
Format:	Article
Language:	English
Published:	MDPI AG 2021-08-01
Series:	Symmetry
Subjects:	clustering classification automatic feature engineering machine learning improved fuzzy decision tree Apache Spark
Online Access:	https://www.mdpi.com/2073-8994/13/9/1557

collection	DOAJ
description	To stay competitive and extract valuable information, decision-makers must carry out in-depth machine learning or data mining for data analytics. Clustering and classification are two common data mining methods. Clustering divides data into groups according to similarity or shared features, whereas classification builds a model from given training data and uses it to predict the target class or label of test data. In recent years, many researchers have focused on hybrids of clustering and classification. These techniques have achieved admirable results, but there is still room to improve performance, for example through distributed processing. This paper therefore proposes clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses an artificial bee colony (ABC) to select valuable features of the input data, and RFM (recency, frequency, monetary) analysis then provides the basic data analytics. AFE first initializes the number of clusters k. The k-means, Ward's method, and fuzzy c-means (FCM) clustering methods are then applied to group the examples into clusters. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain each situation in detail. AFE also determines the split number of the improved fuzzy decision tree to increase classification accuracy. The proposed method is distributed and runs on the Apache Spark platform. The paper thus addresses the problem of clustering and classification for machine learning. The results show that the classification accuracy of the proposed method is higher than that of the other approaches. The paper also provides useful strategies and decision rules, derived from the data analytics, for decision-makers.
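The abstract names RFM analysis as the basic data analytics and fuzzy c-means as one of the clustering methods. A minimal, single-machine sketch of those two steps might look like the following, assuming a hypothetical `(customer_id, purchase_date, amount)` transaction layout, min-max scaling of the RFM columns, and the common fuzzifier m = 2. It is not the authors' implementation: the ABC feature selection, the other clustering methods, the improved fuzzy decision tree, and the Apache Spark distribution are all omitted, and the helper names `rfm_table`, `min_max_scale`, and `fuzzy_c_means` are hypothetical.

```python
import math
import random
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer_id, purchase_date, amount)
    tuples; this layout is a hypothetical choice for illustration.
    """
    last, freq, spend = {}, defaultdict(int), defaultdict(float)
    for cid, day, amount in transactions:
        if cid not in last or day > last[cid]:
            last[cid] = day
        freq[cid] += 1
        spend[cid] += amount
    return {cid: ((as_of - last[cid]).days, freq[cid], spend[cid]) for cid in last}


def min_max_scale(rows):
    """Scale every column to [0, 1] so no single RFM component dominates distances."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim, exp = len(points), len(points[0]), 2.0 / (m - 1.0)
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = max(sum(w), 1e-12)  # guard against an empty cluster
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / tw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a center: full membership there.
                k = dists.index(0.0)
                new_u.append([1.0 if j == k else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dj / dk) ** exp for dk in dists)
                              for dj in dists])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy usage with made-up transactions (not data from the paper).
txns = [
    ("a", date(2021, 7, 30), 120.0), ("a", date(2021, 7, 2), 80.0),
    ("b", date(2021, 3, 1), 15.0),
    ("c", date(2021, 7, 28), 300.0), ("c", date(2021, 6, 15), 250.0),
    ("c", date(2021, 5, 1), 90.0),
    ("d", date(2021, 2, 10), 20.0),
]
rfm = rfm_table(txns, as_of=date(2021, 8, 1))
ids = sorted(rfm)
points = min_max_scale([rfm[cid] for cid in ids])
centers, u = fuzzy_c_means(points, c=2)
for cid, row in zip(ids, u):
    print(cid, rfm[cid], [round(x, 2) for x in row])
```

Scaling comes first because raw recency (days) and monetary value (currency) sit on very different scales. Unlike hard k-means labels, the fuzzy memberships show how strongly each customer belongs to each segment.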
first_indexed	2024-03-10T07:10:14Z
id	doaj.art-08f2380a866346abb7a1b07ffd69d1db
institution	Directory of Open Access Journals
issn	2073-8994
last_indexed	2024-03-10T07:10:14Z
doi	10.3390/sym13091557
volume	13
issue	9
article_number	1557
affiliation	Zne-Jung Lee: School of Intelligent Construction, Fuzhou University of International Studies and Trade, No. 28, Yuhuan Road, Shouzhan New District, Changle, Fuzhou 350202, China
affiliation	Chou-Yuan Lee: School of Big Data, Fuzhou University of International Studies and Trade, No. 28, Yuhuan Road, Shouzhan New District, Changle, Fuzhou 350202, China
affiliation	Li-Yun Chang: Department of Information Management, National Taipei University of Nursing and Health Sciences, No. 365, Ming-Te Road, Peitou District, Taipei City 11219, Taiwan
affiliation	Natsuki Sano: Department of Informatics, Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-ku, Chiba 265-8501, Japan
