Fast Data Reduction With Granulation-Based Instances Importance Labeling

Bibliographic Details
Main Authors: Xiaoyan Sun, Lian Liu, Cong Geng, Shaofeng Yang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Data reduction, granular computing, data importance label, kNN
Online Access: https://ieeexplore.ieee.org/document/8585005/
author Xiaoyan Sun
Lian Liu
Cong Geng
Shaofeng Yang
collection DOAJ
description Data reduction has become increasingly important prior to applying instance-based machine learning algorithms in the Big Data era. Data reduction shrinks a data set while retaining its representative instances. Existing algorithms, however, suffer from heavy computational cost and from a tradeoff between size reduction rate and learning accuracy. In this paper, we propose a fast data reduction approach that uses granular computing to label important instances, i.e., instances with higher contributions to the learning task. The original data set is first granulated into K granules by applying K-means in a mapped lower-dimensional space. The importance of each instance in every granule is then labeled based on its Hausdorff distance, and instances whose importance values fall below an experimentally tuned threshold are eliminated. The presented algorithm is applied to kNN classification tasks on eighteen UCI data sets of different sizes, and its performance in classification accuracy, size reduction rate, and runtime is compared with seven data reduction methods. The experimental results demonstrate that the proposed algorithm greatly reduces the computational cost and achieves higher classification accuracy when the reduction size is the same for all compared algorithms.
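The abstract describes a three-step pipeline (granulation via K-means in a mapped lower-dimensional space, Hausdorff-based importance labeling, threshold-based elimination) followed by kNN classification. The Python sketch below shows one plausible wiring of such a pipeline; it is not the authors' implementation. The choice of PCA as the lower-dimensional mapping, the per-instance importance score (directed Hausdorff distance from an instance to the rest of its granule), the quantile-based threshold, and all parameter values (n_granules, keep_ratio, n_neighbors) are assumptions made for illustration.

```python
# A minimal sketch of a granulation-based data reduction pipeline, under the
# assumptions stated above. The paper's exact importance definition and
# threshold tuning may differ.
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier


def granulation_reduce(X, y, n_granules=10, n_components=2, keep_ratio=0.5):
    """Keep only the instances labeled most 'important' within each granule."""
    # Step 1: granulate in a mapped lower-dimensional space (PCA assumed here).
    Z = PCA(n_components=n_components).fit_transform(X)
    granule_ids = KMeans(n_clusters=n_granules, n_init=10).fit_predict(Z)

    keep = np.zeros(len(X), dtype=bool)
    for g in range(n_granules):
        idx = np.where(granule_ids == g)[0]
        if len(idx) <= 1:
            keep[idx] = True
            continue
        # Step 2: label each instance's importance via a Hausdorff-style
        # distance between the instance and the rest of its granule
        # (assumed proxy for the paper's importance measure).
        scores = np.array([
            directed_hausdorff(Z[[i]], Z[np.setdiff1d(idx, [i])])[0]
            for i in idx
        ])
        # Step 3: eliminate instances whose importance falls below a threshold;
        # a per-granule quantile stands in for the experimentally tuned value.
        threshold = np.quantile(scores, 1.0 - keep_ratio)
        keep[idx[scores >= threshold]] = True

    return X[keep], y[keep]


# Usage: train kNN on the reduced set, mirroring the paper's evaluation setup.
# X, y = ...  # e.g. a UCI data set
# X_red, y_red = granulation_reduce(X, y, n_granules=20, keep_ratio=0.3)
# clf = KNeighborsClassifier(n_neighbors=5).fit(X_red, y_red)
```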
first_indexed 2024-12-13T18:05:35Z
format Article
id doaj.art-b0067616156646b88162e4feb7020415
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-13T18:05:35Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling IEEE Access, vol. 7, pp. 33587-33597, 2019-01-01. DOI: 10.1109/ACCESS.2018.2889122 (IEEE Xplore document 8585005). Authors: Xiaoyan Sun (ORCID: 0000-0002-1386-6853), Lian Liu (ORCID: 0000-0002-7833-2131), and Cong Geng, Information and Control Engineering College, China University of Mining and Technology, Xuzhou, China; Shaofeng Yang, Asset Management Co., Ltd., China University of Mining and Technology, Xuzhou, China.
title Fast Data Reduction With Granulation-Based Instances Importance Labeling
topic Data reduction
granular computing
data importance label
kNN
url https://ieeexplore.ieee.org/document/8585005/