PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and a...

Bibliographic Details
Main Authors: Huiyu Xia, Wei Huang, Ning Li, Jianzhong Zhou, Dongying Zhang
Format: Article
Language: English
Published: MDPI AG 2019-08-01
Series: Sensors
Subjects: clustering; parallel computing; remote sensing big data; MapReduce
Online Access: https://www.mdpi.com/1424-8220/19/15/3438
author Huiyu Xia
Wei Huang
Ning Li
Jianzhong Zhou
Dongying Zhang
author_sort Huiyu Xia
collection DOAJ
description Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) took much less time than the existing parallel clustering algorithm in handling RSBD.
first_indexed 2024-04-11T20:48:04Z
format Article
id doaj.art-3e8738ec19b142da98d419f6b3fb5642
institution Directory Open Access Journal
issn 1424-8220
language English
last_indexed 2024-04-11T20:48:04Z
publishDate 2019-08-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj.art-3e8738ec19b142da98d419f6b3fb5642
2022-12-22T04:03:58Z
eng
MDPI AG
Sensors
1424-8220
2019-08-01
19
15
3438
10.3390/s19153438
s19153438
PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data
Huiyu Xia (0)
Wei Huang (1)
Ning Li (2)
Jianzhong Zhou (3)
Dongying Zhang (4)
Yangtze River Waterway Bureau, Nanjing 210011, China
School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Yellow River Engineering Consulting Co., Ltd., Zhengzhou 450003, China
School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
https://www.mdpi.com/1424-8220/19/15/3438
clustering
parallel computing
remote sensing big data
MapReduce
title PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data
topic clustering
parallel computing
remote sensing big data
MapReduce
url https://www.mdpi.com/1424-8220/19/15/3438