PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data
Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and a...
Main Authors: | Huiyu Xia, Wei Huang, Ning Li, Jianzhong Zhou, Dongying Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-08-01 |
Series: | Sensors |
Subjects: | clustering, parallel computing, remote sensing big data, MapReduce |
Online Access: | https://www.mdpi.com/1424-8220/19/15/3438 |
_version_ | 1828146484385677312 |
---|---|
author | Huiyu Xia, Wei Huang, Ning Li, Jianzhong Zhou, Dongying Zhang |
author_facet | Huiyu Xia, Wei Huang, Ning Li, Jianzhong Zhou, Dongying Zhang |
author_sort | Huiyu Xia |
collection | DOAJ |
description | Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of the existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) scaled well as computing nodes were added; and (3) took much less time than the existing parallel clustering algorithm in handling RSBD. |
first_indexed | 2024-04-11T20:48:04Z |
format | Article |
id | doaj.art-3e8738ec19b142da98d419f6b3fb5642 |
institution | Directory of Open Access Journals |
issn | 1424-8220 |
language | English |
last_indexed | 2024-04-11T20:48:04Z |
publishDate | 2019-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-3e8738ec19b142da98d419f6b3fb5642; 2022-12-22T04:03:58Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2019-08-01; vol. 19, no. 15, art. 3438; DOI 10.3390/s19153438; s19153438; PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data; Huiyu Xia (0), Wei Huang (1), Ning Li (2), Jianzhong Zhou (3), Dongying Zhang (4); (0) Yangtze River Waterway Bureau, Nanjing 210011, China; (1) School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; (2) Yellow River Engineering Consulting Co., Ltd., Zhengzhou 450003, China; (3) School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; (4) School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; https://www.mdpi.com/1424-8220/19/15/3438; clustering; parallel computing; remote sensing big data; MapReduce |
title | PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data |
title_sort | parsuc a parallel subsampling based method for clustering remote sensing big data |
topic | clustering, parallel computing, remote sensing big data, MapReduce |
url | https://www.mdpi.com/1424-8220/19/15/3438 |
work_keys_str_mv | AT huiyuxia parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT weihuang parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT ningli parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT jianzhongzhou parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata AT dongyingzhang parsucaparallelsubsamplingbasedmethodforclusteringremotesensingbigdata |
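
The description above outlines a three-step scheme: partition the data by subsampling, cluster the subsamples in parallel, then filter and merge the resulting centroids. The sketch below illustrates that general idea only. This record does not specify how SubDP or CFA work, so the round-robin subsampling, the size-based centroid filter (`min_frac`), and every function name here are assumptions made for illustration, not the authors' algorithms. It also runs sequentially in plain Python rather than on Hadoop MapReduce.

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(p, centroids):
    # Index of the centroid closest to p.
    return min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means; returns centroids and last-pass cluster sizes.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster received no points.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, [len(g) for g in groups]

def subsample_partitions(points, n_parts, seed=0):
    # Assumed partitioning: shuffle, then deal points round-robin so that
    # each partition is a random subsample of the whole dataset.
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]

def subsample_cluster(points, k, n_parts=4, min_frac=0.1, seed=0):
    parts = subsample_partitions(points, n_parts, seed)
    local = []
    # Step 1 (map-like): cluster each subsample independently.
    for i, part in enumerate(parts):
        cents, sizes = kmeans(part, k, seed=seed + i)
        # Hypothetical stand-in for centroid filtering: drop centroids
        # backed by fewer than min_frac of the partition's points.
        local += [c for c, s in zip(cents, sizes) if s >= min_frac * len(part)]
    # Step 2 (reduce-like): merge the surviving local centroids.
    global_centroids, _ = kmeans(local, k, seed=seed)
    # Step 3: a single pass assigns every point to its nearest centroid.
    labels = [nearest(p, global_centroids) for p in points]
    return global_centroids, labels
```

Calling `subsample_cluster(points, k=2)` on a list of coordinate tuples returns the merged centroids and one label per input point. The point of the design is that the iterative part of k-means runs only on the small subsamples and on the handful of local centroids, so the full dataset is scanned once, in the final assignment step.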