Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams

Clustering of uncertain objects in large uncertain databases and the problem of mining uncertain data have been well studied. In this paper, clustering of uncertain objects with location uncertainty is studied. Moving objects, like mobile devices, report their locations periodically, thus their locations...

Bibliographic Details
Main Authors: Ivica Lukić, Željko Hocenski, Mirko Köhler, Tomislav Galba
Format: Article
Language: English
Published: Taylor & Francis Group 2018-10-01
Series: Automatika
Subjects: Clustering algorithms, data mining, data uncertainty, Euclidean distance, parallel algorithms
Online Access: http://dx.doi.org/10.1080/00051144.2018.1541645
_version_ 1819103258375880704
author Ivica Lukić
Željko Hocenski
Mirko Köhler
Tomislav Galba
author_facet Ivica Lukić
Željko Hocenski
Mirko Köhler
Tomislav Galba
author_sort Ivica Lukić
collection DOAJ
description Clustering of uncertain objects in large uncertain databases and the problem of mining uncertain data have been well studied. In this paper, clustering of uncertain objects with location uncertainty is studied. Moving objects, like mobile devices, report their locations periodically, so their locations are uncertain and best described by a probability density function. The number of objects in a database can be large, which makes mining accurate data a challenging and time-consuming task. The authors give an overview of existing clustering methods and present a new approach to data mining and parallel computing of clustering problems. All existing methods use pruning to avoid expected distance calculations, because calculating the expected distance requires numerical integration, which is time-consuming. Therefore, a new method, called Segmentation of Data Set Area-Parallel, is proposed. In this method, the data set area is divided into many small segments, and only the clusters and objects in a given segment are observed. The number of segments is calculated from the number and location of clusters. The use of segments makes parallel computing possible, because segments are mutually independent. Thus, segments can be computed in parallel on multiple cores.
Parallel clustering of uncertain data using segmentation of the data area and Voronoi diagrams. Clustering of uncertain data is a widely studied topic for large uncertain databases. In such databases it is hard to find useful data among the mass of uncertain data. In this paper, clustering of objects with location uncertainty is studied. Most moving objects, such as mobile devices, report their position periodically, so their position is imprecise and must be described by a probability density function. The number of objects in a database can be very large, and obtaining accurate data is a challenging and time-consuming task. All methods for clustering uncertain data use similar principles. This paper proposes a new approach. First, an overview of existing methods is given, and then a new method for parallel clustering of uncertain data is proposed. All existing methods use various pruning techniques to avoid computing the expected distance, because it involves numerical integration and takes a lot of time. We propose a method called parallel segmentation of the data area. In this method, the cluster area is divided into many small segments, and only the clusters and objects in those small segments are considered. The number of segments is calculated from the number and position of clusters in space. This makes parallel computation possible because the segments are mutually independent, so the segments can be computed on multiple processor cores.
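The two ideas in the description (an expected distance obtained by numerical integration, and independent segments processed in parallel) can be illustrated with a minimal Python sketch. It is not the authors' Segmentation of Data Set Area-Parallel method: the record does not give the segmentation rule or the role of the Voronoi diagrams, so the sketch assumes a uniform grid, represents each uncertain object by equally weighted sample points of its probability density function, and falls back to all cluster centres when a segment contains none. All function names and parameters (`expected_distance`, `segment_of`, `assign_all`, `cell`, `workers`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def expected_distance(samples, centre):
    """Approximate the expected Euclidean distance between an uncertain object
    and a cluster centre as the mean over equally weighted pdf samples
    (a simple stand-in for the numerical integration the abstract mentions)."""
    return sum(math.dist(p, centre) for p in samples) / len(samples)


def centroid(samples):
    """Mean position of a 2-D sample set."""
    n = len(samples)
    return (sum(p[0] for p in samples) / n, sum(p[1] for p in samples) / n)


def segment_of(point, origin, cell):
    """Grid segment (column, row) containing a 2-D point.
    Assumption: uniform square segments with side length `cell`."""
    return (int((point[0] - origin[0]) // cell),
            int((point[1] - origin[1]) // cell))


def assign_segment(objects, candidates):
    """Within one segment, assign each object to the candidate centre
    with the smallest approximate expected distance."""
    return {
        name: min(candidates,
                  key=lambda cid: expected_distance(samples, candidates[cid]))
        for name, samples in objects.items()
    }


def assign_all(objects, centres, origin=(0.0, 0.0), cell=10.0, workers=4):
    """Group objects and centres by segment, then process segments concurrently."""
    seg_objects, seg_centres = {}, {}
    for name, samples in objects.items():
        seg = segment_of(centroid(samples), origin, cell)
        seg_objects.setdefault(seg, {})[name] = samples
    for cid, centre in centres.items():
        seg_centres.setdefault(segment_of(centre, origin, cell), {})[cid] = centre

    # Segments do not share state, so each one is an independent task.
    # ThreadPoolExecutor keeps the sketch portable; in standard CPython,
    # CPU-bound work needs ProcessPoolExecutor to actually use several cores.
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # Assumed fallback: a segment without centres considers all centres.
            pool.submit(assign_segment, objs, seg_centres.get(seg) or centres)
            for seg, objs in seg_objects.items()
        ]
        for future in futures:
            result.update(future.result())
    return result


# Hypothetical toy data: three uncertain objects, two cluster centres.
objects = {
    "a": [(1, 1), (2, 2)],
    "b": [(11, 1), (13, 3)],
    "c": [(25, 25), (27, 27)],
}
centres = {"k1": (3, 3), "k2": (12, 4)}
assignment = assign_all(objects, centres)
```

Restricting each object to the centres in its own segment is what lets the expensive expected distance be skipped for distant clusters. In this simplified form the result is only approximate near segment borders, where the nearest centre may lie in a neighbouring segment.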
first_indexed 2024-12-22T01:47:36Z
format Article
id doaj.art-2306f70ae53d472db904c1fe42d3f3e3
institution Directory Open Access Journal
issn 0005-1144
1848-3380
language English
last_indexed 2024-12-22T01:47:36Z
publishDate 2018-10-01
publisher Taylor & Francis Group
record_format Article
series Automatika
spelling doaj.art-2306f70ae53d472db904c1fe42d3f3e3
2022-12-21T18:43:00Z
eng
Taylor & Francis Group
Automatika
0005-1144
1848-3380
2018-10-01
59
3-4
349
356
10.1080/00051144.2018.1541645
1541645
Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
Ivica Lukić 0
Željko Hocenski 1
Mirko Köhler 2
Tomislav Galba 3
Josip Juraj Strossmayer University of Osijek
Josip Juraj Strossmayer University of Osijek
Josip Juraj Strossmayer University of Osijek
Josip Juraj Strossmayer University of Osijek
http://dx.doi.org/10.1080/00051144.2018.1541645
Clustering algorithms
data mining
data uncertainty
Euclidean distance
parallel algorithms
spellingShingle Ivica Lukić
Željko Hocenski
Mirko Köhler
Tomislav Galba
Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
Automatika
Clustering algorithms
data mining
data uncertainty
Euclidean distance
parallel algorithms
title Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
title_full Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
title_fullStr Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
title_full_unstemmed Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
title_short Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
title_sort parallel mining of uncertain data using segmentation of data set area and voronoi diagrams
topic Clustering algorithms
data mining
data uncertainty
Euclidean distance
parallel algorithms
url http://dx.doi.org/10.1080/00051144.2018.1541645
work_keys_str_mv AT ivicalukic parallelminingofuncertaindatausingsegmentationofdatasetareaandvoronoidiagrams
AT zeljkohocenski parallelminingofuncertaindatausingsegmentationofdatasetareaandvoronoidiagrams
AT mirkokohler parallelminingofuncertaindatausingsegmentationofdatasetareaandvoronoidiagrams
AT tomislavgalba parallelminingofuncertaindatausingsegmentationofdatasetareaandvoronoidiagrams