CUDA-Based Parallelization of Power Iteration Clustering for Large Datasets

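This paper presents a new clustering algorithm, the GPIC, a graphics processing unit (GPU) accelerated algorithm for power iteration clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture, maintaining the algorithm's original properties. The proposed method was compared against the serial implementation, achieving a considerable speedup in tests with synthetic and real data sets. A significant volume of real data application (>10^7 records) was used, and we identified that GPIC implementation has good scalability to handle data sets with millions of data points. Our implementation efforts are directed towards two aspects: to process large data sets in less time and to maintain the same quality of the clustering results generated by the original PIC version.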
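
For readers unfamiliar with PIC, the sketch below illustrates the general idea behind the serial baseline mentioned in the abstract: repeatedly multiply a vector by the row-normalized affinity matrix, stop early, and cluster the resulting one-dimensional embedding. It is a minimal pure-Python illustration of the commonly described PIC procedure, not the authors' GPIC code. The function names, the Gaussian affinity, the `sigma` value, the stopping threshold, and the toy data are all assumptions made for this example.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix with a zero diagonal (assumed similarity function)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                A[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A

def power_iteration_embedding(A, max_iter=50, eps=1e-5):
    """Truncated power iteration on W = D^-1 A; returns a 1-D embedding.

    Assumes every row of A has a non-zero sum (no isolated points).
    """
    n = len(A)
    degrees = [sum(row) for row in A]
    W = [[A[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    total = sum(degrees)
    v = [d / total for d in degrees]  # degree-based starting vector
    prev_delta = float("inf")
    for _ in range(max_iter):
        # Matrix-vector product: the step a GPU version would parallelize.
        new_v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in new_v)  # L1 normalization
        new_v = [x / norm for x in new_v]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Stop when the change between iterations has itself stopped changing.
        if abs(prev_delta - delta) < eps:
            break
        prev_delta = delta
    return v

def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on scalars (k >= 2), centers initialized evenly over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Toy data: two well-separated 1-D groups of different size (hypothetical).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (5.3,), (5.4,)]
embedding = power_iteration_embedding(gaussian_affinity(points))
labels = kmeans_1d(embedding, k=2)
```

In this sketch the cost is dominated by building the affinity matrix and by the repeated matrix-vector product, which are presumably the steps a GPU implementation would target; the dense list-of-lists representation is chosen only for readability and would not scale to the data sizes the abstract reports.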

Bibliographic Details
Main Authors: Gustavo Rodrigues Lacerda Silva, Rafael Ribeiro De Medeiros, Brayan Rene Acevedo Jaimes, Carla Caldeira Takahashi, Douglas Alexandre Gomes Vieira, Antonio De Padua Braga
Format: Article
Language: English
Published: IEEE 2017-01-01
Series: IEEE Access
Subjects: Scalable machine learning algorithms, GPU, power iteration clustering
Online Access: https://ieeexplore.ieee.org/document/8078163/
collection DOAJ
description This paper presents a new clustering algorithm, the GPIC, a graphics processing unit (GPU) accelerated algorithm for power iteration clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture, maintaining the algorithm's original properties. The proposed method was compared against the serial implementation, achieving a considerable speedup in tests with synthetic and real data sets. A significant volume of real data application (>10^7 records) was used, and we identified that GPIC implementation has good scalability to handle data sets with millions of data points. Our implementation efforts are directed towards two aspects: to process large data sets in less time and to maintain the same quality of the clustering results generated by the original PIC version.
id doaj.art-8a482fd8f0b94d1f8109255913798040
institution Directory Open Access Journal
issn 2169-3536
spelling doaj.art-8a482fd8f0b94d1f8109255913798040
2022-12-21T22:11:57Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2017-01-01
vol. 5, pp. 27263-27271
DOI 10.1109/ACCESS.2017.2765380
IEEE Xplore document 8078163
CUDA-Based Parallelization of Power Iteration Clustering for Large Datasets
Gustavo Rodrigues Lacerda Silva (https://orcid.org/0000-0002-1436-8485): Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil
Rafael Ribeiro De Medeiros: ENACOM Handcrafted Technologies, Belo Horizonte, Brazil
Brayan Rene Acevedo Jaimes: Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil
Carla Caldeira Takahashi: Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil
Douglas Alexandre Gomes Vieira: ENACOM Handcrafted Technologies, Belo Horizonte, Brazil
Antonio De Padua Braga: Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil
topic Scalable machine learning algorithms
GPU
power iteration clustering