CUDA-Based Parallelization of Power Iteration Clustering for Large Datasets
This paper presents a new clustering algorithm, the GPIC, a graphics processing unit (GPU) accelerated algorithm for power iteration clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture, maintaining the algorithm's original pro...
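Main Authors: | Gustavo Rodrigues Lacerda Silva, Rafael Ribeiro De Medeiros, Brayan Rene Acevedo Jaimes, Carla Caldeira Takahashi, Douglas Alexandre Gomes Vieira, Antonio De Padua Braga |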
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2017-01-01 |
Series: | IEEE Access |
Subjects: | Scalable machine learning algorithms, GPU, power iteration clustering |
Online Access: | https://ieeexplore.ieee.org/document/8078163/ |
_version_ | 1818641502495047680 |
---|---|
author | Gustavo Rodrigues Lacerda Silva; Rafael Ribeiro De Medeiros; Brayan Rene Acevedo Jaimes; Carla Caldeira Takahashi; Douglas Alexandre Gomes Vieira; Antonio De Padua Braga |
author_sort | Gustavo Rodrigues Lacerda Silva |
collection | DOAJ |
description | This paper presents GPIC, a graphics processing unit (GPU) accelerated algorithm for power iteration clustering (PIC). Our algorithm is based on the original PIC proposal and is adapted to take advantage of the GPU architecture while maintaining the original algorithm's properties. The proposed method was compared against the serial implementation and achieved a considerable speedup in tests with synthetic and real data sets. A large volume of real application data (>10^7 records) was used, and the GPIC implementation was found to scale well to data sets with millions of data points. Our implementation effort targets two goals: processing large data sets in less time and maintaining the same quality of clustering results as the original PIC version. |
first_indexed | 2024-12-16T23:28:11Z |
format | Article |
id | doaj.art-8a482fd8f0b94d1f8109255913798040 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T23:28:11Z |
publishDate | 2017-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-8a482fd8f0b94d1f8109255913798040; 2022-12-21T22:11:57Z; eng; IEEE; IEEE Access; 2169-3536; 2017-01-01; 5; 27263; 27271; 10.1109/ACCESS.2017.2765380; 8078163; CUDA-Based Parallelization of Power Iteration Clustering for Large Datasets; Gustavo Rodrigues Lacerda Silva (0, https://orcid.org/0000-0002-1436-8485); Rafael Ribeiro De Medeiros (1); Brayan Rene Acevedo Jaimes (2); Carla Caldeira Takahashi (3); Douglas Alexandre Gomes Vieira (4); Antonio De Padua Braga (5); Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil; ENACOM Handcrafted Technologies, Belo Horizonte, Brazil; Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil; Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil; ENACOM Handcrafted Technologies, Belo Horizonte, Brazil; Graduate Program in Electrical Engineering - Universidade Federal de Minas Gerais - Av. Antônio Carlos 6627, Belo Horizonte, MG, Brazil; This paper presents GPIC, a graphics processing unit (GPU) accelerated algorithm for power iteration clustering (PIC). Our algorithm is based on the original PIC proposal and is adapted to take advantage of the GPU architecture while maintaining the original algorithm's properties. The proposed method was compared against the serial implementation and achieved a considerable speedup in tests with synthetic and real data sets. A large volume of real application data (>10^7 records) was used, and the GPIC implementation was found to scale well to data sets with millions of data points. Our implementation effort targets two goals: processing large data sets in less time and maintaining the same quality of clustering results as the original PIC version.; https://ieeexplore.ieee.org/document/8078163/; Scalable machine learning algorithms; GPU; power iteration clustering |
title | CUDA-Based Parallelization of Power Iteration Clustering for Large Datasets |
topic | Scalable machine learning algorithms; GPU; power iteration clustering |