D-GENE: Deferring the GENEration of Power Sets for Discovering Frequent Itemsets in Sparse Big Data

Sparseness is the distinctive aspect of big data generated by numerous applications at present. Furthermore, several similar records exist in real-world sparse datasets. Based on Iterative Trimmed Transaction Lattice (ITTL), the recently proposed TRICE algorithm learns frequent itemsets efficiently...


Bibliographic Details
Main Authors:	Muhammad Yasir, Muhammad Asif Habib, Muhammad Ashraf, Shahzad Sarwar, Muhammad Umar Chaudhry, Hamayoun Shahwani, Mudassar Ahmad, CH. Muhammad Nadeem Faisal
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Big data applications; pattern recognition; association rules; frequent item set mining; IoT
Online Access:	https://ieeexplore.ieee.org/document/8984308/

_version_	1818330313506422784
author	Muhammad Yasir; Muhammad Asif Habib; Muhammad Ashraf; Shahzad Sarwar; Muhammad Umar Chaudhry; Hamayoun Shahwani; Mudassar Ahmad; CH. Muhammad Nadeem Faisal
author_sort	Muhammad Yasir
collection	DOAJ
description	Sparseness is a distinctive aspect of the big data generated by many current applications. Furthermore, real-world sparse datasets contain several similar records. Based on the Iterative Trimmed Transaction Lattice (ITTL), the recently proposed TRICE algorithm learns frequent itemsets efficiently from sparse datasets. TRICE stores similar transactions only once and then eliminates the infrequent part of each distinct transaction. However, removing the infrequent part of two or more distinct transactions may result in similar trimmed transactions. TRICE repeatedly generates the ITTLs of similar trimmed transactions, which induces redundant computations and eventually degrades runtime efficiency. This paper presents D-GENE, a technique that optimizes TRICE by introducing a deferred ITTL generation mechanism. D-GENE suspends ITTL generation until the transaction pruning phase is complete. The deferral strategy enables D-GENE to generate the ITTL of similar trimmed transactions only once. Experimental results show that, by avoiding the redundant computations, D-GENE achieves better runtime efficiency. D-GENE outperforms TRICE, FP-growth, and optimized versions of the SaM and RElim algorithms, especially when the difference between the number of distinct transactions and the number of distinct trimmed transactions is high.
first_indexed	2024-12-13T13:01:58Z
format	Article
id	doaj.art-74f12ce047464e3dadab0434394163e9
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-13T13:01:58Z
publishDate	2020-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-74f12ce047464e3dadab0434394163e9; 2022-12-21T23:44:58Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 27375-27392; DOI 10.1109/ACCESS.2020.2971834; IEEE document 8984308
	Muhammad Yasir (https://orcid.org/0000-0002-7106-6598), Department of Computer Science, National Textile University, Faisalabad, Pakistan
	Muhammad Asif Habib (https://orcid.org/0000-0002-2675-1975), Department of Computer Science, National Textile University, Faisalabad, Pakistan
	Muhammad Ashraf (https://orcid.org/0000-0001-8721-0921), Department of Computer Engineering, Balochistan University of Information Technology, Engineering, and Management Sciences, Quetta, Pakistan
	Shahzad Sarwar (https://orcid.org/0000-0003-3074-9162), Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
	Muhammad Umar Chaudhry (https://orcid.org/0000-0002-7287-2372), Department of Computer Science, National College of Business Administration and Economics, Multan, Pakistan
	Hamayoun Shahwani (https://orcid.org/0000-0003-2211-8360), Department of Telecommunications, Balochistan University of Information Technology, Engineering, and Management Sciences, Quetta, Pakistan
	Mudassar Ahmad (https://orcid.org/0000-0002-6366-8230), Department of Computer Science, National Textile University, Faisalabad, Pakistan
	CH. Muhammad Nadeem Faisal (https://orcid.org/0000-0001-8781-4143), Department of Computer Science, National Textile University, Faisalabad, Pakistan
title	D-GENE: Deferring the GENEration of Power Sets for Discovering Frequent Itemsets in Sparse Big Data
topic	Big data applications; pattern recognition; association rules; frequent item set mining; IoT
url	https://ieeexplore.ieee.org/document/8984308/
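
The deferral idea in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `mine_deferred` and its parameters are hypothetical, and a plain power-set enumeration stands in for the ITTL, whose structure this record does not specify. The sketch trims infrequent items from every transaction first, merges trimmed transactions that turn out identical, and only then enumerates subsets, once per distinct trimmed transaction.

```python
from collections import Counter
from itertools import combinations


def mine_deferred(transactions, min_support):
    """Return {itemset (sorted tuple): support} for all frequent itemsets.

    Hypothetical sketch: subset enumeration stands in for ITTL generation.
    """
    # Pass 1: count the support of each single item.
    item_support = Counter()
    for t in transactions:
        item_support.update(set(t))
    frequent_items = {i for i, c in item_support.items() if c >= min_support}

    # Pass 2: trim infrequent items and merge identical trimmed transactions,
    # keeping only a multiplicity for each distinct trimmed transaction.
    trimmed = Counter()
    for t in transactions:
        key = frozenset(t) & frequent_items
        if key:
            trimmed[key] += 1

    # Deferred step: enumerate the subsets of each distinct trimmed
    # transaction once, weighting every subset by the multiplicity.
    support = Counter()
    for key, count in trimmed.items():
        items = sorted(key)
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                support[subset] += count

    return {s: c for s, c in support.items() if c >= min_support}


example = [["a", "b", "x"], ["a", "b", "y"], ["a", "c"], ["b"]]
result = mine_deferred(example, min_support=2)
```

In `example`, the first two transactions are distinct but trim to the same set {a, b}, so their subsets are enumerated once with a weight of 2 instead of twice. The enumeration is exponential in the length of a trimmed transaction, which is tolerable only because sparse data yields short trimmed transactions.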


Similar Items