A General Framework for Sorting Large Data Sets Using Independent Subarrays of Approximately Equal Length

Bibliographic Details
Main Authors: Shahriar Shirvani Moghaddam, Kiaksar Shirvani Moghaddam
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9690854/
Description
Summary: Designing an efficient data sorting algorithm with low time and space complexity is essential for computer science, various engineering disciplines, data mining systems, wireless networks, and the Internet of Things. This paper proposes a general low-complexity data sorting framework that distinguishes sorted or similar data, forms independent subarrays of approximately equal length, and sorts the data in each subarray using one of the popular comparison-based sorting algorithms. Two frameworks, one for serial realization and another for parallel realization, are proposed. The time complexity analyses of the proposed framework demonstrate an improvement compared to the conventional Merge and Quick sorting algorithms. Consistent with the complexity analysis, the simulation results indicate slight improvements in the elapsed time and the number of swaps of the proposed serial Merge-based and Quick-based frameworks compared to the conventional ones for low/high variance integer/non-integer data sets, across different data sizes and numbers of divisions. For 1, 2, 3, and 4 divisions, the elapsed-time improvements are about 1-1.6% for small data sets to 3.5-4% for very large data sets in the Merge-based scenario, and about 0.3-1.8% to 2-4% in the Quick-based scenario. Although these improvements in serial realization are minor, making independent low-variance subarrays allows the sorted components to be extracted sequentially and gradually before the end of the sorting process. The paper also proposes a general framework for parallelizing conventional sorting algorithms using non-connected (independent) or connected (dependent) multi-core structures.
In the second experiment, numerical analyses comparing the parallel realization of the proposed framework with the serial one for 1, 2, 3, and 4 divisions show a speedup factor of 2-4 for small data sets up to 2-16 for very large data sets. The third experiment shows the effectiveness of the proposed parallel framework relative to parallel sorting based on the random-access machine model. Finally, we prove that the mean-based pivot is as efficient as the median-based pivot and much better than the random pivot for making subarrays of approximately equal length.
ISSN: 2169-3536
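
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the summary describes splitting the data into independent subarrays around a mean-based pivot and then sorting each subarray with a conventional comparison sort. The Python below is one possible reading of that idea. The names split_by_mean and framework_sort are hypothetical, the assumption that d divisions yield up to 2^d subarrays is an inference from the reported speedup ranges, and the step that detects already sorted or similar data is omitted.

```python
from statistics import fmean


def split_by_mean(data, divisions):
    """Split `data` around the mean of each part, `divisions` times.

    Returns an ordered list of subarrays: every element of one subarray
    is <= every element of the subarrays that follow it.
    (Hypothetical helper; assumes d divisions give up to 2**d parts.)
    """
    parts = [list(data)]
    for _ in range(divisions):
        next_parts = []
        for part in parts:
            if len(part) < 2:
                next_parts.append(part)  # nothing left to split
                continue
            pivot = fmean(part)  # mean-based pivot
            low = [x for x in part if x <= pivot]
            high = [x for x in part if x > pivot]
            # Drop an empty side (e.g. when all values are equal).
            next_parts.extend(p for p in (low, high) if p)
        parts = next_parts
    return parts


def framework_sort(data, divisions=2, base_sort=sorted):
    """Sort each independent subarray with `base_sort` and concatenate.

    `base_sort` stands in for a conventional comparison sort such as
    Merge or Quick sort; Python's built-in `sorted` is used by default.
    """
    result = []
    for part in split_by_mean(data, divisions):
        result.extend(base_sort(part))  # parts are already in order
    return result


if __name__ == "__main__":
    sample = [5, 3, 9, 1, 7, 2, 8]
    print(split_by_mean(sample, 1))
    print(framework_sort(sample, divisions=2))
```

Because no subarray shares a value range with another, each call to base_sort is independent of the rest. In a parallel setting the calls could be dispatched to separate workers, and in a serial setting the leading subarrays could be emitted as soon as they are sorted, which matches the gradual extraction the summary mentions.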