Summary: | High-dimensional datasets contain few rows but a very large number of features, which makes it increasingly difficult for data scientists to extract insights from them efficiently. To address this, a K-Means clustering (KMC) method is used to cleanse the data, with the KMC centroids chosen by an improved squirrel search algorithm (ISSA). The proposed algorithm features two types of search: the leaping search and the progressive search. A linear regression selection strategy is utilised to choose the applicable search automatically during the evolutionary phase, which increases SSA's stability. The Harris Hawks Optimizer (HHO), which mimics the behaviour of Harris hawks preying on rabbits, is then applied to the cleaned data to cluster the massive patterns. However, HHO suffers from low accuracy and premature convergence because it cannot strike a good balance between exploration and exploitation. A new variant of HHO, dubbed velocity-guided HHO (VGHHO), with three improvements is proposed to address these drawbacks. By including a velocity operator and an inertia weight in the search equation, we obtain a modified position search equation for use during the exploitation phase. We then incorporate a learning mechanism based on refraction and opposition to generate promising solutions and help the swarm escape local optima. After the massive clusters have been built with VGHHO, uproot technology is used to uncover massive patterns. We run tests on a wide variety of high-dimensional datasets and evaluate them with several efficiency metrics. The results show that the proposed method produces high-quality mining results.
|
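The two VGHHO ingredients named in the summary, a velocity operator with an inertia weight and refraction-opposition learning, can be illustrated with a minimal sketch. The summary does not give the actual equations, so everything here is an assumption: the velocity update borrows the familiar particle-swarm form, the refraction formula is a commonly used textbook variant, and the function names, `inertia`, `accel`, and `k` are hypothetical.

```python
import random


def velocity_guided_step(position, velocity, best, inertia=0.7, accel=1.5, rng=random):
    """One PSO-style update (an assumed form, not the paper's equation).

    The new velocity blends the old velocity, scaled by the inertia weight,
    with a random pull toward the best-known position (the "rabbit").
    """
    new_velocity = [
        inertia * v + accel * rng.random() * (b - x)
        for x, v, b in zip(position, velocity, best)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


def refraction_opposite(position, lower, upper, k=1.0):
    """Refraction-style opposite point within [lower, upper] (assumed form).

    With k = 1 this reduces to plain opposition: lower + upper - x.
    """
    return [
        (lo + hi) / 2 + (lo + hi) / (2 * k) - x / k
        for x, lo, hi in zip(position, lower, upper)
    ]


if __name__ == "__main__":
    hawk = [1.0, 4.0]
    rabbit = [3.0, 3.0]
    hawk, vel = velocity_guided_step(hawk, [0.0, 0.0], rabbit)
    print("moved hawk:", hawk)
    print("opposite candidate:", refraction_opposite(hawk, [0.0, 0.0], [10.0, 10.0]))
```

In a full optimizer, the opposite candidate would typically replace the current position only when it scores better on the fitness function, which is how such a mechanism can help the swarm leave a local optimum.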