A heuristic approach for finding similarity indexes of multivariate data sets

Multivariate data sets (MDSs), with enormous size and a certain ratio of noise/outliers, are generated routinely in various application domains. A major issue, tightly coupled with these MDSs, is how to compute their similarity indexes with available resources in the presence of noise/outliers - which is...

Bibliographic Details
Main Authors:	Khan, Rahim; Zakarya, Muhammad; Khan, Ayaz Ali; Ur Rahman, Izaz; Abd Rahman, Mohd Amiruddin; Abdul Karim, Muhammad Khalis; Mustafa, Mohd Shafie
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers 2020
Online Access:	http://psasir.upm.edu.my/id/eprint/87601/1/ABSTRACT.pdf

author	Khan, Rahim; Zakarya, Muhammad; Khan, Ayaz Ali; Ur Rahman, Izaz; Abd Rahman, Mohd Amiruddin; Abdul Karim, Muhammad Khalis; Mustafa, Mohd Shafie
author_sort	Khan, Rahim
collection	UPM
description	Multivariate data sets (MDSs), with enormous size and a certain ratio of noise/outliers, are generated routinely in various application domains. A major issue, tightly coupled with these MDSs, is how to compute their similarity indexes with available resources in the presence of noise/outliers, which has been addressed through the development of both classical and non-metric based approaches. However, classical techniques are sensitive to outliers, and most of the non-classical approaches are either problem/application specific or overly complex. Therefore, the development of an efficient and reliable algorithm for MDSs, with minimum time and space complexity, is highly encouraged by the research community. In this paper, a non-metric based similarity measure algorithm for MDSs is presented that successfully solves the aforementioned issues, particularly noise and computational time. This technique finds the similarity indexes of noisy MDSs, of both equal and variable sizes, while utilizing the minimum possible resources, i.e., space and time. Experiments were conducted with both benchmark and real-time MDSs to evaluate the proposed algorithm's performance against its rival algorithms, which are traditional dynamic programming based and sequential similarity measure algorithms. Experimental results show that the proposed scheme outperforms its counterpart algorithms in terms of time and space and effectively tolerates a considerable portion of noisy data.
first_indexed	2024-03-06T10:43:53Z
format	Article
id	upm.eprints-87601
institution	Universiti Putra Malaysia
language	English
last_indexed	2024-03-06T10:43:53Z
publishDate	2020
publisher	Institute of Electrical and Electronics Engineers
record_format	dspace
spelling	upm.eprints-87601 2022-07-06T08:17:57Z http://psasir.upm.edu.my/id/eprint/87601/ Institute of Electrical and Electronics Engineers 2020 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/87601/1/ABSTRACT.pdf Khan, Rahim and Zakarya, Muhammad and Khan, Ayaz Ali and Ur Rahman, Izaz and Abd Rahman, Mohd Amiruddin and Abdul Karim, Muhammad Khalis and Mustafa, Mohd Shafie (2020) A heuristic approach for finding similarity indexes of multivariate data sets. IEEE Access, 8. 21759 - 21769. ISSN 2169-3536 https://ieeexplore.ieee.org/document/8963981 10.1109/ACCESS.2020.2968222
title	A heuristic approach for finding similarity indexes of multivariate data sets
url	http://psasir.upm.edu.my/id/eprint/87601/1/ABSTRACT.pdf

Similar Items