A distance metric for ordinal data based on misclassification

Bibliographic Details
Main Author:	Dreas Nielsen
Format:	Article
Language:	English
Published:	Institute of Sciences and Technology, University Center Abdelhafid Boussouf, Mila 2024-01-01
Series:	Journal of Innovative Applied Mathematics and Computational Sciences
Subjects:	ordinal, distance, multinomial, categorical, misclassification
Online Access:	https://jiamcs.centre-univ-mila.dz/index.php/jiamcs/article/view/83

collection	DOAJ
description	Distances between data sets are used in analyses such as classification and clustering. Some existing distance metrics, such as the Manhattan (City Block or L1) distance, are suitable for use with categorical data when the data subtype is numeric or, more specifically, integer. However, ordinality of categories imposes additional constraints on data distributions, and the ordering of categories should be considered in the calculation of distances. A new distance metric is presented here that is based on the number of misclassifications that must have occurred within one data set if it were in fact identical to another data set. This "misclassification distance" is equivalent to the number of reclassifications necessary to transform one data set into another. The metric takes account not only of the numbers of observations in corresponding ordinal categories, but also of the number of categories across which observations must be moved to correct all misclassifications. Each stepwise movement of an observation across one or more categories that is required to equalize the distributions increases the distance metric; the method is therefore referred to as a stepwise ordinal misclassification distance (SOMD). An algorithm is provided for the calculation of this metric.
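
The idea in the description can be illustrated with a minimal Python sketch. It is not the algorithm provided in the article, which this record does not reproduce. It assumes that both data sets are given as counts per ordinal category, that the totals are equal, and that moving one observation across k adjacent categories adds k to the distance. The function names `stepwise_distance` and `manhattan_distance` are hypothetical.

```python
from itertools import accumulate


def manhattan_distance(counts_a, counts_b):
    """L1 distance between two count vectors; ignores category order."""
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b))


def stepwise_distance(counts_a, counts_b):
    """Illustrative order-aware distance (an assumption, not the published SOMD).

    Counts the single-category steps needed to turn counts_a into counts_b
    when observations may only move between adjacent ordinal categories.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both data sets need the same number of categories")
    if sum(counts_a) != sum(counts_b):
        raise ValueError("this sketch assumes equal numbers of observations")
    # Surplus (positive) or deficit (negative) in each category.
    diffs = [a - b for a, b in zip(counts_a, counts_b)]
    # The running surplus after each category is the number of observations
    # that must cross the boundary to the next category.
    return sum(abs(surplus) for surplus in accumulate(diffs))


# One observation misplaced by one category versus by two categories.
near = stepwise_distance([1, 0, 0], [0, 1, 0])
far = stepwise_distance([1, 0, 0], [0, 0, 1])
print(near, far)
print(manhattan_distance([1, 0, 0], [0, 1, 0]),
      manhattan_distance([1, 0, 0], [0, 0, 1]))
```

Under these assumptions the order-aware distance grows with the number of categories crossed, while the Manhattan distance is the same for both pairs. This is the kind of ordering sensitivity the description argues for.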
id	doaj.art-b33f1a1ec0eb418ca2cdb74f01cbc823
institution	Directory Open Access Journal
issn	2773-4196
spelling	doaj.art-b33f1a1ec0eb418ca2cdb74f01cbc823; 2024-01-20T21:50:35Z; eng; Institute of Sciences and Technology, University Center Abdelhafid Boussouf, Mila; Journal of Innovative Applied Mathematics and Computational Sciences; 2773-4196; 2024-01-01; 3; 2; 10.58205/jiamcs.v3i2.83; A distance metric for ordinal data based on misclassification; Dreas Nielsen (Integral Consulting Inc., 508 Yale Ave. N. Suite 204, Seattle WA 98109, United States); https://jiamcs.centre-univ-mila.dz/index.php/jiamcs/article/view/83; ordinal; distance; multinomial, categorical, misclassification

Similar Items