Why Is Multiclass Classification Hard?

In classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision bou...

Full description

Bibliographic Details
Main Authors: Pablo Del Moral, Slawomir Nowaczyk, Sepideh Pashami
Format: Article
Language:English
Published: IEEE 2022-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9839332/
_version_ 1798040388109336576
author Pablo Del Moral
Slawomir Nowaczyk
Sepideh Pashami
author_facet Pablo Del Moral
Slawomir Nowaczyk
Sepideh Pashami
author_sort Pablo Del Moral
collection DOAJ
description In classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision boundary individually is better than using the same learning algorithm on several of them simultaneously. However, why and when it happens is still not well-understood today. This work’s main contribution is to introduce the concept of heterogeneity of decision boundaries as an explanation of this phenomenon. Based on the definition of heterogeneity of decision boundaries, we analyze and explain the differences in the performance of state of the art approaches to solve multi-class classification. We demonstrate that as the heterogeneity increases, the performances of all approaches, except one-vs-one, decrease. We show that by correctly encoding the knowledge of the heterogeneity of decision boundaries in a decomposition of the multi-class problem, we can obtain better results than state of the art decompositions. The benefits can be an increase in classification performance or a decrease in the time it takes to train and evaluate the models. We first provide intuitions and illustrate the effects of the heterogeneity of decision boundaries using synthetic datasets and a simplistic classifier. Then, we demonstrate how a real dataset exhibits these same principles, also under realistic learning algorithms. In this setting, we devise a method to quantify the heterogeneity of different decision boundaries, and use it to decompose the multi-class problem. The results show significant improvements over state-of-the-art decompositions that do not take the heterogeneity of decision boundaries into account.
first_indexed 2024-04-11T22:06:56Z
format Article
id doaj.art-2dcc9192e6a04bbb9ab8b167f8525779
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-11T22:06:56Z
publishDate 2022-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2dcc9192e6a04bbb9ab8b167f85257792022-12-22T04:00:40ZengIEEEIEEE Access2169-35362022-01-0110804488046210.1109/ACCESS.2022.31925149839332Why Is Multiclass Classification Hard?Pablo Del Moral0https://orcid.org/0000-0001-5395-5482Slawomir Nowaczyk1https://orcid.org/0000-0002-7796-5201Sepideh Pashami2https://orcid.org/0000-0003-3272-4145Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenCenter for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenCenter for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenIn classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision boundary individually is better than using the same learning algorithm on several of them simultaneously. However, why and when it happens is still not well-understood today. This work’s main contribution is to introduce the concept of heterogeneity of decision boundaries as an explanation of this phenomenon. Based on the definition of heterogeneity of decision boundaries, we analyze and explain the differences in the performance of state of the art approaches to solve multi-class classification. We demonstrate that as the heterogeneity increases, the performances of all approaches, except one-vs-one, decrease. We show that by correctly encoding the knowledge of the heterogeneity of decision boundaries in a decomposition of the multi-class problem, we can obtain better results than state of the art decompositions. The benefits can be an increase in classification performance or a decrease in the time it takes to train and evaluate the models. We first provide intuitions and illustrate the effects of the heterogeneity of decision boundaries using synthetic datasets and a simplistic classifier. Then, we demonstrate how a real dataset exhibits these same principles, also under realistic learning algorithms. In this setting, we devise a method to quantify the heterogeneity of different decision boundaries, and use it to decompose the multi-class problem. The results show significant improvements over state-of-the-art decompositions that do not take the heterogeneity of decision boundaries into account.https://ieeexplore.ieee.org/document/9839332/Classification complexityheterogeneity of decision boundariesmulti-class classification
spellingShingle Pablo Del Moral
Slawomir Nowaczyk
Sepideh Pashami
Why Is Multiclass Classification Hard?
IEEE Access
Classification complexity
heterogeneity of decision boundaries
multi-class classification
title Why Is Multiclass Classification Hard?
title_full Why Is Multiclass Classification Hard?
title_fullStr Why Is Multiclass Classification Hard?
title_full_unstemmed Why Is Multiclass Classification Hard?
title_short Why Is Multiclass Classification Hard?
title_sort why is multiclass classification hard
topic Classification complexity
heterogeneity of decision boundaries
multi-class classification
url https://ieeexplore.ieee.org/document/9839332/
work_keys_str_mv AT pablodelmoral whyismulticlassclassificationhard
AT slawomirnowaczyk whyismulticlassclassificationhard
AT sepidehpashami whyismulticlassclassificationhard