Why Is Multiclass Classification Hard?
In classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision bou...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2022-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9839332/ |
_version_ | 1798040388109336576 |
---|---|
author | Pablo Del Moral Slawomir Nowaczyk Sepideh Pashami |
author_facet | Pablo Del Moral Slawomir Nowaczyk Sepideh Pashami |
author_sort | Pablo Del Moral |
collection | DOAJ |
description | In classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision boundary individually is better than using the same learning algorithm on several of them simultaneously. However, why and when it happens is still not well-understood today. This work’s main contribution is to introduce the concept of heterogeneity of decision boundaries as an explanation of this phenomenon. Based on the definition of heterogeneity of decision boundaries, we analyze and explain the differences in the performance of state of the art approaches to solve multi-class classification. We demonstrate that as the heterogeneity increases, the performances of all approaches, except one-vs-one, decrease. We show that by correctly encoding the knowledge of the heterogeneity of decision boundaries in a decomposition of the multi-class problem, we can obtain better results than state of the art decompositions. The benefits can be an increase in classification performance or a decrease in the time it takes to train and evaluate the models. We first provide intuitions and illustrate the effects of the heterogeneity of decision boundaries using synthetic datasets and a simplistic classifier. Then, we demonstrate how a real dataset exhibits these same principles, also under realistic learning algorithms. In this setting, we devise a method to quantify the heterogeneity of different decision boundaries, and use it to decompose the multi-class problem. The results show significant improvements over state-of-the-art decompositions that do not take the heterogeneity of decision boundaries into account. |
first_indexed | 2024-04-11T22:06:56Z |
format | Article |
id | doaj.art-2dcc9192e6a04bbb9ab8b167f8525779 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T22:06:56Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-2dcc9192e6a04bbb9ab8b167f85257792022-12-22T04:00:40ZengIEEEIEEE Access2169-35362022-01-0110804488046210.1109/ACCESS.2022.31925149839332Why Is Multiclass Classification Hard?Pablo Del Moral0https://orcid.org/0000-0001-5395-5482Slawomir Nowaczyk1https://orcid.org/0000-0002-7796-5201Sepideh Pashami2https://orcid.org/0000-0003-3272-4145Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenCenter for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenCenter for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, SwedenIn classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision boundary individually is better than using the same learning algorithm on several of them simultaneously. However, why and when it happens is still not well-understood today. This work’s main contribution is to introduce the concept of heterogeneity of decision boundaries as an explanation of this phenomenon. Based on the definition of heterogeneity of decision boundaries, we analyze and explain the differences in the performance of state of the art approaches to solve multi-class classification. We demonstrate that as the heterogeneity increases, the performances of all approaches, except one-vs-one, decrease. We show that by correctly encoding the knowledge of the heterogeneity of decision boundaries in a decomposition of the multi-class problem, we can obtain better results than state of the art decompositions. The benefits can be an increase in classification performance or a decrease in the time it takes to train and evaluate the models. We first provide intuitions and illustrate the effects of the heterogeneity of decision boundaries using synthetic datasets and a simplistic classifier. Then, we demonstrate how a real dataset exhibits these same principles, also under realistic learning algorithms. In this setting, we devise a method to quantify the heterogeneity of different decision boundaries, and use it to decompose the multi-class problem. The results show significant improvements over state-of-the-art decompositions that do not take the heterogeneity of decision boundaries into account.https://ieeexplore.ieee.org/document/9839332/Classification complexityheterogeneity of decision boundariesmulti-class classification |
spellingShingle | Pablo Del Moral Slawomir Nowaczyk Sepideh Pashami Why Is Multiclass Classification Hard? IEEE Access Classification complexity heterogeneity of decision boundaries multi-class classification |
title | Why Is Multiclass Classification Hard? |
title_full | Why Is Multiclass Classification Hard? |
title_fullStr | Why Is Multiclass Classification Hard? |
title_full_unstemmed | Why Is Multiclass Classification Hard? |
title_short | Why Is Multiclass Classification Hard? |
title_sort | why is multiclass classification hard |
topic | Classification complexity heterogeneity of decision boundaries multi-class classification |
url | https://ieeexplore.ieee.org/document/9839332/ |
work_keys_str_mv | AT pablodelmoral whyismulticlassclassificationhard AT slawomirnowaczyk whyismulticlassclassificationhard AT sepidehpashami whyismulticlassclassificationhard |