A direct ensemble classifier for learning imbalanced multiclass data

Bibliographic details
Main author: Samry @ Mohd Shamrie Sainin
Format: Doctoral thesis
Language: English
Published: 2013
Institution: Universiti Malaysia Sabah
Subjects: TK7885-7895 Computer engineering. Computer hardware
Online access: https://eprints.ums.edu.my/id/eprint/38557/1/24%20PAGES.pdf
https://eprints.ums.edu.my/id/eprint/38557/2/FULLTEXT.pdf

Description
A traditional single classifier can be applied directly to a multiclass classification problem. However, its performance degrades when the data are imbalanced across classes. An ensemble of classifiers is therefore one of the methods used to address multiclass classification tasks. This thesis studies the problem of learning from imbalanced multiclass data. In multiclass classification, a decision can be expressed not only as a single final class label but also in terms of other plausible classes, and many real-world multiclass problems can be framed as settings in which such non-crisp labels must be considered. The thesis provides an in-depth review of this learning task and a method for solving it. An alternative ensemble learning framework, the Direct Ensemble Classifier for Imbalance Learning (DECIML), is proposed; it combines the advantages of existing single classifiers with those of ensemble methods and strategies. The framework consists of an ensemble learning stage and a decision combiner model, with general supervised learning algorithms as base learners. Feature selection is also incorporated into DECIML to improve the performance of the ensemble. To support the experiments and future research on imbalanced multiclass problems, a standard benchmark pool is created, consisting of 16 datasets with varying imbalance ratios and 4 imbalanced multiclass datasets intended for feature selection experiments. The benchmark is used to evaluate the proposed frameworks against several ensemble methods, such as bagging and AdaBoost. DECIML with feature selection is also evaluated against the CfsSubsetEval and FilteredSubsetEval feature selection methods. The results show that the proposed frameworks perform comparably to the other methods.
In addition, the selected benchmark data, experiments, and results are useful for future research on imbalanced multiclass classification. Furthermore, the DECIML framework was applied to a real-world leaf classification problem based on shape features. Extensive experiments and results show that DECIML delivers promising performance on imbalanced multiclass data with a high level of noise.
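The abstract describes DECIML only at a high level: supervised base learners trained as an ensemble, a decision combiner on top, and decisions that can rank other plausible classes instead of returning one crisp label. The sketch below is not DECIML; it is a minimal, generic illustration of those ideas under stated assumptions. It assumes a nearest-centroid base learner, class-balanced bootstrap samples (a balanced-bagging variant) to counter imbalance, and a plain voting combiner that returns each class's vote share, best first. It also computes the imbalance ratio as largest class size divided by smallest class size, a common definition that may differ from the one used in the thesis. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class / size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def train_centroids(X, y):
    """Hypothetical base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def balanced_bootstrap(X, y, rng):
    """Resample every class with replacement up to the size of the largest class."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = max(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for c, rows in by_class.items():
        for _ in range(n):
            Xs.append(rng.choice(rows))
            ys.append(c)
    return Xs, ys


def train_ensemble(X, y, n_members=11, seed=0):
    """Train each ensemble member on its own class-balanced bootstrap sample."""
    rng = random.Random(seed)
    return [train_centroids(*balanced_bootstrap(X, y, rng)) for _ in range(n_members)]


def combine(ensemble, x):
    """Voting combiner: (class, vote share) pairs, highest share first."""
    votes = Counter(predict_centroid(m, x) for m in ensemble)
    return [(c, v / len(ensemble)) for c, v in votes.most_common()]


# Toy data, invented for illustration: 6 examples of "a", 2 of "b", 1 of "c".
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.25, 0.05], [0.15, 0.2],
     [5.0, 5.0], [5.2, 4.9],
     [0.0, 5.0]]
y = ["a"] * 6 + ["b"] * 2 + ["c"]

print(imbalance_ratio(y))  # 6.0
ensemble = train_ensemble(X, y)
print(combine(ensemble, [5.1, 5.0]))  # [('b', 1.0)]
```

Returning vote shares rather than a single winner keeps runner-up classes visible, which is one simple way to express a non-crisp decision; averaging per-class scores from probabilistic base learners would be a natural alternative combiner.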

Citation: Samry @ Mohd Shamrie Sainin (2013) A direct ensemble classifier for learning imbalanced multiclass data. Doctoral thesis, Universiti Malaysia Sabah.
Record: https://eprints.ums.edu.my/id/eprint/38557/