Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification
Most data stream classification methods need plenty of labeled samples to achieve reasonable results. However, in a real data stream environment, labeled samples, unlike unlabeled ones, are difficult and expensive to obtain. Active learning is one way to tackle this challenge, but it...
Main Authors: | MohammadReza Keyvanpour, Mahnoosh Kholghi, Sogol Haghani |
---|---|
Format: | Article |
Language: | English |
Published: | Iran Telecom Research Center, 2017-12-01 |
Series: | International Journal of Information and Communication Technology Research |
Subjects: | computer science, data mining, semi-supervised learning, classification, data stream |
Online Access: | http://ijict.itrc.ac.ir/article-1-26-en.html |
_version_ | 1811169229351682048 |
---|---|
author | MohammadReza Keyvanpour Mahnoosh Kholghi Sogol Haghani |
author_facet | MohammadReza Keyvanpour Mahnoosh Kholghi Sogol Haghani |
author_sort | MohammadReza Keyvanpour |
collection | DOAJ |
description | Most data stream classification methods need plenty of labeled samples to achieve reasonable results. However, in a real data stream environment, labeled samples, unlike unlabeled ones, are difficult and expensive to obtain. Active learning is one way to tackle this challenge, but it ignores the unlabeled instances, whose use can strengthen supervised learning. This paper proposes a hybrid framework named “DSeSAL”, which combines active learning and dynamic self-training to gain the strengths of both. The framework also introduces variance-based self-training, which uses minimal variance as its confidence measure. Since an early mistake by the base classifier in self-training can reinforce itself by generating incorrectly labeled data, especially in the multi-class setting, a dynamic approach is adopted to avoid deterioration of classifier accuracy. The framework can also control the accuracy reduction through a user-specified tolerance measure. To overcome the data stream challenges of infinite length and an evolving nature, we use the chunking method along with a classifier ensemble: a classifier is trained on each chunk and, together with the previous classifiers, forms an ensemble of M such classifiers. Experimental results on synthetic and real-world data demonstrate the performance of the proposed framework in comparison with other approaches. |
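The abstract describes a chunk-based ensemble that combines active learning with variance-based self-training, but gives no implementation details. The sketch below is a hypothetical reconstruction of that loop, not the paper's actual algorithm: all names, the query budget, the confidence quantile, and the nearest-centroid base learner are our own assumptions. Per chunk it (1) queries true labels for the points the ensemble disagrees on most, (2) pseudo-labels the points with minimal prediction variance across members, and (3) trains a new classifier, keeping only the newest M members.

```python
import numpy as np

class CentroidClassifier:
    """Tiny stand-in base learner: predicts the nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def process_chunk(ensemble, X, y_true, budget=5, conf_quantile=0.2, M=3):
    """One hypothetical DSeSAL-style pass over a chunk.

    budget        -- number of active-learning label queries per chunk (assumed)
    conf_quantile -- fraction of lowest-variance points to pseudo-label (assumed)
    M             -- ensemble size; oldest classifiers are dropped (from the abstract)
    """
    if ensemble:
        votes = np.stack([clf.predict(X) for clf in ensemble])        # (n_clf, n)
        # Per-instance majority vote and disagreement: the fraction of members
        # not matching the majority serves as a simple prediction-variance proxy.
        maj = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        disagree = (votes != maj).mean(axis=0)
        # Active learning: ask an oracle for the most-disputed instances.
        queried = np.argsort(-disagree)[:budget]
        # Self-training: trust only the minimal-variance instances.
        confident = disagree <= np.quantile(disagree, conf_quantile)
        confident[queried] = False
        X_train = np.vstack([X[queried], X[confident]])
        y_train = np.concatenate([y_true[queried], maj[confident]])    # pseudo-labels
    else:
        # Cold start: no ensemble yet, so just query the first `budget` labels.
        queried = np.arange(min(budget, len(X)))
        X_train, y_train = X[queried], y_true[queried]
    ensemble.append(CentroidClassifier().fit(X_train, y_train))
    return ensemble[-M:]   # keep only the newest M classifiers
```

Feeding shuffled chunks of a two-class stream through `process_chunk` in a loop keeps the ensemble capped at M members while using only `budget` true labels per chunk; the dynamic safeguards and tolerance measure the abstract mentions are omitted here for brevity.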
first_indexed | 2024-04-10T16:40:03Z |
format | Article |
id | doaj.art-7443196057dd4e218380199bca1b4785 |
institution | Directory Open Access Journal |
issn | 2251-6107 2783-4425 |
language | English |
last_indexed | 2024-04-10T16:40:03Z |
publishDate | 2017-12-01 |
publisher | Iran Telecom Research Center |
record_format | Article |
series | International Journal of Information and Communication Technology Research |
spelling | doaj.art-7443196057dd4e218380199bca1b47852023-02-08T07:56:26ZengIran Telecom Research CenterInternational Journal of Information and Communication Technology Research2251-61072783-44252017-12-01943749Hybrid of Active Learning and Dynamic Self-Training for Data Stream ClassificationMohammadReza Keyvanpour0Mahnoosh Kholghi1Sogol Haghani2 Most data stream classification methods need plenty of labeled samples to achieve reasonable results. However, in a real data stream environment, labeled samples, unlike unlabeled ones, are difficult and expensive to obtain. Active learning is one way to tackle this challenge, but it ignores the unlabeled instances, whose use can strengthen supervised learning. This paper proposes a hybrid framework named “DSeSAL”, which combines active learning and dynamic self-training to gain the strengths of both. The framework also introduces variance-based self-training, which uses minimal variance as its confidence measure. Since an early mistake by the base classifier in self-training can reinforce itself by generating incorrectly labeled data, especially in the multi-class setting, a dynamic approach is adopted to avoid deterioration of classifier accuracy. The framework can also control the accuracy reduction through a user-specified tolerance measure. To overcome the data stream challenges of infinite length and an evolving nature, we use the chunking method along with a classifier ensemble: a classifier is trained on each chunk and, together with the previous classifiers, forms an ensemble of M such classifiers. Experimental results on synthetic and real-world data demonstrate the performance of the proposed framework in comparison with other approaches.http://ijict.itrc.ac.ir/article-1-26-en.htmlcomputer sciencedata miningsemi-supervised learningclassificationdata stream. |
spellingShingle | MohammadReza Keyvanpour Mahnoosh Kholghi Sogol Haghani Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification International Journal of Information and Communication Technology Research computer science data mining semi-supervised learning classification data stream. |
title | Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification |
title_full | Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification |
title_fullStr | Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification |
title_full_unstemmed | Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification |
title_short | Hybrid of Active Learning and Dynamic Self-Training for Data Stream Classification |
title_sort | hybrid of active learning and dynamic self training for data stream classification |
topic | computer science data mining semi-supervised learning classification data stream. |
url | http://ijict.itrc.ac.ir/article-1-26-en.html |
work_keys_str_mv | AT mohammadrezakeyvanpour hybridofactivelearninganddynamicselftrainingfordatastreamclassification AT mahnooshkholghi hybridofactivelearninganddynamicselftrainingfordatastreamclassification AT sogolhaghani hybridofactivelearninganddynamicselftrainingfordatastreamclassification |