Using Domain Adaptation for Incremental SVM Classification of Drift Data

A common assumption in machine learning is that training data is complete, and the data distribution is fixed. However, in many practical applications, this assumption does not hold. Incremental learning was proposed to compensate for this problem. Common approaches include retraining models and inc...

Bibliographic Details
Main Authors: Junya Tang, Kuo-Yi Lin, Li Li
Format: Article
Language: English
Published: MDPI AG 2022-09-01
Series: Mathematics
Subjects:
Online Access: https://www.mdpi.com/2227-7390/10/19/3579
collection DOAJ
description A common assumption in machine learning is that the training data are complete and the data distribution is fixed. In many practical applications, however, this assumption does not hold. Incremental learning was proposed to compensate for this problem. When training data are in short supply, the common remedies are retraining the model and incremental learning. Retraining is time-consuming and computationally expensive, whereas incremental learning saves both time and computation; its performance, however, can be degraded by concept drift. Addressing concept drift in incremental learning raises two crucial issues: gaining new knowledge without forgetting previously acquired knowledge, and forgetting obsolete information without corrupting valid information. This paper proposes an incremental support vector machine learning approach with domain adaptation that considers both issues. First, a small amount of new data is used to fine-tune the previous model by transferring its parameters, which yields a model that is sensitive to the new data but retains information from the previous data. Second, an ensemble and model selection mechanism based on Bayesian theory is proposed to keep the valid information. The computational experiments indicate that the performance of the proposed model improves as new data are acquired. The influence of the degree of data drift on the algorithm is also explored. Compared with the support vector machine and incremental support vector machine algorithms, the proposed approach shows a performance gain on four of five industrial datasets and on four synthetic datasets.
first_indexed 2024-03-09T21:27:15Z
id doaj.art-210b5f14bd6a436bb9d9e156c0a8bf9c
institution Directory Open Access Journal
issn 2227-7390
last_indexed 2024-03-09T21:27:15Z
spelling doaj.art-210b5f14bd6a436bb9d9e156c0a8bf9c | 2023-11-23T21:03:56Z | eng | MDPI AG | Mathematics | ISSN 2227-7390 | 2022-09-01 | vol. 10, no. 19, art. 3579 | DOI 10.3390/math10193579 | Junya Tang, Kuo-Yi Lin, Li Li (School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China)
topic incremental learning
domain adaptation
SVM classification
ensemble learning
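
The two steps named in the abstract (fine-tuning the previous model on a small batch of new data by transferring parameters, then weighting models with a Bayesian rule) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record gives no implementation details, so the sketch swaps in a linear SVM trained by subgradient descent, a penalty that pulls the new weights toward the old ones, and a fixed label-noise likelihood for the posterior. All function names, the synthetic data, and the parameters `lam`, `lr`, `epochs` and `eps` are assumptions made for illustration.

```python
import math
import random


def make_batch(pos_center, n, rng):
    """Two Gaussian classes: +1 around pos_center, -1 around its mirror image."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label * c + rng.gauss(0, 0.5) for c in pos_center])
        y.append(label)
    return X, y


def score(w, x):
    # w holds one weight per feature plus a trailing bias term.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(X, y, w_prev=None, lam=0.1, epochs=200, lr=0.05):
    """Subgradient descent on mean hinge loss + (lam / 2) * ||w - anchor||^2.

    With w_prev=None the anchor is zero (ordinary L2 regularization).
    Otherwise training starts at w_prev and is pulled back toward it,
    which is the assumed form of "parameter transfer" in this sketch.
    """
    d, n = len(X[0]), len(X)
    anchor = list(w_prev) if w_prev is not None else [0.0] * (d + 1)
    w = list(anchor)
    for _ in range(epochs):
        grad = [lam * (wi - ai) for wi, ai in zip(w, anchor)]
        for xi, yi in zip(X, y):
            if yi * score(w, xi) < 1:  # margin violation
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
                grad[-1] -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def accuracy(w, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if (1 if score(w, xi) >= 0 else -1) == yi)
    return hits / len(X)


def posterior_weights(models, priors, X, y, eps=0.1):
    """Posterior over models, assuming each label is mispredicted with prob. eps."""
    logs = []
    for w, p in zip(models, priors):
        correct = round(accuracy(w, X, y) * len(X))
        loglik = correct * math.log(1 - eps) + (len(X) - correct) * math.log(eps)
        logs.append(math.log(p) + loglik)
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def ensemble_predict(models, weights, x):
    vote = sum(wt * (1 if score(w, x) >= 0 else -1) for w, wt in zip(models, weights))
    return 1 if vote >= 0 else -1


rng = random.Random(0)
X_old, y_old = make_batch([2.0, 0.0], 40, rng)    # original concept
X_new, y_new = make_batch([-1.0, 2.0], 20, rng)   # small drifted batch
X_val, y_val = make_batch([-1.0, 2.0], 40, rng)   # held-out drifted data

w_old = train_linear_svm(X_old, y_old)
w_new = train_linear_svm(X_new, y_new, w_prev=w_old)  # fine-tune from w_old

models = [w_old, w_new]
weights = posterior_weights(models, [0.5, 0.5], X_val, y_val)
acc_old = accuracy(w_old, X_val, y_val)
acc_new = accuracy(w_new, X_val, y_val)
ensemble_acc = sum(
    1 for xi, yi in zip(X_val, y_val) if ensemble_predict(models, weights, xi) == yi
) / len(X_val)
print(acc_old, acc_new, weights, ensemble_acc)
```

In this toy setup the posterior weight shifts toward whichever model explains the held-out drifted data better, so an outdated model is down-weighted rather than deleted. Raising `lam` keeps the fine-tuned model closer to the previous one, which is one simple way to trade sensitivity to new data against retention of old information.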