Mimicking the oracle: an initial phase decorrelation approach for class incremental learning
Class Incremental Learning (CIL) aims at learning a classifier in a phase-by-phase manner, in which only data of a subset of the classes are provided at each phase. Previous works mainly focus on mitigating forgetting in phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost CIL performance. Motivated by this, we study the difference between a naïvely-trained initial-phase model and the oracle model. Specifically, since one major difference between these two models is the number of training classes, we investigate how such a difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD), which effectively regularizes the representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). Our CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1% to 3%.
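The abstract does not give the exact loss, but a minimal sketch of a class-wise decorrelation regularizer in PyTorch, assuming a mean-squared off-diagonal-correlation penalty (the function name `cwd_regularizer` and the normalization details are illustrative, not the paper's verbatim formulation), could look like this:

```python
import torch

def cwd_regularizer(features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sketch of a class-wise decorrelation penalty.

    For each class, compute the correlation matrix of its centered,
    per-dimension normalized features and penalize the off-diagonal
    entries, encouraging representations of each class to scatter
    more uniformly across feature dimensions.
    """
    loss = features.new_zeros(())
    classes = labels.unique()
    for c in classes:
        z = features[labels == c]                     # (n_c, d) features of class c
        if z.shape[0] < 2:                            # need >= 2 samples for correlation
            continue
        z = z - z.mean(dim=0, keepdim=True)           # center within the class
        z = z / (z.std(dim=0, keepdim=True) + 1e-5)   # unit variance per dimension
        corr = (z.t() @ z) / (z.shape[0] - 1)         # (d, d) correlation matrix
        d = corr.shape[0]
        off_diag = corr - torch.diag(torch.diagonal(corr))
        loss = loss + (off_diag ** 2).sum() / (d * d) # mean squared off-diagonal term
    return loss / max(len(classes), 1)
```

During initial-phase training this term would be added to the usual classification loss, e.g. `loss = ce_loss + lam * cwd_regularizer(feats, y)`, where the coefficient `lam` is a placeholder hyperparameter rather than a value reported in the paper.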
Main Authors: | Shi, Y; Zhou, K; Liang, J; Jiang, Z; Feng, J; Torr, P; Bai, S; Tan, VYF |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2022 |
_version_ | 1797107820883083264 |
---|---|
author | Shi, Y Zhou, K Liang, J Jiang, Z Feng, J Torr, P Bai, S Tan, VYF |
author_facet | Shi, Y Zhou, K Liang, J Jiang, Z Feng, J Torr, P Bai, S Tan, VYF |
author_sort | Shi, Y |
collection | OXFORD |
description | Class Incremental Learning (CIL) aims at learning a classifier in a phase-by-phase manner, in which only data of a subset of the classes are provided at each phase. Previous works mainly focus on mitigating forgetting in phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost CIL performance. Motivated by this, we study the difference between a naïvely-trained initial-phase model and the oracle model. Specifically, since one major difference between these two models is the number of training classes, we investigate how such a difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD), which effectively regularizes the representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). Our CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1% to 3%. |
first_indexed | 2024-03-07T07:19:33Z |
format | Conference item |
id | oxford-uuid:3feb80ee-62e5-49d9-9438-cad0b06434ec |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:19:33Z |
publishDate | 2022 |
publisher | IEEE |
record_format | dspace |
spellingShingle | Shi, Y Zhou, K Liang, J Jiang, Z Feng, J Torr, P Bai, S Tan, VYF Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title | Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title_full | Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title_fullStr | Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title_full_unstemmed | Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title_short | Mimicking the oracle: an initial phase decorrelation approach for class incremental learning |
title_sort | mimicking the oracle an initial phase decorrelation approach for class incremental learning |
work_keys_str_mv | AT shiy mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT zhouk mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT liangj mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT jiangz mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT fengj mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT torrp mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT bais mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning AT tanvyf mimickingtheoracleaninitialphasedecorrelationapproachforclassincrementallearning |