Student loss: towards the probability assumption in inaccurate supervision

Noisy labels are often encountered in datasets, but learning with them is challenging. Although natural discrepancies exist between clean and mislabeled samples within a noisy category, most techniques in this field still treat them indiscriminately, which leaves their robustness only partial. In this paper, we show both empirically and theoretically that learning robustness can be improved by assuming that deep features sharing the same label follow a Student distribution, resulting in an intuitive method called the student loss. By embedding the Student distribution and exploiting the sharpness of its density curve, our method is naturally data-selective and offers extra strength to resist mislabeled samples. As a result, clean samples aggregate tightly around the class center while mislabeled samples scatter, even when they share the same label. Additionally, we employ a metric-learning strategy and develop a large-margin student (LT) loss for better capability. To our knowledge, this is the first work that adopts a prior probability assumption on the feature representation to decrease the contributions of mislabeled samples. This strategy can enhance various losses to join the student loss family, even losses that are already robust. Experiments demonstrate that our approach is more effective under inaccurate supervision: enhanced LT losses significantly outperform various state-of-the-art methods in most cases, with improvements of over 50% obtained under some conditions.
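
The abstract only sketches the idea, so the following is a minimal, hypothetical illustration of a class-conditional Student-t assumption on deep features, not the authors' published formulation. The class name, the learnable class centres, and the degrees-of-freedom parameter `nu` are all assumptions introduced here for illustration.

```python
# Illustrative sketch: score features with a Student-t kernel around learnable
# class centres, so samples far from their centre (likely mislabeled) get
# heavily damped gradients. This is NOT the paper's exact student loss.
import torch
import torch.nn as nn


class StudentTCenterLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, nu: float = 1.0):
        super().__init__()
        # One learnable centre per class in feature space (assumed component).
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.nu = nu  # degrees of freedom; smaller nu gives heavier tails

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distance from every feature to every class centre: (B, C).
        dist2 = torch.cdist(features, self.centers).pow(2)
        # Log of the unnormalised Student-t density: (1 + d^2/nu)^(-(nu+1)/2).
        log_kernel = -0.5 * (self.nu + 1.0) * torch.log1p(dist2 / self.nu)
        # Treat the kernels as class scores and take the usual cross-entropy.
        log_prob = log_kernel - torch.logsumexp(log_kernel, dim=1, keepdim=True)
        return -log_prob.gather(1, labels.unsqueeze(1)).mean()


# Usage sketch: plug in after any feature extractor.
loss_fn = StudentTCenterLoss(num_classes=10, feat_dim=128)
feats = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
loss = loss_fn(feats, labels)
```

Because the Student-t kernel decays polynomially rather than exponentially, distant (likely mislabeled) points contribute bounded, rapidly shrinking loss, which is one plausible reading of the "data-selective" behaviour described in the abstract.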

Bibliographic Details
Main Authors: Zhang, S; Li, J-Q; Fujita, H; Li, Y-W; Wang, D-B; Zhu, T-T; Zhang, M-L; Liu, C-Y
Format: Journal article
Language: English
Published: IEEE, 2024