Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach

Label corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupte...

Full description

Bibliographic Details
Main Author: Jin-Taek Seong
Format: Article
Language:English
Published: MDPI AG 2023-08-01
Series:Mathematics
Subjects:
Online Access:https://www.mdpi.com/2227-7390/11/17/3636
_version_ 1797582205430530048
author Jin-Taek Seong
author_facet Jin-Taek Seong
author_sort Jin-Taek Seong
collection DOAJ
description Label corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupted data samples by exploiting a finite query-testing system as an additional expert. The task involves identifying the corrupted data samples with minimal expert queries and finding them to their true label values. The proposed query-testing system uses a random selection of a subset of data samples and utilizes finite field operations to construct combined responses. In this paper, we demonstrate an information-theoretic lower bound on the minimum number of queries required for recovering corrupted labels. The lower bound can be represented as a function of joint entropy with an imbalanced rate of data samples and mislabeled probability. In addition, we find an upper bound on the error probability using maximum a posteriori decoding.
first_indexed 2024-03-10T23:18:40Z
format Article
id doaj.art-9dd97c18e97a4c5db32c112a7b802c12
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-10T23:18:40Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-9dd97c18e97a4c5db32c112a7b802c122023-11-19T08:30:05ZengMDPI AGMathematics2227-73902023-08-011117363610.3390/math11173636Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing ApproachJin-Taek Seong0Graduate School of Data Science, Chonnam National University, Gwangju 61186, Republic of KoreaLabel corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupted data samples by exploiting a finite query-testing system as an additional expert. The task involves identifying the corrupted data samples with minimal expert queries and finding them to their true label values. The proposed query-testing system uses a random selection of a subset of data samples and utilizes finite field operations to construct combined responses. In this paper, we demonstrate an information-theoretic lower bound on the minimum number of queries required for recovering corrupted labels. The lower bound can be represented as a function of joint entropy with an imbalanced rate of data samples and mislabeled probability. In addition, we find an upper bound on the error probability using maximum a posteriori decoding.https://www.mdpi.com/2227-7390/11/17/3636supervised learningcorrupted labelquery testingupper boundlower bound
spellingShingle Jin-Taek Seong
Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
Mathematics
supervised learning
corrupted label
query testing
upper bound
lower bound
title Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
title_full Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
title_fullStr Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
title_full_unstemmed Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
title_short Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
title_sort bounds on performance for recovery of corrupted labels in supervised learning a finite query testing approach
topic supervised learning
corrupted label
query testing
upper bound
lower bound
url https://www.mdpi.com/2227-7390/11/17/3636
work_keys_str_mv AT jintaekseong boundsonperformanceforrecoveryofcorruptedlabelsinsupervisedlearningafinitequerytestingapproach