Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
Label corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupte...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-08-01
|
Series: | Mathematics |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-7390/11/17/3636 |
_version_ | 1797582205430530048 |
---|---|
author | Jin-Taek Seong |
author_facet | Jin-Taek Seong |
author_sort | Jin-Taek Seong |
collection | DOAJ |
description | Label corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupted data samples by exploiting a finite query-testing system as an additional expert. The task involves identifying the corrupted data samples with minimal expert queries and finding them to their true label values. The proposed query-testing system uses a random selection of a subset of data samples and utilizes finite field operations to construct combined responses. In this paper, we demonstrate an information-theoretic lower bound on the minimum number of queries required for recovering corrupted labels. The lower bound can be represented as a function of joint entropy with an imbalanced rate of data samples and mislabeled probability. In addition, we find an upper bound on the error probability using maximum a posteriori decoding. |
first_indexed | 2024-03-10T23:18:40Z |
format | Article |
id | doaj.art-9dd97c18e97a4c5db32c112a7b802c12 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T23:18:40Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-9dd97c18e97a4c5db32c112a7b802c122023-11-19T08:30:05ZengMDPI AGMathematics2227-73902023-08-011117363610.3390/math11173636Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing ApproachJin-Taek Seong0Graduate School of Data Science, Chonnam National University, Gwangju 61186, Republic of KoreaLabel corruption leads to a significant challenge in supervised learning, particularly in deep neural networks. This paper considers recovering a small corrupted subset of data samples which are typically caused by non-expert sources, such as automatic classifiers. Our aim is to recover the corrupted data samples by exploiting a finite query-testing system as an additional expert. The task involves identifying the corrupted data samples with minimal expert queries and finding them to their true label values. The proposed query-testing system uses a random selection of a subset of data samples and utilizes finite field operations to construct combined responses. In this paper, we demonstrate an information-theoretic lower bound on the minimum number of queries required for recovering corrupted labels. The lower bound can be represented as a function of joint entropy with an imbalanced rate of data samples and mislabeled probability. In addition, we find an upper bound on the error probability using maximum a posteriori decoding.https://www.mdpi.com/2227-7390/11/17/3636supervised learningcorrupted labelquery testingupper boundlower bound |
spellingShingle | Jin-Taek Seong Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach Mathematics supervised learning corrupted label query testing upper bound lower bound |
title | Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach |
title_full | Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach |
title_fullStr | Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach |
title_full_unstemmed | Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach |
title_short | Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach |
title_sort | bounds on performance for recovery of corrupted labels in supervised learning a finite query testing approach |
topic | supervised learning corrupted label query testing upper bound lower bound |
url | https://www.mdpi.com/2227-7390/11/17/3636 |
work_keys_str_mv | AT jintaekseong boundsonperformanceforrecoveryofcorruptedlabelsinsupervisedlearningafinitequerytestingapproach |