Improving Label Noise Filtering by Exploiting Unlabeled Data
With the significant growth in the scale of data, an increasing amount of training data is available in many machine learning tasks. However, it is difficult to ensure perfect labeling with a large volume of training data. Some labels can be incorrect, resulting in label noise, which could lead to d...
Main Authors: | Donghai Guan, Hongqiang Wei, Weiwei Yuan, Guangjie Han, Yuan Tian, Mohammed Al-Dhelaan, Abdullah Al-Dhelaan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | Label noise, noise filtering, unlabeled data, soft majority voting |
Online Access: | https://ieeexplore.ieee.org/document/8295034/ |
_version_ | 1818927840879443968 |
---|---|
author | Donghai Guan, Hongqiang Wei, Weiwei Yuan, Guangjie Han, Yuan Tian, Mohammed Al-Dhelaan, Abdullah Al-Dhelaan |
author_sort | Donghai Guan |
collection | DOAJ |
description | With the significant growth in the scale of data, an increasing amount of training data is available in many machine learning tasks. However, it is difficult to ensure perfect labeling with a large volume of training data. Some labels can be incorrect, resulting in label noise, which could lead to deterioration in learning performance. A common way to address label noise is to apply noise filtering techniques to identify and remove noise prior to learning. Multiple noise filtering approaches have been proposed. However, almost all existing works focus on only mislabeled training data and ignore the existence of unlabeled data. In fact, unlabeled data are common in many applications, and their values have been extensively studied and recognized. Therefore, in this paper, we explore the effective use of unlabeled data to improve the noise filtering performance. To this end, we propose a novel noise filtering algorithm called enhanced soft majority voting by exploiting unlabeled data (ESMVU), which is an ensemble-learning-based filter that adopts a soft majority voting strategy. ESMVU provides a systematic way to measure the value of unlabeled data by considering different aspects, such as label confidence and the sample distribution. Finally, the effectiveness of the proposed method is confirmed by experiments and comparison with other methods. |
first_indexed | 2024-12-20T03:19:25Z |
format | Article |
id | doaj.art-ed9e8f23c1d84f3a9aa69ad95d65fe12 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-20T03:19:25Z |
publishDate | 2018-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-ed9e8f23c1d84f3a9aa69ad95d65fe12; 2022-12-21T19:55:15Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 11154-11165; DOI 10.1109/ACCESS.2018.2807779; document 8295034; Improving Label Noise Filtering by Exploiting Unlabeled Data; Donghai Guan (0), Hongqiang Wei (1), Weiwei Yuan (2), Guangjie Han (3, https://orcid.org/0000-0002-6921-7369), Yuan Tian (4), Mohammed Al-Dhelaan (5), Abdullah Al-Dhelaan (6); affiliations in author order: (0) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; (2) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; (3) Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian, China; (4) Department of Computer Science, King Saud University, Riyadh, Saudi Arabia; (5) Department of Computer Science, King Saud University, Riyadh, Saudi Arabia; (6) Department of Computer Science, King Saud University, Riyadh, Saudi Arabia; https://ieeexplore.ieee.org/document/8295034/; keywords: Label noise, noise filtering, unlabeled data, soft majority voting |
title | Improving Label Noise Filtering by Exploiting Unlabeled Data |
topic | Label noise, noise filtering, unlabeled data, soft majority voting |
url | https://ieeexplore.ieee.org/document/8295034/ |