Deriving Ground Truth Labels for Regression Problems Using Annotator Precision

Bibliographic Details
Main Authors: Benjamin Johnston, Philip de Chazal
Format: Article
Language: English
Published: MDPI AG 2023-08-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/13/16/9130
Description
Summary: When training machine learning models with practical applications, a quality ground truth dataset is critical. Unlike in classification problems, there is currently no effective method for determining a single ground truth value or landmark from a set of annotations in regression problems. We propose a novel method for deriving ground truth labels in regression problems that considers the performance and precision of individual annotators when identifying each label separately. In contrast to the commonly accepted method of computing the global mean, our method does not assume each annotator to be equally capable of completing the specified task, but rather ensures that higher-performing annotators have a greater contribution to the final result. The ground truth selection method described within this paper provides a means of improving the quality of input data for machine learning model development by removing lower-quality labels. In this study, we objectively demonstrate the improved performance by applying the method to a simulated dataset where a canonical ground truth position can be known, as well as to a sample of data collected from crowd-sourced labels.
ISSN: 2076-3417
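
Illustrative sketch: the summary contrasts the global mean with a scheme in which more precise annotators contribute more to the final label. The record does not give the article's actual procedure, so the Python below is not the authors' method. It swaps in plain inverse-variance weighting as a stand-in, and every name in it (annotator_variances, precision_weighted_labels, global_mean_labels, the eps parameter, the sample values) is hypothetical. It assumes one-dimensional labels and that every annotator labelled every item.

```python
from statistics import mean, median


def annotator_variances(labels):
    """Estimate each annotator's spread around a per-item consensus.

    labels[a][i] is annotator a's value for item i (hypothetical layout).
    The per-item median is an assumed, robust stand-in for the unknown truth.
    """
    n_items = len(labels[0])
    consensus = [median(row[i] for row in labels) for i in range(n_items)]
    return [
        mean((row[i] - consensus[i]) ** 2 for i in range(n_items))
        for row in labels
    ]


def precision_weighted_labels(labels, eps=1e-9):
    """Combine labels with weights proportional to 1 / variance.

    eps is an assumed guard against division by zero for an annotator
    who matches the consensus exactly on every item.
    """
    weights = [1.0 / (v + eps) for v in annotator_variances(labels)]
    total = sum(weights)
    n_items = len(labels[0])
    return [
        sum(w * row[i] for w, row in zip(weights, labels)) / total
        for i in range(n_items)
    ]


def global_mean_labels(labels):
    """Baseline: every annotator contributes equally to each item."""
    n_items = len(labels[0])
    return [mean(row[i] for row in labels) for i in range(n_items)]


if __name__ == "__main__":
    # Made-up example: two consistent annotators and one noisy annotator.
    sample = [
        [10.1, 19.9, 30.0],
        [9.9, 20.1, 30.1],
        [14.0, 16.0, 35.0],
    ]
    print("global mean:       ", global_mean_labels(sample))
    print("precision-weighted:", precision_weighted_labels(sample))
```

In this toy setting the noisy annotator receives a very small weight, so the combined labels stay close to the two consistent annotators, whereas the global mean is pulled toward the outlying values. Whether the published method weights annotators, discards them, or does something else per label is not stated in this record.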