Confident Learning for Machines and Humans

The coupling of machine intelligence and human intelligence has the potential to empower humans with augmented capabilities (e.g., improving rhyme-density while writing song lyrics, enhancing empathy via emotion detection, and personalizing learning in online courses). Unfortunately, humans operate...


Bibliographic Details
Main Author:	Northcutt, Curtis George
Other Authors:	Chuang, Isaac L.
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/139321 https://orcid.org/0000-0002-2423-1300

author	Northcutt, Curtis George
author2	Chuang, Isaac L.
collection	MIT
description	The coupling of machine intelligence and human intelligence has the potential to empower humans with augmented capabilities (e.g., improving rhyme-density while writing song lyrics, enhancing empathy via emotion detection, and personalizing learning in online courses). Unfortunately, humans operate in an uncertain world – where the performance of even the most sophisticated model-centric artificially intelligent system often depends on its data-centric ability to deal with the uncertainty in the labels upon which it is trained. To this end, we introduce confident learning whereby a machine (like humans) must learn with noisy-labeled data, directly quantify and identify label noise, and unlearn misconceptions by re-learning with confidence on cleaned data with erroneous labels removed. We achieve this by developing a principled theory and framework for confident learning with affordances for quantifying, identifying, and learning with label errors in data, and we open-source their implementations in the cleanlab Python package. Based on human verification of the label errors found using cleanlab: we estimate a 3.4% lower bound error rate of the test set labels of ten of the most commonly used machine learning datasets across audio, image, and text modalities; examine the noise prevalence needed to change machine benchmark rankings; and provide corrected test sets so that humans can benchmark machine performance with increased confidence. We then build and evaluate three artificially intelligent systems that augment human capabilities in noisy, real-world settings. 
Namely: (1) assisted-turn-taking in multi-person conversations by combining noisy embodied audio and video signals from multiple synchronized perspectives, (2) assisted-generation of writing song lyrics by exploiting the inherent aleatoric uncertainty of language and semantics, and (3) assisted-human-learning in open online courses by depolarizing/diversifying comment rankings to mitigate the majority bias inherent in rankings based on upvotes. In each case, the artificially intelligent system’s ability to overcome uncertainty is linked to its efficacy in augmenting human capabilities, and by extension, humans’ confidence in their ability to perform the associated task.
first_indexed	2024-09-23T16:46:52Z
format	Thesis
id	mit-1721.1/139321
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T16:46:52Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/139321 2022-01-15T03:02:56Z Confident Learning for Machines and Humans Northcutt, Curtis George Chuang, Isaac L. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2022-01-14T15:03:56Z 2022-01-14T15:03:56Z 2021-06 2021-06-23T19:39:07.046Z Thesis https://hdl.handle.net/1721.1/139321 https://orcid.org/0000-0002-2423-1300 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Confident Learning for Machines and Humans
url	https://hdl.handle.net/1721.1/139321 https://orcid.org/0000-0002-2423-1300
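
The description field above summarizes confident learning only in prose. As a rough illustration, the following pure-Python sketch shows one simple variant of the idea. It computes a per-class threshold as the average predicted probability of class j over examples whose given label is j. It then counts a "confident joint" of given label versus likely true label and flags the off-diagonal examples as probable label errors to drop before re-training. It assumes out-of-sample predicted probabilities (for example from cross-validation), omits the calibration and ranking steps of the full method, and is not the cleanlab package's implementation. The function names and toy data are hypothetical.

```python
from typing import List, Tuple


def per_class_thresholds(labels: List[int], pred_probs: List[List[float]],
                         num_classes: int) -> List[float]:
    """t[j] = mean predicted probability of class j over examples labeled j."""
    thresholds = []
    for j in range(num_classes):
        self_conf = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        # A class never used as a given label gets an unreachable threshold,
        # so no example is ever counted as confidently belonging to it.
        thresholds.append(sum(self_conf) / len(self_conf) if self_conf else float("inf"))
    return thresholds


def confident_joint(labels: List[int], pred_probs: List[List[float]],
                    num_classes: int) -> Tuple[List[List[int]], List[int]]:
    """Count C[given][likely_true] and flag off-diagonal examples as label issues."""
    t = per_class_thresholds(labels, pred_probs, num_classes)
    C = [[0] * num_classes for _ in range(num_classes)]
    issues = []
    for k, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability clears that class's own threshold.
        above = [j for j in range(num_classes) if p[j] >= t[j]]
        if not above:
            continue  # not confidently counted for any class
        likely_true = max(above, key=lambda j: p[j])
        C[y][likely_true] += 1
        if likely_true != y:
            issues.append(k)  # off-diagonal entry: probable label error
    return C, issues


def off_diagonal_fraction(C: List[List[int]]) -> float:
    """Rough noise estimate: share of counted examples lying off the diagonal."""
    total = sum(sum(row) for row in C)
    off = total - sum(C[i][i] for i in range(len(C)))
    return off / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 8 examples, 3 classes; example 2 is labeled 0
    # but the model is confident it belongs to class 1.
    labels = [0, 0, 0, 1, 1, 1, 2, 2]
    pred_probs = [
        [0.9, 0.05, 0.05], [0.7, 0.2, 0.1], [0.1, 0.85, 0.05],
        [0.1, 0.8, 0.1], [0.05, 0.9, 0.05], [0.3, 0.6, 0.1],
        [0.1, 0.1, 0.8], [0.05, 0.25, 0.7],
    ]
    C, issues = confident_joint(labels, pred_probs, num_classes=3)
    keep = [k for k in range(len(labels)) if k not in issues]
    print(C)       # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(issues)  # [2]
    print(keep)    # [0, 1, 3, 4, 5, 6, 7]: indices to re-train on
```

Per-class thresholds, rather than a single global cutoff, keep a class that the model is systematically under-confident about from being ignored. Examples 5 and 7 clear no threshold, so they are simply left uncounted rather than flagged.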

Similar Items