Open Coding for Machine Learning

Bibliographic Details
Main Author:	Price, Magdalena
Other Authors:	Hadfield-Menell, Dylan
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/145142

author	Price, Magdalena
author2	Hadfield-Menell, Dylan
collection	MIT
description	Data-driven decisions have an unavoidable influence on people’s lives [5], and despite being marketed as fair decision-making tools, predictive models can easily perpetuate the same biases they are meant to counteract. Approaches to reducing this bias include incorporating interactive machine learning techniques, modifying the input features of the algorithm, and improving the pre-processing of the dataset [35]. However, even if the prediction model and the raw dataset are both fair, unfair labels can still introduce bias into the system [25]. In particular, predictive models for subjective observations are trained on correlative metrics that may not accurately reflect the nuanced nature of what is being predicted; such a phenomenon may be understood as goal misspecification. Large datasets are especially vulnerable to this phenomenon [35], because the time and cost of labeling them push practitioners toward alternative, less thorough labeling methods. We therefore analyze current methods of labeling big data and aim to reduce goal misspecification by modifying the labeling process itself. Grounded coding theory [12] presents a modern approach to effectively labeling data from a human perspective, dividing the exploratory process into several stages that encourage thoughtful interaction with text corpora. To support effective data labeling, we draw explicit inspiration from some of these methodologies and then augment them with machine learning techniques so that labeling remains both effective and scalable. By providing a space for qualified individuals to create custom labels effectively and efficiently, our research better enables quality correlative goals for predictive models. Combining social science methodology with semi-supervised learning, we present a scalable annotation interface that serves as an effective alternative to current data labeling practices.
first_indexed	2024-09-23T12:47:16Z
format	Thesis
id	mit-1721.1/145142
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T12:47:16Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/145142 2022-08-30T03:33:23Z Open Coding for Machine Learning Price, Magdalena Hadfield-Menell, Dylan Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-08-29T16:36:09Z 2022-08-29T16:36:09Z 2022-05 2022-05-27T16:19:29.577Z Thesis https://hdl.handle.net/1721.1/145142 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Open Coding for Machine Learning
url	https://hdl.handle.net/1721.1/145142
