Humans in the loop: incorporating expert and crowdsourced knowledge for predictions using survey data

Survey datasets are often wider than they are long. This high ratio of variables to observations raises concerns about overfitting during prediction, making informed variable selection important. Recent applications in computer science have sought to incorporate human knowledge into machine learning...


Bibliographic Details
Main authors: Filippova, A, Gilroy, C, Kashyap, R, Kirchner, A, Morgan, A, Polimis, K, Usmani, A, Wang, T
Format: Journal article
Published: SAGE Publications 2019
_version_ 1826262085241143296
author Filippova, A
Gilroy, C
Kashyap, R
Kirchner, A
Morgan, A
Polimis, K
Usmani, A
Wang, T
collection OXFORD
description Survey datasets are often wider than they are long. This high ratio of variables to observations raises concerns about overfitting during prediction, making informed variable selection important. Recent applications in computer science have sought to incorporate human knowledge into machine learning methods to address these problems. We implement such a “human-in-the-loop” approach in the Fragile Families Challenge. We use surveys to elicit knowledge from experts and laypeople about the importance of different variables to different outcomes. This strategy gives us the option to subset the data before prediction or to incorporate human knowledge as scores in prediction models, or both together. We find that human intervention is not obviously helpful. Human-informed subsetting reduces predictive performance, and considered alone, approaches incorporating scores perform marginally worse than approaches which do not. However, incorporating human knowledge may still improve predictive performance, and future research should consider new ways of doing so.
first_indexed 2024-03-06T19:30:47Z
format Journal article
id oxford-uuid:1d64c354-2c85-474d-9e94-b2e47d00df35
institution University of Oxford
last_indexed 2024-03-06T19:30:47Z
publishDate 2019
publisher SAGE Publications
record_format dspace
title Humans in the loop: incorporating expert and crowdsourced knowledge for predictions using survey data