Using informative behavior to increase engagement while learning from human reward

In this work, we address a relatively unexplored aspect of designing agents that learn from human reward. We investigate how an agent’s non-task behavior can affect a human trainer’s training and agent learning. We use the TAMER framework, which facilitates the training of agents by human-generated...


Bibliographic Details
Main Authors:	Li, Guangliang, Whiteson, Shimon, Knox, W. Bradley, Hung, Hayley
Other Authors:	Massachusetts Institute of Technology. Personal Robots Group
Format:	Article
Language:	English
Published:	Springer US 2016
Online Access:	http://hdl.handle.net/1721.1/103607

_version_	1826188098736750592
author	Li, Guangliang Whiteson, Shimon Knox, W. Bradley Hung, Hayley
author2	Massachusetts Institute of Technology. Personal Robots Group
author_facet	Massachusetts Institute of Technology. Personal Robots Group Li, Guangliang Whiteson, Shimon Knox, W. Bradley Hung, Hayley
author_sort	Li, Guangliang
collection	MIT
description	In this work, we address a relatively unexplored aspect of designing agents that learn from human reward. We investigate how an agent’s non-task behavior can affect a human trainer’s training and agent learning. We use the TAMER framework, which facilitates the training of agents by human-generated reward signals, i.e., judgements of the quality of the agent’s actions, as the foundation for our investigation. Then, starting from the premise that the interaction between the agent and the trainer should be bi-directional, we propose two new training interfaces to increase a human trainer’s active involvement in the training process and thereby improve the agent’s task performance. One provides information on the agent’s uncertainty, a metric calculated as data coverage; the other provides information on its performance. Our results from a 51-subject user study show that these interfaces can induce the trainers to train longer and give more feedback. The agent’s performance, however, increases only in response to the addition of performance-oriented information, not in response to the sharing of uncertainty levels. These results suggest that the organizational maxim about human behavior, “you get what you measure” (i.e., sharing metrics with people causes them to focus on optimizing those metrics while de-emphasizing other objectives), also applies to the training of agents. Using principal component analysis, we show how trainers in the two conditions train agents differently. In addition, by simulating the influence of the agent’s uncertainty-informative behavior on a human’s training behavior, we show that trainers could be distracted by the agent sharing its uncertainty levels about its actions, giving poor feedback for the sake of reducing the agent’s uncertainty without improving the agent’s performance.
first_indexed	2024-09-23T07:54:34Z
format	Article
id	mit-1721.1/103607
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T07:54:34Z
publishDate	2016
publisher	Springer US
record_format	dspace
spelling	mit-1721.1/103607 2022-09-30T00:55:51Z 2016-07-14T18:54:33Z 2016-07-14T18:54:33Z 2015-08 2016-05-23T09:38:43Z Article http://purl.org/eprint/type/JournalArticle 1387-2532 1573-7454 http://hdl.handle.net/1721.1/103607 Li, Guangliang, Shimon Whiteson, W. Bradley Knox, and Hayley Hung. “Using Informative Behavior to Increase Engagement While Learning from Human Reward.” Autonomous Agents and Multi-Agent Systems (August 22, 2015). doi:10.1007/s10458-015-9308-2. en http://dx.doi.org/10.1007/s10458-015-9308-2 Autonomous Agents and Multi-Agent Systems Creative Commons Attribution http://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer US
title	Using informative behavior to increase engagement while learning from human reward
url	http://hdl.handle.net/1721.1/103607

Similar Items