Mega-reward: Achieving human-level play without extrinsic rewards

Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluating performance with extrinsic rewards. However, none of the existing intrinsic reward approaches achieves human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (called mega-reward), which, to our knowledge, is the first approach to achieve human-level performance in intrinsically-motivated play. Intuitively, mega-reward stems from the observation that infants' intelligence develops as they try to gain more control over entities in their environment; mega-reward therefore aims to maximize an agent's control over the entities in a given environment. To formalize mega-reward, we propose a relational transition model that bridges the gap between direct and latent control. Experimental studies show that mega-reward (i) greatly outperforms all state-of-the-art intrinsic reward approaches, (ii) generally matches the performance of Ex-PPO and professional human-level scores, and (iii) also performs better when combined with extrinsic rewards.
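The abstract describes mega-reward only at a high level. As a rough, illustrative sketch (not the paper's actual formulation, which relies on a learned relational transition model), the toy Python snippet below rewards an agent for changes it causes directly at its point of action plus, with a smaller weight, changes elsewhere that serve as a crude stand-in for latent control; the function names, the grid encoding, and the weighting are all hypothetical.

```python
import numpy as np

def direct_control_score(state, next_state, action_mask):
    """Fraction of the agent's action footprint whose entity cells changed
    (a toy proxy for direct control)."""
    changed = state != next_state
    return float((changed & action_mask).sum()) / max(int(action_mask.sum()), 1)

def mega_style_reward(state, next_state, action_mask, latent_weight=0.5):
    """Toy intrinsic reward: direct control plus a down-weighted proxy for
    latent control (changes outside the agent's immediate footprint, assumed
    to be caused indirectly). NOT the paper's formulation, only its intuition."""
    direct = direct_control_score(state, next_state, action_mask)
    indirect = float(((state != next_state) & ~action_mask).sum()) / state.size
    return direct + latent_weight * indirect

# Tiny usage example on an 8x8 grid of binary entity states.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=(8, 8))
s_next = s.copy()
s_next[3, 3] ^= 1                      # change at the agent's position (direct)
s_next[6, 1] ^= 1                      # change elsewhere (latent-control proxy)
mask = np.zeros((8, 8), dtype=bool)
mask[3, 3] = True                      # the agent acted on cell (3, 3)
print(mega_style_reward(s, s_next, mask))  # 1.0 direct + 0.5 * 1/64 indirect
```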

Bibliographic Details
Main Authors: Song, Y., Wang, J., Lukasiewicz, T., Xu, Z., Zhang, S., Wojcicki, A., Xu, M.
Format: Conference item
Language: English
Published: Association for the Advancement of Artificial Intelligence, 2020
Institution: University of Oxford
Record ID: oxford-uuid:0c49a2fd-6019-431d-bfe5-345d088da4ad