Mega-reward: Achieving human-level play without extrinsic rewards
Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (called mega-reward), which, to our knowledge, is the first approach that achieves human-level performance in intrinsically-motivated play. Intuitively, mega-reward comes from the observation that infants' intelligence develops as they try to gain more control over entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents over given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gap between direct and latent control. Experimental studies show that mega-reward (i) greatly outperforms all state-of-the-art intrinsic reward approaches, (ii) generally achieves the same level of performance as Ex-PPO and professional human-level scores, and (iii) also performs better when it is incorporated with extrinsic rewards.
Main Authors: | Song, Y; Wang, J; Lukasiewicz, T; Xu, Z; Zhang, S; Wojcicki, A; Xu, M |
---|---|
Format: | Conference item |
Language: | English |
Published: | Association for the Advancement of Artificial Intelligence, 2020 |
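
As a rough illustration of the intrinsically-motivated play setting described in the abstract (learning would only ever see an intrinsic signal, while evaluation still reports the game's extrinsic score), the following minimal Python sketch assumes a gymnasium-style environment; the environment id, the `intrinsic_reward` placeholder, and the random policy are illustrative assumptions, not the authors' mega-reward implementation, whose control-based definition is given in the paper.

```python
import gymnasium as gym  # assumed gym-style environment API


def intrinsic_reward(prev_obs, action, obs):
    """Hypothetical stand-in for an intrinsic signal (e.g., a control-based
    bonus such as mega-reward); the actual definition is specified in the paper."""
    return 0.0


def evaluate_intrinsically_motivated_play(env_id="ALE/Breakout-v5", episodes=5):
    """Run episodes in which only the intrinsic signal would be available for
    training, but report the mean extrinsic game score for evaluation."""
    env = gym.make(env_id)
    scores = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, score = False, 0.0
        while not done:
            action = env.action_space.sample()  # stand-in for a learned policy
            next_obs, extrinsic_r, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            r_int = intrinsic_reward(obs, action, next_obs)  # training signal only
            score += extrinsic_r                             # evaluation score only
            obs = next_obs
        scores.append(score)
    env.close()
    return sum(scores) / len(scores)
```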