Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity

Bibliographic Details
Main Authors: Johnson, E, Pike-Burke, C, Rebeschini, P
Format: Conference item
Language: English
Published: OpenReview 2024
author Johnson, E
Pike-Burke, C
Rebeschini, P
collection OXFORD
description We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries n to the environment that is polynomial in the dimension d of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in K batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive ‘offline’ (K = 1) to fully adaptive (K = n) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under d-dimensional linear function approximation, we establish Ω(log log d) lower bounds on the number of batches K required for sample-efficient algorithms with n = O(poly(d)) queries. Our results show that just having adaptivity (K > 1) does not necessarily guarantee sample-efficiency. Notably, the adaptivity boundary for sample-efficiency is not between offline reinforcement learning (K = 1), where sample-efficiency is known not to be achievable, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
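
To make the batch model in the description concrete, here is a minimal sketch (not code from the paper) of the K-batch interaction protocol in Python. The names env_query, plan_batch, and update_strategy are hypothetical placeholders for the environment interface and the learner's components; only the batching structure reflects the model described in the abstract.

# Minimal sketch of the K-batch query model: n queries are split into K
# batches, each batch is chosen before any of its feedback is seen, and the
# querying strategy may only be updated between batches.
def run_k_batch_protocol(env_query, n, K, plan_batch, update_strategy):
    """Run n environment queries split across K batches.

    K = 1 corresponds to the non-adaptive 'offline' setting and K = n to the
    fully adaptive setting; intermediate K interpolates between them.
    """
    strategy = None  # initial, uninformed querying strategy
    # Split the n queries as evenly as possible across the K batches.
    batch_sizes = [n // K + (1 if b < n % K else 0) for b in range(K)]

    for size in batch_sizes:
        # All queries in this batch are fixed before any of its feedback arrives.
        queries = plan_batch(strategy, size)
        feedback = [env_query(q) for q in queries]
        # Adaptivity happens only here, once per batch.
        strategy = update_strategy(strategy, queries, feedback)

    return strategy  # e.g. an estimate used for policy evaluation or best-policy identification

The point of the sketch is that the only place the learner can react to data is the between-batch update; the lower bounds in the abstract constrain how many such updates (batches K) are needed for sample-efficiency.
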
format Conference item
id oxford-uuid:0d419dfa-3a67-4520-b670-efaef0aae40e
institution University of Oxford
language English
publishDate 2024
publisher OpenReview
title Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity