Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity
We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries n to the environment that is polynomial in the dimension d of the problem. Adaptivity refers to the frequency at which queries...
Main Authors: | Johnson, E; Pike-Burke, C; Rebeschini, P |
---|---|
Format: | Conference item |
Language: | English |
Published: |
OpenReview
2024
|
---|---|
author | Johnson, E Pike-Burke, C Rebeschini, P |
author_facet | Johnson, E Pike-Burke, C Rebeschini, P |
author_sort | Johnson, E |
collection | OXFORD |
description | We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries n to the environment that is polynomial in the dimension d of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in K batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive ‘offline’ (K = 1) to fully adaptive (K = n) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under d-dimensional linear function approximation, we establish Ω(log log d) lower bounds on the number of batches K required for sample-efficient algorithms with n = O(poly(d)) queries. Our results show that just having adaptivity (K > 1) does not necessarily guarantee sample-efficiency. Notably, the adaptivity boundary for sample-efficiency is not between offline reinforcement learning (K = 1), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension. |
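The K-batch interaction model described in the abstract can be sketched as a simple loop. This is an illustrative reconstruction only, not the authors' algorithm: the environment, query strategy, and update rule below are hypothetical placeholders chosen to show where adaptation is and is not allowed.

```python
import numpy as np

def multi_batch_learning(env, K, n, d, rng):
    """Illustrative K-batch query protocol: n total queries are split into
    K batches; feedback is processed only BETWEEN batches, never within one."""
    per_batch = n // K
    strategy = rng.standard_normal(d)  # initial, non-adaptive query direction
    feedback = []
    for _ in range(K):
        # Send a whole batch of queries using the current strategy;
        # no adaptation happens inside a batch.
        queries = [strategy + rng.standard_normal(d) for _ in range(per_batch)]
        responses = [env(q) for q in queries]
        feedback.extend(zip(queries, responses))
        # Adaptivity step: process all feedback so far and update the strategy
        # (here a placeholder response-weighted average of past queries).
        strategy = np.mean([q * r for q, r in feedback], axis=0)
    return strategy

# K = 1 recovers the offline setting; K = n the fully adaptive one.
rng = np.random.default_rng(0)
env = lambda q: float(q @ np.ones(len(q)))  # toy linear environment
estimate = multi_batch_learning(env, K=4, n=100, d=8, rng=rng)
```

Setting `K=1` makes the loop body run once with no feedback ever influencing a query, matching the offline regime; the paper's lower bound says that for sample-efficiency K must in fact grow as Ω(log log d).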
first_indexed | 2024-04-09T03:54:14Z |
format | Conference item |
id | oxford-uuid:0d419dfa-3a67-4520-b670-efaef0aae40e |
institution | University of Oxford |
language | English |
last_indexed | 2024-09-25T04:03:00Z |
publishDate | 2024 |
publisher | OpenReview |
record_format | dspace |
title | Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity |