Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity
We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries n to the environment that is polynomial in the dimension d of the problem. Adaptivity refers to the frequency at which queries...
Main Authors: | Johnson, E; Pike-Burke, C; Rebeschini, P |
---|---|
Format: | Conference item |
Language: | English |
Published: |
OpenReview
2024
|
---|---|
author | Johnson, E Pike-Burke, C Rebeschini, P |
author_facet | Johnson, E Pike-Burke, C Rebeschini, P |
author_sort | Johnson, E |
collection | OXFORD |
description | We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries n to the environment that is polynomial in the dimension d of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in K batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive ‘offline’ (K = 1) to fully adaptive (K = n) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under d-dimensional linear function approximation, we establish Ω(log log d) lower bounds on the number of batches K required for sample-efficient algorithms with n = O(poly(d)) queries. Our results show that just having adaptivity (K > 1) does not necessarily guarantee sample-efficiency. Notably, the adaptivity boundary for sample-efficiency is not between offline reinforcement learning (K = 1), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension. |
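The K-batch interaction model described in the abstract can be sketched as a simple loop. This is an illustrative reconstruction only, not the authors' algorithm: the environment, query strategy, and update rule below are hypothetical placeholders chosen to show where adaptation is and is not allowed.

```python
import numpy as np

def multi_batch_learning(env, K, n, d, rng):
    """Illustrative K-batch query protocol: n total queries are split into
    K batches; feedback is processed only BETWEEN batches, never within one."""
    per_batch = n // K
    strategy = rng.standard_normal(d)  # initial, non-adaptive query direction
    feedback = []
    for _ in range(K):
        # Send a whole batch of queries using the current strategy;
        # no adaptation happens inside a batch.
        queries = [strategy + rng.standard_normal(d) for _ in range(per_batch)]
        responses = [env(q) for q in queries]
        feedback.extend(zip(queries, responses))
        # Adaptivity step: process all feedback so far and update the strategy
        # (here a placeholder response-weighted average of past queries).
        strategy = np.mean([q * r for q, r in feedback], axis=0)
    return strategy

# K = 1 recovers the offline setting; K = n the fully adaptive one.
rng = np.random.default_rng(0)
env = lambda q: float(q @ np.ones(len(q)))  # toy linear environment
estimate = multi_batch_learning(env, K=4, n=100, d=8, rng=rng)
```

Setting `K=1` makes the loop body run once with no feedback ever influencing a query, matching the offline regime; the paper's lower bound says that for sample-efficiency K must in fact grow as Ω(log log d).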
first_indexed | 2024-04-09T03:54:14Z |
format | Conference item |
id | oxford-uuid:0d419dfa-3a67-4520-b670-efaef0aae40e |
institution | University of Oxford |
language | English |
last_indexed | 2024-09-25T04:03:00Z |
publishDate | 2024 |
publisher | OpenReview |
record_format | dspace |
title | Sample-efficiency in multi-batch reinforcement learning: the need for dimension-dependent adaptivity |