Markov decision processes with unknown state feature values for safe exploration using Gaussian processes


Bibliographic Details
Main Authors: Budd, M, Lacerda, B, Duckworth, P, West, A, Lennox, B, Hawes, N
Format: Conference item
Language: English
Published: Institute of Electrical and Electronics Engineers 2021
collection OXFORD
description When exploring an unknown environment, a mobile robot must decide where to observe next. It must do this whilst minimising the risk of failure, by only exploring areas that it expects to be safe. In this context, safety refers to the robot remaining in regions where critical environment features (e.g. terrain steepness, radiation levels) are within ranges the robot is able to tolerate. More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only be sensed at runtime. We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions with the environment model transition probabilities. Building on this model, we propose an exploration algorithm that, contrary to previous approaches, considers probabilistic transitions and explicitly reasons about the uncertainty over the Gaussian process predictions. Furthermore, our approach increases the speed of exploration by selecting locations to visit further away from the currently explored area. We evaluate our approach on a real-world gamma radiation dataset, tackling the challenge of a nuclear material inspection robot exploring an a priori unknown area.
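The description above can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel choice, hyperparameters, and the per-state safety test (probability that the feature stays below a tolerance threshold, under the Gaussian posterior) are illustrative assumptions; the closed-form GP posterior itself is standard.

```python
import numpy as np
from math import erf, sqrt

# Hypothetical sketch: a GP with an RBF kernel predicts an environment
# feature (e.g. a radiation level) at unvisited locations; a location is
# treated as "safe" if the posterior probability that the feature is below
# a tolerance threshold exceeds a confidence level.

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between location sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X_obs, y_obs, X_new, noise=1e-4):
    """Closed-form GP posterior mean and std. dev. at the new locations."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_new)
    K_ss = rbf_kernel(X_new, X_new)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s.T @ alpha
    v = np.linalg.solve(K, K_s)
    var = np.clip(np.diag(K_ss - K_s.T @ v), 0.0, None)
    return mean, np.sqrt(var)

def gaussian_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def safe_states(mean, std, threshold, confidence=0.95):
    """Safe iff P(feature < threshold) >= confidence under the posterior."""
    z = (threshold - mean) / np.maximum(std, 1e-12)
    return np.array([gaussian_cdf(zi) for zi in z]) >= confidence

# Toy example: a 1-D corridor with feature readings at three visited cells.
X_obs = np.array([[0.0], [1.0], [2.0]])
y_obs = np.array([0.1, 0.3, 0.9])          # observed feature values
X_new = np.array([[0.5], [3.0], [5.0]])    # unvisited candidate cells

mean, std = gp_posterior(X_obs, y_obs, X_new)
print(safe_states(mean, std, threshold=1.0))
```

The interpolated cell at 0.5 is predicted with low uncertainty and passes the bound, while the extrapolated cells fail it because the posterior variance grows away from visited locations; in the paper's setting this safety mask would then be combined with the MDP's transition probabilities to form the estimated MDP over which exploration is planned.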
id oxford-uuid:f790bb7f-5f16-4aa2-af7f-43f759a57648
institution University of Oxford