Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Path planning technology is significant for planetary rovers that perform exploration missions in unfamiliar environments. In this work, we propose a novel global path planning algorithm based on the value iteration network (VIN), which embeds a differentiable planning module built on...

Bibliographic Details
Main Authors:	Xiang Jin, Wei Lan, Tianlin Wang, Pengyao Yu
Format:	Article
Language:	English
Published:	MDPI AG 2021-12-01
Series:	Sensors
Subjects:	planetary rover path planning; reinforcement learning; value iteration algorithm; deep neural network; double estimator method
Online Access:	https://www.mdpi.com/1424-8220/21/24/8418

_version_	1827669609968304128
author	Xiang Jin, Wei Lan, Tianlin Wang, Pengyao Yu
collection	DOAJ
description	Path planning technology is significant for planetary rovers that perform exploration missions in unfamiliar environments. In this work, we propose a novel global path planning algorithm based on the value iteration network (VIN), which embeds a differentiable planning module built on the value iteration (VI) algorithm and has emerged as an effective method for learning to plan. Although it can learn environment dynamics and perform long-range reasoning, the VIN suffers from several limitations, including sensitivity to initialization and poor performance in large-scale domains. We introduce the double value iteration network (dVIN), which decouples action selection from value estimation in the VI module, using the weighted double estimator method to approximate the maximum expected value instead of maximizing over the estimated action values. We also devise a simple yet effective two-stage training strategy for VI-based models to address the high computational cost and poor performance in large-scale domains. We evaluate the dVIN on planning problems in grid-world domains and on realistic datasets generated from terrain images of a moon landscape. We show that the dVIN empirically outperforms the baseline methods and generalizes better to large-scale environments.
first_indexed	2024-03-10T03:08:32Z
format	Article
id	doaj.art-4643a7a4d0734c4385a752fc63dae820
institution	Directory of Open Access Journals
issn	1424-8220
language	English
last_indexed	2024-03-10T03:08:32Z
publishDate	2021-12-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-4643a7a4d0734c4385a752fc63dae820; 2023-11-23T10:31:10Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2021-12-01; vol. 21, no. 24, art. 8418; doi:10.3390/s21248418; Value Iteration Networks with Double Estimator for Planetary Rover Path Planning; Xiang Jin, Wei Lan, Tianlin Wang, Pengyao Yu (all: School of Naval Architecture and Ocean Engineering, Dalian Maritime University, Dalian 116026, China); https://www.mdpi.com/1424-8220/21/24/8418; planetary rover path planning; reinforcement learning; value iteration algorithm; deep neural network; double estimator method
title	Value Iteration Networks with Double Estimator for Planetary Rover Path Planning
topic	planetary rover path planning; reinforcement learning; value iteration algorithm; deep neural network; double estimator method
url	https://www.mdpi.com/1424-8220/21/24/8418

Similar Items