Trust-aware motion planning for human-robot collaboration under distribution temporal logic specifications

Bibliographic Details
Main Authors: Yu, P; Dong, S; Sheng, S; Feng, L; Kwiatkowska, M
Format: Conference item
Language: English
Published: IEEE, 2024
Institution: University of Oxford

Full description

Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic formulas that involve human trust. Since accurately observing human trust in robots is challenging, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while simultaneously introducing added complexity. This also presents a new challenge, as the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach.
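
For readers skimming the record, the belief-MDP reformulation mentioned in the description rests on the standard POMDP belief (Bayes) update shown below; the trust predicate next to it is only an illustrative example of the kind of belief predicate scLDTL can express, not a formula quoted from the paper.

```latex
% Standard POMDP belief (Bayes) update: after taking action a in belief b
% and receiving observation o, the posterior over successor states s' is
\[
  b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
                    {\sum_{\sigma \in S} O(o \mid \sigma, a) \sum_{s \in S} T(\sigma \mid s, a)\, b(s)}
\]
% Illustrative belief predicate (an assumption, in the spirit of scLDTL):
% the belief assigns probability at least 0.8 to the high-trust states,
\[
  \sum_{s \in S_{\mathrm{high}}} b(s) \;\ge\; 0.8
\]
% Evaluating such predicates requires reasoning over the continuous belief simplex.
```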
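To make the synthesis pipeline concrete, here is a minimal hypothetical sketch: a discrete POMDP belief update, an scLDTL-style belief predicate evaluated as a label, and one transition of a product between the labelled belief process and a stand-in automaton. The model sizes, the `high_trust` predicate, and the `dfa_step` argument are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch only: all sizes, models, and names are assumptions.
S, A, O = 3, 2, 2                            # trust levels {low, med, high}, actions, observations
rng = np.random.default_rng(0)

T = rng.dirichlet(np.ones(S), size=(A, S))   # T[a, s, s']: transition probabilities
Z = rng.dirichlet(np.ones(O), size=(A, S))   # Z[a, s', o]: observation probabilities

def belief_update(b, a, o):
    """Bayes update of belief b after taking action a and observing o."""
    pred = T[a].T @ b                        # predicted distribution over successor states
    post = Z[a][:, o] * pred                 # reweight by the observation likelihood
    return post / post.sum()                 # normalise (assumes Pr(o | b, a) > 0)

def high_trust(b, threshold=0.8):
    """Hypothetical belief predicate: belief mass on the high-trust state
    (index 2) is at least `threshold`; scLDTL evaluates such predicates
    over the continuous belief space, hence the probabilistic labelling."""
    return b[2] >= threshold

def product_step(b, q, a, o, dfa_step):
    """One move of a product between the labelled belief process and a
    stand-in automaton: update the belief, read off which belief
    predicates hold, and advance the automaton on that label set."""
    b_next = belief_update(b, a, o)
    q_next = dfa_step(q, {"high_trust"} if high_trust(b_next) else set())
    return b_next, q_next

b0 = np.full(S, 1.0 / S)                     # uniform initial belief
b1, q1 = product_step(b0, q=0, a=0, o=1,
                      dfa_step=lambda q, labels: 1 if "high_trust" in labels else q)
print(b1, q1)
```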