Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning

Control intelligence is a typical field where there is a trade-off between target objectives, and researchers in this field have longed for artificial intelligence that achieves the target objectives. Multi-objective deep reinforcement learning was sufficient to satisfy this need. In particular, mul...

Bibliographic Details
Main Authors:	Man-Je Kim, Hyunsoo Park, Chang Wook Ahn
Format:	Article
Language:	English
Published:	MDPI AG 2022-03-01
Series:	Electronics
Subjects:	reinforcement learning, multi-objective optimization, real-time environment
Online Access:	https://www.mdpi.com/2079-9292/11/7/1069

_version_	1797439721643704320
author	Man-Je Kim, Hyunsoo Park, Chang Wook Ahn
collection	DOAJ
description	Control intelligence is a typical field where there is a trade-off between target objectives, and researchers in this field have longed for artificial intelligence that achieves the target objectives. Multi-objective deep reinforcement learning was sufficient to satisfy this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, because of the greedy nature of reinforcement learning, multi-objective reinforcement learning has difficulty finding diverse Pareto-optimal solutions across multiple objectives. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, one of the preference-based multi-objective reinforcement learning methods, to increase diversity. Its performance has been verified through experiments in a continuous control environment.
first_indexed	2024-03-09T11:57:14Z
format	Article
id	doaj.art-1429c01f35154a499f9688c9eb8850ad
institution	Directory of Open Access Journals
issn	2079-9292
language	English
last_indexed	2024-03-09T11:57:14Z
publishDate	2022-03-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-1429c01f35154a499f9688c9eb8850ad | 2023-11-30T23:06:57Z | eng | MDPI AG | Electronics | 2079-9292 | 2022-03-01 | 11 | 7 | 1069 | 10.3390/electronics11071069 | Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning | Man-Je Kim (0), Hyunsoo Park (1), Chang Wook Ahn (2) | (0) AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Korea; (1) NCSOFT, Seongnam-si 13494, Korea; (2) AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Korea | https://www.mdpi.com/2079-9292/11/7/1069 | reinforcement learning, multi-objective optimization, real-time environment
title	Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
topic	reinforcement learning, multi-objective optimization, real-time environment
url	https://www.mdpi.com/2079-9292/11/7/1069
