Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning

Bibliographic Details
Main Authors: Man-Je Kim, Hyunsoo Park, Chang Wook Ahn
Format: Article
Language: English
Published: MDPI AG 2022-03-01
Series: Electronics
Subjects: reinforcement learning; multi-objective optimization; real-time environment
Online Access: https://www.mdpi.com/2079-9292/11/7/1069
_version_ 1797439721643704320
author Man-Je Kim
Hyunsoo Park
Chang Wook Ahn
author_sort Man-Je Kim
collection DOAJ
description Control intelligence is a field in which target objectives typically trade off against one another, and researchers have long sought artificial intelligence that can achieve those objectives. Multi-objective deep reinforcement learning addresses this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, because of the greedy nature of reinforcement learning, multi-objective reinforcement learning has difficulty finding a diverse set of Pareto-optimal solutions. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, a preference-based multi-objective reinforcement learning algorithm, to increase diversity. Its performance has been verified through experiments in a continuous control environment.
first_indexed 2024-03-09T11:57:14Z
format Article
id doaj.art-1429c01f35154a499f9688c9eb8850ad
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T11:57:14Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-1429c01f35154a499f9688c9eb8850ad
2023-11-30T23:06:57Z
eng
MDPI AG
Electronics
2079-9292
2022-03-01
vol. 11, no. 7, art. 1069
10.3390/electronics11071069
Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
Man-Je Kim (AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Korea)
Hyunsoo Park (NCSOFT, Seongnam-si 13494, Korea)
Chang Wook Ahn (AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Korea)
https://www.mdpi.com/2079-9292/11/7/1069
reinforcement learning
multi-objective optimization
real-time environment
title Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning
title_sort nondominated policy guided learning in multi objective reinforcement learning
topic reinforcement learning
multi-objective optimization
real-time environment
url https://www.mdpi.com/2079-9292/11/7/1069
work_keys_str_mv AT manjekim nondominatedpolicyguidedlearninginmultiobjectivereinforcementlearning
AT hyunsoopark nondominatedpolicyguidedlearninginmultiobjectivereinforcementlearning
AT changwookahn nondominatedpolicyguidedlearninginmultiobjectivereinforcementlearning