Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving

Applying model-based learning to optimal decision-making in a multi-agent system is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. Such as learning the optimal action of hydraulic supports in the top-c...


Bibliographic Details
Main Authors: Yi Yang, Zhiwei Lin, Bingfeng Li, Xinwei Li, Lizhi Cui, Keping Wang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Hidden Markov random field; optimal decision; multi-agent; top-coal caving
Online Access: https://ieeexplore.ieee.org/document/9052760/
_version_ 1818558000402530304
author Yi Yang
Zhiwei Lin
Bingfeng Li
Xinwei Li
Lizhi Cui
Keping Wang
author_facet Yi Yang
Zhiwei Lin
Bingfeng Li
Xinwei Li
Lizhi Cui
Keping Wang
author_sort Yi Yang
collection DOAJ
description Applying model-based learning to optimal decision-making in a multi-agent system is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. For example, when learning the optimal action of hydraulic supports in top-coal caving, the optimal action is not accessible as the ground truth for the corresponding state in the intricate process. Treating the latent ground truth as a hidden variable is an effective method in the hidden Markov model. This paper extends the hidden-variable treatment of ground truth to the multi-agent system and proposes a hidden Markov random field (HMRF) model with reinforcement learning for optimizing the action decisions of multiple agents. In the HMRF model, the input states and the output actions of the agents are considered as an observable random field and a latent Markov random field, respectively. Based on the HMRF model, the optimal decision is inferred by maximum a posteriori probability, with the prior probability obtained by Q-learning. Meanwhile, the parameters of the HMRF model are estimated by the expectation-maximization algorithm. In the top-coal caving experiment, the proposed method markedly improves the recall of top coal at the cost of only a very small increase in the rock rate. Furthermore, the proposed method is applied to the predator-prey problem in Gym. The experimental results show that communication between agents through the HMRF increases the reward of the prey.
first_indexed 2024-12-14T00:07:05Z
format Article
id doaj.art-b34a57531f9a4e6aa572048826ff3250
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-14T00:07:05Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-b34a57531f9a4e6aa572048826ff3250
2022-12-21T23:25:58Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
Vol. 8, pp. 76596-76609
doi:10.1109/ACCESS.2020.2984786
9052760
Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
Yi Yang (https://orcid.org/0000-0003-1858-4641), School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Zhiwei Lin (https://orcid.org/0000-0002-9060-4021), School of Computing, Ulster University, Newtownabbey, U.K.
Bingfeng Li (https://orcid.org/0000-0002-7140-1122), School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Xinwei Li (https://orcid.org/0000-0003-0911-5130), School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Lizhi Cui (https://orcid.org/0000-0003-0633-2295), School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Keping Wang (https://orcid.org/0000-0002-8138-6181), School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
Applying model-based learning to optimal decision-making in a multi-agent system is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. For example, when learning the optimal action of hydraulic supports in top-coal caving, the optimal action is not accessible as the ground truth for the corresponding state in the intricate process. Treating the latent ground truth as a hidden variable is an effective method in the hidden Markov model. This paper extends the hidden-variable treatment of ground truth to the multi-agent system and proposes a hidden Markov random field (HMRF) model with reinforcement learning for optimizing the action decisions of multiple agents. In the HMRF model, the input states and the output actions of the agents are considered as an observable random field and a latent Markov random field, respectively. Based on the HMRF model, the optimal decision is inferred by maximum a posteriori probability, with the prior probability obtained by Q-learning. Meanwhile, the parameters of the HMRF model are estimated by the expectation-maximization algorithm. In the top-coal caving experiment, the proposed method markedly improves the recall of top coal at the cost of only a very small increase in the rock rate. Furthermore, the proposed method is applied to the predator-prey problem in Gym. The experimental results show that communication between agents through the HMRF increases the reward of the prey.
https://ieeexplore.ieee.org/document/9052760/
Hidden Markov random field
optimal decision
multi-agent
top-coal caving
spellingShingle Yi Yang
Zhiwei Lin
Bingfeng Li
Xinwei Li
Lizhi Cui
Keping Wang
Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
IEEE Access
Hidden Markov random field
optimal decision
multi-agent
top-coal caving
title Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
title_full Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
title_fullStr Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
title_full_unstemmed Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
title_short Hidden Markov Random Field for Multi-Agent Optimal Decision in Top-Coal Caving
title_sort hidden markov random field for multi agent optimal decision in top coal caving
topic Hidden Markov random field
optimal decision
multi-agent
top-coal caving
url https://ieeexplore.ieee.org/document/9052760/
work_keys_str_mv AT yiyang hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving
AT zhiweilin hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving
AT bingfengli hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving
AT xinweili hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving
AT lizhicui hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving
AT kepingwang hiddenmarkovrandomfieldformultiagentoptimaldecisionintopcoalcaving