Efficiently Detecting Non-Stationary Opponents: A Bayesian Policy Reuse Approach under Partial Observability
In multi-agent domains, dealing with non-stationary opponents that consistently change their behaviors (policies) over time remains a challenging problem: an agent usually needs to detect the opponent’s policy accurately and adopt the optimal response policy accordingly. Previous wor...
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-07-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/12/14/6953 |