An intrinsically motivated learning algorithm based on Bayesian surprise for cognitive radar in autonomous vehicles

Bibliographic Details
Main Authors: Yeganeh Zamiri-Jafarian, Ming Hou, Konstantinos N. Plataniotis
Format: Article
Language: English
Published: Frontiers Media S.A., 2022-12-01
Series: Frontiers in Computer Science
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2022.1066422/full
Description
Summary:
Introduction: This paper proposes a Bayesian surprise learning algorithm that intrinsically motivates a cognitive radar to estimate a target's state (i.e., velocity and distance) from noisy measurements and to make decisions that gradually reduce the estimation error. The work shows how the sensor learns from experience, anticipates future responses, and adjusts its waveform parameters to obtain informative measurements based on Bayesian surprise.

Methods: For a simple vehicle-following scenario in which the radar measurements are generated from linear Gaussian state-space models, the article adopts the Kalman filter to carry out state estimation. From the information within the filter's estimate, the sensor intrinsically assigns a surprise-based reward to the immediately preceding action and updates the value-to-go function. Through a series of hypothetical steps, the cognitive radar evaluates the impact of future transmissions for a prescribed set of waveforms, available from the sensor profile library, to improve the estimation process.

Results and discussion: Numerous experiments investigate the performance of the proposed design for various surprise-based reward expressions. The robustness of the proposed method is compared to the state of the art for practical and risky driving situations. Results show that reward functions inspired by estimation-credibility measures outperform their competitors when one-step planning is used. Simulation results also indicate that multiple-step planning does not necessarily lead to lower error, particularly when the environment changes abruptly.
ISSN:2624-9898
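
The Methods summary above outlines a predict-plan-update cycle: a Kalman filter tracks the target, the information gained by each update serves as a surprise-based reward for the preceding waveform choice, and planning selects the next waveform from a profile library. The sketch below illustrates that cycle under stated assumptions; the constant-velocity state-space model, the waveform library (reduced to the measurement-noise covariance each waveform induces), and the Kullback-Leibler form of the Bayesian-surprise reward are illustrative choices, not the authors' exact formulation.

    import numpy as np

    dt = 0.1                                    # sampling interval in seconds (assumed)
    F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity transition: [distance, velocity]
    Q = 0.01 * np.eye(2)                        # process-noise covariance (assumed)
    H = np.eye(2)                               # the radar measures range and range-rate directly

    # Hypothetical sensor profile library: each waveform is represented only
    # by the measurement-noise covariance R it induces (e.g., a range-accurate
    # pulse vs. a Doppler-accurate one).
    waveform_library = [np.diag([0.5, 2.0]), np.diag([2.0, 0.5]), np.diag([1.0, 1.0])]

    def kl_gauss(mu_q, P_q, mu_p, P_p):
        # KL( N(mu_q, P_q) || N(mu_p, P_p) ): one common Bayesian-surprise measure.
        k = mu_q.size
        P_p_inv = np.linalg.inv(P_p)
        d = mu_p - mu_q
        return 0.5 * (np.trace(P_p_inv @ P_q) + d @ P_p_inv @ d - k
                      + np.log(np.linalg.det(P_p) / np.linalg.det(P_q)))

    def kalman_step(x, P, z, R):
        # One predict-update cycle; the surprise of the update is the reward
        # credited to the waveform that produced measurement z.
        x_pred, P_pred = F @ x, F @ P @ F.T + Q            # prior (predicted) belief
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        x_post = x_pred + K @ (z - H @ x_pred)             # posterior belief
        P_post = (np.eye(2) - K @ H) @ P_pred
        return x_post, P_post, kl_gauss(x_post, P_post, x_pred, P_pred)

    def plan_one_step(P):
        # One-step planning: in a linear Gaussian model the posterior covariance
        # does not depend on the measurement value, so the sensor can rank
        # waveforms by the estimation error each would leave behind.
        def posterior_cov(R):
            P_pred = F @ P @ F.T + Q
            K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
            return (np.eye(2) - K @ H) @ P_pred
        return min(range(len(waveform_library)),
                   key=lambda i: np.trace(posterior_cov(waveform_library[i])))

    # Toy vehicle-following run: a target 50 m ahead, closing at 2 m/s (assumed).
    rng = np.random.default_rng(0)
    x_true = np.array([50.0, -2.0])
    x, P = np.zeros(2), 10.0 * np.eye(2)                   # diffuse initial belief
    for _ in range(100):
        R = waveform_library[plan_one_step(P)]             # choose the next waveform
        x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
        z = H @ x_true + rng.multivariate_normal(np.zeros(2), R)
        x, P, reward = kalman_step(x, P, z, R)             # reward credits the last action

The greedy trace-minimization in plan_one_step stands in for the paper's value-to-go update; the abstract's finding that multiple-step planning does not always reduce error suggests why a short planning horizon can be a reasonable default when the scene changes abruptly.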