Joint Sub-Band and Transmission Rate Selection for Anti-Jamming Non-Contiguous Orthogonal Frequency Division Multiplexing System: An Upper Confidence Bound Based Reinforcement Learning Approach

Bibliographic Details
Main Authors: Xinyi Yuan, Long Yu, Yusheng Li, Yifan Xu, Yuxin Shi
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/12/21/4418
Description
Summary: Reinforcement Learning (RL) has been employed to assign transmission parameters to all sub-carriers in a set frequency band for anti-jamming Orthogonal Frequency Division Multiplexing (OFDM) systems. However, prior work has often overlooked the influence of fading in the wireless environment and the convergence problems caused by overly large parameter sets. To address these challenges, an anti-jamming scheme is proposed that integrates reinforcement learning into a Non-Contiguous Orthogonal Frequency Division Multiplexing (NC-OFDM) communication system. First, all sub-carriers are divided into sub-bands, and a Finite State Markov Sub-bands (FSMS) model is established that combines Adaptive Modulation and Coding (AMC) technology to describe the time-varying fading characteristics of each sub-band. To mitigate the instability caused by the fading channel, a joint sub-band and modulation anti-jamming decision scheme is adopted, enabling the transmitter to select the optimal sub-band and transmission rate. Finally, this decision-making process is modeled as a Markov Decision Process (MDP), and an Upper Confidence Bound based Q-learning (UCB-Q) anti-jamming algorithm is proposed to obtain the joint sub-band and transmission rate selection strategy. Simulation results indicate that the proposed algorithm converges faster and achieves higher average throughput. The algorithm also maintains comparably strong anti-jamming performance in scenarios with time-varying dynamic jamming.
ISSN: 2079-9292
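
The joint sub-band and rate selection described in the summary can be illustrated with a minimal tabular sketch of Q-learning with Upper Confidence Bound exploration. This record does not give the paper's state definition, reward function, or exact UCB bonus, so everything below is an assumption made for illustration: the state is taken to be the sub-band jammed in the previous slot, the jammer is a toy sweep jammer, fading is reduced to a hypothetical per-sub-band success probability for each rate, and the names `ucb_action` and `run_ucb_q` and all hyperparameters are invented here.

```python
import math
import random


def ucb_action(q_row, n_row, t, c=1.0):
    """Return the index maximizing Q + c*sqrt(ln t / N); untried actions go first."""
    for a, n in enumerate(n_row):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_row, n_row)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb_q(num_subbands=4, rates=(1.0, 2.0, 4.0), steps=5000,
              alpha=0.1, gamma=0.5, c=1.0, seed=0):
    """Toy joint (sub-band, rate) selection against a sweep jammer (all assumed)."""
    rng = random.Random(seed)
    # Joint action space: one action per (sub-band, rate index) pair.
    actions = [(b, k) for b in range(num_subbands) for k in range(len(rates))]
    # Assumed state: the sub-band that was jammed in the previous slot.
    q = [[0.0] * len(actions) for _ in range(num_subbands)]
    n = [[0] * len(actions) for _ in range(num_subbands)]
    # Hypothetical fading model: higher rates succeed less often on each sub-band.
    p_ok = [[rng.uniform(0.5, 1.0) ** (k + 1) for k in range(len(rates))]
            for _ in range(num_subbands)]
    state, total = 0, 0.0
    for t in range(1, steps + 1):
        a = ucb_action(q[state], n[state], t, c)
        band, k = actions[a]
        jammed = (state + 1) % num_subbands  # sweep jammer moves to the next sub-band
        ok = band != jammed and rng.random() < p_ok[band][k]
        reward = rates[k] if ok else 0.0  # throughput-like reward
        n[state][a] += 1
        # Standard Q-learning update; the next state is the sub-band just jammed.
        q[state][a] += alpha * (reward + gamma * max(q[jammed]) - q[state][a])
        state = jammed
        total += reward
    return q, actions, total / steps


if __name__ == "__main__":
    q_table, actions, avg = run_ucb_q()
    for s, row in enumerate(q_table):
        band, k = actions[max(range(len(row)), key=row.__getitem__)]
        print(f"last jammed sub-band {s}: greedy choice sub-band {band}, rate index {k}")
    print(f"average reward per slot: {avg:.3f}")
```

Grouping sub-carriers into sub-bands keeps the action table small (here 4 sub-bands by 3 rates), which is the kind of parameter-set reduction the summary credits for better convergence. The UCB bonus replaces random exploration with a preference for rarely tried sub-band and rate pairs.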