Deep reinforcement learning for dynamic power allocation for non-orthogonal multiple-access (NOMA) system

Bibliographic Details
Main Author: Dong, Junyi
Other Authors: Teh Kah Chan
Format: Thesis (Master by Coursework)
Language: English
Published: Nanyang Technological University 2023
Online Access:https://hdl.handle.net/10356/164005
Description
Summary: With the rapid development of communication technology, a wide variety of wireless terminals has been developed and deployed, and the quality of service these terminals require has risen accordingly. NOMA (Non-Orthogonal Multiple Access) was proposed to meet the challenges posed by the rapid growth in the number of terminals and in mobile data traffic, especially the scarcity of spectrum resources. NOMA deliberately introduces interference in the power or code domain at the transmitter, allowing multiple users to share the same spectrum resource block, while SIC (Successive Interference Cancellation) is applied at the receiver to correctly decode each user's information. NOMA can therefore improve spectrum utilization and system capacity. To save energy and cost, the EE (Energy Efficiency) of NOMA systems is the focus of many studies and an important indicator of system performance, so allocating channel resources sensibly while meeting the basic requirements of the NOMA system is one of the most critical problems to be solved.

Meanwhile, ML (Machine Learning) has also witnessed many breakthroughs. DRL (Deep Reinforcement Learning), which combines RL (Reinforcement Learning) and DL (Deep Learning), has been applied in many fields with good results owing to its ability to learn from interaction without labelled supervision. This project investigates whether DRL can be applied to NOMA systems to achieve better EE.

A DQN (Deep Q-Network)-DDPG (Deep Deterministic Policy Gradient) network is first constructed with subchannel assignment and transmission power allocation as the two decision variables: the DQN decides how the subchannels are allocated, while the DDPG generates the transmission-power strategy for each user on each subchannel. A DDQN (Double Deep Q-Network)-TD3 (Twin Delayed DDPG) algorithm is then proposed as an optimized method, replacing the DQN and DDPG with DDQN and TD3, respectively; simulation results demonstrate that this algorithm achieves higher system EE. In addition, the project explores the feasibility of an A3C (Asynchronous Advantage Actor-Critic)-based resource-allocation network.
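
As context for the EE objective discussed above, a common textbook formulation of downlink NOMA energy efficiency (a generic model for illustration; the system model, notation, and constraints in the thesis itself may differ) is

\eta_{EE} = \frac{\sum_{n=1}^{N}\sum_{k \in \mathcal{U}_n} B_n \log_2\!\left(1 + \gamma_{k,n}\right)}{\sum_{n=1}^{N}\sum_{k \in \mathcal{U}_n} p_{k,n} + P_c},
\qquad
\gamma_{k,n} = \frac{p_{k,n}\,|h_{k,n}|^2}{|h_{k,n}|^2 \sum_{j \in \mathcal{U}_n :\, |h_{j,n}|^2 > |h_{k,n}|^2} p_{j,n} + \sigma^2},

where \mathcal{U}_n is the set of users multiplexed on subchannel n, p_{k,n} and h_{k,n} are the transmit power and channel coefficient of user k on subchannel n, B_n is the subchannel bandwidth, \sigma^2 the noise power, and P_c the circuit power. Under SIC, a user cancels the signals of co-channel users with weaker channels, so the residual interference in \gamma_{k,n} comes only from co-channel users with stronger channel gains.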
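The DQN-DDPG structure makes a hybrid decision: a discrete subchannel assignment followed by a continuous power level. Below is a minimal PyTorch sketch of that action-selection path; all class and parameter names (SubchannelDQN, PowerActor, p_max, and the toy dimensions) are illustrative assumptions, not the thesis's code, and the replay buffer, DDPG critic, and training loops are omitted.

import torch
import torch.nn as nn

class SubchannelDQN(nn.Module):
    # DQN head: maps the observed channel state to one Q-value per
    # discrete subchannel-assignment action.
    def __init__(self, state_dim, n_assignments, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_assignments),
        )

    def forward(self, state):
        return self.net(state)

class PowerActor(nn.Module):
    # DDPG actor: given the state and the chosen assignment (one-hot),
    # outputs a continuous power level in (0, p_max) for each user.
    def __init__(self, state_dim, n_assignments, n_users, p_max, hidden=128):
        super().__init__()
        self.p_max = p_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_assignments, hidden), nn.ReLU(),
            nn.Linear(hidden, n_users), nn.Sigmoid(),
        )

    def forward(self, state, assignment_onehot):
        x = torch.cat([state, assignment_onehot], dim=-1)
        return self.p_max * self.net(x)  # scale sigmoid output to (0, p_max)

def select_action(dqn, actor, state, n_assignments, epsilon=0.1):
    # Epsilon-greedy over the discrete assignments, then the
    # deterministic power policy conditioned on that choice.
    with torch.no_grad():
        if torch.rand(()).item() < epsilon:
            a = int(torch.randint(n_assignments, ()).item())
        else:
            a = int(dqn(state).argmax().item())
        onehot = torch.zeros(n_assignments)
        onehot[a] = 1.0
        powers = actor(state, onehot)
    return a, powers

# Toy usage: 2 users, 4 candidate assignments, unit power budget.
dqn = SubchannelDQN(state_dim=8, n_assignments=4)
actor = PowerActor(state_dim=8, n_assignments=4, n_users=2, p_max=1.0)
action, powers = select_action(dqn, actor, torch.randn(8), n_assignments=4)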
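The optimized DDQN-TD3 variant swaps in two standard stabilizers: DDQN decouples action selection from action evaluation to curb the Q-value overestimation of vanilla DQN, while TD3 adds twin critics (taking the minimum of the two target Q-values), delayed actor updates, and target-policy smoothing on the DDPG side. As an illustration of the first change only, the standard Double-DQN target can be computed as below; the function name and signature are assumptions for this sketch.

import torch

def ddqn_target(q_online, q_target, reward, next_state, gamma=0.99):
    # Double DQN: the online network picks the greedy next action and
    # the target network evaluates it. Vanilla DQN instead takes the
    # max over the target network's own estimates, which biases the
    # target upward.
    with torch.no_grad():
        a_star = q_online(next_state).argmax(dim=-1, keepdim=True)
        q_next = q_target(next_state).gather(-1, a_star).squeeze(-1)
    return reward + gamma * q_next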