Summary: | With the rapid development of communication technology, a wide variety of wireless terminals have been developed and deployed, and the quality of communication service these terminals require has risen accordingly. NOMA (Non-Orthogonal Multiple Access) was proposed to meet the challenges posed by the rapid growth in the number of terminals and in mobile data traffic, especially the scarcity of spectrum resources. NOMA deliberately introduces interference in the power or code domain at the transmitter so that multiple users can share the same spectrum resource block, while SIC (Successive Interference Cancellation) is applied at the receiver to recover each user's information correctly. NOMA can therefore improve both spectrum utilization and system capacity.
For energy and cost savings, the EE (Energy Efficiency) of NOMA systems is the focus of many studies and an important indicator of system performance. Hence, allocating channel resources reasonably while meeting the basic requirements of the NOMA system is one of the most critical problems to be solved.
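For concreteness, a common downlink NOMA figure of merit (an assumption here, since the summary does not state the exact system model) defines EE as the achievable sum rate divided by the total consumed power:

```latex
\mathrm{EE}
  = \frac{\sum_{n=1}^{N} \sum_{k=1}^{K_n} B_n \log_2\!\bigl(1 + \gamma_{k,n}\bigr)}
         {\sum_{n=1}^{N} \sum_{k=1}^{K_n} p_{k,n} + P_{\mathrm{c}}}
```

where gamma_{k,n} is the SINR of user k on subchannel n after SIC decoding, B_n is the subchannel bandwidth, p_{k,n} is the power allocated to user k on subchannel n, and P_c is the fixed circuit power consumption; the symbols are illustrative rather than taken from the source.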
At the same time, ML (Machine Learning) has also witnessed many breakthroughs. DRL (Deep Reinforcement Learning), which combines RL (Reinforcement Learning) with DL (Deep Learning), has been applied in many fields with good results because of its self-learning ability and its independence from labeled training data. This project focuses on whether DRL can be applied to NOMA systems to achieve better EE. First, a DQN (Deep Q-Network)-DDPG (Deep Deterministic Policy Gradient) network is constructed, with subchannel assignment and transmission power allocation as the two decision variables.
The DQN is responsible for deciding how the subchannels are allocated, while the DDPG generates the transmission power strategy for each user on each subchannel, as the sketch below illustrates.
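A minimal PyTorch sketch of this hybrid discrete/continuous structure follows; N_USERS, N_SUB, STATE_DIM, P_MAX, and the layer sizes are all illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

N_USERS, N_SUB, STATE_DIM = 4, 2, 16
P_MAX = 1.0  # assumed per-user transmit power budget (watts)

class DQNHead(nn.Module):
    """Q-values over discrete subchannel choices, one choice per user."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_USERS * N_SUB))
    def forward(self, s):
        return self.net(s).view(-1, N_USERS, N_SUB)

class DDPGActor(nn.Module):
    """Deterministic policy mapping state -> per-user power in [0, P_MAX]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_USERS), nn.Sigmoid())
    def forward(self, s):
        return P_MAX * self.net(s)

dqn, actor = DQNHead(), DDPGActor()
state = torch.randn(1, STATE_DIM)       # placeholder channel-state features
subchannel = dqn(state).argmax(dim=-1)  # greedy subchannel index per user
power = actor(state)                    # continuous power per user
print(subchannel, power)
```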
Then, the DDQN (Double Deep Q-Network)-TD3 (Twin Delayed DDPG) algorithm is proposed as an optimized method, using DDQN and TD3 in place of DQN and DDPG, respectively. Simulation results demonstrate that this algorithm achieves higher system EE.
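The core change on the value-based side can be stated compactly: the sketch below shows the Double-DQN target that distinguishes DDQN from DQN (decoupling action selection from evaluation to reduce Q-value overestimation); TD3 applies analogous ideas on the continuous side by taking the minimum of twin critics, delaying actor updates, and smoothing the target policy. Function and variable names here are illustrative assumptions, not the project's code.

```python
import torch

def ddqn_target(online_q, target_q, reward, next_state, done, gamma=0.99):
    """Double-DQN target, assuming online_q and target_q map a batch of
    states to per-action Q-values of shape [B, A]."""
    with torch.no_grad():
        # Plain DQN would use target_q(next_state).max(dim=-1); DDQN instead
        # selects the action with the online net and evaluates it with the
        # target net, which curbs overestimation bias.
        best_action = online_q(next_state).argmax(dim=-1, keepdim=True)
        next_q = target_q(next_state).gather(-1, best_action).squeeze(-1)
        return reward + gamma * (1.0 - done) * next_q
```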
Besides, this project also explores the feasibility of an A3C (Asynchronous Advantage Actor-Critic)-based resource allocation network.
|