A Machine Learning Approach for Beamforming in Ultra Dense Network Considering Selfish and Altruistic Strategy


Bibliographic Details
Main Authors: Changyin Sun, Zhao Shi, Fan Jiang
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8947959/
Description
Summary: Coordinated beamforming is highly effective at managing interference in ultra-dense networks. However, the optimal strategy remains challenging to obtain because of the coupling among densely and autonomously deployed cells. In this paper, deep reinforcement learning is investigated for predicting the coordinated beamforming strategy. Formulated as a sum-rate maximization problem, the optimal solution turns out to be a balanced combination of selfish and altruistic beamforming. Because the balancing coefficients depend on the beamforming vectors of all the cells, iteration is unavoidable to reach the final solution. To address this problem and improve efficiency, deep reinforcement learning (DRL) is proposed to predict the balancing coefficients. Specifically, an agent acting on behalf of a base station-user pair relies on a Deep Q-network to learn the highly complex mapping between the balancing coefficients and the signal-interference environment of each user. The beamforming vectors are then obtained efficiently from the learned balancing coefficients. Because the search explores the beamforming parameterization rather than the beamforming matrix itself, the complexity of predicting the beamforming matrix directly is avoided. The performance of the proposed scheme is investigated through experiments varying the multiple-input multiple-output configuration, shadow fading, and state design. Simulation results indicate that: 1) the theoretically infinite strategy space can be discretized with a limited number of levels and granularity; 2) it is feasible to approximate the complex mapping by Q-learning for wireless channels comprising both large- and small-scale fading; and 3) the balancing coefficients depend only on large-scale fading, so coordinated beamforming can be decomposed into two sub-problems on different time scales: parameterization on a large time scale and instantaneous beamforming based on the balancing coefficients.
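The selfish/altruistic balance described in the abstract can be illustrated with a minimal numpy sketch of the classic egoism-altruism beamformer: each base station's beam trades maximum-ratio transmission toward its own user (selfish) against suppressing the interference it leaks to other cells' users (altruistic), with a per-cell balancing coefficient `lam` playing the role the DQN agent would predict. The cell count, antenna count, power, and the coarse coefficient grid below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 4  # base station-user pairs, antennas per BS

# Flat-fading channels: H[j, k] is the M-dim channel from BS j to user k.
H = (rng.standard_normal((K, K, M)) + 1j * rng.standard_normal((K, K, M))) / np.sqrt(2)

def beamformer(j, lam, noise=1.0):
    """Unit-norm beam for BS j balancing selfish and altruistic objectives.

    lam = 0 gives pure maximum-ratio transmission (selfish); a large lam
    heavily penalizes the interference leaked to the other users
    (altruistic). lam stands in for the balancing coefficient that the
    paper's DQN agent learns to predict.
    """
    h_own = H[j, j]
    # Regularized covariance of the interference BS j causes elsewhere.
    C = noise * np.eye(M, dtype=complex)
    for k in range(K):
        if k != j:
            C += lam * np.outer(H[j, k], H[j, k].conj())
    w = np.linalg.solve(C, h_own)
    return w / np.linalg.norm(w)

def sum_rate(lams, power=10.0, noise=1.0):
    """Network sum rate (bits/s/Hz) for a tuple of balancing coefficients."""
    W = [np.sqrt(power) * beamformer(j, lams[j], noise) for j in range(K)]
    rate = 0.0
    for k in range(K):
        sig = abs(H[k, k].conj() @ W[k]) ** 2
        interf = sum(abs(H[j, k].conj() @ W[j]) ** 2 for j in range(K) if j != k)
        rate += np.log2(1 + sig / (noise + interf))
    return rate

# Discretize the coefficient with a few levels, as the paper's first
# finding suggests is sufficient, and search the small grid.
levels = [0.0, 0.1, 1.0, 10.0]
best = max((sum_rate((a, b, c)), (a, b, c))
           for a in levels for b in levels for c in levels)
print(best)
```

Even this brute-force grid search shows why predicting a few coarse coefficients is cheaper than predicting the beamforming vectors directly: the discretized coefficient space is tiny, while the instantaneous beams follow in closed form once the coefficients are fixed.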
ISSN: 2169-3536