Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks

Bibliographic Details
Main Authors: A. Machumilane, P. Cassara, A. Gotta
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Open Journal of the Communications Society
Subjects: reinforcement learning, actor-critic, multipath, traffic scheduling
Online Access: https://ieeexplore.ieee.org/document/10225588/
author A. Machumilane
P. Cassara
A. Gotta
collection DOAJ
description The upcoming sixth generation (6G) mobile networks require integration between terrestrial mobile networks and non-terrestrial networks (NTN) such as satellites and high altitude platforms (HAPs) to ensure wide and ubiquitous coverage, high connection density, reliable communications and high data rates. The main challenge in this integration is the requirement for line-of-sight (LOS) communication between the user equipment (UE) and the satellite. In this paper, we propose a framework based on actor-critic reinforcement learning and generative models for LOS estimation and traffic scheduling on multiple links connecting a user equipment to multiple satellites in 6G-NTN integrated networks. The agent learns to estimate the LOS probabilities of the available channels and schedules traffic on appropriate links to minimise end-to-end losses with minimal bandwidth. The learning process is modelled as a partially observable Markov decision process (POMDP), since the agent can only observe the state of the channels it has just accessed. As a result, the learning agent requires a longer convergence time compared to the satellite visibility period at a given satellite elevation angle. To counteract this slow convergence, we use generative models to transform a POMDP into a fully observable Markov decision process (FOMDP). We use generative adversarial networks (GANs) and variational autoencoders (VAEs) to generate synthetic channel states of the channels that are not selected by the agent during the learning process, allowing the agent to have complete knowledge of all channels, including those that are not accessed, thus speeding up the learning process. The simulation results show that our framework enables the agent to converge in a short time and transmit with an optimal policy for most of the satellite visibility period, which significantly reduces end-to-end losses and saves bandwidth. 
We also show that it is possible to train generative models in real time without requiring prior knowledge of the channel models and without slowing down the learning process or affecting the accuracy of the models.
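The observation-completion idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the class `BernoulliChannelModel` and the function `complete_observation` are hypothetical names, and a simple per-channel LOS frequency estimator stands in for the GAN/VAE generators. The sketch only shows how states of channels the agent did not access could be filled with synthetic samples so that the agent receives a full state vector at every step.

```python
import random


class BernoulliChannelModel:
    """Hypothetical stand-in for the GAN/VAE: per-channel LOS frequency estimate."""

    def __init__(self, n_channels):
        # Laplace-style prior: one LOS and one non-LOS pseudo-observation per channel.
        self.los_counts = [1] * n_channels
        self.obs_counts = [2] * n_channels

    def update(self, channel, state):
        # state is 1 for LOS, 0 for non-LOS, as observed on an accessed channel.
        self.los_counts[channel] += state
        self.obs_counts[channel] += 1

    def los_probability(self, channel):
        return self.los_counts[channel] / self.obs_counts[channel]

    def sample(self, channel, rng):
        # Draw a synthetic state for a channel that was not accessed.
        return 1 if rng.random() < self.los_probability(channel) else 0


def complete_observation(observed, model, rng):
    """Turn a partial observation into a full state vector.

    observed maps channel index -> real state for the channels just accessed.
    Real states are kept (and used to update the model); the rest are sampled.
    """
    full_state = []
    for channel in range(len(model.obs_counts)):
        if channel in observed:
            model.update(channel, observed[channel])
            full_state.append(observed[channel])
        else:
            full_state.append(model.sample(channel, rng))
    return full_state


if __name__ == "__main__":
    rng = random.Random(0)
    model = BernoulliChannelModel(n_channels=3)
    # The agent accessed only channel 1 and saw LOS there.
    print(complete_observation({1: 1}, model, rng))
```

In this sketch the model is updated online from the channels the agent actually accesses, which loosely mirrors the description's claim that the generative models can be trained in real time without prior channel knowledge.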
first_indexed 2024-03-12T00:43:34Z
format Article
id doaj.art-d2e077e12b144812b2631c5311e39ae4
institution Directory Open Access Journal
issn 2644-125X
language English
last_indexed 2024-03-12T00:43:34Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of the Communications Society
spelling Record ID: doaj.art-d2e077e12b144812b2631c5311e39ae4
Record timestamp: 2023-09-14T23:01:51Z
Language: eng
Publisher: IEEE
Series: IEEE Open Journal of the Communications Society
ISSN: 2644-125X
Published: 2023-01-01
Volume: 4
Pages: 1913-1930
DOI: 10.1109/OJCOMS.2023.3307209
IEEE document number: 10225588
Title: Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks
Authors:
A. Machumilane (https://orcid.org/0000-0002-0260-2465), Department of Information Engineering, University of Pisa, Pisa, Italy
P. Cassara (https://orcid.org/0000-0002-3704-4133), Institute of Information Science and Technologies, CNR, Pisa, Italy
A. Gotta (https://orcid.org/0000-0002-8134-7844), Institute of Information Science and Technologies, CNR, Pisa, Italy
Online access: https://ieeexplore.ieee.org/document/10225588/
Keywords: reinforcement learning; actor-critic; multipath; traffic scheduling
title Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks
topic reinforcement learning
actor-critic
multipath
traffic scheduling
url https://ieeexplore.ieee.org/document/10225588/