Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks

The upcoming sixth generation (6G) mobile networks require integration between terrestrial mobile networks and non-terrestrial networks (NTN) such as satellites and high altitude platforms (HAPs) to ensure wide and ubiquitous coverage, high connection density, reliable communications and high data r...

Bibliographic Details
Main Authors:	A. Machumilane, P. Cassara, A. Gotta
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Open Journal of the Communications Society
Subjects:	reinforcement learning; actor-critic; multipath; traffic scheduling
Online Access:	https://ieeexplore.ieee.org/document/10225588/

_version_	1797685396838023168
author	A. Machumilane, P. Cassara, A. Gotta
author_facet	A. Machumilane, P. Cassara, A. Gotta
author_sort	A. Machumilane
collection	DOAJ
description	The upcoming sixth generation (6G) mobile networks require integration between terrestrial mobile networks and non-terrestrial networks (NTN) such as satellites and high altitude platforms (HAPs) to ensure wide and ubiquitous coverage, high connection density, reliable communications and high data rates. The main challenge in this integration is the requirement for line-of-sight (LOS) communication between the user equipment (UE) and the satellite. In this paper, we propose a framework based on actor-critic reinforcement learning and generative models for LOS estimation and traffic scheduling on multiple links connecting a user equipment to multiple satellites in 6G-NTN integrated networks. The agent learns to estimate the LOS probabilities of the available channels and schedules traffic on appropriate links to minimise end-to-end losses with minimal bandwidth. The learning process is modelled as a partially observable Markov decision process (POMDP), since the agent can only observe the state of the channels it has just accessed. As a result, the learning agent requires a longer convergence time compared to the satellite visibility period at a given satellite elevation angle. To counteract this slow convergence, we use generative models to transform a POMDP into a fully observable Markov decision process (FOMDP). We use generative adversarial networks (GANs) and variational autoencoders (VAEs) to generate synthetic channel states of the channels that are not selected by the agent during the learning process, allowing the agent to have complete knowledge of all channels, including those that are not accessed, thus speeding up the learning process. The simulation results show that our framework enables the agent to converge in a short time and transmit with an optimal policy for most of the satellite visibility period, which significantly reduces end-to-end losses and saves bandwidth. We also show that it is possible to train generative models in real time without requiring prior knowledge of the channel models and without slowing down the learning process or affecting the accuracy of the models.
first_indexed	2024-03-12T00:43:34Z
format	Article
id	doaj.art-d2e077e12b144812b2631c5311e39ae4
institution	Directory Open Access Journal
issn	2644-125X
language	English
last_indexed	2024-03-12T00:43:34Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Open Journal of the Communications Society
spelling	doaj.art-d2e077e12b144812b2631c5311e39ae4 | 2023-09-14T23:01:51Z | eng | IEEE | IEEE Open Journal of the Communications Society | 2644-125X | 2023-01-01 | 4 | 1913 | 1930 | 10.1109/OJCOMS.2023.3307209 | 10225588 | Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks | A. Machumilane | 0 | https://orcid.org/0000-0002-0260-2465 | P. Cassara | 1 | https://orcid.org/0000-0002-3704-4133 | A. Gotta | 2 | https://orcid.org/0000-0002-8134-7844 | Department of Information Engineering, University of Pisa, Pisa, Italy | Institute of Information Science and Technologies, CNR, Pisa, Italy | Institute of Information Science and Technologies, CNR, Pisa, Italy | https://ieeexplore.ieee.org/document/10225588/ | reinforcement learning | actor-critic | multipath | traffic scheduling
title	Toward a Fully-Observable Markov Decision Process With Generative Models for Integrated 6G-Non-Terrestrial Networks
topic	reinforcement learning; actor-critic; multipath; traffic scheduling
url	https://ieeexplore.ieee.org/document/10225588/
