Summary: | The gain from proactive caching at mobile devices depends heavily on accurate prediction of user demands and mobility, which is hard to achieve because user behavior is random. In this paper, we leverage personalized content recommendation to reduce the uncertainty in user requests. We formulate a joint content pushing and recommendation problem that maximizes the net profit of a mobile network operator. To cope with the challenges in modeling and learning user behavior, we establish a reinforcement learning (RL) framework to solve the problem. To circumvent the curse of dimensionality that the very large state and action spaces of the joint problem impose on RL, we decompose the original problem into two RL problems, in which two agents with different goals operate together, and we limit the number of possible actions in each state of the pushing agent by harnessing the well-learned recommendation policy. To enable action values to generalize from experienced states to unexperienced states through function approximation, we find a proper way to represent the state and action of the pushing agent. We then resort to a double deep Q-network with a dueling architecture to solve the two problems. Simulation results show that the learned recommendation and pushing policies converge and increase the net profit significantly compared with baseline policies.
|