摘要

Dynamic channel allocation (DCA) is the key technology to efficiently utilize the spectrum resources and decrease the co-channel interference for multibeam satellite systems. Most works allocate the channel on the basis of the beam traffic load or the user terminal distribution of the current moment. These greedy-like algorithms neglect the intrinsic temporal correlation among the sequential channel allocation decisions, resulting in the spectrum resources underutilization. To solve this problem, a novel deep reinforcement learning (DRL)-based DCA (DRL-DCA) algorithm is proposed. Specifically, the DCA optimization problem, which aims at minimizing the service blocking probability, is formulated in the multibeam satellite systems. Due to the temporal correlation property, the DCA optimization problem is modeled as the Markov decision process (MDP) which is the dominant analytical approach in DRL. In modeled MDP, the system state is reformulated into an image-like fashion, and then, convolutional neural network is used to extract useful features. Simulation results show that the DRL-DCA algorithm can decrease the blocking probability and improve the carried traffic and spectrum efficiency compared with other channel allocation algorithms.