CN114928394B

CN114928394B - Low-orbit satellite edge computing resource allocation method with optimized energy consumption

Info

Publication number: CN114928394B
Application number: CN202210356235.9A
Authority: CN
Inventors: 吴昊南; 杨秀梅; 卜智勇; 赵宇; 唐亮
Original assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Current assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Priority date: 2022-04-06
Filing date: 2022-04-06
Publication date: 2024-08-02
Anticipated expiration: 2042-04-06
Also published as: CN114928394A

Abstract

The invention provides an energy-optimized low-orbit satellite edge computing resource allocation method, comprising the following steps: acquiring environmental state information of a dynamic low-orbit satellite edge computing network; constructing, according to the environmental state information, an optimization problem model whose optimization target is to minimize the system energy consumption overhead, wherein the system energy consumption overhead is a weighted sum of the task processing energy consumption of the ground mobile terminals and of the low-orbit satellites; defining, based on the optimization problem model, the core elements of a reinforcement learning model, and designing a state evaluation function to optimize the state space; solving the deep reinforcement learning model with a deep reinforcement learning algorithm based on an optimized DQN; and obtaining an energy-optimized computing resource allocation strategy from the solution and distributing it to each ground mobile terminal, low-orbit satellite and ground cloud server. The deep reinforcement learning algorithm based on the optimized DQN addresses energy-optimized computing resource allocation in a low-orbit satellite edge computing network, improves computing efficiency, and reduces the system energy consumption overhead.

Description

An energy-optimized low-orbit satellite edge computing resource allocation method

Technical Field

The present invention belongs to the technical field of wireless communications, and in particular relates to an energy-optimized low-orbit satellite edge computing resource allocation method.

Background

In low-orbit satellite edge computing networks, a key challenge is how to reconcile computation-intensive tasks that urgently need energy with computing service devices whose resources are limited. However, current research on low-orbit satellite edge computing networks usually takes the task processing energy consumption of only the ground mobile terminals, or only the low-orbit satellites, as the system optimization target, and does not include both in the task processing energy consumption overhead. In the low-orbit satellite edge computing scenario, low-orbit satellites move at high speed and have limited battery capacity and computing power, so the network environment information is updated dynamically and the environmental state information has high dimensionality. In addition, the dimensions of the environmental state space and of the computing resource allocation solution space grow exponentially with the number of tasks, low-orbit satellites and ground cloud servers, which requires the solution method for computing resource allocation to generalize and scale.

At present, research on low-orbit satellite edge computing networks mainly takes minimizing either satellite energy consumption or ground mobile terminal energy consumption as a single optimization goal. The two have not yet been included together in the system energy consumption overhead for joint optimization, and computing resource allocation methods for low-orbit satellites that move at high speed with limited resources have not been studied further.

In reference [1], the researchers took minimizing the energy consumption overhead of the ground mobile terminals in the network as the optimization goal, split the resource allocation optimization problem into several convex optimization problems, and solved them one after another with methods based on traditional optimization theory. In reference [2], the researchers took minimizing the energy consumption of ground mobile terminals in a dynamic network environment as the optimization goal, converted the non-convex problem into a linear programming problem, and used the alternating direction method of multipliers to obtain the optimal computing resource allocation strategy. However, in an actual low-orbit satellite edge computing network, given the high-speed movement and limited resources of low-orbit satellites, these methods are difficult to tailor to the dynamic state of the network environment, are easily affected by system disturbances, have poor generality and scalability, and face a bottleneck in computing efficiency.

Therefore, a key problem for low-orbit satellite edge computing networks is how to optimize the computing resource allocation of a dynamic low-orbit satellite edge computing network, with the goal of minimizing the weighted system energy consumption overhead of the ground mobile terminals and the low-orbit satellites, while taking into account the high mobility and limited resources of low-orbit satellites.

References:

[1] Z. Song, Y. Hao, Y. Liu, and X. Sun, "Energy-efficient multiaccess edge computing for terrestrial-satellite internet of things," IEEE Internet of Things Journal, vol. 8, no. 18, pp. 14202–14218, 2021.

[2] Q. Tang, Z. Fei, B. Li, and Z. Han, "Computation Offloading in LEO Satellite Networks With Hybrid Cloud and Edge Computing," IEEE Internet of Things Journal, vol. 8, no. 11, pp. 9164–9176, June 1, 2021.

Summary of the Invention

The purpose of the present invention is to provide an energy-optimized low-orbit satellite edge computing resource allocation method that improves computing efficiency and reduces the system energy consumption overhead when low-orbit satellites move rapidly and resources are limited.

To address the above problems, the present invention provides an energy-optimized low-orbit satellite edge computing resource allocation method, comprising:

S1: using an agent to acquire environmental state information of a dynamic low-orbit satellite edge computing network;

S2: constructing, according to the acquired environmental state information, an optimization problem model whose optimization goal is to minimize the system energy consumption overhead, the system energy consumption overhead being defined as the weighted sum of the task processing energy consumption of the ground mobile terminals and the task processing energy consumption of the low-orbit satellites;

S3: defining, based on the optimization problem model, the state space, action space and reward function of a reinforcement learning model, and designing a state evaluation function to optimize the state space;

S4: solving the deep reinforcement learning model with a deep reinforcement learning algorithm based on an optimized DQN, wherein the discrete state produced by mapping the environmental state information through the state evaluation function is fed as input into the network of the deep reinforcement learning algorithm;

S5: obtaining, based on the solved deep reinforcement learning model, an energy-optimized computing resource allocation strategy, and distributing it to each ground mobile terminal, low-orbit satellite and ground cloud server to carry out the computing resource allocation.

Preferably, the environmental state information of the low-orbit satellite edge computing network includes: the state information vector W^k of the k-th batch of tasks generated by the ground mobile terminals; the geocentric angle vector β^k between each ground mobile terminal and each low-orbit satellite when the k-th batch of tasks starts to execute; the visibility vector b^k between each ground mobile terminal and each ground cloud server when the tasks start to execute; and the battery usage state information vector U^k of each low-orbit satellite when the k-th batch of tasks starts to execute.

Preferably, step S1 comprises:

Step S11: providing a low-orbit satellite edge computing network consisting of M ground mobile terminals and J ground cloud servers located on the ground, and N low-orbit satellites located in space. The sets of ground mobile terminals, low-orbit satellites and ground cloud servers are expressed as M' = {1,...,m,...,M}, N' = {1,...,n,...,N} and J' = {1,...,j,...,J}, respectively, where m, n and j are the indices of a ground mobile terminal, a low-orbit satellite and a ground cloud server, and M, N and J are the numbers of ground mobile terminals, low-orbit satellites and ground cloud servers. Each ground mobile terminal can connect to at most one low-orbit satellite at a time, and each ground mobile terminal can establish a connection, through a low-orbit satellite, with at most one ground cloud server at a time;

Step S12: each ground mobile terminal is set to generate only one indivisible computing task in each batch. The set of task batches to be executed by the whole low-orbit satellite edge computing network is expressed as K' = {1,...,k,...,K}, where k denotes the k-th task batch and K is the total number of task batches. The task generated by the m-th ground mobile terminal in the k-th batch is described by two quantities: the data size of the task payload and the number of CPU processing cycles the payload requires. The state information vector W^k of the k-th batch of tasks generated by the ground mobile terminals collects these task descriptions for all M ground mobile terminals, where M is the number of ground mobile terminals;
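The task description in step S12 can be illustrated with a small data structure. This is a minimal sketch, not the patent's notation: the names `Task`, `data_bits`, `cpu_cycles` and `batch_state_vector`, the choice of bits as the unit, and the flat layout of the vector are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """One indivisible task generated by a ground mobile terminal in a batch."""
    data_bits: float   # data size of the task payload (unit assumed: bits)
    cpu_cycles: float  # CPU processing cycles required by the payload


def batch_state_vector(tasks: List[Task]) -> List[float]:
    """Flatten the M tasks of one batch into a single vector (assumed layout)."""
    flat: List[float] = []
    for task in tasks:
        flat.extend((task.data_bits, task.cpu_cycles))
    return flat
```

With M terminals the vector has 2M entries, one (data size, cycle count) pair per terminal, which matches the rule that each terminal contributes exactly one task per batch.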

Step S13: all low-orbit satellites are set to run on circular orbits. With the orbital altitude denoted H, the Earth radius denoted R, and the elevation angle between ground mobile terminal m and low-orbit satellite n at the start of the k-th batch of tasks given, obtain the geocentric angle vector β^k between each ground mobile terminal and each low-orbit satellite when the k-th batch of tasks starts to execute, as well as the visible duration of each low-orbit satellite to each ground mobile terminal during the k-th batch of tasks;

Step S14: initializing the visibility vector b^k between each ground mobile terminal and each ground cloud server when the tasks start to execute, and the battery usage state information vector U^k of each low-orbit satellite when the k-th batch of tasks starts to execute.

Preferably, the visible duration of low-orbit satellite n to ground mobile terminal m during the k-th batch of tasks is determined by the operating period T^LEO of the low-orbit satellite and the geocentric angle between ground mobile terminal m and low-orbit satellite n.

The geocentric angle between ground mobile terminal m and low-orbit satellite n is determined by the Earth radius R, the orbital altitude H, and the elevation angle between ground mobile terminal m and low-orbit satellite n at the start of the k-th batch of tasks.

The operating period T^LEO of a low-orbit satellite is determined by the Earth radius R, the orbital altitude H, and the Kepler constant μ.
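The formulas for these three quantities did not survive in the text, so the sketch below uses the standard circular-orbit relations instead of the patent's own expressions. The period follows Kepler's third law and the geocentric angle follows from the terminal, satellite and Earth-center triangle. The visible-duration expression (the time to sweep an arc of twice the geocentric angle at constant angular rate, ignoring Earth rotation) is an assumption, as are the function names and the numeric constants.

```python
import math

# Assumed constants (not given in the text): mean Earth radius and
# Earth's standard gravitational parameter, used here as the Kepler constant.
R_EARTH_M = 6.371e6          # metres
MU_EARTH = 3.986004418e14    # m^3 / s^2


def orbital_period(h_m, r_m=R_EARTH_M, mu=MU_EARTH):
    """Period in seconds of a circular orbit at altitude h_m."""
    return 2.0 * math.pi * math.sqrt((r_m + h_m) ** 3 / mu)


def geocentric_angle(elevation_rad, h_m, r_m=R_EARTH_M):
    """Earth-central angle (rad) between the terminal and the sub-satellite point."""
    return math.acos(r_m / (r_m + h_m) * math.cos(elevation_rad)) - elevation_rad


def visible_duration(elevation_rad, h_m, r_m=R_EARTH_M, mu=MU_EARTH):
    """Assumed form: time to sweep an arc of 2*beta at the orbital angular rate."""
    beta = geocentric_angle(elevation_rad, h_m, r_m)
    return orbital_period(h_m, r_m, mu) * beta / math.pi
```

For example, `orbital_period(550e3)` gives roughly 95 minutes for a 550 km orbit, and the geocentric angle shrinks to zero as the elevation angle approaches 90 degrees, as expected when the satellite is directly overhead.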

Preferably, step S2 comprises:

Step S21: the task scheduling mode vector corresponding to the state information vector W^k of the k-th batch of tasks generated by the ground mobile terminals is defined in terms of two decision vectors for each task: the decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each low-orbit satellite in the low-orbit satellite edge computing network, and the decision vector for scheduling that task to each ground cloud server in the network. Different tasks in the same batch can select different task scheduling modes. The task scheduling modes are: processing locally, transmission to a low-orbit satellite for processing, and transmission via a low-orbit satellite to a ground cloud server for processing;

Step S22: determining, according to the acquired environmental state information and the task scheduling mode vector of the k-th batch of tasks, the processing delay of each task in the batch, the task processing energy consumption of the ground mobile terminals, and the task processing energy consumption of the low-orbit satellites;

Step S23: defining the weighted sum of the task processing energy consumption of the ground mobile terminals and the task processing energy consumption of the low-orbit satellites as the system energy consumption overhead, and constructing an optimization problem model whose optimization goal is to minimize the system energy consumption overhead.
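The objective of step S23 can be sketched as follows. The text states only that the overhead is a weighted sum of the two energy terms; the weight names `w_gmt` and `w_leo`, their default values, and the per-task list inputs are assumptions for illustration.

```python
def system_energy_overhead(e_gmt, e_leo, w_gmt=0.5, w_leo=0.5):
    """Weighted sum of terminal-side and satellite-side task processing energy.

    e_gmt, e_leo: per-task energies (joules) consumed at the ground mobile
    terminals and at the low-orbit satellites. The weights are hypothetical.
    """
    return w_gmt * sum(e_gmt) + w_leo * sum(e_leo)
```

Raising `w_leo` relative to `w_gmt` makes satellite energy more expensive in the objective, which would push an optimizer toward local or cloud processing when satellite batteries are the scarcer resource.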

Preferably, the decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each low-orbit satellite in the low-orbit satellite edge computing network has one entry per low-orbit satellite, and the n-th entry indicates whether or not the task is scheduled to low-orbit satellite n for execution.

The decision sum for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to the low-orbit satellites is obtained by summing the entries of this decision vector over all low-orbit satellites.

The decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each ground cloud server in the low-orbit satellite edge computing network has one entry per pair of low-orbit satellite n and ground cloud server j, and that entry indicates whether or not the task is scheduled, via low-orbit satellite n, to ground cloud server j for execution.

The decision sum for scheduling the task generated by the m-th ground mobile terminal in the k-th batch, via the low-orbit satellites, to the ground cloud servers is obtained by summing the entries of this decision vector.
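One way to read these decision vectors is as 0/1 indicators, with the decision sums showing which of the three scheduling modes a task uses. The binary encoding, the rule that a task goes to at most one target, and the name `scheduling_mode` are assumptions in this sketch; the text itself does not give the entry values.

```python
def scheduling_mode(c_leo, c_gcs):
    """Classify one task's scheduling mode from its decision vectors.

    c_leo: list of N indicators, c_leo[n] == 1 if the task runs on satellite n.
    c_gcs: N x J nested list, c_gcs[n][j] == 1 if the task is relayed by
           satellite n to ground cloud server j.
    """
    leo_sum = sum(c_leo)
    gcs_sum = sum(sum(row) for row in c_gcs)
    if leo_sum + gcs_sum > 1:
        # Assumed feasibility rule: an indivisible task has a single target.
        raise ValueError("a task can be scheduled to at most one target")
    if leo_sum == 1:
        return ("leo", c_leo.index(1))
    if gcs_sum == 1:
        for n, row in enumerate(c_gcs):
            if 1 in row:
                return ("gcs", n, row.index(1))
    # Both sums are zero: the task is processed locally on the terminal.
    return ("local",)
```

Under this encoding, both decision sums equal to zero means local processing, so the local mode needs no separate indicator.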

Preferably, the optimization problem model minimizes the system energy consumption overhead subject to five constraints, C₁, C₂, C₃, C₄ and C₅, which denote the first, second, third, fourth and fifth constraints, respectively. The model is expressed in terms of the following quantities: the indicator of whether or not the task generated by the m-th ground mobile terminal in the k-th batch is scheduled to low-orbit satellite n for execution; the indicator of whether or not that task is scheduled, via low-orbit satellite n, to ground cloud server j for execution; the processing delays of that task when the task scheduling mode is, respectively, transmission to a low-orbit satellite for processing and transmission via a low-orbit satellite to a ground cloud server for processing; the visible duration of low-orbit satellite n to ground mobile terminal m during the k-th batch of tasks; the computing resources that low-orbit satellite n allocates to that task; z^LEO, the upper limit of the computing resources of a single low-orbit satellite; and the battery usage state of low-orbit satellite n when the k-th batch of tasks starts to execute.

Preferably, each state s_k in the state space of the reinforcement learning model includes the state information vector W^k of the k-th batch of tasks generated by the ground mobile terminals, the geocentric angle vector β^k between each ground mobile terminal and each low-orbit satellite when the k-th batch of tasks starts to execute, the visibility vector b^k between each ground mobile terminal and each ground cloud server when the tasks start to execute, and the battery usage state information vector U^k of each low-orbit satellite when the k-th batch of tasks starts to execute;

The state evaluation function g_k is:

g_k = {g^{k,1}, g^{k,2}, g^{k,3}},

where the components indicate whether or not state s_k under action a_k satisfies, respectively, the third constraint C₃ of the low-orbit satellites for the task generated by the m-th ground mobile terminal in the k-th batch, the fourth constraint for low-orbit satellite n, and the fifth constraint for low-orbit satellite n.
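The idea of the state evaluation function, mapping a high-dimensional continuous state to a few discrete feasibility flags, can be sketched as below. The exact forms of the third, fourth and fifth constraints are not recoverable from the text, so this sketch assumes them from the quantities the optimization model lists: delay within the visible duration, allocated satellite computing resources within the per-satellite limit, and satellite energy demand within the remaining battery. The flag values (1 for satisfied, 0 for violated) and all names are hypothetical.

```python
def state_evaluation(delays, visible, leo_alloc, z_leo, leo_energy, battery_left):
    """Return three lists of 0/1 flags, one list per assumed constraint.

    delays, visible: per-task processing delay and satellite visible duration.
    leo_alloc: per-satellite list of computing resources allocated to tasks.
    z_leo: computing resource limit of a single satellite.
    leo_energy, battery_left: per-satellite energy demand and remaining energy.
    """
    # Assumed third constraint: the task finishes while the satellite is visible.
    g1 = [1 if d <= v else 0 for d, v in zip(delays, visible)]
    # Assumed fourth constraint: a satellite does not over-allocate its resources.
    g2 = [1 if sum(alloc) <= z_leo else 0 for alloc in leo_alloc]
    # Assumed fifth constraint: a satellite does not exceed its remaining battery.
    g3 = [1 if e <= b else 0 for e, b in zip(leo_energy, battery_left)]
    return g1, g2, g3
```

Feeding these flags to the network in place of the raw state is what keeps the input dimension small as the numbers of tasks and satellites grow.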

In the action space of the reinforcement learning model, the action a_k executed for the k-th batch of tasks is:

a_k = {c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}},

where c^k denotes the task scheduling mode vector of the k-th batch of tasks, f^{k,GMT} denotes the vector of computing resources that the ground mobile terminals allocate to the tasks in the k-th batch, f^{k,LEO} denotes the vector of computing resources that the low-orbit satellites allocate to the tasks in the k-th batch, and f^{k,GCS} denotes the vector of computing resources that the ground cloud servers allocate to the tasks in the k-th batch;

The reward function of the reinforcement learning model includes an instantaneous reward function and a cumulative reward function;

The instantaneous reward function r_k of the reinforcement learning model is defined in terms of two quantities: the energy consumed at the ground mobile terminal in processing the task generated by the m-th ground mobile terminal in the k-th batch, and the energy consumed at the low-orbit satellite in processing that task;

The optimization goal is restated as finding the computing resource allocation strategy π^* that maximizes the cumulative reward function. For a computing resource allocation strategy π: S→A of the system, the cumulative reward function evaluated at the start of the k-th batch of tasks is expressed with the following terms:

γ∈[0,1] is the reward discount rate, which weights the importance of future rewards; E_π[·] denotes the expectation under a candidate strategy π; K is the total number of task batches to be processed; k' denotes a task batch within the calculation; and k denotes the batch currently being executed.
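The cumulative reward has the usual discounted-return shape. The sketch below computes it for a single sampled sequence of instantaneous rewards, which stands in for the expectation E_π[·]. It assumes the standard form in which the reward of batch k' is weighted by γ raised to the power k' − k; the function name and the 0-based batch index are illustrative choices, not the patent's.

```python
def discounted_return(rewards, k, gamma):
    """Sum of gamma**(k' - k) * r_k' over batches k' = k, ..., K - 1.

    rewards: instantaneous rewards of one sampled trajectory (0-based batches).
    k: index of the batch currently being executed.
    gamma: discount rate in [0, 1].
    """
    return sum(gamma ** (i - k) * r for i, r in enumerate(rewards) if i >= k)
```

With rewards equal to negative energy costs, maximizing this return corresponds to minimizing discounted energy overhead; a γ near 0 makes the agent short-sighted, while a γ near 1 weights later batches almost as much as the current one.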

In step S4, a DNN is introduced into the reinforcement learning model. The DNN, with neural network parameters θ, fits the actual Q function Q(s_k, a_k), and the resulting fitted Q function is used to update θ iteratively. The optimal fitted Q function finally obtained is the optimal strategy evaluation function Q^*(s_k, a_k), at which point the deep reinforcement learning model is solved.
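The iterative update of θ in a DQN is driven by a temporal-difference target. The sketch below shows the standard DQN target and squared error for one transition; it does not reproduce whatever modifications the "optimized DQN" makes, which the text does not specify here. The function names and the `terminal` flag are assumptions.

```python
def td_target(reward, next_q_values, gamma, terminal):
    """Standard DQN target: r + gamma * max over a' of Q(s', a').

    next_q_values: fitted Q values of all actions in the next state.
    terminal: True when the current batch is the last one (no bootstrap term).
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_pred, target):
    """Per-sample loss whose gradient with respect to theta drives the update."""
    return (q_pred - target) ** 2
```

In a full training loop the parameters θ would be adjusted by gradient descent on this error averaged over sampled transitions; that loop is omitted because it depends on a network library the text does not name.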

In step S5, the agent takes the environmental state information collected for the k-th batch as the input state s_k and computes the state evaluation function g_k. It then solves the problem using the optimization problem model established in step S3 and the deep reinforcement learning algorithm based on the optimized DQN adopted in step S4, and outputs the computing resource allocation strategy a_k = {c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}}. This yields the scheduling mode of each task and the computing resource allocation {f^{k,GMT}, f^{k,LEO}, f^{k,GCS}} of each ground mobile terminal, low-orbit satellite and ground cloud server, which are distributed to each ground mobile terminal, low-orbit satellite and ground cloud server.

The method of the present invention constructs an optimization problem model whose goal is to minimize the weighted system energy consumption overhead, which combines the task processing energy consumption of the ground mobile terminals and that of the low-orbit satellites. The agent can therefore distribute the system's optimal computing resource allocation strategy while accounting for the high-speed movement, limited energy and limited computing resources of low-orbit satellites. Task execution is completed, computing resources are allocated across the ground mobile terminals, low-orbit satellites and ground cloud servers of the low-orbit satellite edge computing network, and the system energy consumption overhead is reduced. In addition, the core elements of the optimization problem under the reinforcement learning model are defined within the MDP framework, and the state evaluation function is designed from the system constraints to optimize the state space when obtaining the system's computing resource allocation strategy, which yields an efficient computing resource allocation strategy and improves computing efficiency. Furthermore, the deep reinforcement learning algorithm based on the optimized DQN computes the resource allocation strategy more efficiently, which further improves computing efficiency.

In summary, the deep reinforcement learning algorithm based on the optimized DQN designed in the present invention solves the energy-optimized computing resource allocation problem in the low-orbit satellite edge computing network, improves computing efficiency, and reduces the system energy consumption overhead.

Brief Description of the Drawings

FIG. 1 is a flow chart of the energy-optimized low-orbit satellite edge computing resource allocation method of the present invention.

FIG. 2 is a schematic diagram of the computing architecture of the agent in the energy-optimized low-orbit satellite edge computing resource allocation method of the present invention.

FIG. 3 is an example of an experimental scenario for the energy-optimized low-orbit satellite edge computing resource allocation method of the present invention.

FIG. 4 is a diagram of the circular orbit model of a low-orbit satellite.

Detailed Description

An embodiment of the present invention is described in detail below. The embodiment is implemented on the basis of the technical solution of the present invention, and a detailed implementation and a specific operating process are given, but the scope of protection of the present invention is not limited to the following embodiment.

To address the shortcomings of the prior art, the present invention proposes an energy-optimized low-orbit satellite edge computing resource allocation method. The method takes minimizing the weighted system energy consumption overhead of the ground mobile terminals and the low-orbit satellites as its optimization goal. It allocates computing resources across the ground mobile terminals, low-orbit satellites and ground cloud servers of a dynamic low-orbit satellite edge computing network, designs the core elements of the reinforcement learning model and a state evaluation function that simplifies the state space, obtains an optimized computing resource allocation strategy with a deep reinforcement learning algorithm based on an optimized DQN, and distributes the strategy.

As shown in FIG. 1, the specific steps of the energy-optimized low-orbit satellite edge computing resource allocation method of the present invention are as follows:

Step S1: using an agent to acquire the environmental state information of the dynamic low-orbit satellite edge computing network;

The agent can be located either on the ground or on a satellite, and is usually located on the ground. In this embodiment, the agent is preferably a ground cloud server.

本发明考虑的系统，即低轨卫星边缘计算网络由位于地面上的M个地面移动终端和J台地面云服务器、以及位于太空中的N颗低轨卫星组成，地面移动终端的集合、低轨卫星的集合和地面云服务器的集合可以分别表示为M’＝{1,...,m,...,M}，N’＝{1,...,n,...,N}和J’＝{1,...,j,...,J}，m、n、j分别表示地面移动终端的序数、低轨卫星的序数和地面云服务器的序数，M、N、J分别为地面移动终端的数量、低轨卫星的数量和地面云服务器的数量。The system considered in the present invention, namely, the low-orbit satellite edge computing network is composed of M ground mobile terminals and J ground cloud servers located on the ground, and N low-orbit satellites located in space. The set of ground mobile terminals, the set of low-orbit satellites and the set of ground cloud servers can be expressed as M'={1,...,m,...,M}, N'={1,...,n,...,N} and J'={1,...,j,...,J}, respectively, where m, n, j represent the ordinal number of the ground mobile terminal, the ordinal number of the low-orbit satellite and the ordinal number of the ground cloud server, respectively, and M, N, J are the number of ground mobile terminals, the number of low-orbit satellites and the number of ground cloud servers, respectively.

所述低轨卫星边缘计算网络的环境状态信息包括：地面移动终端生成的第k批次的任务集合的状态信息向量W^k，其用于确定地面移动终端生成的任务的状态信息向量；第k批次任务开始执行时各地面移动终端和低轨卫星之间的地心角向量β^k，其用于确定低轨卫星的覆盖情况；任务开始执行时各地面移动终端和地面云服务器之间的可见性向量b^k，其用于反映地面云服务器对任务的可见性；和第k批次任务开始执行时各低轨卫星的电池使用状态信息向量U^k，其用于反映低轨卫星的电池使用状态。The environmental status information of the low-orbit satellite edge computing network includes: a status information vector W ^k of a task set of the kth batch generated by a ground mobile terminal, which is used to determine the status information vector of the task generated by the ground mobile terminal; a geocentric angle vector β ^k between each ground mobile terminal and the low-orbit satellite when the kth batch of tasks starts to be executed, which is used to determine the coverage of the low-orbit satellite; a visibility vector b ^k between each ground mobile terminal and the ground cloud server when the task starts to be executed, which is used to reflect the visibility of the ground cloud server to the task; and a battery usage status information vector U ^k of each low-orbit satellite when the kth batch of tasks starts to be executed, which is used to reflect the battery usage status of the low-orbit satellite.

This is because the computing resource allocation strategy for the task generated by the m-th ground mobile terminal in the k-th batch depends on the state information of the tasks generated by the ground mobile terminals (i.e., W^k), the coverage of the low-orbit satellites (i.e., β^k), the visibility of the ground cloud servers to the tasks (i.e., b^k), and the battery usage state of the low-orbit satellites (i.e., U^k).

In step S1, acquiring the environmental state information of the low-orbit satellite edge computing network comprises:

Step S11: Provide a low-orbit satellite edge computing network consisting of M ground mobile terminals and J ground cloud servers located on the ground, and N low-orbit satellites located in space. The ground mobile terminals and the low-orbit satellites both have mobile edge computing capability for processing tasks, and the ground cloud servers have computing capability. Each ground mobile terminal can connect to at most one low-orbit satellite at a time, and each ground mobile terminal can be connected to at most one ground cloud server at a time, with the connection relayed by a low-orbit satellite over a visible satellite-to-ground transmission link.

Step S12: Each ground mobile terminal is set to generate exactly one indivisible computing task in each batch. The set of task batches to be executed by the entire low-orbit satellite edge computing network is denoted K' = {1,...,k,...,K}, where k is the index of a task batch and K is the total number of task batches. The task generated by the m-th ground mobile terminal in the k-th batch is described by two quantities: the data size of the task payload and the number of CPU processing cycles the payload requires. The state information vector W^k of the k-th batch task set is then defined as the collection of these task descriptions over all M ground mobile terminals, where M is the number of ground mobile terminals.

Step S13: In view of the high-speed mobility of low-orbit satellites in practical scenarios, all low-orbit satellites are assumed to travel on circular orbits. With the orbital altitude denoted H and the Earth's radius denoted R, the elevation angle between ground mobile terminal m and low-orbit satellite n at the start of the k-th batch of tasks is used to obtain the geocentric angle vector β^k between the ground mobile terminals and the low-orbit satellites at the start of the k-th batch, together with the duration for which each low-orbit satellite in the network is visible to each ground mobile terminal while the k-th batch is executed, thereby determining the coverage of the low-orbit satellites.

The geocentric angle between ground mobile terminal m and low-orbit satellite n is expressed in terms of the Earth's radius R, the orbital altitude H, and the elevation angle between terminal m and satellite n at the start of the k-th batch of tasks, where m and n are the indices of the ground mobile terminal and the low-orbit satellite.

The geocentric angle vector β^k at the start of the k-th batch of tasks collects these geocentric angles for every pair of ground mobile terminal and low-orbit satellite.

A low-orbit satellite at orbital altitude H has an orbital period T^LEO that is determined by that altitude.

Therefore, the duration for which low-orbit satellite n is visible to ground mobile terminal m while the k-th batch of tasks is executed can be expressed in terms of the orbital period T^LEO and the geocentric angle between terminal m and satellite n.
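The geometry of step S13 can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook relation between elevation angle and geocentric angle for a circular orbit, Kepler's third law for the period, and a simple visibility model in which the satellite remains visible while it sweeps an arc of twice the geocentric angle. The function names, the constants and the visibility model are assumptions, not expressions taken from this description.

```python
import math

MU_EARTH = 3.986004418e14  # standard gravitational parameter of Earth, m^3/s^2
R_EARTH = 6.371e6          # mean Earth radius R, m


def geocentric_angle(elevation, altitude, radius=R_EARTH):
    """Geocentric angle (rad) for an elevation angle (rad) and orbital altitude H (m)."""
    return math.acos(radius / (radius + altitude) * math.cos(elevation)) - elevation


def orbital_period(altitude, radius=R_EARTH, mu=MU_EARTH):
    """Period (s) of a circular orbit at altitude H, from Kepler's third law."""
    return 2.0 * math.pi * math.sqrt((radius + altitude) ** 3 / mu)


def visible_duration(beta, period):
    """Assumed model: the satellite is visible while it sweeps an arc of 2 * beta."""
    return period * (2.0 * beta) / (2.0 * math.pi)
```

Under this model a satellite directly overhead (elevation of 90 degrees) has a geocentric angle of zero, and the angle grows as the elevation drops toward the horizon.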

The visibility of low-orbit satellite n to ground cloud server j at the moment the task generated by the m-th ground mobile terminal in the k-th batch starts executing is described by a binary indicator, which signals whether ground cloud server j can be used to process that task; here k is the task batch, and m, n and j are the indices of the ground mobile terminal, the low-orbit satellite and the ground cloud server. The visibility vector b^k between the ground mobile terminals and the ground cloud servers at the start of the k-th batch of tasks is then obtained from this satellite-to-server visibility together with the geocentric angle vector β^k between the ground mobile terminals and the low-orbit satellites at the start of the k-th batch. Provided that the low-orbit satellite and the ground cloud server are visible to each other, the visibility between a ground mobile terminal and the ground cloud server at the start of the k-th batch is 1 if the terminal lies within the service coverage of the low-orbit satellite at that time, and 0 otherwise.
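The rule for the visibility vector can be sketched as below. The indexing b[m][n][j] and the two boolean input tables are assumptions made for illustration; the description only states that the terminal-to-server visibility is 1 when the satellite sees the server and the terminal is within the satellite's coverage, and 0 otherwise.

```python
def visibility_vector(terminal_in_coverage, satellite_sees_cloud):
    """Return b[m][n][j] in {0, 1}.

    terminal_in_coverage[m][n]: True if terminal m is inside the coverage of satellite n.
    satellite_sees_cloud[n][j]: True if satellite n and cloud server j are mutually visible.
    """
    return [
        [
            [1 if in_cov and sees else 0 for sees in satellite_sees_cloud[n]]
            for n, in_cov in enumerate(row)
        ]
        for row in terminal_in_coverage
    ]
```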

The battery usage state of low-orbit satellite n at the start of the k-th batch of tasks is one element of the battery usage state information vector U^k, which collects the battery usage states of all low-orbit satellites in the network at the start of the k-th batch.

Step S2: Based on the acquired environmental state information, construct an optimization problem model whose objective is to minimize the system energy consumption overhead, defined as the weighted sum of the task processing energy consumption of the ground mobile terminals and that of the low-orbit satellites.

Step S2 comprises:

Step S21: Define the task scheduling mode vector corresponding to the state information vector W^k of the k-th batch task set of all ground mobile terminals. For the task generated by the m-th ground mobile terminal in the k-th batch, this vector contains a decision vector for scheduling the task to each low-orbit satellite in the network and a decision vector for scheduling the task to each ground cloud server in the network. Different tasks in the same batch (for example, the k-th batch) may select different task scheduling modes.

Depending on the network environment and the task requirements, the task scheduling modes are: processing locally, transmitting to a low-orbit satellite for processing, and transmitting via a low-orbit satellite to a ground cloud server for processing. That is, the task generated by the m-th ground mobile terminal in the k-th batch may be processed locally, on a low-orbit satellite, or on a ground cloud server reached through a low-orbit satellite.

For the state information vector W^k of the k-th batch task set of all ground mobile terminals in the network, the corresponding task scheduling mode vector collects the scheduling decisions of all M tasks, including, for each task, the decision vector for scheduling it to each low-orbit satellite in the network.

The decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to the low-orbit satellites has one binary element per satellite, indicating whether or not the task is scheduled to low-orbit satellite n for execution.

The decision sum for scheduling this task to the low-orbit satellites is therefore obtained by summing these indicators over all low-orbit satellites in the network.

Likewise, the decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to the ground cloud servers consists of binary elements, each indicating whether or not the task is scheduled via low-orbit satellite n to ground cloud server j for execution.

The decision sum for scheduling this task via the low-orbit satellites to the ground cloud servers is therefore obtained by summing these indicators over all low-orbit satellites and ground cloud servers.

Since, for any m and k, the task generated by the m-th ground mobile terminal in the k-th batch can select only one task scheduling mode at a time, the scheduling decisions for a given task are mutually exclusive.
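The exclusivity of the scheduling modes can be checked with a few lines of code. The sketch below assumes an explicit 0/1 flag for local processing alongside the two decision vectors; that flag and the function name are illustrative choices, not notation from this description.

```python
def is_valid_schedule(local, to_satellite, to_cloud):
    """Check that exactly one scheduling mode is selected for a task.

    local: 0/1 flag for local processing (assumed representation).
    to_satellite: list of N 0/1 flags, one per low-orbit satellite.
    to_cloud: N x J nested list of 0/1 flags (relay satellite n, cloud server j).
    """
    total = local + sum(to_satellite) + sum(sum(row) for row in to_cloud)
    return total == 1
```

For example, with two satellites and one cloud server, `is_valid_schedule(0, [1, 0], [[0], [0]])` accepts a task offloaded to the first satellite, while selecting both local processing and a satellite is rejected.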

Taking the task generated by the m-th ground mobile terminal in the k-th batch as an example, the following describes the processing delay, the task processing energy consumption of the ground mobile terminal and the task processing energy consumption of the low-orbit satellite for each task in the k-th batch task set.

(a) Specifically, when the task generated by the m-th ground mobile terminal in the k-th batch selects the local execution strategy, the ground mobile terminal allocates computing resources to execute the task locally. The computing resources allocated by the ground mobile terminals to the tasks in the k-th batch task set form the vector f^{k,GMT}, whose m-th element is the computing resource allocated to local execution of the task generated by the m-th ground mobile terminal in the k-th batch.

Note that if some tasks adopt a strategy other than local execution, their terminal computing resources are still represented in this vector, with the corresponding value set to 0.

In this case, the processing delay of the task generated by the m-th ground mobile terminal in the k-th batch equals the computation delay of the task. The task processing energy consumption of the task equals the task processing energy consumption of the ground mobile terminal, which in turn equals the task computation energy consumption of the ground mobile terminal; it is calculated using the chip energy consumption coefficient ζ.
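A minimal sketch of the local-execution case follows. It assumes the widely used model in which the computation delay is the required CPU cycles divided by the allocated CPU frequency and the computation energy is ζ times the square of the frequency times the cycle count; whether these match the expressions intended here is an assumption, as are the function and parameter names.

```python
def local_delay(cpu_cycles, cpu_freq):
    """Computation delay (s): required cycles divided by allocated frequency (Hz)."""
    return cpu_cycles / cpu_freq


def local_energy(cpu_cycles, cpu_freq, zeta):
    """Computation energy (J) under the assumed model zeta * f^2 * cycles."""
    return zeta * cpu_freq ** 2 * cpu_cycles
```

Under this model, raising the allocated frequency shortens the delay linearly but raises the energy quadratically, which is the trade-off the allocation strategy has to balance.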

(b) Specifically, when the task generated by the m-th ground mobile terminal in the k-th batch selects the strategy of being scheduled to a low-orbit satellite, the decision sum for scheduling the task to the low-orbit satellites in the network is obtained accordingly. Low-orbit satellite n allocates computing resources to the task, and the computing resources allocated by the low-orbit satellites to the tasks in the k-th batch task set form the vector f^{k,LEO}. Because the computing resources of each low-orbit satellite are limited, the sum of the computing resources allocated to the tasks cannot exceed the computing resources the satellite possesses.

In this case, the processing delay of the task comprises the propagation delay between the ground mobile terminal and the low-orbit satellite executing the task, the transmission delay of uploading the task to the low-orbit satellite, and the task computation delay at that satellite. The task processing energy consumption of the task comprises the task processing energy consumption of the ground mobile terminal and that of the low-orbit satellite. The task processing energy consumption of the ground mobile terminal equals the transmission energy consumed in uploading the task to the low-orbit satellite. The task processing energy consumption of the low-orbit satellite comprises the transmission energy consumed in receiving the task and the energy consumed in computing it.
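The delay and energy terms for satellite offloading can be sketched as follows. The sketch assumes that transmission time is data size divided by link rate, that transmit and receive energy are power times transmission time, and that the satellite's computation energy follows the same ζ·f²·cycles model as a terminal chip. All parameter names are hypothetical.

```python
def satellite_offload(data_bits, cpu_cycles, uplink_rate, sat_freq,
                      prop_delay, p_tx, p_rx, zeta_sat):
    """Return (delay, terminal_energy, satellite_energy) for on-satellite processing."""
    t_up = data_bits / uplink_rate                      # upload transmission delay
    delay = prop_delay + t_up + cpu_cycles / sat_freq   # propagation + upload + compute
    e_terminal = p_tx * t_up                            # terminal only spends upload energy
    e_satellite = p_rx * t_up + zeta_sat * sat_freq ** 2 * cpu_cycles  # receive + compute
    return delay, e_terminal, e_satellite
```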

(c) Specifically, when the task generated by the m-th ground mobile terminal in the k-th batch selects the strategy of being scheduled via a low-orbit satellite to a ground cloud server for processing, the decision sum for scheduling the task via the low-orbit satellites to the ground cloud servers is obtained accordingly. Ground cloud server j, reached via low-orbit satellite n, allocates computing resources to the task, and the computing resources allocated by the ground cloud servers to the tasks in the k-th batch task set form the vector f^{k,GCS}.

In this case, the processing delay of the task comprises the propagation delay from the ground mobile terminal, relayed by the low-orbit satellite, to the ground cloud server executing the task, the transmission delay of uploading the task to the relaying low-orbit satellite, the transmission delay of offloading the task from the low-orbit satellite to the ground cloud server, and the task computation delay at the ground cloud server. The task processing energy consumption of the task comprises the task processing energy consumption of the ground mobile terminal and that of the low-orbit satellite. The task processing energy consumption of the ground mobile terminal equals the transmission energy consumed in uploading the task to the low-orbit satellite. The task processing energy consumption of the low-orbit satellite comprises the transmission energy consumed in receiving the task and the transmission energy consumed in forwarding it down to the ground cloud server.
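The cloud-relay case differs from satellite processing in that the satellite only forwards the task, so its energy is purely transmission energy and the computation delay is incurred at the cloud server. The sketch below makes the same simplifying assumptions as before (time equals data size over rate, energy equals power times time); the parameter names are hypothetical.

```python
def cloud_offload(data_bits, cpu_cycles, uplink_rate, downlink_rate, cloud_freq,
                  prop_delay, p_tx_terminal, p_rx_sat, p_tx_sat):
    """Return (delay, terminal_energy, satellite_energy) for processing on a cloud server."""
    t_up = data_bits / uplink_rate        # terminal -> relay satellite
    t_down = data_bits / downlink_rate    # relay satellite -> cloud server
    delay = prop_delay + t_up + t_down + cpu_cycles / cloud_freq
    e_terminal = p_tx_terminal * t_up
    e_satellite = p_rx_sat * t_up + p_tx_sat * t_down  # receive + forward, no compute
    return delay, e_terminal, e_satellite
```

The cloud server's own computation energy does not appear because the system energy overhead counts only terminal and satellite energy.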

(d) Combining the descriptions above for the different scheduling modes, the processing delay of the task generated by the m-th ground mobile terminal in the k-th batch is expressed through three terms, namely its processing delays when the scheduling mode is local processing, processing on a low-orbit satellite, and processing on a ground cloud server reached via a low-orbit satellite; two of these three terms are 0. For the k-th batch task set generated by the set M' of ground mobile terminals, the batch processing delay is therefore the maximum processing delay over its tasks. Only once the whole k-th batch task set has been processed does the set M' begin processing the (k+1)-th batch.

The task processing energy consumption at the ground mobile terminal for the task generated by the m-th ground mobile terminal in the k-th batch is expressed through its terminal-side energy consumption when the scheduling mode is local processing and when it is transmission to a low-orbit satellite for processing. The task processing energy consumption at the low-orbit satellite for the same task is expressed through its satellite-side energy consumption when the scheduling mode is processing on a low-orbit satellite and when it is processing on a ground cloud server reached via a low-orbit satellite.
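The aggregation in (d) can be illustrated as follows, assuming each task's delay is stored as a triple (local, satellite, cloud) in which the two unused modes are 0, so that the task delay is simply the sum of the triple. The data layout and function names are assumptions.

```python
def task_delay(t_local, t_satellite, t_cloud):
    """Processing delay of one task; two of the three mode delays are 0."""
    return t_local + t_satellite + t_cloud


def batch_delay(per_task_delays):
    """Delay of a batch: the next batch starts only after the slowest task finishes."""
    return max(task_delay(*delays) for delays in per_task_delays)
```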

In addition, because the task scheduling mode is affected by the limited battery capacity of the low-orbit satellites, a battery constraint must be satisfied at the start of the k-th batch of tasks.

The system energy consumption overhead defined in the present invention is the weighted sum of the task processing energy consumption of the ground mobile terminals and that of the low-orbit satellites. The weights reflect the relative importance of the two in the overhead: α ∈ [0,1] is the weight of the ground mobile terminal energy consumption, and (1-α) is the weight of the low-orbit satellite energy consumption.
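As a small illustration of this definition, the weighted overhead for one batch can be computed as below. Summing the per-task energies before weighting is an assumption about how the per-task terms are aggregated, and the function name is hypothetical.

```python
def system_energy_cost(terminal_energies, satellite_energies, alpha):
    """Weighted sum of terminal-side and satellite-side task processing energy."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sum(terminal_energies) + (1.0 - alpha) * sum(satellite_energies)
```

Setting α close to 1 favours saving terminal battery, while α close to 0 favours saving satellite battery.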

The optimization problem model whose objective is to minimize the system energy consumption overhead (i.e., the joint energy consumption optimization problem) is therefore subject to five constraints, C₁ to C₅.

The first and second constraints, C₁ and C₂, state that each task can select only one scheduling mode. The third constraint, C₃, states that if a task selects a scheduling mode involving a low-orbit satellite, its execution delay must not exceed the effective coverage time of that satellite for the task. The fourth constraint, C₄, states that the sum of the computing resources each low-orbit satellite allocates to the tasks in the task set must not exceed its available computing resources. The fifth constraint, C₅, states that the available energy state of each low-orbit satellite must always remain greater than 0.

Step S3: Based on the optimization problem model, define the core elements of the reinforcement learning model (i.e., the state space, the action space and the instantaneous reward function), and design a state evaluation function to optimize the state space.

In step S3, the Markov Decision Process (MDP) framework is used to formulate the reinforcement learning model. Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision making, and it defines the interaction between an agent and its environment through three core elements: state, action and reward.

Based on the optimization problem established in step S2, the state space, action space and reward function of the reinforcement learning model constructed by the present invention are defined as follows:

State space: Each state in the state space of the reinforcement learning model corresponds to the environmental state information of the low-orbit satellite edge computing network, which includes the state information vector W^k of the k-th batch task set generated by the ground mobile terminals, the geocentric angle vector β^k between the ground mobile terminals and the low-orbit satellites at the start of the k-th batch of tasks, the visibility vector b^k between the ground mobile terminals and the ground cloud servers at the start of task execution, and the battery usage state information vector U^k of the low-orbit satellites at the start of the k-th batch of tasks.

Therefore, the state s_k ∈ S at the start of the k-th batch of tasks is expressed as:

s_k = {W^k, β^k, b^k, U^k},

where W^k is the state information vector of the k-th batch task set generated by the ground mobile terminals; β^k is the geocentric angle vector between the ground mobile terminals and the low-orbit satellites at the start of the k-th batch of tasks; b^k is the visibility vector between the ground mobile terminals and the ground cloud servers at the start of the k-th batch of tasks; and U^k is the battery usage state information vector of the low-orbit satellites at the start of the k-th batch of tasks.

However, s_k can take infinitely many values, and the dimension of the state space grows exponentially with the number of tasks, which makes it difficult to obtain an efficient computing resource allocation strategy. The present invention therefore designs a state evaluation function, under the constraints of the optimization problem, that reflects the quality of the current state s_k under action a_k and thereby simplifies the infinitely valued state s_k. The state evaluation function g_k is a group of vectors of binary variables:

g_k = {g^{k,1}, g^{k,2}, g^{k,3}},

where the elements of g^{k,1} indicate, for the task generated by the m-th ground mobile terminal in the k-th batch, whether or not state s_k under action a_k satisfies the third constraint C₃ (the coverage time constraint of the low-orbit satellite for that task); the elements of g^{k,2} indicate whether or not state s_k under action a_k satisfies the fourth constraint C₄ for low-orbit satellite n (the computing resources allocated by satellite n must not exceed the computing resources it possesses); and the elements of g^{k,3} indicate whether or not state s_k under action a_k satisfies the fifth constraint C₅ for low-orbit satellite n (the battery state of satellite n must always remain greater than 0).
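The mapping from a continuous state to binary evaluation flags can be sketched as below. The convention that 1 means "constraint satisfied" and 0 means "violated", the input layout and the function name are all assumptions made for illustration.

```python
def evaluate_state(task_delays, coverage_times, sat_alloc, sat_capacity, battery_after):
    """Return (g1, g2, g3): binary flags for constraints C3, C4 and C5.

    task_delays[m], coverage_times[m]: delay of task m and the coverage time it must fit in.
    sat_alloc[n]: list of computing resources satellite n allocates to its tasks.
    sat_capacity[n]: computing resources satellite n possesses.
    battery_after[n]: battery state of satellite n after the action is applied.
    """
    g1 = [1 if d <= t else 0 for d, t in zip(task_delays, coverage_times)]
    g2 = [1 if sum(a) <= cap else 0 for a, cap in zip(sat_alloc, sat_capacity)]
    g3 = [1 if u > 0 else 0 for u in battery_after]
    return g1, g2, g3
```

However large the underlying state is, the agent then sees only a finite number of flag combinations.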

Action space: Each action in the action space of the reinforcement learning model comprises the task scheduling modes and the computing resources that the ground mobile terminals, low-orbit satellites and ground cloud servers allocate to the tasks. Specifically, the action a_k ∈ A executed for the k-th batch task set is expressed as:

a_k = {c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}},

where c^k is the task scheduling mode vector of the k-th batch task set, and f^{k,GMT}, f^{k,LEO} and f^{k,GCS} are the computing resource vectors allocated to the tasks in the k-th batch task set by the ground mobile terminals, the low-orbit satellites and the ground cloud servers, respectively.

The allocable computing resource values are predefined: they are obtained by discretizing the maximum allocable computing resource.
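One simple way to discretize the allocable resources is a uniform grid between 0 and the maximum, as sketched below. The uniform spacing, the inclusion of the zero level (for tasks that are not processed at the node in question) and the function name are assumptions.

```python
def resource_levels(f_max, num_levels):
    """Uniformly spaced allocation levels from 0 up to f_max (inclusive)."""
    return [f_max * i / num_levels for i in range(num_levels + 1)]
```

For instance, `resource_levels(4.0, 4)` yields the five levels 0, 1, 2, 3 and 4 (in whatever unit f_max is given), and the agent picks one level per task and node.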

Reward function: The instantaneous reward function r_k is the feedback of the environment for state s_k under action a_k. In the computing resource allocation problem whose objective is to minimize the weighted system energy consumption overhead, made up of the ground mobile terminal energy consumption and the low-orbit satellite energy consumption for task processing, the instantaneous reward function r_k of the reinforcement learning model is defined in terms of the following quantities:

the task processing energy consumption of the ground mobile terminal for the task generated by the m-th ground mobile terminal in the k-th batch, and the task processing energy consumption of the low-orbit satellite for the same task.

The parameter α is the weight of the ground mobile terminal energy consumption in the system energy consumption overhead and takes values in [0,1].

The optimization objective is then to find the computing resource allocation strategy π^* that maximizes the cumulative reward function. For a computing resource allocation strategy π: S→A of the system, the cumulative reward function from the start of the k-th batch of tasks is defined in terms of the following quantities:

γ ∈ [0,1] is the reward discount rate, which sets the importance of future rewards; E_π[·] denotes the expectation under policy π; K is the total number of task batches to be processed; k' is the batch index over which the rewards are summed; and k is the batch currently being executed. That is, k' is a local summation variable introduced in the formula, whereas k identifies the k-th batch of tasks.
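For a single realized trajectory, the discounted cumulative reward from batch k onward can be computed as below. The sketch assumes the usual form in which the reward of batch k' is weighted by γ raised to the power (k' - k), uses 0-based batch indices, and omits the expectation over policies; these are assumptions about the omitted formula.

```python
def discounted_return(rewards, gamma, k=0):
    """Sum of gamma**(k' - k) * r_{k'} for k' = k, ..., K-1 (0-based batch indices)."""
    return sum(gamma ** (i - k) * r for i, r in enumerate(rewards) if i >= k)
```

With rewards taken as negative energy overheads, maximizing this quantity corresponds to minimizing discounted energy consumption over the remaining batches.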

Step S4: Solve the deep reinforcement learning model using a deep reinforcement learning algorithm based on an optimized DQN (deep Q-network), wherein the discrete state produced by mapping the environmental state information through the state evaluation function is fed to the reinforcement learning model as input.

The reinforcement learning model constructed in step S3 above uses the state evaluation function in place of the original state space, thereby mapping a potentially infinite number of system states onto a discrete, finite set of state evaluation values. However, this reinforcement learning model still has discrete, high-dimensional input and action spaces.

Therefore, in order to efficiently solve for a high-performance computing resource allocation strategy, the reinforcement learning model in step S4 of the present invention is a reinforcement learning model based on the optimized DQN. A DNN is introduced on top of the traditional reinforcement learning model, and the neural network parameters θ of the DNN are used to fit the actual Q function Q(s_k, a_k); the resulting fitted Q function is used to update θ iteratively. The optimal result of the fitted Q function finally obtained is the optimal strategy evaluation function Q^*(s_k, a_k), that is, Q(s_k, a_k; θ) ≈ Q^*(s_k, a_k), where Q(s_k, a_k; θ) denotes the fitted Q function, obtained with the neural network parameters θ, of taking action a_k in state s_k. The neural network at that point is the solved deep reinforcement learning model, and the solution of the deep reinforcement learning model is complete.

Here, the Q function Q(s_k, a_k) of a state-action pair (s_k, a_k) ∈ S×A represents the quality of the selected state-action pair. Based on the Bellman equation, the optimal strategy evaluation function Q^*(s_k, a_k) can be computed as an expectation, where E denotes the expectation under the uncertainty of s_{k+1}, γ denotes the discount rate of future benefits, and Q^*(s_{k+1}, a_{k+1}) | s_k, a_k denotes the optimal strategy evaluation function of taking action a_{k+1} in state s_{k+1}, conditioned on s_k and a_k. Therefore, by adapting the deep reinforcement learning algorithm based on the optimized DQN, the method proposed in the present invention overcomes the storage and computational efficiency bottlenecks of traditional reinforcement learning methods, reduces the system energy consumption overhead, and improves network performance.
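The Bellman expression itself is missing from this text. For reference, the standard Bellman optimality equation, written with the symbols used above, is sketched below; it is the textbook form and is assumed, not confirmed, to match the formula in the original specification.

```latex
Q^{*}(s_k, a_k) = \mathbb{E}\left[\, r_k + \gamma \max_{a_{k+1}} Q^{*}(s_{k+1}, a_{k+1}) \;\middle|\; s_k, a_k \right]
```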

The computing architecture of the agent in the energy consumption optimized low-orbit satellite edge computing resource allocation method of the present invention is shown in Figure 2.

In this low-orbit satellite edge computing network, the ground cloud server acts as the agent. It obtains the optimized computing resource allocation strategy by executing the energy consumption optimized low-orbit satellite edge computing resource allocation method of the present invention, and distributes the optimized strategy to each ground mobile terminal, low-orbit satellite and ground cloud server in the network. In step S1, the agent collects the environmental state information. As defined above, this information comprises: the task state information generated by each ground mobile terminal in the low-orbit satellite edge computing network, the geocentric angle information between each ground mobile terminal and the low-orbit satellites, the visibility information between each ground mobile terminal and the ground cloud servers, and the battery usage state information of each low-orbit satellite. Next, the agent maps the environmental state information through the state evaluation function to generate a discrete state that reflects the quality of the current state, and feeds it as input information to the network of the deep reinforcement learning algorithm based on the optimized DQN.

The network of this deep reinforcement learning algorithm consists of two parts, called the online network and the target network, which together stabilize and optimize network performance. The online network updates its strategy through gradient updates that minimize a loss function, and the target network limits the magnitude of the online network's strategy updates, which stabilizes network performance. The neural network parameters of the online network and the target network are denoted θ and θ^-, respectively. The online network and the target network have the same network structure. Every fixed number of iterations, the target network copies the network parameters θ from the online network to update its own network parameters θ^-.

In each iteration, the network parameters θ of the online network are updated by gradient descent on the corresponding loss function, which can be expressed as:

Here, y denotes the Q function value of the target network; Q(s_k, a_k; θ) denotes the fitted Q function, obtained with the network parameters θ of the online network, of taking action a_k in state s_k; E[·] denotes the expectation under the uncertainty of the experience (s_k, a_k, r_k, s_{k+1}); and L_π(θ) denotes the loss function under strategy π.
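The loss formula is not available in this text. The usual DQN loss, written with the symbols defined above, is the mean squared error between the target value and the online estimate; the squared-error form shown here is the standard choice and is an assumption about the missing formula.

```latex
L_{\pi}(\theta) = \mathbb{E}\left[ \bigl( y - Q(s_k, a_k; \theta) \bigr)^{2} \right]
```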

The Q function value y of the target network can be calculated as:

Here, Q(s_{k+1}, a_{k+1}; θ^-) denotes the fitted Q function, obtained with the network parameters θ^- of the target network, of taking action a_{k+1} in state s_{k+1}; γ is the benefit discount rate; and r_k is the instantaneous benefit function of the reinforcement learning model.
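The target value and the loss described above can be sketched on a sampled mini-batch as follows. This is a minimal illustration, assuming the standard DQN target y = r + γ·max Q(·; θ^-) and a mean squared error loss; the function names, the `terminal` flag, and the choice to omit the bootstrap term on the final batch of a round are assumptions, not details taken from the specification.

```python
def td_targets(rewards, next_q_target, gamma, terminal):
    """Compute y_i = r_i + gamma * max_a Q(g_{i+1}, a; theta^-) per sample.

    next_q_target[i] holds the target network's Q values for every action
    in the next state of sample i. When terminal[i] is True the bootstrap
    term is dropped (an assumption for the last batch of a round).
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, terminal):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def mse_loss(targets, q_online_taken):
    """Mean squared error between targets y_i and Q(g_i, a_i; theta)."""
    squared = [(y - q) ** 2 for y, q in zip(targets, q_online_taken)]
    return sum(squared) / len(squared)
```

Only the online parameters θ receive gradients from this loss; the target values are treated as constants during each update.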

In addition, DQN is an off-policy method that uses an experience replay mechanism. Each time a task batch k is executed, DQN stores the experience (s_k, a_k, r_k, s_{k+1}) acquired by the agent in the experience replay pool; each time the network parameters are updated, a mini-batch of samples is drawn at random from the experience replay pool for the update. The present invention uses the state evaluation function g_k in place of the state s_k, so that the agent's experience becomes (g_k, a_k, r_k, g_{k+1}); this simplifies the input state space used for the parameter updates.
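A minimal sketch of such an experience replay pool is shown below, assuming a fixed capacity with oldest-first eviction and uniform random sampling. The class name `ReplayPool`, its method names, and the capacity policy are illustrative assumptions; the specification only states that experiences (g_k, a_k, r_k, g_{k+1}) are stored and sampled at random in mini-batches.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity pool of (g_k, a_k, r_k, g_next) experiences."""

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest experience once it is full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, g, a, r, g_next):
        """Append one experience tuple to the pool."""
        self._buffer.append((g, a, r, g_next))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)
```

Sampling uniformly from past experiences breaks the correlation between consecutive task batches, which is the usual motivation for experience replay.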

Training and optimization end, and the iteration stops, once the network of the deep reinforcement learning algorithm has collected a set of sample experiences sufficient to reflect the interaction between the training environment and the agent, and has obtained a stably converged computing resource allocation strategy through mini-batch experience replay. Whether enough sample experiences have been collected can be judged by observing whether the benefit of the obtained computing resource allocation strategy has converged and stabilized, or by whether the loss function of the online network has converged toward 0.

Step S5: based on the solved deep reinforcement learning model, obtain the energy consumption optimized computing resource allocation strategy and distribute it to each ground mobile terminal, low-orbit satellite and ground cloud server in the system to carry out the computing resource allocation.

In step S5, the agent takes the environmental state information collected for the k-th batch (specifically, the task state information generated by each ground mobile terminal in the low-orbit satellite edge computing network, the geocentric angle information between each ground mobile terminal and the low-orbit satellites, the visibility information between each ground mobile terminal and the ground cloud servers, and the battery usage state information of each low-orbit satellite) as the state input s_k and computes the state evaluation function g_k. The reinforcement learning model established in step S3 is then solved with the deep reinforcement learning algorithm based on the optimized DQN adopted in step S4, which outputs the computing resource allocation strategy a_k = {c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}}. This yields the scheduling mode of each task and the computing resource allocation {f^{k,GMT}, f^{k,LEO}, f^{k,GCS}} of each ground mobile terminal, low-orbit satellite and ground cloud server in the system, which is distributed to the corresponding devices in the system.

Accordingly, the advantages of the energy consumption optimized low-orbit satellite edge computing resource allocation method of the present invention are as follows:

1) In a low-orbit satellite edge computing network comprising ground mobile terminals, low-orbit satellites and ground cloud servers, the ground cloud server serves as the agent. The method takes into account dynamic characteristics including the dynamic coverage of tasks by the low-orbit satellites, the maximum computing resources that the low-orbit satellites can allocate, and the battery usage state on the low-orbit satellites. With the optimization goal of minimizing the weighted system energy consumption overhead, which is composed of the energy consumption of the ground mobile terminals and of the low-orbit satellites, computing resources within the system are allocated to the computing tasks of the ground mobile terminals. Using an agent to allocate computing resources in a dynamic low-orbit satellite edge computing network reduces the energy consumption overhead of the ground mobile terminals and satellites and improves the performance of the low-orbit satellite edge computing network.

2) To address the dual energy optimization goals of the low-orbit satellites and the ground mobile terminals, the weighted system energy consumption overhead is defined as the optimization objective. A deep reinforcement learning method is introduced to solve the computing resource allocation problem of the dynamic low-orbit satellite edge computing network. The core elements of the reinforcement learning model are defined on the basis of the MDP framework, a state evaluation function is defined to optimize the state space, and a scheme for solving the model with an algorithm based on the optimized DQN and for distributing the generated strategy is proposed. Given the high-speed movement and limited resources of low-orbit satellites, the proposed method has clear performance advantages in computing efficiency and system energy consumption overhead in the dynamic low-orbit satellite edge computing network.

Experimental results:

A specific example of the energy consumption optimized low-orbit satellite edge computing resource allocation method of the present invention is given below for a scenario with 5 ground mobile terminals, 3 low-orbit satellites and 2 ground cloud servers.

According to step S1, the agent is used to obtain the environmental state information of the dynamic low-orbit satellite edge computing network.

In this experimental example, the computing resource allocation scenario of the low-orbit satellite edge computing network is shown in Figure 3. The network uses the ground cloud server as the agent and includes M ground mobile terminals, N low-orbit satellites and J ground cloud servers, with M = 5, N = 3 and J = 2. All low-orbit satellites are assumed to run on circular orbits; the low-orbit satellite orbit model is shown in Figure 4. The orbit height is H = 800 km and the radius of the earth is R = 6370 km.
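For the circular-orbit model with H = 800 km and R = 6370 km, the geocentric angle and the terminal-to-satellite distance can be sketched with textbook spherical-Earth geometry. The relations and function names below are assumptions used for illustration; they are standard satellite-geometry formulas, not expressions quoted from the specification.

```python
import math

R_EARTH_KM = 6370.0   # earth radius R from the example
ORBIT_H_KM = 800.0    # orbit height H from the example


def geocentric_angle(elevation_rad, r=R_EARTH_KM, h=ORBIT_H_KM):
    """Geocentric angle (rad) between terminal and satellite.

    Uses beta = arccos(R * cos(e) / (R + H)) - e for elevation angle e.
    """
    return math.acos(r * math.cos(elevation_rad) / (r + h)) - elevation_rad


def slant_range(beta_rad, r=R_EARTH_KM, h=ORBIT_H_KM):
    """Terminal-to-satellite distance (km) from the law of cosines."""
    rs = r + h
    return math.sqrt(r * r + rs * rs - 2.0 * r * rs * math.cos(beta_rad))


# Largest geocentric angle at which the satellite is still above the horizon.
beta_horizon = geocentric_angle(0.0)
```

With the satellite directly overhead the geocentric angle is zero and the distance equals the orbit height; at zero elevation the distance is roughly 3291 km for these parameters.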

According to step S2, an optimization problem model whose optimization goal is to minimize the system energy consumption overhead is constructed from the acquired environmental state information. The system energy consumption overhead is defined as the weighted sum of the task processing energy consumption of the ground mobile terminals and the task processing energy consumption of the low-orbit satellites.

To solve the computing resource allocation problem whose goal is to minimize the system energy consumption overhead, the agent (the ground cloud server) uses the acquired network environment state information to model the optimization problem mathematically under the constraints of the actual dynamic low-orbit satellite edge computing network: the coverage time constraint of the low-orbit satellites on tasks, the constraint on the computing resources allocated by the low-orbit satellites, and the battery usage state constraint of the low-orbit satellites.

Specifically, when a task selects the local execution strategy, the task processing delay and the task processing energy consumption are each obtained by calculation, where ζ denotes the energy consumption coefficient of the chip.
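The local-execution formulas are missing from this text. A commonly used model in the edge computing literature, shown here purely as an assumed illustration, takes the delay as required CPU cycles divided by the allocated CPU frequency, and the energy as ζ times the square of the frequency times the cycle count. The function and parameter names are hypothetical.

```python
def local_delay(cycles, f_local):
    """Local processing delay (s): required CPU cycles / CPU frequency (Hz)."""
    return cycles / f_local


def local_energy(cycles, f_local, zeta):
    """Local processing energy (J) under the assumed model zeta * f^2 * cycles.

    zeta is the chip energy consumption coefficient mentioned in the text.
    """
    return zeta * f_local ** 2 * cycles
```

Under this assumed model, lowering the allocated frequency reduces energy quadratically but lengthens the delay, which is the trade-off the allocation strategy has to balance.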

When a task selects the strategy of being scheduled to a low-orbit satellite, its processing delay is calculated from the distance between ground mobile terminal m and low-orbit satellite n, the propagation speed of light c, and the upload rate at which the task is uploaded to low-orbit satellite n; the upload rate is given by its own expression. The ground mobile terminal energy consumption for processing the task is expressed in terms of the uplink transmission power of ground mobile terminal m. In addition, the low-orbit satellite energy consumption is expressed in terms of the energy the low-orbit satellite consumes to acquire each bit of task data.

When a task selects the strategy of being scheduled through a low-orbit satellite to a ground cloud server for processing, its processing delay is calculated from, among other quantities, the distance between low-orbit satellite n and ground cloud server j and the download rate at which the task is offloaded through low-orbit satellite n to ground cloud server j. The ground mobile terminal energy consumption and the low-orbit satellite energy consumption for processing the task are each given by an expression, the latter in terms of the downlink transmission power of low-orbit satellite n.

The present invention takes the Iridium system as an example. The constraint on the battery usage state of low-orbit satellite n at the start of the (k+1)-th batch of tasks is expressed in terms of three quantities: U_max, the maximum usable battery energy on low-orbit satellite n; the energy that low-orbit satellite n obtains from its solar panels while executing the k-th batch; and the energy that low-orbit satellite n consumes in processing the k-th batch of tasks. The harvested energy is calculated from the maximum delay required to execute the k-th batch of tasks and the rate at which solar energy is converted into usable energy per second. The consumed energy is obtained from a calculation in which P_n denotes the routine energy consumption of the satellite.

According to step S3, the core elements of the reinforcement learning model are defined on the basis of the optimization problem, and a state evaluation function is designed to optimize the state space.

The core elements of a reinforcement learning model built on an MDP are the state space, the action space and the benefit function. To optimize the state space, the present invention designs a state evaluation function that takes the place of the state space. In the context of the dynamic low-orbit satellite edge computing network, the core elements are designed as follows:

State space design: taking the state s_k ∈ S at the start of the k-th batch of tasks as an example, the state comprises the state information vector generated by the task set; the geocentric angle vector between each ground mobile terminal and the low-orbit satellites when the tasks start to execute, which reflects the coverage of the tasks by the low-orbit satellites; the visibility information vector between each ground mobile terminal and the ground cloud servers when the tasks start to execute, which reflects the visibility of the tasks to the ground cloud servers; and the battery usage state information vector of each low-orbit satellite when the tasks start to execute, which reflects the battery usage state of the low-orbit satellites at that time.

State evaluation function design: the function is a group of vectors made up of three types of binary variables that indicate the quality of the current state under an action. The three types correspond to the coverage time constraint of the low-orbit satellites on tasks, the upper-limit constraint on the computing resources allocated by the low-orbit satellites, and the battery usage state constraint of the low-orbit satellites.

Action space design: taking the action a_k ∈ A executed on the k-th batch of tasks as an example, the action comprises the scheduling mode of each task and the computing resources that the ground mobile terminals, low-orbit satellites and ground cloud servers allocate to each task.

Benefit function design: taking the feedback r_k of state s_k under action a_k as an example, the benefit is described in terms of the system energy consumption overhead, that is, the weighted combination of the ground mobile terminal energy consumption and the low-orbit satellite energy consumption caused by task processing. The optimization goal of the system is to maximize the cumulative benefit function.
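The state evaluation function and the benefit function described above can be sketched as follows. This is an illustration under stated assumptions: each binary variable is 1 when its constraint is satisfied, the satellite energy is weighted by one minus the terminal weight, and the instantaneous benefit is the negative of the weighted overhead (so that maximizing benefit minimizes energy). None of these choices, nor the function and parameter names, is confirmed by the specification.

```python
def state_evaluation(delays, visible_durations, allocated, capacities,
                     battery_used, battery_max):
    """Map a continuous state to a tuple of binary constraint indicators.

    coverage: each task delay fits within the satellite visibility time
    compute:  allocated satellite resources stay within their upper limit
    battery:  satellite battery usage stays within the maximum
    """
    coverage_ok = [int(t <= v) for t, v in zip(delays, visible_durations)]
    compute_ok = [int(f <= f_max) for f, f_max in zip(allocated, capacities)]
    battery_ok = [int(u <= battery_max) for u in battery_used]
    return tuple(coverage_ok + compute_ok + battery_ok)


def instantaneous_benefit(weight, terminal_energy, satellite_energy):
    """Negative weighted system energy overhead for one task batch."""
    overhead = (weight * sum(terminal_energy)
                + (1.0 - weight) * sum(satellite_energy))
    return -overhead
```

Because the evaluation vector is binary, the number of distinct inputs to the network is finite regardless of how many continuous system states exist.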

According to step S4, the deep reinforcement learning model is solved using the deep reinforcement learning algorithm based on the optimized DQN.

Specifically, the DQN-based computing resource allocation algorithm provided in the present invention includes the following steps:

Step S41: initialize the experience replay pool U and the online neural network parameters θ;

Initializing the experience replay pool means clearing the sample cache; the initial values of the neural network parameters are generated randomly.

Step S42: initialize the target neural network parameters θ^- ← θ;

Step S43: initialize the training round counter v to 1;

Step S44: initialize the environment and the evaluation function g_0 of the network environment state;

The evaluation function is defined in binary form as in step S3 of the technical solution, and its initial value is set to a vector of ones.

Step S45: initialize the task batch k within the current training round v to 1;

Step S46: select the action a_k according to the ε-greedy strategy, that is, select an action at random with probability ε (0<ε<1), and otherwise adopt the action with the largest action value, a_k = argmax_{a∈A} Q(g_k, a; θ);

Step S47: execute the action a_k and obtain the evaluation function g_{k+1} of the next network environment state and the benefit function r_k;

Step S48: store the experience data (g_k, a_k, r_k, g_{k+1}) in the experience replay pool U;

Step S49: randomly sample a mini-batch of samples (g_i, a_i, r_i, g_{i+1}) from U; the mini-batch is used to update the network parameters θ and θ^- of the online network and the target network;

Step S410: use the mini-batch to compute the loss function L(θ) between the Q function values of the online network and the target network, and perform mini-batch gradient descent on this loss function to update the network parameters θ of the online network;

Step S411: every τ^- batches, update the network parameters of the target network, θ^- = θ, where τ^- is the interval at which the target network is periodically updated and takes a value greater than 0;

Step S412: determine whether k<K holds, where K is the set threshold on the number of task batches; if so, set k = k+1 and return to step S46, so that the network parameters θ and θ^- of the online network and the target network are updated iteratively; otherwise proceed to step S413;

Step S413: determine whether v<V holds, where V is the set threshold on the number of training rounds; if so, set v = v+1 and return to step S44; otherwise the optimization ends and the trained deep reinforcement learning model is obtained.
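The control flow of steps S41 to S413 can be sketched as the loop below. To stay short and dependency-free, this sketch swaps in a lookup table for the DNN (possible here because the state evaluation vector g_k is discrete and finite) and initializes it to zeros instead of random values; both are simplifications, not the method of the specification. The environment callbacks `env_reset` and `env_step`, the learning rate `lr`, and all default values are hypothetical.

```python
import random
from collections import defaultdict, deque


def train(env_reset, env_step, n_actions, V=20, K=10, gamma=0.9,
          epsilon=0.1, lr=0.1, batch_size=8, tau=5, seed=0):
    """Tabular stand-in for steps S41-S413; returns the online Q table."""
    rng = random.Random(seed)
    pool = deque(maxlen=1000)                        # S41: replay pool U
    online = defaultdict(lambda: [0.0] * n_actions)  # S41: "theta" (zeros)
    target = {}                                      # S42: theta^- <- theta
    zeros = [0.0] * n_actions
    steps = 0
    for v in range(1, V + 1):                        # S43 / S413: rounds
        g = env_reset()                              # S44: initial g_0
        for k in range(1, K + 1):                    # S45 / S412: batches
            if rng.random() < epsilon:               # S46: epsilon-greedy
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=online[g].__getitem__)
            g_next, r = env_step(g, a)               # S47: act, observe
            pool.append((g, a, r, g_next))           # S48: store experience
            size = min(batch_size, len(pool))
            batch = rng.sample(list(pool), size)     # S49: mini-batch
            for g_i, a_i, r_i, g_n in batch:         # S410: descent step
                y = r_i + gamma * max(target.get(g_n, zeros))
                online[g_i][a_i] += lr * (y - online[g_i][a_i])
            steps += 1
            if steps % tau == 0:                     # S411: sync target
                target = {s: list(q) for s, q in online.items()}
            g = g_next
    return online


def toy_reset():
    """Hypothetical one-state environment: all constraints satisfied."""
    return (1, 1, 1)


def toy_step(g, a):
    """Hypothetical dynamics: action 0 costs less energy than action 1."""
    return (1, 1, 1), (-1.0 if a == 0 else -3.0)


q_table = train(toy_reset, toy_step, n_actions=2)
```

Replacing the table with a neural network changes only the S410 update (a gradient step on θ) and the S411 copy (copying weights); the surrounding loop structure stays the same.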

With the deep reinforcement learning model obtained once training with the DQN-based algorithm has converged, the optimal computing resource allocation strategy for the dynamic low-orbit satellite edge computing network can be obtained. The environmental state information collected for the k-th batch (specifically, the task state information generated by each ground mobile terminal in the low-orbit satellite edge computing network, the geocentric angle information between each ground mobile terminal and the low-orbit satellites, the visibility information between each ground mobile terminal and the ground cloud servers, and the battery usage state information of each low-orbit satellite) is taken as the state input s_k, and the state evaluation function g_k is computed. The reinforcement learning model established in step S3 is solved with the DQN-based deep reinforcement learning algorithm adopted in step S4, which outputs the computing resource allocation strategy a_k = {c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}}. This yields the scheduling mode of each task and the computing resource allocation {f^{k,GMT}, f^{k,LEO}, f^{k,GCS}} of each ground mobile terminal, low-orbit satellite and ground cloud server in the system, which is distributed to the corresponding devices in the system.

The above are only preferred embodiments of the present invention and are not intended to limit its scope; various changes may be made to the above embodiments. All simple, equivalent changes and modifications made according to the claims and the description of the present application fall within the scope of protection of the claims of the present patent. Matters not described in detail in the present invention are conventional technical content.

Claims

1. A method for allocating low-orbit satellite edge computing resources with energy consumption optimization, characterized by comprising:

Step S1: Using the intelligent agent to obtain the environmental status information of the dynamic low-orbit satellite edge computing network;

Step S2: Based on the acquired environmental status information, an optimization problem model is constructed with minimizing the system energy consumption overhead as the optimization goal. The system energy consumption overhead is defined as the weighted sum of the task processing energy consumption of the ground mobile terminal and the task processing energy consumption of the low-orbit satellite;

Step S3: Based on the optimization problem model, define the state space, action space and benefit function of the reinforcement learning model, and design a state evaluation function to optimize the state space;

Step S4: solving the deep reinforcement learning model using a deep reinforcement learning algorithm based on optimized DQN, wherein the discrete state generated by mapping the environmental state information through the state evaluation function is input into the network of the deep reinforcement learning algorithm as input information;

Step S5: Based on the solved deep reinforcement learning model, an energy-optimized computing resource allocation strategy is obtained and distributed to various ground mobile terminals, low-orbit satellites, and ground cloud servers to achieve computing resource allocation;

The environmental status information of the low-orbit satellite edge computing network includes: the state information vector W^k of the k-th batch of task sets generated by the ground mobile terminals, the geocentric angle vector β^k between each ground mobile terminal and the low-orbit satellites when the k-th batch of tasks starts to be executed, the visibility vector b^k between each ground mobile terminal and the ground cloud servers when the tasks start to be executed, and the battery usage status information vector U^k of each low-orbit satellite when the k-th batch of tasks starts to be executed;

The step S1 comprises:

Step S11: providing a low-orbit satellite edge computing network consisting of M ground mobile terminals and J ground cloud servers located on the ground, and N low-orbit satellites located in space; the set of ground mobile terminals, the set of low-orbit satellites and the set of ground cloud servers are respectively expressed as M'＝{1,...,m,...,M}, N'＝{1,...,n,...,N} and J'＝{1,...,j,...,J}, m, n, j respectively represent the ordinal number of the ground mobile terminal, the ordinal number of the low-orbit satellite and the ordinal number of the ground cloud server, and M, N, J are the number of ground mobile terminals, the number of low-orbit satellites and the number of ground cloud servers respectively; setting each ground mobile terminal to be able to connect to at most one low-orbit satellite at a time; and setting each ground mobile terminal to be able to establish a connection with at most one ground cloud server through a low-orbit satellite at a time;

Step S12: Set each ground mobile terminal to generate only one indivisible computing task in each batch; the set K' of task batches to be executed by the entire low-orbit satellite edge computing network is then represented as K' = {1,...,k,...,K}, where k represents the k-th task batch and K is the total number of task batches; the task generated by the m-th ground mobile terminal in the k-th batch is described by two quantities, namely the data size of the task load and the number of CPU processing cycles required by the task load; the state information vector W^k of the k-th batch of task sets generated by the ground mobile terminals is defined over the tasks of all M ground mobile terminals, where M is the number of ground mobile terminals;

Step S13: Set the low-orbit satellites to run on circular orbits, denote the orbit height as H and the radius of the earth as R, and denote the elevation angle between ground mobile terminal m and low-orbit satellite n when the k-th batch of tasks starts to be executed; obtain the geocentric angle vector β^k between each ground mobile terminal and the low-orbit satellites when the k-th batch of tasks starts to be executed, as well as the visibility duration of each low-orbit satellite in the entire low-orbit satellite edge computing network to each ground mobile terminal when executing the k-th batch of tasks;

Step S14: Initialize the visibility vector b^k between each ground mobile terminal and the ground cloud servers when the tasks start to be executed, and the battery usage status information vector U^k of each low-orbit satellite when the k-th batch of tasks starts to be executed;

The step S2 comprises:

Step S21: defining the task scheduling mode vector $c^k$ corresponding to the state information vector $W^k$ of the k-th batch task set generated by the ground mobile terminals, which comprises, for each ground mobile terminal m, the decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each low-orbit satellite in the low-orbit satellite edge computing network, and the decision vector for scheduling that task to each ground cloud server in the low-orbit satellite edge computing network; the tasks of different ground mobile terminals within the same batch task set may select different task scheduling modes; the task scheduling modes include: local processing, transmission to a low-orbit satellite for processing, and transmission via a low-orbit satellite to a ground cloud server for processing;

Step S22: determining, according to the obtained environmental state information and the task scheduling mode vector of the k-th batch task set, the processing delay of each task in the task set, the task processing energy consumption of the ground mobile terminals, and the task processing energy consumption of the low-orbit satellites;

Step S23: defining the weighted sum of the task processing energy consumption of the ground mobile terminal and the task processing energy consumption of the low-orbit satellite as the system energy consumption overhead, and constructing an optimization problem model with minimizing the system energy consumption overhead as the optimization goal;

The decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each low-orbit satellite in the low-orbit satellite edge computing network contains one binary element for each low-orbit satellite n, wherein an element equal to 1 indicates that the task is scheduled to low-orbit satellite n for execution, and an element equal to 0 indicates that the task is not scheduled to low-orbit satellite n for execution;

The decision sum for scheduling that task to the low-orbit satellites in the low-orbit satellite edge computing network is the sum of the elements of this decision vector;

The decision vector for scheduling the task generated by the m-th ground mobile terminal in the k-th batch to each ground cloud server in the low-orbit satellite edge computing network contains one binary element for each pair of low-orbit satellite n and ground cloud server j, wherein an element equal to 1 indicates that the task is scheduled to ground cloud server j for execution via low-orbit satellite n, and an element equal to 0 indicates that the task is not scheduled to ground cloud server j for execution via low-orbit satellite n;

The decision sum for scheduling that task to the ground cloud servers is the sum of the elements of this decision vector;
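The three task scheduling modes and the two decision vectors can be sketched as follows. This is a minimal illustration under stated assumptions: the names `encode_decision` and `decode_mode` are hypothetical, the vectors are plain 0/1 lists, and the satellite vector is left at all zeros when the task is relayed to a ground cloud server (the claims do not say whether the relay satellite is also marked there).

```python
from typing import List, Optional, Tuple


def encode_decision(
    N: int, J: int, leo: Optional[int] = None, gcs: Optional[int] = None
) -> Tuple[List[int], List[List[int]]]:
    """Build the satellite vector (length N) and cloud matrix (N x J) for one task."""
    x = [0] * N
    y = [[0] * J for _ in range(N)]
    if gcs is not None and leo is None:
        raise ValueError("a ground cloud server is only reachable via a satellite")
    if leo is not None and gcs is None:
        x[leo] = 1            # processed on low-orbit satellite `leo`
    elif leo is not None:
        y[leo][gcs] = 1       # relayed by `leo` to ground cloud server `gcs`
    return x, y               # all zeros means local processing


def decode_mode(x: List[int], y: List[List[int]]) -> str:
    """Recover the scheduling mode, enforcing at most one target per task."""
    sum_x = sum(x)
    sum_y = sum(sum(row) for row in y)
    if sum_x + sum_y > 1:
        raise ValueError("a task may be scheduled to at most one target")
    if sum_x == 1:
        return "leo"
    if sum_y == 1:
        return "gcs"
    return "local"
```

The at-most-one check mirrors step S11, where each ground mobile terminal connects to at most one low-orbit satellite and at most one ground cloud server at a time.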

The optimization problem model takes minimization of the system energy consumption overhead as its objective, subject to the constraints $C_1$, $C_2$, $C_3$, $C_4$ and $C_5$, which denote the first, second, third, fourth and fifth constraints, respectively. The quantities appearing in the model are:

the binary decision indicating whether the task generated by the m-th ground mobile terminal in the k-th batch is scheduled to low-orbit satellite n for execution, and the binary decision indicating whether that task is scheduled to ground cloud server j for execution via low-orbit satellite n;

the processing delays of the task generated by the m-th ground mobile terminal in the k-th batch when the task scheduling mode is transmission to a low-orbit satellite for processing and when it is transmission via a low-orbit satellite to a ground cloud server for processing, respectively;

the visibility duration of low-orbit satellite n to ground mobile terminal m when executing the k-th batch of tasks;

the computing resources allocated by low-orbit satellite n to the task generated by the m-th ground mobile terminal in the k-th batch, and $z^{LEO}$, the upper limit of the computing resources owned by a single low-orbit satellite;

the battery usage state of low-orbit satellite n at the start of execution of the k-th batch of tasks;

$f^{k,GMT}$, $f^{k,LEO}$ and $f^{k,GCS}$, the computing resource vectors allocated to each task in the k-th batch task set by the ground mobile terminals, the low-orbit satellites and the ground cloud servers, respectively;

the task processing energy consumption, at the ground mobile terminal, of the task generated by the m-th ground mobile terminal in the k-th batch, and the task processing energy consumption of that task at the low-orbit satellite; and

α, the weight of the ground mobile terminal energy consumption in the system energy consumption overhead.
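A rough numerical reading of the objective and of two of the constraints is sketched below. The formula image is absent from this text, so several points are assumptions: the satellite energy is weighted by 1 − α, the delay constraint compares a task's processing delay with the visibility duration, and the capacity constraint compares the summed allocation on one satellite with $z^{LEO}$. All function names are hypothetical.

```python
from typing import Sequence


def system_energy_overhead(
    e_gmt: Sequence[float], e_leo: Sequence[float], alpha: float
) -> float:
    """Weighted sum of terminal and satellite energy over the tasks of one batch.

    Assumption: the satellite term carries the complementary weight (1 - alpha).
    """
    return sum(alpha * g + (1.0 - alpha) * l for g, l in zip(e_gmt, e_leo))


def delay_within_visibility(delay: float, visible: float) -> bool:
    """Assumed delay constraint: the task must finish while the satellite is visible."""
    return delay <= visible


def leo_capacity_ok(allocations: Sequence[float], z_leo: float) -> bool:
    """Assumed capacity constraint: one satellite cannot allocate more than z_leo."""
    return sum(allocations) <= z_leo
```

With α = 0.5, terminal energies of 2 and 4 and satellite energies of 1 and 3 give an overhead of 5.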

2. The energy-optimized low-orbit satellite edge computing resource allocation method according to claim 1, characterized in that the visibility duration of low-orbit satellite n to ground mobile terminal m when executing the k-th batch of tasks is determined by the operating period $T^{LEO}$ of the low-orbit satellite and the geocentric angle between ground mobile terminal m and low-orbit satellite n;

The geocentric angle between ground mobile terminal m and low-orbit satellite n is determined by the radius of the earth R, the orbital altitude H, and the elevation angle between ground mobile terminal m and low-orbit satellite n at the start of the k-th batch of tasks;

The operating period $T^{LEO}$ of a low-orbit satellite is:

$$T^{LEO} = 2\pi\sqrt{\frac{(R+H)^{3}}{\mu}},$$

where R is the radius of the earth, H is the orbital altitude, and μ is the Kepler constant.
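The orbital quantities of this claim can be computed as in the sketch below. The period follows Kepler's third law for a circular orbit, and the geocentric angle follows from the triangle formed by the earth's centre, the terminal and the satellite. The visibility formula is an assumption, since it is not legible in this text: it takes the time to sweep an arc of twice the geocentric angle at the orbital angular rate. The constants and function names are illustrative choices.

```python
import math

MU_EARTH = 3.986004418e14  # Kepler constant of the earth, m^3/s^2
R_EARTH = 6.371e6          # mean radius of the earth, m


def orbital_period(h: float, r: float = R_EARTH, mu: float = MU_EARTH) -> float:
    """Period of a circular orbit at altitude h, in seconds."""
    return 2.0 * math.pi * math.sqrt((r + h) ** 3 / mu)


def geocentric_angle(elevation: float, h: float, r: float = R_EARTH) -> float:
    """Geocentric angle (rad) between terminal and satellite for an elevation (rad)."""
    return math.acos(r / (r + h) * math.cos(elevation)) - elevation


def visibility_duration(beta: float, period: float) -> float:
    """Assumed form: time to sweep an arc of 2 * beta at angular rate 2 * pi / period."""
    return period * beta / math.pi
```

For an altitude of 550 km the period comes out at roughly 95 minutes, and the geocentric angle shrinks to zero as the elevation approaches 90 degrees, when the satellite is directly overhead.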

3. The energy-optimized low-orbit satellite edge computing resource allocation method according to claim 1, characterized in that each state $s_k$ in the state space of the reinforcement learning model includes the state information vector $W^k$ of the k-th batch task set generated by the ground mobile terminals, the geocentric angle vector $\beta^k$ between each ground mobile terminal and the low-orbit satellites at the start of execution of the k-th batch of tasks, the visibility vector $b^k$ between each ground mobile terminal and the ground cloud servers at the start of task execution, and the battery usage state information vector $U^k$ of each low-orbit satellite at the start of execution of the k-th batch of tasks;

The state evaluation function $g_k$ is:

$g_k = \{g^{k,1}, g^{k,2}, g^{k,3}\}$,

wherein the three components are indicators that each take one value when the corresponding constraint cannot be satisfied and the opposite value when it can: $g^{k,1}$ indicates whether the state $s_k$ under the action $a_k$ satisfies the third constraint $C_3$ corresponding to the task generated by the m-th ground mobile terminal in the k-th batch at the low-orbit satellite; $g^{k,2}$ indicates whether the state $s_k$ under the action $a_k$ satisfies the fourth constraint $C_4$ corresponding to low-orbit satellite n; and $g^{k,3}$ indicates whether the state $s_k$ under the action $a_k$ satisfies the fifth constraint $C_5$ corresponding to low-orbit satellite n;
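A possible shape for the state evaluation function is sketched below, purely for illustration. The indicator values (1 for satisfied, 0 for violated), the concrete form of each constraint test, and the hypothetical `battery_cap` limit are assumptions, since the constraint formulas are not legible in this text.

```python
from typing import List, Sequence, Tuple


def state_evaluation(
    delays: Sequence[float],       # per-terminal processing delay under a_k
    visible: Sequence[float],      # per-terminal visibility duration
    leo_load: Sequence[float],     # per-satellite total allocated computing resources
    z_leo: float,                  # computing resource limit of one satellite
    battery_use: Sequence[float],  # per-satellite battery usage after a_k
    battery_cap: float,            # hypothetical battery usage limit
) -> Tuple[List[int], List[int], List[int]]:
    """Return (g1, g2, g3): 1 where the assumed constraint holds, 0 where it fails."""
    g1 = [1 if d <= v else 0 for d, v in zip(delays, visible)]   # third constraint
    g2 = [1 if load <= z_leo else 0 for load in leo_load]        # fourth constraint
    g3 = [1 if u <= battery_cap else 0 for u in battery_use]     # fifth constraint
    return g1, g2, g3
```

Folding these flags into the state lets an agent distinguish feasible from infeasible state and action pairs without re-deriving the constraints.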

The action $a_k$ performed for the k-th batch of tasks in the action space of the reinforcement learning model includes:

$a_k = \{c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}\}$,

wherein $c^k$ represents the task scheduling mode vector of the k-th batch task set, $f^{k,GMT}$ represents the computing resource vector allocated by the ground mobile terminals to each task in the k-th batch task set, $f^{k,LEO}$ represents the computing resource vector allocated by the low-orbit satellites to each task in the k-th batch task set, and $f^{k,GCS}$ represents the computing resource vector allocated by the ground cloud servers to each task in the k-th batch task set;

The reward function of the reinforcement learning model includes an instantaneous reward function and a cumulative reward function;

The instantaneous reward function $r_k$ of the reinforcement learning model is defined in terms of the task processing energy consumption, at the ground mobile terminal, of the task generated by the m-th ground mobile terminal in the k-th batch, the task processing energy consumption of that task at the low-orbit satellite, and α, the weight of the ground mobile terminal energy consumption in the system energy consumption overhead;

The optimization objective is described as finding a computing resource allocation strategy $\pi^*$ that maximizes the cumulative reward function. For a computing resource allocation strategy $\pi: S \to A$, the cumulative reward function at the start of execution of the k-th batch of tasks is defined in terms of the following quantities: $\gamma \in [0,1]$ is the reward discount rate, which weights the importance of future rewards; $E_\pi[\cdot]$ denotes the expectation under the strategy π; K denotes the total number of task batches to be processed; k' denotes the running task batch index in the calculation; and k denotes the batch of the currently executed task.
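The two reward functions can be illustrated with the sketch below. It assumes that the instantaneous reward is the negative of the weighted energy overhead, so that maximizing reward minimizes energy, with the satellite term weighted by 1 − α, and that the cumulative reward is the usual discounted sum from batch k to batch K. Neither formula is legible in this text, and the function names are hypothetical.

```python
from typing import Sequence


def instant_reward(
    e_gmt: Sequence[float], e_leo: Sequence[float], alpha: float
) -> float:
    """Assumed r_k: negative weighted energy overhead of one batch."""
    return -sum(alpha * g + (1.0 - alpha) * l for g, l in zip(e_gmt, e_leo))


def discounted_return(rewards: Sequence[float], gamma: float, start: int = 0) -> float:
    """Sum of gamma**(k' - k) * r_k' for k' from `start` (0-based) to the last batch."""
    return sum(gamma ** i * r for i, r in enumerate(rewards[start:]))
```

For three batches with reward 1 each and γ = 0.5, the return from the first batch is 1 + 0.5 + 0.25 = 1.75, and from the second batch it is 1.5. Under a fixed realised trajectory this is the quantity whose expectation $E_\pi[\cdot]$ the strategy seeks to maximize.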

4. The energy-optimized low-orbit satellite edge computing resource allocation method according to claim 3, characterized in that, in step S4, a DNN is introduced into the reinforcement learning model; a fitted Q function, parameterized by the neural network parameters θ of the DNN, is used to approximate the actual Q function $Q(s_k, a_k)$; the neural network parameters θ are updated iteratively; and the optimal result of the fitted Q function finally obtained is the optimal strategy evaluation function $Q^*(s_k, a_k)$, at which point the deep reinforcement learning model is solved.
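The iterative update of θ can be illustrated with a single temporal-difference step, as used in standard DQN training. This is a sketch under stated assumptions rather than the claimed algorithm: a linear Q function (one weight row per discrete action) stands in for the DNN, a separate target parameter set is assumed, and the specific optimizations of the "optimized DQN" are not described in this section.

```python
from typing import List, Sequence


def q_values(theta: Sequence[Sequence[float]], s: Sequence[float]) -> List[float]:
    """Linear stand-in for the DNN: Q(s, a; theta) = theta[a] . s for each action a."""
    return [sum(w * x for w, x in zip(row, s)) for row in theta]


def td_update(
    theta: List[List[float]],
    theta_target: Sequence[Sequence[float]],
    s: Sequence[float],
    a: int,
    r: float,
    s_next: Sequence[float],
    done: bool,
    gamma: float,
    lr: float,
) -> float:
    """One gradient step on the squared TD error; updates theta[a] in place."""
    # Bootstrapped target uses the (assumed) target network parameters.
    target = r if done else r + gamma * max(q_values(theta_target, s_next))
    td_error = target - q_values(theta, s)[a]
    # For a linear Q function, the gradient with respect to theta[a] is s.
    theta[a] = [w + lr * td_error * x for w, x in zip(theta[a], s)]
    return td_error
```

Repeating this step over sampled transitions drives the fitted Q function toward the optimal strategy evaluation function; a DNN replaces the linear rows with a learned nonlinear mapping.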

5. The energy-optimized low-orbit satellite edge computing resource allocation method according to claim 1, characterized in that, in step S5, the agent takes the collected environmental state information of the k-th batch as the input state $s_k$ and computes the state evaluation function $g_k$; the optimization problem model established in step S3 is then solved using the deep reinforcement learning algorithm based on the optimized DQN adopted in step S4, and the computing resource allocation strategy $a_k = \{c^k, f^{k,GMT}, f^{k,LEO}, f^{k,GCS}\}$ is output, yielding each task scheduling mode and the computing resource allocations $\{f^{k,GMT}, f^{k,LEO}, f^{k,GCS}\}$ of each ground mobile terminal, low-orbit satellite and ground cloud server, which are distributed to each ground mobile terminal, low-orbit satellite and ground cloud server;

wherein $c^k$ represents the task scheduling mode vector of the k-th batch task set, $f^{k,GMT}$ represents the computing resource vector allocated by the ground mobile terminals to each task in the k-th batch task set, $f^{k,LEO}$ represents the computing resource vector allocated by the low-orbit satellites to each task in the k-th batch task set, and $f^{k,GCS}$ represents the computing resource vector allocated by the ground cloud servers to each task in the k-th batch task set.
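One way the agent of step S5 might pick an action once Q values are available is sketched below. The masking of infeasible candidates by a feasibility flag derived from the state evaluation function is an assumption of this example, as is the discretization of the action space into indexed candidates; the function name is hypothetical.

```python
from typing import Optional, Sequence


def select_action(q: Sequence[float], feasible: Sequence[bool]) -> Optional[int]:
    """Return the index of the feasible candidate action with the highest Q value.

    Returns None when no candidate passes the feasibility flags.
    """
    candidates = [i for i, ok in enumerate(feasible) if ok]
    if not candidates:
        return None
    return max(candidates, key=lambda i: q[i])
```

The selected index would then be mapped back to a concrete $a_k$, that is, a scheduling mode vector and the three computing resource vectors, before distribution to the terminals, satellites and cloud servers.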