CN111818130A

CN111818130A - Joint optimization of cache and computation based on reinforcement learning

Info

Publication number: CN111818130A
Application number: CN202010556220.8A
Authority: CN
Inventors: 李保罡; 李诗璐; 赵伟
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2020-10-23

Abstract

The embodiment of the present invention discloses the joint optimization of caching and computing in the NOMA-MEC system based on reinforcement learning, which mainly involves the joint caching and unloading problems of the NOMA-MEC system of multi-user and multi-MEC servers. Computational results can be shared among servers by extending the local cache of a single server to a cooperative cache of multiple servers. In order to minimize the delay of the whole system, find the optimal caching and unloading strategy for each user.

Description

Joint optimization of cache and computation based on reinforcement learning

技术领域technical field

本发明涉及通信领域，尤其涉及一种基于强化学习缓存与计算的联合优化。The invention relates to the field of communication, in particular to a joint optimization based on reinforcement learning cache and calculation.

背景技术Background technique

随着科技的发展，用户对移动通信业务的处理速度需求越来越高，为了加快移动通信业务的处理速度，减少数据传输的延时，提高用户的体验，移动通信行业正在讨论在无线接入网的边缘设置移动边缘计算服务器，通过移动边缘计算服务器可以为接入该无线网络的用户提供计算能力和/或存储能力。With the development of science and technology, users have higher and higher requirements for the processing speed of mobile communication services. In order to speed up the processing speed of mobile communication services, reduce the delay of data transmission, and improve user experience, the mobile communication industry is discussing the use of wireless access A mobile edge computing server is set at the edge of the network, and the mobile edge computing server can provide computing capability and/or storage capability for users accessing the wireless network.

非正交多址接入(Non-orthogonal Multiple Access，NOMA)和移动边缘计算(Mobile Edge Computing，MEC)都被认为是下一代无线网络中极具发展前景的技术。随着智能通信的快速发展，移动边缘计算越来越多地应用于未来通信领域，它可以提高用户在应用中的计算能力，可以在如虚拟现实、增强现实、无人驾驶等领域发挥重要作用。与使用传统的云计算系统不同，移动边缘计算可以在云无线接入网中提供计算能力。移动边缘计算的基本思想是利用无线电网络中具有强大计算能力的设施，通过利用如集成于基站中的移动边缘计算服务器，用户可以将计算任务卸载到位于网络边缘的移动边缘计算服务器中，再由移动边缘计算服务器进行任务计算。由于移动边缘计算服务器可以部署在用户附近，因此使用移动边缘计算的网络可以为用户提供低延迟和低能耗的任务计算服务。Both Non-orthogonal Multiple Access (NOMA) and Mobile Edge Computing (MEC) are considered to be very promising technologies in next-generation wireless networks. With the rapid development of intelligent communication, mobile edge computing is increasingly used in the field of future communication. It can improve the computing power of users in applications, and can play an important role in fields such as virtual reality, augmented reality, and unmanned driving. . Unlike using traditional cloud computing systems, mobile edge computing can provide computing power in cloud radio access networks. The basic idea of mobile edge computing is to use the facilities with powerful computing power in the radio network. By using the mobile edge computing server integrated in the base station, users can offload computing tasks to the mobile edge computing server located at the edge of the network, and then use the mobile edge computing server integrated in the base station. Mobile edge computing servers perform task computing. Since mobile edge computing servers can be deployed near users, a network using mobile edge computing can provide users with low-latency and low-energy-consuming task computing services.

在移动边缘计算(MEC)和非正交多址接入(NOMA)结合的系统中，主要利用卸载计算去降低时延和能耗。由于MEC服务器的计算容量有限，并且在 MEC系统中，不同用户卸载的任务经常是相同的。为了避免相同任务的重复传输和计算，缓存是降低时延的有效解决方法。在NOMA-MEC系统中的缓存，现有技术中只考虑一个MEC服务器和本地缓存。单个MEC服务器的缓存容量有限，因此很难缓存更多的任务计算结果。如果多个用户请求任务的类型是多样的，则会导致较低的命中率。在这种情况下，未完成的任务只能被计算，因此不能有效地减少时延。In a system combining Mobile Edge Computing (MEC) and Non-Orthogonal Multiple Access (NOMA), offload computing is mainly used to reduce latency and energy consumption. Due to the limited computing capacity of the MEC server, and in the MEC system, the tasks offloaded by different users are often the same. In order to avoid repeated transmission and computation of the same task, caching is an effective solution to reduce latency. For the cache in the NOMA-MEC system, only one MEC server and local cache are considered in the prior art. The cache capacity of a single MEC server is limited, so it is difficult to cache more task computation results. If the types of tasks requested by multiple users are diverse, it will result in a lower hit rate. In this case, the unfinished tasks can only be calculated, so the delay cannot be effectively reduced.

发明内容SUMMARY OF THE INVENTION

为解决上述技术问题，本发明实施例提供了基于强化学习NOMA-MEC 系统中缓存与计算的联合优化，也可以说是一种基于强化学习的方法，目的是去最小化时延。其中主要涉及多用户、多MEC服务器的NOMA-MEC系统的联合缓存和卸载问题。通过将单个服务器的本地缓存扩展到多个服务器的协作缓存，服务器之间可以共享计算结果。为了使整个系统的时延最小，需要找到每个用户最优的缓存和卸载策略。In order to solve the above technical problems, the embodiments of the present invention provide joint optimization of caching and computing in NOMA-MEC system based on reinforcement learning, which can also be said to be a method based on reinforcement learning, in order to minimize the delay. It mainly involves the joint caching and unloading of NOMA-MEC system with multi-user and multi-MEC server. Computational results can be shared among servers by extending the local cache of a single server to a cooperative cache of multiple servers. In order to minimize the delay of the whole system, it is necessary to find the optimal caching and unloading strategy for each user.

本发明实施例提供了如下技术方案：The embodiment of the present invention provides the following technical solutions:

一种基于强化学习缓存与计算的联合优化，应用于NOMA-MEC系统中, 联合优化缓存和计算，通过将单个服务器的本地缓存扩展到多个服务器的协作缓存，多个服务器之间可以共享计算结果，为了使整个系统的时延最小，找到每个用户最优的缓存和卸载策略，所述方法包括：A joint optimization based on reinforcement learning caching and computing, applied to the NOMA-MEC system, jointly optimizes caching and computing, by extending the local cache of a single server to the cooperative cache of multiple servers, multiple servers can share computing As a result, in order to minimize the delay of the entire system and find the optimal caching and unloading strategy for each user, the method includes:

针对用户不同的决策方式，生成本地缓存，协作缓存，本地计算和卸载计算模式下的时延函数；According to the different decision-making methods of users, generate local cache, cooperative cache, local computing and delay functions in offload computing mode;

考虑到任务流行度是动态的和未知的，提出门控递归单元算法来预测任务流行度，将流行度高的计算结果放置在相应的移动边缘计算服务器上进行缓存。Considering that the task popularity is dynamic and unknown, a gated recursive unit algorithm is proposed to predict the task popularity, and the calculation results with high popularity are placed on the corresponding mobile edge computing server for caching.

基于服务器上缓存的计算结果，利用多代理深度Q网络算法寻找最优的缓存和卸载决策；Based on the calculation results cached on the server, the multi-agent deep Q-network algorithm is used to find the optimal caching and unloading decisions;

其中，所述针对用户不同的决策方式，生成本地缓存，协作缓存，本地计算和卸载计算模式下的时延函数，具体包括：Wherein, according to the different decision-making methods of users, generating local cache, collaborative cache, local computing and offloading computing mode of delay function, specifically includes:

1)对于选择本地缓存的用户来说，与协作缓存和卸载计算相比，它的时延是忽略不计的。1) For users who choose local caching, its latency is negligible compared to cooperative caching and offloading computation.

2)在协作缓存模式下，处理任务的时延主要是服务器间传输产生的时延 T_m,n：2) In the cooperative cache mode, the delay of processing tasks is mainly the delay T _m,n generated by the transmission between servers:

其中r_m,n表示本地服务器m和协作服务器n的下载速率，M_j代表任务j计算结果的大小。where r _m,n represents the download rate of the local server m and the cooperative server n, and M _j represents the size of the calculation result of task j.

3)如果用户选择卸载计算，为了提高频谱利用率，进一步降低传输时延，采用NOMA技术将信息传输到本地基站。处理任务的时延由卸载时延

在 MEC服务器上的处理时延

和回传时延

三部分组成。由于回传时延较小，一般可以忽略。3) If the user chooses to offload the calculation, in order to improve the spectrum utilization rate and further reduce the transmission delay, the NOMA technology is used to transmit the information to the local base station. The latency of processing tasks is determined by the offload latency

Processing Latency on MEC Server

and return delay

It consists of three parts. Since the return delay is small, it can generally be ignored.

针对选择本地计算的用户，本地执行时延为

其中W_j表示处理任务j总的CPU周期数，

是用户i在本地计算阶段的计算容量。For users who choose local computing, the local execution delay is

where W _j represents the total number of CPU cycles for processing task j,

is the computing capacity of user i in the local computing phase.

其中，所述考虑到任务流行度是动态的和未知的，提出门控递归单元算法来预测任务流行度，将流行度高的计算结果放置在相应的移动边缘计算服务器上进行缓存，具体包括：Among them, considering that the task popularity is dynamic and unknown, a gated recursive unit algorithm is proposed to predict the task popularity, and the calculation results with high popularity are placed on the corresponding mobile edge computing server for caching, specifically including:

1)收集用户在历史时刻请求的任务和不同任务的计算结果，执行下面的操作。1) Collect the tasks requested by the user at historical moments and the calculation results of different tasks, and perform the following operations.

2)输入历史时期的流行度{S_j(1),S_j(2),···S_j(T_M)}，其中,S_j(t)表示任务j在时间t上的流行度。经过GRU单元，得到长度为T_N的预测的流行度 {S_j(T_M+1)，S_j(T_M+2),···S_j(T_M+T_N)}。2) Input the popularity of the historical period {S _j (1), S _j (2),...S _j (T _M )}, where S _j (t) represents the popularity of task j at time t. After the GRU unit, the predicted popularity {S _j (T _M +1), S _j (T _M +2), ··· S _j (T _M +T _N )} of length T _N is obtained.

3)计算预测流行度q(j)和在预测时刻到来时真实流行度p(j)之间的损失函数

通过不断调整参数，使损失函数变小。3) Calculate the loss function between the predicted popularity q(j) and the real popularity p(j) when the predicted moment arrives

By continuously adjusting the parameters, the loss function becomes smaller.

4)直到损失函数不再发生变化，停止训练，否则，返回1)。4) Stop training until the loss function no longer changes, otherwise, return to 1).

由于服务器的缓存空间有限，很难存储所有任务的结果，考虑到服务器的缓存容量和计算结果的大小，将任务的计算结果按流行度由大到小的顺序缓存在服务器上，以便尽可能多地缓存任务的计算结果。在这种方式下，其他服务器也使用GRU算法来缓存相应的计算结果。在所有用户请求任务之前，这些服务器已经放置了流行任务的计算结果，这有利于命中本地缓存和协作缓存。Due to the limited cache space of the server, it is difficult to store the results of all tasks. Considering the cache capacity of the server and the size of the calculation results, the calculation results of the tasks are cached on the server in descending order of popularity, so that as many as possible Cache the calculation results of the task locally. In this way, other servers also use the GRU algorithm to cache the corresponding calculation results. These servers have placed the computation results of popular tasks before all users request tasks, which is beneficial for hitting local caches and cooperative caches.

其中，所述基于服务器上缓存的计算结果，利用多代理深度Q网络算法寻找最优的缓存和卸载决策，具体包括：Wherein, based on the calculation results cached on the server, the multi-agent deep Q network algorithm is used to find the optimal caching and unloading decisions, specifically including:

在多代理系统中，将相同基站中的每个用户视为代理。每个代理在做决策前，不知道其他代理的行为。根据环境的状态信息，代理选择自己的动作。在选择动作前，需要判断命中的情况。根据状态信息和用户请求的任务，用户可以知道是否命中了本地缓存还是协作缓存。只有当缓存被命中时，用户才能选择缓存决策。在代理获得状态信息和做出动作选择后，环境会反馈给代理奖励信息用于判断动作的好坏。考虑到奖励值，代理不断地调整自己的行为来让奖励最大化。MADQN算法的具体过程如下:In a multi-agent system, each user in the same base station is considered an agent. Each agent is unaware of the behavior of other agents before making a decision. Based on the state information of the environment, the agent chooses its own actions. Before choosing an action, you need to judge the hit situation. Based on the state information and the task requested by the user, the user can know whether the local cache or the cooperative cache was hit. The user can choose a caching decision only when the cache is hit. After the agent obtains state information and makes an action choice, the environment will feed back reward information to the agent for judging the quality of the action. Considering the reward value, the agent continuously adjusts its behavior to maximize the reward. The specific process of the MADQN algorithm is as follows:

1)初始化状态s_i，动作a_i，奖励r_i，当前Q网络Q(s,a；θ)和目标Q网络

等变量。1) Initialize state s _i , action a _i , reward _ri , current Q network Q(s, a; θ) and target Q network

etc. variables.

2)对于每个代理，观察初始状态

其中μ(t)是在时间t上，存储在所有服务器上任务的序号。

表示用户i在本地服务器m上的时延。

是MEC服务器剩余的计算容量。ξ(t)代表在同一基站中N个用户可利用的能耗。2) For each agent, observe the initial state

where μ(t) is the sequence number of the task stored on all servers at time t.

Indicates the delay of user i on the local server m.

is the remaining computing capacity of the MEC server. ξ(t) represents the energy consumption available to N users in the same base station.

3)根据贪心算法选择动作

代理以高概率1-ε选择最优的决策。否则，它将以ε的概率随机的选择缓存和卸载决策。其中

是任务卸载决策，

表示本地计算决策，

分别表示本地缓存和协作缓存决策，ε为影响因子。3) Choose action according to greedy algorithm

The agent chooses the optimal decision with high probability 1-ε. Otherwise, it randomly chooses cache and unload decisions with probability ε. in

is the task offloading decision,

represents a local computing decision,

represent local cache and cooperative cache decisions, respectively, and ε is the impact factor.

4)更新奖励r_i(t)＝T_i(t-1)-T_i(t)，其中T_i(t)和T_i(t-1)分别表示用户i在时间t和t-1上的时延。4) Update reward r _i (t)=T _i (t-1)-T _i (t), where T _i (t) and T _i (t-1) represent user i at time t and t-1, respectively delay.

5)存储代理i在时间t上的经验e_i(t)到经验回放单元D(t)中，训练时从中抽取小批量的转移序列。5) Store the experience e _i (t) of agent i at time t into the experience playback unit D(t), and extract a small batch of transfer sequences from it during training.

6)使用随机梯度下降法更新损失函数L(t)。6) Use stochastic gradient descent to update the loss function L(t).

7)更新下一个状态s(t+1)←s(t)。7) Update the next state s(t+1)←s(t).

8)直到参数θ收敛停止训练。8) Stop training until the parameter θ converges.

按照上述方法，缓存和卸载决策的选择可以被执行在其他基站的用户中，由于整个系统的时延是由多个基站的时延组成，所以最小化整个系统的时延可以通过减少不同基站的时延来实现。According to the above method, the selection of buffering and unloading decisions can be performed among users of other base stations. Since the delay of the whole system is composed of the delays of multiple base stations, the delay of the whole system can be minimized by reducing the delay of different base stations. delay to achieve.

与现有技术相比，上述技术方案具有以下优点：Compared with the prior art, the above technical solution has the following advantages:

本发明实施例提供了基于强化学习NOMA-MEC系统中缓存与计算的联合优化，也可以说是一种基于强化学习的方法，联合优化NOMA-MEC系统中的缓存和计算。首先，考虑到多服务器间的计算结果缓存和任务卸载的问题，利用多服务间协作缓存的方式来提高缓存命中率。其次，考虑到任务流行度是动态的和未知的这种更为合理的情况，提出门控递归单元这种新颖的任务流行度的预测方法，该算法是第一次被用于任务流行度的预测。利用这种算法去预测任务流行度，对于随时间动态变化的流行度，通过适当的增加学习率，它预测的更加准确。最后，基于预测的流行度，多代理深度Q网络算法被用来自主的选择缓存和卸载决策，通过不断地训练，它可以达到最小化时延的目的。The embodiments of the present invention provide joint optimization of caching and computing in the NOMA-MEC system based on reinforcement learning, which can also be said to be a method based on reinforcement learning to jointly optimize the caching and computing in the NOMA-MEC system. First, considering the problems of calculation result caching and task offloading among multiple servers, the cache hit rate is improved by means of cooperative caching among multiple services. Secondly, considering the more reasonable situation that task popularity is dynamic and unknown, we propose a novel task popularity prediction method called Gated Recurrent Unit, which is used for the first time for task popularity. predict. Using this algorithm to predict the popularity of tasks, for the popularity that changes dynamically over time, it can predict more accurately by appropriately increasing the learning rate. Finally, based on the predicted popularity, a multi-agent deep Q-network algorithm is used to autonomously select caching and offloading decisions, which can minimize latency through continuous training.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are For some embodiments of the present invention, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.

图1为本发明一个实施例所提供的基于强化学习缓存与计算的联合优化示意图。FIG. 1 is a schematic diagram of joint optimization based on reinforcement learning caching and computing provided by an embodiment of the present invention.

具体实施方式Detailed ways

正如背景技术部分所述，在NOMA-MEC系统中的缓存，现有技术中只考虑一个MEC服务器和本地缓存。单个MEC服务器的缓存容量有限，因此很难缓存更多的任务计算结果。如果多个用户请求任务的类型是多样的，则会导致较低的命中率。在这种情况下，未完成的任务只能被计算，因此不能有效地减少时延。As mentioned in the background section, for the cache in the NOMA-MEC system, only one MEC server and local cache are considered in the prior art. The cache capacity of a single MEC server is limited, so it is difficult to cache more task computation results. If the types of tasks requested by multiple users are diverse, it will result in a lower hit rate. In this case, the unfinished tasks can only be calculated, so the delay cannot be effectively reduced.

针对现有技术的不足，本申请提出一种基于强化学习缓存与计算的联合优化，应用在NOMA-MEC系统中，其中主要涉及多用户、多MEC服务器的 NOMA-MEC系统的联合缓存和卸载问题。通过将单个服务器的本地缓存扩展到多个服务器的协作缓存，服务器之间可以共享计算结果。为了使整个系统的时延最小，需要找到每个用户最优的缓存和卸载策略。In view of the deficiencies of the prior art, this application proposes a joint optimization of caching and computing based on reinforcement learning, which is applied in the NOMA-MEC system, which mainly involves the joint caching and unloading of the NOMA-MEC system with multiple users and multiple MEC servers. . Computational results can be shared among servers by extending the local cache of a single server to a cooperative cache of multiple servers. In order to minimize the delay of the whole system, it is necessary to find the optimal caching and unloading strategy for each user.

如图1所示，本申请提出了基于强化学习缓存与计算的联合优化，应用在NOMA-MEC系统中，通过将单个服务器的本地缓存扩展到多个服务器的协作缓存，多个服务器之间可以共享计算结果，为了使整个系统的时延最小，找到每个用户最优的缓存和卸载策略，包括：As shown in Figure 1, this application proposes a joint optimization based on reinforcement learning caching and computing, which is applied in the NOMA-MEC system. By extending the local cache of a single server to the cooperative cache of multiple servers, multiple servers can Sharing the calculation results, in order to minimize the delay of the entire system, find the optimal caching and unloading strategy for each user, including:

第一步，针对用户不同的决策方式，生成本地缓存，协作缓存，本地计算和卸载计算模式下的时延函数。The first step is to generate local cache, cooperative cache, local computing and offload computing mode delay functions according to different decision-making methods of users.

在 MEC服务器上的处理时延

和回传时延

Processing Latency on MEC Server

and return delay

针对选择本地计算的用户，本地执行时延为

其中W_j表示处理任务j总的CPU周期数，f_i ^l是用户i在本地计算阶段的计算容量。For users who choose local computing, the local execution delay is

where W _j represents the total number of CPU cycles for processing task j, and f _i ^l is the computing capacity of user i in the local computing phase.

第二步，决定在MEC服务器上缓存的内容。考虑到任务流行度是动态的和未知的，提出门控递归单元(Gated Recurrent Unit，GRU)算法来预测任务流行度，将流行度高的计算结果放置在相应的服务器上。具体过程如下：The second step is to decide what to cache on the MEC server. Considering that the task popularity is dynamic and unknown, a Gated Recurrent Unit (GRU) algorithm is proposed to predict the task popularity, and the calculation results with high popularity are placed on the corresponding server. The specific process is as follows:

By continuously adjusting the parameters, the loss function becomes smaller.

第三步，基于服务器上缓存的计算结果，利用多代理深度Q网络 (Multi-agentDeep-Q-network，MADQN)算法寻找最优的缓存和卸载决策。In the third step, based on the calculation results cached on the server, the Multi-agent Deep-Q-network (MADQN) algorithm is used to find the optimal caching and unloading decisions.

etc. variables.

2)对于每个代理，观察初始状态

其中μ(t)是在时间t上，存储在所有服务器上任务的序号。

表示用户i在本地服务器m上的时延。

where μ(t) is the sequence number of the task stored on all servers at time t.

Indicates the delay of user i on the local server m.

3)根据贪心算法选择动作

是任务卸载决策，

表示本地计算决策，

is the task offloading decision,

represents a local computing decision,

7)更新下一个状态s(t+1)←s(t)。7) Update the next state s(t+1)←s(t).

可见，本发明实施例提供了基于强化学习NOMA-MEC系统中缓存与计算的联合优化，也可以说是一种基于强化学习的方法，联合优化NOMA-MEC 系统中的缓存和计算。首先，考虑到多服务器间的计算结果缓存和任务卸载的问题，利用多服务间协作缓存的方式来提高缓存命中率。其次，考虑到任务流行度是动态的和未知的这种更为合理的情况，提出门控递归单元这种新颖的任务流行度的预测方法，该算法是第一次被用于任务流行度的预测。利用这种算法去预测任务流行度，对于随时间动态变化的流行度，通过适当的增加学习率，它预测的更加准确。最后，基于预测的流行度，多代理深度Q 网络算法被用来自主的选择缓存和卸载决策，通过不断地训练，它可以达到最小化时延的目的。It can be seen that the embodiments of the present invention provide joint optimization of caching and computing in NOMA-MEC system based on reinforcement learning, which can also be said to be a method based on reinforcement learning to jointly optimize caching and computing in NOMA-MEC system. First, considering the problems of calculation result caching and task offloading among multiple servers, the cache hit rate is improved by means of cooperative caching among multiple services. Secondly, considering the more reasonable situation that task popularity is dynamic and unknown, we propose a novel task popularity prediction method called Gated Recurrent Unit, which is used for the first time for task popularity. predict. Using this algorithm to predict the popularity of tasks, for the popularity that changes dynamically over time, it can predict more accurately by appropriately increasing the learning rate. Finally, based on the predicted popularity, a multi-agent deep Q-network algorithm is used to autonomously select caching and offloading decisions, which can minimize latency through continuous training.

综上可知，本申请具有以下优点：To sum up, the present application has the following advantages:

(1)通过将单个服务器的本地缓存扩展到多个服务器的协作缓存，计算结果可以在多服务器间共享，比较单个服务器的本地缓存，多服务器间的协作缓存能够提高缓存命中率。也就是说，本申请考虑到多服务器间的计算结果缓存和任务卸载的问题，利用多服务间协作缓存的方式来提高缓存命中率。(1) By extending the local cache of a single server to the cooperative cache of multiple servers, the calculation results can be shared among multiple servers. Comparing the local cache of a single server, the cooperative cache among multiple servers can improve the cache hit rate. That is to say, the present application takes into account the problems of calculation result caching and task offloading among multiple servers, and improves the cache hit rate by means of cooperative caching among multiple services.

(2)提出将NOMA和MEC结合的计算结果缓存方法去解决有限计算资源和频谱资源的问题。(2) The calculation result caching method combining NOMA and MEC is proposed to solve the problem of limited computing resources and spectrum resources.

(3)考虑到任务流行度是动态的和未知的这种更为合理的情况。针对这种更合理的情况，本申请提出GRU这种新颖的任务流行度的预测方法，利用 GRU算法去预测任务流行度，对于随时间动态变化的流行度，通过适当的增加学习率，它预测的更加准确。而该算法是第一次被用于任务流行度的预测。(3) Consider the more plausible case that task popularity is dynamic and unknown. In response to this more reasonable situation, this application proposes a novel task popularity prediction method called GRU, which uses the GRU algorithm to predict the task popularity. For the popularity that changes dynamically over time, by appropriately increasing the learning rate, it predicts more accurate. And the algorithm is the first to be used for task popularity prediction.

(4)基于预测的流行度，提出MADQN算法去优化缓存和卸载决策，它可以在状态动作空间大的情况下，找到最优的缓存和卸载策略，从而使系统的时延最小。该MADQN算法被用来自主的选择缓存和卸载决策，通过不断地训练，它可以达到最小化时延的目的。(4) Based on the predicted popularity, the MADQN algorithm is proposed to optimize the caching and unloading decisions. It can find the optimal caching and unloading strategy in the case of a large state-action space, so as to minimize the system delay. The MADQN algorithm is used to autonomously select cache and unload decisions, and through continuous training, it can achieve the goal of minimizing latency.

本说明书中各个部分采用递进的方式描述，每个部分重点说明的都是与其他部分的不同之处，各个部分之间相同相似部分互相参见即可。Each part in this specification is described in a progressive manner, and each part focuses on the differences from other parts, and it is sufficient to refer to each other for the same and similar parts among the various parts.

对所公开的实施例的上述说明，使本领域专业技术人员能够实现或使用本发明。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的，本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下，在其它实施例中实现。因此，本发明将不会被限制于本文所示的实施例，而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above description of the disclosed embodiments enables any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. Based on the joint optimization of reinforcement learning cache and calculation, it is characterized in that, applied in the NOMA-MEC system, by extending the local cache of a single server to the cooperative cache of multiple servers, the calculation results can be shared between multiple servers, In order to minimize the delay of the whole system and find the optimal caching and unloading strategy for each user, the method includes:

The first step is to generate local cache, cooperative cache, local computing and offload computing mode delay functions according to different decision-making methods of users.

In the second step, considering that the task popularity is dynamic and unknown, a gated recursive unit algorithm is proposed to predict the task popularity, and the calculation results with high popularity are placed on the corresponding mobile edge computing server for caching.

In the third step, based on the calculation results cached on the server, the multi-agent deep Q-network algorithm is used to find the optimal caching and unloading decisions.

Wherein, according to the different decision-making methods of users, generating local cache, collaborative cache, local computing and offloading computing mode of delay function, specifically includes:

1) For users who choose local caching, its latency is negligible compared to cooperative caching and offloading computation.

2) In the cooperative cache mode, the delay of processing tasks is mainly the delay T _m,n generated by the transmission between servers:

where r _m,n represents the download rate of the local server m and the cooperative server n, and M _j represents the size of the calculation result of task j.

3) If the user chooses to offload the calculation, in order to improve the spectrum utilization rate and further reduce the transmission delay, the NOMA technology is used to transmit the information to the local base station. The latency of processing tasks is determined by the offload latency

Processing Latency on MEC Server

and return delay

For users who choose local computing, the local execution delay is

Among them, considering that the task popularity is dynamic and unknown, a gated recursive unit algorithm is proposed to predict the task popularity, and the calculation results with high popularity are placed on the corresponding mobile edge computing server for caching, specifically including:

1) Collect the tasks requested by the user at historical moments and the calculation results of different tasks, and perform the following operations.

2) Input the popularity of the historical period {S _j (1), S _j (2),...S _j (T _M )}, where S _j (t) represents the popularity of task j at time t. After the GRU unit, the predicted popularity {S _j (T _M +1), S _j (T _M +2), ... S _j (T _M +T _N )} of length T _N is obtained.

3) Calculate the loss function between the predicted popularity q(j) and the real popularity p(j) when the predicted moment arrives

By continuously adjusting the parameters, the loss function becomes smaller.

4) Stop training until the loss function no longer changes, otherwise, return to 1).

Due to the limited cache space of the server, it is difficult to store the results of all tasks. Considering the cache capacity of the server and the size of the calculation results, the calculation results of the tasks are cached on the server in descending order of popularity, so that as many as possible Cache the calculation results of the task locally. In this way, other servers also use the GRU algorithm to cache the corresponding calculation results. These servers have placed the computation results of popular tasks before all users request tasks, which is beneficial for hitting local caches and cooperative caches.

Wherein, based on the calculation results cached on the server, the multi-agent deep Q network algorithm is used to find the optimal caching and unloading decisions, specifically including:

In a multi-agent system, each user in the same base station is considered an agent. Each agent is unaware of the behavior of other agents before making a decision. Based on the state information of the environment, the agent chooses its own actions. Before choosing an action, you need to judge the hit situation. Based on the state information and the task requested by the user, the user can know whether the local cache or the cooperative cache was hit. The user can choose a caching decision only when the cache is hit. After the agent obtains state information and makes an action choice, the environment will feed back reward information to the agent for judging the quality of the action. Considering the reward value, the agent continuously adjusts its behavior to maximize the reward. The specific process of the MADQN algorithm is as follows:

1) Initialize state s _i , action a _i , reward _ri , current Q network Q(s, a; θ) and target Q network

etc. variables.

2) For each agent, observe the initial state

where μ(t) is the sequence number of the task stored on all servers at time t.

Indicates the delay of user i on the local server m.

3) Choose action according to greedy algorithm

is the task offloading decision,

represents a local computing decision,

and

4) Update reward r _i (t)=T _i (t-1)-T _i (t), where T _i (t) and T _i (t-1) represent user i at time t and t-1, respectively delay.

5) Store the experience e _i (t) of agent i at time t into the experience playback unit D(t), and extract a small batch of transfer sequences from it during training.

6) Use stochastic gradient descent to update the loss function L(t).

7) Update the next state s(t+1)←s(t).

8) Stop training until the parameter θ converges.

According to the above method, the selection of buffering and unloading decisions can be performed among users of other base stations. Since the delay of the whole system is composed of the delays of multiple base stations, the delay of the whole system can be minimized by reducing the delay of different base stations. delay to achieve.