
CN113726858B - Self-adaptive AR task unloading and resource allocation method based on reinforcement learning - Google Patents

Self-adaptive AR task unloading and resource allocation method based on reinforcement learning

Info

Publication number
CN113726858B
CN113726858B · CN202110925610.2A
Authority
CN
China
Prior art keywords
user
task
tasks
network
video frame
Prior art date
Legal status
Active
Application number
CN202110925610.2A
Other languages
Chinese (zh)
Other versions
CN113726858A (en)
Inventor
贺丽君
张婉玥
李凡
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202110925610.2A
Publication of CN113726858A
Application granted
Publication of CN113726858B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/101 Server selection for load balancing based on network conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/1023 Server selection for load balancing based on a hash applied to IP addresses or costs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an adaptive AR task offloading and resource allocation method based on reinforcement learning. The user end selects AR tasks with different video frame resolutions and obtains the corresponding accuracy; the AR tasks are partially offloaded, and the resulting delay, user energy consumption and user cost are computed to build a user experience model, from which a joint optimization problem aimed at improving user experience is formed. The optimization problem is transformed into a Markov decision process, whose state space, action space and reward setting are initialized, and a reinforcement learning network is designed according to the Markov decision process. The reinforcement learning network is trained until it converges; once trained, the states of the user ends and the MEC servers are input into the network to obtain the corresponding policies. The invention comprehensively considers the users' different requirements on accuracy, energy consumption and cost, effectively improves user experience within the task delay threshold, and reaches a Nash equilibrium of the users' experience.

Description

An adaptive AR task offloading and resource allocation method based on reinforcement learning

Technical Field

The invention belongs to the technical field of wireless communication, and in particular relates to an adaptive AR task offloading and resource allocation method based on reinforcement learning.

Background

With the development of fifth-generation mobile communication technology, more and more computation-intensive and delay-sensitive applications are emerging. Among them, Augmented Reality (AR) technology has gradually been applied in intelligent inspection, urban operation and maintenance, and industrial production, where large amounts of data can be acquired in real time; however, due to the limited computing resources and power of user terminal devices, such data cannot be processed locally within a short time. Traditional cloud computing can provide abundant computing resources, but it suffers from backhaul link congestion and long transmission delay. Mobile Edge Computing (MEC) provides services to users at the edge of the mobile network; by offloading tasks to a nearby MEC server, users can relieve the local load and save energy, obtaining low latency and low energy consumption and thereby improving their overall experience.

In the existing technology, much research has been devoted to optimizing the offloading delay or the energy consumption of computing tasks, and some work also focuses on optimizing user cost. However, in these works the task size of each user is fixed, and the case of changing user task requirements has not been studied in depth. Even for the same task, different users may have different specific requirements, and existing studies rarely take the users' quality of experience into account. Computation-intensive AR tasks must be offloaded to the MEC because local resources are limited, and different AR video frame resolutions lead to different accuracy and therefore to different user experience.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of the above existing work and to provide an adaptive AR task offloading and resource allocation method based on reinforcement learning, which can effectively improve user experience and reach a Nash equilibrium of the users' experience.

The present invention is implemented with the following technical solution:

An adaptive AR task offloading and resource allocation method based on reinforcement learning, comprising the following steps:

1) The user end selects AR tasks with different video frame resolutions and obtains the accuracy corresponding to the AR task of each video frame resolution;

2) The AR tasks with the video frame resolutions selected by the user ends are partially offloaded: each AR task is first split in different proportions, and then, for different allocations of computing resources and wireless resources, the delay, user energy consumption and user cost caused by partially offloading the task to different MEC servers and by executing it locally are computed;

3) Under the constraint of the task delay threshold, a user experience model is established that jointly considers accuracy, user energy consumption and user cost; by jointly optimizing the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing resource and wireless resource allocation strategies, an optimization problem aimed at improving user experience is formed;

4) The optimization problem is transformed into a Markov decision process, whose state space, action space and reward setting are initialized; a reinforcement learning network is designed according to the Markov decision process;

5) The reinforcement learning network is trained with the MADDPG algorithm until the network converges;

6) After the network is trained, the states of the user ends and the MEC servers are input into the network to obtain the corresponding policies.

A further improvement of the present invention is that, in step 1), the AR video selected by user i is assumed to be preprocessed into video frames with a resolution of v_i × v_i pixels, i.e. the video frame resolution of user i is v_i ∈ V, where V denotes the set of video frame resolutions and i ∈ I, with I the set of users. Different frame resolutions correspond to different task sizes and delay thresholds; the computing task of user i is expressed as τ_i = {d_i, c_i, thr_i}, consisting of the task size, the computation amount and the delay threshold, respectively. d_i = δ·v_i² is the task size, where δ is defined as the number of bits needed to represent one pixel; the relation between computation amount and task size is c_i = η_i·d_i, where η_i is the computation density, i.e. the amount of computation required per unit of task data. According to the relationship between AR video frame resolution and accuracy, the accuracy obtained by user i is Q_i = Q(v_i), where Q(·) denotes the mapping from frame resolution to accuracy.

A further improvement of the present invention is that, in step 2), since the offloaded and local parts of a task can be executed in parallel, the delay T_i required to complete the task of user i is

T_i = max( TL_i , max_k TM_{i,k} ),

where TL_i is the delay of executing part of user i's task locally and TM_{i,k} is the delay of executing the part of user i's task offloaded to MEC server k.

A further improvement of the present invention is that the delay TL_i of executing part of user i's task locally is

TL_i = (1 - Σ_k b_{i,k})·c_i / f_i ,

where c_i is the computation amount of user i's task, f_i is the local computing resource allocated by user i to the current task, and b_{i,k} is the proportion of the task that user i offloads to MEC server k.

The delay TM_{i,k} of executing the part of user i's task offloaded to MEC server k is

TM_{i,k} = b_{i,k}·d_i / rv_{i,k} + b_{i,k}·c_i / m_{i,k} ,

where d_i is the task size of user i, rv_{i,k} is the uplink transmission rate between user i and MEC server k, and m_{i,k} is the amount of computing resource of MEC server k allocated to user i.

The user energy consumption E_i of user i for executing the current task is

E_i = EL_i + Σ_k EM_{i,k} ,

where EL_i is the local processing energy consumption and EM_{i,k} is the transmission energy consumption when the user offloads part of the task to MEC server k.

The local processing energy consumption EL_i is

EL_i = θ·f_i²·(1 - Σ_k b_{i,k})·c_i ,

where θ is the energy density required to process the task and f_i is the locally allocated computing resource.

The transmission energy consumption EM_{i,k} when the user offloads part of the task to MEC server k is

EM_{i,k} = P_i·b_{i,k}·d_i / rv_{i,k} ,

where P_i is the transmission power of the user.

A further improvement of the present invention is that, in step 2), when user i offloads tasks to the MEC servers for execution, according to the computation amount c_i, the offloaded proportions b_{i,k} and the unit price ε charged by the MEC servers, the user pays the corresponding fee

W_i = ε·Σ_k b_{i,k}·c_i .

A further improvement of the present invention is that, in step 3), in order to improve each user's overall experience, reach a Nash equilibrium of the users' experience, reduce user energy consumption and user cost, and improve the accuracy of the users' AR tasks, these three factors are combined into a user experience model:

U_i = ξ_Q·Q_i - ξ_E·E_i - ξ_W·W_i .

Under the constraints of the delay threshold, the computing resources and the wireless resources, the users' AR video frame resolution selection strategy {v_i}, partial task offloading strategy {b_{i,k}}, wireless resource allocation strategy {z_{i,k,n}}, and local and MEC computing resource allocation strategies {f_i, m_{i,k}} are jointly optimized, forming an optimization problem whose goal is to maximize each user's experience:

max U_i   subject to constraints (c1)-(c5).

A further improvement of the present invention is that, in step 4), the state space includes the computing resource capacities of all users, the computing resource capacities of the MEC servers, the idle wireless resources of the base stations, the initial allocation of computing and wireless resources, the AR video frame resolution selection scheme and the partial task offloading scheme; the action space consists of the changes applied to the initial computing and wireless resource allocation, to the AR video frame resolution selection scheme and to the partial task offloading scheme; the reward is a three-tier reward set according to constraints (c1)-(c5).

A further improvement of the present invention is that, in step 5), new states, actions and rewards are first obtained from the reinforcement learning network and stored in the experience replay pool; when the amount of data in the experience replay pool reaches a threshold, the designed reinforcement learning network starts to be trained, and the data in the experience replay pool and the network parameters are continuously updated until the network training converges.

A further improvement of the present invention is that, in step 6), after the network is trained, in the concrete application process the state at the current moment is input into the network to obtain the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing resource and wireless resource allocation strategies.

The present invention has at least the following beneficial technical effects:

In the adaptive AR task offloading and resource allocation method based on reinforcement learning provided by the present invention, in concrete operation, the user ends first select AR tasks with different frame resolutions and obtain the corresponding accuracy. Each user splits its task in different proportions; under different allocations of computing and wireless resources, part of the task is offloaded to different MEC servers and part is executed locally, and the corresponding delay, user energy consumption and user cost are computed. Under the constraint of the task delay threshold, a user experience model is established, and by jointly optimizing the video frame resolution selection strategy, the partial task offloading strategy, and the computing and wireless resource allocation strategies, an optimization problem aimed at improving user experience is formed. The optimization problem is transformed into a Markov decision process; its state space, action space and reward are initialized; the network is trained with the proposed MADDPG algorithm, and the data in the experience replay pool and the network parameters are updated continuously until the network converges. Finally, according to the states of the user ends and the MEC servers, the network yields the frame resolution selection strategy and the task offloading and resource allocation strategies.

In summary, the present invention comprehensively considers the users' different requirements on accuracy, energy consumption and cost, and jointly optimizes the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing and wireless resource allocation strategies, effectively improving user experience within the task delay threshold.

Description of the Drawings

Figure 1: schematic flow chart of the present invention;

Figure 2: user experience of each user (MEC computing capability 5 G Cycles/s);

Figure 3: user experience of each user (MEC computing capability 7 G Cycles/s);

Figure 4: user experience of each user (MEC computing capability 9 G Cycles/s);

Figure 5: user experience of each user (MEC computing capability 11 G Cycles/s);

Figure 6: average user experience as the MEC computing capability changes;

Figure 7: average energy consumption as the MEC computing capability changes;

Figure 8: average cost as the MEC computing capability changes;

Figure 9: average accuracy as the MEC computing capability changes;

Figure 10: user experience of each user (MEC unit price 0.009);

Figure 11: user experience of each user (MEC unit price 0.010);

Figure 12: user experience of each user (MEC unit price 0.011);

Figure 13: user experience of each user (MEC unit price 0.012);

Figure 14: average user experience as the MEC unit price changes;

Figure 15: average energy consumption as the MEC unit price changes;

Figure 16: average cost as the MEC unit price changes;

Figure 17: average accuracy as the MEC unit price changes;

Figure 18: average user experience as the number of users changes;

Figure 19: average energy consumption as the number of users changes;

Figure 20: average cost as the number of users changes;

Figure 21: average accuracy as the number of users changes.

Detailed Description of the Embodiments

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art. It should be noted that, in the absence of conflict, the embodiments of the present invention and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

The overall flow chart of the present invention is shown in Figure 1; the present invention is described in detail below with reference to the accompanying drawings.

The present invention targets a multi-user, multi-MEC-server system. One MEC server is deployed next to each base station in the system, giving K MEC servers in total, denoted by the set K. The number of users within the service range of each MEC server is I, so the total number of users in the system is K·I, and the user set is denoted I. In each decision period, user i needs to process an AR computing task with video frame resolution v_i ∈ V, and the corresponding accuracy is Q_i = Q(v_i).

The delay of the part of the task offloaded to MEC server k is denoted TM_{i,k}, and the delay of the part executed locally is denoted TL_i. Since the parts of a user's task are executed in parallel, the total delay T_i required to execute the task is

T_i = max( TL_i , max_k TM_{i,k} ).

The energy consumed when user i offloads part of the task to MEC server k is denoted EM_{i,k}, and the energy consumed by local execution is denoted EL_i; the total energy E_i consumed by user i is

E_i = EL_i + Σ_k EM_{i,k} .

When user i offloads tasks to the MEC servers for execution, according to the computation amount c_i, the offloaded proportions b_{i,k} and the unit price ε charged by the MEC servers, the user pays the corresponding fee

W_i = ε·Σ_k b_{i,k}·c_i .

This yields the user experience model:

U_i = ξ_Q·Q_i - ξ_E·E_i - ξ_W·W_i .

With the goal of maximizing user experience, under the constraints of the delay threshold, the computing resources and the wireless resources, the present invention forms the following optimization problem:

max_{v_i, b_{i,k}, z_{i,k,n}, f_i, m_{i,k}}  U_i

s.t.

(c1)-(c4): constraints on the validity of the offloading proportions b_{i,k}, on the wireless channel allocation indicators z_{i,k,n}, and on the computing resources allocated locally and at each MEC server, which may not exceed F_i and M_k respectively;

(c5): the task completion delay does not exceed the delay threshold, T_i ≤ thr_i.

The adaptive AR task offloading and resource allocation method based on reinforcement learning of the present invention comprises the following steps:

1) The user end selects AR tasks with different video frame resolutions and obtains the accuracy corresponding to the AR task of each video frame resolution;

2) The AR tasks with the video frame resolutions selected by the user ends are partially offloaded: each AR task is first split in different proportions, and then, for different allocations of computing resources and wireless resources, the delay, user energy consumption and user cost caused by partially offloading the task to different MEC servers and by executing it locally are computed;

3) Under the constraint of the task delay threshold, a user experience model is established that jointly considers accuracy, user energy consumption and user cost; by jointly optimizing the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing resource and wireless resource allocation strategies, an optimization problem aimed at improving user experience is formed;

4) The optimization problem is transformed into a Markov decision process, whose state space, action space and reward setting are initialized; a reinforcement learning network is designed according to the Markov decision process;

5) The reinforcement learning network is trained with the MADDPG algorithm until the network converges;

6) After the network is trained, the states of the user ends and the MEC servers are input into the network to obtain the corresponding policies.

A detailed description is given below with reference to Figure 1.

Step 11): The user end selects AR tasks with different video frame resolutions and obtains the corresponding accuracy. The specific process is as follows.

Assume that the AR video task selected by user i is preprocessed into video frames with a resolution of v_i × v_i pixels, v_i ∈ V. Different frame resolutions correspond to different task sizes and delay thresholds; the computing task of user i is expressed as τ_i = {d_i, c_i, thr_i}, consisting of the task size, the computation amount and the delay threshold, respectively. d_i = δ·v_i² is the task size, where δ is defined as the number of bits needed to represent one pixel. The relation between computation amount and task size is c_i = η_i·d_i, where η_i is the amount of computation required per unit of task data. According to the relationship between AR video frame resolution and accuracy, the accuracy of user i is obtained as Q_i = Q(v_i).
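
As a minimal illustration of the task model above, the following Python sketch derives d_i, c_i and an accuracy value from a chosen frame resolution. The function accuracy_of_resolution and all numeric constants (delta, eta, thr) are placeholder assumptions made for the example; the patent gives the actual resolution-accuracy curve only as an equation image.

```python
import math
from dataclasses import dataclass

@dataclass
class ARTask:
    d: float    # task size d_i in bits
    c: float    # computation amount c_i in CPU cycles
    thr: float  # delay threshold thr_i in seconds

def accuracy_of_resolution(v: int) -> float:
    """Placeholder monotone mapping Q(v) from frame resolution to accuracy."""
    return 1.0 - math.exp(-v / 300.0)

def build_task(v: int, delta: float = 16.0, eta: float = 30.0, thr: float = 0.025) -> ARTask:
    """Task model of step 11): d_i = delta * v_i^2 and c_i = eta_i * d_i."""
    d = delta * v * v   # bits needed to represent a v x v frame
    c = eta * d         # required CPU cycles
    return ARTask(d=d, c=c, thr=thr)

task = build_task(v=400)
q = accuracy_of_resolution(400)
```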

Step 12): When user i offloads part of the task to MEC server k for execution, the delay TM_{i,k} incurred is

TM_{i,k} = b_{i,k}·d_i / rv_{i,k} + b_{i,k}·c_i / m_{i,k} ,

where b_{i,k} is the proportion of the task that user i offloads to MEC server k, d_i is the task size of user i, c_i is the computation amount of user i's task, rv_{i,k} is the uplink transmission rate between user i and MEC server k, and m_{i,k} is the amount of computing resource of MEC server k allocated to user i.

The corresponding user transmission energy consumption EM_{i,k} is

EM_{i,k} = P_i·b_{i,k}·d_i / rv_{i,k} ,

where P_i is the transmission power of user i.

When user i executes part of the task locally, according to the allocated local computing resource f_i, the local execution delay TL_i is obtained as

TL_i = (1 - Σ_k b_{i,k})·c_i / f_i ,

and the corresponding user processing energy consumption EL_i is

EL_i = θ·f_i²·(1 - Σ_k b_{i,k})·c_i ,

where θ is the energy density required to process the task and f_i is the locally allocated computing resource.

The total delay for user i to execute the task is therefore

T_i = max( TL_i , max_k TM_{i,k} ) ,

and the user energy consumption of user i for executing the task is

E_i = EL_i + Σ_k EM_{i,k} .

When user i offloads tasks to the MEC servers for execution, according to the computation amount c_i, the offloaded proportions b_{i,k} and the unit price ε that the MEC servers charge per unit of computation, the corresponding user fee W_i to be paid is

W_i = ε·Σ_k b_{i,k}·c_i .
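
The per-user delay, energy and fee of step 12) can be collected into one routine. The sketch below follows the formulas above under the parallel-execution model; the default parameter values and the numeric arguments in the usage example are placeholders, not values taken from the patent.

```python
import numpy as np

def delay_energy_cost(d, c, b, rv, m, f, P=1.0, theta=1e-26, eps=0.01):
    """Delay, energy and fee of one user for a given partial-offloading decision.

    d, c  : task size (bits) and computation amount (cycles)
    b     : offloading proportions b_{i,k}, one entry per MEC server
    rv, m : uplink rates (bit/s) and MEC computing resources allocated to the user (cycles/s)
    f     : locally allocated computing resource (cycles/s)
    P, theta, eps : transmit power, energy density, MEC unit price (illustrative values)
    """
    b, rv, m = np.asarray(b, float), np.asarray(rv, float), np.asarray(m, float)
    local_share = 1.0 - b.sum()

    TL = local_share * c / f             # local execution delay
    TM = b * d / rv + b * c / m          # offloading delay per MEC server
    T = max(TL, TM.max())                # parts run in parallel, so take the maximum

    EL = theta * f**2 * local_share * c  # local processing energy
    EM = P * b * d / rv                  # transmission energy per MEC server
    E = EL + EM.sum()

    W = eps * (b * c).sum()              # fee charged by the MEC servers
    return T, E, W

# example: half of the task offloaded to the first of two MEC servers
T, E, W = delay_energy_cost(d=2.56e6, c=7.7e7, b=[0.5, 0.0],
                            rv=[5e7, 5e7], m=[5e9, 5e9], f=1.8e9)
```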

Step 13): To improve user experience and reach a Nash equilibrium of the users' experience, user energy consumption and user cost must be reduced while the users' accuracy is improved. By computing the accuracy, delay, energy consumption and cost produced by different frame resolution selection strategies, partial offloading strategies and resource allocation strategies, the user experience model is obtained:

U_i = ξ_Q·Q_i - ξ_E·E_i - ξ_W·W_i .

With the goal of maximizing user experience, under the constraints of the delay threshold, the computing resources and the wireless resources, the optimization problem is formulated as

max_{v_i, b_{i,k}, z_{i,k,n}, f_i, m_{i,k}}  U_i

s.t.

(c1)-(c4): constraints on the validity of the offloading proportions b_{i,k}, on the wireless channel allocation indicators z_{i,k,n}, and on the computing resources allocated locally and at each MEC server, which may not exceed F_i and M_k respectively;

(c5): the task completion delay does not exceed the delay threshold, T_i ≤ thr_i;

where z_{i,k,n} indicates whether channel n between the users and MEC server k is allocated to user i (1 if allocated, 0 otherwise), M_k is the amount of computing resource of MEC server k, and F_i is the local computing resource of user i.
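
A small sketch of the user experience model and a rough feasibility check corresponding to constraints (c1)-(c5) is given below; the weights ξ_Q, ξ_E, ξ_W and the simplified single-server resource check are assumptions made for illustration only.

```python
def user_experience(Q, E, W, xi_q=1.0, xi_e=10.0, xi_w=10.0):
    """User experience U_i = xi_Q*Q_i - xi_E*E_i - xi_W*W_i.
    The weights are illustrative; the patent does not publish the values used."""
    return xi_q * Q - xi_e * E - xi_w * W

def feasible(T, thr, b_sum, f, F, m_used, M):
    """Rough single-server feasibility check in the spirit of constraints (c1)-(c5):
    valid offloading proportion, local and MEC resource budgets, delay threshold."""
    return (0.0 <= b_sum <= 1.0) and (f <= F) and (m_used <= M) and (T <= thr)
```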

Step 14): Initialize the states, actions and rewards of the Markov decision process. A state is a description of the environment, and it changes after the agent produces an action. Let S_t = (s_1, …, s_i, …, s_I) be the state space of the system at time t, where s_i denotes the state of user i, expressed as:

s_i = [v_i, b_{i,1}, …, b_{i,K}, m_{i,1}, …, m_{i,K}, z_{i,1,1}, …, z_{i,1,N}, …, z_{i,k,n}, …, z_{i,K,N}, f_i, rf_i, rm_1, …, rm_K]

where v_i is the frame resolution selected by user i; b_{i,1}, …, b_{i,K} are the proportions of the task size that user i offloads to each MEC server; m_{i,1}, …, m_{i,K} are the computing capabilities allocated to user i by each MEC server; z_{i,1,1}, …, z_{i,1,N}, …, z_{i,k,n}, …, z_{i,K,N} indicate whether the uplink channel resources of the base stations where the MEC servers are located are idle; f_i is the local computing capability allocated by user i; rf_i is the current remaining computing capability of user i; and rm_1, …, rm_K are the current remaining computing capabilities of the MEC servers.

The action of an agent represents the amount by which the current resource allocation state is increased or decreased; it describes the agent's behaviour and is the result of the agent's decision. Let A_t = (a_1, …, a_i, …, a_I) be the action space of the system at time t, where a_i denotes the action of user i, expressed as:

a_i = [cv_i, cb_{i,1}, …, cb_{i,K}, cm_{i,1}, …, cm_{i,K}, cz_{i,1,1}, …, cz_{i,1,N}, …, cz_{i,k,n}, …, cz_{i,K,N}, cf_i]

where cv_i is the change in the frame resolution selected by user i; cb_{i,1}, …, cb_{i,K} are the changes in the proportions of the task that user i offloads to each MEC server; cm_{i,1}, …, cm_{i,K} are the changes in the computing capabilities allocated to user i by each MEC server; cz_{i,1,1}, …, cz_{i,1,N}, …, cz_{i,k,n}, …, cz_{i,K,N} indicate whether the uplink channel resources of the base stations where the MEC servers are located are allocated to the user; and cf_i is the change in the local computing capability allocated by user i.

When the state after the user's action does not satisfy constraints (c1)-(c4), the reward function is set as

r_i = l_1 + χ_1·Λ_(c1) + χ_2·Λ_(c2) + χ_3·Λ_(c3) + χ_4·Λ_(c4) ,

where Λ_(·) equals -1 when condition (·) is not satisfied and 0 otherwise, and l_1, χ_1, χ_2, χ_3, χ_4 are experimental parameters. When constraints (c1)-(c4) are satisfied but constraint (c5) is not, the reward function is set as

r_i = l_2 + exp(thr_i - T_i) ,

where exp(·) is the exponential function and l_2 is an experimental parameter. When all constraints are satisfied, the reward function is

r_i = l_3 + exp(U_i) ,

where l_3 is an experimental parameter.
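
The three-tier reward described above can be sketched as follows; l_1, l_2, l_3 and χ_1…χ_4 are experimental parameters whose values the patent does not publish, so the numbers below are placeholders.

```python
import math

def reward(Ti, thr, Ui, violated_c1_to_c4,
           l1=-2.0, l2=0.0, l3=2.0, chis=(1.0, 1.0, 1.0, 1.0)):
    """Three-tier reward of step 14).

    violated_c1_to_c4 : four booleans, True if the corresponding constraint is violated.
    l1, l2, l3, chis  : experimental parameters; the numbers here are placeholders.
    """
    if any(violated_c1_to_c4):
        # tier 1: penalty term for every violated resource/allocation constraint
        return l1 + sum(-chi if v else 0.0 for chi, v in zip(chis, violated_c1_to_c4))
    if Ti > thr:
        # tier 2: (c1)-(c4) hold but the delay threshold (c5) is exceeded
        return l2 + math.exp(thr - Ti)
    # tier 3: all constraints hold, reward grows with the user experience
    return l3 + math.exp(Ui)
```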

Step 15): New states, actions and rewards are obtained with the proposed reinforcement-learning-based method and stored in the experience replay pool;

Step 16): Judge whether the amount of data in the experience replay pool has reached the threshold;

Step 17): Start training the network until it converges.

The network training process is described in detail below.

The network consists of two main parts, an Actor network and a Critic network. Each agent has one Actor network and one Critic network, and the Actor and Critic networks each contain an Eval-net and a Target-net, the Eval-net parameters being copied to the Target-net periodically. The Actor network parameters are θ = {θ_1, …, θ_I}, and μ = {μ_1, …, μ_I} denotes the set of action policies of all agents. During training, a mini-batch of data is sampled at random from the experience replay pool, which consists of a series of state, action and reward tuples (S, A, R, S′). The Eval-net of the Actor network produces an action according to the agent's current state S; the Eval-net of the Critic network then evaluates this action and produces the value Q. The Target-net of the Actor network produces an action according to the state S′ drawn from the experience replay pool; the Target-net of the Critic network then evaluates that action and produces the value y. From these two values a loss function is obtained and used to update the Critic network:

L(θ_i) = (1/X)·Σ_j ( y^j - Q_i^μ(S^j, a_1^j, …, a_I^j) )² ,

y^j = r_i^j + γ·Q_i^{μ′}(S′^j, a_1′, …, a_I′) |_{a_k′ = μ_k′(s_k′)} ,

where γ is the discount factor. The Actor network is updated with the policy gradient of the agent:

∇_{θ_i} J ≈ (1/X)·Σ_j ∇_{θ_i} μ_i(s_i^j) · ∇_{a_i} Q_i^μ(S^j, a_1^j, …, a_i, …, a_I^j) |_{a_i = μ_i(s_i^j)} ,

where X is the mini-batch size and j is the sample index.
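
For concreteness, the following PyTorch sketch shows one MADDPG-style update step with a centralized critic and a per-agent actor, in the spirit of the training procedure above. The network sizes, dimensions, learning rates and the single-agent simplification are assumptions made for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_AGENTS, GAMMA = 16, 8, 3, 0.95

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# per-agent actor mu_i(s_i); centralized critic Q_i(S, a_1, ..., a_I)
actor, actor_tgt = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic = mlp(N_AGENTS * (STATE_DIM + ACTION_DIM), 1)
critic_tgt = mlp(N_AGENTS * (STATE_DIM + ACTION_DIM), 1)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def maddpg_step(S, A, r, S2):
    """One update of agent 0's critic and actor from a mini-batch of the replay pool.
    Shapes: S, S2 -> (X, N, STATE_DIM); A -> (X, N, ACTION_DIM); r -> (X,)."""
    X = S.shape[0]
    # critic target y = r + gamma * Q'(S', mu'(s'))
    with torch.no_grad():
        A2 = actor_tgt(S2)
        y = r + GAMMA * critic_tgt(torch.cat([S2, A2], -1).reshape(X, -1)).squeeze(-1)
    q = critic(torch.cat([S, A], -1).reshape(X, -1)).squeeze(-1)
    critic_loss = ((y - q) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # actor update: ascend Q with agent 0's action replaced by mu_0(s_0)
    a0 = actor(S[:, 0, :]).unsqueeze(1)
    A_new = torch.cat([a0, A[:, 1:, :]], dim=1)
    actor_loss = -critic(torch.cat([S, A_new], -1).reshape(X, -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# dummy mini-batch of size 32 just to exercise the update
maddpg_step(torch.randn(32, N_AGENTS, STATE_DIM), torch.randn(32, N_AGENTS, ACTION_DIM),
            torch.randn(32), torch.randn(32, N_AGENTS, STATE_DIM))
```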

Step 18): After the network is trained, at each decision step only the states of the users and the MEC servers need to be input into the Actor network to obtain the corresponding policy;
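
A minimal inference sketch corresponding to step 18), assuming a trained PyTorch actor network:

```python
import torch

def decide(actor: torch.nn.Module, state) -> torch.Tensor:
    """Inference only: feed the current state into the trained Actor network and
    read back the action (changes to resolution, offloading and resource allocation)."""
    with torch.no_grad():
        return actor(torch.as_tensor(state, dtype=torch.float32))
```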

Step 19): The algorithm finishes.

The simulation settings and the analysis of the experimental results are given below.

The simulation platform is implemented with Python 3.7 and PyTorch 1.5.0; the detailed simulation parameter settings are listed in Table 1 and Table 2.

Table 1. Simulation parameter configuration

Parameter | Value
User computing capability | [1.5, 2.0] G Cycles/s
User transmission power | 1.0 W
MEC computing capability | 5, 7, 9, 11 G Cycles/s
Uplink channel bandwidth | 100 MHz
Channel model | Typical Urban
MEC fee unit price | 0.009, 0.010, 0.011, 0.012 per G Cycles
Video frame resolution | [200×200, 600×600]
Delay threshold | [20, 25] ms
Computation density | [10, 50] Cycles/bit
Energy density | 1.0×10^-26 J/Cycle

Table 2. Neural network and training parameters


Experimental Results and Analysis

In the experimental results, the comparison algorithms are AL, AO and Random; the algorithm corresponding to the present invention is MADDPG.

First set of experiments: the effect of changes in MEC computing capability on the results. Figures 2 to 5 compare the user experience of each user under the different algorithms for different computing capabilities of the MEC servers. It can be seen that, compared with the AL, AO and Random algorithms, the MADDPG algorithm proposed by the present invention consistently provides a higher and more stable user experience for every user. This is because the proposed MADDPG algorithm takes user experience as the objective function and, through reinforcement learning, designs the reward function according to the objective function and the constraints; by maximizing the reward, the network is guided to learn a near-optimal frame resolution selection, partial offloading and resource allocation strategy for each user, maximizing user experience. Figures 6 to 9 show how the average user experience, average energy consumption, average cost and average accuracy of all users change under the different algorithms as the computing capability of the MEC servers increases. The proposed MADDPG algorithm achieves a better average user experience by trading off energy consumption, cost and accuracy. Except for the AL algorithm, the average accuracy of the other algorithms first increases and then stabilizes: as the computing capability of the MEC servers grows, more computing resources can be allocated to the users, and users initially choose high-frame-resolution tasks that bring high accuracy. High frame resolutions usually allow better accuracy, but at the cost of longer task execution delay, higher energy consumption and a higher offloading fee, as the figures show. Therefore, when the MEC computing capability keeps increasing, users do not keep selecting tasks with even higher frame resolutions; instead they weigh the delay threshold, energy consumption, cost and accuracy together and keep user experience at a good level.

Second set of experiments: the effect of changes in the MEC unit price on the results. Figures 10 to 13 compare the user experience of each user under the different algorithms for different unit prices charged by the MEC servers. Compared with the AL, AO and Random algorithms, the MADDPG algorithm proposed by the present invention consistently provides a higher and more stable user experience for every user, because it can select frame resolutions and allocate computing and wireless resources more accurately according to the computing resources and unit price of the MEC servers and the user states, thereby achieving a better user experience. Figures 14 to 17 show how the average user experience, average energy consumption, average cost and average accuracy of all users change under the different algorithms as the unit price of the MEC servers increases. When the charging unit price increases, the average user experience gradually decreases: with a higher unit price, users neither choose large tasks with high frame resolutions nor offload more tasks to the MEC servers; instead they choose tasks with an appropriate frame resolution so as to trade off energy consumption, cost and accuracy and obtain the highest possible user experience. It can also be seen that, as the unit price increases, the proposed MADDPG algorithm remains relatively stable in terms of energy consumption, cost and accuracy and obtains a better user experience than the other algorithms, whereas the AO algorithm, in order to obtain a better user experience, chooses low-resolution tasks that lower the average accuracy and the average cost further.

Third set of experiments: the effect of changes in the number of users on the results. This experiment mainly compares the changes in the average user experience, average energy consumption, average cost and average accuracy of all users under the different algorithms as the number of users increases. Figures 18 to 21 show that, as the number of users increases, the average user experience, average cost and average accuracy all gradually decrease while the average energy consumption gradually increases. A larger number of users generates more computing tasks, which compete for the limited computing and wireless resources of the MEC servers; a larger share of the computing tasks is therefore processed locally, which increases local processing energy consumption. Constrained by the MEC computing resources and the wireless resources, users choose tasks with lower frame resolutions and the user fees decrease accordingly, so that the best possible user experience is still obtained.

Although the present invention has been described in detail above with a general description and specific embodiments, it is obvious to those skilled in the art that modifications or improvements can be made on the basis of the present invention. Therefore, such modifications or improvements made without departing from the spirit of the present invention all fall within the scope of protection claimed by the present invention.

Claims (1)

1. An adaptive AR task offloading and resource allocation method based on reinforcement learning, characterized by comprising the following steps:

1) The user end selects AR tasks with different video frame resolutions and obtains the accuracy corresponding to the AR task of each video frame resolution; the AR video selected by user i is assumed to be preprocessed into video frames with a resolution of v_i × v_i pixels, i.e. the video frame resolution of user i is v_i ∈ V, where V denotes the set of video frame resolutions and i ∈ I, with I the set of users; different frame resolutions correspond to different task sizes and delay thresholds; the computing task of user i is expressed as τ_i = {d_i, c_i, thr_i}, consisting of the task size, the computation amount and the delay threshold, respectively; d_i = δ·v_i² is the task size, where δ is defined as the number of bits needed to represent one pixel; the relation between computation amount and task size is c_i = η_i·d_i, where η_i is the computation density, i.e. the amount of computation required per unit of task data; according to the relationship between AR video frame resolution and accuracy, the accuracy obtained by user i is Q_i = Q(v_i);

2) The AR tasks with the video frame resolutions selected by the user ends are partially offloaded: each AR task is first split in different proportions, and then, for different allocations of computing resources and wireless resources, the delay, user energy consumption and user cost caused by partially offloading the task to different MEC servers and by executing it locally are computed; since the offloaded and local parts of a task can be executed in parallel, the delay T_i required to complete the task of user i is

T_i = max( TL_i , max_k TM_{i,k} ),

where TL_i is the delay of executing part of user i's task locally and TM_{i,k} is the delay of executing the part of user i's task offloaded to MEC server k;

the delay TL_i of executing part of user i's task locally is

TL_i = (1 - Σ_k b_{i,k})·c_i / f_i ,

where c_i is the computation amount of user i's task, f_i is the local computing resource allocated by user i to the current task, and b_{i,k} is the proportion of the task that user i offloads to MEC server k;

the delay TM_{i,k} of executing the part of user i's task offloaded to MEC server k is

TM_{i,k} = b_{i,k}·d_i / rv_{i,k} + b_{i,k}·c_i / m_{i,k} ,

where d_i is the task size of user i, rv_{i,k} is the uplink transmission rate between user i and MEC server k, and m_{i,k} is the amount of computing resource of MEC server k allocated to user i;

the user energy consumption E_i of user i for executing the current task is

E_i = EL_i + Σ_k EM_{i,k} ,

where EL_i is the local processing energy consumption and EM_{i,k} is the transmission energy consumption when the user offloads part of the task to MEC server k;

the local processing energy consumption EL_i is

EL_i = θ·f_i²·(1 - Σ_k b_{i,k})·c_i ,

where θ is the energy density required to process the task and f_i is the locally allocated computing resource;

the transmission energy consumption EM_{i,k} when the user offloads part of the task to MEC server k is

EM_{i,k} = P_i·b_{i,k}·d_i / rv_{i,k} ,

where P_i is the transmission power of the user;

when user i offloads tasks to the MEC servers for execution, according to the computation amount c_i, the offloaded proportions b_{i,k} and the unit price ε charged by the MEC servers, the user pays the corresponding fee

W_i = ε·Σ_k b_{i,k}·c_i ;

3) Under the constraint of the task delay threshold, a user experience model is established that jointly considers accuracy, user energy consumption and user cost; by jointly optimizing the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing resource and wireless resource allocation strategies, an optimization problem aimed at improving user experience is formed; in order to improve each user's overall experience, reach a Nash equilibrium of the users' experience, reduce user energy consumption and user cost, and improve the accuracy of the users' AR tasks, these three factors are combined into the user experience model

U_i = ξ_Q·Q_i - ξ_E·E_i - ξ_W·W_i ;

under the constraints of the delay threshold, the computing resources and the wireless resources, the users' AR video frame resolution selection strategy {v_i}, partial task offloading strategy {b_{i,k}}, wireless resource allocation strategy {z_{i,k,n}}, and local and MEC computing resource allocation strategies {f_i, m_{i,k}} are jointly optimized, forming an optimization problem whose goal is to maximize each user's experience:

max U_i   s.t. constraints (c1)-(c5),

where constraints (c1)-(c4) bound the offloading proportions b_{i,k}, the wireless channel allocation indicators z_{i,k,n}, and the computing resources allocated locally and at each MEC server, and constraint (c5) requires the task completion delay not to exceed the delay threshold, T_i ≤ thr_i; z_{i,k,n} indicates whether channel n between the users and MEC server k is allocated to user i (1 if allocated, 0 otherwise), M_k is the amount of computing resource of MEC server k, and F_i is the local computing resource of user i;

4) The optimization problem is transformed into a Markov decision process, whose state space, action space and reward setting are initialized; a reinforcement learning network is designed according to the Markov decision process; the state space includes the computing resource capacities of all users, the computing resource capacities of the MEC servers, the idle wireless resources of the base stations, the initial allocation of computing and wireless resources, the AR video frame resolution selection scheme and the partial task offloading scheme; the action space consists of the changes applied to the initial computing and wireless resource allocation, to the AR video frame resolution selection scheme and to the partial task offloading scheme; the reward is a three-tier reward set according to constraints (c1)-(c5);

5) The reinforcement learning network is trained with the MADDPG algorithm until the network converges: new states, actions and rewards are first obtained from the reinforcement learning network and stored in the experience replay pool; when the amount of data in the experience replay pool reaches a threshold, the designed reinforcement learning network starts to be trained, and the data in the experience replay pool and the network parameters are continuously updated until the network training converges;

6) After the network is trained, the states of the user ends and the MEC servers are input into the network to obtain the AR video frame resolution selection strategy, the partial task offloading strategy, and the computing resource and wireless resource allocation strategies.
CN202110925610.2A 2021-08-12 2021-08-12 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning Active CN113726858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110925610.2A CN113726858B (en) 2021-08-12 2021-08-12 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110925610.2A CN113726858B (en) 2021-08-12 2021-08-12 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113726858A (en) 2021-11-30
CN113726858B (en) 2022-08-16

Family

ID=78675630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110925610.2A Active CN113726858B (en) 2021-08-12 2021-08-12 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113726858B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114189521B (en) * 2021-12-15 2024-01-26 福州大学 Method for collaborative computing offloading in F-RAN architecture
CN114170560B (en) * 2022-02-08 2022-05-20 深圳大学 A multi-device edge video analysis system based on deep reinforcement learning
CN114756371B (en) * 2022-04-27 2025-06-10 佛山智优人科技有限公司 Method and system for optimizing configuration of terminal edge joint resources


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200162535A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods and Apparatus for Learning Based Adaptive Real-time Streaming
CN111918245B (en) * 2020-07-07 2021-11-19 西安交通大学 Multi-agent-based vehicle speed perception calculation task unloading and resource allocation method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018183799A1 (en) * 2017-03-30 2018-10-04 Intel Corporation Data processing offload
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning
WO2021103991A1 (en) * 2019-11-25 2021-06-03 华为技术有限公司 Resource allocation method and communication device
CN111405569A (en) * 2020-03-19 2020-07-10 三峡大学 Method and device for computing offloading and resource allocation based on deep reinforcement learning
CN111586696A (en) * 2020-04-29 2020-08-25 重庆邮电大学 A Resource Allocation and Offloading Decision Method Based on Multi-Agent Architecture Reinforcement Learning
CN111918339A (en) * 2020-07-17 2020-11-10 西安交通大学 AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN113242568A (en) * 2021-06-08 2021-08-10 重庆邮电大学 Task unloading and resource allocation method in uncertain network environment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Caching in Vehicular Named Data Networking: Architecture, Schemes and Future Directions; Chen Chen et al.; IEEE; 2020-01-29; pp. 2378-2407 *
Heuristic joint task offloading and resource allocation strategy for MEC multi-server systems; Lu Ya; Computer Applications and Software; 2020-10-12 (No. 10); pp. 83-90 *
A reinforcement-learning-based multi-node MEC computing resource allocation scheme; Yu Mengdi et al.; Communications Technology; 2019-12-10 (No. 12); pp. 88-93 *
MEC task offloading and resource allocation based on an adaptive genetic algorithm; Yan Wei et al.; Application of Electronic Technique; 2020-08-06 (No. 08); pp. 101-106 *

Also Published As

Publication number Publication date
CN113726858A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN111835827B (en) IoT edge computing task offloading method and system
CN113778648B (en) Task Scheduling Method Based on Deep Reinforcement Learning in Hierarchical Edge Computing Environment
CN113726858B (en) Self-adaptive AR task unloading and resource allocation method based on reinforcement learning
CN113950103B (en) A multi-server complete computing offloading method and system in a mobile edge environment
CN109862610B (en) D2D user resource allocation method based on deep reinforcement learning DDPG algorithm
CN109947545B (en) A Decision Method for Task Offloading and Migration Based on User Mobility
WO2023040022A1 (en) Computing and network collaboration-based distributed computation offloading method in random network
CN110557732B (en) A vehicle edge computing network task offloading load balancing system and balancing method
CN112860350A (en) Task cache-based computation unloading method in edge computation
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN114340016B (en) Power grid edge calculation unloading distribution method and system
CN110351754A (en) Industry internet machinery equipment user data based on Q-learning calculates unloading decision-making technique
CN107295109A (en) Task unloading and power distribution joint decision method in self-organizing network cloud computing
CN111565380B (en) Hybrid offloading method based on NOMA-MEC in the Internet of Vehicles
CN109802998A (en) A kind of mist network cooperating scheduling motivational techniques and system based on game
CN111949409B (en) A method and system for offloading computing tasks in power wireless heterogeneous networks
CN113641504A (en) An information interaction method for improving the effect of edge computing in multi-agent reinforcement learning
CN104754063B (en) Local cloud computing resource scheduling method
CN113747504A (en) Method and system for multi-access edge computing combined task unloading and resource allocation
CN118803859A (en) A wireless network resource scheduling method, device, medium and product
CN116996938A (en) Internet of Vehicles task offloading method, terminal equipment and storage media
CN117112207A (en) A hybrid resource scheduling method based on deep reinforcement learning
CN116431326A (en) Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning
CN114466023B (en) Computing service dynamic pricing method and system for large-scale edge computing system
Guo et al. Incentive-driven and SAC-based resource allocation and offloading strategy in vehicular edge computing networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant