CN120835006A

CN120835006A - A multi-dimensional resource management joint optimization method based on wireless edge network

Info

Publication number: CN120835006A
Application number: CN202511340285.8A
Authority: CN
Inventors: 郑洛; 苏秦
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2025-09-19
Filing date: 2025-09-19
Publication date: 2025-10-24
Anticipated expiration: 2045-09-19
Also published as: CN120835006B

Abstract

The application relates to the technical field of wireless communication and discloses a multidimensional resource management joint optimization method based on a wireless edge network. The method comprises the steps of obtaining multi-dimensional resource state information of each node in a wireless edge network in real time, calculating corresponding resource indexes based on the resource state information, determining initial resource allocation strategies of each node based on the resource indexes through reinforcement learning, enabling each node in each node to construct and train a local first strategy model based on the initial resource allocation strategies to generate a local resource allocation strategy of the node, updating the local resource allocation strategy through federal learning based on the local resource allocation strategies of each node and the emergency task queue length, and carrying out multi-dimensional resource allocation based on the local resource allocation strategies. The method can realize the cooperative scheduling of multidimensional resources such as calculation, storage, frequency spectrum, power and the like in the wireless edge network, and improves the overall efficiency of the network.

Description

Multidimensional resource management joint optimization method based on wireless edge network

Technical Field

The application relates to the technical field of wireless communication, in particular to a multidimensional resource management joint optimization method based on a wireless edge network.

Background

The wireless edge network is used as a key infrastructure for supporting emerging applications such as Internet of vehicles, industrial Internet, augmented reality and the like, and has the core challenge of efficiently coordinating multidimensional resources such as calculation, storage, frequency spectrum, power and the like so as to meet the service requirements of high reliability and low time delay. Currently, the resource management optimization technology oriented to the edge network mainly performs independent optimization aiming at a single resource dimension. For example, computing resources are allocated by static rules, spectral resources are dynamically adjusted based on channel conditions, and so on.

However, the resource optimization mode only establishes a simple mapping relation between limited dimensions, lacks global coordination on various resources, and is difficult to adapt to the coupling constraint of tasks. For example, when offloading to a node, a computation-intensive task may offset the time advantage of local computation due to transmission delays that are too high without synchronously coordinating spectrum bandwidth allocation. Based on the method, how to realize the collaborative management and joint optimization of various resources of the wireless edge network and improve the overall efficiency of the wireless edge network is a problem to be solved at present.

Disclosure of Invention

In view of this, the present application aims to provide a multidimensional resource management joint optimization method based on a wireless edge network, which improves the overall efficiency of the wireless edge network by the collaborative management and joint optimization of multiple resources.

In order to achieve the above purpose, the technical scheme of the application is as follows:

An embodiment of the present application provides a multidimensional resource management joint optimization method based on a wireless edge network, where the method is applied to a processor, and the processor is deployed in a terminal, a server or a base station, and includes:

acquiring multi-dimensional resource state information of each node in a wireless edge network in real time;

Calculating corresponding resource indexes based on the state information of each resource, wherein the resource indexes comprise calculation load, storage fragmentation rate, spectrum efficiency and reliability indexes;

Determining an initial resource allocation strategy of each node through reinforcement learning based on each resource index, so that each node in each node can execute the following operations of constructing a local first strategy model based on a space-time convolution long-short-term memory network, and training the first strategy model based on the initial resource allocation strategy, historical demand data of the node and task execution data;

Updating the local resource allocation strategy of each node through federal learning based on the local resource allocation strategy of each node and the length of the emergency task queue;

And carrying out multidimensional resource allocation for each node based on the updated local resource allocation strategy of each node.

Optionally, calculating the corresponding resource index based on the respective resource status information includes:

Constructing a multi-dimensional resource dynamic model based on the state information of each resource, wherein each dimension of the resource dynamic model respectively executes the following steps:

calculating the calculation load of each node based on the task calculation amount and the calculation force proportion of the node;

Calculating the storage fragmentation rate of each node based on the storage unit allocation capacity, the used capacity and the total capacity of the node;

calculating the spectrum efficiency of each node based on the transmitting power, the channel gain and the noise power of the node aiming at the target user;

And calculating the reliability index of each node based on the service rate, the task achievement rate, the maximum tolerant time delay and the actual time delay of the node for executing different tasks.

Optionally, determining, by reinforcement learning, an initial resource allocation policy for each node based on each resource indicator, including:

Defining a state space by a deep reinforcement learning algorithm based on each resource index, and generating state space parameters;

Defining an action space based on each state space parameter, and generating a corresponding resource adjustment amount as an action space parameter, wherein the resource adjustment amount comprises a calculation force allocation adjustment amount, a storage allocation adjustment amount, a bandwidth allocation adjustment amount and a power adjustment amount;

constructing a strategy network based on the state space parameters and the corresponding action space parameters;

Constructing a multi-objective rewarding function based on the state space parameter, the action space parameter and the overall performance index of the wireless edge network, wherein the overall performance index of the wireless edge network comprises the total network delay, the maximum network allowable delay, the upper limit of the computing power of the network, the data volume of tasks in the network and the storage access rate;

And generating an initial resource allocation strategy of each node based on the updated strategy network.

Optionally, updating parameters of the policy network further comprises:

Based on the bandwidth allocation adjustment quantity and the power adjustment quantity of the action space parameters, introducing the competition relationship and uncertainty among users through a Bayesian game theory to simulate the resource competition behavior in a multi-user environment, and constructing a user utility function;

Optimizing the multi-objective rewards function and the user utility function in an alternate iterative mode to update parameters of the strategy network.

Optionally, constructing the user utility function includes:

constructing a transmission rate item of a node i aiming at a bandwidth allocation adjustment quantity, a power adjustment quantity, spectrum efficiency and a power compensation reference value of the node i, wherein the transmission rate item is used for reflecting the balance of the transmission rate and power consumption of a user;

Constructing a resource competition item of the node i based on the difference between the ratio of the power adjustment quantity and the bandwidth allocation adjustment quantity of the node i and the other node j and the resource competition penalty coefficient;

And constructing a user utility function of the node i based on the transmission rate item and the resource competition item of the node i.

Optionally, updating the local resource allocation policy of each node through federal learning based on the local resource allocation policy of each node and the emergency task queue length, including:

determining the weight corresponding to each node based on the current emergency task queue length of each node;

calculating global model parameters according to model parameters of a first strategy model of each node and weights corresponding to the nodes;

Transmitting the global model parameters to each node so that each node updates a corresponding first strategy model based on the global model parameters;

and generating a local resource allocation strategy of the node based on the first strategy model after parameter updating.

Optionally, after adjusting the local resource allocation policy of each node through federal learning, the method further includes:

based on the total length of a task queue and the real-time total power consumption of the wireless edge network, constructing a Lyapunov function and calculating a corresponding drift term;

Constructing a first objective function based on the total power consumption budget of the wireless edge network, the current real-time total power consumption and the drift term;

and adjusting the allocation weight of each dimension resource in the local resource allocation strategy of each node to minimize the first objective function, and updating the local resource allocation strategy of each node.

Optionally, the resource status information further comprises resource requirement information of each node, and the method further comprises:

Acquiring network state information of the wireless edge network according to a first time interval, wherein the network state information comprises a node topological structure and equipment parameters of each node;

Based on a local resource allocation strategy of each node, current resource state information of the node and the network state information, constructing a network mirror model corresponding to the wireless edge network through simulation so as to map the behavior and performance of each node in the wireless edge network;

after the local resource strategy of each node is updated, the behavior and performance of each node in the wireless edge network are synchronously simulated through incremental learning and self-adaptive adjustment algorithm.

Optionally, the method further comprises:

Under the high-reliability low-delay scene, the behavior and the performance of each node are obtained through the network mirror model according to a second time interval, and the current delay and reliability indexes are calculated;

comparing the current time delay with a reliability index, and comparing the current time delay with a time delay threshold value and a reliability threshold value corresponding to the high-reliability low-time delay scene;

And under the condition that the time delay is higher than the time delay threshold value or the reliability index is lower than the reliability threshold value, updating the local resource allocation strategy of each node through federal learning.

Optionally, the method further comprises:

Acquiring task execution data of each node from the network mirror model according to a third time interval, wherein the task execution data comprises time delay index data, energy consumption index data and task progress index data;

Generating a resource status report of the wireless edge network based on task execution data of each node;

and updating the first strategy model of each node based on the task execution data of each node.

By adopting the multidimensional resource management joint optimization method based on the wireless edge network, the current resource index is calculated by acquiring multidimensional resource state information in each node of the wireless edge network in real time, thereby realizing accurate monitoring of dynamic change of the network state, and dynamically sensing the change of the network state so as to reflect and adjust the resource allocation strategy in time. Based on the resource index of each dimension, determining the initial resource allocation strategy of the node in the network through reinforcement learning, enabling each node to train a local first strategy model based on the initial resource allocation strategy, and generating the local resource allocation strategy of the node through the first strategy model. Because the initial allocation strategy is determined based on the real-time multidimensional resource indexes of all nodes in the edge network, the initial allocation strategy reflects the collaborative optimization result of the resource allocation among all nodes and the local multidimensional resource allocation of the nodes in the process of executing the task in the current network.

Based on the local resource allocation strategy and the emergency task queue length of each node, the local resource allocation strategy of each node is further optimized through federal learning so as to meet the resource allocation requirement of the network on emergency task processing, the coping capacity of the network on emergency tasks and emergency conditions is enhanced, and therefore the whole efficiency of the edge network is improved, and meanwhile the coping capacity of the network on emergency tasks is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a multi-dimensional resource management joint optimization method based on a wireless edge network according to an embodiment of the present application;

FIG. 2 is a flow chart of updating a local resource allocation policy through federal learning in accordance with an embodiment of the present application;

Fig. 3 is a flow chart of determining a multi-dimensional resource allocation policy for a wireless edge network in an embodiment of the application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In various embodiments of the present application, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods that are consistent with some aspects as detailed in the application.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.

The conventional wireless edge network resource allocation scheme has the following defects:

(1) In real-time interaction scenarios, such as automatic driving, AR (Augmented Reality ) VR (Virtual Reality), and other scenarios, a single-dimension resource optimization strategy is difficult to adapt to the coupling constraint of task requirements. For example, when a computationally intensive task is offloaded to an edge server, if spectrum bandwidth allocation is not synchronously coordinated, the time advantage of local computation may be offset due to too high a transmission delay;

(2) The static allocation policy cannot adapt to dynamic changes of the network, and especially when the demands of burst tasks (such as burst task requests, time variability of channels, and position drift of mobile terminals) in the network are likely to cause congestion of task queues, so that the overall efficiency of the network is affected.

In view of the problems of the current wireless edge network resource allocation scheme, the application provides a resource management optimization method which can cooperatively schedule multi-dimensional resources, can better adapt to the coupling constraint of tasks and can quickly adapt to the dynamic change of the network, and the cooperative optimization of multi-node and multi-dimensional resource allocation is realized by monitoring the multi-dimensional resource allocation and the task execution condition in each node in the network in real time, so that the overall efficiency of the network is improved, and the coping capacity of the network for sudden task demands is enhanced.

The application will be described in detail below with reference to the drawings in connection with embodiments.

Fig. 1 is a flowchart of a multidimensional resource management joint optimization method based on a wireless edge network according to an embodiment of the present application. As shown in fig. 1, the method is applied to a processor, where the processor is deployed in a terminal, a server or a base station, and includes:

In the embodiment of the application, the multidimensional resource management joint optimization party is applied to a processor, and the processor can be a server, a terminal, a base station and other equipment deployed in a wireless edge network. In the embodiment of the application, the processor is taken as a terminal for explanation, and the method is realized through interaction among the terminal, the server and the base station.

In the embodiment of the application, the multi-dimensional resource state information of each node (namely the edge server) is collected in real time through various sensors and monitoring devices deployed in the wireless edge network, wherein the multi-dimensional resource state information comprises computing resource state data, storage resource state data, spectrum resource state data and power resource state data. The computing resource state data comprises CPU utilization rate, memory occupation, task queue length and the like, the storage resource state data comprises residual capacity, read-write speed, data fragment distribution and the like of the storage equipment, the frequency spectrum resource state data comprises channel quality, bandwidth occupation, interference level and the like, and the power resource state data comprises current power levels, battery states, energy consumption rates and the like of the base station and the terminal equipment.

After the resource state information of each node is acquired through various sensors and detection equipment, the resource state information is sent to a centralized data processing node, namely a central data processing unit, of a core network layer in the network at preset time intervals. The central data processing unit cleans and standardizes the received original resource state data, removes abnormal values and fills in missing data. Optionally, data normalization is performed on the data of each dimension, and data of different dimensions and magnitudes are converted into a unified numerical range for processing and analysis of a subsequent modeling algorithm.

According to the resource state information of each dimension, the corresponding resource index is calculated, and the method specifically comprises the steps of calculating load, storing fragment rate, spectrum efficiency and reliability index. Based on the resource indexes of each dimension, the initial resource allocation strategy of all nodes in the target network is determined through reinforcement learning, so that the cooperative optimization of the resource allocation of each dimension is realized, the dynamic balance of the multi-dimension performance indexes is achieved, and the overall performance of the network is improved. Each node in the network builds and trains a local first strategy model based on the initial allocation strategy, and the allocation strategy of each dimension resource of the local, namely the local resource allocation strategy, is generated through the first strategy model of each node.

In addition, in order to meet the urgent task demands of all nodes in the network, local resource allocation strategies of all the nodes are optimized through federal learning based on the urgent task queue length of all the nodes and parameters of a first strategy model, and finally multidimensional resource allocation is performed on all the nodes based on the local resource allocation strategies of all the nodes obtained through federal learning optimization. The resource allocation strategy of each node is further optimized through federal learning, so that the network can better meet the requirements of emergency tasks while the resource allocation in the wireless edge network is globally optimized, and the overall efficiency and the service quality of the network are improved.

As one embodiment of the present application, calculating a corresponding resource index based on each resource status information includes:

In one embodiment, a resource dynamic model is constructed for mapping resource metrics of each dimension resource in real-time. The resource dynamic model is expressed in a vector or matrix form, and each dimension maps corresponding resource state information respectively, wherein the resource dynamic model comprises calculation resource state data, storage resource state data, spectrum resource state data and power resource state data. The input of the model is the resource state information of each node obtained in real time after pretreatment, and the corresponding resource index is output through calculation.

Specifically, the resource dynamic model is expressed as:

;

Wherein, the Representing a multi-source dynamic resource model; representing real-time computational load; representing a storage fragmentation rate of a node; representing the spectral efficiency of the node; Representing the reliability index of the node.

The resource index calculation mode of each dimension is specifically as follows:

(1) Calculating real-time calculation load, namely calculating the calculation load of the current network by using a queuing theory model or a task scheduling algorithm based on the arrival rate of calculation tasks of nodes (edge servers), the complexity of the tasks and the processing capacity of the servers. In the embodiment of the application, a task scheduling algorithm is adopted to calculate real-time load, and an M/G/1 queuing model (queuing theory model) can be adopted, wherein the task arrival process accords with poisson distribution, the service time accords with general distribution, and load indexes such as the utilization rate of computing resources, waiting time and the like can be obtained by solving the queuing theory model. The specific calculation steps are as follows:

1) And (3) task calculation amount sensing, namely deploying a calculation resource monitoring module at an edge network node, and acquiring the calculation amount of each task in real time. The amount of computation is typically measured in units of CPU (Central Processing Unit ) cycles, and the number of CPU cycles required for each task is determined by analyzing the code structure, data processing requirements, and historical running records of the task. For example, for an image processing task, the CPU period required by the task is estimated according to factors such as the resolution of the image, the complexity of the processing algorithm and the like;

2) And determining the calculation force distribution rate, namely distributing corresponding calculation force distribution rate for each task by the node based on the current resource state information and the task priority. Specifically, the value range of the calculation force distribution rate is [0,1], which represents the proportion of the calculation resources allocated to the task by the server to the total calculation resources. For example, if the server has 10 CPU cores in total, and 8 cores are allocated to a certain high-priority task, the power allocation rate of the task is 0.8;

3) Calculating the calculation load of the node based on the task calculation amount and calculation force distribution rate of the node The following are provided:

;

Wherein, the For the calculated amount of the i-th task,And (3) distributing the computing power distribution rate for the ith task to the edge server, wherein N is the total number of the current tasks of the node.

For example, there are currently two tasks in the node, wherein the calculated amount of task 1 is 1000CPU cycles, the calculated force distribution rate is 0.5, the calculated amount of task 2 is 1500CPU cycles, the calculated force distribution rate is 0.5, and then the real-time load is carried outThe method comprises the following steps:

And a CPU cycle.

(2) Calculating the storage fragmentation rate, namely counting the fragmentation number and the size distribution of the storage space by analyzing the data storage and access modes of the node storage equipment, dynamically calculating the storage fragmentation rate by predefining a storage fragmentation rate formula, such as the ratio of the total fragmentation area to the total storage capacity, combining the growth trend and the deletion strategy of the data storage, and reflecting the utilization efficiency and the availability of storage resources, wherein the specific calculation steps are as follows:

1) And monitoring the allocation and use conditions of the storage units, namely carrying out fine granularity monitoring on storage resources of each node to obtain the allocated capacity and the actual use capacity of each storage unit. The allocated capacity refers to the size of a storage space allocated in advance to a specific data storage task, and the actual used capacity refers to the portion actually occupied in the allocated space.

For example, in a distributed network storage system, 100GB of storage space (allocated capacity) is allocated to user 1, but user 1 currently uses only 60GB (actual used capacity) in practice.

2) Calculating a storage fragmentation rate of a node based on the allocated capacity and actual usage capacity of each storage unit in the node:

;

Wherein, the For the allocated capacity of the mth storage unit,In order to make a practical use of the capacity,For total storage capacity, M is the total number of storage units.

For example, when the node includes 3 storage units (i.e., m=3) and the total storage capacity is 500GB, the allocated capacity of the storage unit 1 is 150GB, the actual use capacity is 130GB, the allocated capacity of the storage unit 2 is 200GB, the actual use capacity is 180GB, and the allocated capacity of the storage unit 3 is 100GB, and the actual use capacity is 70GB. The storage fragmentation rate of the node is:

。

(3) Calculating spectral efficiency Based on the consideration of the characteristics of wireless channels, the modulation and demodulation modes, the multi-user access condition and other factors, the spectrum efficiency of the node is calculated by adopting a shannon formula or a similar communication theory model. Optionally, according to parameters such as signal-to-noise ratio, bandwidth, transmission rate and the like, determining the maximum data transmission rate which can be supported by the unit spectrum width under the current spectrum resource allocation, so as to measure the utilization efficiency of the spectrum resource. The specific calculation steps are as follows:

1) And acquiring spectrum resource parameters, namely acquiring the transmitting power, the channel gain and the noise power parameters of each user in real time through a wireless interface monitoring module. For example, in an AR/VR application scenario, a plurality of users are connected to a node in a wireless edge network through a wireless device, and a wireless interface monitoring module of the node acquires a spectrum resource parameter of each user in real time.

2) Calculating the spectral efficiency of the user based on the acquired transmit power, channel gain and noise power parameters:

;

Wherein, the For the transmit power of the nth user,Channel gain for the nth user; for the transmit power of the jth user, Channel gain for the jth user; Is the noise power. The spectrum efficiency reflects the effectiveness of spectrum resources occupied by each user in the presence of interference, and is used for evaluating the utilization efficiency of the spectrum resources.

For example, assuming that the transmission power of the user 1 is 0.5W, the channel gain is 2, the transmission power of the user 2 is 0.3W, the channel gain is 1.5, and the noise power is 0.1W at a certain time, the spectral efficiency of the user 1 is calculated to be about 2.58.

The product of the transmit power of the target user and the channel gain represents the strength of the useful signal, while the sum of the interference signal power and the sum of the noise power of the other users constitutes the interference to the useful signal. The larger the ratio obtained by dividing the useful signal power by the total interference and noise power, the higher the data transmission rate can be realized under the same spectrum resource, and the higher the spectrum efficiency is. For example, in a wireless edge network where multiple users share the same frequency band, spectrum resources can be reasonably allocated by accurately calculating the spectrum efficiency of each user, so that interference is avoided, and the spectrum utilization rate of the whole network is improved.

(4) Calculating reliability indexThe reliability of each component in the edge network is not needed, such as the failure rate of a server, the stability of a communication link, the data integrity of storage equipment and the like, is comprehensively considered, a reliability engineering method, such as a failure mode and influence analysis or reliability block diagram method, is adopted, a 'time delay-reliability' joint model is constructed to calculate the reliability index of the whole, such as the Mean Time Between Failure (MTBF) or reliability probability, and the resource management strategy is ensured to meet the performance requirements and simultaneously has enough reliability guarantee.

Specifically, service reliability index in "delay-reliability" joint model is definedThe expression is as follows:

;

Wherein, the For the service rate of the class k task,In order for the task to arrive at a rate,For the maximum tolerated time delay to be the most,The actual time delay of the kth task; For the computational load of the node, For the storage fragmentation rate of a node,And alpha, beta and gamma respectively represent the contribution ratio of the computational resource load, the storage fragment rate and the spectral efficiency to the task time delay (namely the weight coefficient of each coefficient), the alpha+beta+gamma=1 is satisfied, and the alpha, beta and gamma are all larger than 0. In practical application, the determination of the weight coefficient needs to adopt a proper weighting method, such as a hierarchical analysis method, an entropy value method and the like, according to specific application scenes and resource characteristics so as to ensure the rationality and the accuracy of the formula.

The service rate and the task arrival rate describe the service and arrival process of the task in the network, and the maximum tolerated latency is a threshold that measures whether the task delay is acceptable. And calculating the difference between the maximum tolerant time delay and the actual time delay, and combining the service rate and the arrival rate to obtain the reliability index. A closer value of 1 for the reliability index indicates a higher reliability of the service, i.e. a higher probability of completion of the task within the specified time delay, whereas a closer value of 0 indicates a lower reliability of the service, i.e. a lower probability of completion of the task within the specified time delay.

For example, in the application of the internet of vehicles, the wireless edge network evaluates the reliability of the service under the current network condition by calculating the reliability index of the real-time position information updating task of the vehicle, and further provides a reference basis for driving decision.

In one embodiment, based on the updated local resource allocation policy of each node, multidimensional resource allocation is performed for each node, and the specific steps are as follows:

(1) And establishing a mapping relation between the physical equipment and the resource allocation strategy in the edge network based on the optimized local resource allocation strategy. And generating a corresponding task instruction according to the local resource allocation strategy. For example, computing task offload instructions are generated based on an allocation policy of computing resources, tasks that need to be processed or offloaded locally to other edge servers are determined, and specific offload paths and transport parameters are determined. And generating a bandwidth allocation instruction based on the allocation strategy of the spectrum resources, and determining the spectrum resource allocation proportion and the bandwidth size of each communication link or user. Generating a power control instruction based on the power resource, and setting the transmitting power of the base station and the terminal equipment so as to meet the requirements of communication quality and energy consumption;

(2) And formatting the generated control instruction of the resource allocation of each dimension according to equipment and a network communication protocol in the wireless edge network so as to ensure that the instruction can be correctly analyzed and executed. Then, the instructions are sent to corresponding devices and system components, such as edge servers (nodes), base stations, terminal devices and the like, through corresponding communication channels, so that real-time control and adjustment of resource allocation are realized.

As one embodiment of the present application, determining an initial resource allocation policy of each node through reinforcement learning based on each resource index includes:

In one embodiment, an initial resource allocation policy for a wireless edge network is generated by a hierarchical intelligent decision engine based on a dynamic model of resources. In this embodiment, the hierarchical intelligent decision engine is built based on deep reinforcement learning. Specifically, the initial resource allocation strategy of the node is determined through deep reinforcement learning, and the specific steps are as follows:

(1) Based on the deep reinforcement learning algorithm, each resource index is obtained from the resource dynamic model as a state space parameter, which comprises the steps of calculating load Storage fragmentation rateSpectral efficiencyReliability index at previous time. These parameters collectively reflect the resource status and quality of service of the edge network at time t. Constructing a state space vector based on each state space parameterThe following are provided:

;

The state space vector is used as the input of the deep reinforcement learning agent to sense the environment state, and provides comprehensive environment information for the subsequent resource allocation decision.

(2) And defining the action space of each state space parameter to generate a multidimensional resource adjustment quantity. Specifically, firstly determining action space parameters, taking adjustable resource allocation parameters of each dimension as the action space parameters, specifically comprising calculating force allocation adjustment quantityStorage allocation adjustment amountBandwidth allocation adjustmentAmount of power adjustment. Dynamic adjustment of various resources in the action space parameter-associated wireless edge network can flexibly change the resource allocation strategy to adapt to different network states. Wherein, the expression of each motion space parameter is as follows:

;

Wherein, the 、、、The weight matrixes are all corresponding parameters;、、、 Bias vectors for respective corresponding parameters; Is a state space vector; activating a function for sigmoid for limiting the output to a reasonable range; Is a hyperbolic tangent function for limiting the output to a range of [ -1,1], and ReLU is a modified linear unit function for ensuring that the bandwidth allocation adjustment is non-negative.

(3) Constructing motion space vectors based on the respective motion space parametersThe following are provided:

。

the intelligent agent realizes the real-time adjustment of the resource allocation of each dimension in the network by outputting the motion space vector.

For example, when the calculation load is high, the agent relieves the calculation pressure by increasing the calculation force allocation adjustment amount, and when the storage fragmentation rate is too high, the use of the storage resources is optimized by adjusting the storage allocation adjustment amount.

In order to balance the targets of service reliability, time delay and the like of the nodes, comprehensively considering the improvement effect of the action space parameters on the state space parameters and the performance index of the whole network, and constructing a multi-target rewarding functionThe method is characterized by comprising the following steps:

;

Wherein, the 、、Is a weight coefficient, satisfiesThe weight coefficient is used for balancing each optimization target, so that the rewarding function can comprehensively reflect the advantages and disadvantages of the resource allocation strategy; is the total time delay of the network; For the maximum allowable delay of the network, To calculate the upper limit of force, L is the amount of data for the task in the network,For storing access rates; For the purpose of calculating the force distribution adjustment quantity, An adjustment amount is allocated for the bandwidth and,The adjustment amount is allocated for the storage,For the computational load of the node,For the spectral efficiency of the user,The storage fragmentation rate for a node.

And (3) introducing the association of the state space parameters and the action space parameters (including the total network time delay, the maximum allowable time delay, the upper limit of calculation force, the data volume of tasks and the storage access rate) while balancing each optimization target by using the weight coefficient so as to enhance the sensitivity of the reward function to the state and the action.

And optimizing the target rewarding function to dynamically update the strategy network of the intelligent agent, thereby obtaining the initial resource allocation strategy of all nodes in the network.

According to the embodiment, through dynamic modeling and intelligent decision, the network state change can be accurately perceived, the resource allocation strategy can be timely adjusted, and the network can still keep the low-delay communication performance when facing the dynamic changes such as burst task requests, channel time variability, mobile terminal position drift and the like. And the cooperative optimization of the multidimensional resources breaks the limitation of hierarchical management in the traditional resource allocation scheme, and realizes the high-efficiency cooperation of the cross-resource dimension.

For example, in the case that the wireless edge network is applied to a smart medical scenario, computing resources, storage resources, spectrum resources and power resources in the network can be flexibly allocated and dynamically adjusted according to actual task demands and network states, so that the resource utilization rate of medical image real-time analysis tasks is improved, the time delay of medical resource utilization can be reduced, and real-time interaction of remote surgery is supported.

As one embodiment of the present application, updating parameters of the policy network further includes:

In one embodiment, the competition relationship and uncertainty among the user tasks in the network are also considered, so that the high efficiency and fairness of the resource allocation are ensured, and therefore, in addition to updating the strategy network according to the multi-objective rewarding function, the bandwidth allocation adjustment amount and the power adjustment amount of the action space parameter are subjected to game balance analysis based on the Bayesian game theory, and the user utility function is generated. The user utility function is utilized to introduce the competition relation and uncertainty among users, and the multi-user competition behavior is simulated so as to generate a reasonable competition strategy, thereby realizing the fair and efficient allocation of resources.

On the basis, the method is optimized together in an alternate iteration mode based on the multi-objective rewarding function and the user utility function, so that an efficient allocation strategy considering resource fairness and multi-user competition fairness is realized.

As one embodiment of the present application, constructing a user utility function includes:

In one embodiment, the construction of the user utility function comprehensively considers the cost of resource allocation and payment obtained by each user and the expectation of other user behaviors, thereby generating a reasonable competition strategy and realizing the fair and efficient allocation of resources. Specifically, the user utility function consists of communication efficiency improvement caused by bandwidth and power adjustment and punishment caused by resource competition. The improvement of communication efficiency reflects the benefit obtained by the user through adjusting bandwidth and power, and the resource competition punishment suppresses the excessive occupation of resources by the user, so that the user is promoted to consider the influence on other users in competition, and the overall fairness and stability of the network are maintained.

In the embodiment of the application, the user utility functionThe expression of (2) is as follows:

;

Wherein, the K is a resource competition punishment coefficient; Is a transmission rate term for reflecting the balance of the transmission rate and power consumption of the user; a resource competition item for suppressing excessive competition between users; The ratio of the power adjustment quantity of the user i to the bandwidth allocation adjustment quantity is compared with the ratio of the power adjustment quantity of other users j to the bandwidth allocation adjustment quantity, so that the resource competition degree faced by the user i is measured; An adjustment amount is allocated for the bandwidth and, For the spectral efficiency of the user,Is the power adjustment amount;、 Respectively distributing adjustment amounts for bandwidths of two users i and j; 、 The power adjustment amounts for the two users i, j, respectively.

In practical applications, computing a user utility function requires consideration of the action space parameters of each user as well as the status and actions of other users in the network. When adjusting the bandwidth and power allocation of a user, the user utility function of the user is calculated based on the current network state and the actions of other users. And then, obtaining the optimal user competition strategy under the current network state by solving Bayesian equilibrium.

For example, in the case that the bandwidth allocation adjustment amount is increased or the power adjustment amount is decreased by the user i, the resource utilization efficiency is increased, thereby increasing the utility function value of the user, and at the same time, if the difference between the power and the bandwidth ratio of the user i and other users is increased, the resource competition penalty term is correspondingly increased, thereby decreasing the utility function value of the user. Therefore, a balance point needs to be found between resource utilization efficiency and contention fairness to maximize user utility.

In the embodiment of the application, the multi-objective rewarding function and the user utility function of the deep reinforcement learning strategy network are updated through alternate iteration, so as to generate an initial resource allocation strategy, and the specific updating flow is as follows:

(1) Updating parameters of an agent policy network by a gradient descent method based on prize values and action cost functions of a multi-objective prize function :

;

Wherein, the Is the learning rate; Is a discount factor; Is a motion cost function; Representing policy network parameters Obtaining a gradient; a reward signal which represents environmental feedback after the intelligent agent executes a certain action at the moment is used for guiding the policy optimization direction for instant rewarding;

(2) Solving Bayesian equilibrium, and solving an optimal response strategy of a user by optimizing the combination of the user utility function and the rewarding value of the multi-objective rewarding function:

;

Wherein, the In order to synergistically optimize the weight of the system,The power adjustment amount in the optimal response strategy; allocating an adjustment amount for the bandwidth in the optimal response strategy; Is an instant rewards; a utility function for the user;

(3) Setting a convergence condition, judging whether the policy network of the intelligent agent and the user policy are converged, wherein the judgment condition is as follows:

;

Wherein, the AndRepeating the process until convergence conditions are met for a convergence threshold;、 The calculated force distribution adjustment amounts are respectively at the time t and the time t+1; 、 the bandwidth allocation adjustment amounts at time t and time t+1 are respectively.

The repeated iterative process comprises the steps of updating an agent strategy network, solving a Bayesian game equilibrium solution and convergence judgment. In the strategy network updating, the rewarding value and the action cost function are utilized to guide the adjustment of network parameters, in the Bayesian game equilibrium solving, the user utility function and the cooperative optimization weight are combined to find the optimal user competition strategy, and whether the updating is stopped is judged by defining convergence judging conditions (such as the variation amplitude threshold of the resource adjustment quantity). And outputting an initial resource allocation strategy through the strategy network under the condition that the strategy network and the user utility function are converged. The initial resource allocation strategy is used as a reference for subsequent resource optimization, guides the actual allocation and adjustment of various resources in the edge network, and finally outputs the initial resource allocation strategy as follows:

;

Wherein, the The calculated force allocation adjustment amount at the next time (time t+1) in the optimal response strategy is represented,Representing the amount of memory allocation adjustment in the best response policy,Indicating the amount of bandwidth allocation adjustment in the best response strategy,Indicating the amount of power adjustment in the best response strategy.

In the embodiment of the application, the multi-objective rewarding function and the user utility function are updated by alternate iteration, so that the intelligent agent can gradually optimize the resource allocation strategy under the condition of considering the user competition, and the dual optimization of the network performance and the user fairness is realized. Further, each node builds and trains a local first policy network based on the initial resource allocation policy, thereby generating a local resource allocation policy according to resources, tasks and user competition conditions of the nodes locally.

As one embodiment of the application, the resource status information further comprises resource demand information of each node, and the method further comprises:

In one embodiment, a digital twinning technique is employed to construct a network image model based on an initial resource allocation policy to support verification and dynamic optimization of the resource allocation policy. The network mirror model is based on physical components and logic architecture of an actual wireless edge network, and uses physical modeling, data driving modeling and simulation technology to map real-time network state information (including network topology structure and node equipment parameters) and resource state information into the network mirror model, and the network mirror model accurately maps the behaviors and performances of each node in the actual network. The behavior of the node in the network refers to dynamic operation logic and interaction rules of the network node in the resource scheduling process, and the behavior comprises the following steps:

(1) Task scheduling behavior, including unloading decision of the edge server to the computing task and priority queue management;

(2) The resource competition behavior comprises power adjustment and defragmentation triggering threshold values under Bayesian equilibrium;

(3) The fault response behavior comprises task migration path selection when the node is down and spectrum switching strategy (FMEA reliability model) when the channel is interrupted.

The performance of a node in a network refers to a quantifiable service index generated after a resource allocation policy acts on the network, and includes:

(1) The key time delay index comprises end-to-end task processing time delay;

(2) The resource efficiency index comprises a storage fragmentation rate, a frequency spectrum utilization rate and a calculation resource idle rate;

(3) The system reliability index comprises a task packet loss rate and service availability.

And the network mirror model has a dynamic data synchronization mechanism, and dynamically updates network state information through incremental learning and self-adaptive adjustment algorithms along with dynamic changes of the wireless edge network, such as addition of new equipment, dynamic changes of tasks, consumption and supplementation of resources, and the like, so that the real-time state of the model and the real wireless edge network is ensured to be consistent, and the real running condition of the real wireless edge network under different resource allocation strategies is simulated in real time in a virtual environment.

As an embodiment of the present application, the method further comprises:

In this embodiment, the network mirror model is further used to simulate the network performance under different resource allocation policies, instead of the actual network trial-and-error to verify the validity of the resource decisions. Optionally, under a high-reliability low-delay scene, the behavior and performance of each node in the network are mapped in real time through a network mirror model, and the delay and reliability index of the network when an emergency task is executed under the scene are simulated. Furthermore, based on simulated time delay and reliability indexes, a time delay threshold value and a reliability threshold value required by a high-reliability low-time delay scene are combined, and a node local resource allocation strategy is updated, so that the coping capacity of the wireless edge network to an emergency task in the high-reliability low-time delay scene is improved, the network has timely and efficient multidimensional resource dynamic allocation performance, the requirements of users on low-time delay and high-reliability communication can be better met, full-link performance guarantee is provided for leading-edge digital services, and the use experience of the users is improved.

In the embodiment, the high-reliability low-delay scene can be an automatic driving scene, an AR/VR video frame real-time rendering scene, a control scene requiring microsecond-level synchronization of instructions in the smart industry, a zero-jitter interaction scene for realizing remote operation in smart medical treatment, a scene for balancing real-time access of mass terminals in a smart city, and the like.

For example, in an automatic driving scene, the network mirror model simulates the end-to-end time delay and reliability of an emergency obstacle avoidance task by mapping the state data of a vehicle, a road side unit and an edge server in real time, and updates the local resource allocation strategy of the node by federal learning based on a time delay threshold and a reliability threshold in the emergency obstacle avoidance scene.

In this embodiment, the network mirror model further records historical demand data of each node in the network, where the historical demand data is further used to predict future resource demands of the nodes, and the prediction result is used to train the first policy model local to the nodes. In this embodiment, an adaptive optimization model for node characteristics, i.e., a first policy model, is constructed based on ConvLSTM (Convolutional Long Short-Term Memory) models and full-connection layers.

The ConvLSTM model combines a convolutional neural network and a long-term and short-term memory network, can process data in space and time dimensions simultaneously, and in a wireless edge network, resource requirements often have space-time correlation, for example, a certain mode can be presented by computing resource requirements at different time points and different nodes. In the embodiment, a ConvLSTM model is adopted to capture the space-time characteristics of the resource demands, so that the accuracy of predicting the future resource demands of the nodes is improved. The ConvLSTM model has the structure of an input layer, a CNN (Convolutional Neural Network) layer, an LSTM (Long Short-Term Memory) layer and an output layer.

The network mirror model records historical resource demand data of the nodes, including historical demand data of computing resources, storage resources and spectrum resources. These historical resource demand data are normalized and time series constructed and input to ConvLSTM model. The input layer of the model receives historical resource demand data (a multidimensional space-time matrix) of nodes recorded by the network mirror image model in a previous time period, the spatial characteristics (such as resource demand distribution modes among different edge nodes) of the historical resource demand data are extracted through a convolution layer, then the LSTM layer captures long-term dependency relationship (such as periodic change and trend of resource demand) of a time sequence, further a prediction result of future resource demand of the nodes is generated, and a future resource demand sequence is output. Predicted outcome of ConvLSTM modelObtained by the following expression:

;

Wherein, the A space-time convolution kernel weight matrix; The LSTM unit parameter matrix is adopted; is a historical sequence input (a time sequence slice) with a time window, representing a sequence from The set of continuous space-time characteristic tensors from time to t time is written。

In practical application, historical resource demand data recorded by a network mirror model is adopted to train ConvLSTM models, a loss function (such as mean square error) is used to measure the difference between a predicted value and a true value in the process, and model parameters are adjusted through a back propagation algorithmAnd. After training, the model predicts the resource demand data (resource demand sequence) at the future time t+1 according to the current time (time t) of the wireless edge network and the historical resource demand data of the historical time period) The method is used for providing basis for the subsequent federal learning to adjust the resource allocation strategy from the global state to the single node. Alternatively, the sequence of future resource demands may be a scalar or vector.

For example, in an industrial internet scenario, a space-time convolution LSTM model is trained by analyzing the computing resource and storage resource requirements of different production periods in the past week, so as to predict the resource requirements of the next production period, and provide a basis for scheduling of production tasks and pre-allocation of resources.

Optionally, training of a first strategy model local to the auxiliary node is assisted by utilizing a future resource demand sequence output by the ConvLSTM model, so that model convergence is accelerated, and generation efficiency of a local resource allocation strategy is improved.

The network state information of dynamic change in the wireless edge network is mapped in real time by constructing a network mirror model, the ConvLSTM model is adopted to predict future resource demand data, and the local first strategy network is updated in an auxiliary mode, so that the effect of dynamically adapting the current strategy of the node to the time-space change characteristic of the network is achieved, and the adaptation capacity of the network to the dynamic change is improved. Whether the short-term burst task or the long-term network topology change, the real-time sensing and intelligent decision can be used for effectively coping, so that the edge network can still keep good performance and service quality in complex and changeable environments.

As one embodiment of the present application, updating the local resource allocation policy of each node by federal learning based on the local resource allocation policy of each node and the emergency task queue length, includes:

In the conventional wireless edge network resource allocation strategy, the resource competition among distributed nodes lacks a global coordination mechanism. For example, in large-scale user scenarios such as smart cities, the user selfish policy easily causes that the Nash equilibrium deviates from the optimal state, resulting in the reduction of the fairness and efficiency of overall resource allocation, and affecting the overall efficiency of network services.

In order to realize global resource allocation policy adjustment considering competition among nodes, in this embodiment, local resource allocation policies of the nodes are updated through federal learning. Federal learning allows multiple parties to co-train a shared model without sharing the original data. In this embodiment, each node in the wireless edge network is taken as a participant, and each participant has a corresponding local resource allocation policy. The parameters of the local first strategy model of each node are globally regulated through federal learning, global model parameters are generated, the local resource allocation strategies of each node are regulated based on the global model parameters, the local resource allocation strategies of each node are aggregated while protecting the data privacy of each node, and therefore global regulation is carried out based on the local resource allocation strategies of each node, and the overall resource allocation efficiency is improved.

Specifically, the federal learning framework in this embodiment includes a participant, a coordinator, a communication mechanism, and a privacy protection mechanism. The participants are all nodes, and in this embodiment, each node trains a local first policy model by using local resource demand data and task execution data. The resource demand data comprises demand data of computing resources, storage resources and spectrum resources, task execution data comprises time delay data such as task processing time delay of computing/storage access and the like and transmission time delay influenced by spectrum allocation, energy consumption data such as actual power consumption of equipment under power control instructions such as base station transmitting power, server CPU energy consumption and the like, task success rate, task interruption times and the like. These data reflect the node specific resource usage patterns and task characteristics.

The coordinator is a central server of the wireless edge network and is responsible for coordinating the training process of each node, aggregating the parameters of the local first strategy model and generating global model parameters. The communication mechanism defines a communication mode and a data transmission format between the participant and the coordinator, and the privacy protection mechanism is used for ensuring that the data privacy of the participant is protected and preventing data leakage in the training process.

In this embodiment, the length of the urgent task queue of each node is determinedAnd determining the weight of the node, wherein the longer the length of the emergency task queue is, the larger the influence of the local strategy on the global model is. The urgent task queue length refers to the backlog number of high priority tasks within a single node. In practical application, the emergency task queue length of the node is counted and obtained in real time through a task scheduler local to the edge server node. Specifically, when a task reaches a node, the task scheduler marks priorities of different tasks according to a preset rule, the tasks with high priority are added into an urgent task queue, and the tasks with other levels of priority are added into a common task processing queue.

For example, the AR/VR frame rendering task is marked as a normal priority, and the vehicle collision pre-warning task is marked as an emergency priority. The preset rules may be set according to delay sensitivity, service type, etc. The scheduler keeps track of the number of tasks marked as urgent and not yet processed, generating real-timeValues.

Local model parameters of nodes by using weights corresponding to all nodesComputing global model parametersThe method comprises the following steps:

;

Wherein, the An urgent task queue length for the i-th node; The length of the emergency task queue of the jth node is the total number of nodes.

The weight calculation is carried out based on the weight of each node and the local model parameters through federal learning, so that the resource allocation strategy of the network is ensured to be capable of more focusing on the requirements of emergency tasks in the nodes, and the overall response speed and service quality of the network are improved.

FIG. 2 is a flow chart of updating a local resource allocation policy through federal learning in an embodiment of the present application. As shown in fig. 2, a first policy model local to the node is constructed based on the ConvLSTM model and the full connectivity layer. Based on the initial resource allocation strategy and the historical resource demand data of each node recorded by the network mirror image model, each node (note: n nodes are included in the network, only node 1 and node n are shown in fig. 2) trains a first strategy model according to local task load and resource use conditions, and corresponding local model parameters are obtained. The ConvLSTM model predicts future resource demand data of the node according to the historical resource demand data of the node, and updates the parameters of the first strategy model based on the predicted resource demand data and the actual resource demand data at the next moment. The central server aggregates the parameters of the first strategy model of each node, determines the weight of the model parameters of the corresponding node according to the emergency task queue length of each node, and further calculates the global model parameters of the wireless edge network based on the weight. And the central server distributes the global model parameters to each node, and each node updates the local resource allocation strategy by using the global model parameters to realize the resource allocation strategy optimization considering the global optimization of the network and the local emergency task demands of the nodes.

As an embodiment of the present application, after adjusting the local resource allocation policy of each node through federal learning, the method further includes:

In one embodiment, after the node local resource allocation strategy is regulated through federal learning, the network has the capability of rapidly responding to an emergency task, the resource consumption and the stability of a task queue of the network are further optimized through a Lyapunov drift optimization method, and the long-term performance of the wireless edge network is ensured.

Specifically, the resource weight is adjusted by the Lyapunov drift optimization method so as to optimize the long-term stability and power consumption of the task queues in the network, and the network is ensured to realize efficient allocation of resources on the premise of meeting the power budget. Firstly, defining the state and control variables (total length of task queues, total power consumption of network and the like) of a Lyapunov queue dynamic model, and constructing the Lyapunov queue dynamic model at the next moment (namely the moment t+1):

;

Wherein, the The service capacity of the network is the upper limit of the number of tasks which can be actually processed by the edge network in the time slot t; The task arrival amount before the network; The total length of the task queue at the next moment of the network.

In the embodiment of the application, the length of a queue of the node in the future is calculated based on the predicted result of the resource demand of the node in the future, which is output by the ConvLSTM model. Further, based on the future queue length of each node, calculating the total length of the task queue of the wireless edge network。

Based on the dynamic model, by a resource weight vectorRegulation of, wherein,To calculate the loadIs used for the optimization of the weights of the (c),To store fragmentation rateIs used for adjusting the weight of the steel plate,For spectral efficiencyBalance weights of (a). The method has the function of abstracting the stability of the queue into an optimizable object, sharing no parameters with a resource dynamic model and a reinforcement learning strategy model, and independently driving resource weight adjustment.

In this embodiment, the constraint conditions (i.e., the purpose of optimization) of the lyapunov drift optimization are:

(1) Total power consumption of network Not exceeding the total power budget;

(2) And the total length backlog of the network task queue is minimized, so that the stability of the network queue is ensured.

Construction of Lyapunov function based on Lyapunov dynamic model and constraint conditionsAnd a corresponding drift term:

;

Wherein, the The task queue length at time t for the kth edge server.

The first objective function of Lyapunov drift optimization comprises the following parts:

(1) Lyapunov drift term, which is to measure the change trend of network state, such as the change of queue length;

(2) Penalty terms-terms related to network constraints, such as power consumption versus budget;

(3) Dynamically adjusting the resource weight to balance the distribution of different resource dimensions.

Specifically, the expression of the first objective function is as follows:

;

Wherein V is a stability control parameter; The total power consumption of the network at the time t; budgeting the total power of the network; is a Lyapunov drift term, and has the expression:

;

E is a mathematical expectation operator, and represents a statistical average value of the change of the queue length under the random task arrival condition; Is a lyapunov function.

In this embodiment, the allocation weights (such as the computing power resource, the storage resource, the bandwidth resource, and the power resource) of the resources in each dimension are adjusted to minimize the first objective function, that is, to minimize the weighted sum of the lyapunov shift term and the network power consumption exceeding the total power budget. Further, such that when network power budget constraints are metOn the premise of minimizing the growth of the backlog of the task queue in the network, and guaranteeing the stability of the length of the task queue. The stability control parameter V is used to trade-off the relationship between queue stability and power consumption, and different optimization objective emphasis can be achieved by adjusting the value of V. For example, the value of V may be increased where control over power consumption is more stringent to increase the weight of the power cost term in the optimization objective.

The embodiment of the application realizes the stability constraint on the total power consumption of the network and the length of the task queue through Lyapunov optimization, ensures the stability of the network queue, avoids communication interruption or data loss caused by queue congestion, improves the stability and reliability of long-term service provided by the network, and improves the overall performance and resource utilization rate of the network. Taking an automatic driving scene as an example, the scheme can realize extremely low end-to-end time delay and high task reliability, and meet the severe requirements of cooperative decision and emergency braking of vehicles.

As an embodiment of the present application, the method further comprises:

In this embodiment, after each node executes a corresponding control instruction based on a local resource allocation policy, the behavior and performance of each node are continuously monitored. Specifically, task execution data of the node is obtained from the network mirror model according to a third time interval, wherein the task execution data comprises time delay index data, energy consumption index data and task progress index data. The time delay index data comprises task processing time delay, data transmission time delay and the like, the energy consumption index data comprises information such as power consumption, battery power change and the like of equipment in a network, the task progress index data comprises task execution states (comprising completion progress and stages), dependency relations (comprising subtask sequences), resource matching information (namely whether current resources are sufficient), environment dynamics (comprising node movement information and channel quality) and the like. And aggregating the task execution data of each node, and performing arrangement and analysis to generate a resource status report of the network. The report content can be presented in the form of an intuitive chart, curve or data table, so that the time delay and the energy consumption condition of the multidimensional resource in actual operation are clearly displayed, a user can know the resource allocation effect and the network performance conveniently, a basis is provided for subsequent strategy adjustment and further optimization, and the user experience is improved. And the newly generated task execution data in each node is used for performing incremental training on the first strategy model of the node local so as to dynamically adapt to the node change and keep the overall performance of the network optimal.

Fig. 3 is a flow chart of determining a multi-dimensional resource allocation policy for a wireless edge network in an embodiment of the application. As shown in fig. 3, the resource status information of each node in the network, including status data of computing resources, storage resources, spectrum resources and power resources, is collected in real time by sensors and monitoring devices deployed in the wireless edge network. And cleaning and standardizing the collected multi-dimensional data, and then constructing a resource dynamic model for dynamically mapping the multi-dimensional resource indexes based on a dynamic modeling algorithm so as to quantify the real-time state of the network. And then, defining state space parameters (including calculation load, storage fragmentation rate, spectrum efficiency, reliability indexes and the like) and action space parameters (including calculation power distribution adjustment amount, storage distribution adjustment amount, broadband distribution adjustment amount, power adjustment amount and the like) by using deep reinforcement learning through a layered intelligent decision engine, and constructing a multi-objective reward function by combining the state space parameters and the action space parameters so as to jointly optimize time delay, energy consumption and resource utilization rate. Meanwhile, a user utility function is introduced to balance the competitive behaviors among multiple users in a single node, and an initial resource allocation strategy which takes the resource utilization efficiency and the user fairness into consideration is generated by adopting an alternate iterative mode to jointly optimize the multi-objective rewarding function and the user utility function. On the basis, a network mirror image model is built by utilizing a digital twin technology to map network dynamics in real time, and network performance under different resource allocation strategies is simulated through the network mirror image model to replace a real network trial and error so as to verify the effectiveness of resource decision. Along with the change of an actual network, such as the addition of new equipment, the dynamic change of tasks, the consumption and supplement of resources and the like, the network mirror model is dynamically updated in time through incremental learning and self-adaptive adjustment algorithms. The collaborative scheduling of calculation, storage, frequency spectrum and power resources is realized through real-time perception modeling, layered intelligent decision and digital twin dynamic optimization mechanisms.

On the basis of an initial resource allocation strategy, a local first strategy model is built based on the ConvLSTM model, historical resource demand data obtained from a network mirror model is predicted through the ConvLSTM model, and the local first strategy model is trained by utilizing the historical resource demand data and task execution data of the nodes to obtain local model parameters. The local model parameters of all nodes are aggregated through federal learning, weight adjustment is carried out according to the length of the emergency task queue, global model parameters are determined, and the local first strategy model of each node is updated based on the global model parameters. Further, the local resource allocation strategy of each node is optimized and updated through Lyapunov drift, and the allocation weight of each dimension resource in the network is dynamically adjusted so as to restrict the total power consumption of the network and the stability of the length of the task queue, thereby improving the stability and reliability of the network for providing long-term service and improving the overall performance and the resource utilization rate of the network.

And generating control instructions based on the resource allocation strategy of each node, wherein the control instructions comprise a calculation task unloading instruction, a spectrum bandwidth allocation instruction and a power control instruction, and are executed by a controller of each node. After the node executes the control instruction, task execution data (such as time delay data, energy consumption indexes and the like) newly generated by the node are continuously monitored and fed back to generate a resource state report, so that a closed-loop resource management link of sensing, modeling, decision making, execution and feedback is formed, and the precision of resource management allocation and the robustness of a network are continuously improved. In addition, task execution data newly generated by each node is used as newly added training data, and incremental training is carried out on the first strategy model of the node, so that the accuracy of resource management allocation is further improved.

Based on the same inventive concept, an embodiment of the present application provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the wireless edge network based multi-dimensional resource management joint optimization method according to any of the above embodiments of the present application.

Based on the same inventive concept, an embodiment of the present application provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the wireless edge network based multidimensional resource management joint optimization method according to any of the above embodiments of the present application.

Based on the same inventive concept, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program is executed by the processor to implement the steps in the wireless edge network based multidimensional resource management joint optimization method according to any of the above embodiments of the present application.

The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the application.

For the purposes of simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will recognize that the present application is not limited by the order of acts described, as some acts may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art will recognize that the embodiments described in the specification are all of the preferred embodiments, and that the acts and components referred to are not necessarily required by the present application.

It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, this application is to be construed as including the preferred embodiments and all such variations and modifications as fall within the scope of the embodiments of the application.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.

The above description is provided for describing the multi-dimensional resource management combined optimization method based on the wireless edge network in detail, and specific examples are applied to describe the principles and implementation manners of the present application, and the description of the above examples is only used for helping to understand the method and core ideas of the present application, and meanwhile, for those skilled in the art, according to the ideas of the present application, there are changes in the specific implementation manners and application ranges, so the disclosure should not be interpreted as limiting the present application.

Claims

1. A multi-dimensional resource management joint optimization method based on a wireless edge network, characterized by being applied to a processor deployed in a terminal, a server, or a base station; the method comprising:

Obtain multi-dimensional resource status information of each node in the wireless edge network in real time;

Calculate the corresponding resource indicators based on the status information of each resource, including computing load, storage fragmentation rate, spectrum efficiency and reliability indicators;

Based on each resource indicator, an initial resource allocation strategy for each node is determined through reinforcement learning, so that each of the nodes performs the following operations: constructing a local first strategy model based on a spatiotemporal convolutional long short-term memory network, training the first strategy model based on the initial resource allocation strategy, historical demand data of the node, and task execution data; and generating a local resource allocation strategy for the node through the first strategy model;

Based on the local resource allocation strategy of each node and the length of the urgent task queue, the local resource allocation strategy of each node is updated through federated learning;

Based on the updated local resource allocation strategy of each node, multi-dimensional resource allocation is performed for each node.

2. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 1, wherein the corresponding resource indicators are calculated based on the status information of each resource, including:

A multi-dimensional resource dynamic model is constructed based on each resource status information; the following steps are performed for each dimension of the resource dynamic model:

Calculate the computing load of each node based on the task computing amount and computing power ratio of the node;

Calculating the storage fragmentation rate of each node based on the allocated capacity, used capacity, and total capacity of the storage unit of each node;

Calculating the spectral efficiency of each node based on the transmit power, channel gain, and noise power of each node for the target user;

The reliability index of each node is calculated based on the service rate, task arrival rate, maximum tolerable delay and actual delay of each node performing different tasks.

3. The multi-dimensional resource management joint optimization method based on a wireless edge network according to claim 1, wherein the initial resource allocation strategy of each node is determined by reinforcement learning based on each resource indicator, comprising:

Based on various resource indicators, the state space is defined through deep reinforcement learning algorithms to generate state space parameters;

Defining an action space based on each state space parameter and generating corresponding resource adjustments as action space parameters; the resource adjustments include: computing power allocation adjustment, storage allocation adjustment, bandwidth allocation adjustment, and power adjustment;

Constructing a policy network based on the state space parameters and the corresponding action space parameters;

Constructing a multi-objective reward function based on the state space parameters, the action space parameters, and the overall performance indicators of the wireless edge network; the overall performance indicators of the wireless edge network include: total network latency, maximum allowable network latency, upper limit of network computing power, data volume of tasks in the network, and storage access rate;

Based on the multi-objective reward function, the parameters of the policy network are updated; and based on the updated policy network, an initial resource allocation strategy for each node is generated.

4. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 3, wherein updating the parameters of the policy network further comprises:

Based on the bandwidth allocation adjustment amount and power adjustment amount of the action space parameters, the competition relationship and uncertainty between users are introduced through Bayesian game theory to simulate resource competition behavior in a multi-user environment and construct a user utility function;

The multi-objective reward function and the user utility function are optimized in an alternating iterative manner to update the parameters of the policy network.

5. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 4, wherein constructing the user utility function comprises:

Constructing a transmission rate term for a node i based on its bandwidth allocation adjustment, power adjustment, spectrum efficiency, and power compensation reference value. The transmission rate term reflects the balance between the user's transmission rate and power consumption.

Constructing a resource contention item for node i based on a difference between a ratio of a power adjustment amount to a bandwidth allocation adjustment amount between node i and another node j, and a resource contention penalty coefficient;

Based on the transmission rate term and the resource contention term of the node i, a user utility function of the node i is constructed.

6. The multi-dimensional resource management joint optimization method based on a wireless edge network according to claim 1, characterized in that the local resource allocation policy of each node is updated through federated learning based on the local resource allocation policy of each node and the length of the emergency task queue, comprising:

Determine the weight of each node based on the current length of the urgent task queue of each node;

Calculating global model parameters based on the model parameters of the first strategy model of each node and the weight corresponding to the node;

Sending the global model parameters to each node, so that each node updates the corresponding first strategy model based on the global model parameters;

Based on the first policy model after parameter update, a local resource allocation policy of the node is generated.

7. The multi-dimensional resource management joint optimization method based on a wireless edge network according to claim 1 or 6, characterized in that after adjusting the local resource allocation strategy of each node through federated learning, it also includes:

Based on the total length of the task queue and the real-time total power consumption of the wireless edge network, construct a Lyapunov function and calculate the corresponding drift term;

The allocation weights of resources of each dimension in the local resource allocation strategy of each node are adjusted to minimize the first objective function, and the local resource allocation strategy of each node is updated.

8. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 1, wherein the resource status information further includes: resource demand information of each node; the method further includes:

Acquire network status information of the wireless edge network according to a first time interval, including: a node topology structure and device parameters of each node;

Based on the local resource allocation strategy of each node, the current resource status information of the node and the network status information, a network mirror model corresponding to the wireless edge network is constructed through simulation to map the behavior and performance of each node in the wireless edge network;

After the local resource strategy of each node is updated, the behavior and performance of each node in the wireless edge network are synchronously simulated through incremental learning and adaptive adjustment algorithms.

9. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 8, characterized in that the method further comprises:

In a high-reliability and low-latency scenario, the network mirror model is used to obtain the behavior and performance of each node at a second time interval, and the current latency and reliability indicators are calculated;

Compare the current latency and reliability indicators with the latency threshold and reliability threshold corresponding to the high reliability and low latency scenario;

When the delay is higher than the delay threshold, or the reliability index is lower than the reliability threshold, the local resource allocation strategy of each node is updated through federated learning.

10. The multi-dimensional resource management joint optimization method based on wireless edge network according to claim 8, characterized in that it also includes:

Acquiring task execution data of each node from the network mirror model at a third time interval; the task execution data including: delay index data, energy consumption index data and task progress index data;

generating a resource status report of the wireless edge network based on the task execution data of each node;

Based on the task execution data of each node, the first policy model of each node is updated.