CN114024639B

CN114024639B - Distributed channel allocation method in wireless multi-hop network

Info

Publication number: CN114024639B
Application number: CN202111318928.0A
Authority: CN
Inventors: 雷建军; 尚凤军; 王颖; 刘捷; 周盈
Original assignee: Chengdu Skysoft Info & Tech Co ltd
Current assignee: Chengdu Skysoft Info & Tech Co ltd; Shenzhen Hongyue Information Technology Co ltd
Priority date: 2021-11-09
Filing date: 2021-11-09
Publication date: 2024-01-05
Anticipated expiration: 2041-11-09
Also published as: CN114024639A

Abstract

The invention relates to the field of wireless network communications, and specifically to a distributed channel allocation method in a wireless multi-hop network. The method adopts a physical architecture comprising at least a physical device layer, a computing layer and a network service layer. In the physical device layer, n wireless nodes randomly deployed in the network form a multi-hop wireless communication network; each node acts as an autonomous agent and interacts with the uncertain network environment through a local decision module. The aggregation node in the computing layer aggregates, analyzes and processes the data collected by the other stations in the network. This node has an edge computing function and can train an asynchronous deep reinforcement learning (DRL) model on the experience collected in a distributed manner by the nodes. The multi-channel allocation problem is modeled as a partially observable Markov decision process (POMDP), and the trained asynchronous DRL model performs the channel allocation. The invention addresses the hidden terminal and exposed terminal problems in high-density multi-hop wireless networks and effectively avoids data collisions and wasted channel resources.

Description

A distributed channel allocation method in wireless multi-hop networks

Technical field

The invention relates to the field of wireless network communications, and in particular to a distributed channel allocation method in a wireless multi-hop network.

Background

Multi-channel media access control (MMAC) allows communication links that would interfere with each other on a single channel to transmit data without interference on multiple orthogonal channels. MMAC effectively avoids single-channel interference and improves the throughput of the whole network, so it is regarded as a highly promising technique for relieving the shortage of channel resources in wireless networks. Although multi-channel communication has many advantages over single-channel communication, it introduces several new problems:

Channel allocation and negotiation: the most basic and important problem in multi-channel MAC communication is how to allocate channel resources so that the capacity of the whole network is maximized while every node can still communicate normally. In addition, before communicating, nodes must negotiate channel usage so that the two communicating nodes operate on the same channel during data transmission.

Multi-channel broadcast: a wireless network based on a single-channel model can implement broadcast easily, because every sensor node is on the same channel. In a multi-channel environment, however, the nodes are spread over several channels, so when a node broadcasts, some nodes cannot receive the broadcast. Broadcast plays an important role in network applications, so implementing it is another difficulty for multi-channel communication.

Multi-hop hidden terminals and exposed terminals: as shown in Figure 1, a multi-hop hidden terminal is a node that is within the communication range of the receiving node but outside the communication range of the sending node. Because such a node cannot hear the sender's transmission, it may send data to the same receiving node and cause a collision. In high-density deployments the hidden terminal problem causes unnecessary collisions and greatly reduces network performance. A multi-hop exposed terminal is a node that is within the coverage of the sending node but outside the coverage of the receiving node; the exposed terminal defers its own transmission because it hears the sender. The presence of exposed terminals leads to unnecessary waste of channel resources.
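The two definitions above can be made concrete with a small geometric sketch. It assumes a unit-disc radio model (every node has the same circular communication range), which the text does not state; the function name and the coordinates are hypothetical.

```python
import math


def classify_bystander(sender, receiver, node, comm_range):
    """Classify `node` relative to an ongoing sender -> receiver transmission.

    Assumption: unit-disc model, i.e. two nodes hear each other exactly
    when their Euclidean distance is at most `comm_range`.
    """
    hears_sender = math.dist(node, sender) <= comm_range
    hears_receiver = math.dist(node, receiver) <= comm_range
    if hears_receiver and not hears_sender:
        # In range of the receiver only: may collide at the receiver.
        return "hidden"
    if hears_sender and not hears_receiver:
        # In range of the sender only: defers although it would not collide.
        return "exposed"
    return "neither"
```

For example, with a sender at (0, 0), a receiver at (1, 0) and a range of 1.2, a node at (2, 0) is classified as hidden and a node at (-1, 0) as exposed.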

Summary of the invention

To effectively reduce interference and data collisions in the network, improve channel utilization and system throughput, and ensure reliable data transmission between nodes, the invention proposes a distributed channel allocation method in a wireless multi-hop network. The method adopts a physical architecture comprising at least a physical device layer, a computing layer and a network service layer. In the physical device layer, n wireless nodes randomly deployed in the network form a multi-hop wireless communication network; each node acts as an autonomous agent and interacts with the uncertain network environment through a local decision module. The aggregation node of the computing layer aggregates, analyzes and processes the data collected by the other stations in the network. This node either has an edge computing function itself or uses a dedicated edge server node, so that the nodes' computing tasks can be offloaded to it and an asynchronous DRL model can be trained on the experience collected in a distributed manner by the nodes. The multi-channel allocation problem is modeled as a POMDP, and the asynchronous DRL model trained by the centralized node or edge server is used to perform distributed channel allocation.

Further, the multi-channel allocation problem is modeled as a POMDP: the agent observes the current network state s and performs action a in time period t; after performing a it moves, with state transition probability P, to the network state s′ of the next time period and obtains the corresponding reward R from the environment. The POMDP is expressed as:

M = <S, A, P, R, γ>;

where M denotes the POMDP model; S is the set of states, i.e. the state space; A is the set of actions, i.e. the action space, in which an action a∈A is the number of the channel the node intends to switch to; R is the reward function; and γ is the discount factor. That is, given an environment state s∈S, when the agent performs action a∈A the environment state moves from s to s′ (s→s′) and the agent obtains the corresponding reward R from the environment.

Further, the environment state observed by node i in the t-th time period is expressed as:

where the state characterizes how the neighbor nodes of node i occupy each wireless channel, i.e. the potential interference level on each channel; K is the number of available channels and N is the number of nodes; the per-channel indicator describes the occupancy of channel j by the neighbor nodes of node i in the t-th time period, taking one value when some neighbor node of node i uses channel j and the other value when no neighbor node of node i uses channel j; n_{i,o} is the total number of neighbor nodes of node i.
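As an illustration of the per-channel occupancy part of this observation only, the sketch below builds one indicator per channel from the channels a node's neighbors currently use. The 1/0 encoding, the function name and the input format are assumptions; the remaining components of the state vector are not modeled here.

```python
def occupancy_indicators(neighbor_channels, num_channels):
    """Return one indicator per channel 1..K for a single node.

    neighbor_channels: channel numbers (1-based) currently used by the
        node's neighbors, one entry per neighbor (hypothetical input format).
    num_channels: K, the number of available channels.

    Assumed encoding: 1 if at least one neighbor uses channel j, else 0.
    """
    used = set(neighbor_channels)
    return [1 if j in used else 0 for j in range(1, num_channels + 1)]
```

With K = 4 and neighbors on channels 2, 2 and 4, the result marks channels 2 and 4 as occupied and channels 1 and 3 as free.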

Further, the reward R that a node obtains from the environment after performing action a and moving from state s to the next state s′ can be expressed as:

where R(s,a) is the reward obtained by node i after it switches to channel k in the t-th data period, i.e. R = R(s,a); an indicator term records whether any neighbor node of node i uses channel k in the current period, taking one value if no neighbor node of node i uses channel k and the other value otherwise; and the remaining term is the successful transmission probability associated with node i in the t-th time period.

Further, the asynchronous DRL model deployed in the computing layer comprises a current network, a target network, an error calculation module and an experience pool, together with a decision module deployed locally on each wireless node. The local decision module has the same network structure as the current network, and its parameters are periodically obtained from the edge node. Specifically:

The target network keeps its parameters fixed and produces the target value; the current network evaluates the policy, has its parameters updated, and approximates the value function;

The parameters θ of the current network are updated in every time period; the parameters θ^- of the target network are updated once every fixed number of time periods and remain unchanged in between;

Each experience in the experience pool, e = <s, a, r, s′> with s, s′∈S and a∈A, is collected asynchronously from the wireless multi-hop network environment by the nodes in the network;

The error calculation module updates the parameters of the current network using the TD error computed from the target network and the current network; in addition, the parameters of the current network are copied to the target network at fixed intervals.
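The parameter handling described above (θ updated every period, θ^- refreshed every fixed number of periods, nodes pulling the current parameters) can be sketched as follows. This is a minimal illustration under stated assumptions: parameters are a plain list of floats, and the class and method names are hypothetical rather than taken from the patent.

```python
class ParameterSync:
    """Tracks current-network (theta) and target-network (theta^-) parameters."""

    def __init__(self, params, sync_every):
        self.current = list(params)   # theta: updated every time period
        self.target = list(params)    # theta^-: held fixed between syncs
        self.sync_every = sync_every  # hypothetical sync interval, in periods
        self.steps = 0

    def apply_update(self, new_params):
        """Install freshly trained current-network parameters."""
        self.current = list(new_params)
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # Copy theta into theta^- once every `sync_every` updates.
            self.target = list(self.current)

    def snapshot_for_node(self):
        """Parameters a node's local decision module would download."""
        return list(self.current)
```

With `sync_every=3`, the target parameters stay at their initial values for the first two updates and are overwritten on the third.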

Further, the target value is computed as follows:

where R(s_t, a_t) is the reward obtained in the t-th time period by node i∈[1,N] (N being the number of nodes) after it performs action a_t∈A in state s_t∈S in that period; Q(s_{t+1}, a_{t+1}; θ^-), with s_{t+1}∈S and a_{t+1}∈A, is the output of the target network (parameters θ^-) for node i performing action a_{t+1} in state s_{t+1} in the (t+1)-th time period; s_{t+1} is the state of node i in the (t+1)-th time period; a_{t+1} is the action performed by node i in the (t+1)-th time period; and max_{a_{t+1}∈A} Q(s_{t+1}, a_{t+1}; θ^-) means that node i, using the target network (parameters θ^-), selects in state s_{t+1} the action a_{t+1} that maximizes the corresponding Q value.
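Read together with the discount factor γ of the POMDP tuple, the terms defined above match the standard DQN target. A reconstruction under that assumption is given below; the symbol y_t for the target value is introduced here for illustration and is not taken from the patent text.

```latex
y_t = R(s_t, a_t) + \gamma \max_{a_{t+1} \in A} Q\left(s_{t+1}, a_{t+1}; \theta^{-}\right)
```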

Further, the error calculation module computes the error between the current network output Q(s_t, a_t; θ) and the target value:

Gradient descent is then used to update the neural network parameters:

where L(θ) is the TD error function of the model; the expectation is taken over the selected mini-batch of experience data; θ denotes the parameters of the current network, which are updated in real time; α is the learning rate; the gradient term is the gradient of L(θ) with respect to θ; and Q(s_t, a_t; θ) is the output of the network with parameters θ for node i performing action a_t in state s_t in the t-th time period.
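The quantities listed above (TD error function, mini-batch expectation, learning rate, gradient) are those of the usual DQN loss and update. A hedged reconstruction, assuming the squared-error form and writing y_t for the target value (a symbol introduced here, not taken from the patent text):

```latex
L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^{2}\right],
\qquad
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} L(\theta)
```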

Further, the whole system time is divided into consecutive superframes, one superframe being one time period. Each superframe comprises a beacon frame, a control period and a data transmission period. The control period uses one fixed control channel to carry the relevant control information and channel allocation decisions; the data transmission period uses K non-overlapping channels to support interference-free parallel data transmission. During the control period, all nodes in the network switch to the control channel to listen for and send the relevant control information; during the data transmission period, a node with data to send switches to the channel of its parent node and transmits using the channel access mechanism.
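The channel-switching rule of the superframe can be summarized in a short sketch. The control channel number, the phase names and the function name are hypothetical; the beacon frame is left out because the text does not say which channel carries it, and the behavior of a node with nothing to send is likewise not specified.

```python
CONTROL_CHANNEL = 0  # hypothetical identifier for the fixed control channel


def channel_for_phase(phase, has_data, parent_channel):
    """Return the channel a node tunes to in a given superframe phase.

    phase: "control" or "data" (hypothetical labels for the two periods).
    has_data: whether the node has data to send in this superframe.
    parent_channel: data channel (1..K) on which the node's parent operates.
    """
    if phase == "control":
        # Every node listens and sends control information on the control channel.
        return CONTROL_CHANNEL
    if phase == "data":
        # A node with pending data moves to its parent's channel; the text
        # does not say what an idle node does, so None is returned here.
        return parent_channel if has_data else None
    raise ValueError(f"unknown phase: {phase!r}")
```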

Further, while performing action a, the node uses an RTS/DCTS-based channel access mechanism, as follows:

Node d is located at hop m and its next-hop node, at hop m+1, is node i, i.e. node d is the parent of node i; node e is located at hop m and its next-hop node, at hop m+1, is node j, i.e. node e is the parent of node j; all four nodes operate on the same channel, and the backoff values of node i and node j are both 0;

When node i sends an RTS frame to node d, node d waits for one CIFS and then returns a CTS frame;

On receiving the RTS frame from node i or the CTS frame from node d, the child nodes of node d set their NAV (network allocation vector) according to the information in the Duration field;

When node e receives the RTS frame from node i, it waits for one SIFS and then returns a CTS frame telling its child nodes to defer their data transmissions while node i is transmitting;

Here RTS means request to send and CTS means clear to send; CIFS is the interframe space after which the destination node returns its CTS; SIFS is the short interframe space that separates the frames belonging to one dialogue; and CIFS is slightly larger than SIFS.

Further, if node j is within the communication range of node i but its parent node is not, then after receiving the RTS frame node j waits for one RIFS and then sends an RTS frame to its parent node e.
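The ordering implied by "CIFS is slightly larger than SIFS" can be illustrated with a small timing sketch: the overhearing parent (node e) answers after one SIFS, and the destination (node d) answers after one CIFS. The function name, the dictionary keys and the time unit are hypothetical, and propagation delay is ignored.

```python
def dcts_timeline(rts_duration, sifs, cifs):
    """Return CTS start times after node i begins an RTS at time 0.

    rts_duration: airtime of the RTS frame.
    sifs: wait used by an overhearing parent (node e) before its CTS.
    cifs: wait used by the destination (node d) before its CTS.
    All values share one arbitrary time unit (assumption).
    """
    if cifs <= sifs:
        raise ValueError("the mechanism requires CIFS to be larger than SIFS")
    rts_end = rts_duration
    return {
        "rts_end": rts_end,
        "overhearing_parent_cts_start": rts_end + sifs,  # node e
        "destination_cts_start": rts_end + cifs,         # node d
    }
```

For instance, `dcts_timeline(100, 10, 15)` places the end of the RTS at 100, node e's CTS at 110 and node d's CTS at 115.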

The invention addresses the hidden terminal and exposed terminal problems in high-density multi-hop wireless networks and effectively avoids data collisions and wasted channel resources, thereby improving overall network performance. In addition, based on each node's channel access performance and channel occupancy during the data transmission period, an asynchronous DRL model is proposed for wireless multi-hop multi-channel networks to dynamically optimize the nodes' channel allocation policy. A new wireless paradigm based on mobile edge computing (MEC) relieves the computing and storage pressure on terminal nodes, and a framework of distributed interaction (micro-learning) and centralized training (macro-learning) is designed to train the asynchronous DRL model. As a result, the proposed asynchronous DRL model can be implemented even on resource-limited terminals. Furthermore, the invention accounts for non-stationarity in multi-agent scenarios (MAS): by using only local information from neighbors, it avoids drastic dynamic changes in the network and further accelerates convergence.

Description of drawings

Figure 1 is an example of hidden and exposed terminals in a multi-channel network, as found in the prior art;

Figure 2 is the edge-computing-enabled system architecture provided by an embodiment of the invention;

Figure 3 is the superframe structure used by the invention;

Figure 4 is the asynchronous DRL model of the invention, based on a distributed decision architecture;

Figure 5 is the centralized training procedure of the asynchronous DRL model provided by an embodiment of the invention;

Figure 6 is the first RTS/DCTS working-principle diagram provided by an embodiment of the invention;

Figure 7 is the second RTS/DCTS working-principle diagram provided by an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

The invention proposes a distributed channel allocation method in a wireless multi-hop network. The method adopts a physical architecture comprising at least a physical device layer, a computing layer and a network service layer. In the physical device layer, n wireless nodes randomly deployed in the network form a multi-hop wireless communication network; each node acts as an autonomous agent and interacts with the uncertain network environment through a local decision module. The aggregation node of the computing layer aggregates, analyzes and processes the data collected by the other stations in the network. This node has an edge computing function, so the nodes' computing tasks can be offloaded to it and an asynchronous DRL model can be trained on the experience collected in a distributed manner by the nodes. The multi-channel allocation problem is modeled as a POMDP, and the trained asynchronous DRL model is used to perform channel allocation.

Embodiment 1

This embodiment presents the system architecture shown in Figure 2, which comprises a physical device layer, a computing layer and a network service layer. In the physical device layer, n wireless nodes randomly deployed in the network form a multi-hop wireless communication network; each node acts as an autonomous agent and interacts with the uncertain network environment through a local decision module. The aggregation node of the computing layer aggregates, analyzes and processes the data collected by the other stations in the network; it has an edge computing function to which the nodes' computing tasks can be offloaded, and it can train the asynchronous DRL model on the experience collected in a distributed manner by the nodes.

For data transmission, this embodiment uses the superframe structure shown in Figure 3. System time is divided into consecutive superframes, each comprising a beacon frame, a control period and a data transmission period. The control period uses one fixed control channel to carry the relevant control information and channel allocation decisions; the data transmission period uses K non-overlapping channels to support interference-free parallel data transmission. Accordingly, during the control period all nodes in the network switch to the control channel to listen for and send the relevant control information (routing, time synchronization, channel switching, and so on); during the data transmission period, a node with data to send switches to the channel of its parent node and transmits using the channel access mechanism.

The asynchronous DRL model used in this embodiment is shown in Figure 4; DRL is used to solve the dynamic multi-channel allocation problem in multi-hop wireless networks. The embodiment combines the function approximation capability of DQN with the asynchronous experience sampling architecture of A3C to obtain an asynchronous DRL model that allocates channels to nodes so as to maximize the reliability of data transmission. The DRL model deployed on the edge server adopts the DQN architecture and uses a DNN to extract features from raw data in order to approximate the action-value function. Combining it with the asynchronous training framework of A3C addresses the unsuitability of DQN for high-dimensional action spaces and MAS, breaks the correlation between experiences, significantly improves the convergence speed of the network, and solves the problem that the A3C algorithm cannot be implemented on resource-limited wireless nodes.

This embodiment takes into account that in some scenarios the computing power, energy and memory of wireless nodes are limited. This causes computing bottlenecks and poor performance, and limits both support for advanced applications and the ability to run computation-intensive tasks such as training the DRL model. The embodiment therefore adopts an edge-computing-enabled wireless network architecture that moves the task of training the asynchronous DRL model from the nodes to a resource-rich edge node (the aggregation node). As shown in Figure 2, the asynchronous DRL model deployed in the computing layer consists of the current network (main), the target network (target) and the experience pool (experience replay). The edge-computing-enabled aggregation node thus performs the model training and update tasks.

When the asynchronous DRL model is used for channel allocation, the invention combines the function approximation capability of DQN with the asynchronous interaction architecture of A3C. In the asynchronous DRL model of Figure 4, the distributed interaction module (micro-learning) lets terminal nodes select channel resources asynchronously using locally observed information. The centralized training module (macro-learning) trains the asynchronous DRL model by adjusting its operating parameters, steering the system toward an application-specific global optimization goal (for example, maximizing the reliability of data transmission). Each terminal node maintains a DRL prediction model with which it allocates channels independently. Specifically, the embodiment models the multi-channel allocation problem as a POMDP described by a five-tuple M = <S, A, P, R, γ>: state s, action a, state transition probability P, reward function R and discount factor γ. The agent observes the current network state s and performs action a in the control period of each time step t. It then moves to the next state with the state transition probability and obtains the reward R_{t+1} from the environment.

State space: S = {S_1, S_2, ..., S_{2K+N}}, where K is the number of available channels and N is the number of nodes. For a specific node i, the state vector in the t-th period is:

where S_{i,t,j} indicates the occupancy of channel j by the neighbor nodes of node i: it takes its nonzero value when some neighbor node of node i occupies channel j; otherwise S_{i,t,j} = 0. The remaining term is the total number of neighbor nodes of node i.

Action space: A = {a_1, a_2, ..., a_K}, a_k∈A, where a_k is the number of the channel that node i intends to switch to in the next data transmission period: a_k = ch_{i,t,k}, ch_{i,t,k} = k∈[1,K].

Reward function: R. When, in the t-th data period, node i in its locally observed state performs the action of switching to channel ch_{i,t,k}, the environment returns an immediate reward R = R(s,a) to the node after that data transmission period ends. The value is obtained from the following function:

where, for the current data period, the indicator term takes one value when no neighbor node of node i uses channel ch_{i,t,k} and the other value otherwise; a further term is the number of neighbor nodes of node i that use channel ch_{i,t,k} = k; and the last term is the probability that the node transmits successfully on ch_{i,t,k}.

The edge-computing-enabled aggregation node trains the DRL model centrally on the experience collected asynchronously and in a distributed manner by every node in the network, and sends the updated model parameters to the nodes; each node can obtain the latest network parameters from its parent node.

The centralized training procedure of the DRL model is shown in Figure 5. The asynchronous DRL model contains two networks with identical structure but different parameters: the current-value network, which predicts the Q estimate, uses the latest parameters, while the target-value network, which predicts the Q target, uses older parameters. In this embodiment the state of a node is the input of the neural network and the different actions a node can perform are treated as classes; the network predicts, for each action, the probability that the node performs it, and this probability is the output of the network, i.e. the Q value. For example, Q(s, a; θ) denotes the probability that the node performs action a when the network has parameters θ and the input is node state s.
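Given one network output per action as described above, a node's local decision module could simply pick the channel with the largest output. The sketch below assumes purely greedy selection, which the text does not specify (no exploration strategy is stated); the function name is hypothetical.

```python
def select_channel(q_values):
    """Return the 1-based channel number with the largest network output.

    q_values: outputs Q(s, a_k; theta) for k = 1..K in channel order
        (hypothetical layout). Ties resolve to the lowest channel number.
    """
    if not q_values:
        raise ValueError("q_values must contain one entry per channel")
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return best_index + 1  # channels are numbered from 1 to K
```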

During training, a random mini-batch of experiences is drawn from the experience pool to break the correlation between experiences. Moreover, because the experiences in the pool are sampled and supplied asynchronously by the agents, the correlation between experiences is reduced further and the pool holds more diverse experience.
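A minimal sketch of such an experience pool is shown below. The bounded capacity with oldest-first eviction, the optional seed and all names are assumptions made for illustration; the text only requires that experiences <s, a, r, s′> be stored and sampled at random in mini-batches.

```python
import random
from collections import deque


class ExperiencePool:
    """Stores experiences <s, a, r, s'> reported by nodes; samples mini-batches."""

    def __init__(self, capacity, seed=None):
        # Assumption: bounded pool that silently evicts the oldest entries.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, s, a, r, s_next):
        """Append one experience; any node may call this at any time."""
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```

Sampling uniformly from the whole pool, rather than taking the most recent entries, is what breaks the temporal correlation between consecutive experiences.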

As shown in Figure 5, the pair <s, a> is fed to the current-value network to obtain Q(s, a; θ), which evaluates the state-action value function for the current state; s′∈S is fed to the target-value network to obtain the corresponding max Q(s′, a′; θ^-); the target value is then computed as:

Based on the target value, the DQN error function module then computes the error:

The current-value network updates its parameters using the gradient of the error function:

where s∈S and a∈A. After every fixed number of iterations, the parameters of the current-value network are copied to the target-value network:

θ^- ← θ

This process is repeated until the network reaches a stable state.
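One iteration of this procedure can be sketched with lookup tables standing in for the two neural networks. This is an illustration under stated assumptions, not the patent's implementation: the tables, the function name and the half-squared-error loss (which gives the simple update Q ← Q + α·(target − Q)) are all choices made here.

```python
def td_update(q_current, q_target, experience, gamma, alpha):
    """Apply one TD update to the current-value table and return the TD error.

    q_current, q_target: dict mapping state -> list of per-action values,
        standing in for the current-value and target-value networks.
    experience: tuple (s, a, r, s_next) with a 0-based action index.
    gamma: discount factor; alpha: learning rate.
    """
    s, a, r, s_next = experience
    # Target value: reward plus discounted best value from the target table.
    target = r + gamma * max(q_target[s_next])
    td_error = target - q_current[s][a]
    # Gradient step on 0.5 * td_error**2 with respect to q_current[s][a].
    q_current[s][a] += alpha * td_error
    return td_error
```

Copying `q_current` into `q_target` every fixed number of calls would complete the loop described above.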

Although the asynchronous-DRL-based channel allocation model improves network performance by enabling multiple parallel data transmissions, the hidden terminal and exposed terminal problems on a given channel become more severe in high-density wireless multi-hop networks. Figure 1 illustrates both problems. While node D is transmitting data to node C, node B lies outside the communication range of node D and therefore wrongly believes the channel is idle; if node B then sends data to nodes C and A, a collision occurs at node C, causing unnecessary retransmissions and further aggravating network congestion. Conversely, when node B1 transmits data to node A1, node B2 is within the communication range of node B1 while nodes B2 and A2 are outside the communication ranges of nodes A1 and B1 respectively; node B2 wrongly believes the channel is busy and defers its transmission, which wastes channel resources unnecessarily. The embodiment therefore proposes an RTS/DCTS-based mechanism to solve the hidden terminal and exposed terminal problems in wireless multi-hop networks. The mechanism is described further below by example.

Figure 6 is a schematic diagram of solving the hidden terminal problem in a wireless multi-hop network based on RTS/DCTS according to a preferred embodiment of the present invention. Nodes i and j and nodes d and e are located at hops m and m+1 respectively (that is, at different, adjacent hop counts) and operate on the same channel. Node d is the parent node of node i, and node e is the parent node of node j. Node e is also a neighbor node of node i. Assume that the backoff values of nodes i and j are both 0 at this time.

When node e receives the RTS frame from node i, it waits for one SIFS and returns a CTS frame that tells its child nodes to delay their data transmission while node i is transmitting, thereby avoiding the hidden terminal problem.

In the channel access mechanism of the multi-hop environment, the hidden terminal problem is unavoidable, so the probability of successful transmission of node i on a specific channel k can be calculated by the following formula:

where τ is the transmission probability in the channel access slot. Specifically, τ depends on n_s, the total number of child nodes of this node's parent node. n_a denotes the number of neighbor nodes of node i, and n_f denotes the number of neighbor nodes of node i's parent node (excluding the child nodes of that parent node).

For the exposed terminal problem, refer to Figure 7, which is a schematic diagram of an example of solving the exposed terminal problem in a wireless multi-hop network based on RTS/DCTS according to a preferred embodiment of the present invention. Nodes i and j and nodes d and e are located at hops m and m+1 respectively (that is, at different, adjacent hop counts) and operate on the same channel. Node d is the parent node of node i, and node e is the parent node of node j. Node j is also a neighbor node of node i. Assume that the backoff values of nodes i and j are both 0 at this time.

When node i sends an RTS frame to node d, node d waits for one CIFS and then returns a CTS frame. Because node j is within the communication range of node i, node j also receives the RTS frame; however, since the destination node of that RTS frame is not node j's destination node, node j does not set its NAV according to the Duration field of the RTS.

After node j receives the RTS frame, it waits for one RIFS and then checks whether it has received a CTS frame. Because its parent node e is not within the communication range of node i, node e does not return a CTS after a SIFS. Node j therefore receives no CTS frame within the RIFS and sends an RTS frame to its parent node e.
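The reactions described for Figures 6 and 7 can be summarized in a short decision sketch. The neighbor tables `fig6` and `fig7` are assumptions built from the relationships stated in the text, the returned strings are illustrative labels rather than frame formats, and no concrete values for SIFS, CIFS or RIFS are assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rts:
    sender: str
    dest: str
    duration: int  # reservation length carried in the Duration field

def parent_reaction(parent, rts, neighbors):
    """Reaction of a parent node to an RTS: (wait interval, frame) or None."""
    if rts.sender not in neighbors[parent]:
        return None               # RTS not heard, so no CTS is returned
    if rts.dest == parent:
        return ("CIFS", "CTS")    # addressed parent answers after a CIFS
    return ("SIFS", "CTS")        # overhearing parent warns its children after a SIFS

def child_reaction(child, child_parent, rts, neighbors):
    """Reaction of a child node (backoff 0, not the RTS sender) to that RTS."""
    if rts.dest == child_parent:
        return "set NAV"          # same parent: defer for rts.duration
    if rts.sender in neighbors[child_parent]:
        return "defer"            # own parent returned a CTS after a SIFS
    if rts.sender in neighbors[child]:
        return "send RTS"         # RTS overheard, no CTS within a RIFS
    return "unaffected"

rts = Rts(sender="i", dest="d", duration=5)
# Figure 6 (hidden terminal): node e is a neighbor of node i.
fig6 = {"i": {"d", "e"}, "d": {"i"}, "e": {"i", "j"}, "j": {"e"}}
# Figure 7 (exposed terminal): node j is a neighbor of node i, node e is not.
fig7 = {"i": {"d", "j"}, "d": {"i"}, "j": {"i", "e"}, "e": {"j"}}
```

With these tables, node j defers in the Figure 6 case because node e overhears the RTS and returns a CTS, and node j sends its own RTS in the Figure 7 case because no CTS arrives from node e.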

By performing the above process, the nodes in the network effectively resolve the data collisions and the waste of channel resources caused by hidden terminals and exposed terminals; the probability of successful transmission can therefore be rewritten as:

Based on the RTS/DCTS mechanism, data collisions between data links under adjacent parent nodes on the same channel can be effectively avoided through SIFS and CTS. In addition, by introducing the RIFS interframe space, the channel access mechanism solves the exposed terminal problem in the network and thereby raises the node's probability of successful transmission. The channel access mechanism can therefore improve the probability of successful transmission of the nodes in the network.

In addition, the above formula shows that P_s is directly related to the parameters n_a and n_f, and that the parameters n_s, n_a and n_f can be further optimized through the channel allocation policy. The embodiment of the present invention therefore uses the node's probability of successful transmission on its channel as part of the reward function of the channel allocation model, with the aim of further optimizing network performance.

The channel allocation and channel access mechanisms proposed in the embodiments of the present invention optimize channel resources at different levels: channel allocation optimizes them in the frequency domain, and channel access optimizes them in the time domain. Moreover, a reasonable channel allocation mechanism further alleviates interference during channel access, and a node's channel access performance in turn further optimizes its channel allocation policy.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A distributed channel allocation method in a wireless multi-hop network, characterized in that a physical architecture comprising at least a physical device layer, a computing layer and a network service layer is adopted; the physical device layer comprises n wireless nodes randomly deployed in the network to form a multi-hop wireless communication network; the multi-channel allocation problem is modeled as a POMDP problem, and an asynchronous DRL model is used to realize distributed channel allocation; each node acts as an autonomous agent and interacts with the uncertain network environment through a local decision module; the convergence node of the computing layer is responsible for aggregating, analyzing and processing the data collected by the other nodes in the network; the convergence node has an edge computing function, so that the computing tasks of the nodes can be offloaded to it and the asynchronous DRL model can be trained on the experience information collected by the nodes in a distributed manner; the wireless nodes periodically update the parameters of their local decision modules from the convergence node; the method specifically comprises the following steps:

the POMDP problem consists of a five-tuple M = <S, A, P, R, γ>: state S, action A, state transition probability P, reward function R and discount factor γ;

in the control period of each time step t, the agent observes the current network state s and executes action a; it then transitions to the next state according to the state transition probability and obtains the reward R_{t+1} from the environment;

state space S, where K is the number of available channels and N is the number of nodes; for a specific node i, a state vector is defined for the t-th period;

wherein S_{i,t,j} represents the occupancy of channel j by the neighbor nodes of node i: it marks channel j as occupied when a neighbor node of node i occupies channel j, and S_{i,t,j} = 0 otherwise; the state vector also involves the total number of neighbor nodes of node i;

action space A = {a_1, a_2, ..., a_K}, a_k ∈ A, where a_k denotes the channel number to which node i will switch in the next data transmission period: a_k = ch_{i,t,k}, ch_{i,t,k} = k ∈ [1, K];

reward function R: when node i, in the t-th data period, locally observes its state, executes an action and switches to channel ch_{i,t,k}, the environment returns an immediate reward value r = R(s, a) to the node at the end of the data transmission period, which can be obtained from the following function:

wherein one case of the function holds when, in the current data period, no neighbor node of node i uses channel ch_{i,t,k}, and the other case holds otherwise; the function further depends on the number of neighbor nodes of node i that use channel ch_{i,t,k} = k and on the probability of successful data transmission of node i on channel ch_{i,t,k};

the convergence node, enabled by edge computing, centrally trains the DRL model on the experience information collected by the nodes in the network in a distributed and asynchronous manner, and sends the updated network model parameters to the nodes, each node being able to obtain the latest network parameters from its parent node;

the state of a node is taken as the input of the neural network, the different actions the node can execute are taken as the categories, and the probability of each action executed by the node is predicted by the neural network and taken as its output; that is, the <s, a> information is used as the input of the current value network to obtain Q(s, a; θ) for evaluating the current state-action value function; the s' information is used as the input of the target value network to obtain the corresponding max Q(s', a'; θ^-); the target value is then calculated as follows:

then, based on this target value, the error value is calculated with the DQN error function:

the current network updates parameters of the current value network based on the error function gradient:

wherein s ∈ S and a ∈ A; the parameters of the current value network are copied to the target value network after every fixed number of iterations:

θ^- ← θ

repeating the above process to make the network reach a stable state.

2. The method of claim 1, wherein the entire system time is divided into a plurality of consecutive superframes, one superframe being one time period; each superframe includes a beacon frame, a control period and a data transmission period; the control period uses a fixed control channel to transmit the relevant control information and channel allocation decisions; K non-overlapping channels are used in the data transmission period to support interference-free parallel data transmission; in the control period, all nodes in the network switch to the control channel to listen for and send the relevant control information; and in the data transmission period, a node with data to send switches to the channel of its parent node and transmits its data based on a channel access mechanism.

3. The method for distributed channel allocation in a wireless multi-hop network according to claim 1, wherein the node uses an RTS/DCTS-based channel access mechanism when performing action a, comprising:

node d is located at the m-th hop and its next-hop node at hop m+1 is node i, that is, node d is the parent node of node i; node e is located at the m-th hop and its next-hop node at hop m+1 is node j, that is, node e is the parent node of node j; the four nodes all operate on the same channel, and the backoff values of node i and node j are 0;

when the node i sends an RTS frame to the node d, the node d waits for a CIFS time and returns a CTS frame;

after receiving the RTS frame of the node i or the CTS frame of the node d, the child node of the node d sets a corresponding NAV based on the information in the Duration field;

when node e receives the RTS frame from node i, it waits for one SIFS and returns a CTS frame to inform its child nodes to delay their data transmission while node i is transmitting;

wherein RTS refers to request to send; CTS refers to clear to send; CIFS is the interframe space after which the destination node returns a CTS; SIFS is the short interframe space used to separate frames belonging to one session; and CIFS is slightly larger than SIFS.

4. The method for distributed channel allocation in a wireless multi-hop network according to claim 3, wherein if node j is located within the communication range of node i and its parent node e is not located within the communication range of node i, then when node j receives the RTS frame, it waits for one RIFS and then sends an RTS frame to its parent node e.