
CN118311976A - Multi-UAV obstacle avoidance method, system, device and medium based on CFS - Google Patents


Info

Publication number
CN118311976A
Authority
CN
China
Prior art keywords
network, obstacle avoidance, CFS, aerial vehicle, unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410719116.4A
Other languages
Chinese (zh)
Other versions
CN118311976B (en)
Inventor
庄嘉帆
韩高飞
夏子皓
林澈
李文姬
范衠
郝志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shantou University
Original Assignee
Shantou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shantou University
Priority to CN202410719116.4A
Publication of CN118311976A
Application granted
Publication of CN118311976B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/40: Control within particular dimensions
    • G05D1/46: Control of position or course in three dimensions
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/60: Intended control result
    • G05D1/617: Safety or protection, e.g. defining protection zones around obstacles or avoiding hazards
    • G05D1/622: Obstacle avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract


The present invention relates to the field of machine learning, and specifically to a CFS-based multi-UAV obstacle avoidance method, system, device and medium. The method comprises: obtaining a depth image of the current environment captured by a UAV, encoding the depth image to obtain a visual representation, and integrating the UAV's current velocity, its target position, and the visual representation into an observation vector; inputting the observation vector into a pre-trained obstacle avoidance model to obtain a flight action for the UAV, wherein the obstacle avoidance model is built on a deep reinforcement learning network and comprises parallel visual feature extraction, policy, and value networks; the policy network comprises a plurality of linear layers with a CFS module embedded between adjacent linear layers, the CFS module being used to select, from the visual representation, causal features directly relevant to the obstacle avoidance task; and controlling the UAV's flight based on the flight action until it reaches the target position. The present invention can provide a more generalizable obstacle avoidance policy in complex, unknown environments.

Description

Multi-UAV obstacle avoidance method, system, device and medium based on CFS

Technical Field

The present invention relates to the field of machine learning, and in particular to a CFS-based multi-UAV obstacle avoidance method, system, device and medium.

Background Art

With the rapid development of unmanned aerial vehicle (UAV) systems, UAVs have been widely applied in many fields, such as agriculture, search and rescue, mining, and patrol inspection. To achieve effective collaboration among multiple UAVs, finding an optimal path that avoids obstacles and reaches the target position is particularly important, especially in large-scale UAV swarms. Traditional multi-UAV obstacle avoidance methods mainly rely on real-time simultaneous localization and mapping (SLAM), using sensors such as lidar (LiDAR) to perceive the surrounding environment and generating trajectories through path planning. In addition, prior map information is usually introduced to improve SLAM performance. However, these traditional methods typically require substantial computational resources and, being constrained by the available prior map information, struggle to adapt to unknown environments.

Summary of the Invention

In view of this, an object of the embodiments of the present invention is to provide a CFS-based multi-UAV obstacle avoidance method, system, device and medium, so as to solve one or more technical problems in the prior art, or at least to provide a beneficial alternative or create favorable conditions.

In one aspect, an embodiment of the present invention provides a CFS-based multi-UAV obstacle avoidance method, the method comprising the following steps:

obtaining a depth image of the current environment captured by a UAV, encoding the depth image through an encoder to obtain a visual representation, and integrating the UAV's current velocity, its target position, and the visual representation into an observation vector;

inputting the observation vector into a pre-trained obstacle avoidance model to obtain a flight action for the UAV, wherein the obstacle avoidance model is built on a deep reinforcement learning network and comprises parallel visual feature extraction, policy, and value networks; the policy network comprises a plurality of linear layers, a CFS module is embedded between adjacent linear layers, and the CFS module is used to select, from the visual representation, causal features directly relevant to the obstacle avoidance task; and

controlling the UAV's flight based on the flight action until it reaches the target position.

Optionally, selecting the causal features directly relevant to the obstacle avoidance task from the visual representation comprises:

obtaining the visual representation; and

generating a differentiable binary mask from a trainable weight and a multi-layer perceptron in the CFS module, and embedding the binary mask into the policy network to activate causal feature channels and suppress non-causal feature channels, thereby eliminating non-causal features from the visual representation while retaining the causal features.

Optionally, the obstacle avoidance model is trained as follows:

obtaining sample observation vectors, each comprising a sample velocity, a sample target position, and a sample depth image;

iteratively training the visual feature extraction network on the sample depth images to obtain a trained visual feature extraction network;

iteratively training the policy network on the sample depth images, sample velocities, and sample target positions; evaluating, through the value network, the UAV actions iteratively output by the policy network, and iterating the policy network according to the evaluation results; and guiding the training and parameter optimization of the policy network by maximizing an obstacle avoidance reward function, thereby obtaining a trained policy network and value network; and

using the trained visual feature extraction network, the trained policy network, and the value network as the obstacle avoidance model.

Optionally, iteratively training the visual feature extraction network on the sample depth images to obtain a trained visual feature extraction network comprises:

inputting the sample observation vectors into the visual feature extraction network for iterative training;

during the iterative training, obtaining the reconstructed images produced by the visual feature extraction network, and computing a first loss value of the network from each depth image and its corresponding reconstructed image; and

stopping the iterative training when the first loss value drops to a first threshold, thereby obtaining the trained visual feature extraction network.
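The iterative-training stopping rule above can be sketched as follows. The patent does not specify the form of the first loss or the threshold value, so the mean-squared reconstruction error and the threshold below are illustrative assumptions:

```python
import numpy as np

def reconstruction_loss(depth_image, reconstruction):
    """First loss value: pixel-wise mean-squared error between the input
    depth image and the autoencoder's reconstruction (MSE is assumed; the
    patent does not name the loss)."""
    d = np.asarray(depth_image, dtype=np.float64)
    r = np.asarray(reconstruction, dtype=np.float64)
    return float(np.mean((d - r) ** 2))

def should_stop(first_loss, first_threshold=1e-3):
    """Stop iterative training once the first loss value drops below the
    first threshold (threshold value assumed for illustration)."""
    return first_loss < first_threshold
```

A training loop would call `reconstruction_loss` after each iteration and break out as soon as `should_stop` returns True.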

Optionally, obtaining the reconstructed images produced by the visual feature extraction network comprises:

converting each depth image into multi-dimensional latent features through a four-layer convolutional network and a fully connected layer, and reconstructing the image from these features through a decoder.

Optionally, iteratively training the policy network on the sample depth images, sample velocities, and sample target positions comprises:

evaluating the UAV actions output by the policy network with a goal-reaching reward function, until a second loss value computed from the goal-reaching reward function falls below a second loss threshold.

Optionally, evaluating, through the value network, the UAV actions iteratively output by the policy network comprises:

evaluating the expected cumulative reward output by the value network with the obstacle avoidance reward function, until a third loss value computed from the obstacle avoidance reward function falls below a third loss threshold.

In another aspect, an embodiment of the present invention provides a CFS-based multi-UAV obstacle avoidance system, comprising:

a first module for obtaining a depth image of the current environment captured by a UAV, encoding the depth image through an encoder to obtain a visual representation, and integrating the UAV's current velocity, its target position, and the visual representation into an observation vector;

a second module for inputting the observation vector into a pre-trained obstacle avoidance model to obtain a flight action for the UAV, wherein the obstacle avoidance model is built on a deep reinforcement learning network and comprises parallel visual feature extraction, policy, and value networks; the policy network comprises a plurality of linear layers, a CFS module is embedded between adjacent linear layers, and the CFS module is used to select, from the visual representation, causal features directly relevant to the obstacle avoidance task; and

a third module for controlling the UAV's flight based on the flight action until it reaches the target position.

In another aspect, an embodiment of the present invention provides a CFS-based multi-UAV obstacle avoidance device, comprising:

at least one processor; and

at least one memory for storing at least one program,

wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the above method.

In another aspect, an embodiment of the present invention provides a computer-readable storage medium storing a processor-executable program which, when executed by a processor, performs the above method.

The embodiments of the present invention have the following beneficial effect: by introducing the CFS module, the policy network can better eliminate the influence of non-causal factors in the input features, thereby providing a more generalizable obstacle avoidance policy in complex, unknown environments.

BRIEF DESCRIPTION OF THE DRAWINGS

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the steps of a CFS-based multi-UAV obstacle avoidance method provided by an embodiment of the present invention;

FIG. 2 is an architecture diagram of a CFS-based multi-UAV obstacle avoidance method provided by an embodiment of the present invention;

FIG. 3 is an architecture diagram of a CFS-based multi-UAV obstacle avoidance system provided by an embodiment of the present invention;

FIG. 4 is a structural block diagram of a CFS-based multi-UAV obstacle avoidance device provided by an embodiment of the present invention.

Detailed Description of the Embodiments

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application and are not intended to limit it.

It should be noted that although functional modules are divided in the device schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the devices, or in an order different from that in the flowcharts. The terms "first", "second", etc. in the specification, claims, and drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit it.

In addition, the described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application may be practiced without one or more of these specific details, or with other methods, components, devices, steps, etc. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.

The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities; that is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flowcharts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed while others may be combined or partially combined, so the actual execution order may change according to circumstances.

Several terms used in the present invention are first explained below:

An unmanned aerial vehicle (UAV) is an aircraft that does not require direct human piloting; it can fly autonomously or be operated by remote control.

Deep reinforcement learning (DRL) is a branch of machine learning that combines deep learning and reinforcement learning. Deep learning is a machine learning technique based on neural networks, while reinforcement learning is a decision-making process in which an agent learns, by interacting with its environment, how to make decisions that maximize the cumulative reward.

The Soft Actor-Critic (SAC) algorithm is a reinforcement learning method for both discrete and continuous action spaces. SAC is an off-policy algorithm, meaning it can exploit past experience to improve the policy rather than relying only on the current decisions.
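The feature that distinguishes SAC from a plain actor-critic is its entropy-augmented value: V(s) = E_{a~pi}[Q(s, a) - alpha * log pi(a|s)]. A minimal sketch for a discrete policy is shown below; the temperature alpha = 0.2 is an assumed illustrative value, not taken from the patent:

```python
import math

def soft_value(q_values, log_probs, alpha=0.2):
    """SAC's entropy-augmented state value for a discrete policy:
    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).
    `alpha` (the entropy temperature) is an assumed value."""
    probs = [math.exp(lp) for lp in log_probs]
    return sum(p * (q - alpha * lp)
               for p, q, lp in zip(probs, q_values, log_probs))

# Uniform two-action policy: the entropy bonus lifts V(s) above the plain
# expected Q-value of 1.0.
v = soft_value([1.0, 1.0], [math.log(0.5), math.log(0.5)])
```

The entropy term rewards stochastic, exploratory policies, which is part of why SAC remains a strong baseline for continuous UAV control.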

A regularized autoencoder (RAE) is a variant of the autoencoder that improves performance and generalization by adding a regularization term to the autoencoder's loss function. An autoencoder is an unsupervised learning algorithm for learning effective data representations, i.e., feature dimensionality reduction or feature learning.

The Actor-Critic framework is a commonly used policy learning method in reinforcement learning (RL) that combines the update mechanisms of a value module (the critic) and a policy module (the actor). The critic evaluates the performance of the current policy, while the actor updates the policy according to these evaluations. The policy network is the core component through which an RL agent makes decisions, defining how the agent chooses actions in a given state; its design and training directly affect the agent's performance on a given task.

The causal feature selection (CFS) module is a method used in machine learning models to identify and select input features that have a causal relationship with the target variable. It differs from traditional feature selection techniques, which usually select features based on their correlation with the target variable; the goal of the CFS module is to identify features that are not only correlated with the target variable but may also directly affect it.

Success-weighted Path Length (SPL) is a metric used in reinforcement learning to measure policy performance, especially in tasks that require the agent to reach a specific goal. SPL combines the binary success feedback of each task with the length of the path the agent took to reach the goal.
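The SPL metric described above has a standard closed form, which can be sketched directly (S_i, l_i, p_i denote the usual per-episode success flag, shortest-path length, and actual path length):

```python
def spl(successes, shortest_lengths, actual_lengths):
    """Success-weighted Path Length over N episodes:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the binary success flag, l_i the shortest-path length to the
    goal, and p_i the path length the agent actually flew."""
    terms = [s * (l / max(p, l))
             for s, l, p in zip(successes, shortest_lengths, actual_lengths)]
    return sum(terms) / len(terms)

# Episode 1: success along the optimal path; episode 2: failure.
score = spl([1, 0], [10.0, 10.0], [10.0, 25.0])  # -> 0.5
```

A successful but inefficient episode is discounted by the ratio of optimal to actual path length, so SPL penalizes detours as well as outright failures.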

The prior art mainly uses deep reinforcement learning (DRL) for multi-UAV obstacle avoidance navigation, achieving end-to-end, map-free navigation from sensor input. Among these methods, SAC+RAE can learn collision avoidance policies in a known environment and thus successfully avoid obstacles and reach the target point there. However, when facing environments with unknown obstacles, the success rate of UAV obstacle avoidance navigation drops significantly, exposing the poor generalization ability of deep reinforcement learning models. To improve the generalization of DRL models, researchers have introduced causal representation learning, identifying key features by constructing causal structures to cope with out-of-distribution (OOD) scenarios. The causal feature selection (CFS) module proposed in the present invention, integrated into the policy network, can effectively extract the causal factors in the visual representation, reduce the interference of non-causal factors, and significantly improve the obstacle avoidance navigation performance of UAVs in unknown scenes, showing stronger generalization and robustness than the prior art.

The disadvantages of the prior art are as follows:

Insufficient generalization: DRL-based obstacle avoidance navigation methods, especially SAC+RAE, perform well in known environments, but their navigation success rate drops significantly when facing unseen obstacles. This shows that the generalization ability of the prior art in unknown environments is limited, making it difficult to adapt effectively to obstacles not seen during training.

During training, the SAC+RAE method may spuriously associate obstacle shapes with the obstacle avoidance policy, causing the UAV's policy to fail when it encounters obstacles of different shapes.

The prior art may fail to distinguish causal factors in the visual information (such as obstacle distance and UAV velocity) from non-causal factors (such as obstacle shape and background texture). As a result, when the test environment changes, non-causal factors may negatively affect policy predictions through spurious correlations.

The causes are as follows:

DRL methods usually assume that training and test data are independent and identically distributed (i.i.d.), but in practice the deployment environment often differs considerably from the training environment. This violates the i.i.d. assumption and gives rise to the generalization problem of deep reinforcement learning.

The regularized autoencoder (RAE) used by SAC+RAE may implicitly encode all visual information, including non-causal factors irrelevant to the task, which can expose the policy network to spurious correlations.

Instead of using a causal inference framework to identify and exploit causal relationships in the data, the prior art relies on associative learning, which degrades generalization performance in OOD scenarios.

The present invention aims to solve the obstacle avoidance navigation problem of UAV swarms in complex outdoor environments, in particular how to achieve robust navigation and effective obstacle avoidance in the face of unknown environments and obstacles through a deep reinforcement learning method based on causal feature selection. This technical challenge belongs to the field of robotics, specifically autonomous navigation and obstacle avoidance strategies for UAVs in multi-agent systems. It also covers the application of deep reinforcement learning in artificial intelligence and the role of computer vision in environmental perception and path planning. Ultimately, the technology can support practical applications such as agriculture, search and rescue, mining, and patrol inspection, improving the operational efficiency and safety of UAVs through advanced control strategies.

As shown in FIG. 1 and FIG. 2, FIG. 1 shows a CFS-based multi-UAV obstacle avoidance method provided by an embodiment of the present invention, the method comprising the following steps:

S100: obtain a depth image of the current environment captured by the UAV, encode the depth image through an encoder to obtain a visual representation, and integrate the UAV's current velocity, its target position, and the visual representation into an observation vector.

The UAV captures depth images through its front central camera; these images contain the distances from the UAV's viewpoint to the surfaces of objects in the environment. The UAV's current velocity and target position are then integrated, together with the visual representation, into the observation vector.
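The integration step can be sketched as a simple concatenation; the component dimensions below (3-D velocity, 3-D goal, 50-D visual latent) are illustrative assumptions, not values from the patent:

```python
import numpy as np

def build_observation(velocity, target_position, visual_repr):
    """Flatten and concatenate the UAV's current velocity, its target
    position, and the encoded visual representation into one observation
    vector for the obstacle avoidance model."""
    return np.concatenate([
        np.asarray(velocity, dtype=np.float32).ravel(),
        np.asarray(target_position, dtype=np.float32).ravel(),
        np.asarray(visual_repr, dtype=np.float32).ravel(),
    ])

# 3-D velocity, 3-D goal, 50-D visual latent (sizes assumed for illustration).
obs = build_observation(np.zeros(3), np.ones(3), np.zeros(50))
```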

S200: input the observation vector into a pre-trained obstacle avoidance model to obtain a flight action for the UAV. The obstacle avoidance model is built on a deep reinforcement learning network and comprises parallel visual feature extraction, policy, and value networks; the policy network comprises a plurality of linear layers, a CFS module is embedded between adjacent linear layers, and the CFS module is used to select, from the visual representation, causal features directly relevant to the obstacle avoidance task.

By introducing a causal feature selection mechanism into the training process, the CFS module automatically identifies and selects visual features that are causally related to the obstacle avoidance task while filtering out non-causal features that could lead to erroneous decisions, giving the DRL model high generalization ability in unknown environments.

S300: control the UAV's flight based on the flight action until the UAV reaches the target position.

The non-sparse reward function adopted by the present invention provides the UAV with more learning signal, which not only speeds up the learning process but also improves the robustness and adaptability of the obstacle avoidance policy.
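The patent does not give the exact form of its non-sparse reward, so the sketch below only illustrates one common way to build such a dense signal: progress toward the goal plus a graded obstacle-proximity penalty. All coefficients are assumed values, not taken from the patent:

```python
def shaped_reward(prev_goal_dist, curr_goal_dist, min_obstacle_dist,
                  safe_dist=1.0, collision_penalty=-10.0,
                  goal_radius=0.5, reached_bonus=10.0):
    """A non-sparse reward sketch: dense progress toward the goal plus a
    graded penalty when the nearest obstacle is closer than `safe_dist`.
    All coefficients are illustrative assumptions."""
    if min_obstacle_dist <= 0.0:
        return collision_penalty              # collision: large negative reward
    reward = prev_goal_dist - curr_goal_dist  # > 0 when moving toward the goal
    if min_obstacle_dist < safe_dist:
        reward -= safe_dist - min_obstacle_dist  # graded proximity penalty
    if curr_goal_dist < goal_radius:
        reward += reached_bonus               # terminal bonus on arrival
    return reward
```

Unlike a sparse reward that pays out only on arrival or collision, every step produces a non-zero signal, which is the property the paragraph above credits with faster, more robust learning.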

The present invention enables multiple UAVs to maintain a high obstacle avoidance success rate and stable behavior when facing unseen obstacles and environmental backgrounds. By introducing the CFS module, the policy network of the present invention can better resist the influence of non-causal factors in the input features, thereby providing a more reliable obstacle avoidance policy in complex, unknown environments.

In some improved embodiments, selecting the causal features directly relevant to the obstacle avoidance task from the visual representation comprises:

获取视觉表示;Get visual representations;

通过CFS模块中的一个可训练的权重和一个多层感知器生成一个可微分的二进制掩码,将所述二进制掩码嵌入到策略网络中,激活因果特征通道并抑制非因果特征通道,以消除视觉表示中的非因果特征,保留因果特征。A differentiable binary mask is generated through a trainable weight in the CFS module and a multi-layer perceptron, which is embedded into the policy network to activate the causal feature channel and suppress the non-causal feature channel to eliminate the non-causal features in the visual representation and retain the causal features.

在策略网络中嵌入CFS模块,CFS模块负责从提取的视觉表示中选择与避障任务直接相关的因果特征。CFS模块通过一个可训练的权重和一个小的多层感知器(MLP)生成一个可微分的二进制掩码,用于激活因果特征通道并抑制非因果特征通道。本发明依靠CFS模块生成的可微分二进制掩码来选择因果特征,这种方法比传统DRL模型中直接使用所有输入特征的方法更具泛化性和鲁棒性。通过在策略网络中嵌入CFS模块,利用小的多层感知器(MLP)和ReLU激活函数,将可训练权重转换为二进制掩码,实现了对特征选择过程的端到端学习。The CFS module is embedded in the policy network, which is responsible for selecting causal features directly related to the obstacle avoidance task from the extracted visual representation. The CFS module generates a differentiable binary mask through a trainable weight and a small multi-layer perceptron (MLP) to activate the causal feature channel and suppress the non-causal feature channel. The present invention relies on the differentiable binary mask generated by the CFS module to select causal features, which is more generalizable and robust than the method of directly using all input features in the traditional DRL model. By embedding the CFS module in the policy network, using a small multi-layer perceptron (MLP) and ReLU activation function to convert the trainable weights into binary masks, end-to-end learning of the feature selection process is achieved.

由于CFS模块生成的二进制掩码是可微分的,这允许它在DRL框架中通过反向传播算法进行端到端的优化。这种优化方式确保了CFS模块能够有效地与策略网络协同工作,保证了学习到的策略对于避障任务的适应性和鲁棒性。Since the binary mask generated by the CFS module is differentiable, this allows it to be optimized end-to-end in the DRL framework through the back-propagation algorithm. This optimization method ensures that the CFS module can work effectively with the policy network and ensures the adaptability and robustness of the learned policy for obstacle avoidance tasks.
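As a concrete illustration of the mask generation described above, the forward pass can be sketched as follows. The identity MLP weights and example feature values are hypothetical, and the differentiable binarisation used in training (e.g. a straight-through estimator) is not modelled in this forward-only sketch.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cfs_select(features, w, W_mlp, b_mlp):
    # trainable weight w -> small MLP with ReLU -> per-channel score;
    # thresholding the score yields the binary mask that activates causal
    # channels and suppresses non-causal ones
    score = relu(w @ W_mlp + b_mlp)
    mask = (score > 0.0).astype(features.dtype)
    return features * mask, mask

feats = np.array([0.5, 0.7, 0.9])   # three feature channels (hypothetical)
w = np.array([1.0, -1.0, 2.0])      # trainable CFS weight (hypothetical)
masked, mask = cfs_select(feats, w, np.eye(3), np.zeros(3))
print(mask.tolist())    # [1.0, 0.0, 1.0] -> channel 2 treated as non-causal
print(masked.tolist())  # [0.5, 0.0, 0.9]
```

In an actual DRL implementation the hard threshold would be replaced by a differentiable surrogate so the mask can be optimised end-to-end by backpropagation, as the surrounding text notes.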

在一些改进的实施例中,所述避障模型通过以下方式训练得到:In some improved embodiments, the obstacle avoidance model is trained in the following manner:

S210,获取样本观测向量;所述样本观测向量包括样本速度、样本目标位置、以及样本深度图像;S210, obtaining a sample observation vector; the sample observation vector includes a sample velocity, a sample target position, and a sample depth image;

S220,基于所述样本深度图像对所述视觉特征提取网络进行迭代训练,得到训练好的视觉特征提取网络;S220, iteratively training the visual feature extraction network based on the sample depth image to obtain a trained visual feature extraction network;

S230,基于所述样本深度图像、样本速度和样本目标位置对所述策略网络进行迭代训练,通过价值网络对所述策略网络迭代输出的无人机动作进行价值评估,根据价值评估结果对策略网络进行迭代;通过最大化避障奖励函数的方式来指导策略网络的训练和参数优化,得到训练好的策略网络和价值网络;S230, iteratively training the policy network based on the sample depth image, sample speed, and sample target position, performing value evaluation on the drone action iteratively output by the policy network through the value network, and iterating the policy network according to the value evaluation result; guiding the training and parameter optimization of the policy network by maximizing the obstacle avoidance reward function, and obtaining a trained policy network and value network;

S240,将训练好的视觉特征提取网络、训练好的策略网络和价值网络作为避障模型。S240, using the trained visual feature extraction network, the trained strategy network and the value network as an obstacle avoidance model.

具体地,策略网络输出无人机的动作,价值网络评估动作的期望累积奖励;采用软演员-评论家(SAC)算法训练改进后的Actor-Critic结构,其包括由三层多层感知器(MLP)构成的策略网络和价值网络。本发明还设计了一种非稀疏奖励函数,它结合了目标到达奖励函数和避障奖励函数,以促进无人机学习有效的避障策略,并通过在模拟环境中的测试验证了所提方法的有效性。本发明设计的奖励函数不仅考虑了无人机到达目标的奖励,还细致地考虑了避障和路径效率。这种设计为无人机提供了在探索环境和学习避障策略时所需的有效反馈,从而加快了学习过程,并最终提升了策略的质量。Specifically, the policy network outputs the drone's action and the value network evaluates the expected cumulative reward of that action; the improved Actor-Critic architecture, whose policy network and value network are each a three-layer multi-layer perceptron (MLP), is trained with the soft actor-critic (SAC) algorithm. The present invention also designs a non-sparse reward function that combines a target-arrival reward function and an obstacle-avoidance reward function to encourage the drone to learn an effective obstacle avoidance strategy, and the effectiveness of the proposed method is verified through testing in a simulated environment. The reward function designed in the present invention considers not only the reward for the drone reaching the target, but also obstacle avoidance and path efficiency. This design provides the drone with the effective feedback it needs while exploring the environment and learning obstacle avoidance strategies, thereby speeding up the learning process and ultimately improving the quality of the strategy.
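A non-sparse reward of the kind described — combining a target-arrival term with an obstacle-avoidance term — might look like the following sketch. All weights, the safety radius, and the arrival bonus are assumed values for illustration, not the patent's actual reward function.

```python
import numpy as np

def shaped_reward(pos, prev_pos, goal, d_obs,
                  w_goal=1.0, w_obs=0.5, d_safe=1.0, r_arrive=10.0, eps=0.1):
    # progress toward the goal gives a dense signal at every step
    progress = np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal)
    r = w_goal * progress
    # obstacle-avoidance term: penalty grows as the nearest obstacle nears
    if d_obs < d_safe:
        r -= w_obs * (d_safe - d_obs)
    # target-arrival term: bonus when the drone reaches the goal
    if np.linalg.norm(pos - goal) < eps:
        r += r_arrive
    return r

# moving 1 m toward the goal with no obstacle nearby
r = shaped_reward(np.array([1.0, 0.0]), np.array([0.0, 0.0]),
                  np.array([5.0, 0.0]), d_obs=2.0)
print(r)  # 1.0
```

Because the progress term is nonzero on every step, the drone receives a learning signal even far from the goal, which is what makes the reward "non-sparse" compared with a success-only reward.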

在一些改进的实施例中,所述基于所述样本深度图像对所述视觉特征提取网络进行迭代训练,得到训练好的视觉特征提取网络,包括:In some improved embodiments, the iterative training of the visual feature extraction network based on the sample depth image to obtain a trained visual feature extraction network includes:

将所述样本观测向量输入视觉特征提取网络进行迭代训练;Inputting the sample observation vector into a visual feature extraction network for iterative training;

在迭代训练过程中,获取视觉特征提取网络重构得到的重建图像,基于所述深度图像和对应的重建图像计算所述视觉特征提取网络的第一损失值;During the iterative training process, a reconstructed image obtained by the visual feature extraction network is obtained, and a first loss value of the visual feature extraction network is calculated based on the depth image and the corresponding reconstructed image;

当第一损失值降低到第一阈值时停止迭代训练,得到训练好的视觉特征提取网络。When the first loss value drops to a first threshold, the iterative training is stopped to obtain a trained visual feature extraction network.

具体地,正则化自编码器(RAE)接收无人机的观测信息,并将其编码为一个50维的潜在特征。这一步骤涉及将深度图像通过四层卷积网络和一个全连接层转换成潜在特征,然后通过解码器重构与深度图像对应的重建图像,以便于捕捉环境中的关键视觉信息。通过第一损失函数J(RAE)计算得到第一损失值。Specifically, the regularized autoencoder (RAE) receives the observation information of the drone and encodes it into a 50-dimensional latent feature. This step involves converting the depth image into a latent feature through a four-layer convolutional network and a fully connected layer, and then reconstructing the reconstructed image corresponding to the depth image through the decoder to capture the key visual information in the environment. The first loss value is calculated by the first loss function J(RAE).
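The encoder's shape bookkeeping — four convolutional layers followed by one fully connected layer down to the 50-dimensional latent feature — can be checked with a short script. The 64x64 input resolution and the kernel/stride/padding values are illustrative assumptions.

```python
import numpy as np

def conv_out(size, kernel=3, stride=2, pad=1):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

h = w = 64                       # assumed depth-image resolution
channels = [1, 32, 32, 32, 32]   # assumed channel counts per layer
for _ in range(4):               # four convolutional layers
    h, w = conv_out(h), conv_out(w)
print(h, w)                      # 4 4

flat = channels[-1] * h * w      # input width of the fully connected layer
rng = np.random.default_rng(0)
W_fc = rng.standard_normal((flat, 50)) * 0.01
latent = np.zeros(flat) @ W_fc   # 50-dimensional latent feature
print(latent.shape)              # (50,)
```

The decoder then mirrors this path in reverse to produce the reconstructed image that the first loss compares against the input depth image.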

在一些改进的实施例中,所述获取视觉特征提取网络重构得到的重建图像,包括:In some improved embodiments, the step of obtaining a reconstructed image obtained by reconstructing a visual feature extraction network includes:

通过四层卷积网络和一个全连接层将深度图像转换为多维的潜在特征,通过解码器重构得到的重建图像。The depth image is converted into multi-dimensional latent features through a four-layer convolutional network and a fully connected layer, and the reconstructed image is reconstructed by the decoder.

在一些改进的实施例中,所述基于所述样本深度图像、样本速度和样本目标位置对所述策略网络进行迭代训练,包括:In some improved embodiments, the iterative training of the policy network based on the sample depth image, the sample speed and the sample target position includes:

采用目标到达奖励函数对策略网络输出无人机的动作进行评价,直至根据目标到达奖励函数计算得到的第二损失值低于第二损失阈值。The target reaching reward function is used to evaluate the action of the drone output by the policy network until the second loss value calculated according to the target reaching reward function is lower than the second loss threshold.

在一些改进的实施例中,所述通过价值网络对所述策略网络迭代输出的无人机动作进行价值评估,包括:In some improved embodiments, the value evaluation of the drone action iteratively output by the policy network through the value network includes:

采用避障奖励函数对价值网络输出的期望累积奖励进行评价,直至根据避障奖励函数计算得到的第三损失值低于第三损失阈值。The expected cumulative reward output by the value network is evaluated using the obstacle avoidance reward function until the third loss value calculated according to the obstacle avoidance reward function is lower than the third loss threshold.

通过第二损失函数J(π)计算得到第二损失值Zπ。通过第三损失函数J(Q)计算得到第三损失值ZQ。The second loss value Zπ is calculated by the second loss function J(π), and the third loss value ZQ is calculated by the third loss function J(Q).
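The threshold-based stopping criteria used for both J(π) and J(Q) can be expressed as one generic loop. The geometric loss decay below is a toy stand-in for an actual optimisation step, used purely to show the stopping logic.

```python
def train_until(loss_fn, threshold, max_iters=1000):
    # run optimisation steps until the loss drops below the threshold,
    # mirroring the second/third loss-threshold stopping criteria
    for i in range(max_iters):
        loss = loss_fn(i)
        if loss < threshold:
            return i, loss
    return max_iters, loss

# toy loss that decays geometrically (illustrative only)
steps, final = train_until(lambda i: 0.9 ** i, threshold=0.05)
print(steps)          # 29
print(final < 0.05)   # True
```

The same loop shape serves the policy network (stop when the second loss falls below the second threshold) and the value network (stop when the third loss falls below the third threshold).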

本发明提出了一种用于多无人机避障的鲁棒策略学习方法,通过集成因果特征选择(CFS)模块来提高深度强化学习(DRL)在未知环境中的泛化能力。The present invention proposes a robust policy learning method for multi-UAV obstacle avoidance, which improves the generalization ability of deep reinforcement learning (DRL) in unknown environments by integrating a causal feature selection (CFS) module.

针对多无人机在复杂环境中的避障导航任务,现有的深度强化学习(DRL)方法在未知环境中的泛化能力有限,导致在未知环境中的避障策略效果不佳。而在实际应用中,无人机在未知环境中的避障导航是一种常见任务。本发明提供了一种集成了因果特征选择(CFS)模块的DRL方法,可以提高无人机在未知复杂环境中的避障导航性能。For the obstacle avoidance and navigation tasks of multiple UAVs in complex environments, the existing deep reinforcement learning (DRL) method has limited generalization ability in unknown environments, resulting in poor obstacle avoidance strategies in unknown environments. In practical applications, obstacle avoidance and navigation of UAVs in unknown environments is a common task. The present invention provides a DRL method integrated with a causal feature selection (CFS) module, which can improve the obstacle avoidance and navigation performance of UAVs in unknown complex environments.

本发明创新性地将因果特征选择技术应用于无人机的避障导航策略学习中,这是一种基于因果表示学习的DRL方法。本发明对DRL模型进行了改进,提出了一种集成CFS模块的策略网络。通过对无人机的视觉输入进行因果特征选择,过滤掉与避障任务无关的非因果因素,从而减少虚假相关造成的影响,并提高策略网络的泛化能力。The present invention innovatively applies causal feature selection technology to the obstacle avoidance navigation strategy learning of UAVs, which is a DRL method based on causal representation learning. The present invention improves the DRL model and proposes a policy network with an integrated CFS module. By performing causal feature selection on the visual input of the UAV, non-causal factors irrelevant to the obstacle avoidance task are filtered out, thereby reducing the impact of false correlation and improving the generalization ability of the policy network.

与现有技术相比,本发明的方法在包含未知背景和障碍物的场景中,在成功率、SPL、平均速度等指标上,相比于现有方法展现了更优越的性能,证明了该方法学习到的避障导航策略在未知环境中的有效性和鲁棒性。Compared with the existing technology, the method of the present invention shows superior performance in terms of success rate, SPL, average speed and other indicators in scenes with unknown backgrounds and obstacles, which proves the effectiveness and robustness of the obstacle avoidance navigation strategy learned by this method in unknown environments.

相比现有技术,本发明的优点在于:Compared with the prior art, the advantages of the present invention are:

通过设计因果特征选择(CFS)模块,本发明能够更好地适应未知环境,提高无人机在未见过的障碍物和背景中的导航成功率。By designing a causal feature selection (CFS) module, the present invention can better adapt to unknown environments and improve the navigation success rate of UAVs in unseen obstacles and backgrounds.

CFS模块通过提取出视觉表示中的因果因素,消除非因果因素与动作预测之间的虚假相关,从而避免因虚假相关导致的无人机泛化性能下降。The CFS module extracts causal factors from visual representation and eliminates spurious correlations between non-causal factors and action prediction, thereby avoiding the degradation of drone generalization performance caused by spurious correlations.

本发明通过明确区分视觉信息中的因果和非因果因素,减少非因果因素对策略学习的影响,增强DRL模型对于未知环境变化的泛化性能。The present invention reduces the impact of non-causal factors on strategy learning by clearly distinguishing causal and non-causal factors in visual information, thereby enhancing the generalization performance of the DRL model for unknown environmental changes.

通过集成CFS模块到策略网络中,本发明能够更有效地学习最优避障导航策略,即使在面对复杂和未知的环境条件时也能保持较高的成功率。By integrating the CFS module into the policy network, the present invention is able to more effectively learn the optimal obstacle avoidance navigation strategy and maintain a high success rate even in the face of complex and unknown environmental conditions.

参阅图3,本发明实施例提供了一种基于CFS的多无人机避障系统,包括:Referring to FIG. 3 , an embodiment of the present invention provides a multi-UAV obstacle avoidance system based on CFS, including:

第一模块,用于获取无人机对当前环境采集的深度图像,通过编码器将所述深度图像编码得到视觉表示,将无人机的当前速度、目标位置、以及所述视觉表示整合得到观测向量;The first module is used to obtain a depth image of the current environment collected by the drone, encode the depth image through an encoder to obtain a visual representation, and integrate the current speed, target position, and the visual representation of the drone to obtain an observation vector;

第二模块,用于将所述观测向量输入预先训练得到的避障模型,得到无人机的飞行动作;其中,所述避障模型基于深度强化学习网络构建,所述避障模型包括并行的视觉特征提取网络、策略网络和价值网络,所述策略网络包括多个线性层,相邻线性层之间嵌入有CFS模块,所述CFS模块用于从视觉表示中选择与避障任务直接相关的因果特征;The second module is used to input the observation vector into a pre-trained obstacle avoidance model to obtain the flight action of the UAV; wherein the obstacle avoidance model is constructed based on a deep reinforcement learning network, and the obstacle avoidance model includes a parallel visual feature extraction network, a policy network and a value network, and the policy network includes multiple linear layers, and a CFS module is embedded between adjacent linear layers, and the CFS module is used to select causal features directly related to the obstacle avoidance task from the visual representation;

第三模块,用于基于所述无人机的飞行动作控制所述无人机飞行,直至到达目标位置。The third module is used to control the flight of the drone based on the flight action of the drone until it reaches the target position.
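The integration step performed by the first module can be sketched as a simple concatenation. The concatenation order and the 50-dimensional visual representation are assumptions for illustration.

```python
import numpy as np

def build_observation(velocity, target_pos, visual_repr):
    # concatenate current velocity, target position and the encoded
    # visual representation into a single observation vector
    return np.concatenate([np.asarray(velocity, dtype=float),
                           np.asarray(target_pos, dtype=float),
                           np.asarray(visual_repr, dtype=float)])

obs = build_observation([0.5, 0.0, -0.1],   # current velocity (assumed 3-D)
                        [10.0, 3.0, 2.0],   # target position (assumed 3-D)
                        np.zeros(50))       # 50-dim visual representation
print(obs.shape)  # (56,)
```

The second module would then feed this vector to the trained obstacle avoidance model to obtain the flight action.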

可见,上述方法实施例中的内容均适用于本系统实施例中,本系统实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。It can be seen that the contents of the above method embodiments are all applicable to the present system embodiments, the functions specifically implemented by the present system embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.

参阅图4,本发明实施例提供了一种基于CFS的多无人机避障装置,包括:Referring to FIG. 4 , an embodiment of the present invention provides a multi-UAV obstacle avoidance device based on CFS, comprising:

至少一个处理器;at least one processor;

至少一个存储器,用于存储至少一个程序;at least one memory for storing at least one program;

当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现上述的方法。When the at least one program is executed by the at least one processor, the at least one processor implements the above method.

可见,上述方法实施例中的内容均适用于本装置实施例中,本装置实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。It can be seen that the contents of the above method embodiments are all applicable to the present device embodiments, the functions specifically implemented by the present device embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.

此外,本申请实施例还公开了一种计算机程序产品或计算机程序,计算机程序产品或计算机程序存储在计算机可读存储介质中。计算机设备的处理器可以从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得该计算机设备执行上述的方法。同样地,上述方法实施例中的内容均适用于本存储介质实施例中,本存储介质实施例所具体实现的功能与上述方法实施例相同,并且达到的有益效果与上述方法实施例所达到的有益效果也相同。In addition, the embodiment of the present application also discloses a computer program product or a computer program, and the computer program product or the computer program is stored in a computer-readable storage medium. The processor of the computer device can read the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device performs the above method. Similarly, the contents in the above method embodiment are all applicable to the storage medium embodiment, and the functions specifically implemented by the storage medium embodiment are the same as those in the above method embodiment, and the beneficial effects achieved are also the same as those achieved by the above method embodiment.

以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The device embodiments described above are merely illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统、设备中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。Those skilled in the art will appreciate that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices may be implemented as software, firmware, hardware, or a suitable combination thereof.

本申请的说明书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third", "fourth", etc. (if any) in the specification of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any of their variations are intended to cover non-exclusive inclusions, for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed, but may include other steps or units that are not clearly listed or inherent to these processes, methods, products or devices.

应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。It should be understood that in this application, "at least one (item)" means one or more, and "plurality" means two or more. "And/or" is used to describe the association relationship of associated objects, indicating that three relationships may exist. For example, "A and/or B" can mean: only A exists, only B exists, and A and B exist at the same time, where A and B can be singular or plural. The character "/" generally indicates that the objects associated before and after are in an "or" relationship. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, c can be single or multiple.

在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括多指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等各种可以存储程序的介质。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including multiple instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, referred to as ROM), random access memory (Random Access Memory, referred to as RAM), disk or optical disk and other media that can store programs.

以上参照附图说明了本申请实施例的优选实施例,并非因此局限本申请实施例的权利范围。本领域技术人员不脱离本申请实施例的范围和实质内所作的任何修改、等同替换和改进,均应在本申请实施例的权利范围之内。The preferred embodiments of the present application are described above with reference to the accompanying drawings, but the scope of the rights of the present application is not limited thereto. Any modification, equivalent substitution and improvement made by a person skilled in the art without departing from the scope and essence of the present application should be within the scope of the rights of the present application.

Claims (10)

1. A CFS-based multi-unmanned aerial vehicle obstacle avoidance method, the method comprising the steps of:
acquiring a depth image acquired by an unmanned aerial vehicle for a current environment, encoding the depth image through an encoder to obtain a visual representation, and integrating the current speed, the target position and the visual representation of the unmanned aerial vehicle to obtain an observation vector;
Inputting the observation vector into a pre-trained obstacle avoidance model to obtain the flight action of the unmanned aerial vehicle; the obstacle avoidance model is constructed based on a deep reinforcement learning network, and comprises a parallel visual feature extraction network, a strategy network and a value network, wherein the strategy network comprises a plurality of linear layers, CFS modules are embedded between adjacent linear layers, and the CFS modules are used for selecting causal features directly related to an obstacle avoidance task from visual representation;
and controlling the unmanned aerial vehicle to fly based on the flying action of the unmanned aerial vehicle until reaching a target position.
2. The method of claim 1, wherein selecting causal features from the visual representation that are directly related to obstacle avoidance tasks comprises:
Acquiring a visual representation;
A differentiable binary mask is generated by a trainable weight and a multi-layer perceptron in the CFS module, the binary mask is embedded in the strategy network, the causal feature channel is activated and the non-causal feature channel is suppressed to eliminate non-causal features in the visual representation and preserve causal features.
3. The method of claim 1, wherein the obstacle avoidance model is trained by:
obtaining a sample observation vector; the sample observation vector includes a sample velocity, a sample target position, and a sample depth image;
Performing iterative training on the visual feature extraction network based on the sample depth image to obtain a trained visual feature extraction network;
Performing iterative training on the strategy network based on the sample depth image, the sample speed and the sample target position, performing value evaluation, through a value network, on the unmanned aerial vehicle actions iteratively output by the strategy network, and iterating the strategy network according to a value evaluation result; training and parameter optimization of the strategy network are guided by maximizing the obstacle avoidance reward function, and a trained strategy network and a trained value network are obtained;
and taking the trained visual feature extraction network, the trained strategy network and the value network as an obstacle avoidance model.
4. A method according to claim 3, wherein the iteratively training the visual feature extraction network based on the sample depth image to obtain a trained visual feature extraction network comprises:
Inputting the sample observation vector into a visual feature extraction network for iterative training;
In the iterative training process, a reconstructed image obtained by reconstructing a visual feature extraction network is obtained, and a first loss value of the visual feature extraction network is calculated based on the depth image and the corresponding reconstructed image;
and stopping iterative training when the first loss value is reduced to a first threshold value, and obtaining a trained visual feature extraction network.
5. The method of claim 4, wherein the acquiring the reconstructed image of the reconstruction of the visual feature extraction network comprises:
the depth image is converted into multi-dimensional latent features through a four-layer convolution network and a full connection layer, and the obtained reconstructed image is reconstructed through a decoder.
6. The method of claim 3, wherein the iteratively training the policy network based on the sample depth image, sample velocity, and sample target location comprises:
and evaluating the action of the unmanned aerial vehicle output by adopting the target arrival reward function to the strategy network until a second loss value calculated according to the target arrival reward function is lower than a second loss threshold value.
7. A method according to claim 3, wherein said evaluating the value of unmanned aerial vehicle actions iteratively output by said policy network over a value network comprises:
And evaluating the expected accumulated rewards output by the value network by adopting the obstacle avoidance rewards function until a third loss value calculated according to the obstacle avoidance rewards function is lower than a third loss threshold value.
8. A CFS-based multi-unmanned aerial vehicle obstacle avoidance system, the system comprising:
the first module is used for acquiring a depth image acquired by the unmanned aerial vehicle for the current environment, encoding the depth image through an encoder to obtain a visual representation, and integrating the current speed, the target position and the visual representation of the unmanned aerial vehicle to obtain an observation vector;
the second module is used for inputting the observation vector into a pre-trained obstacle avoidance model to obtain the flight action of the unmanned aerial vehicle; the obstacle avoidance model is constructed based on a deep reinforcement learning network, and comprises a parallel visual feature extraction network, a strategy network and a value network, wherein the strategy network comprises a plurality of linear layers, CFS modules are embedded between adjacent linear layers, and the CFS modules are used for selecting causal features directly related to an obstacle avoidance task from visual representation;
and the third module is used for controlling the unmanned aerial vehicle to fly based on the flying action of the unmanned aerial vehicle until reaching the target position.
9. A CFS-based multi-unmanned aerial vehicle obstacle avoidance device, characterized in that it comprises:
at least one processor;
At least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium, in which a processor executable program is stored, characterized in that the processor executable program is for performing the method according to any one of claims 1 to 7 when being executed by a processor.
CN202410719116.4A 2024-06-05 2024-06-05 CFS-based multi-unmanned aerial vehicle obstacle avoidance method, system, device and medium Active CN118311976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410719116.4A CN118311976B (en) 2024-06-05 2024-06-05 CFS-based multi-unmanned aerial vehicle obstacle avoidance method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410719116.4A CN118311976B (en) 2024-06-05 2024-06-05 CFS-based multi-unmanned aerial vehicle obstacle avoidance method, system, device and medium

Publications (2)

Publication Number Publication Date
CN118311976A true CN118311976A (en) 2024-07-09
CN118311976B CN118311976B (en) 2024-09-27

Family

ID=91720780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410719116.4A Active CN118311976B (en) 2024-06-05 2024-06-05 CFS-based multi-unmanned aerial vehicle obstacle avoidance method, system, device and medium

Country Status (1)

Country Link
CN (1) CN118311976B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119322533A (en) * 2024-12-19 2025-01-17 华中科技大学 Obstacle avoidance decision method and system of unmanned system
CN119443196A (en) * 2024-10-16 2025-02-14 清华大学 Device control method, device, electronic device, storage medium and program product
CN119596964A (en) * 2024-10-18 2025-03-11 汕头大学 UAV swarm capture control method and system based on causal representation learning
CN119669952A (en) * 2024-11-12 2025-03-21 北京科技大学 A Sim2Real model construction method and device based on reinforcement learning
CN119919833A (en) * 2024-12-17 2025-05-02 武汉大学 A system and method for diagnosing water shortage in crops
CN119958563A (en) * 2025-01-14 2025-05-09 汕头大学 Multi-UAV navigation method, device, equipment and medium based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117406706A (en) * 2023-08-11 2024-01-16 汕头大学 Multi-agent obstacle avoidance method and system combining causal model and deep reinforcement learning
US20240119300A1 (en) * 2021-02-05 2024-04-11 Telefonaktiebolaget Lm Ericsson (Publ) Configuring a reinforcement learning agent based on relative feature contribution
CN117934849A (en) * 2024-01-31 2024-04-26 北京工业大学 Deep learning-based RGB-D image semantic segmentation method
CN118053028A (en) * 2024-02-23 2024-05-17 北京工业大学 A modal feature alignment and fusion method for statistical information data and image data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG, HUAXING et al.: "Vision-based Distributed Multi-UAV Collision Avoidance via Deep Reinforcement Learning for Navigation", 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 27 October 2022 (2022-10-27), pages 13745-13752, XP034258097, DOI: 10.1109/IROS47612.2022.9981803 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119443196A (en) * 2024-10-16 2025-02-14 清华大学 Device control method, device, electronic device, storage medium and program product
CN119596964A (en) * 2024-10-18 2025-03-11 汕头大学 UAV swarm capture control method and system based on causal representation learning
CN119669952A (en) * 2024-11-12 2025-03-21 北京科技大学 A Sim2Real model construction method and device based on reinforcement learning
CN119919833A (en) * 2024-12-17 2025-05-02 武汉大学 A system and method for diagnosing water shortage in crops
CN119919833B (en) * 2024-12-17 2025-10-03 武汉大学 A system and method for diagnosing crop water shortage
CN119322533A (en) * 2024-12-19 2025-01-17 华中科技大学 Obstacle avoidance decision method and system of unmanned system
CN119958563A (en) * 2025-01-14 2025-05-09 汕头大学 Multi-UAV navigation method, device, equipment and medium based on artificial intelligence
CN119958563B (en) * 2025-01-14 2026-02-10 汕头大学 Multi-unmanned aerial vehicle navigation method, device, equipment and medium based on artificial intelligence

Also Published As

Publication number Publication date
CN118311976B (en) 2024-09-27

Similar Documents

Publication Publication Date Title
CN118311976B (en) CFS-based multi-unmanned aerial vehicle obstacle avoidance method, system, device and medium
Tai et al. Socially compliant navigation through raw depth inputs with generative adversarial imitation learning
Akan et al. Stretchbev: Stretching future instance prediction spatially and temporally
CN111123963B (en) Autonomous Navigation System and Method in Unknown Environment Based on Reinforcement Learning
CN113176776A (en) Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN111612126A (en) Methods and devices for reinforcement learning
Naveed et al. Deep introspective SLAM: Deep reinforcement learning based approach to avoid tracking failure in visual SLAM
Yan et al. Reinforcement Learning‐Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions
WO2022132407A1 (en) Event camera based navigation control
CN119649257A (en) Drone return detection method and system based on AI recognition
Zhu et al. Sim-real joint reinforcement transfer for 3d indoor navigation
CN114326821B (en) Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
CN116540731B (en) Path planning method and system integrating LSTM and SAC algorithms
CN114757362A (en) Multi-agent system communication method based on edge enhancement and related device
Wang et al. Transfer reinforcement learning: Feature transferability in ship collision avoidance
CN118896610A (en) UAV route planning method and system based on deep reinforcement learning
CN118915795B (en) Multi-unmanned aerial vehicle cooperative control method and device
Zhuang et al. Robust policy learning for multi-uav collision avoidance with causal feature selection
CN120029277A (en) A robust indoor navigation and obstacle avoidance method for automated guided vehicles
CN119126789A (en) Robot adaptive formation and obstacle avoidance control method and system based on multi-mode information fusion
CN119292342A (en) A UAV swarm collaborative search method and system based on deep multi-agent reinforcement learning
CN116203987B (en) Unmanned aerial vehicle cluster collaborative obstacle avoidance method based on deep reinforcement learning
CN118627662A (en) Driving scene prediction method and device, equipment, storage medium and program product
CN118067147A (en) Robot navigation method and system based on D2BC-TD3
CN115494859A (en) An Autonomous Obstacle Avoidance Method for Unmanned Aerial Vehicle Swarm Based on Transfer Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant