CN114620218B

CN114620218B - Unmanned aerial vehicle landing method based on improved YOLOv algorithm

Info

Publication number: CN114620218B
Application number: CN202210325905.0A
Authority: CN
Inventors: 杨凯军; 邹鹏; 梁晨; 欧阳凌丛; 张志雄
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2024-10-29
Anticipated expiration: 2042-03-30
Also published as: CN114620218A

Abstract

The invention provides an unmanned aerial vehicle landing method based on an improved YOLOv algorithm, wherein a camera module shoots a video of an environment where an unmanned aerial vehicle is located and transmits the video to an onboard computer through a data graph transmission module; the onboard computer uses YOLOv s algorithm, based on a Mosaic data enhancement model, and uses a self-adaptive anchor frame calculation and a self-adaptive picture scaling method to detect whether the real-time video stream corresponds to a flat land, a sloping field, a pit or a stair; if the result is a landing environment, the airborne computer is combined with the ground station and the flight control board to generate a yaw angle for controlling the unmanned aerial vehicle to land; if the detection result is not the landing environment, repeating the detection until the detection result is the landing environment; the ground station transmits the yaw angle into the flight control board through the data graph transmission module, the flight control board controls the unmanned aerial vehicle to land, after the unmanned aerial vehicle is far away from the ground, the unmanned aerial vehicle naturally lands if the unmanned aerial vehicle is flat, and the flight control board drives the electric push rod to extend outwards if the unmanned aerial vehicle is sloping fields, pits or stairs, so that the unmanned aerial vehicle is assisted to land in a corresponding landing environment.

Description

A drone landing method based on improved YOLOv5 algorithm

技术领域Technical Field

本发明涉及计算机视觉与无人机控制技术领域，具体为一种基于改进YOLOv5算法的无人机降落方法。The present invention relates to the technical field of computer vision and unmanned aerial vehicle control, and in particular to a method for landing an unmanned aerial vehicle based on an improved YOLOv5 algorithm.

背景技术Background Art

无人机是一种结合遥控、接收设备并由程序控制的空中机器人，其以机动性强、小巧灵活、任务周期短等优点活跃并应用于各个领域中。A drone is an aerial robot that combines remote control and receiving equipment and is controlled by a program. It is active and applied in various fields due to its strong maneuverability, compactness, flexibility, and short mission cycle.

近几年随着计算机技术的不断发展，计算机视觉等诸多技术也得到了多方面的应用，例如在日常商场进出门口，实时检测人们是否戴口罩、高速路口、停车场车牌的自动识别等等。In recent years, with the continuous development of computer technology, many technologies such as computer vision have also been applied in many aspects. For example, in daily shopping mall entrances and exits, real-time detection of whether people are wearing masks, automatic recognition of license plates at highway intersections and parking lots, etc.

目标的实时检测与识别的难度主要在于特征信息的提取以及信息的实时处理，而YOLO算法的提出则很好的解决了这一问题，YOLO算法发展至今，已出现很多版本，目前检测效果最好的为YOLOv5系列，它是一种单阶段目标检测算法，相比于前一个YOLOv4版本，其实时识别精度与速度都得到了极大的提升。The difficulty of real-time detection and recognition of targets lies mainly in the extraction of feature information and real-time processing of information. The YOLO algorithm has solved this problem very well. Since its development, many versions of the YOLO algorithm have appeared. Currently, the best detection effect is the YOLOv5 series, which is a single-stage target detection algorithm. Compared with the previous YOLOv4 version, its real-time recognition accuracy and speed have been greatly improved.

目标的检测与识别和无人机的结合在实际场景中有一些应用，但是目前应用于无人机降落阶段的计算机视觉并不是很多，且效果一般，也没有及时识别并使无人机平稳安全降落到地面的措施，促使无人机可以快速并且稳定的下降。The combination of target detection and recognition and drones has some applications in actual scenarios, but currently there are not many computer visions applied to the landing phase of drones, and the effect is average. There are also no measures to timely identify and enable drones to land smoothly and safely on the ground, so that drones can descend quickly and stably.

发明内容Summary of the invention

针对现有技术中存在的问题，本发明提供一种基于改进YOLOv5算法的无人机降落方法，能及时识别无人机降落的地面并使无人机平稳安全降落到地面。In view of the problems existing in the prior art, the present invention provides a UAV landing method based on an improved YOLOv5 algorithm, which can timely identify the landing ground of the UAV and enable the UAV to land smoothly and safely on the ground.

为了实现上述目的，本发明采用以下的技术方案：In order to achieve the above object, the present invention adopts the following technical solutions:

一种基于改进YOLOv5算法的无人机降落方法，所述无人机的无人机机架底部安装有电动推杆，无人机机架上安装有摄像头模块、数据图形传输模块、机载电脑和飞控板，包括如下步骤：A method for landing a drone based on an improved YOLOv5 algorithm, wherein an electric push rod is installed at the bottom of a drone frame of the drone, and a camera module, a data graphics transmission module, an onboard computer and a flight control board are installed on the drone frame, comprising the following steps:

步骤1，摄像头模块拍摄无人机所处环境的视频，之后通过数据图形传输模块传输至机载电脑；Step 1, the camera module captures the video of the drone's environment, which is then transmitted to the onboard computer via the data graphics transmission module;

步骤2，机载电脑采用YOLOv5s算法，基于Mosaic数据增强模型，并运用自适应锚框计算和自适应图片缩放方法检测数据图形传输模块传入的实时视频流是否对应平地环境、坡地环境、坑地环境或楼梯环境；Step 2: The onboard computer uses the YOLOv5s algorithm, based on the Mosaic data enhancement model, and uses the adaptive anchor frame calculation and adaptive image scaling methods to detect whether the real-time video stream transmitted by the data graphics transmission module corresponds to a flat environment, a slope environment, a pit environment, or a stair environment;

步骤3，若步骤2的检测结果是无人机的降落环境，则机载电脑结合地面站与飞控板生成用于控制无人机降落的偏航角；若步骤2的检测结果不是无人机的降落环境，则重复进行步骤1和步骤2，直到检测结果是无人机的降落环境为止，机载电脑结合地面站与飞控板生成用于控制无人机降落的偏航角；Step 3: If the detection result of step 2 is the landing environment of the drone, the onboard computer combines the ground station and the flight control board to generate a yaw angle for controlling the landing of the drone; if the detection result of step 2 is not the landing environment of the drone, repeat steps 1 and 2 until the detection result is the landing environment of the drone, and the onboard computer combines the ground station and the flight control board to generate a yaw angle for controlling the landing of the drone;

步骤4，地面站将步骤3得到的偏航角通过数据图形传输模块传入飞控板，飞控板控制无人机降落，待无人机距离地面1～2m后，若降落环境为平地，无人机自然降落，若降落环境为坡地、坑地或楼梯时，飞控板驱动电动推杆外伸，协助无人机在对应降落环境降落。Step 4: The ground station transmits the yaw angle obtained in step 3 to the flight control board through the data graphic transmission module. The flight control board controls the UAV to land. When the UAV is 1 to 2 meters away from the ground, if the landing environment is flat, the UAV will land naturally. If the landing environment is a slope, pit or stairs, the flight control board drives the electric push rod to extend to assist the UAV to land in the corresponding landing environment.

优选的，所述的步骤2包括如下具体的分步骤：Preferably, the step 2 includes the following specific sub-steps:

步骤2a，先将数据图形传输模块传入的实时视频流提取为一帧一帧的图片，再将这些图片标注后分成平地环境、坡地环境、坑地环境和楼梯环境，最后进行转化、格式调整，得到统一尺寸的标注图片文件；Step 2a, first extract the real-time video stream transmitted by the data graphic transmission module into frames of pictures, then mark these pictures and divide them into flat environment, slope environment, pit environment and stair environment, and finally convert and adjust the format to obtain a uniform-sized marked picture file;

步骤2b，将步骤2a所述的标注图片文件送入YOLOv5s的输入端，依次进行Focus操作和Ghost操作，得到融合后的图片，将融合后的图片进行信息的融合和特征的提取，输出3种特征预测图；Step 2b, sending the annotated image file described in step 2a to the input end of YOLOv5s, performing Focus operation and Ghost operation in sequence to obtain a fused image, fusing information and extracting features of the fused image, and outputting three feature prediction images;

步骤2c，将步骤2b所述的特征预测图进行非极大值抑制操作，得到与真实框概率最接近的预测框，得到训练好的网络模型；Step 2c, performing a non-maximum suppression operation on the feature prediction map described in step 2b to obtain a prediction box that is closest to the true box probability, and obtain a trained network model;

步骤2d，机载电脑先将数据图形传输模块传入的实时视频流提取为一帧一帧的图片，再将这些图片标注后进行转化、格式调整，最后将得到的统一尺寸的标注图片文件送入步骤2c训练好的网络模型中，判断该实时视频流是否对应平地环境、坡地环境、坑地环境或楼梯环境。In step 2d, the onboard computer first extracts the real-time video stream transmitted by the data graphics transmission module into frames of images, then annotates these images and converts and adjusts the format. Finally, the obtained annotated image files of uniform size are sent to the network model trained in step 2c to determine whether the real-time video stream corresponds to a flat environment, a slope environment, a pit environment or a stair environment.

进一步，所述的步骤2c在进行非极大值抑制操作时，采用CIoU_Loss作为边界框的损失函数，CIoU_Loss如下式所示：Furthermore, in step 2c, when performing the non-maximum suppression operation, CIoU_Loss is used as the loss function of the bounding box, and CIoU_Loss is shown in the following formula:

其中，IoU为预测框和真实框的交集与预测框和真实框的并集的比值，b,b^gt分别代表预测框和真实框的中心点，ρ为预测框中心点和真实框中心点间的欧式距离，c为同时包含预测框和真实框的最小闭包区域的对角线距离，α为权重函数，v是影响因子。Among them, IoU is the ratio of the intersection of the predicted box and the true box to the union of the predicted box and the true box, b, ^bgt represent the center points of the predicted box and the true box respectively, ρ is the Euclidean distance between the center points of the predicted box and the true box, c is the diagonal distance of the minimum closure area that contains both the predicted box and the true box, α is the weight function, and v is the influencing factor.

再进一步，所述的步骤2b依次用FPN结构和PAN结构对融合后的图片进行信息的融合和特征的提取，输出3种特征预测图。Furthermore, in step 2b, the FPN structure and the PAN structure are used to fuse information and extract features from the fused image in turn, and three feature prediction maps are output.

优选的，所述的摄像头模块为RER-USB4K02AF-V100。Preferably, the camera module is RER-USB4K02AF-V100.

优选的，所述的摄像头模块通过数据图形传输模块连接在机载电脑的输入端，机载电脑的输出端连接在飞控板的输入端，数据图形传输模块的一端连接在飞控板的输入端，数据图形传输模块的型号为HM30。Preferably, the camera module is connected to the input end of the onboard computer through a data graphic transmission module, the output end of the onboard computer is connected to the input end of the flight control board, one end of the data graphic transmission module is connected to the input end of the flight control board, and the model of the data graphic transmission module is HM30.

优选的，还包括航模电池，航模电池通过5V2A BEC供电型电流计分别与机载电脑的供电端和飞控板的供电端连接。Preferably, it also includes a model aircraft battery, which is connected to the power supply end of the onboard computer and the power supply end of the flight control board respectively through a 5V2A BEC power supply ammeter.

优选的，还包括连接在飞控板输入端的GPS定位模块，GPS定位模块用于将采集到的无人机实时位置信息通过通讯模块传输至地面站。Preferably, it also includes a GPS positioning module connected to the input end of the flight control board, and the GPS positioning module is used to transmit the collected real-time position information of the drone to the ground station through the communication module.

优选的，所述的电动推杆与飞控板的输出端连接，电动推杆为3～5个。Preferably, the electric push rod is connected to the output end of the flight control board, and there are 3 to 5 electric push rods.

优选的，所述飞控板的型号为Pixhawk。Preferably, the model of the flight control board is Pixhawk.

与现有技术相比，本发明具有以下有益的技术效果：Compared with the prior art, the present invention has the following beneficial technical effects:

本发明一种基于改进YOLOv5算法的无人机降落方法，摄像头模块可拍摄无人机实时所处环境的视频，之后通过数据图形传输模块传输至机载电脑，机载电脑可检测出实时视频流是否对应平地、坡地、坑地或楼梯环境，若结果是无人机的降落环境，则机载电脑可结合地面站与飞控板生成用于控制无人机降落的偏航角，若结果不是无人机的降落环境，则无人机可继续飞行，通过摄像头模块、数据图形传输模块和机载电脑的这种配合可找到无人机的降落环境，进而生成需要的偏航角，地面站可将偏航角通过数据图形传输模块传入飞控板，飞控板再控制无人机降落，待无人机接近地面，此时需要判断地面环境，若降落环境为平地，无人机可自然降落，若为坡地、坑地或楼梯时，飞控板此时驱动电动推杆外伸，可协助无人机在对应降落环境降落，使无人机平稳安全降落到地面，增强了无人机的智能性。本发明中的结构搭建简单，移植性强，解决了无人机智能降落的问题，可以极大地提高无人机相关科研人员的工作效率。The invention discloses a method for landing a drone based on an improved YOLOv5 algorithm. A camera module can shoot a video of the real-time environment of the drone, and then transmit the video to an onboard computer through a data graphic transmission module. The onboard computer can detect whether the real-time video stream corresponds to a flat land, a sloped land, a pit or a stair environment. If the result is the landing environment of the drone, the onboard computer can generate a yaw angle for controlling the landing of the drone in combination with a ground station and a flight control board. If the result is not the landing environment of the drone, the drone can continue to fly. The landing environment of the drone can be found through the cooperation of the camera module, the data graphic transmission module and the onboard computer, and the required yaw angle can be generated. The ground station can transmit the yaw angle to the flight control board through the data graphic transmission module, and the flight control board controls the landing of the drone. When the drone approaches the ground, it is necessary to judge the ground environment at this time. If the landing environment is a flat land, the drone can land naturally. If it is a sloped land, a pit or a stair, the flight control board drives an electric push rod to extend outward at this time, so as to assist the drone to land in the corresponding landing environment, so that the drone can land on the ground stably and safely, thereby enhancing the intelligence of the drone. The structure of the present invention is simple to build and has strong portability, which solves the problem of intelligent landing of drones and can greatly improve the work efficiency of drone-related scientific researchers.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明所述的四旋翼无人机工作原理框图。FIG1 is a block diagram of the working principle of the quad-rotor UAV according to the present invention.

图2是本发明所述的四旋翼无人机工作流程图。FIG. 2 is a flowchart of the working process of the quadrotor drone described in the present invention.

图3是本发明所述的四旋翼无人机结构示意图。FIG3 is a schematic diagram of the structure of the quad-rotor UAV according to the present invention.

图中：电动推杆6、直流无刷电机7、桨叶8、无人机机架9和楼梯10。In the figure: an electric push rod 6, a brushless DC motor 7, propeller blades 8, a drone frame 9 and stairs 10.

具体实施方式DETAILED DESCRIPTION

下面结合具体的实施例对本发明做进一步的详细说明，所述是对本发明的解释而不是限定。The present invention is further described in detail below in conjunction with specific embodiments, which are intended to explain the present invention rather than to limit it.

本发明一种基于改进YOLOv5算法的四旋翼无人机降落方法，参阅图3，该无人机主要包括无人机机架9，无人机机架9底部安装有电动推杆6，无人机机架9上安装有摄像头模块、GPS定位模块、数据图形传输模块、机载电脑、飞控板和航模电池，无人机机架9的四个旋翼处安装有直流无刷电机7，直流无刷电机7位于上端的输出端安装有桨叶8。摄像头模块具体为RER-USB4K02AF-V100，数据图形传输模块型号具体为HM30，飞控板的型号为Pixhawk。The present invention discloses a landing method for a quad-rotor drone based on an improved YOLOv5 algorithm. Referring to FIG3 , the drone mainly includes a drone frame 9, an electric push rod 6 is installed at the bottom of the drone frame 9, a camera module, a GPS positioning module, a data graphic transmission module, an onboard computer, a flight control board and a model aircraft battery are installed on the drone frame 9, and a DC brushless motor 7 is installed at the four rotors of the drone frame 9, and a blade 8 is installed at the output end of the DC brushless motor 7 at the upper end. The camera module is specifically RER-USB4K02AF-V100, the data graphic transmission module model is specifically HM30, and the flight control board model is Pixhawk.

参阅图1，摄像头模块通过数据图形传输模块连接在机载电脑的输入端，机载电脑的输出端连接在飞控板的输入端，连接时采用USB数据线，数据图形传输模块的另一端同时通过杜邦线连接在飞控板的输入端。摄像头模块实时采集无人机所处环境的视频流，并将视频流通过数据图形传输模块传输给机载电脑。Referring to Figure 1, the camera module is connected to the input end of the onboard computer through the data graphic transmission module, and the output end of the onboard computer is connected to the input end of the flight control board. The connection is made using a USB data cable, and the other end of the data graphic transmission module is simultaneously connected to the input end of the flight control board through a DuPont line. The camera module collects the video stream of the drone's environment in real time, and transmits the video stream to the onboard computer through the data graphic transmission module.

航模电池通过5V2A BEC供电型电流计分别与机载电脑的供电端和飞控板的供电端连接，连接时选用USB数据线，航模电池用于给机载电脑和飞控板供电。GPS定位模块通过杜邦线连接在飞控板的输入端，可实时采集四旋翼无人机的位置信息，并通过通讯模块传输至飞控板的地面站软件上。电动推杆6、直流无刷电机7均通过杜邦线分别与飞控板的输出端相连接。The aircraft model battery is connected to the power supply end of the onboard computer and the power supply end of the flight control board through a 5V2A BEC power supply ammeter. A USB data cable is used for connection. The aircraft model battery is used to power the onboard computer and the flight control board. The GPS positioning module is connected to the input end of the flight control board through a DuPont line, which can collect the position information of the quadcopter in real time and transmit it to the ground station software of the flight control board through the communication module. The electric push rod 6 and the brushless DC motor 7 are respectively connected to the output end of the flight control board through DuPont lines.

参阅图2，飞控板配合无人机地面站软件生成控制无人机的横滚、俯仰、偏航角，驱动直流无刷电机7以不同速度和方向旋转，从而控制无人机以自由姿态飞行。机载电脑内部安装有YOLOv5算法的运行环境以及操作系统，通过运行改进后的YOLOv5算法，摄像头模块拍摄到无人机所处环境的视频后，通过数据图形传输模块送入至机载电脑，机载电脑可以实时快速的检测由摄像头模块拍摄的实时视频流是否为经过训练后的4种降落环境，若是，则机载电脑结合地面站以及飞控板的控制算法估算出生成用于控制四旋翼无人机降落的偏航角，若不是则继续寻找符合降落特征的降落环境，直至找到为止，之后地面站将偏航角通过数据图形传输模块传回给飞控板，进而控制直流无刷电机7，从而控制四旋翼无人机的降落，待四旋翼无人机降落至1～2m后，若降落环境为坡地、坑地或楼梯时，飞控板驱动无人机底部的若干支电动推杆6，图3仅仅展示了3个，电动推杆6为现有结构，较细的支撑杆插接在较宽的支撑杆中，将较细的支撑杆外伸不同的长度，协助无人机在不同地面进行自主降落，若降落环境为平地，则无需启动电动推杆6，四旋翼无人机自然降落即可。Refer to Figure 2. The flight control board cooperates with the drone ground station software to generate and control the roll, pitch, and yaw angles of the drone, and drives the DC brushless motor 7 to rotate at different speeds and directions, thereby controlling the drone to fly in a free posture. The onboard computer is installed with the operating environment and operating system of the YOLOv5 algorithm. By running the improved YOLOv5 algorithm, the camera module captures the video of the drone's environment and sends it to the onboard computer through the data graphics transmission module. The onboard computer can quickly detect in real time whether the real-time video stream captured by the camera module is the four trained landing environments. If so, the onboard computer combines the ground station and the control algorithm of the flight control board to estimate the yaw angle used to control the landing of the quadcopter. If not, it continues to search for a landing environment that meets the landing characteristics until it is found. Then the ground The station transmits the yaw angle back to the flight control board through the data graphic transmission module, and then controls the DC brushless motor 7, thereby controlling the landing of the quadcopter. After the quadcopter lands to 1-2m, if the landing environment is a slope, a pit or stairs, the flight control board drives several electric push rods 6 at the bottom of the drone. Figure 3 only shows 3 of them. The electric push rod 6 is an existing structure. The thinner support rod is inserted into the wider support rod. The thinner support rod is extended to different lengths to assist the drone to land autonomously on different grounds. If the landing environment is flat, there is no need to start the electric push rod 6, and the quadcopter can land naturally.

由于将摄像头模块拍摄的实时视频流训练为机载电脑可以识别的4种降落环境过程较为复杂，所以之后单独具体进行说明。Since the process of training the real-time video stream captured by the camera module into four landing environments that can be recognized by the onboard computer is relatively complicated, it will be explained in detail later.

本发明一种基于改进YOLOv5算法的四旋翼无人机降落方法，在地面环境检测时，即将摄像头模块拍摄的实时视频流训练为机载电脑可以识别的4种降落环境，包括以下步骤：The present invention provides a landing method for a quad-rotor drone based on an improved YOLOv5 algorithm. When detecting the ground environment, the real-time video stream captured by the camera module is trained into four landing environments that can be recognized by an onboard computer, including the following steps:

步骤一：改进YOLOv5算法的实时检测模型在基于pytorch的深度学习框架下运行，首先将摄像头模块拍摄的视频流提取为一帧一帧的图片，作为实验数据的测试集以及训练集，之后利用LabelImage进行图片数据集的标注，分成平地、坡地、坑地、楼梯4种不同的地面环境，标注为Pascal VOC的标签格式，保存为xml格式的文件。Step 1: Improve the real-time detection model of the YOLOv5 algorithm and run it under the deep learning framework based on pytorch. First, extract the video stream captured by the camera module as frames of pictures as the test set and training set of the experimental data. Then use LabelImage to annotate the picture data set and divide it into four different ground environments: flat land, slope, pit, and stairs. Annotate them in the Pascal VOC label format and save them as xml format files.

步骤二：将保存好的xml格式的标注文件转化为txt格式的标注文件，并且按照1：4的数量比例划分为测试集与训练集。Step 2: Convert the saved annotation file in XML format into an annotation file in TXT format, and divide it into a test set and a training set in a ratio of 1:4.

步骤三：该改进的YOLOv5算法为单阶段检测算法，现有的YOLOv5检测算法一共分为4种大小不一的版本，分别为YOLOv5s、YOLOv5m、YOLOv5l、YOLOv5x。由于要求实时检测的速度快，延时小，故选择YOLOv5s模型，该模型网络最小、速度最快、所占内存也最小，具有小而精的特点。Step 3: The improved YOLOv5 algorithm is a single-stage detection algorithm. The existing YOLOv5 detection algorithm is divided into 4 versions of different sizes, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Since the real-time detection is required to be fast and have a small delay, the YOLOv5s model is selected. This model has the smallest network, the fastest speed, and the smallest memory usage, and is small and precise.

为了适应上述的标注文件，在之后的步骤中需要修改模型的相关代码参数，运行后即可进入到训练代码中进行之后的权重的预训练以及正式训练。预训练以及正式训练过程中，模型选用Mosaic数据增强，提升模型的训练速度以及网络精度，并运用自适应锚框计算以及自适应图片缩放方法。In order to adapt to the above annotation files, the relevant code parameters of the model need to be modified in the following steps. After running, you can enter the training code to perform subsequent weight pre-training and formal training. During the pre-training and formal training process, the model uses Mosaic data enhancement to improve the model's training speed and network accuracy, and uses adaptive anchor box calculation and adaptive image scaling methods.

该步骤的主要流程为将每个txt格式的标注图片文件进行格式调整，统一成640*640*3的标准尺寸，并送入YOLOv5s的输入端。The main process of this step is to adjust the format of each labeled image file in txt format, unify it into a standard size of 640*640*3, and send it to the input end of YOLOv5s.

步骤四：在基准网络阶段，主要提取一些通用的网络特征表示。现有的YOLOv5算法主要使用的是Focus结构和CSP结构，Focus结构主要是通过Slice操作进行输入图片的裁剪，依次经过Slice操作、Concat操作后输出320*320*12的特征映射，再经过一个通道数为32的卷积层，输出为320*320*32的特征映射。CSP结构则主要是为了增强卷积神经网络的学习能力，缓解大量需要推理计算的问题，降低内存成本以及计算的瓶颈。Step 4: In the benchmark network stage, some common network feature representations are mainly extracted. The existing YOLOv5 algorithm mainly uses the Focus structure and the CSP structure. The Focus structure mainly crops the input image through the Slice operation, and then outputs a 320*320*12 feature map after the Slice operation and the Concat operation, and then passes through a convolution layer with 32 channels to output a 320*320*32 feature map. The CSP structure is mainly to enhance the learning ability of the convolutional neural network, alleviate the problem of a large number of inference calculations, and reduce memory costs and computing bottlenecks.

本发明选用Ghost模块代替CSP结构，改进后的网络骨干采用由Ghost模块组成的Ghost瓶颈，替代了原有的基准网络阶段的CSP中的三个瓶颈结构，并采用Ghost模块替代传统的卷积层，通过修改卷积方法提高权值的检测速度。Ghost模块分为三个步骤：卷积、Ghost生成和特征地图拼接。首先使用现有的卷积法得到特征映射，然后对每个通道的特征映射进行Φ运算生成Ghost特征图，Φ运算类似于3×3卷积，最后将第一步得到的特征映射和第二步得到的Ghost特征图连接起来，得到最终的结果输出，即完成Ghost模块的工作原理说明，之后便进入步骤五。The present invention uses the Ghost module to replace the CSP structure. The improved network backbone adopts the Ghost bottleneck composed of the Ghost module to replace the three bottleneck structures in the CSP of the original benchmark network stage, and the Ghost module is used to replace the traditional convolution layer. The detection speed of the weight is improved by modifying the convolution method. The Ghost module is divided into three steps: convolution, Ghost generation and feature map splicing. First, the feature map is obtained using the existing convolution method, and then the feature map of each channel is subjected to Φ operation to generate a Ghost feature map. The Φ operation is similar to a 3×3 convolution. Finally, the feature map obtained in the first step and the Ghost feature map obtained in the second step are connected to obtain the final result output, that is, the working principle description of the Ghost module is completed, and then step five is entered.

该步骤的主要流程为处理YOLOv5s的输入端的送进来的图片，即对其进行Focus操作以及Ghost操作，融合了图片的特征以及提高了网络的学习能力，将融合后的图片输出到下一阶段。The main process of this step is to process the image sent to the input end of YOLOv5s, that is, to perform Focus operation and Ghost operation on it, integrate the features of the image and improve the learning ability of the network, and output the integrated image to the next stage.

步骤五：颈部网络阶段依次采用FPN结构和PAN结构。FPN结构是自顶向下，将高层的强语义特征传递下来，对FPN的整个金字塔结构进行增强，但只增强了语义信息，对定位信息没有传递。PAN结构就是针对这一点，在FPN结构的后面添加一个自底向上的金字塔，对FPN结构补充，将低层的强定位特征传递上去。Step 5: The FPN structure and PAN structure are used in the neck network stage. The FPN structure is top-down, passing down the strong semantic features of the high-level layers, enhancing the entire pyramid structure of the FPN, but only enhancing the semantic information, without passing the positioning information. The PAN structure is aimed at this point, adding a bottom-up pyramid behind the FPN structure to supplement the FPN structure and pass up the strong positioning features of the low-level layers.

该步骤的主要流程为对基准网络阶段的输出图片用FPN结构和PAN结构进行图片信息的融合以及特征的提取，输出为3种不同大小的特征预测图。The main process of this step is to use the FPN structure and PAN structure to fuse the image information and extract the features of the output image of the baseline network stage, and output three feature prediction maps of different sizes.

步骤六：在YOLOv5s的输出端，主要是为了选取概率最大的预测框作为实际输出，因此会涉及到损失函数的选取，该部分采用CIoU_Loss作为边界框的损失函数，而要计算CIoU_Loss，需要先计算GIoU_Loss，其计算方法如下：Step 6: At the output end of YOLOv5s, the main purpose is to select the prediction box with the highest probability as the actual output, so it involves the selection of the loss function. This part uses CIoU_Loss as the loss function of the bounding box. To calculate CIoU_Loss, you need to calculate GIoU_Loss first. The calculation method is as follows:

先计算预测框和真实框的最小闭包区域面积，也就是同时包含了预测框和真实框的最小框的面积，然后计算出IoU，再计算预测框和真实框的最小闭包区域中不属于预测框和真实框的区域占预测框和真实框的最小闭包区域的比重，最后用IoU减去这个比重得到GIoU_Loss。First, calculate the minimum enclosed area of the predicted box and the real box, that is, the area of the minimum box that contains both the predicted box and the real box, then calculate IoU, and then calculate the proportion of the area that does not belong to the predicted box and the real box in the minimum enclosed area of the predicted box and the real box to the minimum enclosed area of the predicted box and the real box, and finally subtract this proportion from IoU to get GIoU_Loss.

上式中：GIoU_Loss为边界框的损失函数(非YOLOv5的边界框损失函数)，IoU为预测框和真实框的交集与预测框和真实框的并集的比值，A_c为预测框和真实框的最小闭包区域面积，U为预测框和真实框的最小闭包区域中属于预测框和真实框的面积。In the above formula: GIoU_Loss is the loss function of the bounding box (not the bounding box loss function of YOLOv5), IoU is the ratio of the intersection of the predicted box and the true box to the union of the predicted box and the true box, _Ac is the minimum enclosed area of the predicted box and the true box, and U is the area of the predicted box and the true box in the minimum enclosed area of the predicted box and the true box.

DIoU_Loss要比GIoU_Loss更加符合真实框回归的机制，其将真实框与预测框之间的距离，重叠率以及尺度都考虑进去，使得真实框回归变得更加稳定。其计算方法如下：DIoU_Loss is more in line with the mechanism of real box regression than GIoU_Loss. It takes into account the distance, overlap rate and scale between the real box and the predicted box, making the real box regression more stable. Its calculation method is as follows:

上式中：b,b^gt分别代表了预测框和真实框的中心点，ρ为这两个中心点间的欧式距离，c为能够同时包含预测框和真实框的最小闭包区域的对角线距离。而YOLOv5的边界框损失函数CIoU_Loss定义如下：In the above formula: b, b ^gt represent the center points of the predicted box and the real box respectively, ρ is the Euclidean distance between the two center points, and c is the diagonal distance of the minimum closure area that can contain both the predicted box and the real box. The bounding box loss function CIoU_Loss of YOLOv5 is defined as follows:

其中的α为权重函数，v是影响因子。Here α is the weight function and v is the impact factor.

针对预测框的筛选工作，进行NMS(即非极大值抑制)操作，将预测概率小的预测框剔除掉，留下预测概率大的预测框，一步一步重复，直至留下预测概率最大的预测框，这是因为CIoU_Loss中包含影响因子v，涉及真实框的信息。而YOLOv5中采用加权非极大值抑制的方式，加权非极大值抑制与非极大值抑制相比，是在进行剔除预测框的过程中，根据网络预测的概率进行加权，得到新的预测框，把该预测框作为最终的预测框。For the screening of prediction boxes, NMS (non-maximum suppression) operation is performed to remove the prediction boxes with small prediction probabilities and leave the prediction boxes with large prediction probabilities. This is repeated step by step until the prediction box with the largest prediction probability is left. This is because CIoU_Loss contains the influence factor v, which involves the information of the real box. YOLOv5 uses weighted non-maximum suppression. Compared with non-maximum suppression, weighted non-maximum suppression is to weight the prediction boxes according to the probability predicted by the network during the process of removing the prediction boxes, and obtain a new prediction box, which is used as the final prediction box.

因此，该部分的主要流程为对颈部网络阶段输出的3种不同大小的特征预测图进行加权非极大值抑制，筛选与真实框概率最接近的预测框，同时将其作为最后的结果输出。至此，训练部分的所有流程全部结束。Therefore, the main process of this part is to perform weighted non-maximum suppression on the three different sizes of feature prediction maps output by the neck network stage, select the prediction box with the closest probability to the true box, and output it as the final result. At this point, all processes of the training part are completed.

步骤七：在实际测试时，是将未知的图片送入训练好的模型中，模型判断出该图片的类型，具体的过程除不涉及真实框外，测试的其余步骤均与上述训练的步骤一到步骤六相同，证明训练好的模型可靠，可以在机载电脑中运行。Step 7: In the actual test, the unknown image is sent to the trained model, and the model determines the type of the image. The specific process does not involve the real frame. The rest of the test steps are the same as steps 1 to 6 of the above training, proving that the trained model is reliable and can run on the onboard computer.

因此机载电脑便先将数据图形传输模块传入的实时视频流提取为一帧一帧的图片，再将这些图片按照上述过程标注后进行转化、格式调整，最后将得到的统一尺寸的标注图片文件送入训练好的网络模型中，准确判断出该实时视频流是否对应平地、坡地、坑地或楼梯。Therefore, the onboard computer first extracts the real-time video stream transmitted by the data graphics transmission module into frames of pictures, and then annotates these pictures according to the above process for conversion and format adjustment. Finally, the obtained annotated image files of uniform size are sent to the trained network model to accurately determine whether the real-time video stream corresponds to flat ground, sloped ground, pit or stairs.

Claims

1. A drone landing method based on an improved YOLOv5 algorithm, characterized in that an electric push rod is installed at the bottom of the drone frame of the drone, and a camera module, a data graphics transmission module, an airborne computer and a flight control board are installed on the drone frame, comprising the following steps:

Step 1, the camera module captures the video of the drone's environment, which is then transmitted to the onboard computer via the data graphics transmission module;

Step 2: The onboard computer uses the YOLOv5s algorithm, based on the Mosaic data enhancement model, and uses the adaptive anchor frame calculation and adaptive image scaling methods to detect whether the real-time video stream transmitted by the data graphics transmission module corresponds to a flat environment, a slope environment, a pit environment, or a stair environment;

Step 3: If the detection result of step 2 is the landing environment of the drone, the onboard computer combines the ground station and the flight control board to generate a yaw angle for controlling the landing of the drone; if the detection result of step 2 is not the landing environment of the drone, repeat steps 1 and 2 until the detection result is the landing environment of the drone, and the onboard computer combines the ground station and the flight control board to generate a yaw angle for controlling the landing of the drone;

Step 4: The ground station transmits the yaw angle obtained in step 3 to the flight control board through the data graphic transmission module. The flight control board controls the drone to land. When the drone is 1 to 2 meters away from the ground, if the landing environment is flat, the drone will land naturally. If the landing environment is a slope, pit or stairs, the flight control board drives the electric push rod to extend outward to assist the drone to land in the corresponding landing environment.

The step 2 includes the following specific sub-steps:

Step 2a, first extract the real-time video stream transmitted by the data graphic transmission module into frames of pictures, then mark these pictures and divide them into flat environment, slope environment, pit environment and stair environment, and finally convert and adjust the format to obtain a uniform-sized marked picture file;

Step 2b, sending the annotated image file described in step 2a to the input end of YOLOv5s, performing Focus operation and Ghost operation in sequence to obtain a fused image, fusing information and extracting features of the fused image, and outputting three feature prediction images;

Step 2c, performing a non-maximum suppression operation on the feature prediction map described in step 2b to obtain a prediction box that is closest to the true box probability, and obtain a trained network model;

Step 2d, the onboard computer first extracts the real-time video stream transmitted by the data graphic transmission module into frames of pictures, then annotates the pictures, converts them, and adjusts their formats, and finally sends the obtained annotated picture files of uniform size to the network model trained in step 2c to determine whether the real-time video stream corresponds to a flat environment, a slope environment, a pit environment, or a stair environment;

In step 2c, when performing the non-maximum suppression operation, CIoU_Loss is used as the loss function of the bounding box. CIoU_Loss is shown in the following formula:

Among them, IoU is the ratio of the intersection of the predicted box and the true box to the union of the predicted box and the true box, b, ^bgt represent the center points of the predicted box and the true box respectively, ρ is the Euclidean distance between the center points of the predicted box and the true box, c is the diagonal distance of the minimum closure area that contains both the predicted box and the true box, α is the weight function, and v is the influencing factor.

2. According to the unmanned aerial vehicle landing method based on the improved YOLOv5 algorithm of claim 1, it is characterized in that the step 2b uses the FPN structure and the PAN structure to fuse the information and extract the features of the fused picture in turn, and outputs three feature prediction maps.

3. The unmanned aerial vehicle landing method based on improved YOLOv5 algorithm according to claim 1, wherein the camera module is RER-USB4K02AF-V100.

4. The unmanned aerial vehicle landing method based on the improved YOLOv5 algorithm according to claim 1 is characterized in that the camera module is connected to the input end of the onboard computer through a data graphic transmission module, the output end of the onboard computer is connected to the input end of the flight control board, one end of the data graphic transmission module is connected to the input end of the flight control board, and the model of the data graphic transmission module is HM30.

5. The unmanned aerial vehicle landing method based on improved YOLOv5 algorithm according to claim 1, is characterized in that, it also includes a model aircraft battery, and the model aircraft battery is connected to the power supply end of the onboard computer and the power supply end of the flight control board respectively through a 5V2A BEC power supply type ammeter.

6. The unmanned aerial vehicle landing method based on the improved YOLOv5 algorithm according to claim 1 is characterized in that it also includes a GPS positioning module connected to the input end of the flight control board, and the GPS positioning module is used to transmit the collected real-time position information of the unmanned aerial vehicle to the ground station through the communication module.

7. The drone landing method based on the improved YOLOv5 algorithm according to claim 1 is characterized in that the electric push rod is connected to the output end of the flight control board, and there are 3 to 5 electric push rods.

8. The unmanned aerial vehicle landing method based on the improved YOLOv5 algorithm according to claim 1, wherein the model of the flight control board is Pixhawk.