CN112464930A - Target detection network construction method, target detection method, device and storage medium

Info

Publication number: CN112464930A
Application number: CN201910857984.8A
Authority: CN
Inventors: 徐航; 黎嘉伟; 李震国; 张维; 梁小丹
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-09-09
Filing date: 2019-09-09
Publication date: 2021-03-09

Abstract

The present application provides a method for constructing a target detection network, a target detection method, an apparatus, and a computer-readable storage medium, and relates to the field of artificial intelligence, specifically computer vision. The construction method includes: determining a search space of the target detection network; determining an initial network architecture of the target detection network according to the search space; and iteratively updating the initial network architecture according to the search space until a target detection network that meets preset requirements is obtained. The optional connection relationships of the feature fusion layer in the target detection network include, for any two adjacent layers of the neural network in the feature fusion layer, a connection between any node of one layer and any node of the other layer. The present application can reduce the complexity of the target detection network.

Description

Target detection network construction method, target detection method, device and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and more particularly to a method for constructing a target detection network, a target detection method, an apparatus, and a storage medium.

Background

Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions. Research in the field includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.

With the rapid development of artificial intelligence technology, neural networks (for example, deep neural networks) have in recent years achieved great success in processing and analyzing media signals such as images, video, and speech. A well-performing neural network usually has a carefully designed network structure, which requires highly skilled and experienced human experts to spend considerable effort constructing it. To construct neural networks more effectively, neural architecture search (NAS) has been proposed: the neural network structure is searched for automatically, so that a structure with excellent performance is obtained.

Target detection (object detection) is one of the basic tasks in the field of computer vision. It generally consists of locating target objects in an image and assigning each of them a corresponding label. Current mainstream target detection systems are mainly composed of a backbone network, a feature fusion layer, a region proposal network (RPN), and a region convolutional neural network (RCNN) head.

In conventional solutions, experts generally design the target detection network by hand according to some strategy. This consumes a great deal of labor and time, and the performance of the resulting target detection network is only mediocre.

Summary of the Invention

The present application provides a method for constructing a target detection network, a target detection method, an apparatus, and a storage medium, so as to construct a target detection network with lower complexity.

In a first aspect, a method for constructing a target detection network is provided. The target detection network includes a backbone network, a feature fusion layer, a region proposal network (RPN), and a region convolutional neural network (RCNN). The method includes: determining a search space of the target detection network, where the search space of the target detection network includes a search space of the feature fusion layer; determining an initial network architecture of the target detection network according to the search space of the target detection network; and iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until a target detection network that meets preset requirements is obtained.

Here, the search space of the target detection network includes the search space of the feature fusion layer.

The search space of the feature fusion layer includes the optional connection relationships of the feature fusion layer. Specifically, for any two adjacent layers of the multi-layer neural network, these optional connection relationships include a connection between any node of one layer and any node of the other layer.
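As a minimal sketch of this optional connection relationship, the candidate connections can be enumerated as every node pair between each pair of adjacent layers. The function name `fusion_candidate_edges` and the `nodes_per_layer` input are hypothetical illustrations and do not come from the application.

```python
from itertools import product


def fusion_candidate_edges(nodes_per_layer):
    """List every optional connection between adjacent layers.

    nodes_per_layer: hypothetical list with the node count of each layer.
    Each edge is ((layer, node), (layer + 1, node)).
    """
    edges = []
    adjacent = zip(nodes_per_layer, nodes_per_layer[1:])
    for layer, (n_src, n_dst) in enumerate(adjacent):
        # Any node of one layer may connect to any node of the next layer.
        for src, dst in product(range(n_src), range(n_dst)):
            edges.append(((layer, src), (layer + 1, dst)))
    return edges


edges = fusion_candidate_edges([4, 4, 4])
print(len(edges))  # 2 adjacent layer pairs * 4 * 4 = 32
```

A search over this space then amounts to choosing which of these candidate connections to keep.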

In addition, the feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer. That is, when the initial network architecture of the target detection network is determined according to the search space of the target detection network, the network architecture of the feature fusion layer in that initial architecture may be determined according to the search space of the feature fusion layer.

It should be understood that, when the initial network architecture of the target detection network is determined according to the search space of the target detection network, the number of network layers and the number of nodes of the target detection network may be predetermined. Specifically, they may be determined according to the application requirements of the target detection network or the required target detection performance.

For example, when high target detection performance is required, the number of network layers and the number of nodes of the target detection network may be relatively large; when high detection speed or low complexity is required, they may be relatively small.

In the present application, because the search space of the feature fusion layer contains more optional connection relationships, the network architecture of the feature fusion layer in the initial network architecture can be determined more reasonably from these additional connection relationships and then updated, which can reduce the complexity of the target detection network finally obtained.

Specifically, the present application uses a search space for the feature fusion layer in which the optional connection relationships are less constrained. Compared with setting the network architecture by hand, determining the feature fusion layer of the initial network architecture from this search space and then updating it yields a simpler feature fusion layer structure. This in turn reduces the complexity of the target detection network and the storage space it occupies when deployed.

In addition, because the search space of the feature fusion layer contains more optional connection relationships, determining the feature fusion layer of the initial network architecture from this search space and then updating it makes it possible to construct a target detection network with better performance.

The construction method of the first aspect may be an automatic neural network construction method, and it may be executed automatically by an apparatus for constructing a target detection network.

Optionally, in the target detection network, the network architecture of the backbone network and the network structure of the RPN are predetermined.

That is, in the process of updating the initial network architecture of the target detection network, only the feature fusion layer and the RCNN in the initial network architecture may be updated.

Alternatively, the network architecture of the backbone network and the network structure of the RPN may not be determined in advance. In that case, the network architectures of the backbone network, the feature fusion layer, the RPN, and the RCNN may all be updated while the initial network architecture of the target detection network is being updated.

With reference to the first aspect, in some implementations of the first aspect, iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until a target detection network that meets the preset requirements is obtained includes: iteratively updating the initial network architecture according to the search space so as to reduce the value of the loss function corresponding to the target detection network, thereby obtaining a target detection network that meets the preset requirements.

Here, the loss function includes the target detection error of the target detection network and/or the complexity of the target detection network.
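One way to picture such a loss is as a weighted sum of the two terms. The weighted-sum form, the function name `search_loss`, and the `complexity_weight` parameter are assumptions made for illustration; the text only states that the loss includes the detection error and/or the complexity.

```python
def search_loss(detection_error, complexity, complexity_weight=0.0):
    """Combine detection error and complexity into a single value.

    The weighted sum is an assumed form. Setting complexity_weight to 0
    leaves a loss that depends on the detection error alone.
    """
    return detection_error + complexity_weight * complexity


# Two hypothetical candidates: the second is slightly less accurate but much smaller.
print(search_loss(0.30, 40.0, complexity_weight=0.01))
print(search_loss(0.32, 25.0, complexity_weight=0.01))
```

With a nonzero weight, a slightly less accurate but much smaller network can obtain the lower loss, which is how a complexity term steers the search toward simpler architectures.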

It should be understood that, in the iterative update process, the initial network architecture of the target detection network may be updated according to the value of the corresponding loss function so that this value becomes as small as possible, until a target detection network that meets the preset requirements is obtained.

Specifically, when the initial network architecture is iteratively updated, the network structure of the target detection network (the connection relationships between different nodes in the network) may be adjusted, and the value of the corresponding loss function is calculated after each adjustment. The network structure is then updated again according to that value, and the iteration continues in this way until a target detection network that meets the preset requirements is obtained.

In the iterative update process, the value of the loss function corresponding to the target detection network may be calculated after each update of the network architecture. If the value meets the requirement, updating stops, and the architecture obtained at that point is the target detection network that meets the preset requirements. If the value does not meet the requirement, the network parameters of the target detection network may continue to be updated according to the value of the loss function until a target detection network that meets the preset requirements is obtained.
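The update-and-check loop described above can be sketched as follows. The search strategy used here (toggling one connection per update and keeping the change only if the loss decreases) is an assumed stand-in, since this passage does not name a specific search algorithm; `iterative_search`, `toy_loss`, and all parameters are hypothetical.

```python
import random


def iterative_search(candidate_edges, loss_fn, target_loss, max_updates, seed=0):
    """Toggle one candidate connection per update and keep changes that lower the loss.

    Hypothetical stand-in for the search strategy, which this passage leaves open.
    """
    rng = random.Random(seed)
    # Initial architecture: a random subset of the candidate connections.
    arch = frozenset(e for e in candidate_edges if rng.random() < 0.5)
    loss = loss_fn(arch)
    updates = 0
    # Stop once the loss meets the requirement or the update budget is used up.
    while loss > target_loss and updates < max_updates:
        proposal = arch ^ {rng.choice(candidate_edges)}  # add or remove one connection
        proposal_loss = loss_fn(proposal)
        if proposal_loss < loss:
            arch, loss = proposal, proposal_loss
        updates += 1
    return arch, loss, updates


def toy_loss(arch):
    """Toy loss: number of connections that differ from an arbitrary reference set."""
    return len(arch ^ {0, 3})


arch, loss, updates = iterative_search(list(range(6)), toy_loss, target_loss=0, max_updates=200)
print(sorted(arch), loss, updates)
```

In a real search, `loss_fn` would involve training or evaluating the candidate detector, which is far more expensive than the toy function used here.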

With reference to the first aspect, in some implementations of the first aspect, the search space of the feature fusion layer further includes the optional operation types of the feature fusion layer. For any two adjacent layers of the multi-layer neural network, the optional operation types include the convolution operation corresponding to the connection between any node of one layer and any node of the other layer.

Here, the convolution operation includes an atrous (dilated) convolution operation.

In the present application, when the optional operation types of the feature fusion layer include an atrous convolution operation, substantially the same target detection performance can be achieved with fewer convolution parameters.

In addition, when the optional operation types of the feature fusion layer include an atrous convolution operation, a target detection network with better target detection performance can be obtained with substantially the same number of convolution parameters.

Specifically, for the same number of parameters, atrous convolution achieves a larger receptive field than conventional convolution, so the target detection network finally obtained can have better target detection performance.
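The receptive field argument can be checked with the standard relation between kernel size and dilation rate: a kernel of size k with dilation rate d covers d * (k - 1) + 1 positions along each dimension while still using k * k weights per channel pair. The function names and the channel count of 256 below are illustrative assumptions.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered along one dimension by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Number of weights in a square 2D convolution, ignoring the bias."""
    return kernel_size * kernel_size * in_channels * out_channels


# A 3x3 kernel with dilation rate 2 spans the same 5x5 window as a plain 5x5 kernel.
print(effective_kernel_size(3, 2))      # 5
print(conv_weight_count(3, 256, 256))   # 589824
print(conv_weight_count(5, 256, 256))   # 1638400
```

The dilated 3x3 kernel therefore reaches the 5x5 span with 9 weights per channel pair instead of 25.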

With reference to the first aspect, in some implementations of the first aspect, the RCNN includes multiple basic units, each of which is composed of at least two nodes. The search space of the target detection network further includes a search space of the RCNN, and the search space of the RCNN includes a search space of each of the multiple basic units. The search space of each basic unit includes the optional connection relationships of that basic unit, which include a connection between any two nodes within the basic unit. The RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN.

It should be understood that, within each basic unit, the nodes are connected in the direction from input to output.
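A minimal sketch of the candidate connections inside one basic unit follows, assuming the nodes are numbered from input (0) to output so that a connection from node i to node j is allowed only when i < j. The function name `unit_candidate_edges` and the numbering convention are hypothetical.

```python
from itertools import combinations


def unit_candidate_edges(num_nodes):
    """Candidate connections between any two nodes of a basic unit.

    Nodes are assumed to be ordered from input to output, so each pair
    (i, j) with i < j is a connection pointing toward the output.
    """
    if num_nodes < 2:
        raise ValueError("a basic unit is composed of at least two nodes")
    return list(combinations(range(num_nodes), 2))


print(unit_candidate_edges(4))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

A unit with n nodes thus has n * (n - 1) / 2 candidate connections, and the resulting graph is acyclic by construction.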

In the present application, any two nodes within each basic unit of the RCNN can be connected in the search space of the RCNN. The initial network structure of the RCNN can therefore be determined more reasonably from this less constrained search space and then updated, which can reduce the complexity of the target detection network.

Further, in the present application, when the feature fusion layer in the initial network architecture is determined according to the search space of the feature fusion layer and then updated, and the RCNN in the initial network architecture is determined according to the search space of the RCNN and then updated, the network structures of the feature fusion layer and the RCNN are updated and optimized together. As a result, the finally optimized RCNN structure better matches the structure of the feature fusion layer, and a target detection network with better target detection performance can be obtained.

With reference to the first aspect, in some implementations of the first aspect, the search space of each basic unit further includes the optional operation types of that basic unit. These include the convolution operation corresponding to the connection between any two nodes within the basic unit, and the convolution operation includes an atrous convolution operation.
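To illustrate how connection choices and operation choices combine, the sketch below samples one basic unit by picking an operation for each candidate connection. The operation names in `CANDIDATE_OPS`, the use of `"none"` to mark an unused connection, and the uniform random sampling are all assumptions for illustration; the text only states that the optional operations include atrous convolution.

```python
import random
from itertools import combinations

# Hypothetical operation list; "none" marks a candidate connection that is not used.
CANDIDATE_OPS = ("none", "conv3x3", "conv3x3_dilation2")


def sample_unit(num_nodes, rng):
    """Pick one operation for every candidate connection (i, j) with i < j."""
    return {
        (i, j): rng.choice(CANDIDATE_OPS)
        for i, j in combinations(range(num_nodes), 2)
    }


unit = sample_unit(4, random.Random(0))
for edge, op in sorted(unit.items()):
    print(edge, op)
```

Under these assumptions a unit with 4 nodes has 3 ** 6 = 729 possible configurations, which shows how quickly the space grows with the number of nodes and operations.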

In the present application, when the optional operation types of the RCNN include an atrous convolution operation, substantially the same target detection performance can be achieved with fewer convolution parameters.

In addition, when the optional operation types of the RCNN include an atrous convolution operation, a target detection network with better target detection performance can be obtained with substantially the same number of convolution parameters.

With reference to the first aspect, in some implementations of the first aspect, the convolution operation corresponding to the connection between any two nodes within each basic unit includes an atrous convolution operation with a dilation rate of 2.

The feature maps processed in the RCNN generally have a relatively low resolution, so atrous convolution with a small dilation rate is well suited to them. Therefore, for each basic unit in the RCNN, including atrous convolution with a dilation rate of 2 among the optional operations can improve the performance of the RCNN finally obtained, and thus the performance of the target detection network finally obtained.

With reference to the first aspect, in some implementations of the first aspect, at least two of the multiple basic units are composed of different numbers of nodes.

In the present application, allowing at least two of the basic units in the RCNN to be composed of different numbers of nodes makes the composition of the basic units more flexible. This widens the range of possible RCNN network structures when the initial RCNN structure is determined and updated, makes it easier to find a better RCNN structure, and makes it more likely that a target detection network with better target detection performance is finally obtained.

With reference to the first aspect, in some implementations of the first aspect, the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of that basic unit.

When the input and output feature maps of each basic unit in the RCNN have the same resolution, the basic unit does not change the resolution of the feature map it processes, which helps preserve the information in the feature map. It also facilitates skip connections; otherwise, each skip connection would need to additionally align the size of the output with that of the feature map to be input, which is inefficient.
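For a convolution, keeping the resolution unchanged comes down to choosing the padding to match the kernel size and dilation rate. The sketch below uses the standard output-size formula; the function names and the 7x7 example size are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, dilation=1, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_kernel) // stride + 1


def same_padding(kernel_size, dilation=1):
    """Padding that keeps the resolution for stride 1 and an odd kernel size."""
    return dilation * (kernel_size - 1) // 2


# A 3x3 convolution with dilation rate 2 needs padding 2 to keep a 7x7 map at 7x7.
pad = same_padding(3, dilation=2)
print(pad, conv_output_size(7, 3, dilation=2, padding=pad))  # 2 7
print(conv_output_size(7, 3, dilation=2, padding=0))         # 3
```

Without the padding, the same operation would shrink the 7x7 map to 3x3, and any skip connection around it would need an extra resizing step.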

With reference to the first aspect, in some implementations of the first aspect, the target detection network that meets the preset requirements satisfies at least one of the following conditions: the detection performance of the target detection network meets a preset performance requirement; the number of updates to the network architecture of the target detection network is greater than or equal to a preset number; the complexity of the target detection network is less than or equal to a preset complexity.
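These conditions can be expressed as a simple check. The sketch assumes that detection performance is a score for which higher is better and that the preset performance requirement is a minimum score; the function and parameter names are hypothetical.

```python
def meets_preset_requirements(performance, num_updates, complexity,
                              min_performance, max_updates, max_complexity):
    """Return True if at least one of the three listed conditions holds.

    Assumes a higher performance score is better (an illustrative assumption).
    """
    return (
        performance >= min_performance      # detection performance requirement met
        or num_updates >= max_updates       # update count reached the preset number
        or complexity <= max_complexity     # complexity within the preset bound
    )


# Hypothetical values: performance and complexity fall short, but the update budget is reached.
print(meets_preset_requirements(0.35, 100, 50.0,
                                min_performance=0.40, max_updates=100, max_complexity=30.0))
```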

With reference to the first aspect, in some implementations of the first aspect, the complexity of the target detection network is determined according to at least one of: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.
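As a rough illustration of these three measures, the sketch below computes them for a single convolution layer. It assumes stride 1, resolution-preserving padding, no bias, one multiply and one add counted per weight per output position, and a memory access cost counted as the elements read and written (input map, output map, and weights). These counting conventions and the function name are assumptions, not definitions from the application.

```python
def conv_complexity(height, width, in_channels, out_channels, kernel_size):
    """Parameter count, floating-point operations, and memory access cost of one conv layer.

    Assumes stride 1, resolution-preserving padding, and no bias term.
    """
    params = kernel_size * kernel_size * in_channels * out_channels
    # One multiply and one add per weight at every output position.
    flops = 2 * height * width * params
    # Elements read (input map, weights) plus elements written (output map).
    memory_access = height * width * (in_channels + out_channels) + params
    return params, flops, memory_access


print(conv_complexity(4, 4, 2, 3, 3))  # (54, 1728, 134)
```

Summing such per-layer values over the whole network gives one possible complexity figure to compare against a preset bound or to use as a term in the search loss.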

In a second aspect, a method for constructing a target detection network is provided. The target detection network includes a backbone network, a feature fusion layer, a region proposal network (RPN), and a region convolutional neural network (RCNN). The method includes: determining a search space of the target detection network, where the RCNN includes multiple basic units, each of the multiple basic units is composed of at least two nodes, the search space of the target detection network includes a search space of the RCNN, the search space of the RCNN includes a search space of each of the multiple basic units, the search space of each basic unit includes the optional connection relationships of that basic unit, and the optional connection relationships of each basic unit include a connection between any two nodes within that basic unit; determining an initial network architecture of the target detection network according to the search space of the target detection network, where the RCNN in the initial network architecture is determined according to the search space of the RCNN; and iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until a target detection network that meets preset requirements is obtained.

It should be understood that the search space of the RCNN may consist of the search spaces of the individual basic units in the RCNN.

In the present application, any two nodes within each basic unit of the RCNN can be connected in the search space of the RCNN. Compared with setting the network architecture by hand, the initial network structure of the RCNN can be determined more reasonably from this less constrained search space and then updated, which can reduce the complexity of the target detection network.

In addition, because the search space of the RCNN contains more optional connection relationships, determining the RCNN of the initial network architecture from this search space and then updating it makes it possible to construct a target detection network with better performance.

The construction method of the second aspect may be an automatic neural network construction method, and it may be executed automatically by an apparatus for constructing a target detection network.

Optionally, the network architecture of the backbone network and the network structure of the RPN are predetermined.

If the network architecture of the backbone network and the network structure of the RPN are predetermined, then only the feature fusion layer and the RCNN in the initial network architecture may be updated when the initial network architecture of the target detection network is updated.

With reference to the second aspect, in some implementations of the second aspect, iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until a target detection network that meets the preset requirements is obtained includes: iteratively updating the initial network architecture according to the search space so as to reduce the value of the loss function corresponding to the target detection network, thereby obtaining a target detection network that meets the preset requirements.

In the iterative update process, the initial network architecture of the target detection network may be updated according to the value of the corresponding loss function so that this value becomes as small as possible, until a target detection network that meets the preset requirements is obtained.

Specifically, in the iterative update process, the value of the loss function corresponding to the target detection network may be calculated after each update of the network architecture. If the value meets the requirement, updating stops, and the architecture obtained at that point is the target detection network that meets the preset requirements. If the value does not meet the requirement, the network parameters of the target detection network may continue to be updated according to the value of the loss function until a target detection network that meets the preset requirements is obtained.

With reference to the second aspect, in some implementations of the second aspect, the search space of each basic unit further includes the optional operation types of that basic unit. These include the convolution operation corresponding to the connection between any two nodes within the basic unit, and the convolution operation includes an atrous convolution operation.

When the optional operation types of the RCNN include an atrous convolution operation, substantially the same target detection performance can be achieved with fewer convolution parameters.

When the optional operation types of the RCNN include an atrous convolution operation, a target detection network with better target detection performance can be obtained with substantially the same number of convolution parameters.

结合第二方面，在第二方面的某些实现方式中，每个基本单元内的任意两个节点之间的连接所对应的卷积操作包括间隔数为2的空洞卷积操作。With reference to the second aspect, in some implementations of the second aspect, the convolution operation corresponding to the connection between any two nodes in each basic unit includes an atrous convolution operation with an interval of 2.

结合第二方面，在第二方面的某些实现方式中，多个基本单元中的至少两个基本单元分别由不同数目的节点构成。In conjunction with the second aspect, in some implementations of the second aspect, at least two of the plurality of basic units are respectively composed of different numbers of nodes.

When at least two of the basic units in the RCNN may be composed of different numbers of nodes, the basic units can be composed more freely. This enlarges the set of possible RCNN network structures considered when the initial network structure of the RCNN is determined and updated, makes it easier to find a better RCNN network structure, and so makes it more likely that the final target detection network has better target detection performance.

结合第二方面，在第二方面的某些实现方式中，每个基本单元的输入特征图的分辨率与每个基本单元的输出特征图的分辨率相同。In conjunction with the second aspect, in some implementations of the second aspect, the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

When the resolution of the input feature map of each basic unit in the RCNN is the same as the resolution of its output feature map, the basic unit does not change the resolution of the feature map it processes, which helps retain the information in the feature map.

With reference to the second aspect, in some implementations of the second aspect, the search space of the target detection network further includes the search space of the feature fusion layer. The search space of the feature fusion layer includes the optional connection relationships of the feature fusion layer, which include a connection between any node of one layer and any node of the other layer in any two adjacent layers of the multi-layer neural network. The feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer.
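As a rough illustration of how large this set of optional connections is, the sketch below enumerates every candidate connection between adjacent layers. The function name and the `(layer, node)` tuple encoding are assumptions made for the example, not the representation used in the application.

```python
from itertools import product
from typing import List, Tuple

Node = Tuple[int, int]  # (layer index, node index within that layer); assumed encoding


def fusion_candidate_connections(nodes_per_layer: List[int]) -> List[Tuple[Node, Node]]:
    """List every optional connection between nodes of adjacent layers."""
    edges: List[Tuple[Node, Node]] = []
    for layer in range(len(nodes_per_layer) - 1):
        src = [(layer, i) for i in range(nodes_per_layer[layer])]
        dst = [(layer + 1, j) for j in range(nodes_per_layer[layer + 1])]
        # Any node of one layer may connect to any node of the next layer.
        edges.extend(product(src, dst))
    return edges


# Three layers of four nodes each give 4*4 + 4*4 candidate connections.
candidates = fusion_candidate_connections([4, 4, 4])
```

A search procedure would then keep only a subset of these candidates, which is what distinguishes a searched feature fusion layer from a hand-designed one.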

In this application, the feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer and its network architecture is updated, and the RCNN in the initial network architecture is determined according to the search space of the RCNN and its network architecture is updated. The network structures of the feature fusion layer and of the RCNN can therefore be updated and optimized at the same time, so that the finally optimized RCNN network structure matches the feature fusion layer network structure better and a target detection network with better target detection performance can be obtained.

结合第二方面，在第二方面的某些实现方式中，特征融合层的搜索空间还包括特征融合层的可选操作类型，特征融合层的可选操作类型包括多层神经网络中的相邻两层神经网络中的一层神经网络的任意一个节点与另一层神经网络中的任意一个节点的连接所对应的卷积操作，其中，多层神经网络中的相邻两层神经网络中的一层神经网络的任意一个节点与另一层神经网络中的任意一个节点的连接所对应的卷积操作包括空洞卷积操作。With reference to the second aspect, in some implementations of the second aspect, the search space of the feature fusion layer further includes optional operation types of the feature fusion layer, and the optional operation types of the feature fusion layer include adjacent ones in the multi-layer neural network. The convolution operation corresponding to the connection between any node of one layer of neural network in the two-layer neural network and any node of the other layer of neural network, wherein, the adjacent two layers of neural network in the multi-layer neural network The convolution operation corresponding to the connection between any node of one layer of neural network and any node of another layer of neural network includes atrous convolution operation.

当特征融合层中的可选操作类型包括空洞卷积操作时，能够在采用更少的卷积参数来实现基本相同的目标检测性能。When the optional operation types in the feature fusion layer include atrous convolution operations, substantially the same object detection performance can be achieved with fewer convolution parameters.

With reference to the second aspect, in some implementations of the second aspect, the target detection network that meets the preset requirements satisfies at least one of the following conditions: the detection performance of the target detection network meets a preset performance requirement; the number of updates to the network architecture of the target detection network is greater than or equal to a preset number; the complexity of the target detection network is less than or equal to a preset complexity.

With reference to the second aspect, in some implementations of the second aspect, the complexity of the target detection network is determined according to at least one of: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.
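The "at least one of" conditions and the complexity measure can be expressed as a small check. The sketch below is illustrative only: the `Candidate` fields, the single scalar detection score, and the weighted-sum form of `complexity` are assumptions, since the text does not fix how the individual measures are combined.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    detection_score: float  # assumed: higher is better (for example, a mean average precision)
    num_updates: int        # how many times the architecture has been updated
    complexity: float       # any scalar complexity measure


def complexity(
    num_params: Optional[float] = None,
    memory_access_cost: Optional[float] = None,
    flops: Optional[float] = None,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weighting
) -> float:
    """Combine whichever of the three measures are supplied (at least one)."""
    terms = zip((num_params, memory_access_cost, flops), weights)
    used = [(value, weight) for value, weight in terms if value is not None]
    if not used:
        raise ValueError("at least one complexity measure is required")
    return sum(value * weight for value, weight in used)


def meets_preset_requirements(
    c: Candidate, min_score: float, max_updates: int, max_complexity: float
) -> bool:
    """True if at least one of the three preset conditions holds."""
    return (
        c.detection_score >= min_score
        or c.num_updates >= max_updates
        or c.complexity <= max_complexity
    )
```

In practice the three measures have very different scales, so a real implementation would likely normalize them before combining; that choice is left open here.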

In a third aspect, a target detection method is provided. The method includes: acquiring an image; and processing the image with a target detection network to obtain a target detection result of the image, where the target detection result includes the position of a detected target in the image and the classification result to which the detected target belongs. The target detection network includes a backbone network, a feature fusion layer, a region proposal network (RPN) and a region convolutional neural network (RCNN). The target detection network is a detection network that meets preset requirements and is obtained by iteratively updating an initial network architecture of the target detection network according to a search space of the target detection network, and the initial network architecture is determined according to that search space. The search space of the target detection network includes the search space of the feature fusion layer, and the feature fusion layer in the initial network architecture is determined according to the search space of the feature fusion layer. The search space of the feature fusion layer includes the optional connection relationships of the feature fusion layer, which include a connection between any node of one layer and any node of the other layer in any two adjacent layers of the multi-layer neural network.

Because the search space of the feature fusion layer used when constructing the target detection network contains more optional connection relationships, the feature fusion layer determined according to that search space can fuse features better, so that the resulting target detection network performs better at target detection.

With reference to the third aspect, in some implementations of the third aspect, the search space of the feature fusion layer further includes the optional operation types of the feature fusion layer. The optional operation types include the convolution operation corresponding to a connection between any node of one layer and any node of the other layer in any two adjacent layers of the multi-layer neural network, and this convolution operation includes an atrous (dilated) convolution operation.

When the optional operation types of the feature fusion layer include an atrous convolution operation, substantially the same target detection performance can be achieved with fewer convolution parameters. In addition, for the same number of convolution parameters, atrous convolution obtains a larger receptive field than conventional convolution, so the resulting target detection network can have better target detection performance.

With reference to the third aspect, in some implementations of the third aspect, the RCNN includes multiple basic units, each of which is composed of at least two nodes. The search space of the target detection network further includes the search space of the RCNN, which includes the search space of each of the multiple basic units. The search space of each basic unit includes the optional connection relationships of that basic unit, which include a connection between any two nodes within the basic unit. The RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN.
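The optional connections inside a basic unit can be enumerated as node pairs. The sketch below is only an illustration: the function names are hypothetical, and treating each connection as an unordered pair `(i, j)` with `i < j` is an assumption of the example.

```python
from itertools import combinations
from typing import List, Tuple


def cell_candidate_connections(num_nodes: int) -> List[Tuple[int, int]]:
    """All optional connections between any two nodes of one basic unit."""
    if num_nodes < 2:
        raise ValueError("a basic unit is composed of at least two nodes")
    return list(combinations(range(num_nodes), 2))


def rcnn_search_space(nodes_per_unit: List[int]) -> List[List[Tuple[int, int]]]:
    """Per-unit candidate connections; units may have different node counts."""
    return [cell_candidate_connections(n) for n in nodes_per_unit]


# Two basic units with different numbers of nodes.
space = rcnn_search_space([2, 3])
```

A unit with n nodes has n(n-1)/2 candidate connections, so allowing units with different node counts directly changes the size of each unit's search space.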

With reference to the third aspect, in some implementations of the third aspect, the search space of each basic unit further includes the optional operation types of that basic unit. The optional operation types include the convolution operation corresponding to the connection between any two nodes within the basic unit, and this convolution operation includes an atrous (dilated) convolution operation.

当RCNN中的可选操作类型包括空洞卷积操作时，能够在采用更少的卷积参数来实现基本相同的目标检测性能。另外，当RCNN的可选操作类型包括空洞卷积操作时，能够在卷积参数量基本相同的情况下，获得具有更好目标检测性能的目标检测网络。When the optional operation type in RCNN includes atrous convolution operation, it can achieve basically the same object detection performance with fewer convolution parameters. In addition, when the optional operation type of RCNN includes atrous convolution operation, a target detection network with better target detection performance can be obtained under the condition that the amount of convolution parameters is basically the same.

具体地，在卷积参数具有相同参数量的情况下，采用空洞卷积进行卷积处理能够取得比传统卷积更大的感受野，因此，采用上述目标检测网络具有更好的目标检测性能。Specifically, when the convolution parameters have the same amount of parameters, the convolution processing using atrous convolution can achieve a larger receptive field than the traditional convolution. Therefore, using the above target detection network has better target detection performance.
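The receptive-field claim can be checked with the standard relation between kernel size and dilation. The sketch below is illustrative; the function names are hypothetical, and the 256-channel figures are arbitrary example values rather than numbers from the application.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square 2-D convolution (bias not counted)."""
    return kernel_size * kernel_size * in_channels * out_channels


# A 3x3 kernel with dilation 2 covers the same 5x5 extent as a plain 5x5 kernel,
# while keeping the weight count of a 3x3 kernel.
extent = effective_kernel_size(3, 2)
dilated_weights = conv_weight_count(3, 256, 256)
plain_5x5_weights = conv_weight_count(5, 256, 256)
```

Dilation does not appear in the weight count at all, which is why the receptive field grows while the number of parameters stays the same.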

结合第三方面，在第三方面的某些实现方式中，每个基本单元内的任意两个节点之间的连接所对应的卷积操作包括间隔数为2的空洞卷积操作。With reference to the third aspect, in some implementations of the third aspect, the convolution operation corresponding to the connection between any two nodes in each basic unit includes an atrous convolution operation with an interval of 2.

Because the feature maps processed in the RCNN generally have a relatively low resolution, atrous convolution operations with a small interval are the more suitable choice. For each basic unit in the RCNN, including an atrous convolution with an interval of 2 among the optional operations can ultimately improve the target detection performance of the target detection network.

结合第三方面，在第三方面的某些实现方式中，多个基本单元中的至少两个基本单元分别由不同数目的节点构成。With reference to the third aspect, in some implementations of the third aspect, at least two basic units in the plurality of basic units are respectively constituted by different numbers of nodes.

结合第三方面，在第三方面的某些实现方式中，每个基本单元的输入特征图的分辨率与每个基本单元的输出特征图的分辨率相同。In conjunction with the third aspect, in some implementations of the third aspect, the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

当RCNN中的每个基本单元的输入特征图的分辨率与每个基本单元的输出特征图的分辨率相同时，RCNN中的每个基本单元在处理特征图时不改变特征图的分辨率，便于保留特征图的信息，进而保证目标检测网络的目标检测性能。When the resolution of the input feature map of each basic unit in RCNN is the same as the resolution of the output feature map of each basic unit, each basic unit in RCNN does not change the resolution of the feature map when processing the feature map, It is convenient to retain the information of the feature map, thereby ensuring the target detection performance of the target detection network.
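One common way for a convolution inside a basic unit to keep the resolution unchanged is stride 1 with suitably chosen padding. The sketch below uses the usual output-size formula for convolutions. It is an assumption of this example that the basic units use this mechanism and that the "interval" corresponds to the dilation rate; the 7x7 input size is an arbitrary example value.

```python
def conv_output_size(
    size: int, kernel_size: int, dilation: int = 1, padding: int = 0, stride: int = 1
) -> int:
    """Output length along one axis for a (possibly dilated) convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size: int, dilation: int = 1) -> int:
    """Padding that preserves resolution at stride 1 (odd kernel sizes only)."""
    if kernel_size % 2 == 0:
        raise ValueError("resolution-preserving symmetric padding needs an odd kernel size")
    return dilation * (kernel_size - 1) // 2


# A 3x3 convolution with dilation 2 on a 7x7 feature map.
preserved = conv_output_size(7, 3, dilation=2, padding=same_padding(3, 2))
shrunk = conv_output_size(7, 3, dilation=2, padding=0)
```

Without padding the same operation shrinks the feature map, which on the low-resolution maps handled by the RCNN would discard a large share of the spatial information.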

With reference to the third aspect, in some implementations of the third aspect, the target detection network satisfies at least one of the following conditions: the detection performance of the target detection network meets a preset performance requirement; the number of updates to the network architecture of the target detection network is greater than or equal to a preset number; the complexity of the target detection network is less than or equal to a preset complexity.

With reference to the third aspect, in some implementations of the third aspect, the complexity of the target detection network is determined according to at least one of: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.

In a fourth aspect, a target detection method is provided. The method includes: acquiring an image; and processing the image with a target detection network to obtain a target detection result of the image, where the target detection result includes the position of a detected target in the image and the classification result to which the detected target belongs. The target detection network includes a backbone network, a feature fusion layer, a region proposal network (RPN) and a region convolutional neural network (RCNN). The target detection network is a detection network that meets preset requirements and is obtained by iteratively updating an initial network architecture of the target detection network according to a search space of the target detection network, and the initial network architecture is determined according to that search space. The RCNN includes multiple basic units, each of which is composed of at least two nodes. The search space of the target detection network includes the search space of the RCNN, which includes the search space of each of the multiple basic units. The search space of each basic unit includes the optional connection relationships of that basic unit, which include a connection between any two nodes within the basic unit. The RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN.

Because the search space of the RCNN used when constructing the target detection network contains more optional connection relationships, the RCNN determined according to that search space can perform feature fusion better, so that the resulting target detection network performs better at target detection. In addition, because the network structure of the RCNN in the initial network architecture can be determined more reasonably from the larger set of optional connection relationships and the network architecture of the RCNN can then be updated, the complexity of the resulting target detection network can be reduced.

Specifically, the search space of the RCNN in this application allows the optional connection relationships to be chosen more freely. Compared with setting the network architecture by hand, determining the RCNN in the initial network architecture according to this search space and then updating the network architecture of the RCNN can yield a simpler RCNN network structure, which ultimately reduces the complexity of the target detection network and the storage space it occupies when deployed.

With reference to the fourth aspect, in some implementations of the fourth aspect, the search space of each basic unit further includes the optional operation types of that basic unit. The optional operation types include the convolution operation corresponding to the connection between any two nodes within the basic unit, and this convolution operation includes an atrous (dilated) convolution operation.

当RCNN中的可选操作类型包括空洞卷积操作时，目标检测网络能够利用更少的卷积参数来实现基本相同的目标检测性能。当RCNN的可选操作类型包括空洞卷积操作时，能够在卷积参数量基本相同的情况下，利用上述目标检测网络进行目标检测具有更好的检测性能。When the optional operation types in RCNN include atrous convolution operations, the object detection network is able to achieve essentially the same object detection performance with fewer convolution parameters. When the optional operation type of RCNN includes atrous convolution operation, the above target detection network can be used for target detection with better detection performance under the condition that the amount of convolution parameters is basically the same.

结合第四方面，在第四方面的某些实现方式中，每个基本单元内的任意两个节点之间的连接所对应的卷积操作包括间隔数为2的空洞卷积操作。With reference to the fourth aspect, in some implementations of the fourth aspect, the convolution operation corresponding to the connection between any two nodes in each basic unit includes an atrous convolution operation with an interval of 2.

结合第四方面，在第四方面的某些实现方式中，多个基本单元中的至少两个基本单元分别由不同数目的节点构成。With reference to the fourth aspect, in some implementations of the fourth aspect, at least two basic units in the plurality of basic units are respectively composed of different numbers of nodes.

结合第四方面，在第四方面的某些实现方式中，每个基本单元的输入特征图的分辨率与每个基本单元的输出特征图的分辨率相同。In conjunction with the fourth aspect, in some implementations of the fourth aspect, the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

With reference to the fourth aspect, in some implementations of the fourth aspect, the search space of the target detection network further includes the search space of the feature fusion layer. The search space of the feature fusion layer includes the optional connection relationships of the feature fusion layer, which include a connection between any node of one layer and any node of the other layer in any two adjacent layers of the multi-layer neural network. The feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer.

With reference to the fourth aspect, in some implementations of the fourth aspect, the search space of the feature fusion layer further includes the optional operation types of the feature fusion layer. The optional operation types include the convolution operation corresponding to a connection between any node of one layer and any node of the other layer in any two adjacent layers of the multi-layer neural network, and this convolution operation includes an atrous (dilated) convolution operation.

With reference to the fourth aspect, in some implementations of the fourth aspect, the target detection network satisfies at least one of the following conditions: the detection performance of the target detection network meets a preset performance requirement; the number of updates to the network architecture of the target detection network is greater than or equal to a preset number; the complexity of the target detection network is less than or equal to a preset complexity.

With reference to the fourth aspect, in some implementations of the fourth aspect, the complexity of the target detection network is determined according to at least one of: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.

第五方面，提供了一种目标检测网络的构建装置，该装置包括用于执行上述第一方面或者第二方面中的任意一种实现方式中的方法的模块。In a fifth aspect, an apparatus for constructing a target detection network is provided, where the apparatus includes a module for executing the method in any one of the implementation manners of the first aspect or the second aspect.

第五方面，提供了一种目标检测装置，该装置包括用于执行上述第三方面或者第四方面中的任意一种实现方式中的方法的模块。In a fifth aspect, a target detection apparatus is provided. The apparatus includes a module for executing the method in any one of the implementation manners of the third aspect or the fourth aspect.

第六方面，提供了一种目标检测网络的构建装置，该装置包括：存储器，用于存储程序；处理器，用于执行所述存储器存储的程序，当所述存储器存储的程序被执行时，所述处理器用于执行第一方面或者第二方面中的任意一种实现方式中的方法。In a sixth aspect, a device for constructing a target detection network is provided, the device comprising: a memory for storing a program; a processor for executing the program stored in the memory, when the program stored in the memory is executed, The processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.

In a seventh aspect, a target detection apparatus is provided. The apparatus includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementations of the third aspect or the fourth aspect.

In an eighth aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method in any one of the implementations of the first aspect to the fourth aspect.

第九方面，提供一种包含指令的计算机程序产品，当该计算机程序产品在计算机上运行时，使得计算机执行上述第一方面至第四方面中的任意一种实现方式中的方法。In a ninth aspect, there is provided a computer program product containing instructions, which, when the computer program product is run on a computer, causes the computer to execute the method in any one of the implementation manners of the first aspect to the fourth aspect.

In a tenth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method in any one of the implementations of the first aspect to the fourth aspect.

可选地，作为一种实现方式，所述芯片还可以包括存储器，所述存储器中存储有指令，所述处理器用于执行所述存储器上存储的指令，当所述指令被执行时，所述处理器用于执行第一方面至第四方面中的任意一种实现方式中的方法。Optionally, as an implementation manner, the chip may further include a memory, in which instructions are stored, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the The processor is configured to execute the method in any one of the implementation manners of the first aspect to the fourth aspect.

Description of Drawings

FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of target detection using the convolutional neural network model provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a target detection system according to an embodiment of the present application;

FIG. 6 is a schematic diagram of the target detection system applied in the field of autonomous driving;

FIG. 7 is a schematic flowchart of a method for constructing a target detection network according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a possible structure of the feature fusion layer;

FIG. 9 is a schematic diagram of a possible structure of the RCNN;

FIG. 10 is a schematic diagram of a possible structure of the feature fusion layer;

FIG. 11 is a schematic diagram of a possible structure of the RCNN;

FIG. 12 is a schematic flowchart of a method for constructing a target detection network according to an embodiment of the present application;

FIG. 13 is a schematic flowchart of a target detection method according to an embodiment of the present application;

FIG. 14 is a schematic block diagram of an apparatus for constructing a target detection network according to an embodiment of the present application;

FIG. 15 is a schematic block diagram of a target detection apparatus according to an embodiment of the present application;

FIG. 16 is a schematic block diagram of an apparatus for constructing a target detection network according to an embodiment of the present application;

FIG. 17 is a schematic block diagram of a target detection apparatus according to an embodiment of the present application.

Detailed Description

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行描述。The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

The solution of the present application can be applied in fields that require target detection (for example, detecting pedestrians in images), such as assisted driving, autonomous driving, safe city and intelligent terminals. Two commonly used application scenarios are briefly introduced below.

Application Scenario 1: Assisted/Autonomous Driving System

In an advanced driving assistant system (ADAS) and an autonomous driving system (ADS), pedestrians and obstacles on the road need to be detected and avoided, and collisions with pedestrians in particular must be prevented, which requires relatively accurate target detection.

Application Scenario 2: Safe City/Video Surveillance System

In a safe city system or a video surveillance system, target detection (of pedestrians or vehicles) is performed in real time, the detection results are marked, and the results are passed to the analysis unit of the system, where they can be used to find criminal suspects, missing persons, specific vehicles and the like.

本申请的方案涉及神经网络的构建和利用神经网络进行目标检测，为了更好地理解本申请方案，下面先对神经网络的相关术语和概念进行介绍。The solution of the present application involves the construction of a neural network and the use of a neural network for target detection. In order to better understand the solution of the present application, the related terms and concepts of the neural network are first introduced below.

(1)神经网络(1) Neural network

A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be as shown in formula (1):

$$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right) \tag{1}$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which applies a nonlinear transformation to the features in the neural network and thereby converts the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of that local receptive field, and the local receptive field may be a region composed of several neural units.
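As a minimal illustrative sketch (not part of the embodiments), the output of a single neural unit can be computed as follows, assuming a sigmoid activation function. The function names `sigmoid` and `neuron_output` and the example values are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x: list, w: list, b: float) -> float:
    """Compute f(sum_s W_s * x_s + b) for one neural unit."""
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)

# Illustrative values only: three inputs, three weights, one bias.
# The weighted sum is 0.5 - 0.5 + 0.0 + 0.0 = 0, and sigmoid(0) = 0.5.
output = neuron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.0], 0.0)
print(output)
```

The output of such a unit could then be passed as one of the inputs $x_s$ of a unit in the next layer.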

(2) Deep neural network

A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing a DNN by the positions of its layers, the layers inside the DNN fall into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer.

Although a DNN looks complicated, the work of each layer is not. In short, each layer computes the following linear relationship expression:

$$\vec{y} = \alpha(W\vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in a DNN as follows. Taking the coefficient $W$ as an example, suppose that in a three-layer DNN the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.

To sum up, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$.
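The per-layer operation, output = activation(W · input + offset), and the index convention for the coefficients (output index first, input index second) can be sketched as follows. This is an illustrative sketch only: the function name `layer_forward`, the ReLU activation used in the example, and all numeric values are assumptions rather than part of the application.

```python
import math

def layer_forward(W, b, x, activation=math.tanh):
    """Return y = activation(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from neuron k of the previous layer
    to neuron j of this layer (output index first, input index second).
    """
    return [
        activation(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]

# Hypothetical layer with 4 input neurons and 2 output neurons.
# With 0-based indices, W[1][3] is the coefficient from the fourth input
# neuron to the second output neuron (the "24" subscript in 1-based terms).
W = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0]
relu = lambda z: max(0.0, z)  # assumed activation for this example
y = layer_forward(W, b, [1.0, 2.0, 3.0, 4.0], activation=relu)
print(y)
```

Here only the fourth input contributes to the second output, which makes the role of the two subscripts easy to see.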

It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to describe complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (weight matrices formed by the vectors W of many layers).

(3) Convolutional neural network

A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers, and the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. A convolution kernel may be initialized in the form of a matrix of random size, and the kernel obtains reasonable weights through learning during training of the convolutional neural network. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
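To make the benefit of weight sharing concrete, the following sketch compares the number of weights needed when every output unit is connected to every input pixel with the number needed when all units of one feature plane share a single kernel. The sizes (a 32×32 input, a 3×3 kernel, a 30×30 feature plane) and the function names are hypothetical and chosen only for illustration.

```python
def fully_connected_weights(in_h, in_w, out_h, out_w):
    """Weights needed when every output unit connects to every input pixel."""
    return (in_h * in_w) * (out_h * out_w)

def shared_kernel_weights(k_h, k_w):
    """Weights needed when all units of one feature plane share one kernel."""
    return k_h * k_w

# A 3x3 kernel slid over a 32x32 input (stride 1, no padding)
# produces a 30x30 feature plane.
fc = fully_connected_weights(32, 32, 30, 30)
conv = shared_kernel_weights(3, 3)
print(fc, conv)
```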

(4) Residual network

The residual network is a deep convolutional network proposed in 2015. Compared with a traditional convolutional neural network, a residual network is easier to optimize and can gain accuracy from considerably increased depth. The core of the residual network is that it solves the side effect of increasing depth (the degradation problem), so that network performance can be improved simply by increasing network depth. A residual network generally contains many sub-modules with the same structure, and the name residual network (ResNet) is usually followed by a number indicating how many times the sub-module is repeated; for example, ResNet50 indicates that the residual network has 50 sub-modules.

(6) Classifier

Many neural network structures end with a classifier that classifies the objects in an image. A classifier generally consists of a fully connected layer and a softmax function (also called the normalized exponential function), and outputs the probabilities of the different categories according to its input.
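A classifier of this kind can be sketched as a fully connected layer followed by softmax. The sketch below is illustrative only; the function names `softmax` and `classify` and the example weights are assumptions, and the maximum score is subtracted before exponentiation purely for numerical stability.

```python
import math

def softmax(scores):
    """Normalized exponential: turns arbitrary scores into probabilities."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, W, b):
    """Fully connected layer (one row of W per category) followed by softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b_j
              for row, b_j in zip(W, b)]
    return softmax(scores)

# Hypothetical 2-feature input and 3 categories.
probs = classify([1.0, 2.0],
                 [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                 [0.0, 0.0, 0.0])
print(probs)
```

The category with the highest score (here the second one, with score 2.0) receives the highest probability, and the probabilities sum to 1.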

(7) Loss function

When training a deep neural network, the output of the network should be as close as possible to the value that is actually desired. The predicted value of the current network is therefore compared with the desired target value, and the weight vector of each layer of the neural network is updated according to the difference between the two (there is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or the objective function, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network becomes a process of reducing this loss as much as possible.
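The adjustment described above can be illustrated with a one-weight model. The sketch assumes a squared-error loss and a plain gradient-descent update; neither choice is specified by the application, and the names `squared_error` and `adjust_weight` are hypothetical.

```python
def squared_error(prediction, target):
    """Loss: larger values mean a larger gap between prediction and target."""
    return (prediction - target) ** 2

def adjust_weight(weight, x, target, learning_rate=0.1):
    """One gradient-descent step for the model prediction = weight * x."""
    prediction = weight * x
    gradient = 2.0 * (prediction - target) * x   # d(loss)/d(weight)
    return weight - learning_rate * gradient

weight, x, target = 2.0, 1.0, 1.0                # prediction 2.0 is too high
loss_before = squared_error(weight * x, target)
weight = adjust_weight(weight, x, target)        # weight is pushed downward
loss_after = squared_error(weight * x, target)
print(loss_before, loss_after)
```

Because the prediction was too high, the update lowers the weight, and the loss after the step is smaller than before.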

(8) Back propagation algorithm

A neural network may use the error back propagation (BP) algorithm to correct the values of the parameters of the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, passing the input signal forward to the output produces an error loss, and the parameters of the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
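The forward pass, error loss, and backward update can be sketched for a tiny two-weight network y = w2 · sigmoid(w1 · x). This is an illustrative sketch under assumed choices (squared-error loss, learning rate 0.5, 200 iterations, a single training sample); the name `train_step` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w1, w2, x, target, lr=0.5):
    # Forward pass: input -> hidden activation -> output -> error loss.
    h = sigmoid(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2
    # Backward pass: propagate the error from the output back to each weight.
    dy = y - target                 # d(loss)/dy
    dw2 = dy * h                    # d(loss)/dw2
    dh = dy * w2                    # d(loss)/dh
    dw1 = dh * h * (1.0 - h) * x    # sigmoid'(z) = h * (1 - h)
    return w1 - lr * dw1, w2 - lr * dw2, loss

w1, w2 = 0.5, 0.5                   # initial parameters
losses = []
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=1.0)
    losses.append(loss)
print(losses[0], losses[-1])
```

The recorded loss shrinks over the iterations, which is the convergence of the error loss described above.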

Some basic concepts of neural networks have been briefly introduced above. Some specific neural networks that may be used in image data processing are introduced below.

The system architecture of the embodiments of the present application is described in detail below with reference to FIG. 1.

FIG. 1 is a schematic diagram of a system architecture of an embodiment of the present application. As shown in FIG. 1, the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection system 160.

In addition, the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114. The calculation module 111 may include the target model/rule 101, and the preprocessing module 113 and the preprocessing module 114 are optional.

The data collection device 160 is used to collect training data. For the target detection method of the embodiments of the present application, the training data may include training images (which contain pedestrians) and annotation data, where the annotation data gives the coordinates of the bounding boxes of the pedestrians present in the training images. After collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.

The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 performs object detection on the input training images and compares the output target detection results (the bounding boxes of targets such as pedestrians and vehicles in the image, and the confidence of each bounding box) with the annotation results, until the difference between the target detection results output by the training device 120 and the pre-annotated results is smaller than a certain threshold, at which point the training of the target model/rule 101 is complete.
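The application does not specify how the difference between a detected bounding box and an annotated bounding box is measured. One common choice, used here purely as an assumption, is intersection over union (IoU). The sketch below assumes boxes given as (x_min, y_min, x_max, y_max); the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    # Overlapping rectangle; width/height are clamped to 0 when boxes are disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0.0, 0.0, 2.0, 2.0)   # hypothetical detection result
annotated = (1.0, 1.0, 3.0, 3.0)   # hypothetical annotation
print(iou(predicted, annotated))
```

An IoU of 1 means the detected box coincides with the annotated box, and 0 means the two boxes do not overlap; a training criterion could then be expressed as a threshold on 1 minus the IoU.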

The target model/rule 101 can be used to implement the target detection method of the embodiments of the present application, that is, the image to be processed (after relevant preprocessing) is input into the target model/rule 101 to obtain the target detection result of the image to be processed. The target model/rule 101 in the embodiments of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the data collection device 160, and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130, and may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.

The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, and may also be a server or the cloud. In FIG. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user can input data to the I/O interface 112 through the client device 140. In the embodiments of the present application, the input data may include an image to be processed that is input by the client device. The client device 140 here may specifically be a terminal device.

The preprocessing module 113 and the preprocessing module 114 are used to preprocess the input data (such as the image to be processed) received by the I/O interface 112. In the embodiments of the present application, the preprocessing module 113 and the preprocessing module 114 may both be absent, or only one of them may be present. When neither the preprocessing module 113 nor the preprocessing module 114 is present, the calculation module 111 can process the input data directly.

When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 can call data, code, and the like in the data storage system 150 for the corresponding processing, and can also store the data, instructions, and the like obtained from the corresponding processing in the data storage system 150.

Finally, the I/O interface 112 presents the processing result, such as the target detection result calculated by the target model/rule 101, to the client device 140, thereby providing it to the user.

Specifically, the target detection result obtained by the target model/rule 101 in the calculation module 111 may be processed by the preprocessing module 113 (and, additionally, by the preprocessing module 114), after which the processing result is sent to the I/O interface, and the I/O interface then sends the processing result to the client device 140 for display.

It should be understood that when the preprocessing module 113 and the preprocessing module 114 are not present in the system architecture 100, the calculation module 111 may also transmit the target detection result to the I/O interface, and the I/O interface then sends the processing result to the client device 140 for display.

It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals, or different tasks, and the corresponding target models/rules 101 can be used to achieve those goals or complete those tasks, thereby providing the user with the desired results.

In FIG. 1, the user can manually specify the input data, and this can be done through the interface provided by the I/O interface 112. Alternatively, the client device 140 can automatically send the input data to the I/O interface 112. If automatic sending of the input data by the client device 140 requires the user's authorization, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the result may be presented in a specific form such as display, sound, or action. The client device 140 can also serve as a data collection terminal that collects the input data fed into the I/O interface 112 and the output results from the I/O interface 112, as shown in the figure, as new sample data and stores them in the database 130. Of course, collection need not go through the client device 140; instead, the I/O interface 112 can directly store the input data fed into the I/O interface 112 and the output results from the I/O interface 112, as shown in the figure, in the database 130 as new sample data.

It is worth noting that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 1 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.

As shown in FIG. 1, the target model/rule 101 obtained through training by the training device 120 may be the neural network of the embodiments of the present application. Specifically, the neural network provided in the embodiments of the present application may be a CNN, a deep convolutional neural network (DCNN), and so on.

Since the CNN is a very common neural network, the structure of a CNN is described in detail below with reference to FIG. 2. As stated in the introduction of basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture performs multiple levels of learning at different levels of abstraction by means of machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.

As shown in FIG. 2, a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230. These layers are described in detail below.

Convolutional layer/pooling layer 220:

Convolutional layer:

As shown in FIG. 2, the convolutional layer/pooling layer 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.

The internal working principle of a convolutional layer is described below, taking the convolutional layer 221 as an example.

The convolutional layer 221 may include many convolution operators. A convolution operator, also called a kernel, acts in image processing as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends across the entire depth of the input image. Convolving with a single weight matrix therefore produces a convolved output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, so this dimension is determined by the number of weight matrices. Different weight matrices can be used to extract different features from the image: for example, one weight matrix extracts image edge information, another extracts a specific color of the image, and yet another blurs unwanted noise in the image. Because the multiple weight matrices have the same size (rows × columns), the convolutional feature maps they extract also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
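The sliding-window operation, the role of the stride, and the stacking of several same-sized weight matrices into the output depth can be sketched in plain Python as follows. This is an illustrative sketch without padding; the function name `conv2d`, the two example kernels, and the input values are assumptions. As is conventional in neural networks, the kernel is applied without flipping.

```python
def conv2d(image, kernels, stride=1):
    """image: [depth][H][W]; kernels: [num_kernels][depth][kH][kW].

    Returns [num_kernels][outH][outW]: one feature map per weight matrix,
    stacked along the output depth dimension. No padding is applied.
    """
    depth, height, width = len(image), len(image[0]), len(image[0][0])
    output = []
    for kernel in kernels:
        k_h, k_w = len(kernel[0]), len(kernel[0][0])
        feature_map = []
        for top in range(0, height - k_h + 1, stride):
            row = []
            for left in range(0, width - k_w + 1, stride):
                total = 0.0
                # The weight matrix spans the entire depth of the input.
                for d in range(depth):
                    for i in range(k_h):
                        for j in range(k_w):
                            total += image[d][top + i][left + j] * kernel[d][i][j]
                row.append(total)
            feature_map.append(row)
        output.append(feature_map)
    return output

# Hypothetical 1-channel 4x4 image and two 2x2 kernels of the same size.
image = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
sum_kernel = [[[1, 1], [1, 1]]]       # adds up each 2x2 window
edge_kernel = [[[-1, 1], [-1, 1]]]    # responds to horizontal intensity change
features = conv2d(image, [sum_kernel, edge_kernel], stride=2)
print(features)
```

With a stride of 2 each kernel yields a 2×2 feature map, and the two maps together form an output of depth 2.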

In practical applications, the weight values in these weight matrices need to be obtained through extensive training. The weight matrices formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 makes correct predictions.

When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (for example, 221) tend to extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features, and features with higher-level semantics are more applicable to the problem to be solved.

Pooling layer:

Because the number of training parameters often needs to be reduced, pooling layers often need to be introduced periodically after convolutional layers. In layers 221 to 226 illustrated by 220 in FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of a pooling layer is to reduce the spatial size of the image. A pooling layer may include an average pooling operator and/or a max pooling operator, which sample the input image to obtain an image of smaller size. The average pooling operator computes the average of the pixel values of the image within a specific range as the result of average pooling. The max pooling operator takes the pixel with the largest value within a specific range as the result of max pooling. In addition, just as the size of the weight matrix used in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output by the pooling layer may be smaller than the size of the image input to it, and each pixel in the output image represents the average or maximum value of the corresponding sub-region of the input image.
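Average pooling and max pooling over non-overlapping windows can be sketched as follows. The function name `pool2d`, the 2×2 window size, and the example image are assumptions made for illustration.

```python
def pool2d(image, size=2, mode="max"):
    """Downsample a 2D image with non-overlapping size x size windows."""
    if mode == "max":
        reduce_fn = max
    else:
        reduce_fn = lambda values: sum(values) / len(values)
    pooled = []
    for top in range(0, len(image) - size + 1, size):
        row = []
        for left in range(0, len(image[0]) - size + 1, size):
            window = [image[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 input; each output pixel summarizes one 2x2 sub-region.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
max_pooled = pool2d(image, size=2, mode="max")
avg_pooled = pool2d(image, size=2, mode="avg")
print(max_pooled, avg_pooled)
```

In both cases the 4×4 input is reduced to a 2×2 output, which is the reduction in spatial size described above.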

Fully connected layer 230:

After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output or a group of outputs whose number equals the number of required classes. The fully connected layer 230 may therefore include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and an output layer 240. The parameters contained in the multiple hidden layers may be obtained by pre-training on training data relevant to the specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.

After the multiple hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (propagation in the direction from 210 to 240 in FIG. 2) is completed, back propagation (propagation in the direction from 240 to 210 in FIG. 2) starts to update the weight values and biases of the layers mentioned above, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
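For reference, a categorical cross-entropy loss for a single sample can be sketched as follows. The exact loss used by the output layer 240 is not specified beyond being similar to categorical cross-entropy, so the function name `cross_entropy` and the example probabilities are assumptions.

```python
import math

def cross_entropy(probabilities, true_class):
    """Categorical cross-entropy for one sample: -log(p[true_class])."""
    return -math.log(probabilities[true_class])

# Hypothetical class probabilities produced by the output layer.
probs = [0.7, 0.2, 0.1]
loss_if_class_0 = cross_entropy(probs, 0)   # confident and correct: small loss
loss_if_class_2 = cross_entropy(probs, 2)   # true class got low probability: large loss
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the true class receives a high probability and grows as that probability approaches zero, which gives back propagation a prediction error to reduce.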

It should be noted that the convolutional neural network 200 shown in FIG. 2 is only an example of a convolutional neural network, and in specific applications a convolutional neural network may also exist in the form of other network models.

It should be understood that the convolutional neural network (CNN) 200 shown in FIG. 2 can be used to perform the target detection method of the embodiments of the present application. As shown in FIG. 2, after the image to be processed passes through the input layer 210, the convolutional layer/pooling layer 220, and the fully connected layer 230, the detection result of the image to be processed is obtained (the bounding boxes of the pedestrians present in the image to be processed and the confidence of each such bounding box).

FIG. 3 shows a chip hardware structure provided by an embodiment of the present application. The chip includes a neural network processor 50. The chip may be provided in the execution device 110 shown in FIG. 1 to perform the calculation work of the calculation module 111. The chip may also be provided in the training device 120 shown in FIG. 1 to perform the training work of the training device 120 and output the target model/rule 101. The algorithms of all layers of the convolutional neural network shown in FIG. 2 can be implemented in the chip shown in FIG. 3.

神经网络处理器(neural-network processing unit，NPU)50作为协处理器挂载到主中央处理器(central processing unit，CPU)(host CPU)上，由主CPU分配任务。NPU的核心部分为运算电路503，控制器504控制运算电路503提取存储器(权重存储器或输入存储器)中的数据并进行运算。A neural-network processing unit (NPU) 50 is mounted on a main central processing unit (central processing unit, CPU) (host CPU) as a co-processor, and tasks are allocated by the main CPU. The core part of the NPU is the operation circuit 503, and the controller 504 controls the operation circuit 503 to extract the data in the memory (weight memory or input memory) and perform operations.

在一些实现中，运算电路503内部包括多个处理单元(process engine,PE)。在一些实现中，运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中，运算电路503是通用的矩阵处理器。In some implementations, the arithmetic circuit 503 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 503 is a general-purpose matrix processor.

举例来说，假设有输入矩阵A，权重矩阵B，输出矩阵C。运算电路503从权重存储器502中取矩阵B相应的数据，并缓存在运算电路503中每一个PE上。运算电路503从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算，得到的矩阵的部分结果或最终结果，保存在累加器(accumulator)508中。For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 503 fetches the data corresponding to the matrix B from the weight memory 502 and buffers it on each PE in the operation circuit 503 . The operation circuit 503 takes the data of the matrix A from the input memory 501 and performs the matrix operation on the matrix B, and stores the partial result or the final result of the matrix in the accumulator 508 .
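The role of the accumulator in this example can be sketched in software. The following is only an illustration of accumulating partial results of C = A x B over slices of the inner dimension; it does not model the PE array or the memories, and the function name and the `tile` parameter are assumptions made for the sketch.

```python
def matmul_with_accumulator(a, b, tile=2):
    """Compute C = A x B by accumulating partial products tile by tile."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # 'acc' plays the role of the accumulator holding partial results.
    acc = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        # Each pass adds one partial result; after the last pass,
        # 'acc' holds the final output matrix C.
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k1))
    return acc

A = [[1, 2, 3], [4, 5, 6]]          # input matrix A (2 x 3)
B = [[7, 8], [9, 10], [11, 12]]     # weight matrix B (3 x 2)
C = matmul_with_accumulator(A, B)   # output matrix C (2 x 2)
```

With these toy matrices, `C` equals `[[58, 64], [139, 154]]`, the same result as an ordinary matrix product.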

The vector calculation unit 507 can further process the output of the operation circuit 503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 507 can be used for the computation of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector calculation unit 507 can store the processed output vector in the unified memory 506. For example, the vector calculation unit 507 may apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 503, for example for use in a subsequent layer of the neural network.
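As a rough illustration of this kind of post-processing, the sketch below applies a nonlinear function and a pooling step to a vector of accumulated values. ReLU and max pooling are chosen here only as familiar examples; the application does not say which nonlinear function is used, and the function names are assumptions.

```python
def relu(values):
    # Nonlinear function applied element-wise to the accumulated values.
    return [max(0.0, v) for v in values]

def max_pool_1d(values, window=2):
    # Merge each non-overlapping window into its maximum.
    return [max(values[i:i + window]) for i in range(0, len(values), window)]

accumulated = [-1.5, 0.5, 2.0, -0.25]   # toy output of the matrix operation
activations = relu(accumulated)          # [0.0, 0.5, 2.0, 0.0]
pooled = max_pool_1d(activations)        # [0.5, 2.0]
```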

The unified memory 506 is used to store input data and output data.

The storage unit access controller 505, a direct memory access controller (DMAC), transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.

The bus interface unit (BIU) 510 is used for interaction among the host CPU, the DMAC, and the instruction fetch buffer 509 over a bus.

The instruction fetch buffer 509, connected to the controller 504, stores the instructions used by the controller 504.

The controller 504 invokes the instructions cached in the instruction fetch buffer 509 to control the working process of the operation accelerator.

Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories. The external memory is memory outside the NPU, and may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or another readable and writable memory.

In addition, in the present application, the operations of the layers of the convolutional neural network shown in FIG. 2 may be performed by the operation circuit 503 or the vector calculation unit 507.

As shown in FIG. 4, an embodiment of the present application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 210, and a data storage system 250, where the local device 301 and the local device 302 are connected to the execution device 210 through a communication network.

The execution device 210 may be implemented by one or more servers. Optionally, the execution device 210 may be used together with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 may be located at one physical site or distributed across multiple physical sites. The execution device 210 may use the data in the data storage system 250, or call the program code in the data storage system 250, to implement the method for searching a neural network structure of the embodiments of the present application.

Specifically, the execution device 210 may perform the following process: determine the search space of the target detection network; determine the initial network architecture of the target detection network according to that search space; and iteratively update the initial network architecture according to the search space until a target detection network that meets the preset requirements is obtained.

Through the above process, the execution device 210 can build a target neural network, which can be used for target detection.

Users may operate their respective user devices (for example, the local device 301 and the local device 302) to interact with the execution device 210. Each local device may be any computing device, such as a personal computer, computer workstation, smartphone, tablet computer, smart camera, smart vehicle, other type of cellular phone, media consumption device, wearable device, set-top box, or game console.

Each user's local device may interact with the execution device 210 through a communication network of any communication mechanism or standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof.

In one implementation, the local device 301 and the local device 302 obtain the relevant parameters of the target neural network from the execution device 210, deploy the target neural network locally, and use it to perform target detection.

In another implementation, the target neural network may be deployed directly on the execution device 210. The execution device 210 obtains the images to be processed from the local device 301 and the local device 302 (which may upload the images to the execution device 210), performs target detection on the images using the target neural network, and sends the detection results to the local device 301 and the local device 302.

The execution device 210 may also be called a cloud device; in this case it is generally deployed in the cloud.

The target detection system is described in detail below with reference to FIG. 5.

FIG. 5 is a schematic diagram of a target detection system according to an embodiment of the present application.

As shown in FIG. 5, the target detection system includes a backbone network, a feature fusion layer, a region proposal network (RPN), and a region convolutional neural network (region CNN, RCNN). The backbone network is also sometimes referred to as the trunk network. These four parts of the target detection network are described in detail below.

Backbone network:

The backbone network extracts low-level image information and is a common component of vision-based deep neural network models. In practice, the backbone network is usually obtained by fine-tuning the architecture of a general deep convolutional neural network. For example, the backbone network may be fine-tuned from the architecture of the visual geometry group (VGG) network (a network proposed by the Visual Geometry Group at the University of Oxford). As another example, the backbone network may be fine-tuned from the architecture of a deep residual network (ResNet).

As shown in FIG. 5, the backbone network performs feature extraction on the image to be detected to obtain its image features.

Feature fusion layer:

The feature fusion layer screens and fuses the multi-scale, multi-level features extracted by the backbone network to generate more compact and more expressive feature vectors, which are then fed to the classifier for further processing. Feature fusion layers are widely used in neural network designs based on multi-scale, multi-level features. In practical applications, on the one hand, a pyramid structure can be used to adjust the size, shape, and weight of features at different scales, and the results are added together and fused into a single feature vector; on the other hand, skip connections link features at different levels together to mine more expressive multi-level features.

As shown in FIG. 5, the feature fusion layer fuses the image features of the image to be detected extracted by the backbone network to obtain multi-level features of the image.

RPN:

The RPN is a fast regression classifier used to generate coarse target locations and class label information. In practical applications, the RPN may be implemented as a simple two-layer network containing a binary classifier and bounding-box regression (bounding-box regression is a regression model for target detection that searches, near the target location obtained by a sliding window, for a regression window that is closer to the ground-truth window and has a smaller loss function value).

As shown in FIG. 5, the RPN processes the multi-level features of the image to be detected obtained by the feature fusion layer to obtain a preliminary classification and detection result for the target.
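The application does not state how the regression window is parameterized. As an assumption for illustration only, the sketch below uses the center/size offset form that is common in Faster R-CNN-style detectors, in which predicted offsets shift the center of a window and rescale its width and height. The function name and the sample values are hypothetical.

```python
import math

def apply_box_deltas(window, deltas):
    """Refine a window with predicted offsets.

    window: (cx, cy, w, h) center coordinates, width and height.
    deltas: (dx, dy, dw, dh) predicted by the regression branch.
    """
    cx, cy, w, h = window
    dx, dy, dw, dh = deltas
    # Center shifts are expressed relative to the window size;
    # size changes are expressed in log space.
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Toy sliding-window location and toy offsets (assumed values).
refined = apply_box_deltas((50.0, 50.0, 20.0, 40.0), (0.1, -0.05, 0.0, 0.0))
```

Here the window center moves by a fraction of the window size while its width and height stay unchanged, because the two log-scale offsets are zero.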

RCNN:

The RCNN may also be called the RCNN head. It is a part specific to target detection networks and is used to further refine the preliminary classification and detection result obtained by the RPN. In practical applications, the RCNN is generally implemented as a multi-layer network that is much more complex than the RPN. Combining the RCNN with the RPN allows the target detection system to quickly discard a large number of invalid image regions and to concentrate on examining the more promising regions in detail, thereby improving the target detection result.

As shown in FIG. 5, the RCNN further processes the preliminary classification and detection result obtained by the RPN to obtain the class of the target and the bounding box of the target.
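The data flow through the four parts described above can be summarized in a short sketch. The stage functions are toy stand-ins that only show what each stage consumes and produces; their names, the feature labels, and the sample detection are all hypothetical.

```python
def detect(image, backbone, fusion, rpn, rcnn):
    """Run the four stages in the order shown in FIG. 5."""
    features = backbone(image)      # image features
    fused = fusion(features)        # multi-level fused features
    proposals = rpn(fused)          # preliminary (coarse) detections
    return rcnn(fused, proposals)   # final classes and bounding boxes

# Toy stand-ins for the four stages (placeholders, not real networks).
def toy_backbone(image):
    return ["c3", "c4", "c5"]

def toy_fusion(features):
    return ["p" + name[1:] for name in features]

def toy_rpn(fused):
    return [(0.0, 0.0, 10.0, 10.0)]

def toy_rcnn(fused, proposals):
    return [("pedestrian", box, 0.9) for box in proposals]

result = detect("frame", toy_backbone, toy_fusion, toy_rpn, toy_rcnn)
```

Passing the stages as arguments mirrors the point made later in the description that the feature fusion layer and the RCNN can be replaced by searched architectures while the backbone network and the RPN stay fixed.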

The target detection system obtained with the method for constructing a target detection network of the embodiments of the present application can be applied to autonomous driving. In target recognition for autonomous driving, targets on the road such as vehicles and pedestrians must be identified accurately. An autonomous driving system must respond to sudden events on the road in real time, so the onboard target detection system is required to produce efficient and accurate recognition results on limited hardware resources. Because the target detection system built with the construction method of the embodiments of the present application can perform target detection more effectively, using it can improve the target detection result and the safety of autonomous driving.

FIG. 6 is a schematic diagram of the target detection system applied in the field of autonomous driving.

As shown in FIG. 6, video data consisting of a series of road frames can be captured by an on-board camera. The target detection module processes the video data and detects the obstacles present in the road frames and their positions. The target detection module can then pass the detection result (the obstacles in the road frames and their positions) to the action detection module, which generates a driving operation signal from the detection result and sends it to the corresponding autonomous driving actuator, thereby achieving automatic control of the vehicle.

In addition, the target detection system in the target detection module of FIG. 6 may be a neural network (model) obtained with the method for constructing a target detection network of the embodiments of the present application.

FIG. 7 is a schematic flowchart of a method for constructing a target detection network according to an embodiment of the present application. The method shown in FIG. 7 may be performed by the apparatus for constructing a target detection network of the embodiments of the present application (for example, by the apparatus shown in FIG. 14 or FIG. 16 below). The method includes steps 1001 to 1003, which are described in detail below.

1001. Determine the search space of the target detection network.

The search space of the target detection network may include the search space of the feature fusion layer.

The search space of the feature fusion layer may include the optional connection relationships of the feature fusion layer.

Specifically, the optional connection relationships of the feature fusion layer may include, for any two adjacent layers of the multi-layer neural network in the feature fusion layer, a connection between any node of one layer and any node of the other layer.

For example, as shown in FIG. 8, the feature fusion layer includes a layer-A neural network, a layer-B neural network, and a layer-C neural network. The nodes of each layer are as follows.

Layer-A neural network: P_{0}_1, P_{0}_2, P_{0}_3, and P_{0}_4.

Layer-B neural network: P_{1}_1, P_{1}_2, P_{1}_3, and P_{1}_4.

Layer-C neural network: P_{2}_1, P_{2}_2, P_{2}_3, and P_{2}_4.

In FIG. 8, any node in the layer-A neural network can be connected to any node in the layer-B neural network, and any node in the layer-B neural network can be connected to any node in the layer-C neural network.

Because all skip connections between the layers are taken into account, from layer A to layer B and from layer B to layer C, a feature fusion layer with a better network structure can be constructed.

For ease of illustration, FIG. 8 shows only the connections between one node in layer A and the nodes in layer B, and between one node in layer B and the nodes in layer C.

Specifically, as shown in FIG. 8, node P_{0}_2 in layer A can be connected to nodes P_{1}_1, P_{1}_2, P_{1}_3, and P_{1}_4 in layer B; node P_{1}_1 in layer B can be connected to nodes P_{2}_1, P_{2}_2, P_{2}_3, and P_{2}_4 in layer C. It should be understood that FIG. 8 shows only one specific case of the feature fusion layer; the present application does not limit the number of neural network layers in the feature fusion layer or the number of nodes in each layer.
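The optional connection relationships in this example can be enumerated with a short sketch. It lists every candidate connection between adjacent layers for the three-layer, four-node case of FIG. 8; the helper names `node_names` and `candidate_connections` are illustrative.

```python
from itertools import product

def node_names(layer_index, count=4):
    # Produces names in the document's notation, e.g. "P_{0}_1".
    return [f"P_{{{layer_index}}}_{i}" for i in range(1, count + 1)]

def candidate_connections(layers):
    """List every (source, destination) pair between adjacent layers."""
    edges = []
    for src_layer, dst_layer in zip(layers, layers[1:]):
        # Any node of one layer may connect to any node of the next layer.
        edges.extend(product(src_layer, dst_layer))
    return edges

layers = [node_names(0), node_names(1), node_names(2)]  # layers A, B, C
edges = candidate_connections(layers)
```

For this configuration the search space holds 4 x 4 candidate connections for each of the two pairs of adjacent layers, 32 in total. A concrete feature fusion architecture corresponds to a chosen subset of these candidates.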

In addition, the search space of the target detection network may also include the search space of the RCNN.

The RCNN of the target detection network includes multiple basic units, each composed of at least two nodes. The search space of the RCNN includes the search space of each of these basic units. The search space of each basic unit includes the optional connection relationships of that basic unit, which include a connection between any two nodes within the basic unit.

For example, as shown in FIG. 9, the RCNN includes basic unit 1 at the top of FIG. 9 and basic unit 2 at the bottom of FIG. 9. The two basic units are described below.

Basic unit 1:

Basic unit 1 consists of 7 nodes: 2 input nodes (H_{0}, H_{0}), 4 intermediate nodes (0, 1, 2, 3), and 1 output node (H_{1}).

In basic unit 1, the two input nodes cannot be directly connected to each other; any other pair of nodes can be directly connected.

Basic unit 2:

Basic unit 2 consists of 7 nodes: 2 input nodes (H_{0}, H_{1}), 4 intermediate nodes (0, 1, 2, 3), and 1 output node (H_{2}).

FIG. 9 shows only one possible set of connections for basic unit 1 and basic unit 2.

Within basic unit 1 or basic unit 2, the nodes are connected in the direction from input to output.
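The candidate connections of one basic unit can be enumerated as follows. The sketch assumes that the nodes are ordered from input to output (input nodes, then intermediate nodes 0 to 3, then the output node) and that a connection always runs from an earlier node to a later one. That ordering of the intermediate nodes and the node labels `in0`, `in1`, `out` are assumptions made for illustration.

```python
from itertools import combinations

def cell_candidate_edges(num_inputs=2, num_intermediate=4):
    """Return the node list and all allowed directed edges of one basic unit."""
    nodes = ([f"in{i}" for i in range(num_inputs)]
             + [str(i) for i in range(num_intermediate)]
             + ["out"])
    edges = []
    # combinations() yields index pairs with src < dst, so every edge
    # points in the input-to-output direction.
    for src, dst in combinations(range(len(nodes)), 2):
        if dst < num_inputs:
            continue  # both endpoints are input nodes: not allowed
        edges.append((nodes[src], nodes[dst]))
    return nodes, edges

nodes, edges = cell_candidate_edges()
```

With 7 nodes there are 21 ordered pairs, and excluding the single input-to-input pair leaves 20 candidate connections per basic unit under these assumptions.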

1002. Determine the initial network architecture of the target detection network according to the search space of the target detection network.

The initial network architecture of the target detection network includes the network architecture of the feature fusion layer, which is determined according to the search space of the feature fusion layer. In other words, in step 1002, the network architecture of the feature fusion layer in the initial network architecture may be determined according to the search space of the feature fusion layer.

In addition, the initial network architecture of the target detection network may also include the network architecture of the RCNN, which is determined according to the search space of the RCNN. In other words, in step 1002, the network architecture of the RCNN may be determined according to the search space of the RCNN.

It should be understood that, when the initial network architecture is determined in step 1002, the number of network layers and the number of nodes of the target detection network may be predetermined.

Specifically, before the target detection network is constructed, the number of network layers and the number of nodes of the target detection network may be determined according to the application requirements or the detection performance requirements of the network to be constructed.

For example, when high detection performance is required, the target detection network may have more layers and more nodes; when the detection performance requirement is lower, the target detection network may have fewer layers and fewer nodes.

1003. According to the search space of the target detection network, iteratively update the initial network architecture of the target detection network until a target detection network that meets the preset requirements is obtained.

It should be understood that, in step 1003, when the initial network architecture of the target detection network is iteratively updated, either one of the feature fusion layer's network architecture and the RCNN's network architecture in the initial network architecture may be updated, or both may be updated at the same time.

Specifically, in step 1003, after each iterative update of the network architecture, it may be checked whether the updated target detection network meets the preset requirements. If it does not, the network architecture continues to be updated until a target detection network that meets the preset requirements is obtained.

In the target detection network, the network architecture of the backbone network and the network structure of the RPN may be predetermined. In that case, when the initial network architecture of the target detection network is updated, only the network architecture of the feature fusion layer or the network architecture of the RCNN in the initial network architecture is updated.

Alternatively, the network architectures of the backbone network and the RPN may not be determined in advance. In that case, when the initial network architecture of the target detection network is updated, the network architectures of the backbone network, the feature fusion layer, the RPN, and the RCNN may all be updated.

In the present application, the search space of the feature fusion layer contains more optional connection relationships. The feature fusion layer in the initial network architecture can therefore be determined more reasonably from these additional options, and updating the network architecture of the feature fusion layer can reduce the complexity of the resulting target detection network.

Specifically, because the search space of the feature fusion layer allows the optional connection relationships to be chosen more freely, determining the feature fusion layer in the initial network architecture from this search space and then updating its network architecture yields a simpler feature fusion layer. This ultimately reduces the complexity of the target detection network and the storage space required to deploy it.

Furthermore, because the search space of the feature fusion layer contains more optional connection relationships, determining the feature fusion layer in the initial network architecture from this search space and then updating its network architecture ultimately makes it possible to build a target detection network with better performance.

In the present application, within the search space of the RCNN, any two nodes in each basic unit of the RCNN can be connected. The initial network structure of the RCNN can therefore be determined more reasonably from this more permissive search space, and updating the initial network structure of the RCNN can reduce the complexity of the target detection network.

Furthermore, because the search space of the RCNN contains more optional connection relationships, determining the network architecture of the RCNN in the initial network architecture from this search space and then updating it ultimately makes it possible to build a target detection network with better performance.

The method shown in FIG. 7 may be a method for automatically constructing a neural network, and may be performed automatically by the apparatus for constructing a target detection network of the embodiments of the present application.

When the network architecture of the feature fusion layer and the network architecture of the RCNN in the initial network architecture are updated at the same time, the resulting feature fusion layer and RCNN structures match each other better, which can improve the performance of the resulting target detection network.

Optionally, step 1003 specifically includes: according to the search space of the target detection network, iteratively updating the initial network architecture of the target detection network to reduce the value of the loss function corresponding to the target detection network, thereby obtaining a target detection network that meets the preset requirements.

The loss function may include the target detection error of the target detection network and/or the complexity of the target detection network.

When the initial network architecture of the target detection network is iteratively updated, the network structure of the target detection network (the connection relationships between different nodes in the network) may be adjusted, and the value of the loss function corresponding to the target detection network is calculated after each adjustment. The network structure is then updated again according to that value, and the iteration continues until a target detection network that meets the preset requirements is obtained.

Specifically, the value of the loss function corresponding to the target detection network (also called the function value) may be determined after each update of the network architecture. If the value is less than a preset threshold, updating may stop, and the target detection network obtained at that point is the one that meets the preset requirements. If the value is not less than the preset threshold, whether to continue updating the network architecture may be decided according to the value of the loss function, and the iteration continues until a target detection network that meets the preset requirements is obtained.
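The update-evaluate-stop cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the names `search_architecture`, `propose_update` and `loss_fn` are hypothetical, the rule of keeping only candidates that lower the loss is an assumption, and `max_updates` is added only so the loop always terminates.

```python
def search_architecture(initial_arch, propose_update, loss_fn, threshold, max_updates):
    """Update an architecture until its loss is below `threshold` or the budget runs out."""
    arch = initial_arch
    loss = loss_fn(arch)
    updates = 0
    while loss >= threshold and updates < max_updates:
        candidate = propose_update(arch)      # adjust connections between nodes
        candidate_loss = loss_fn(candidate)   # loss is re-evaluated after every update
        updates += 1
        if candidate_loss < loss:             # assumption: keep only improving candidates
            arch, loss = candidate, candidate_loss
    return arch, loss, updates


# Toy stand-in: an "architecture" is just an edge count, and 3 edges is optimal.
result = search_architecture(
    initial_arch=10,
    propose_update=lambda edges: edges - 1,
    loss_fn=lambda edges: float(abs(edges - 3)),
    threshold=0.5,
    max_updates=100,
)
print(result)
```

In a real search the architecture would be a set of connections and operations, and `loss_fn` would involve training and evaluating the network; the toy integers only make the control flow visible.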

Optionally, the target detection network that meets the preset requirements satisfies at least one of the following conditions (1) to (3):

(1) The detection performance of the target detection network meets a preset performance requirement;

(2) The number of updates to the network architecture of the target detection network is greater than or equal to a preset number;

(3) The complexity of the target detection network is less than or equal to a preset complexity.

The complexity of the target detection network (also called the complexity of its network structure) may be determined according to at least one of the following: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.
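To make the complexity measures concrete, the sketch below estimates parameters, floating-point operations and memory access cost for a single convolution, and folds a complexity value into a loss. The text does not give formulas, so everything here is an assumption: the cost estimates are common rough approximations, and the additive form `detection_error + trade_off * complexity` is only one possible way to combine the two terms.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def conv_flops(kernel, c_in, c_out, h_out, w_out):
    """Floating-point operations, counting one multiply-add as two operations."""
    return 2 * conv_params(kernel, c_in, c_out) * h_out * w_out


def conv_mac(kernel, c_in, c_out, h_in, w_in, h_out, w_out):
    """Rough memory access cost: read the input, write the output, read the weights."""
    return h_in * w_in * c_in + h_out * w_out * c_out + conv_params(kernel, c_in, c_out)


def total_loss(detection_error, complexity, trade_off):
    """Assumed combination: detection error plus a weighted complexity penalty."""
    return detection_error + trade_off * complexity


params = conv_params(3, 64, 64)
print(params, conv_flops(3, 64, 64, 32, 32), conv_mac(3, 64, 64, 32, 32, 32, 32))
print(total_loss(0.4, params, 1e-6))
```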

In addition to the optional connection relationships of the feature fusion layer, the search space of the feature fusion layer may also include the optional operation types of the feature fusion layer.

Specifically, the optional operation types of the feature fusion layer may include the convolution operation corresponding to a connection between any node of one layer and any node of the other layer, for any two adjacent layers of neural networks in the multi-layer neural network.

The convolution operation corresponding to such a connection includes an atrous (dilated) convolution operation.

The optional operations of the feature fusion layer may include any of the following:

(1) No connection;

(2) 5×5 atrous convolution with a dilation rate of 2;

(3) Skip connection (identity mapping);

(4) 5×5 atrous convolution with a dilation rate of 3;

(5) 3×3 atrous convolution with a dilation rate of 2;

(6) 3×3 depthwise separable convolution;

(7) 3×3 atrous convolution with a dilation rate of 3;

(8) 5×5 depthwise separable convolution.

With the same number of learnable parameters, an atrous convolution operator has a larger receptive field than a traditional convolution operator, so it can extract features covering a larger visual range from an image. Alternatively, to extract features covering the same visual range, using an atrous convolution operator reduces the number and size of the network parameters of the feature fusion layer. Therefore, when the optional operation types of the feature fusion layer include atrous convolution, substantially the same target detection performance can be achieved with fewer convolution parameters.
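The receptive-field argument can be checked with a little arithmetic. The sketch below assumes that the "interval number" of an atrous convolution corresponds to the dilation rate used in common deep learning frameworks, in which case a kernel of size k with dilation d spans d·(k − 1) + 1 input positions per side while still holding only k·k weights per input/output channel pair. The function names are illustrative.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length of the input window covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def weights_per_channel_pair(kernel_size):
    """Learnable weights per input/output channel pair; independent of dilation."""
    return kernel_size * kernel_size


for kernel_size, dilation in [(3, 1), (3, 2), (3, 3), (5, 2), (5, 3)]:
    span = effective_kernel_size(kernel_size, dilation)
    weights = weights_per_channel_pair(kernel_size)
    print(f"{kernel_size}x{kernel_size}, dilation {dilation}: "
          f"covers {span}x{span} with {weights} weights")
```

For example, a 3×3 kernel with dilation 2 covers the same 5×5 window as an ordinary 5×5 kernel while using 9 weights instead of 25 per channel pair.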

The network structure of the feature fusion layer of the finally obtained target detection network is described below with reference to the accompanying drawings.

As shown in FIG. 10, the number of neural network layers in the feature fusion layer and the nodes in each layer are the same as in the feature fusion layer shown in FIG. 8. FIG. 10 shows one possible connection pattern of the finally obtained feature fusion layer. Specifically, in FIG. 10, P_{1}_1 and P_{1}_4 in the B-layer neural network receive the outputs of all nodes (4 nodes) of the A-layer neural network, and P_{1}_2 and P_{1}_3 in the B-layer neural network each receive the outputs of 3 nodes of the A-layer neural network. P_{2}_4 in the C-layer neural network receives the outputs of all nodes (4 nodes) of the B-layer neural network, and P_{2}_1, P_{2}_2 and P_{2}_3 in the C-layer neural network each receive the outputs of 3 nodes of the A-layer neural network.

Table 1 shows the input nodes corresponding to each node in the B-layer neural network.

Table 1

| B-layer node | Corresponding input nodes |
| --- | --- |
| P_{1}_1 | P_{0}_1, P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{1}_2 | P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{1}_3 | P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{1}_4 | P_{0}_1, P_{0}_2, P_{0}_3 and P_{0}_4 |

Table 2 shows the input nodes corresponding to each node in the C-layer neural network.

Table 2

| C-layer node | Corresponding input nodes |
| --- | --- |
| P_{2}_1 | P_{0}_1, P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{2}_2 | P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{2}_3 | P_{0}_2, P_{0}_3 and P_{0}_4 |
| P_{2}_4 | P_{0}_1, P_{0}_2, P_{0}_3 and P_{0}_4 |

In addition, FIG. 10 shows one possible choice of operations between nodes. Specifically, the operations between some of the nodes from the A-layer neural network to the B-layer neural network are shown in Table 3. Apart from the operations shown in Table 3, the operations between the A-layer neural network and the B-layer neural network are all identity.

Table 3

| B-layer node | Corresponding input node | Corresponding operation |
| --- | --- | --- |
| P_{1}_1 | P_{0}_3 | dil_conv_5×5_r3 |
| P_{1}_2 | P_{0}_3 | dil_conv_5×5_r3 |
| P_{1}_4 | P_{0}_4 | dil_conv_5×5_r2 |

The operations between some of the nodes from the B-layer neural network to the C-layer neural network are shown in Table 4. Apart from the operations shown in Table 4, the operations between the B-layer neural network and the C-layer neural network are all identity.

Table 4

| C-layer node | Corresponding input node | Corresponding operation |
| --- | --- | --- |
| P_{2}_2 | P_{1}_1 | dil_conv_5×5_r3 |
| P_{2}_4 | P_{1}_4 | dil_conv_5×5_r3 |
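One way to hold the searched structure in code is a pair of lookup tables: one mapping each node to its input nodes and one listing only the edges whose operation is not identity. The sketch below encodes Tables 1, 3 and 4 this way. The dictionary layout and the names `B_LAYER_INPUTS`, `NON_IDENTITY_OPS` and `edge_operation` are illustrative assumptions; the node and operation labels are taken from the tables.

```python
# Table 1: input nodes of each B-layer node.
B_LAYER_INPUTS = {
    "P_{1}_1": ["P_{0}_1", "P_{0}_2", "P_{0}_3", "P_{0}_4"],
    "P_{1}_2": ["P_{0}_2", "P_{0}_3", "P_{0}_4"],
    "P_{1}_3": ["P_{0}_2", "P_{0}_3", "P_{0}_4"],
    "P_{1}_4": ["P_{0}_1", "P_{0}_2", "P_{0}_3", "P_{0}_4"],
}

# Tables 3 and 4: (input node, output node) -> operation, for non-identity edges only.
NON_IDENTITY_OPS = {
    ("P_{0}_3", "P_{1}_1"): "dil_conv_5×5_r3",
    ("P_{0}_3", "P_{1}_2"): "dil_conv_5×5_r3",
    ("P_{0}_4", "P_{1}_4"): "dil_conv_5×5_r2",
    ("P_{1}_1", "P_{2}_2"): "dil_conv_5×5_r3",
    ("P_{1}_4", "P_{2}_4"): "dil_conv_5×5_r3",
}


def edge_operation(source, target):
    """Operation on an existing edge; edges not listed in the tables use identity.

    This helper does not check that the edge itself exists.
    """
    return NON_IDENTITY_OPS.get((source, target), "identity")


for target, sources in B_LAYER_INPUTS.items():
    for source in sources:
        print(source, "->", target, ":", edge_operation(source, target))
```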

The search space of the target detection network in step 1002 may also include the search space of the RCNN. The RCNN may include a plurality of basic units, each of which consists of at least two nodes. The search space of the RCNN includes the search space of each of the basic units, the search space of each basic unit includes the optional connection relationships of that basic unit, and the optional connection relationships of each basic unit include a connection between any two nodes within that basic unit.

When the search space of the target detection network in step 1002 includes the search space of the RCNN, the RCNN in the initial network architecture of the target detection network may be determined according to the search space of the RCNN. That is, in step 1002, the network architecture of the RCNN in the initial network architecture of the target detection network may be determined according to the search space of the RCNN.

Further, in this application, the feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer and its network architecture is updated, and the RCNN in the initial network architecture is determined according to the search space of the RCNN and its network architecture is updated. Compared with setting the network architecture by hand, this allows the network structure of the feature fusion layer and the network structure of the RCNN to be updated and optimized together, so that the finally optimized RCNN structure matches the structure of the feature fusion layer more closely and a target detection network with better target detection performance can be obtained.

Optionally, within the search space of the RCNN, the search space of each basic unit further includes the optional operation types of that basic unit. The optional operation types of each basic unit include the convolution operation corresponding to a connection between any two nodes within that basic unit, and this convolution operation includes an atrous convolution operation.

Optionally, the convolution operation corresponding to a connection between any two nodes within each basic unit includes an atrous convolution operation with a dilation rate of 2.

Optionally, the basic units in the RCNN may consist of different numbers of nodes.

In conventional solutions, the basic units in an RCNN generally all consist of the same number of nodes, which is not flexible enough when optimizing the RCNN network structure. In this application, the basic units in the RCNN may consist of different numbers of nodes, so the composition of the basic units is less restricted. This increases the number of possible RCNN network structures when the initial network structure of the RCNN is determined and updated, makes it easier to find a better RCNN network structure, and makes it more likely that a target detection network with better target detection performance is finally obtained.
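The RCNN search space described above can be enumerated directly: every pair of nodes inside a basic unit is a candidate connection, and each connection carries a set of candidate operations. The sketch below is illustrative only. The function name, the convention that an edge runs from the lower-numbered node to the higher-numbered node, and the operation labels in `example_ops` are assumptions; the text only states that atrous convolution with a dilation rate of 2 is among the candidates.

```python
from itertools import combinations


def unit_search_space(num_nodes, candidate_ops):
    """Map every node pair in one basic unit to its candidate operations."""
    if num_nodes < 2:
        raise ValueError("a basic unit consists of at least two nodes")
    return {edge: list(candidate_ops) for edge in combinations(range(num_nodes), 2)}


# Hypothetical operation labels for illustration.
example_ops = ["dil_conv_r2", "identity"]

# Basic units may have different numbers of nodes, e.g. one with 4 and one with 6.
for nodes in (4, 6):
    space = unit_search_space(nodes, example_ops)
    print(nodes, "nodes ->", len(space), "candidate connections")
```

A unit with n nodes has n·(n − 1)/2 candidate connections, so allowing units of different sizes changes the size of the search space unit by unit.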

In addition, the number of nodes in each basic unit of the RCNN may be determined in advance, before the target detection network is constructed.

Optionally, the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of that basic unit.

When the input and output feature maps of each basic unit in the RCNN have the same resolution, the basic units do not change the resolution of the feature map while processing it, which helps preserve the information in the feature map.

The network structure of the RCNN of the finally obtained target detection network is described below with reference to the accompanying drawings.

As shown in FIG. 11, the basic units included in the RCNN and the number of nodes in each basic unit are the same as in the RCNN shown in FIG. 9. The network architecture shown in FIG. 11 may be the finally obtained network architecture of the RCNN.

Specifically, as shown in FIG. 11, basic unit 1 of the RCNN has two identical input nodes (H_{0}). The feature map input at the input nodes (H_{0}) is processed by intermediate nodes 0 to 3 and then combined by weighted summation at the output node, giving the output of the first convolution unit (H_{1}). The operations from the input nodes to the intermediate nodes and from the intermediate nodes to the output node are all 5×5 convolutions (conv_5×5).

For basic unit 2 of the RCNN, the feature maps input at input nodes H_{0} and H_{1} are processed by intermediate nodes 0 to 3 and then combined by weighted summation at the output node, giving the output of the second convolution unit (H_{2}). The operations from the input nodes to the intermediate nodes and from the intermediate nodes to the output node are all 5×5 convolutions (conv_5×5).

The process of the method for constructing a target detection network according to an embodiment of this application is described in more detail below with reference to FIG. 12.

FIG. 12 is a flowchart of a method for constructing a target detection network according to an embodiment of this application. The method shown in FIG. 12 may be executed by the apparatus for constructing a target detection network according to an embodiment of this application. It includes steps 2001 to 2006, which are described in detail below.

2001. Initialize the network architecture of the target detection network.

In step 2001, the initial network architecture of the target detection network may be determined first.

Specifically, because the target detection network includes a backbone network, a feature fusion layer, an RPN and an RCNN, determining the initial network architecture in step 2001 means determining the network architectures of the backbone network, the feature fusion layer, the RPN and the RCNN.

The backbone network and the RPN may be pre-built networks, so initializing the network structure of the target detection network is equivalent to initializing the network structures of the feature fusion layer and the RCNN.

After step 2001, the initial network structure of the feature fusion layer and the initial network structure of the RCNN have been determined.

2002a. Evaluate the performance of the feature fusion layer.

Specifically, in step 2002a, the performance of the feature fusion layer and the resources it occupies may be evaluated, so that the network architecture of the feature fusion layer can subsequently be updated according to this evaluation.

2002b. Evaluate the performance of the RCNN.

Similarly to step 2002a, in step 2002b the performance and resource usage of the RCNN may be evaluated, so that the network architecture of the RCNN can subsequently be updated according to this evaluation.

Steps 2002a and 2002b may be performed simultaneously or one after the other; this application does not limit their order.

2003a. Update the network architecture of the feature fusion layer.

After the performance and resource usage of the feature fusion layer are obtained in step 2002a, the network architecture of the feature fusion layer may be updated accordingly.

Specifically, when the performance of the feature fusion layer does not meet the requirements, the complexity of the feature fusion layer may be increased appropriately during the update so that the updated feature fusion layer performs better. When the feature fusion layer occupies too many resources, its complexity may be reduced during the update so that the updated feature fusion layer has a simpler structure and occupies fewer resources.

2003b. Update the network architecture of the RCNN.

After the performance and resource usage of the RCNN are obtained in step 2002b, the network architecture of the RCNN may be updated accordingly.

Specifically, when the performance of the RCNN does not meet the requirements, the complexity of the RCNN may be increased during the update so that the updated RCNN performs better. When the RCNN occupies too many resources, its complexity may be reduced during the update so that the updated RCNN has a simpler structure and occupies fewer resources.

It should be understood that steps 2003a and 2003b may be performed simultaneously or one after the other; this application does not limit their order.
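The update rule stated for both the feature fusion layer and the RCNN can be sketched as a small decision function: raise complexity when performance falls short, lower it when resource usage is too high. The function name, the numeric return convention and the choice to check performance first when both conditions hold are assumptions made for illustration; the text does not specify how the two cases are prioritized.

```python
def complexity_adjustment(performance, required_performance, resources, resource_budget):
    """Return +1 to increase complexity, -1 to decrease it, 0 to leave it unchanged."""
    if performance < required_performance:
        return 1    # performance does not meet the requirement: allow a richer structure
    if resources > resource_budget:
        return -1   # too many resources occupied: simplify the structure
    return 0


print(complexity_adjustment(0.5, 0.6, 10, 20))
print(complexity_adjustment(0.7, 0.6, 30, 20))
print(complexity_adjustment(0.7, 0.6, 10, 20))
```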

2004. Update the network parameters of the target detection network.

After the network architectures of the feature fusion layer and the RCNN have been updated, the network parameters of the target detection network may be updated. These include the network parameters of the backbone network, the feature fusion layer, the RPN and the RCNN (the network parameters may specifically include convolution parameters).

2005. Determine whether the target detection network satisfies a preset condition.

Specifically, in step 2005, it may be determined whether the target detection network satisfies the following conditions:

(1) The detection performance of the target detection network meets the preset requirements;

(2) The number of updates to the network architecture of the target detection network reaches a preset number;

(3) The complexity of the target detection network is less than or equal to a preset complexity.

When the target detection network satisfies any one of conditions (1) to (3), it may be determined that the target detection network meets the preset requirements.

The performance of the target detection network meeting the preset requirements may specifically mean that the accuracy of target detection performed by the target detection network is greater than an accuracy threshold. For example, when the accuracy of target detection is greater than 60%, the performance of the target detection network is determined to meet the preset requirements.

When step 2005 determines that the target detection network satisfies the preset condition, the method proceeds to step 2006 and outputs the target detection network. When step 2005 determines that the target detection network does not satisfy the preset condition, the method returns to steps 2002a and 2002b to continue updating the network architecture of the target detection network.

2006. Output the target detection network.

After step 2005 determines that the target detection network satisfies the preset condition, construction of the target detection network is complete and the target detection network can be output.
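Steps 2001 to 2006 form a loop that can be outlined as below. This is a control-flow sketch only, under stated assumptions: all callables are hypothetical placeholders supplied by the caller, `meets_requirement` stands for the performance condition, and `max_updates` stands for the preset number of architecture updates. The toy usage replaces real networks with a dictionary of counters so the flow can be followed by hand.

```python
def build_detection_network(init, evaluate_fusion, evaluate_rcnn, update_fusion,
                            update_rcnn, train_weights, meets_requirement, max_updates):
    """Run the construction loop and return (network, number of architecture updates)."""
    net = init()                                    # step 2001
    rounds = 0
    while True:
        fusion_report = evaluate_fusion(net)        # step 2002a
        rcnn_report = evaluate_rcnn(net)            # step 2002b
        net = update_fusion(net, fusion_report)     # step 2003a
        net = update_rcnn(net, rcnn_report)         # step 2003b
        net = train_weights(net)                    # step 2004
        rounds += 1
        if meets_requirement(net) or rounds >= max_updates:   # step 2005
            return net, rounds                      # step 2006


# Toy usage: accuracy (in percent) grows with the number of connections.
net, rounds = build_detection_network(
    init=lambda: {"fusion_edges": 1, "rcnn_edges": 1, "accuracy": 0},
    evaluate_fusion=lambda n: n["accuracy"],
    evaluate_rcnn=lambda n: n["accuracy"],
    update_fusion=lambda n, report: {**n, "fusion_edges": n["fusion_edges"] + 1},
    update_rcnn=lambda n, report: {**n, "rcnn_edges": n["rcnn_edges"] + 1},
    train_weights=lambda n: {**n, "accuracy": 10 * (n["fusion_edges"] + n["rcnn_edges"])},
    meets_requirement=lambda n: n["accuracy"] > 60,
    max_updates=50,
)
print(net, rounds)
```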

FIG. 13 is a schematic flowchart of a target detection method according to an embodiment of this application. The method shown in FIG. 13 may be executed by the target detection apparatus according to an embodiment of this application, for example by the apparatus shown in FIG. 15 or FIG. 17.

The method shown in FIG. 13 includes steps 3001 and 3002, which are described in detail below together with related content.

3001. Obtain an image.

3002. Process the image with a target detection network to obtain a target detection result for the image.

The target detection result obtained in step 3002 includes the position of each detected target in the image and the class to which the detected target belongs.
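A detection result of this kind, a position plus a class per detected target, might be represented as in the following sketch. The `Detection` type, the `(x_min, y_min, x_max, y_max)` box convention and the `detect` wrapper are illustrative assumptions; the text only states that a result contains a position and a classification result. `fake_network` is a stand-in so the example runs without a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
    label: str                              # class of the detected target


def detect(image, network: Callable) -> List[Detection]:
    """Steps 3001-3002: pass an image through the network and wrap its outputs."""
    return [Detection(box=tuple(box), label=label) for box, label in network(image)]


def fake_network(image):
    # Stand-in for a constructed target detection network.
    return [((10.0, 20.0, 110.0, 220.0), "person")]


detections = detect(image=None, network=fake_network)
print(detections)
```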

The target detection network used in step 3002 may be constructed according to the method for constructing a target detection network in the embodiments of this application. The definitions and explanations of the target detection network given above also apply to the target detection network in step 3002.

Specifically, the target detection network used in step 3002 may be constructed according to the method shown in FIG. 7 or FIG. 12.

During construction of the target detection network used by the target detection method in this application, the search space of the feature fusion layer contains more optional connection relationships for the feature fusion layer. A feature fusion layer determined according to this search space can therefore fuse features better, so that the resulting target detection network performs better when detecting targets.

The initial network architecture of the target detection network in the method shown in FIG. 13 is determined according to the search space of the target detection network, and the final network architecture of the target detection network may be obtained by iteratively updating the initial network architecture according to that search space.

When the target detection network used in the method shown in FIG. 13 is obtained, the search space of the target detection network may include the search space of the feature fusion layer and/or the search space of the RCNN.

When the search space of the target detection network includes the search space of the feature fusion layer, the feature fusion layer in the initial network architecture is determined according to the search space of the feature fusion layer, and the final network architecture of the feature fusion layer is obtained by iteratively updating the network architecture of the feature fusion layer in the initial network architecture according to that search space.

When the search space of the target detection network includes the search space of the RCNN, the RCNN in the initial network architecture is determined according to the search space of the RCNN, and the final network architecture of the RCNN is obtained by iteratively updating the network architecture of the RCNN in the initial network architecture according to that search space.

Optionally, the search space of the target detection network includes the search space of the feature fusion layer. The search space of the feature fusion layer includes the optional connection relationships of the feature fusion layer, which include a connection between any node of one layer and any node of the other layer, for any two adjacent layers of neural networks in the multi-layer neural network of the feature fusion layer.

Optionally, the search space of the feature fusion layer further includes the optional operation types of the feature fusion layer, which include the convolution operation corresponding to a connection between any node of one layer and any node of the other layer of two adjacent layers in the multi-layer neural network, where the convolution operation includes an atrous convolution operation.
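The size of such a search space follows from enumerating every cross-layer node pair between adjacent layers. The sketch below does this for three layers (A, B, C) of four nodes each, which matches the node labels used in Tables 1 to 4. The function name and the operation labels in `FUSION_OPS` are illustrative assumptions: only `dil_conv_5×5_r2` and `dil_conv_5×5_r3` appear as labels in the tables, and the remaining names are hypothetical shorthand for the other six listed operations.

```python
from itertools import product

# Eight optional operations; labels other than the two dil_conv_5×5 ones are hypothetical.
FUSION_OPS = [
    "none", "dil_conv_5×5_r2", "identity", "dil_conv_5×5_r3",
    "dil_conv_3×3_r2", "sep_conv_3×3", "dil_conv_3×3_r3", "sep_conv_5×5",
]


def fusion_candidate_edges(layers):
    """All (source, target) pairs between each pair of adjacent layers."""
    edges = []
    for lower, upper in zip(layers, layers[1:]):
        edges.extend(product(lower, upper))
    return edges


layers = [[f"P_{{{layer}}}_{node}" for node in range(1, 5)] for layer in range(3)]
edges = fusion_candidate_edges(layers)
print(len(edges), "candidate connections,", len(FUSION_OPS), "operations each")
```

With 32 candidate connections and 8 choices per connection, there are 8**32 possible assignments, which is why the structure is searched automatically rather than set by hand.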

When the optional operation types of the feature fusion layer include atrous convolution, substantially the same target detection performance can be achieved with fewer convolution parameters. In addition, for the same number of convolution parameters, atrous convolution provides a larger receptive field than traditional convolution, so better target detection performance can be obtained when the target detection network is used for target detection.

Optionally, the RCNN includes a plurality of basic units, each of which consists of at least two nodes. The search space of the target detection network further includes the search space of the RCNN, the search space of the RCNN includes the search space of each of the basic units, the search space of each basic unit includes the optional connection relationships of that basic unit, and the optional connection relationships of each basic unit include a connection between any two nodes within that basic unit. The RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN.

Optionally, the search space of each basic unit further includes the optional operation types of that basic unit, which include the convolution operation corresponding to a connection between any two nodes within that basic unit, and this convolution operation includes an atrous convolution operation.

When the optional operation types of the RCNN include atrous convolution, substantially the same target detection performance can be achieved with fewer convolution parameters, and better target detection performance can be achieved with substantially the same number of convolution parameters.

Specifically, for the same number of convolution parameters, atrous convolution provides a larger receptive field than traditional convolution, so the target detection network performs better when used for target detection.

可选地，每个基本单元内的任意两个节点之间的连接所对应的卷积操作包括间隔数为2的空洞卷积操作。Optionally, the convolution operation corresponding to the connection between any two nodes in each basic unit includes an atrous convolution operation with an interval of 2.
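The receptive-field argument can be checked with simple arithmetic. The sketch below uses the standard relation for a dilated kernel, span = k + (k - 1)(d - 1), where d is the dilation rate. Whether the "interval" in the text corresponds exactly to the dilation rate d is an assumption here, and the channel counts are arbitrary example values.

```python
def effective_kernel_span(kernel_size, dilation):
    """Input pixels spanned along one axis by a single dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights in a 2-D convolution (bias ignored). Dilation does not change this."""
    return kernel_size * kernel_size * in_channels * out_channels


# Example values (assumed): 3x3 kernels with 256 input and 256 output channels.
for dilation in (1, 2, 3):
    span = effective_kernel_span(3, dilation)
    weights = conv_weight_count(3, 256, 256)
    print(f"dilation={dilation}: span={span}x{span}, weights={weights}")
```

The weight count is identical for every dilation rate, while the spanned region grows, which is the sense in which atrous convolution enlarges the receptive field at the same parameter count.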

Optionally, at least two of the plurality of basic units are composed of different numbers of nodes.

Optionally, the resolution of the input feature map of each basic unit is the same as the resolution of its output feature map.

When the resolution of the input feature map of each basic unit in the RCNN is the same as that of its output feature map, each basic unit processes the feature map without changing its resolution. This helps retain the information in the feature map and thus helps ensure a good detection result when the target detection network is used for target detection.
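One common way to keep the resolution unchanged is to use stride 1 and pad each side by d(k - 1)/2 for an odd kernel size k and dilation rate d. The sketch below applies the standard convolution output-size formula. The padding rule is an illustrative assumption, not something the present application specifies.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis for a standard (optionally dilated) convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size, dilation=1):
    """Padding that preserves resolution for stride 1 and an odd kernel size."""
    return dilation * (kernel_size - 1) // 2


# A 7x7 feature map is an arbitrary example size.
for dilation in (1, 2):
    pad = same_padding(3, dilation)
    out = conv_output_size(7, 3, stride=1, padding=pad, dilation=dilation)
    print(f"dilation={dilation}, padding={pad}: 7 -> {out}")
```

Without padding, the same 3x3 convolution would shrink a 7x7 map to 5x5, so part of the border information would be lost at every unit.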

Optionally, the target detection network satisfies at least one of the following conditions: the detection performance of the target detection network meets a preset performance requirement; the number of updates to the network architecture of the target detection network is greater than or equal to a preset number; the complexity of the target detection network is less than or equal to a preset complexity.

Optionally, the complexity of the target detection network is determined according to at least one of: the number or size of the model parameters of the target detection network, the memory access cost (MAC) of the target detection network, and the number of floating-point operations of the target detection network.
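The stopping conditions and the complexity measure can be sketched as follows. This is a minimal illustration under stated assumptions: the weighted sum in `complexity` is one possible way to combine the three quantities and is not prescribed by the present application, and all names and threshold values are hypothetical.

```python
from dataclasses import dataclass


def complexity(num_params=0.0, mac=0.0, flops=0.0, weights=(1.0, 0.0, 0.0)):
    """Combine parameter count, memory access cost, and floating-point operations.

    The weighted sum is an assumption; any subset can be selected with zero weights.
    """
    w_params, w_mac, w_flops = weights
    return w_params * num_params + w_mac * mac + w_flops * flops


@dataclass
class SearchState:
    detection_score: float  # for example, mAP on a validation set
    num_updates: int        # architecture updates performed so far
    complexity: float       # value returned by complexity()


def meets_preset_requirement(state, min_score, max_updates, max_complexity):
    """True when at least one of the three conditions holds."""
    return (
        state.detection_score >= min_score
        or state.num_updates >= max_updates
        or state.complexity <= max_complexity
    )


state = SearchState(detection_score=38.0, num_updates=50,
                    complexity=complexity(num_params=30e6))
print(meets_preset_requirement(state, min_score=40.0, max_updates=50,
                               max_complexity=25e6))
```

In this example only the update-count condition holds, which is enough because the conditions are alternatives.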

To illustrate the effect of the method for constructing a target detection network in the embodiments of the present application, the following analyzes, with reference to specific test results, the complexity of the neural network obtained by the method and the accuracy of target detection performed with that neural network.

Table 5

Table 5 shows the complexity of the target detection networks obtained with different schemes.

Existing schemes 1 to 6 are as follows:

Existing scheme 1: guided anchoring Faster RCNN;

Existing scheme 2: path aggregation network, published at CVPR in 2018;

Existing scheme 3: towards real-time object detection with region proposal networks, published at NIPS in 2015;

Existing scheme 4: feature pyramid network, published at CVPR in 2017;

Existing scheme 5: relation networks for object detection, published at CVPR in 2018;

Existing scheme 6: focal loss for dense object detection, published at ICCV in 2017.

As can be seen from Table 5, when tested on the COCO dataset with ResNet-50 and ResNet-101 as the backbone network, the mean average precision (mAP) of target detection using the neural network obtained by the scheme of the present application is in both cases higher than that of the neural networks obtained by the existing schemes.

Specifically, when the backbone network is ResNet-50, the mAP of the neural network obtained by the scheme of the present application is 40.5, whereas the mAP of the neural networks obtained by existing scheme 1 and existing scheme 2 is 39.8 in both cases. Compared with existing schemes 1 and 2, the scheme of the present application therefore improves the mAP by 40.5 - 39.8 = 0.7.

When the backbone network is ResNet-101, the mAP of the neural network obtained by the scheme of the present application is 42.5, whereas the highest mAP among the neural networks obtained by existing schemes 3 to 6 is 39.1. Compared with existing schemes 3 to 6, the scheme of the present application therefore improves the mAP by at least 42.5 - 39.1 = 3.4 (compared with existing scheme 3, the improvement is 42.5 - 34.9 = 7.6).
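The differences quoted above can be re-derived directly from the figures given in the text. The short sketch below uses only those figures and relies on decimal arithmetic to avoid binary floating-point rounding. The dictionary keys are descriptive labels introduced here.

```python
from decimal import Decimal


def map_gain(ours, baseline):
    """Difference in mAP between two schemes, computed exactly."""
    return Decimal(ours) - Decimal(baseline)


# (mAP of the scheme of the present application, mAP of the compared scheme)
comparisons = {
    "ResNet-50 vs existing schemes 1 and 2": ("40.5", "39.8"),
    "ResNet-101 vs best of existing schemes 3 to 6": ("42.5", "39.1"),
    "ResNet-101 vs existing scheme 3": ("42.5", "34.9"),
}

for label, (ours, baseline) in comparisons.items():
    print(f"{label}: +{map_gain(ours, baseline)}")
```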

Table 6

Table 6 shows the number of parameters in the target detection networks constructed by the existing scheme and by the scheme of the present application, together with the mAP obtained when those networks are used for target detection. As shown in Table 6, the total number of parameters of the target detection networks constructed by the scheme of the present application on each of three datasets (the VOC dataset, the COCO dataset, and the BDD dataset) is lower than that of the target detection network constructed by the existing scheme. In addition, the mAP on the test dataset (the PASCAL VOC dataset) of the target detection networks constructed by the scheme of the present application on the three datasets is higher than that of the target detection network constructed by the existing scheme.

Table 7

Like Table 6, Table 7 shows the number of parameters in the target detection networks constructed by the existing scheme and by the scheme of the present application, together with the mAP obtained when those networks are used for target detection.

As shown in Table 7, the total number of parameters of the target detection networks constructed by the scheme of the present application on each of three datasets (the BDD dataset, the COCO dataset, and the VOC dataset) is lower than that of the target detection network constructed by the existing scheme. In addition, the mAP on the test dataset (the BDD dataset) of the target detection networks constructed by the scheme of the present application on the three datasets is higher than that of the target detection network constructed by the existing scheme.

The method for constructing a target detection network and the target detection method in the embodiments of the present application have been described in detail above with reference to the accompanying drawings. The apparatus for constructing a target detection network and the target detection apparatus in the embodiments of the present application are described below with reference to the accompanying drawings.

It should be understood that the apparatus for constructing a target detection network described below can perform the steps of the method for constructing a target detection network in the embodiments of the present application, and that the target detection apparatus described below can perform the steps of the target detection method in the embodiments of the present application. Repeated descriptions are omitted as appropriate below.

FIG. 14 is a schematic block diagram of an apparatus for constructing a target detection network according to an embodiment of the present application. The apparatus 5000 shown in FIG. 14 includes a determining unit 5001 and a constructing unit 5002.

The apparatus 5000 can perform the steps of the method for constructing a target detection network in the embodiments of the present application. Specifically, the apparatus 5000 can perform either the method shown in FIG. 7 or the method shown in FIG. 12.

Specifically, when the apparatus 5000 performs the method shown in FIG. 7, the determining unit 5001 may be configured to perform steps 1001 and 1002, and the constructing unit 5002 may be configured to perform step 1003.

When the apparatus 5000 performs the method shown in FIG. 12, the determining unit 5001 may be configured to perform step 2001, and the constructing unit 5002 may be configured to perform steps 2002 to 2006, where step 2002 includes steps 2002a and 2002b, and step 2003 includes steps 2003a and 2003b.
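The division of work between the two units can be sketched as follows. This is only an illustration under stated assumptions: the determining unit is taken to fix the search space and the initial architecture, and the constructing unit to perform the iterative update. A simple random mutation with a placeholder loss stands in for the search strategy, which is not the strategy of the present application. All class names, edge names, and operation names are hypothetical.

```python
import random

# Hypothetical search space: each searchable connection maps to candidate operations.
SEARCH_SPACE = {
    "fusion_edge_0": ["conv3x3", "atrous_conv3x3"],
    "fusion_edge_1": ["conv3x3", "atrous_conv3x3"],
    "rcnn_unit_edge_0": ["conv3x3", "atrous_conv3x3"],
}


def toy_loss(architecture):
    """Placeholder loss. A real loss would use detection error and/or complexity."""
    return sum(1.0 for op in architecture.values() if not op.startswith("atrous"))


class DeterminingUnit:
    def determine_search_space(self):
        return {edge: list(ops) for edge, ops in SEARCH_SPACE.items()}

    def determine_initial_architecture(self, space, rng):
        return {edge: rng.choice(ops) for edge, ops in space.items()}


class ConstructingUnit:
    def __init__(self, loss_fn, max_updates):
        self.loss_fn = loss_fn
        self.max_updates = max_updates

    def construct(self, architecture, space, rng):
        """Iteratively update the architecture, keeping changes that lower the loss."""
        best = dict(architecture)
        best_loss = self.loss_fn(best)
        for _ in range(self.max_updates):
            candidate = dict(best)
            edge = rng.choice(sorted(space))
            candidate[edge] = rng.choice(space[edge])
            candidate_loss = self.loss_fn(candidate)
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
        return best


rng = random.Random(0)
determining_unit = DeterminingUnit()
space = determining_unit.determine_search_space()
initial = determining_unit.determine_initial_architecture(space, rng)
final = ConstructingUnit(toy_loss, max_updates=20).construct(initial, space, rng)
print(toy_loss(initial), "->", toy_loss(final))
```

Here the update loop stops after a preset number of updates, which is one of the alternative stopping conditions. A performance or complexity threshold could be checked inside the loop instead.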

The determining unit 5001 and the constructing unit 5002 in the apparatus 5000 correspond to the processor 3002 in the apparatus 3000 shown in FIG. 16 below.

FIG. 15 is a schematic block diagram of a target detection apparatus according to an embodiment of the present application. The apparatus 6000 shown in FIG. 15 includes an obtaining unit 6001 and a detection unit 6002.

The apparatus 6000 can perform the steps of the target detection method in the embodiments of the present application. Specifically, the apparatus 6000 can perform either the method shown in FIG. 7 or the method shown in FIG. 12.

FIG. 16 is a schematic diagram of the hardware structure of a neural network structure search apparatus according to an embodiment of the present application. The neural network structure search apparatus 3000 shown in FIG. 16 (the apparatus 3000 may specifically be a computer device) includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004. The memory 3001, the processor 3002, and the communication interface 3003 are communicatively connected to one another through the bus 3004.

The memory 3001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 3001 may store a program. When the program stored in the memory 3001 is executed by the processor 3002, the processor 3002 is configured to perform the steps of the method for constructing a target detection network in the embodiments of the present application.

The processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute a related program to implement the method for constructing a target detection network in the embodiments of the present application.

The processor 3002 may alternatively be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the method for constructing a target detection network of the present application may be completed by an integrated logic circuit of hardware in the processor 3002 or by instructions in the form of software.

The processor 3002 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 3001. The processor 3002 reads information from the memory 3001 and, in combination with its hardware, completes the functions to be performed by the units included in the neural network structure search apparatus, or performs the method for constructing a target detection network in the embodiments of the present application.

The communication interface 3003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 3000 and other devices or a communication network. For example, information about the neural network to be constructed and the training data required for constructing it can be obtained through the communication interface 3003.

The bus 3004 may include a path for transferring information between the components of the apparatus 3000 (for example, the memory 3001, the processor 3002, and the communication interface 3003).

FIG. 17 is a schematic diagram of the hardware structure of a target detection apparatus according to an embodiment of the present application. The target detection apparatus 4000 shown in FIG. 17 includes a memory 4001, a processor 4002, a communication interface 4003, and a bus 4004. The memory 4001, the processor 4002, and the communication interface 4003 are communicatively connected to one another through the bus 4004.

The memory 4001 may be a ROM, a static storage device, or a RAM. The memory 4001 may store a program. When the program stored in the memory 4001 is executed by the processor 4002, the processor 4002 and the communication interface 4003 are configured to perform the steps of the target detection method in the embodiments of the present application.

The processor 4002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute a related program to implement the functions to be performed by the units in the target detection apparatus in the embodiments of the present application, or to perform the target detection method in the embodiments of the present application.

The processor 4002 may alternatively be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the target detection method in the embodiments of the present application may be completed by an integrated logic circuit of hardware in the processor 4002 or by instructions in the form of software.

The processor 4002 may alternatively be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 4001. The processor 4002 reads information from the memory 4001 and, in combination with its hardware, completes the functions to be performed by the units included in the target detection apparatus in the embodiments of the present application, or performs the target detection method in the embodiments of the present application.

The communication interface 4003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 4000 and other devices or a communication network. For example, an image to be processed can be obtained through the communication interface 4003.

The bus 4004 may include a path for transferring information between the components of the apparatus 4000 (for example, the memory 4001, the processor 4002, and the communication interface 4003).

It should be noted that although only a memory, a processor, and a communication interface are shown for the apparatus 3000 and the apparatus 4000, a person skilled in the art should understand that, in a specific implementation, the apparatus 3000 and the apparatus 4000 may further include other components necessary for normal operation. A person skilled in the art should also understand that, depending on specific needs, the apparatus 3000 and the apparatus 4000 may further include hardware components that implement other additional functions. In addition, a person skilled in the art should understand that the apparatus 3000 and the apparatus 4000 may include only the components necessary for implementing the embodiments of the present application, and need not include all the components shown in FIG. 16 and FIG. 17.

A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present application.

A person skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments. Details are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. That is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for constructing a target detection network, wherein the target detection network comprises a backbone network, a feature fusion layer, a region proposal network (RPN) and a region-based convolutional neural network (RCNN), and the method is characterized by comprising the following steps:

determining a search space of the target detection network, wherein the search space of the target detection network comprises a search space of the feature fusion layer, the search space of the feature fusion layer comprises a selectable connection relation of the feature fusion layer, and the selectable connection relation of the feature fusion layer comprises a connection between any node of one neural network layer and any node of the other neural network layer in two adjacent neural network layers of the feature fusion layer;

determining an initial network architecture of the target detection network according to a search space of the target detection network, wherein a feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer;

and according to the search space of the target detection network, iteratively updating the initial network architecture of the target detection network until the target detection network meeting the preset requirement is obtained.

2. The method according to claim 1, wherein the iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until the target detection network satisfying a preset requirement is obtained includes:

and according to the search space of the target detection network, iteratively updating the initial network architecture of the target detection network to reduce the value of a loss function corresponding to the target detection network, so as to obtain the target detection network meeting the preset requirement, wherein the loss function comprises a target detection error of the target detection network and/or the complexity of the target detection network.

3. The construction method according to claim 1 or 2, wherein the search space of the feature fusion layer further includes a selectable operation type of the feature fusion layer, the selectable operation type of the feature fusion layer includes a convolution operation corresponding to a connection between any node of one neural network layer and any node of the other neural network layer in two adjacent neural network layers of the feature fusion layer, and the convolution operation includes an atrous convolution operation.

4. The construction method according to any one of claims 1-3, wherein the RCNN comprises a plurality of basic units, each basic unit of the plurality of basic units is composed of at least two nodes, the search space of the target detection network further comprises the search space of the RCNN, the search space of the RCNN comprises the search space of each basic unit of the plurality of basic units, the search space of each basic unit comprises the selectable connection relation of each basic unit, and the selectable connection relation of each basic unit comprises the connection between any two nodes in each basic unit;

the RCNN in an initial network architecture of the target detection network is determined according to a search space of the RCNN.

5. The construction method according to claim 4, wherein the search space of each basic unit further includes a selectable operation type of each basic unit, the selectable operation type of each basic unit includes a convolution operation corresponding to a connection between any two nodes in each basic unit, and the convolution operation includes an atrous convolution operation.

6. The construction method according to claim 5, wherein the atrous convolution operation corresponding to the connection between any two nodes in each basic unit comprises an atrous convolution operation with an interval of 2.

7. The construction method according to any one of claims 4 to 6, wherein at least two basic units of the plurality of basic units are respectively constituted by different numbers of nodes.

8. The construction method according to any one of claims 4 to 7, wherein the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

9. The construction method according to any one of claims 2 to 8, wherein the target detection network satisfying preset requirements satisfies at least one of the following conditions:

the detection performance of the target detection network meets the preset performance requirement;

the number of updates to the network architecture of the target detection network is greater than or equal to a preset number;

the complexity of the target detection network is less than or equal to a preset complexity.

10. The construction method according to claim 9, wherein the complexity of the target detection network is determined according to at least one of a number or a size of model parameters of the target detection network, a memory access cost (MAC) of the target detection network, and a number of floating point operations of the target detection network.
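The stopping conditions and complexity measures above can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: the per-layer formulas are the common estimates for a bias-free k x k convolution with equal input and output resolution (floating point operations counted as multiply-accumulates, MAC counted as input plus output feature-map elements plus weights), and all function and parameter names are hypothetical.

```python
def conv_complexity(h, w, c_in, c_out, k):
    """Rough complexity of one bias-free k x k convolution on an h x w feature map."""
    params = k * k * c_in * c_out             # number of model parameters
    flops = h * w * params                    # multiply-accumulate count
    mac = h * w * (c_in + c_out) + params     # memory access cost (elements touched)
    return {"params": params, "flops": flops, "mac": mac}


def meets_preset_requirements(score, num_updates, complexity,
                              min_score, max_updates, max_complexity):
    """True if at least one of the three preset conditions holds."""
    return (
        score >= min_score                # detection performance is sufficient
        or num_updates >= max_updates     # update budget is exhausted
        or complexity <= max_complexity   # network is simple enough
    )


layer = conv_complexity(h=8, w=8, c_in=16, c_out=32, k=3)
print(layer)
print(meets_preset_requirements(0.31, 12, layer["flops"],
                                min_score=0.40, max_updates=50,
                                max_complexity=300_000))
```

In practice the complexity of the whole network would be the sum of such per-layer estimates, and several conditions could be combined rather than any single one being sufficient.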

11. A method for constructing a target detection network, wherein the target detection network comprises a backbone network, a feature fusion layer, a region proposal network (RPN) and a region-based convolutional neural network (RCNN), the method being characterized by comprising the following steps:

determining a search space of the target detection network, wherein the RCNN includes a plurality of basic units, each basic unit of the plurality of basic units is composed of at least two nodes, the search space of the target detection network includes the search space of the RCNN, the search space of the RCNN includes the search space of each basic unit of the plurality of basic units, the search space of each basic unit includes a selectable connection relation of each basic unit, and the selectable connection relation of each basic unit includes a connection between any two nodes in each basic unit;

determining an initial network architecture of the target detection network according to the search space of the target detection network, wherein the RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN;

12. The method according to claim 11, wherein the iteratively updating the initial network architecture of the target detection network according to the search space of the target detection network until the target detection network satisfying a preset requirement is obtained includes:

13. The construction method according to claim 11 or 12, wherein the search space of each basic unit further includes a selectable operation type of each basic unit, the selectable operation type of each basic unit includes a convolution operation corresponding to a connection between any two nodes in each basic unit, and the convolution operation includes a dilated (atrous) convolution operation.

14. The construction method according to claim 13, wherein the dilated convolution operation includes a dilated convolution operation with a dilation rate (interval number) of 2.

15. The construction method according to any one of claims 11 to 14, wherein at least two basic units of the plurality of basic units are respectively constituted by different numbers of nodes.

16. The construction method according to any one of claims 11 to 15, wherein the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

17. The construction method according to any one of claims 11 to 16, wherein the target detection network satisfying preset requirements satisfies at least one of the following conditions:

18. The construction method according to claim 17, wherein the complexity of the target detection network is determined according to at least one of a number or a size of model parameters of the target detection network, a memory access cost (MAC) of the target detection network, and a number of floating point operations of the target detection network.

19. A method of object detection, comprising:

acquiring an image;

processing the image using a target detection network to obtain a target detection result of the image, wherein the target detection result comprises the position of a detection target in the image and a classification result of the detection target;

the target detection network comprises a backbone network, a feature fusion layer, a region proposal network (RPN) and a region-based convolutional neural network (RCNN), the target detection network meets preset requirements, the target detection network is obtained by iteratively updating an initial network architecture of the target detection network according to a search space of the target detection network, and the initial network architecture of the target detection network is determined according to the search space of the target detection network;

the search space of the target detection network comprises a search space of the feature fusion layer, the feature fusion layer in the initial network architecture of the target detection network is determined according to the search space of the feature fusion layer, the search space of the feature fusion layer comprises an optional connection relation of the feature fusion layer, and the optional connection relation of the feature fusion layer comprises, for two adjacent layers of neural networks in the multilayer neural network, a connection between any node of one layer and any node of the other layer.

20. The object detection method of claim 19, wherein the search space of the feature fusion layer further includes a selectable operation type of the feature fusion layer, the selectable operation type of the feature fusion layer includes a convolution operation corresponding to a connection between any node of one layer and any node of the other layer in two adjacent layers of the multilayer neural network, and the convolution operation includes a dilated (atrous) convolution operation.
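The search space of the feature fusion layer can be pictured as every cross-layer node pair between adjacent layers, each paired with a selectable operation. The following sketch is illustrative only: the function name `fusion_candidate_edges`, the (layer, node) indexing, and the operation labels in `OPS` are hypothetical and do not come from the claims.

```python
from itertools import product


def fusion_candidate_edges(layer_sizes):
    """Connect every node of layer l to every node of layer l + 1."""
    edges = []
    for layer in range(len(layer_sizes) - 1):
        pairs = product(range(layer_sizes[layer]), range(layer_sizes[layer + 1]))
        for src, dst in pairs:
            edges.append(((layer, src), (layer + 1, dst)))
    return edges


# Hypothetical operation labels; the second stands for a dilated convolution.
OPS = ["conv_3x3", "dilated_conv_3x3_rate2"]

edges = fusion_candidate_edges([2, 3, 1])   # three layers with 2, 3 and 1 nodes
candidates = list(product(edges, OPS))      # (connection, operation) choices
print(len(edges), len(candidates))
```

Because connections are only enumerated between adjacent layers, the number of candidates grows with the product of neighbouring layer widths rather than with all node pairs in the fusion layer.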

21. The object detection method according to claim 19 or 20, wherein the RCNN includes a plurality of basic units, each basic unit of the plurality of basic units is composed of at least two nodes, the search space of the object detection network further includes the search space of the RCNN, the search space of the RCNN includes the search space of each basic unit of the plurality of basic units, the search space of each basic unit includes the selectable connection relation of each basic unit, and the selectable connection relation of each basic unit includes a connection between any two nodes within each basic unit;

22. The object detection method of claim 21, wherein the search space of each basic unit further includes a selectable operation type of each basic unit, the selectable operation type of each basic unit includes a convolution operation corresponding to a connection between any two nodes in each basic unit, and the convolution operation includes a dilated (atrous) convolution operation.

23. The object detection method of claim 22, wherein the dilated convolution operation includes a dilated convolution operation with a dilation rate (interval number) of 2.

24. The object detection method of any one of claims 21-23, wherein at least two basic units of the plurality of basic units are respectively composed of different numbers of nodes.

25. The object detection method of any one of claims 21-24, wherein the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

26. The object detection method according to any of claims 19-25, wherein the object detection network fulfils at least one of the following conditions:

27. The object detection method of claim 26, wherein the complexity of the object detection network is determined according to at least one of the number or size of model parameters of the object detection network, a memory access cost (MAC) of the object detection network, and the number of floating point operations of the object detection network.

28. A method of object detection, comprising:

acquiring an image;

the RCNN includes a plurality of basic units, each basic unit of the plurality of basic units is composed of at least two nodes, the search space of the target detection network includes the search space of the RCNN, the search space of the RCNN includes the search space of each basic unit of the plurality of basic units, the search space of each basic unit includes the optional connection relationship of each basic unit, the optional connection relationship of each basic unit includes the connection between any two nodes in each basic unit, and the RCNN in the initial network architecture of the target detection network is determined according to the search space of the RCNN.

29. The object detection method of claim 28, wherein the search space of each basic unit further includes a selectable operation type of each basic unit, the selectable operation type of each basic unit includes a convolution operation corresponding to a connection between any two nodes in each basic unit, and the convolution operation includes a dilated (atrous) convolution operation.

30. The object detection method of claim 29, wherein the dilated convolution operation includes a dilated convolution operation with a dilation rate (interval number) of 2.

31. The object detection method of any one of claims 28-30, wherein at least two basic units of the plurality of basic units are respectively composed of different numbers of nodes.

32. The object detection method of any one of claims 28-31, wherein the resolution of the input feature map of each basic unit is the same as the resolution of the output feature map of each basic unit.

33. The object detection method according to any of claims 28-32, wherein the object detection network satisfies at least one of the following conditions:

34. The object detection method of claim 33, wherein the complexity of the object detection network is determined according to at least one of the number or size of model parameters of the object detection network, a memory access cost (MAC) of the object detection network, and the number of floating point operations of the object detection network.

35. An apparatus for constructing an object detection network, comprising:

a memory for storing a program;

a processor for executing the program stored in the memory, wherein, when the program stored in the memory is executed, the processor is configured to perform the construction method of any one of claims 1-10 or 11-18.

36. An object detection device, comprising:

a memory for storing a program;

a processor for executing the program stored in the memory, wherein, when the program stored in the memory is executed, the processor is configured to perform the object detection method of any one of claims 19-27 or 28-34.

37. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the construction method of any one of claims 1-10 or 11-18.

38. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, the program code comprising instructions for performing the object detection method of any one of claims 19-27 or 28-34.