CN113191489B

CN113191489B - Training method of binary neural network model, image processing method and device

Info

Publication number: CN113191489B
Application number: CN202110494162.5A
Authority: CN
Inventors: 刘传建; 王云鹤; 韩凯
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2023-04-18
Anticipated expiration: 2041-04-30
Also published as: CN113191489A

Abstract

The application relates to image processing technology in the computer vision field of artificial intelligence, and discloses a training method for a binary neural network model, an image processing method, and a device. The training method comprises the following steps. S1: determining a knowledge distillation framework, in which the teacher network is a trained neural network model and the student network is an initial binary neural network model M_0. S2: training a binary neural network model M_j by using a (j+1)-th batch of images and a target loss function to obtain a binary neural network model M_{j+1}, where the target loss function comprises an angle loss term, and the angle loss term describes the difference between the included angle between the feature matrix and the weight matrix in the teacher network and the included angle between the feature matrix and the weight matrix in the student network. S3: when a preset condition is met, using the binary neural network model M_{j+1} as the target binary neural network model; otherwise letting j = j+1 and repeating step S2. According to the embodiments of the application, the prediction accuracy of the binary neural network model can be improved.

Description

Training method of binary neural network model, image processing method and device

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular to a training method for a binary neural network model, an image processing method, and a device.

Background Art

Computer vision is an integral part of intelligent and autonomous systems in many application fields, such as manufacturing, inspection, document analysis, and medical diagnosis. It is the discipline of using cameras or camcorders together with computers to obtain the required data and information about a photographed object. Figuratively speaking, it equips a computer with eyes (cameras or camcorders) and a brain (algorithms) so that the computer can identify, track, and measure targets in place of the human eye, and can thereby perceive its environment. Because perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of making artificial systems "perceive" from images or multidimensional data. In general, computer vision uses imaging systems in place of visual organs to obtain input information, and uses computers in place of the brain to process and interpret that information. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously.

Image classification (IC), object detection (OD), and image segmentation (IS) are important problems in high-level visual semantic understanding. With the rapid development of artificial intelligence technology, these three basic tasks are increasingly widely used in the field of computer vision. Deep convolutional neural networks play an increasingly important role in these tasks, especially in object detection. However, a deep convolutional neural network model usually has millions of parameters and requires billions of floating point operations (FLOPs) to compute, which limits its deployment on resource-limited platforms. To achieve efficient online inference on embedded devices, the deep convolutional neural network model is usually quantized to obtain a binary neural network model that performs the above basic computer vision tasks. This is mainly because the parameters of the binary neural network model occupy far less storage space than the parameters of the deep convolutional neural network model before quantization.

However, binary neural network models in the prior art have much lower prediction accuracy than the neural network models before quantization.

Summary of the Invention

The embodiments of the present application provide a training method for a binary neural network model, an image processing method, and a device, which can effectively improve the prediction accuracy of the trained binary neural network model.

In a first aspect, the present application provides a method for training a binary neural network model, the method comprising: S1: determining a knowledge distillation framework, wherein the teacher network in the knowledge distillation framework is a trained neural network model, the student network in the knowledge distillation framework is an initial binary neural network model M_0, the teacher network and the student network each include N layers of neural networks, and N is a positive integer; S2: training the binary neural network model M_j using the (j+1)-th batch of images and a target loss function to obtain a binary neural network model M_{j+1}, wherein the binary neural network model M_j is obtained by training on the j-th batch of images, and j is a positive integer; the target loss function includes an angle loss term, which describes the difference between a first angle corresponding to the i-th layer of the teacher network and a second angle corresponding to the i-th layer of the student network; the first angle is obtained based on the weight matrix of the i-th layer of the teacher network and the input matrix of the (j+1)-th batch of images in the i-th layer of the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer of the student network and the binary input matrix of the (j+1)-th batch of images in the i-th layer of the student network; i is a positive integer less than or equal to N; S3: when a preset condition is met, using the binary neural network model M_{j+1} as the target binary neural network model; otherwise, letting j = j+1 and repeating step S2.
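As an illustration of the angle loss term described in step S2, the following is a minimal sketch in Python. It rests on assumptions that this passage does not state: the angle between a weight matrix and an input matrix is taken to be the arccosine of the cosine similarity of the two flattened matrices, the two matrices are assumed to have the same shape, and the difference between the first and second angles is measured as a squared difference. The function names `angle_between` and `angle_loss` are hypothetical.

```python
import math


def flatten(matrix):
    """Flatten a 2-D list of numbers into a 1-D list."""
    return [value for row in matrix for value in row]


def angle_between(weights, inputs):
    """Angle in radians between two equally shaped, non-zero matrices.

    Assumption: the angle is the arccosine of the cosine similarity of the
    flattened matrices; the exact formula is not given in this passage.
    """
    w, a = flatten(weights), flatten(inputs)
    dot = sum(x * y for x, y in zip(w, a))
    norm_w = math.sqrt(sum(x * x for x in w))
    norm_a = math.sqrt(sum(y * y for y in a))
    cosine = dot / (norm_w * norm_a)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cosine)))


def angle_loss(teacher_w, teacher_a, student_w_bin, student_a_bin):
    """Squared difference between the first (teacher) and second (student) angles."""
    first_angle = angle_between(teacher_w, teacher_a)
    second_angle = angle_between(student_w_bin, student_a_bin)
    return (first_angle - second_angle) ** 2
```

Under these assumptions the loss is zero when the student's binary weight and binary input matrices enclose the same angle as the teacher's weight and input matrices, and it grows as the two angles diverge.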

It should be understood that in each training pass, the model parameters of the student network are adjusted in the direction that minimizes the value of the target loss function. That is, the smaller the angle loss term in the target loss function, the smaller the performance difference between the student network and the teacher network.

It can be seen that in the embodiments of the present application, the trained teacher network in the knowledge distillation framework guides the training of the student network, and an angle loss term is designed in the target loss function to update the parameters of the student network. On the one hand, this brings the feature extraction results of the student network for the input samples close to those of the teacher network. On the other hand, it brings the angle between the binary weight matrix and the binary input matrix in the student network close to the angle between the weight matrix and the input matrix in the teacher network. In summary, whereas the training of binary neural network models in the prior art does not consider the angle loss after quantization, the present application introduces a knowledge distillation framework and an angle loss term in the loss function so that the performance of the trained student network is as close as possible to that of the teacher network, thereby improving the prediction accuracy of the target binary neural network model trained in the embodiments of the present application.

In a feasible implementation, the target loss function further includes a convolution result loss term, which describes the difference between a first convolution output result of the i-th layer of the teacher network and a second convolution output result of the i-th layer of the student network. The first convolution output result is obtained based on the weight matrix of the i-th layer of the teacher network and the input matrix of the (j+1)-th batch of images in the i-th layer of the teacher network. The second convolution output result is obtained based on the binary weight matrix and the corresponding weight scaling factor of the i-th layer of the student network, together with the binary input matrix of the (j+1)-th batch of images in the i-th layer of the student network.

It should be understood that the smaller the convolution result loss term in the target loss function, the smaller the value of the target loss function, which indicates that the performance of the student network is closer to that of the teacher network.

It can be seen that in the embodiments of the present application, a convolution result loss term is introduced into the target loss function to make the second convolution output result in the student network as close as possible to the first convolution output result in the teacher network. In other words, the output of each layer of the student network is made as close as possible to the output of the corresponding layer of the teacher network, which ensures that the predicted values output by the student network are close to those of the teacher network and thereby improves the prediction accuracy of the target binary neural network model obtained after training.
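A minimal Python sketch of one possible convolution result loss term follows. The passage only says that the term describes the difference between the first and second convolution output results; measuring that difference as a mean squared error over the elements of two equally shaped output matrices is an assumption, and the name `conv_result_loss` is hypothetical.

```python
def conv_result_loss(teacher_out, student_out):
    """Mean squared difference between two equally shaped 2-D outputs.

    teacher_out: first convolution output result (teacher layer i).
    student_out: second convolution output result (student layer i,
                 already scaled by the weight scaling factor).
    Assumption: mean squared error is used as the difference measure.
    """
    squared_diffs = [
        (t - s) ** 2
        for t_row, s_row in zip(teacher_out, student_out)
        for t, s in zip(t_row, s_row)
    ]
    return sum(squared_diffs) / len(squared_diffs)
```

The term vanishes when the student layer reproduces the teacher layer's output exactly, which matches the stated goal of keeping the per-layer outputs of the two networks close.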

In a feasible implementation, the target loss function further includes a weight loss term, which describes the difference between the weight matrix of the i-th layer of the teacher network and the binary weight matrix of the i-th layer of the student network.

It can be seen that the embodiments of the present application can also introduce into the target loss function a weight loss term that characterizes the difference between the weight matrix in the teacher network and the binary weight matrix in the student network, and train the student network with it together with the convolution result loss term and the angle loss term. Training the student network with these three performance measures in the target loss function brings the performance of the target binary neural network model obtained after training as close as possible to that of the teacher network.
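The weight loss term and the way the three terms might be combined can be sketched as follows. This is only an illustration: the squared element-wise difference used for the weight loss, the weighted sum used to combine the terms, and the coefficients `lam_angle`, `lam_conv`, and `lam_weight` are all assumptions that do not appear in this passage.

```python
def weight_loss(teacher_w, student_w_bin):
    """Sum of squared element-wise differences between the teacher weight
    matrix and the student binary weight matrix (assumed difference measure)."""
    return sum(
        (t - s) ** 2
        for t_row, s_row in zip(teacher_w, student_w_bin)
        for t, s in zip(t_row, s_row)
    )


def combined_distillation_loss(angle_term, conv_term, weight_term,
                               lam_angle=1.0, lam_conv=1.0, lam_weight=1.0):
    """Weighted sum of the three loss terms; the coefficients are hypothetical."""
    return (lam_angle * angle_term
            + lam_conv * conv_term
            + lam_weight * weight_term)
```

For example, a teacher row `[0.5, -0.5]` compared with the binary row `[1.0, -1.0]` gives a weight loss of 0.5 under this measure.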

In a feasible implementation, training the binary neural network model M_j using the (j+1)-th batch of images and the target loss function to obtain the binary neural network model M_{j+1} includes: inputting the (j+1)-th batch of images into the binary neural network model M_j to obtain predicted values for the (j+1)-th batch of images; and updating the parameters of each layer of the binary neural network model M_j based on the predicted values of the (j+1)-th batch of images, the labels of the (j+1)-th batch of images, and the target loss function, to obtain the binary neural network model M_{j+1}.

It should be understood that after the forward propagation of one training pass is completed, the embodiments of the present application obtain the image predicted values from that forward propagation, then calculate the loss function value for the training pass based on the image predicted values, the image labels, and the target loss function, and update the model parameters based on that loss function value.

It can be seen that in the embodiments of the present application, in each training pass of the binary neural network model, the parameters of the student network can be updated layer by layer using the predicted values of the images used in that pass, the labels of those images, and the angle loss term, convolution result loss term, and weight loss term contained in the target loss function. In summary, the embodiments of the present application compare, on the one hand, the difference between the predicted values of the student network and the image labels, and, on the other hand, the differences between the student network and the teacher network in the corresponding model parameters (the weight loss term in the target loss function) and in the intermediate calculation results (the angle loss term and the convolution result loss term in the target loss function). As a result, the trained target binary neural network model is closer to the teacher network in both feature extraction and predicted values, which improves the prediction accuracy of the model.
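The iteration over steps S2 and S3, combined with the per-batch update described above, can be sketched in Python as follows. The loop structure follows the text, but `train_until_done`, `toy_train_step`, and `toy_is_done` are hypothetical names, and the toy "model" (a single number pulled toward a target by gradient descent) merely stands in for the binary neural network and its target loss function so that the sketch runs on its own.

```python
def train_until_done(model, batches, train_step, is_done):
    """Repeat step S2 on successive batches until the S3 condition holds."""
    j = 0
    while True:
        batch = batches[j % len(batches)]        # the (j+1)-th batch
        model, loss = train_step(model, batch)   # M_j -> M_{j+1}
        if is_done(j + 1, loss):                 # preset condition (S3)
            return model
        j += 1                                   # otherwise j = j + 1


# Toy stand-ins (hypothetical): the "model" is one number pulled toward the
# batch value; a real train_step would run the forward pass, evaluate the
# target loss, and update the per-layer parameters.
def toy_train_step(weight, target, lr=0.1):
    gradient = 2.0 * (weight - target)
    new_weight = weight - lr * gradient
    return new_weight, (new_weight - target) ** 2


def toy_is_done(steps, loss, max_steps=100, tol=1e-8):
    return steps >= max_steps or loss < tol


final_weight = train_until_done(0.0, [3.0], toy_train_step, toy_is_done)
```

The preset condition is modelled here as either a step budget or a loss threshold; the passage leaves the concrete condition open.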

In a feasible implementation, inputting the (j+1)-th batch of images into the binary neural network model M_j to obtain the predicted values of the (j+1)-th batch of images includes: P1: obtaining the binary weight matrix of the i-th layer based on the reference weight matrix and the probability matrix corresponding to the i-th layer of the binary neural network model M_j; P2: obtaining the second convolution output result of the i-th layer based on the binary input matrix of the (j+1)-th batch of images in the i-th layer and the binary weight matrix, wherein the element at any position in the probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the reference weight matrix; P3: letting i = i+1 and repeating steps P1 and P2, and obtaining the predicted values of the (j+1)-th batch of images based on the second convolution output result of the N-th layer.

It can be seen that in the embodiments of the present application, during the forward propagation of each layer in each training pass, the binary weight matrix of the layer is first determined from the reference weight matrix and the probability matrix of that layer, and the second convolution output result of the layer is then calculated from the binary input matrix and the binary weight matrix of that layer. When forward propagation reaches the N-th layer, the image predicted values output by the model in that training pass can be calculated from the second convolution output result of the N-th layer. In the subsequent back propagation, the second convolution output results and first convolution output results obtained during forward propagation, together with the image predicted values and the image labels, can be used to calculate the loss function value of that training pass, and the model parameters are then adjusted according to the loss function value to ensure that the optimal target binary neural network model is obtained.

In a feasible implementation, the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix. Obtaining the binary weight matrix of the i-th layer based on the reference weight matrix and the probability matrix corresponding to the i-th layer of the binary neural network model M_j includes: determining the element at any position in the target binary weight matrix based on the first probability value that corresponds, in the first probability matrix, to the element at that position in the first reference weight matrix, and the second probability value that corresponds, in the second probability matrix, to the element at that position in the second reference weight matrix. The element at any position in the first probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the first reference weight matrix; the element at any position in the second probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the second reference weight matrix.

It can be seen that in the embodiments of the present application, the reference weight matrix of each layer of the student network includes a first reference weight matrix and a second reference weight matrix, and the probability matrix of each layer includes a first probability matrix and a second probability matrix, where the first probability matrix corresponds to the first reference weight matrix and the second probability matrix corresponds to the second reference weight matrix. Based on the probability values of the elements at the same position in the first and second reference weight matrices, the element at that position in either the first or the second reference weight matrix is selected as the element at that position in the binary weight matrix. Applying this rule yields the binary weight matrix of each layer in a single training pass. The second convolution output result of each layer described in the above embodiments and the target loss function value of each training pass are then calculated from that binary weight matrix, which ensures that training proceeds correctly and that the optimal binary neural network model is obtained.
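The selection rule above can be illustrated with a short Python sketch. Two points are assumptions rather than statements from this passage: the element with the larger probability value is chosen at each position (ties go to the first reference matrix), and the example fills the first and second reference weight matrices with +1 and -1 respectively. The name `select_binary_weights` is hypothetical.

```python
def select_binary_weights(ref_first, ref_second, prob_first, prob_second):
    """Build the binary weight matrix position by position.

    At each position, take the element of the first reference weight matrix
    if its probability is at least that of the second, otherwise take the
    element of the second reference weight matrix (assumed selection rule).
    """
    return [
        [
            r1 if p1 >= p2 else r2
            for r1, r2, p1, p2 in zip(r1_row, r2_row, p1_row, p2_row)
        ]
        for r1_row, r2_row, p1_row, p2_row
        in zip(ref_first, ref_second, prob_first, prob_second)
    ]


# Hypothetical example: reference matrices of all +1 and all -1.
ref_first = [[1, 1], [1, 1]]
ref_second = [[-1, -1], [-1, -1]]
prob_first = [[0.9, 0.2], [0.5, 0.4]]
prob_second = [[0.1, 0.8], [0.5, 0.6]]
binary_weights = select_binary_weights(ref_first, ref_second,
                                       prob_first, prob_second)
```

Because the binary weights are derived from the probability matrices, updating the probabilities during back propagation changes the binary weight matrix used in the next forward pass.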

In a feasible implementation, obtaining the second convolution output result of the i-th layer based on the binary input matrix of the (j+1)-th batch of images in the i-th layer and the binary weight matrix includes: performing a convolution operation on the binary input matrix of each image of the (j+1)-th batch in the i-th layer with the binary weight matrix to obtain a reference feature matrix for each image; and scaling the reference feature matrix of each image by the weight scaling factor of the i-th layer to obtain the second convolution output result.

It can be seen that in the embodiments of the present application, the reference feature matrix of each image is obtained from the binary input matrix and the binary weight matrix of each layer, and the reference feature matrix of each image is then scaled by the weight scaling factor of that layer to obtain the second convolution output result. The second convolution output result characterizes the features of the input image, so designing a convolution result loss term in the target loss function allows the trained target binary neural network model to retain as much of the teacher network's feature extraction capability as possible, which improves the prediction accuracy of the target binary neural network model.
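The two steps described above (binary convolution, then scaling) can be sketched in Python for a single image and a single 2-D kernel. The sketch assumes a stride of 1 with no padding, a sign function for binarizing inputs, and a scalar scaling factor `alpha`; none of these details are fixed by this passage, and the function names are hypothetical.

```python
def binarize(matrix):
    """Map each element to +1 or -1 by its sign (zero maps to +1 by assumption)."""
    return [[1 if value >= 0 else -1 for value in row] for row in matrix]


def conv2d_valid(inputs, kernel):
    """Plain 2-D cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(inputs) - kh + 1
    out_w = len(inputs[0]) - kw + 1
    return [
        [
            sum(inputs[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def second_conv_output(binary_inputs, binary_weights, alpha):
    """Reference feature matrix scaled by the weight scaling factor alpha."""
    reference = conv2d_valid(binary_inputs, binary_weights)
    return [[alpha * value for value in row] for row in reference]


binary_inputs = binarize([[0.3, -1.2, 0.8], [0.1, 0.5, -0.7], [-0.4, 0.9, 0.2]])
binary_weights = [[1, -1], [-1, 1]]
output = second_conv_output(binary_inputs, binary_weights, alpha=0.5)
```

Since both operands contain only +1 and -1, every entry of the reference feature matrix is an integer between the negative and positive kernel size; the scaling factor restores a real-valued range that can be compared with the teacher's output.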

In a feasible implementation, the above parameters include at least one of the probability matrix or the weight scaling factor.

It can be seen that in the embodiments of the present application, the back propagation of each training pass updates at least one of the probability matrix or the weight scaling factor of each layer, so that the binary weight matrix of each layer is updated during the forward propagation of the next training pass. The image predicted values and the loss function value of that training pass are then obtained based on the updated binary weight matrix and/or weight scaling factor, and the model parameters are further adjusted based on the loss function value, which ensures that a student network with performance close to that of the teacher network is obtained.

In a second aspect, the present application provides a model training method, wherein the model includes a teacher network and a student network, the teacher network is a trained neural network model, the student network is a binary neural network model, the teacher network and the student network each include N layers of neural networks, and N is a positive integer. The method includes: training the binary neural network model using the teacher network and a target loss function, wherein the target loss function includes an angle loss term, which describes the difference between a first angle corresponding to the i-th layer of the teacher network and a second angle corresponding to the i-th layer of the student network; the first angle is obtained based on the weight matrix of the i-th layer of the teacher network and the input matrix of the i-th layer of the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer of the student network and the binary input matrix of the i-th layer of the student network; i is a positive integer less than or equal to N; and repeating the above step until an iteration termination condition is met, to obtain the target binary neural network model.

In a feasible implementation, the target loss function further includes a convolution result loss term, which describes the difference between a first convolution output result of the i-th layer of the teacher network and a second convolution output result of the i-th layer of the student network. The first convolution output result is obtained based on the weight matrix of the i-th layer of the teacher network and the input matrix of the i-th layer of the teacher network. The second convolution output result is obtained based on the binary weight matrix and the corresponding weight scaling factor of the i-th layer of the student network, together with the binary input matrix of the i-th layer of the student network.

In a feasible implementation, the target loss function further includes a weight loss term, which describes the difference between the weight matrix of the i-th layer of the teacher network and the binary weight matrix of the i-th layer of the student network.

In a feasible implementation, training the binary neural network model using the teacher network and the target loss function includes: inputting a training image into the binary neural network model to obtain a predicted value of the training image; and updating the parameters of the binary neural network model based on the predicted value of the training image, the label of the training image, and the target loss function.

In a feasible implementation, inputting the training image into the binary neural network model to obtain the predicted value of the training image includes: P1: obtaining the binary weight matrix of the i-th layer based on the reference weight matrix and the probability matrix corresponding to the i-th layer of the binary neural network model, wherein the element at any position in the probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the reference weight matrix; P2: obtaining the second convolution output result of the i-th layer based on the binary weight matrix and the binary input matrix of the training image in the i-th layer;

P3: letting i = i+1 and repeating steps P1 and P2, and obtaining the predicted value of the training image based on the second convolution output result of the N-th layer.

In a feasible implementation, the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix. Obtaining the binary weight matrix of the i-th layer based on the reference weight matrix and the probability matrix corresponding to the i-th layer of the binary neural network model includes: determining the element at any position in the target binary weight matrix based on the first probability value that corresponds, in the first probability matrix, to the element at that position in the first reference weight matrix, and the second probability value that corresponds, in the second probability matrix, to the element at that position in the second reference weight matrix. The element at any position in the first probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the first reference weight matrix; the element at any position in the second probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the second reference weight matrix.

In a feasible implementation, obtaining the second convolution output result of the i-th layer neural network based on the binary weight matrix and the binary input matrix of the training image at the i-th layer neural network includes: performing a convolution operation on the binary weight matrix and the binary input matrix of the training image at the i-th layer neural network to obtain a reference feature matrix of the training image; and scaling the reference feature matrix of the training image using the weight scaling factor of the i-th layer neural network to obtain the second convolution output result.
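
The two operations above (binary convolution, then scaling) can be sketched as follows. The sketch assumes a single-channel, stride-1 convolution without padding and a scalar scaling factor; the text does not fix any of these details, and the function names are hypothetical.

```python
def conv2d_valid(inputs, kernel):
    """Stride-1, no-padding 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(inputs) - kh + 1
    out_w = len(inputs[0]) - kw + 1
    return [[sum(kernel[a][b] * inputs[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def scaled_binary_conv(binary_input, binary_weight, scale):
    """Convolve +1/-1 matrices to get the reference feature matrix,
    then multiply every element by the layer's scaling factor."""
    reference = conv2d_valid(binary_input, binary_weight)
    return [[scale * v for v in row] for row in reference]

binary_input = [[1, -1, 1],
                [-1, 1, 1],
                [1, 1, -1]]
binary_weight = [[1, -1],
                 [-1, 1]]
output = scaled_binary_conv(binary_input, binary_weight, 0.5)
```

Here the reference feature matrix is `[[4, -2], [-2, -2]]` and the scaled output is `[[2.0, -1.0], [-1.0, -1.0]]`.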

In a feasible implementation, the parameters include at least one of the probability matrix or the weight scaling factor.

It should be understood that, for the beneficial effects of the embodiments of the second aspect, reference may be made to the description of the corresponding embodiments of the first aspect; details are not repeated here.

In a third aspect, the present application provides an image processing method, the method comprising: obtaining an image to be processed; and performing image processing on the image to be processed using a target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained through K rounds of training, and in the (j+1)-th round of the K rounds: the binary neural network model M_j is trained using the (j+1)-th batch of images and a target loss function to obtain the binary neural network model M_{j+1}; the binary neural network model M_j is the student network in a knowledge distillation framework; the teacher network in the knowledge distillation framework is a trained neural network model, and the teacher network and the student network each contain N layers of neural networks, N being a positive integer; the target loss function contains an angle loss term; K is a positive integer, and j is an integer greater than or equal to zero and less than or equal to K; the angle loss term describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network; the first angle is obtained based on the weight matrix corresponding to the i-th layer neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer neural network in the teacher network; the second angle is obtained based on the binary weight matrix corresponding to the i-th layer neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network in the student network; and i is a positive integer less than or equal to N.
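
To make the angle loss term more concrete, the sketch below computes an angle between a weight matrix and an input matrix and compares the teacher's and the student's angles layer by layer. It assumes that the two matrices of a layer have the same shape (for example a kernel and one input patch), that the angle is the arccosine of their normalized inner product, and that the per-layer gaps are combined as a mean squared difference. This passage only says that the term describes the difference between the two angles, so the exact form is an assumption, as are the function names and sample values.

```python
import math

def flatten(matrix):
    return [v for row in matrix for v in row]

def angle_between(weight, inputs):
    """Angle in radians between two equally shaped, non-zero matrices."""
    w, a = flatten(weight), flatten(inputs)
    dot = sum(x * y for x, y in zip(w, a))
    norm = math.sqrt(sum(x * x for x in w)) * math.sqrt(sum(y * y for y in a))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cosine)

def angle_loss(teacher_layers, student_layers):
    """Mean squared gap between per-layer teacher and student angles
    (assumed form of the angle loss term)."""
    gaps = [(angle_between(tw, ta) - angle_between(sw, sa)) ** 2
            for (tw, ta), (sw, sa) in zip(teacher_layers, student_layers)]
    return sum(gaps) / len(gaps)

# One layer each: (weight matrix, input matrix). Values are made up.
teacher = [([[0.4, -0.2], [0.1, 0.3]], [[1.0, 0.5], [-0.5, 2.0]])]
student = [([[1, -1], [1, 1]], [[1, 1], [-1, 1]])]
loss = angle_loss(teacher, student)
```

The loss is zero when every student angle equals the corresponding teacher angle and grows as the angles diverge.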

It can be seen that, in the embodiments of the present application, because the method in the first aspect introduces a knowledge distillation framework during training and adds a corresponding angle loss term to the target loss function, the target binary neural network model trained by that method achieves considerably higher accuracy than existing binary neural network models. In addition, because the parameters of the binary neural network occupy less storage space than those of the teacher network, the model is more lightweight and is therefore better suited to embedded devices, giving it broader application prospects.
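
The storage argument can be checked with a small worked example. The sketch assumes that the teacher stores each weight as a 32-bit float and that binary weights are packed one bit per weight; the packing scheme and the helper names are illustrative and not taken from the text.

```python
def pack_binary_weights(weights):
    """Pack a flat list of +1/-1 weights into bytes, 8 weights per byte
    (bit set means +1, bit clear means -1)."""
    packed = bytearray((len(weights) + 7) // 8)
    for index, w in enumerate(weights):
        if w == 1:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)

def unpack_binary_weights(packed, count):
    """Inverse of pack_binary_weights for the first `count` weights."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(count)]

weights = [1, -1, -1, 1, 1, 1, -1, 1, -1, 1]
packed = pack_binary_weights(weights)
float32_bytes = 4 * len(weights)   # assumed full-precision storage
binary_bytes = len(packed)
```

For these ten weights the packed form takes 2 bytes against 40 bytes in float32; for layers whose weight count is a multiple of 8, the ratio under these assumptions is exactly 32 to 1.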

In a feasible implementation, the image processing includes at least one of image classification, object detection, or image segmentation.

It can be seen that the method in the embodiments of the present application can be used for any of image classification, object detection, and image segmentation. Applying the image processing method of the embodiments to these three tasks can improve the image processing results; that is, the model is versatile.

In a fourth aspect, the present application provides an image processing method, the method comprising: obtaining an image to be processed; and performing image processing on the image to be processed using a target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained by training an initial binary neural network model M₀ in a knowledge distillation framework with a target loss function; the initial binary neural network model M₀ is the student network in the knowledge distillation framework, and the teacher network in the knowledge distillation framework is a trained neural network model. The target loss function includes an angle loss term, and the angle loss term describes the difference between the angle between the feature matrix and the weight matrix in the teacher network and the angle between the feature matrix and the weight matrix in the student network.

It can be seen that, in the embodiments of the present application, because the method in the first aspect introduces a knowledge distillation framework during training and adds a corresponding angle loss term to the target loss function, the target binary neural network model trained by that method achieves considerably higher accuracy than existing binary neural network models. In addition, because the parameters of the binary neural network occupy less storage space than those of the neural network model in the teacher network, the model is more lightweight and has good application prospects in embedded devices.

In a fifth aspect, the present application provides a training device for a binary neural network model, the device comprising: a determination unit, configured to execute step S1; a training unit, configured to execute step S2; and a decision unit, configured to execute step S3. Step S1: determining a knowledge distillation framework, where the teacher network in the knowledge distillation framework is a trained neural network model, the student network in the knowledge distillation framework is an initial binary neural network model M₀, the teacher network and the student network each contain N layers of neural networks, and N is a positive integer. Step S2: training the binary neural network model M_j using the (j+1)-th batch of images and a target loss function to obtain the binary neural network model M_{j+1}, where the binary neural network model M_j is obtained by training on the j-th batch of images, and j is a positive integer; the target loss function contains an angle loss term, which describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network; the first angle is obtained based on the weight matrix of the i-th layer neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer neural network in the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network in the student network; and i is a positive integer less than or equal to N. Step S3: when a preset condition is met, using the binary neural network model M_{j+1} as the target binary neural network model; otherwise, setting j = j+1 and repeating step S2.
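
The control flow of steps S1-S3 can be sketched as a plain loop. The callbacks `train_step` and `should_stop` are hypothetical placeholders for the S2 update and the S3 preset condition, neither of which is detailed in this passage; the toy usage treats the "model" as a single number purely to exercise the loop.

```python
def train_binary_model(initial_model, batches, train_step, should_stop):
    """Run the S2/S3 loop and return (target model, number of batches used).

    train_step(model, batch) -> new model   (hypothetical callback for S2)
    should_stop(model, used) -> bool        (hypothetical preset condition, S3)
    """
    model = initial_model                    # M_0, the student chosen in S1
    for j, batch in enumerate(batches):
        model = train_step(model, batch)     # M_j -> M_{j+1} on batch j+1
        if should_stop(model, j + 1):        # S3: preset condition met
            return model, j + 1
    return model, len(batches)               # batches exhausted

# Toy usage: each step moves the model halfway toward the batch value, and
# training stops once the model is within 0.1 of the value 1.0.
final_model, used = train_binary_model(
    0.0,
    [1.0] * 10,
    train_step=lambda m, b: m + 0.5 * (b - m),
    should_stop=lambda m, used: abs(1.0 - m) < 0.1,
)
```

In the toy run the loop stops after four batches with a model value of 0.9375.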

In a feasible implementation, the training unit is specifically configured to: input the (j+1)-th batch of images into the binary neural network model M_j to obtain the predicted values of the (j+1)-th batch of images; and update the parameters of each layer of the binary neural network model M_j based on the predicted values of the (j+1)-th batch of images, the labels of the (j+1)-th batch of images, and the target loss function, to obtain the binary neural network model M_{j+1}.
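
One way to picture how predicted values, labels, and the target loss function fit together is the sketch below. It assumes that the target loss is a classification cross-entropy plus a weighted angle term; this passage only states that the target loss contains an angle loss term, so the cross-entropy part, the weighting factor `angle_weight`, and the function names are assumptions.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def target_loss(batch_logits, labels, angle_term, angle_weight=1.0):
    """Assumed target loss: mean cross-entropy plus a weighted angle term."""
    task = sum(cross_entropy(z, y)
               for z, y in zip(batch_logits, labels)) / len(labels)
    return task + angle_weight * angle_term

# Toy batch of two images with three classes each; the angle term is a
# placeholder value standing in for the layer-wise angle comparison.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
loss = target_loss(batch_logits, labels, angle_term=0.25, angle_weight=0.5)
```

The gradient of such a loss with respect to the trainable parameters would then drive the per-layer update; that step is left out because it depends on the training framework.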

In a feasible implementation, with respect to inputting the (j+1)-th batch of images into the binary neural network model M_j to obtain the predicted values of the (j+1)-th batch of images, the training unit is specifically configured to perform: P1: obtaining the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model M_j; P2: obtaining the second convolution output result of the i-th layer neural network based on the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network and the binary weight matrix, where the element at any position in the probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the reference weight matrix; P3: setting i = i+1, repeating steps P1-P2, and obtaining the predicted values of the (j+1)-th batch of images based on the second convolution output result of the N-th layer neural network.

In a feasible implementation, the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix. With respect to obtaining the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model M_j, the training unit is specifically configured to: determine the element at any position in the target binary weight matrix based on a first probability value and a second probability value, where the first probability value is the value in the first probability matrix corresponding to the element at that position in the first reference weight matrix, and the second probability value is the value in the second probability matrix corresponding to the element at that position in the second reference weight matrix. The element at any position in the first probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the first reference weight matrix; the element at any position in the second probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the second reference weight matrix.

In a feasible implementation, with respect to obtaining the second convolution output result of the i-th layer neural network based on the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network and the binary weight matrix, the training unit is specifically configured to: perform a convolution operation on the binary input matrix of each image in the (j+1)-th batch of images at the i-th layer neural network and the binary weight matrix, to obtain a reference feature matrix for each image; and scale the reference feature matrix of each image using the weight scaling factor of the i-th layer neural network to obtain the second convolution output result.

In a sixth aspect, the present application provides a model training device. The model includes a teacher network and a student network; the teacher network is a trained neural network model, the student network is a binary neural network model, the teacher network and the student network each contain N layers of neural networks, and N is a positive integer. The device includes: a training unit, configured to train the binary neural network model using the teacher network and a target loss function, where the target loss function includes an angle loss term, and the angle loss term describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network; the first angle is obtained based on the weight matrix of the i-th layer neural network in the teacher network and the input matrix at the i-th layer neural network in the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer neural network in the student network and the binary input matrix at the i-th layer neural network in the student network; and i is a positive integer less than or equal to N; and a decision unit, configured to repeat the above step until an iteration termination condition is met, to obtain the target binary neural network model.

In a feasible implementation, with respect to training the binary neural network model using the teacher network and the target loss function, the training unit is specifically configured to: input a training image into the binary neural network model to obtain a predicted value of the training image; and update the parameters of the binary neural network model based on the predicted value of the training image, the label of the training image, and the target loss function.

In a feasible implementation, with respect to inputting the training image into the binary neural network model to obtain the predicted value of the training image, the training unit is specifically configured to perform: P1: obtaining the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model, where the element at any position in the probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the reference weight matrix; P2: obtaining the second convolution output result of the i-th layer neural network based on the binary weight matrix and the binary input matrix of the training image at the i-th layer neural network; P3: setting i = i+1 and repeating steps P1-P2, and obtaining the predicted value of the training image based on the second convolution output result of the N-th layer neural network.

In a feasible implementation, the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix. With respect to obtaining the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model, the training unit is specifically configured to: determine the element at any position in the target binary weight matrix based on a first probability value and a second probability value, where the first probability value is the value in the first probability matrix corresponding to the element at that position in the first reference weight matrix, and the second probability value is the value in the second probability matrix corresponding to the element at that position in the second reference weight matrix. The element at any position in the first probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the first reference weight matrix; the element at any position in the second probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the second reference weight matrix.

In a feasible implementation, with respect to obtaining the second convolution output result of the i-th layer neural network based on the binary weight matrix and the binary input matrix of the training image at the i-th layer neural network, the training unit is specifically configured to: perform a convolution operation on the binary weight matrix and the binary input matrix of the training image at the i-th layer neural network to obtain a reference feature matrix of the training image; and scale the reference feature matrix of the training image using the weight scaling factor of the i-th layer neural network to obtain the second convolution output result.

In a seventh aspect, the present application provides an image processing device, the device comprising: an acquisition unit, configured to obtain an image to be processed; and a processing unit, configured to perform image processing on the image to be processed using a target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained through K rounds of training, and in the (j+1)-th round of the K rounds: the binary neural network model M_j is trained using the (j+1)-th batch of images and a target loss function to obtain the binary neural network model M_{j+1}; the binary neural network model M_j is the student network in a knowledge distillation framework; the teacher network in the knowledge distillation framework is a trained neural network model, and the teacher network and the student network each contain N layers of neural networks, N being a positive integer; the target loss function contains an angle loss term; K is a positive integer, and j is an integer greater than or equal to zero and less than or equal to K; the angle loss term describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network; the first angle is obtained based on the weight matrix corresponding to the i-th layer neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer neural network in the teacher network; the second angle is obtained based on the binary weight matrix corresponding to the i-th layer neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network in the student network; and i is a positive integer less than or equal to N.

In an eighth aspect, the present application provides an image processing device, the device comprising: an acquisition unit, configured to obtain an image to be processed; and a processing unit, configured to perform image processing on the image to be processed using a target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained by training an initial binary neural network model M₀ in a knowledge distillation framework with a target loss function; the initial binary neural network model M₀ is the student network in the knowledge distillation framework, and the teacher network in the knowledge distillation framework is a trained neural network model. The target loss function includes an angle loss term, and the angle loss term describes the difference between the angle between the feature matrix and the weight matrix in the teacher network and the angle between the feature matrix and the weight matrix in the student network.

In a ninth aspect, the present application provides a model training device, comprising a processor and a memory, where the memory is configured to store program instructions and the processor is configured to invoke the program instructions to perform the method according to any one of the first aspect or the second aspect.

In a tenth aspect, the present application provides a chip system, the chip system including a processor and a memory, where the memory is configured to store a target binary neural network model and program instructions; the target binary neural network model is obtained by training with the method according to any one of the first aspect or the second aspect; and the processor is configured to read the program instructions so as to invoke the target binary neural network model to perform the method according to any one of the third aspect or the fourth aspect.

The chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).

In an eleventh aspect, the present application provides a terminal device, the terminal device including the chip system of the tenth aspect and a discrete device coupled to the chip system, where the terminal device includes a car, a camera, a computer, a mobile phone, or a wearable device.

In a twelfth aspect, the present application provides a computer-readable storage medium, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method according to any one of the first aspect, the second aspect, the third aspect, or the fourth aspect.

In a thirteenth aspect, the present application provides a computer program product containing instructions which, when the computer program product runs on a computer, cause the computer to perform the method according to any one of the first aspect, the second aspect, the third aspect, or the fourth aspect.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings used in the embodiments of the present application are introduced below.

FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a backbone network provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of another system architecture provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a convolution operation process in a binary neural network provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a knowledge distillation framework provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of a training method for a binary neural network model provided by an embodiment of the present application;

FIG. 8 is a schematic flowchart of another model training method provided by an embodiment of the present application;

FIG. 9-A to FIG. 9-C are schematic diagrams of the distributions of features extracted by different network models provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of the angle between a feature matrix and a weight matrix in an embodiment of the present application;

FIG. 11 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 12 is a schematic flowchart of another image processing method provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of a training device for a binary neural network model provided by an embodiment of the present application;

FIG. 14 is a schematic diagram of a model training device provided by an embodiment of the present application;

FIG. 15 is a schematic structural diagram of an image processing device provided by an embodiment of the present application;

FIG. 16 is a schematic diagram of the hardware structure of a model training device in an embodiment of the present application;

FIG. 17 is a schematic diagram of the hardware structure of an image processing device provided by an embodiment of the present application.

DETAILED DESCRIPTION

The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.

The embodiments of the present application can be applied to basic computer vision tasks such as image classification, image segmentation, and object detection, for example in picture detection, album management, video recording, safe city, human-computer interaction, and other scenarios that require image processing.

It should be understood that an image in the embodiments of the present application may be a static image (also called a static picture) or a dynamic image (also called a dynamic picture). For example, an image in the present application may be a video or an animated picture, or it may be a still picture or a photo. For ease of description, the following embodiments refer to static images and dynamic images collectively as images.

The method of the embodiments of the present application can be applied in particular to album management and object detection scenarios, which are described in detail below.

Album management:

The album of a user's terminal device, such as a mobile phone, may store a large number of images, for example images obtained by taking photos with the camera, taking screenshots, or downloading from the network. When the user needs to find a particular image among this large amount of image data, the method in the embodiments of the present application can be used to classify the images in the album, with different types of images stored in different directories, for example animals, landscapes, and people. The animal category can be further divided into subcategories: for example, the animal in an image is recognized according to its specific kind, and the image is assigned to the corresponding subcategory.
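
The album scenario amounts to grouping images by predicted category. The sketch below assumes a hypothetical `classify` callable that returns a category name for an image; in practice this role would be played by the trained target binary neural network model, which is replaced here by a lookup table so that the example is self-contained.

```python
from collections import defaultdict

def group_by_category(image_names, classify):
    """Group image names into albums keyed by the predicted category.

    classify(name) -> category string; a hypothetical stand-in for running
    the trained classifier on the image.
    """
    albums = defaultdict(list)
    for name in image_names:
        albums[classify(name)].append(name)
    return dict(albums)

# Stand-in predictions; a real system would call the model instead.
predicted = {"cat1.jpg": "animal", "hill.jpg": "landscape", "cat2.jpg": "animal"}
albums = group_by_category(list(predicted), predicted.get)
```

Subcategories could be handled the same way by applying a second, finer classifier within each album.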

It can be seen that the method of the present application can quickly and accurately help the user locate the category to which the desired image belongs, thereby saving the user's time and improving the user experience.

Object detection:

Object detection means finding objects of interest in an image and determining their position and size. For example, if a user wants to find images containing cats in the album of a terminal device, the method in the embodiments of the present application can be used to identify all images containing cats on the device for the user to choose from.
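
The cat example can be sketched as a filter over detection results. The detection record layout (a label, a confidence score, and a box given as x, y, width, height) and the 0.5 score threshold are assumptions made for illustration; the text does not define an output format.

```python
def images_containing(detections_by_image, wanted_label, min_score=0.5):
    """Return names of images with at least one confident detection of wanted_label."""
    return [name for name, detections in detections_by_image.items()
            if any(d["label"] == wanted_label and d["score"] >= min_score
                   for d in detections)]

# Made-up detector output; box is (x, y, width, height) by assumption.
detections_by_image = {
    "a.jpg": [{"label": "cat", "score": 0.92, "box": (10, 20, 64, 48)}],
    "b.jpg": [{"label": "dog", "score": 0.88, "box": (5, 5, 40, 40)}],
    "c.jpg": [{"label": "cat", "score": 0.31, "box": (0, 0, 12, 12)},
              {"label": "tree", "score": 0.77, "box": (30, 0, 50, 90)}],
}
matches = images_containing(detections_by_image, "cat")
```

With these values only `a.jpg` is returned, because the cat detection in `c.jpg` falls below the assumed threshold.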

可以看出，采用本申请实施例中的方法可以准确地对图像中的目标进行检测，从而对包含用户感兴趣物体的图像进行筛选，提升用户体验。It can be seen that the method in the embodiment of the present application can accurately detect the target in the image, thereby screening the images containing the objects of interest to the user and improving the user experience.

应理解，上文介绍的相册管理和目标检测只是本申请实施例的方法所应用的两个具体场景，本申请实施例的方法在应用时并不限于上述两个场景，本申请实施例的方法能够应用到任何需要进行图像处理的场景中，例如，图像分割。或者，本申请实施例中的方法也可以类似地应用于其他领域，例如，语音识别及自然语言处理等，本申请实施例中对此并不限定。It should be understood that the album management and target detection described above are only two specific scenarios in which the method of the embodiment of the present application is applied. The method of the embodiment of the present application is not limited to the above two scenarios when applied. The method of the embodiment of the present application can be applied to any scenario that requires image processing, such as image segmentation. Alternatively, the method of the embodiment of the present application can also be similarly applied to other fields, such as speech recognition and natural language processing, etc., which is not limited in the embodiment of the present application.

下面从模型训练侧和模型应用侧对本申请提供的方法进行描述：The following describes the method provided by this application from the model training side and the model application side:

The training method of the binary neural network model provided in the embodiment of the present application relates to computer vision processing and can be applied to data processing methods such as data training, machine learning, and deep learning. It performs symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data (such as the images to be processed in the present application), and finally obtains a trained target binary neural network model. The image processing method provided in the embodiment of the present application can then use this trained target binary neural network model: input data (such as an image to be processed in the present application) is input into the trained target binary neural network model to obtain output data (such as the predicted value of the image to be processed). It should be noted that the training method of the binary neural network model and the image processing method provided in the embodiment of the present application are inventions based on the same concept, and can also be understood as two parts of one system, or two stages of one overall process: the model training stage and the model application stage.

本申请实施例涉及了大量神经网络的相关应用，为了更好地理解本申请实施例的方案，下面先对本申请实施例可能涉及的神经网络和计算机视觉领域的相关术语和概念进行介绍。The embodiments of the present application involve a large number of neural network-related applications. In order to better understand the solutions of the embodiments of the present application, the relevant terms and concepts in the field of neural networks and computer vision that may be involved in the embodiments of the present application are first introduced below.

(1)图像分类(1) Image classification

从待处理图像或视频判断里面包含什么类别的目标。Determine what category of objects are contained in the image or video to be processed.

(2)目标检测(2) Object Detection

从给定的待处理图像中识别出所有感兴趣的目标(物体)，并确定它们的类别和位置。由于各类物体有不同的外观，形状，姿态，加上成像时光照，遮挡等因素的干扰，目标检测是计算机视觉领域的核心且最具挑战性的问题之一。Identify all the targets (objects) of interest from a given image to be processed and determine their categories and locations. Since various objects have different appearances, shapes, and postures, and are interfered by factors such as illumination and occlusion during imaging, target detection is one of the core and most challenging problems in the field of computer vision.

(3)图像分割(3) Image segmentation

图像分割分为实例分割和场景分割，图像分割主要用于判断待处理图像中的每个像素点属于哪个目标或物体。Image segmentation is divided into instance segmentation and scene segmentation. Image segmentation is mainly used to determine which target or object each pixel in the image to be processed belongs to.

(4)神经网络(4) Neural Network

A neural network can be composed of neural units. A neural unit can refer to an operation unit that takes $x_s$ and an intercept of 1 as input. The output of the operation unit can be:

$$\text{output} = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinear characteristics into the neural network and converts the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolution layer. The activation function can be a sigmoid function. A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field. The local receptive field can be a region composed of several neural units.
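As an illustration of the neural unit described above, the following minimal Python sketch computes a weighted sum of the inputs plus a bias and passes it through a sigmoid activation. The function names `sigmoid` and `neural_unit` and the example weights are assumptions chosen for illustration; they do not come from the application.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one possible choice for the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x, w, b):
    """Return f(sum_s w[s] * x[s] + b) for a single neural unit."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)


# Two inputs with arbitrary illustrative weights; the weighted sum here is 0.
out = neural_unit([1.0, 2.0], [0.5, -0.25], 0.0)
```

Because the weighted sum in this example is zero, the sigmoid returns its midpoint value; other weights move the output toward 0 or 1.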

(5)深度神经网络(5) Deep Neural Networks

A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no particular threshold for "many" here. Divided by the position of its layers, the layers inside a DNN fall into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Adjacent layers are fully connected, that is, every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is not. Put simply, each layer computes the following linear relationship expression:

$$\vec{y} = \alpha(W\vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the coefficients W and offset vectors $\vec{b}$ are also numerous. These parameters are defined in a DNN as follows, taking the coefficient W as an example. Assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$.

It should be noted that the input layer does not have a W parameter. In a deep neural network, more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
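The per-layer computation described above, an activation applied to a weight matrix times the input vector plus an offset vector, can be sketched in plain Python as follows. This is a minimal sketch: the helper names `dense_layer` and `relu`, the choice of ReLU as the activation, and the numeric values are assumptions for illustration only.

```python
def relu(z: float) -> float:
    """Rectified linear unit, used here as an example activation function."""
    return max(0.0, z)


def dense_layer(W, x, b, activation=relu):
    """Compute y = activation(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from input neuron k to output neuron j,
    so the output index comes first, as in the subscript order above.
    """
    return [
        activation(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]


# A layer with 2 inputs and 3 outputs (illustrative values).
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
b = [0.0, 1.0, -5.0]
y = dense_layer(W, [2.0, 1.0], b)
```

Stacking several such layers, each feeding its output vector to the next, gives the multi-layer structure described in this section.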

(6)卷积神经网络(6) Convolutional Neural Networks

A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor consisting of convolution layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature plane (feature map). A convolution layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolution layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolution layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolution layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.

The convolution kernel can be initialized as a matrix of random values, and during training of the convolutional neural network the kernel obtains reasonable weights through learning. In addition, a direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network while also lowering the risk of overfitting.

(7)损失函数(7) Loss function

在训练深度神经网络的过程中，因为希望深度神经网络的输出尽可能的接近真正想要预测的值，所以可以通过比较当前网络的预测值和真正想要的目标值，再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然，在第一次更新之前通常会有初始化的过程，即为深度神经网络中的各层预先配置参数)，比如，如果网络的预测值高了，就调整权重向量让它预测低一些，不断的调整，直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此，就需要预先定义“如何比较预测值和目标值之间的差异”，这便是损失函数(loss function)或目标函数(objective function)，它们是用于衡量预测值和目标值的差异的重要方程。其中，以损失函数举例，损失函数的输出值(loss)越高表示差异越大，那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。In the process of training a deep neural network, because we hope that the output of the deep neural network is as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight vector of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is high, adjust the weight vector to make it predict a lower value, and keep adjusting until the deep neural network can predict the target value we really want or a value very close to the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which are important equations used to measure the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the deep neural network becomes a process of minimizing this loss as much as possible.
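As a concrete illustration of a loss function measuring the difference between a predicted value and a target value, the following sketch uses cross-entropy for a classification prediction. Cross-entropy is only one common choice and is an assumption here; the function name `cross_entropy` and the probability values are likewise illustrative.

```python
import math


def cross_entropy(predicted, target_index):
    """Negative log-probability that the prediction assigns to the target class."""
    return -math.log(predicted[target_index])


# The true class is index 1 in both cases.
good = cross_entropy([0.1, 0.8, 0.1], 1)  # prediction close to the target
bad = cross_entropy([0.6, 0.2, 0.2], 1)   # prediction far from the target
```

The prediction that puts more probability on the true class yields the smaller loss, so minimizing the loss pushes the network output toward the target value.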

(8)反向传播算法(8) Back propagation algorithm

A convolutional neural network can use the error back propagation (BP) algorithm to correct the values of the parameters of the initial neural network model during training, so that the error loss of the model becomes smaller and smaller. Specifically, propagating the input signal forward to the output produces an error loss, and the parameters of the initial model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, aiming to obtain the optimal parameters of the model, such as the weight matrices.
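The cycle described above (forward pass, error loss, backward propagation of the error, parameter update) can be illustrated on the smallest possible model, a single weight w with output y = w * x and a squared-error loss. This is a hedged sketch, not the training procedure of the application; `train_single_weight`, the learning rate, and the data point are assumptions.

```python
def train_single_weight(x, target, w=0.0, lr=0.1, steps=50):
    """Fit y = w * x to one (x, target) pair by gradient descent."""
    losses = []
    for _ in range(steps):
        y = w * x                      # forward pass
        error = y - target
        losses.append(0.5 * error ** 2)
        grad_w = error * x             # backward pass: dL/dw = dL/dy * dy/dw
        w -= lr * grad_w               # parameter update
    return w, losses


w, losses = train_single_weight(x=2.0, target=6.0)
```

With these values the weight approaches 3 and the recorded loss shrinks at every step, which is the convergence behaviour the paragraph describes. In a multi-layer network the same chain rule is applied layer by layer from the output back to the input.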

(9)像素值(9) Pixel value

The pixel value of an image can be a red, green, and blue (RGB) color value, and the pixel value can be a long integer representing the color. For example, the pixel value is 256*Red+100*Green+76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, the smaller the value, the lower the brightness, and the larger the value, the higher the brightness. For grayscale images, the pixel value can be a grayscale value.

(10)熵(英文：Entropy)(10) Entropy

Entropy can represent the certainty of things: the higher the certainty, the lower the entropy, and the lower the certainty, the higher the entropy. For a classification task, the closer the confidence of an image's classification result is to 0 or 1, the lower its entropy; the closer the confidence is to 0.5, the higher the entropy, which indicates that the classification result is uncertain.
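The relationship between confidence and entropy for a two-class result can be checked numerically with the Shannon entropy of a Bernoulli distribution. The sketch below assumes base-2 logarithms (entropy in bits); the function name `binary_entropy` is illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a two-class prediction with confidence p."""
    if p in (0.0, 1.0):
        return 0.0  # a fully certain prediction has zero entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


confident = binary_entropy(0.99)
uncertain = binary_entropy(0.5)
```

Entropy is largest at a confidence of 0.5 and falls toward zero as the confidence approaches 0 or 1.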

(11)知识蒸馏(11) Knowledge Distillation

Knowledge distillation is a common model compression method. In a teacher-student framework, the feature representation "knowledge" learned by a complex teacher network with strong learning ability is distilled and transferred to a student network with a small number of parameters and weak learning ability.
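One widely used way to transfer knowledge from a teacher to a student is to make the student's softened output distribution match the teacher's. The sketch below shows that generic formulation, a KL divergence between temperature-scaled softmax outputs. It is an assumption for illustration and is not the target loss function of the application; `softmax`, `distillation_loss`, and the temperature value are hypothetical names and settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))


teacher = [4.0, 1.0, 0.5]
loss_matching = distillation_loss(teacher, [4.0, 1.0, 0.5])
loss_mismatch = distillation_loss(teacher, [0.5, 1.0, 4.0])
```

The loss is zero when the student reproduces the teacher's outputs and grows as the two distributions diverge.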

(12)二值神经网络(12) Binary Neural Network

二值神经网络指仅使用1和-1两个值来表示神经网络参数(weights)和神经网络中经过非线性函数激活过的卷积运算输出(activations)的神经网络，相比于全精度的神经网络，它可以节省大量的内存和计算，有利于模型在资源受限设备上的部署。A binary neural network refers to a neural network that uses only two values, 1 and -1, to represent the neural network parameters (weights) and the convolution operation outputs (activations) activated by nonlinear functions in the neural network. Compared with full-precision neural networks, it can save a lot of memory and computing, which is conducive to the deployment of models on resource-constrained devices.
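To illustrate why restricting weights and activations to +1 and -1 saves memory and computation, the following sketch binarizes two real-valued vectors with the sign function, packs them into bits, and computes their dot product with XNOR and a bit count. The helper names, the convention that 0 maps to +1, and the example values are assumptions for illustration, not details taken from the application.

```python
def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n  # each match contributes +1, each mismatch -1


weights = binarize([0.7, -1.2, 0.1, -0.3])
activations = binarize([-0.5, -2.0, 0.9, 0.4])
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

Each value occupies one bit instead of a 32-bit floating point number, and the multiply-accumulate of a convolution reduces to bitwise operations, which is the source of the savings mentioned above.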

(13)教师网络(13) Teacher Network

教师网络通常是指一个更加复杂的网络，具有非常好的性能和泛化能力，本申请实施例中的教师网络可以是神经网络参数(weights)和神经网络中经过非线性函数激活过的卷积运算输出(activations)为全精度(32位浮点数)，半精度(16位浮点数)，或常用整型(8bit，4bit，2bit整数)等类型数据的神经网络。The teacher network usually refers to a more complex network with very good performance and generalization ability. The teacher network in the embodiment of the present application can be a neural network whose neural network parameters (weights) and convolution operation outputs (activations) activated by nonlinear functions in the neural network are full-precision (32-bit floating point numbers), half-precision (16-bit floating point numbers), or commonly used integer types (8bit, 4bit, 2bit integers) and other types of data.

下面介绍本申请实施例提供的系统架构。The following introduces the system architecture provided by the embodiments of the present application.

参见附图1，图1为本申请实施例提供的一种系统架构100的结构示意图。如系统架构100所示，数据采集设备160用于采集训练数据，本申请实施例中训练数据包括带有标签的图像数据，其中，图像的标签可以是该图像对应的类别，或该图像内的目标对应的类别，或该图像的每个像素点对应的类别，上述类别在数学上的表示形式为一个多维向量。Referring to FIG. 1 , FIG. 1 is a schematic diagram of a system architecture 100 provided in an embodiment of the present application. As shown in the system architecture 100, the data acquisition device 160 is used to acquire training data. In the embodiment of the present application, the training data includes labeled image data, wherein the label of the image may be a category corresponding to the image, or a category corresponding to an object in the image, or a category corresponding to each pixel of the image, and the above categories are mathematically represented as a multidimensional vector.

在采集到训练数据之后，数据采集设备160将这些训练数据存入数据库130，训练设备120基于数据库130中维护的训练数据训练得到目标模型101(即为本申请实施例中的目标二值神经网络模型)。After collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 obtains the target model 101 (that is, the target binary neural network model in the embodiment of the present application) through training based on the training data maintained in the database 130.

Embodiment 1 below describes in more detail how the training device 120 obtains the target model 101 based on the training data. The target model 101 can be used to implement the image processing method provided in the embodiment of the present application, that is, the image to be processed is input into the target model 101 after relevant preprocessing to obtain the predicted value of the image to be processed. The target model 101 in the embodiment of the present application can specifically be a target binary neural network model. In the embodiment provided in the present application, the target binary neural network model is obtained by training the initial binary neural network model M₀. It should be noted that in actual applications, the training data maintained in the database 130 does not necessarily all come from the data acquisition device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model 101 entirely on the training data maintained in the database 130 and may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiment of the present application.

根据训练设备120训练得到的目标模型101可以应用于不同的系统或设备中，如应用于图1所示的执行设备110，执行设备110可以是终端，如手机终端，平板电脑，笔记本电脑，增强现实(augmented reality，AR)/虚拟现实(virtual reality，VR)，车载终端等，还可以是服务器或者云端等。在附图1中，执行设备110配置有输入/输出(input/output，I/O)接口112，用于与外部设备进行数据交互，用户可以通过客户设备140向I/O接口112输入数据，输入数据在本申请实施例中可以包括各种图像或视频数据。The target model 101 obtained by training the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1 . The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR), a vehicle terminal, etc., and can also be a server or a cloud, etc. In FIG. 1 , the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with an external device. The user can input data to the I/O interface 112 through the client device 140. The input data may include various image or video data in the embodiment of the present application.

在执行设备110对输入数据进行预处理，或者在执行设备110的计算模块111执行计算等相关的处理过程中，执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理，也可以将相应处理得到的数据、指令等存入数据存储系统150中。When the execution device 110 preprocesses the input data, or when the computing module 111 of the execution device 110 performs calculations and other related processing, the execution device 110 can call the data, code, etc. in the data storage system 150 for corresponding processing, and can also store the data, instructions, etc. obtained from the corresponding processing into the data storage system 150.

最后，I/O接口112将处理结果，如上述得到的待处理图像的预测值(即该待处理图像的类别标签，或从该待处理图像中识别出的目标，或对该待处理图像进行分割的结果)返回给客户设备140，从而提供给用户。Finally, the I/O interface 112 returns the processing results, such as the predicted value of the image to be processed obtained above (i.e., the category label of the image to be processed, or the target identified from the image to be processed, or the result of segmenting the image to be processed) to the client device 140, thereby providing it to the user.

值得说明的是，训练设备120可以针对不同的目标或称不同的任务，基于不同的训练数据生成相应的目标模型101，该相应的目标模型101即可以用于实现上述目标或完成上述任务，从而为用户提供所需的结果。It is worth noting that the training device 120 can generate a corresponding target model 101 based on different training data for different goals or different tasks, and the corresponding target model 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.

In the case shown in FIG. 1, the user can manually specify the input data through the interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112. If automatic sending by the client device 140 requires the user's authorization, the user can set the corresponding permission in the client device 140. The user can view the results output by the execution device 110 on the client device 140, and the results can be presented in specific forms such as display, sound, or action. The client device 140 can also serve as a data acquisition terminal that collects the input data fed to the I/O interface 112 and the output results returned by the I/O interface 112, as shown in the figure, as new sample data and stores them in the database 130. Alternatively, without going through the client device 140, the I/O interface 112 can directly store the input data fed to it and the output results it returns, as shown in the figure, in the database 130 as new sample data.

It is worth noting that FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 can also be placed in the execution device 110.

如图1所示，根据训练设备120训练得到目标模型101，该目标模型101在本申请实施例中可以是基于本申请实施例二值神经网络模型的训练方法训练得到的目标二值神经网络模型，具体的，本申请实施例提供的目标二值神经网络模型可以是卷积神经网络或其他功能类似的神经网络，本方案对此不做具体限定。As shown in Figure 1, the target model 101 is obtained by training with the training device 120. In the embodiment of the present application, the target model 101 can be a target binary neural network model obtained by training based on the training method of the binary neural network model in the embodiment of the present application. Specifically, the target binary neural network model provided in the embodiment of the present application can be a convolutional neural network or other neural network with similar functions, and this solution does not make any specific limitations on this.

如前文的基础概念介绍所述，卷积神经网络是一种带有卷积结构的深度神经网络，是一种深度学习(deep learning，DL)架构，深度学习架构是指通过机器学习的算法，在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构，CNN是一种前馈(feed-forward)人工神经网络，该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。As mentioned in the previous basic concept introduction, convolutional neural network is a deep neural network with convolution structure, which is a deep learning (DL) architecture. Deep learning architecture refers to multiple levels of learning at different abstract levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network, in which each neuron can respond to the image input into it.

如图2所示，卷积神经网络(CNN)200可以包括输入层210，卷积层/池化层220(其中池化层为可选的)，以及神经网络层230。As shown in FIG. 2 , a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional layer/pooling layer 220 (wherein the pooling layer is optional), and a neural network layer 230 .

卷积层/池化层220：Convolutional layer/pooling layer 220:

卷积层：Convolutional Layer:

As shown in FIG. 2, the convolution layer/pooling layer 220 may include, for example, layers 221-226. In one implementation, layer 221 is a convolution layer, layer 222 is a pooling layer, layer 223 is a convolution layer, layer 224 is a pooling layer, layer 225 is a convolution layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.

下面将以卷积层221为例，介绍一层卷积层的内部工作原理。The following will take the convolution layer 221 as an example to introduce the internal working principle of a convolution layer.

The convolution layer 221 may include many convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby extracting specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, so this dimension is determined by the number of weight matrices applied. Different weight matrices can be used to extract different features of the image: for example, one weight matrix extracts image edge information, another extracts a specific color of the image, and yet another blurs unwanted noise in the image. Because the multiple weight matrices have the same size (rows × columns), the feature maps they extract also have the same size, and these feature maps are merged to form the output of the convolution operation.
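The sliding of a weight matrix over the input image with a given stride can be sketched for a single-channel image as follows. This is a simplified illustration under stated assumptions: no padding, one channel, and a hand-written vertical-edge kernel; the names `conv2d`, `image`, and `edge_kernel` are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over a single-channel image (no padding) and
    return the resulting feature map as a list of rows."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# A 4x4 image whose right half is bright, and a kernel that responds to
# a dark-to-bright transition from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, edge_kernel, stride=1)
```

The feature map is non-zero only in the column where the edge lies. Applying several kernels of the same size to the same image and stacking the resulting feature maps gives the depth dimension of the convolution output described above.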

这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到，通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息，从而使得卷积神经网络200进行正确的预测。The weight values in these weight matrices need to be obtained through a lot of training in practical applications. The weight matrices formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.

当卷积神经网络200有多个卷积层的时候，初始的卷积层(例如221)往往提取较多的一般特征，该一般特征也可以称之为低级别的特征；随着卷积神经网络200深度的加深，越往后的卷积层(例如226)提取到的特征越来越复杂，比如高级别的语义之类的特征，语义越高的特征越适用于待解决的问题。When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, 221) often extracts more general features, which can also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.

池化层：Pooling layer:

由于常常需要减少训练参数的数量，因此卷积层之后常常需要周期性的引入池化层，在如图2中卷积层/池化层220所示例的221-226各层，可以是一层卷积层后面跟一层池化层，也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中，池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子，以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外，就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样，池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸，池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。Since it is often necessary to reduce the number of training parameters, it is often necessary to periodically introduce a pooling layer after the convolution layer. In the layers 221-226 illustrated in the convolution layer/pooling layer 220 in FIG. 2, a convolution layer may be followed by a pooling layer, or multiple convolution layers may be followed by one or more pooling layers. In the image processing process, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator to sample the input image to obtain an image of smaller size. The average pooling operator may calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value in the range within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolution layer should be related to the image size, the operator in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
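The average pooling and maximum pooling operators described above can be sketched as follows. The sketch assumes non-overlapping square windows (the stride equals the window size); `pool2d` and the sample image are illustrative names and values.

```python
def pool2d(image, size=2, mode="max"):
    """Pool a single-channel image over non-overlapping size x size windows."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    output = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        output.append(row)
    return output


image = [
    [1, 3, 2, 0],
    [5, 7, 4, 4],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]
max_pooled = pool2d(image, size=2, mode="max")
avg_pooled = pool2d(image, size=2, mode="avg")
```

Each output pixel summarizes one 2 × 2 sub-region of the input, so the 4 × 4 image is reduced to 2 × 2.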

神经网络层230：Neural Network Layer 230:

在经过卷积层/池化层220的处理后，卷积神经网络200还不足以输出所需要的输出信息。因为如前所述，卷积层/池化层220只会提取特征，并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息)，卷积神经网络200需要利用神经网络层230来生成一个或者一组所需要的类的数量的输出。因此，在神经网络层230中可以包括多层隐含层(如图2所示的231、232至23n)以及输出层240，该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到，例如该任务类型可以包括图像识别、目标检测、图像分类和图像超分辨率重建等。After being processed by the convolution layer/pooling layer 220, the convolution neural network 200 is not sufficient to output the required output information. Because as mentioned above, the convolution layer/pooling layer 220 will only extract features and reduce the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolution neural network 200 needs to use the neural network layer 230 to generate one or a group of outputs of the required number of classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 2) and an output layer 240, and the parameters contained in the multiple hidden layers can be pre-trained according to the relevant training data of the specific task type. For example, the task type may include image recognition, target detection, image classification, and image super-resolution reconstruction.

在神经网络层230中的多层隐含层之后，也就是整个卷积神经网络200的最后层为输出层240，该输出层240具有类似分类交叉熵的损失函数，具体用于计算预测误差，一旦整个卷积神经网络200的前向传播(如图2由210至240方向的传播为前向传播)完成，反向传播(如图2由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差，以减少卷积神经网络200的损失，及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。After the multiple hidden layers in the neural network layer 230, that is, the last layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (the propagation from 210 to 240 in FIG. 2 is the forward propagation) is completed, the back propagation (the propagation from 240 to 210 in FIG. 2 is the back propagation) will begin to update the weight values and biases of the aforementioned layers to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.

需要说明的是，如图2所示的卷积神经网络200仅作为一种卷积神经网络的示例，在具体的应用中，卷积神经网络还可以以其他网络模型的形式存在。It should be noted that the convolutional neural network 200 shown in FIG. 2 is only an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.

下面介绍本申请实施例提供的一种芯片硬件结构。The following introduces a chip hardware structure provided by an embodiment of the present application.

图3为本发明实施例提供的一种芯片硬件结构，该芯片包括神经网络处理器50。该芯片可以被设置在如图1所示的执行设备110中，用以完成计算模块111的计算工作。该芯片也可以被设置在如图1所示的训练设备120中，用以完成训练设备120的训练工作并输出目标模型101。如图2所示的卷积神经网络中各层的算法均可在如图3所示的芯片中得以实现。FIG3 is a chip hardware structure provided by an embodiment of the present invention, and the chip includes a neural network processor 50. The chip can be set in the execution device 110 shown in FIG1 to complete the calculation work of the calculation module 111. The chip can also be set in the training device 120 shown in FIG1 to complete the training work of the training device 120 and output the target model 101. The algorithms of each layer in the convolutional neural network shown in FIG2 can be implemented in the chip shown in FIG3.

神经网络处理器NPU 50作为协处理器挂载到主CPU(Host CPU)上，由Host CPU分配任务。NPU的核心部分为运算电路503，控制器504控制运算电路503提取存储器(权重存储器或输入存储器)中的数据并进行运算。The neural network processor NPU 50 is mounted on the host CPU as a coprocessor, and the host CPU assigns tasks. The core part of the NPU is the operation circuit 503, and the controller 504 controls the operation circuit 503 to extract data from the memory (weight memory or input memory) and perform operations.

在一些实现中，运算电路503内部包括多个处理单元(process engine,PE)。在一些实现中，运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中，运算电路503是通用的矩阵处理器。In some implementations, the operation circuit 503 includes multiple processing units (process engines, PEs) inside. In some implementations, the operation circuit 503 is a two-dimensional systolic array. The operation circuit 503 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.

举例来说，假设有输入矩阵A，权重矩阵B，输出矩阵C。运算电路从权重存储器502中取矩阵B相应的数据，并缓存在运算电路中每一个PE上。运算电路从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算，得到的矩阵的部分结果或最终结果，保存在累加器508(accumulator)中。For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the operation circuit. The operation circuit takes the matrix A data from the input memory 501 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 508 (accumulator).
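The role of the accumulator in this example can be sketched in software as follows. This is only an illustration of accumulating partial results over slices of the inner dimension; it does not model the PE array or the memories, and the function name `matmul_accumulate` and the `tile` parameter are hypothetical.

```python
def matmul_accumulate(a, b, tile=2):
    """Compute C = A x B by accumulating partial products tile by tile."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]  # plays the role of the accumulator
    for k0 in range(0, k, tile):
        k1 = min(k0 + tile, k)
        for i in range(n):
            for j in range(m):
                # Partial result for this slice of the inner dimension.
                acc[i][j] += sum(a[i][kk] * b[kk][j] for kk in range(k0, k1))
    return acc


A = [[1, 2, 3], [4, 5, 6]]    # input matrix A
B = [[1, 0], [0, 1], [2, 2]]  # weight matrix B
C = matmul_accumulate(A, B)   # [[7, 8], [16, 17]]
```

After the first tile the accumulator holds a partial result; after the last tile it holds the final output matrix C.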

向量计算单元507可以对运算电路的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。例如，向量计算单元507可以用于神经网络中非卷积/非FC层的网络计算，如池化(Pooling)，批归一化(batch normalization)，局部响应归一化(local response normalization)等。The vector calculation unit 507 can further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. For example, the vector calculation unit 507 can be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, local response normalization, etc.

在一些实现中，向量计算单元507能将经处理的输出的向量存储到统一存储器506。例如，向量计算单元507可以将非线性函数应用到运算电路503的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元507生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路503的激活输入，例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 507 can store the processed output vector in the unified memory 506. For example, the vector calculation unit 507 can apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 503, for example for use in a subsequent layer of the neural network.

统一存储器506用于存放输入数据以及输出数据。The unified memory 506 is used to store input data and output data.

权重数据直接通过存储单元访问控制器505(direct memory access controller，DMAC)将外部存储器中的输入数据搬运到输入存储器501和/或统一存储器506、将外部存储器中的权重数据存入权重存储器502，以及将统一存储器506中的数据存入外部存储器。The direct memory access controller (DMAC) 505 transfers input data from the external memory to the input memory 501 and/or the unified memory 506, stores weight data from the external memory in the weight memory 502, and stores data from the unified memory 506 in the external memory.

总线接口单元(bus interface unit，BIU)510，用于通过总线实现主CPU、DMAC和取指存储器509之间进行交互。The bus interface unit (BIU) 510 is used to implement the interaction between the main CPU, DMAC and instruction fetch memory 509 through the bus.

与控制器504连接的取指存储器(instruction fetch buffer)509，用于存储控制器504使用的指令。An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504 .

控制器504，用于调用取指存储器509中缓存的指令，实现控制该运算加速器的工作过程。The controller 504 is used to call the instructions cached in the instruction fetch buffer 509 to control the working process of the operation accelerator.

一般地，统一存储器506，输入存储器501，权重存储器502以及取指存储器509均为片上(on-chip)存储器，外部存储器为该NPU外部的存储器，该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random accessmemory，简称DDR SDRAM)、高带宽存储器(high bandwidth memory，HBM)或其他可读可写的存储器。Generally, the unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch memory 509 are all on-chip memories, and the external memory is a memory outside the NPU, which can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM) or other readable and writable memory.

其中，图2所示的卷积神经网络中各层的运算可以由运算电路503或向量计算单元507执行。Among them, the operations of each layer in the convolutional neural network shown in Figure 2 can be performed by the operation circuit 503 or the vector calculation unit 507.

上文中介绍的图1中的训练设备120能够执行本申请实施例中训练二值神经网络模型的方法的各个步骤，图1中的执行设备110能够执行本申请实施例的图像处理方法(比如，图像分类、图像分割和目标检测)的各个步骤，图2所示的神经网络模型和图3所示的芯片也可以用于执行本申请实施例的图像处理方法的各个步骤，图3所示的芯片也可以用于执行本申请实施例中训练二值神经网络模型的方法的各个步骤。The training device 120 in Figure 1 introduced above can execute the various steps of the method for training a binary neural network model in the embodiment of the present application. The execution device 110 in Figure 1 can execute the various steps of the image processing method (for example, image classification, image segmentation, and target detection) in the embodiment of the present application. The neural network model shown in Figure 2 and the chip shown in Figure 3 can also be used to execute the various steps of the image processing method in the embodiment of the present application. The chip shown in Figure 3 can also be used to execute the various steps of the method for training a binary neural network model in the embodiment of the present application.

如图4所示，图4为本申请实施例提供一种系统架构300的结构示意图。该系统架构包括本地设备301、本地设备302以及执行设备210和数据存储系统250；其中，本地设备301和本地设备302通过通信网络与执行设备210连接。As shown in Figure 4, Figure 4 is a schematic diagram of a system architecture 300 provided in an embodiment of the present application. The system architecture includes a local device 301, a local device 302, an execution device 210 and a data storage system 250; wherein the local device 301 and the local device 302 are connected to the execution device 210 via a communication network.

执行设备210可以由一个或多个服务器实现。可选的，执行设备210可以与其它计算设备配合使用，例如：数据存储器、路由器、负载均衡器等设备。执行设备210可以布置在一个物理站点上，或者分布在多个物理站点上。执行设备210可以使用数据存储系统250中的数据，或者调用数据存储系统250中的程序代码来实现本申请实施例的训练二值神经网络的方法或图像处理方法(比如，图像超分方法、图像去噪方法、图像去马赛克方法及图像去模糊方法)。The execution device 210 can be implemented by one or more servers. Optionally, the execution device 210 can be used in conjunction with other computing devices, such as data storage devices, routers, load balancers, and other devices. The execution device 210 can be arranged at a physical site, or distributed at multiple physical sites. The execution device 210 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the method for training a binary neural network or the image processing method (e.g., image super-resolution method, image denoising method, image demosaicing method, and image deblurring method) of the embodiment of the present application.

具体地，执行设备210可以执行以下过程：Specifically, the execution device 210 may perform the following process:

S1：确定知识蒸馏框架；其中，知识蒸馏框架中的教师网络为训练好的神经网络模型，知识蒸馏框架中的学生网络为初始二值神经网络模型M₀，教师网络和学生网络分别包含N层神经网络，N为正整数；S2：利用第j+1批图像和目标损失函数训练二值神经网络模型M_j，得到二值神经网络模型M_j+1；其中，二值神经网络模型M_j是基于第j批图像训练得到的，j为正整数；目标损失函数包含角度损失项，角度损失项用于描述教师网络中第i层神经网络对应的第一角度和学生网络中第i层神经网络对应的第二角度之间的差异；第一角度是基于教师网络中第i层神经网络的权重矩阵和第j+1批图像在教师网络中第i层神经网络中的输入矩阵得到的；第二角度是基于学生网络中第i层神经网络的二值权重矩阵和第j+1批图像在学生网络中第i层神经网络中的二值输入矩阵得到的；i为小于或等于N的正整数；S3：当满足预设条件时，将二值神经网络模型M_j+1作为目标二值神经网络模型；否则令j＝j+1，并重复步骤S2。S1: Determine a knowledge distillation framework, wherein the teacher network in the knowledge distillation framework is a trained neural network model, the student network in the knowledge distillation framework is an initial binary neural network model M₀, and the teacher network and the student network each include N layers of neural networks, where N is a positive integer; S2: Train the binary neural network model M_j using the (j+1)-th batch of images and a target loss function to obtain the binary neural network model M_j+1, wherein the binary neural network model M_j is obtained by training on the j-th batch of images, and j is a positive integer; the target loss function includes an angle loss term, which describes the difference between a first angle corresponding to the i-th layer of the neural network in the teacher network and a second angle corresponding to the i-th layer of the neural network in the student network; the first angle is obtained from the weight matrix of the i-th layer of the neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer of the neural network in the teacher network; the second angle is obtained from the binary weight matrix of the i-th layer of the neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer of the neural network in the student network; i is a positive integer less than or equal to N; S3: When a preset condition is met, use the binary neural network model M_j+1 as the target binary neural network model; otherwise, set j = j+1 and repeat step S2.
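The control flow of steps S2 and S3 can be sketched as a simple loop. This is a minimal sketch only: the names `train_binary_model`, `train_step` and `is_done` are hypothetical, the model is represented by a plain number in the usage lines, and the preset condition is passed in as a callable because it is not specified at this point in the text.

```python
def train_binary_model(student, batches, train_step, is_done):
    """Run S2 on successive batches until the preset condition of S3 holds.

    student    -- the initial binary model M_0 from step S1
    batches    -- iterable of image batches
    train_step -- hypothetical callable: (M_j, batch j+1) -> M_{j+1}
    is_done    -- hypothetical callable: (M_{j+1}, number of steps) -> bool
    """
    for j, batch in enumerate(batches):
        student = train_step(student, batch)  # S2: M_j -> M_{j+1}
        if is_done(student, j + 1):           # S3: preset condition met
            break
    return student


# Toy usage: the "model" is a number, each step adds the batch value,
# and training stops after three steps.
final_model = train_binary_model(
    0, [1, 2, 3, 4],
    train_step=lambda model, batch: model + batch,
    is_done=lambda model, steps: steps >= 3,
)
```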

通过上述执行设备210能够训练得到一个目标二值神经网络模型，目标二值神经网络模型可以用于图像处理、语音处理及自然语言处理等，例如，该目标二值神经网络模型可以用于实现本申请实施例中的图像分类、目标检测和图像分割方法。Through the above-mentioned execution device 210, a target binary neural network model can be trained and obtained, and the target binary neural network model can be used for image processing, speech processing, natural language processing, etc. For example, the target binary neural network model can be used to implement the image classification, target detection and image segmentation methods in the embodiments of the present application.

或者，通过上述过程执行设备210能够搭建成一个图像处理装置，该图像处理装置可以用于图像处理(例如，可以用于实现本申请实施例中的图像分类、目标检测和图像分割方法)。Alternatively, through the above process, the execution device 210 can be built into an image processing apparatus, which can be used for image processing (for example, to implement the image classification, target detection and image segmentation methods in the embodiments of the present application).

用户可以操作各自的用户设备(例如本地设备301和本地设备302)与执行设备210进行交互。每个本地设备可以表示任何计算设备，例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。Users can operate their respective user devices (e.g., local device 301 and local device 302) to interact with execution device 210. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other type of cellular phone, a media consumption device, a wearable device, a set-top box, a game console, etc.

每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备210进行交互，通信网络可以是广域网、局域网、点对点连接等方式，或它们的任意组合。The local device of each user can interact with the execution device 210 through a communication network of any communication mechanism/communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.

在一种实现方式中，本地设备301、本地设备302从执行设备210获取到神经网络的相关参数，将神经网络部署在本地设备301、本地设备302上，利用该神经网络对待处理图像进行图像处理，得到待处理图像的处理结果。In one implementation, the local device 301 and the local device 302 obtain relevant parameters of the neural network from the execution device 210, deploy the neural network on the local device 301 and the local device 302, and use the neural network to perform image processing on the image to be processed to obtain a processing result of the image to be processed.

在另一种实现中，执行设备210上可以直接部署神经网络，执行设备210通过从本地设备301和本地设备302获取待处理图像，并利用该神经网络对待处理图像进行图像处理，得到待处理图像的处理结果。In another implementation, the neural network can be directly deployed on the execution device 210. The execution device 210 obtains the image to be processed from the local device 301 and the local device 302, and uses the neural network to perform image processing on the image to be processed to obtain a processing result of the image to be processed.

在一种实现方式中，本地设备301、本地设备302从执行设备210获取到图像处理装置的相关参数，将图像处理装置部署在本地设备301、本地设备302上，利用该图像处理装置对待处理图像进行图像处理，得到待处理图像的处理结果。In one implementation, the local device 301 and the local device 302 obtain relevant parameters of the image processing apparatus from the execution device 210, deploy the image processing apparatus on the local device 301 and the local device 302, and use the image processing apparatus to perform image processing on the image to be processed to obtain a processing result of the image to be processed.

在另一种实现中，执行设备210上可以直接部署图像处理装置，执行设备210通过从本地设备301和本地设备302获取待处理图像，并利用该图像处理装置对待处理图像进行图像处理，得到待处理图像的处理结果。In another implementation, the image processing apparatus may be directly deployed on the execution device 210. The execution device 210 obtains the image to be processed from the local device 301 and the local device 302, and uses the image processing apparatus to perform image processing on the image to be processed to obtain a processing result of the image to be processed.

也就是说，上述执行设备210也可以为云端设备，此时，执行设备210可以部署在云端；或者，上述执行设备210也可以为终端设备，此时，执行设备210可以部署在用户终端侧，本申请实施例对此并不限定。That is to say, the above-mentioned execution device 210 can also be a cloud device, in which case the execution device 210 can be deployed in the cloud; or, the above-mentioned execution device 210 can also be a terminal device, in which case the execution device 210 can be deployed on the user terminal side, which is not limited in the embodiments of the present application.

下面结合附图对本申请实施例的训练二值神经网络模型的方法及图像处理方法(例如，图像处理方法可以包括图像分类、目标检测和图像分割)进行详细的介绍。The following is a detailed introduction to the method for training a binary neural network model and an image processing method (for example, the image processing method may include image classification, target detection, and image segmentation) according to an embodiment of the present application in conjunction with the accompanying drawings.

请参见图6，图6为本申请实施例中一种知识蒸馏框架600的结构示意图。该知识蒸馏框架即是本申请实施例所使用的知识蒸馏框架。知识蒸馏框架600包含教师网络和学生网络，教师网络为训练好的神经网络模型，学生网络为初始二值神经网络模型M₀；教师网络和学生网络分别包含N层神经网络，N为正整数。知识蒸馏框架600在本申请实施例中也称为分层搜索(layer-wise search，LWS-Det)架构。Please refer to Figure 6, which is a schematic diagram of the structure of a knowledge distillation framework 600 in an embodiment of the present application. This is the knowledge distillation framework used in the embodiments of the present application. The knowledge distillation framework 600 includes a teacher network and a student network; the teacher network is a trained neural network model, and the student network is an initial binary neural network model M₀; the teacher network and the student network each include N layers of neural networks, where N is a positive integer. The knowledge distillation framework 600 is also referred to as a layer-wise search (LWS-Det) architecture in the embodiments of the present application.

应当注意，图6仅示出了教师网络和学生网络中第i层神经网络的部分结构，其它层神经网络的结构与第i层对应相同，i为大于1的正整数。It should be noted that FIG6 only shows the partial structure of the i-th layer neural network in the teacher network and the student network, and the structures of the other layers of the neural network are the same as those of the i-th layer, where i is a positive integer greater than 1.

在教师网络中，第i-1层神经网络的输出结果输入到第i层神经网络，经过批标准化(Batch Normalization，BN)层和激活函数(图6中未示出)等处理后，得到第i层的输入矩阵a_i-1；然后对输入矩阵a_i-1和权重矩阵w_i进行卷积运算，得到第一卷积输出结果；最后经过带参数的线性整流函数(Parametric Rectified Linear Unit，PReLU)和BN层等处理，得到第i层神经网络的输出结果。在学生网络中，第i-1层神经网络的输出结果输入到第i层神经网络，经过批标准化(Batch Normalization，BN)层、激活函数和二值化(图6中未示出)等处理后，得到第i层的二值输入矩阵；基于第一参考权重矩阵、第二参考权重矩阵、第一概率矩阵和第二概率矩阵确定第i层神经网络的二值权重矩阵，二值权重矩阵的确定过程会在后文图7实施例中进行详细描述；然后对二值权重矩阵和二值输入矩阵进行卷积运算，得到参考特征矩阵；最后，通过权重缩放尺度因子α_i、PReLU和BN层等处理，得到第i层神经网络的输出结果。In the teacher network, the output of the (i-1)-th layer of the neural network is input to the i-th layer. After processing by a batch normalization (BN) layer and an activation function (not shown in FIG. 6), the input matrix a_i-1 of the i-th layer is obtained. The input matrix a_i-1 and the weight matrix w_i are then convolved to obtain the first convolution output result. Finally, the output of the i-th layer is obtained after processing by a parametric rectified linear unit (PReLU) and a BN layer. In the student network, the output of the (i-1)-th layer of the neural network is input to the i-th layer. After processing by a batch normalization (BN) layer, an activation function and binarization (not shown in FIG. 6), the binary input matrix of the i-th layer is obtained. The binary weight matrix of the i-th layer is determined based on the first reference weight matrix, the second reference weight matrix, the first probability matrix and the second probability matrix; this determination process is described in detail later in the embodiment of FIG. 7. The binary weight matrix and the binary input matrix are then convolved to obtain a reference feature matrix. Finally, the output of the i-th layer is obtained after processing by the weight scaling factor α_i, the PReLU and a BN layer.
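The student-side computation of one layer can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: binarization is taken to be the sign function, the convolution is a 1-D sliding dot product, the weight scaling factor is a single scalar, and the PReLU and BN stages are omitted. The binary weights are passed in directly, since their determination is described elsewhere. All function names are hypothetical.

```python
def binarize(values):
    """Map each real value to +1 or -1 (sign binarization, assumed here)."""
    return [1 if v >= 0 else -1 for v in values]


def conv1d_valid(signal, kernel):
    """Sliding dot product without padding, as used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def student_layer(a_prev, w_bin, alpha):
    """Binarize the input, convolve with binary weights, then rescale."""
    a_bin = binarize(a_prev)                    # binary input matrix
    reference = conv1d_valid(a_bin, w_bin)      # reference feature matrix
    return [alpha * r for r in reference]       # apply the scaling factor


layer_out = student_layer([0.5, -1.2, 0.3, 2.0], [1, -1], 0.5)
# binarized input is [1, -1, 1, 1]; layer_out is [1.0, -1.0, 0.0]
```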

应当理解，图6中的PReLU也可用其它激活函数代替，本申请对此不限定。第i层神经网络中输入矩阵a_i-1和权重矩阵w_i进行卷积运算的具体过程可以参见图5，此处不再赘述。It should be understood that the PReLU in FIG6 can also be replaced by other activation functions, and the present application does not limit this. The specific process of performing convolution operation on the input matrix a _i-1 and the weight matrix w _i in the i-th layer neural network can be seen in FIG5 , which will not be repeated here.

请参见图7，图7为本申请实施例中一种二值神经网络模型的训练方法700流程示意图。如图7所示，方法700包括步骤S1、步骤S2和步骤S3。Please refer to Figure 7, which is a flow chart of a training method 700 for a binary neural network model in an embodiment of the present application. As shown in Figure 7, the method 700 includes step S1, step S2 and step S3.

在一些示例中，该方法700可以由图1中的执行设备110、图3所示的芯片以及图4中的执行设备210等执行。In some examples, the method 700 may be executed by the execution device 110 in FIG. 1 , the chip shown in FIG. 3 , the execution device 210 in FIG. 4 , and the like.

步骤S1：确定知识蒸馏框架；其中，知识蒸馏框架中的教师网络为训练好的神经网络模型，知识蒸馏框架中的学生网络为初始二值神经网络模型M₀，教师网络和学生网络分别包含N层神经网络，N为正整数。Step S1: determine the knowledge distillation framework; wherein the teacher network in the knowledge distillation framework is a trained neural network model, the student network in the knowledge distillation framework is an initial binary neural network model M ₀ , the teacher network and the student network each contain N layers of neural networks, N is a positive integer.

其中，教师网络中训练好的神经网络模型可以是神经网络参数(weights)和经过非线性函数激活过的卷积运算输出(activations)为实数类型的神经网络模型，该实数类型包括但不限于全精度(32位浮点数)，半精度(16位浮点数)，或整型(8bit，4bit，2bit整数)。初始二值神经网络模型M₀为将模型中参数进行初始化后得到的。The trained neural network model in the teacher network can be a neural network model whose parameters (weights) and convolution outputs activated by a nonlinear function (activations) are of a real number type, including but not limited to full precision (32-bit floating point), half precision (16-bit floating point), or integer (8-bit, 4-bit or 2-bit integers). The initial binary neural network model M₀ is obtained by initializing the parameters of the model.

应当理解，本申请实施例可以应用于所有的二值神经网络的训练过程，本申请对此不限定。It should be understood that the embodiments of the present application can be applied to all binary neural network training processes, and the present application is not limited to this.

步骤S2：利用第j+1批图像和目标损失函数训练二值神经网络模型M_j，得到二值神经网络模型M_j+1；其中，二值神经网络模型M_j是基于第j批图像训练得到的，j为正整数；目标损失函数包含角度损失项，角度损失项用于描述教师网络中第i层神经网络对应的第一角度和学生网络中第i层神经网络对应的第二角度之间的差异；第一角度是基于教师网络中第i层神经网络的权重矩阵和第j+1批图像在教师网络中第i层神经网络中的输入矩阵得到的；第二角度是基于学生网络中第i层神经网络的二值权重矩阵和第j+1批图像在学生网络中第i层神经网络中的二值输入矩阵得到的；i为小于或等于N的正整数。Step S2: Use the j+1th batch of images and the target loss function to train the binary neural network model M _j to obtain the binary neural network model M _j+1 ; wherein the binary neural network model M _j is obtained based on the jth batch of images, and j is a positive integer; the target loss function includes an angle loss term, which is used to describe the difference between a first angle corresponding to the i-th layer of the neural network in the teacher network and a second angle corresponding to the i-th layer of the neural network in the student network; the first angle is obtained based on the weight matrix of the i-th layer of the neural network in the teacher network and the input matrix of the j+1th batch of images in the i-th layer of the neural network in the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer of the neural network in the student network and the binary input matrix of the j+1th batch of images in the i-th layer of the neural network in the student network; i is a positive integer less than or equal to N.

具体地，在利用第j+1批图像进行第j+1次训练的过程中，将第j+1批图像输入到上述知识蒸馏框架，即同步输入教师网络和学生网络中，利用教师网络来指导学生网络的训练。其中，第j+1批图像包括至少一张图像。Specifically, in the process of using the j+1th batch of images for the j+1th training, the j+1th batch of images is input into the above-mentioned knowledge distillation framework, that is, simultaneously input into the teacher network and the student network, and the teacher network is used to guide the training of the student network. The j+1th batch of images includes at least one image.

对于第j+1批图像中的每张图像，计算每张图像在教师网络中第i层神经网络中对应的输入矩阵和该层神经网络的权重矩阵之间的角度，即第一角度；同时计算每张图像在学生网络中第i层神经网络中对应的二值输入矩阵和该层神经网络的二值权重矩阵之间的角度，即第二角度；基于公式(2)计算每张图像的在第i层神经网络中的角度损失。然后基于第j+1批图像中每张图像在第i层神经网络中的角度损失计算角度损失平均值，该角度损失平均值即为第j+1批图像在第i层神经网络中的角度损失项。最后，对第j+1批图像在每层神经网络中的角度损失项进行累加，得到在第j+1次训练中的目标函数中角度损失项的损失值。For each image in the j+1th batch of images, the angle between the input matrix corresponding to each image in the i-th layer of the neural network in the teacher network and the weight matrix of the neural network in this layer is calculated, that is, the first angle; at the same time, the angle between the binary input matrix corresponding to each image in the i-th layer of the neural network in the student network and the binary weight matrix of the neural network in this layer is calculated, that is, the second angle; the angle loss of each image in the i-th layer of the neural network is calculated based on formula (2). Then, based on the angle loss of each image in the j+1th batch of images in the i-th layer of the neural network, the average angle loss is calculated, and the average angle loss is the angle loss term of the j+1th batch of images in the i-th layer of the neural network. Finally, the angle loss terms of the j+1th batch of images in each layer of the neural network are accumulated to obtain the loss value of the angle loss term in the objective function in the j+1th training.

其中，公式(2)中的各项分别为：每张图像在第i层神经网络中的角度损失项；cosθ_i，即每张图像对应的第一角度；每张图像对应的第二角度；a_i-1，即每张图像在教师网络中第i层神经网络中的输入矩阵；w_i，即教师网络中第i层神经网络中的权重矩阵；每张图像在学生网络中第i层神经网络中的二值输入矩阵；学生网络中第i层神经网络中的二值权重矩阵；以及张量积(tensor product)运算。The terms in formula (2) are: the angle loss term of each image at the i-th layer of the neural network; cosθ_i, the first angle corresponding to each image; the second angle corresponding to each image; a_i-1, the input matrix of each image at the i-th layer of the neural network in the teacher network; w_i, the weight matrix of the i-th layer of the neural network in the teacher network; the binary input matrix of each image at the i-th layer of the neural network in the student network; the binary weight matrix of the i-th layer of the neural network in the student network; and the tensor product operation.
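The per-image, per-batch and per-layer aggregation of the angle loss described above can be sketched as follows. Formula (2) itself is not reproduced in this text, so the sketch makes assumptions that should be read as such: each angle is represented by the cosine between the flattened input and the flattened weights (taken here to have equal length), and the per-image loss is the squared difference of the two cosines. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two flattened matrices of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def angle_loss_layer(batch_a, w, batch_a_bin, w_bin):
    """Mean over the batch of the per-image angle loss at one layer.

    The squared difference of cosines is an assumed form of the loss.
    """
    losses = [
        (cosine(a, w) - cosine(a_bin, w_bin)) ** 2
        for a, a_bin in zip(batch_a, batch_a_bin)
    ]
    return sum(losses) / len(losses)


def angle_loss(layers):
    """Accumulate the batch-averaged angle loss over all layers."""
    return sum(angle_loss_layer(*layer) for layer in layers)


# One layer, two images: teacher inputs/weights and their binary counterparts.
layer = ([[3.0, 4.0], [4.0, -3.0]], [3.0, 4.0],
         [[1, 1], [1, -1]], [1, -1])
total = angle_loss([layer, layer])
```

In this toy case the teacher cosines are 1 and 0 while the student cosines are 0 and approximately 1, so each image contributes a loss close to 1 and each layer contributes a mean close to 1.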

可以看出，在本申请实施例中，采用训练好的教师网络来指导学生网络的训练，且在目标损失函数中设计角度损失项来更新学生网络中的参数，一方面可以使得学生网络对输入样本的特征提取结果与教师网络对输入样本的特征提取结果接近，另一方面，使得学生网络中二值权重矩阵和二值输入矩阵间的角度与教师网络中权重矩阵和输入矩阵间的角度相接近。综上，相较于现有技术中二值神经网络模型的训练过程未考虑量化后的角度损失，本申请可以通过引入知识蒸馏框架和损失函数中的角度损失项使得训练后学生网络的性能最大程度上接近教师网络的性能，从而提升本申请实施例中训练得到的目标二值神经网络模型的预测精度。It can be seen that in the embodiment of the present application, a trained teacher network is used to guide the training of the student network, and an angle loss term is designed in the target loss function to update the parameters in the student network. On the one hand, the feature extraction results of the student network for the input samples can be made close to the feature extraction results of the teacher network for the input samples. On the other hand, the angle between the binary weight matrix and the binary input matrix in the student network is made close to the angle between the weight matrix and the input matrix in the teacher network. In summary, compared with the prior art in which the training process of the binary neural network model does not consider the quantized angle loss, the present application can introduce the knowledge distillation framework and the angle loss term in the loss function to make the performance of the trained student network as close to the performance of the teacher network as possible, thereby improving the prediction accuracy of the target binary neural network model trained in the embodiment of the present application.

在一种可行的实施方式中，目标损失函数还包括卷积结果损失项；其中，卷积结果损失项用于描述教师网络中第i层神经网络的第一卷积输出结果和学生网络中第i层神经网络的第二卷积输出结果之间的差异；第一卷积输出结果是基于教师网络中第i层神经网络的权重矩阵和第j+1批图像在教师网络中第i层神经网络的输入矩阵得到的；第二卷积输出结果是基于学生网络中第i层神经网络对应的二值权重矩阵和对应的权重缩放尺度因子，以及第j+1批图像在学生网络中第i层神经网络中的二值输入矩阵得到的。In a feasible implementation, the target loss function also includes a convolution result loss term; wherein the convolution result loss term is used to describe the difference between the first convolution output result of the i-th layer neural network in the teacher network and the second convolution output result of the i-th layer neural network in the student network; the first convolution output result is based on the weight matrix of the i-th layer neural network in the teacher network and the input matrix of the i-th layer neural network in the teacher network for the j+1-th batch of images; the second convolution output result is based on the binary weight matrix and the corresponding weight scaling factor corresponding to the i-th layer neural network in the student network, as well as the binary input matrix of the j+1-th batch of images in the i-th layer neural network in the student network.

具体地，对于第j+1批图像中的每张图像，其在第i层神经网络中对应的卷积结果损失可以采用公式(3)计算得到：即对每张图像在教师网络中第i层神经网络的输入矩阵和教师网络中第i层神经网络的权重矩阵进行卷积运算，得到每张图像对应的第一卷积输出结果；对每张图像在学生网络中第i层神经网络的二值输入矩阵和学生网络中第i层神经网络的二值权重矩阵进行卷积运算，得到参考特征矩阵，利用学生网络中第i层神经网络的权重缩放尺度因子对该参考特征矩阵进行逐元素乘，得到每张图像对应的第二卷积输出结果。对每张图像在第i层神经网络中的第一卷积输出结果和第二卷积输出结果相减后计算二范数，得到每张图像在第i层神经网络中的卷积结果损失值。然后基于第j+1批图像中每张图像的卷积结果损失值计算卷积结果损失平均值，该卷积结果损失平均值即为第j+1批图像在第i层神经网络中的卷积结果损失；最后，将第j+1批图像在每层神经网络中对应的卷积结果损失进行累加，得到第j+1次训练中的目标函数中卷积结果损失项的损失值。Specifically, for each image in the (j+1)-th batch of images, the convolution result loss at the i-th layer of the neural network can be calculated using formula (3). The input matrix of each image at the i-th layer of the teacher network is convolved with the weight matrix of the i-th layer of the teacher network to obtain the first convolution output result for that image. The binary input matrix of each image at the i-th layer of the student network is convolved with the binary weight matrix of the i-th layer of the student network to obtain a reference feature matrix, and the reference feature matrix is multiplied element-wise by the weight scaling factor of the i-th layer of the student network to obtain the second convolution output result for that image. The second convolution output result is subtracted from the first, and the L2 norm of the difference is the convolution result loss value of that image at the i-th layer. The convolution result loss values of all images in the (j+1)-th batch are then averaged, and this average is the convolution result loss of the (j+1)-th batch at the i-th layer. Finally, the convolution result losses of the (j+1)-th batch at every layer are accumulated to obtain the loss value of the convolution result loss term of the objective function in the (j+1)-th training.

In formula (3), the resulting quantity is the convolution result loss of each image at the i-th layer of the neural network; E_i is the function that computes the convolution result loss of each image at the i-th layer; α_i is the weight scaling factor of the i-th layer; ⊙ is the Hadamard product, that is, multiplication of the elements at corresponding positions of two matrices; ° denotes element-wise multiplication. For the physical meaning of the remaining parameters in formula (3), refer to the explanation of formula (2), which is not repeated here.
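The calculation described for formula (3) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the helper names (`conv2d_valid`, `l2_norm_of_difference`, `conv_result_loss_layer`), the single-channel "valid" convolution without kernel flipping, and the scalar scaling factor `alpha` are all choices made here for brevity.

```python
import math


def conv2d_valid(x, k):
    """Single-channel 'valid' cross-correlation of matrix x with kernel k (assumed form)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def l2_norm_of_difference(a, b):
    """2-norm of (a - b), treating both matrices as flat vectors."""
    return math.sqrt(
        sum((p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    )


def conv_result_loss_layer(teacher_inputs, teacher_w, student_bin_inputs, student_bin_w, alpha):
    """Batch average of ||conv(a_T, w_T) - alpha * conv(a_S_bin, w_S_bin)||_2 for one layer."""
    losses = []
    for a_t, a_s in zip(teacher_inputs, student_bin_inputs):
        first = conv2d_valid(a_t, teacher_w)                      # first convolution output
        reference = conv2d_valid(a_s, student_bin_w)              # reference feature matrix
        second = [[alpha * v for v in row] for row in reference]  # second convolution output
        losses.append(l2_norm_of_difference(first, second))
    return sum(losses) / len(losses)


# Hypothetical one-image batch with a 1x1 kernel.
example_loss = conv_result_loss_layer(
    [[[1.0, 2.0], [3.0, 4.0]]], [[0.5]], [[[1, 1], [1, 1]]], [[1]], 1.0
)
```

Summing the value returned for each layer gives the convolution result loss term for one training iteration, as described above.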

Specifically, in the (j+1)-th training iteration, the weight loss of the i-th layer of the neural network can be calculated using formula (4). The binary weight matrix of the i-th layer of the student network is multiplied element-wise by the weight scaling factor of that layer to obtain a first weight matrix. The weight matrix of the i-th layer of the teacher network is then subtracted from the first weight matrix, and the 2-norm of the difference is the weight loss of the i-th layer. The weight loss of every layer in the (j+1)-th training iteration is calculated in this way, and the per-layer weight losses are accumulated to obtain the loss value of the weight loss term of the target loss function in the (j+1)-th training iteration.

In formula (4), the resulting quantity is the weight loss of the i-th layer of the neural network in the (j+1)-th training iteration. For the physical meaning of the remaining parameters, refer to the explanations of formulas (2) and (3), which are not repeated here.
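The weight loss described for formula (4) can be illustrated with a short Python sketch. The function names and the scalar scaling factor `alpha` are assumptions introduced here for illustration; the application only states that the scaled binary weights are compared with the teacher weights using the 2-norm and that the per-layer values are accumulated.

```python
import math


def weight_loss_layer(teacher_w, student_bin_w, alpha):
    """2-norm of (alpha * binary weights - teacher weights) for one layer."""
    squared = 0.0
    for row_t, row_b in zip(teacher_w, student_bin_w):
        for w_t, w_b in zip(row_t, row_b):
            squared += (alpha * w_b - w_t) ** 2
    return math.sqrt(squared)


def weight_loss_term(teacher_ws, student_bin_ws, alphas):
    """Accumulate the per-layer weight losses over all layers."""
    return sum(
        weight_loss_layer(t, b, a)
        for t, b, a in zip(teacher_ws, student_bin_ws, alphas)
    )


# Hypothetical 2x2 layer: the teacher weights equal 0.5 times the binary weights.
teacher = [[0.5, -0.5], [0.5, 0.5]]
binary = [[1, -1], [1, 1]]
loss_matched = weight_loss_layer(teacher, binary, 0.5)     # scaling factor fits exactly
loss_mismatched = weight_loss_layer(teacher, binary, 1.0)  # scaling factor too large
```

In this example a well-chosen scaling factor removes the magnitude mismatch entirely, while a poorly chosen one leaves a residual loss.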

It can be seen that the embodiments of the present application can also introduce into the target loss function a weight loss term that characterizes the difference between the weight matrix in the teacher network and the binary weight matrix in the student network, and train the student network with it together with the convolution result loss term and the angle loss term. Training the student network with these three performance measures in the target loss function improves the performance of the resulting target neural network model as far as possible, bringing it as close as possible to the performance of the teacher neural network.

Specifically, the target loss function also includes a detection loss term, which is obtained from the predicted values of the (j+1)-th batch of images and the labels of the (j+1)-th batch of images. The detection loss term is calculated as follows: the difference between the predicted value and the label is calculated for each image in the (j+1)-th batch, and the differences are averaged to obtain the loss value of the detection loss term of the target loss function in the (j+1)-th training iteration. It should be understood that the target loss function may include one or more of the angle loss term, the convolution result loss term, the weight loss term and the detection loss term. The loss function value of the (j+1)-th training iteration is calculated from the target loss function; the smaller this value, the closer the performance of the student network is to that of the teacher network. The parameters of each layer of the binary neural network model M_j are updated in the direction that decreases the target loss function value, yielding the binary neural network model M_{j+1}.

Further, optionally, when the parameters of each layer of the binary neural network model M_j are updated, they may be updated layer by layer in order from the N-th layer to the 1st layer, which is not limited in the present application.

Specifically, during forward propagation through the i-th layer in the (j+1)-th training iteration, the binary weight matrix of the i-th layer is obtained from the reference weight matrix and the probability matrix of that layer; this binary weight matrix is the model parameter of the i-th layer of the student network. For each image in the (j+1)-th batch, when forward propagation reaches the N-th layer, the second convolution output result of the image is obtained from the binary weight matrix of the N-th layer and the binary input matrix of the image at the N-th layer. The second convolution output result of each image is then passed through structures such as the activation function, the BN layer and the fully connected Softmax layer to obtain the predicted value of the binary neural network model M_j for each image in the (j+1)-th batch.

It should be understood that the activation function may be a rectified linear unit (ReLU), PReLU or another suitable activation function, which is not limited in the present application.

Optionally, in the (j+1)-th training iteration, a Differentiable Binarization Search (DBS) method may be used to search the search space O for the binary weight matrix of the i-th layer of the student network, where the search space may consist of the first reference weight matrix and the second reference weight matrix of the i-th layer of the student network. Further, optionally, every element of the first reference weight matrix and of the second reference weight matrix is either 1 or -1: when all elements of the first reference weight matrix are 1, all elements of the second reference weight matrix are -1, and when all elements of the first reference weight matrix are -1, all elements of the second reference weight matrix are 1. The first reference weight matrix corresponds to the first probability matrix, and the second reference weight matrix corresponds to the second probability matrix. It should be understood that the first reference weight matrix, the second reference weight matrix, the first probability matrix and the second probability matrix all have the same width and height.

Specifically, the element at any given position of the binary weight matrix of the i-th layer of the student network is determined from two values: the first probability value, which is the entry of the first probability matrix corresponding to the element at that position in the first reference weight matrix, and the second probability value, which is the entry of the second probability matrix corresponding to the element at that position in the second reference weight matrix. Further, optionally, the reference weight matrix associated with the larger of the first and second probability values is selected, and its element at the given position is used as the element of the binary weight matrix at that position.

For example, the first reference weight matrix and the second reference weight matrix of the i-th layer may be the two reference weight matrices shown in FIG. 6, and the first probability matrix and the second probability matrix may be the two probability matrices shown in FIG. 6. When the value of an entry of the first probability matrix is greater than the value of the entry at the same position in the second probability matrix, the element at the corresponding position in the first reference weight matrix, i.e. -1, is determined as the element in the first row and first column of the binary weight matrix. Every element of the binary weight matrix is determined in turn according to this rule.
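The selection rule above can be written as a few lines of Python. This is a sketch only: the function name `select_binary_weights`, the example matrices, and the tie-breaking choice (ties go to the first reference weight matrix) are assumptions made here, since the text does not specify how equal probabilities are handled.

```python
def select_binary_weights(ref1, ref2, prob1, prob2):
    """Pick, per position, the reference element whose probability value is larger.

    ref1/ref2 are the first/second reference weight matrices (all 1 or all -1),
    prob1/prob2 are the matching probability matrices of the same shape.
    Ties are resolved in favour of ref1 (an assumption of this sketch).
    """
    return [
        [r1 if p1 >= p2 else r2 for r1, r2, p1, p2 in zip(row_r1, row_r2, row_p1, row_p2)]
        for row_r1, row_r2, row_p1, row_p2 in zip(ref1, ref2, prob1, prob2)
    ]


# Hypothetical 2x2 example: first reference matrix all -1, second all 1.
ref_first = [[-1, -1], [-1, -1]]
ref_second = [[1, 1], [1, 1]]
prob_first = [[0.7, 0.2], [0.5, 0.9]]
prob_second = [[0.3, 0.8], [0.5, 0.1]]

binary_weights = select_binary_weights(ref_first, ref_second, prob_first, prob_second)
```

Because the binary weights are derived from the probability matrices on every forward pass, changing a probability entry past its counterpart flips the corresponding weight between -1 and 1.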

Optionally, before the binary weight matrix of the i-th layer is determined, the first probability matrix and the second probability matrix may be normalized according to formula (5); the binary weight matrix of the i-th layer is then determined according to formula (6).

In formulas (5) and (6): the result of formula (5) is the normalized probability matrix; the operations satisfy o_k, o′_k ∈ O; s.t. means "subject to"; the probability symbol denotes the first probability value and the second probability value corresponding to the l-th weight of the binary weight matrix of the i-th layer for operation o_k; the max operation takes the larger of the first probability value and the second probability value; and the result of formula (6) is the element at the l-th position of the binary weight matrix of the i-th layer.
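A possible reading of the normalize-then-select procedure is sketched below. Formula (5) is not reproduced in this text, so the softmax over the two candidate operations used here is an assumption, chosen because it is a common normalization over a set of candidate operations; the function names are likewise hypothetical.

```python
import math


def normalize_pair(p1, p2):
    """Softmax over the two candidate probability values (assumed form of formula (5))."""
    m = max(p1, p2)                      # subtract the maximum for numerical stability
    e1, e2 = math.exp(p1 - m), math.exp(p2 - m)
    total = e1 + e2
    return e1 / total, e2 / total


def binary_element(ref1_elem, ref2_elem, p1, p2):
    """Return the reference element with the larger normalized probability (formula (6))."""
    n1, n2 = normalize_pair(p1, p2)
    return ref1_elem if n1 >= n2 else ref2_elem


# Hypothetical unnormalized values for one weight position.
n_first, n_second = normalize_pair(2.0, -1.0)
chosen = binary_element(1, -1, 2.0, -1.0)
```

Any order-preserving normalization leaves the selected element unchanged; what the normalization adds is that the two values for each position sum to one and can be read as probabilities.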

It can be seen that, in the embodiments of the present application, the reference weight matrix of each layer of the student network comprises a first reference weight matrix and a second reference weight matrix, and the probability matrix of each layer comprises a first probability matrix and a second probability matrix, where the first probability matrix corresponds to the first reference weight matrix and the second probability matrix corresponds to the second reference weight matrix. Based on the probability values of the elements at the same position in the first and second reference weight matrices, the element at that position in either the first or the second reference weight matrix is selected as the element at that position in the binary weight matrix. This rule yields the binary weight matrix of each layer in a single training iteration. The second convolution output result of each layer and the target loss function value of each training iteration described in the above embodiments are then calculated from this binary weight matrix, which ensures that training proceeds correctly and leads to the optimal binary neural network model.

Specifically, as shown in FIG. 6, at the i-th layer of the student network, for each image in the (j+1)-th batch, the input matrix of the image at the i-th layer undergoes operations such as activation, normalization and binarization to obtain the binary input matrix of the image at the i-th layer. The binary input matrix of each image is then convolved with the binary weight matrix of the i-th layer to obtain the reference feature matrix of the image. Finally, the reference feature matrix of each image is multiplied element-wise by the weight scaling factor α_i of the i-th layer to obtain the second convolution output result of the image.

In a feasible implementation, the above parameters include at least one of the probability matrix or the weight scaling factor.

Specifically, in the (j+1)-th training iteration, after the loss function value of that iteration has been calculated as described in the above embodiments, at least one of the probability matrix or the weight scaling factor of each layer of the student network may be adjusted so that the target loss function value gradually decreases.

It can be seen that, in the embodiments of the present application, back propagation in each training iteration updates at least one of the probability matrix and the weight scaling factor of each layer. As a result, the binary weight matrix of each layer is updated during forward propagation in the next training iteration. The image predicted values and the loss function value output by the model in that iteration are then obtained from the updated binary weight matrix and/or weight scaling factor, and the model parameters are further adjusted based on the loss function value, so that the resulting student network performs close to the teacher network.

Optionally, when the (j+1)-th batch contains a single image, the target loss function can be expressed as shown in formula (7):

In formula (7), L is the target loss function; L^GT is the detection result loss term; the other loss terms are the angle loss term, the convolution result loss term and the weight loss term; λL^Lim is the fine texture loss term, which is used to limit auxiliary learning of the detection head; λ, μ and γ are hyperparameters of the model, which may optionally be set to 0.01, 0.01 and 0.0001 respectively, which is not limited in the present application.

Step S3: When a preset condition is met, the binary neural network model M_{j+1} is used as the target binary neural network model; otherwise, let j = j+1 and repeat step S2.

Optionally, after the (j+1)-th training iteration ends, the binary neural network model M_{j+1} is used as the target binary neural network model when any one of the following preset conditions is met:

Preset condition 1: The number of training images used in the first j+1 training iterations reaches a preset number. The preset number may be the total number of images in the training set or any other value, which is not limited in the present application.

Preset condition 2: The target loss function value in the (j+1)-th training iteration is less than a preset value. The preset value may be set according to the specific scenario, which is not limited in the present application.

Preset condition 3: The prediction accuracy computed from the predicted values and labels of the images in the (j+1)-th training iteration is higher than a preset ratio. The preset ratio may be set according to the specific scenario, which is not limited in the present application.
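The three preset conditions can be combined into a single stopping check, sketched below. The function name, parameter names and the convention that `None` disables a condition are assumptions introduced for illustration; the thresholds themselves are left to the specific scenario, as stated above.

```python
def should_stop(images_seen, loss_value, accuracy,
                max_images=None, loss_threshold=None, accuracy_threshold=None):
    """Return True when any enabled preset condition is met.

    images_seen: number of training images used in the first j+1 iterations.
    loss_value:  target loss function value of the (j+1)-th iteration.
    accuracy:    prediction accuracy measured in the (j+1)-th iteration.
    A threshold of None disables the corresponding condition (assumption of this sketch).
    """
    if max_images is not None and images_seen >= max_images:                 # condition 1
        return True
    if loss_threshold is not None and loss_value < loss_threshold:           # condition 2
        return True
    if accuracy_threshold is not None and accuracy > accuracy_threshold:     # condition 3
        return True
    return False


# Hypothetical state after one iteration, checked against a loss threshold only.
stop_now = should_stop(images_seen=256, loss_value=0.8, accuracy=0.4, loss_threshold=0.5)
```

If the check returns False, j is incremented and step S2 is repeated with the next batch.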

It can be seen that, in the embodiments of the present application, the trained teacher network of the knowledge distillation framework guides the training of the student network, and the angle loss term, the convolution result loss term and the weight loss term in the target loss function jointly update the parameters of the student network. Specifically, the angle loss term minimizes the angle loss of the trained target binary neural network after quantization, while the convolution result loss term and the weight loss term minimize its magnitude loss after quantization. As a result, the performance of the trained target binary neural network model approaches that of the teacher network as closely as possible, which improves the prediction accuracy of the target binary neural network model trained in the embodiments of the present application.

Please refer to FIG. 8, which is a schematic flowchart of another model training method 800 provided in an embodiment of the present application. Method 800 includes step S810 and step S820. The model includes a teacher network and a student network: the teacher network is a trained neural network model, the student network is a binary neural network model, and the teacher network and the student network each contain N layers of neural networks, where N is a positive integer.

Step S810: Train the binary neural network model using the teacher network and the target loss function. The target loss function includes an angle loss term, which describes the difference between a first angle corresponding to the i-th layer of the teacher network and a second angle corresponding to the i-th layer of the student network. The first angle is obtained from the weight matrix of the i-th layer of the teacher network and the input matrix of the i-th layer of the teacher network; the second angle is obtained from the binary weight matrix of the i-th layer of the student network and the binary input matrix of the i-th layer of the student network; i is a positive integer less than or equal to N.

Step S820: Repeat step S810 until the iteration termination condition is met, to obtain the target binary neural network model.

In a feasible implementation, inputting the training image into the binary neural network model to obtain the predicted value of the training image includes the following steps. P1: Obtain the binary weight matrix of the i-th layer from the reference weight matrix and the probability matrix of the i-th layer of the binary neural network model, where the element at any position of the probability matrix represents the probability that the element at that position of the binary weight matrix takes the value of the element at that position of the reference weight matrix. P2: Obtain the second convolution output result of the i-th layer from the binary weight matrix and the binary input matrix of the training image at the i-th layer. P3: Let i = i+1 and repeat steps P1 and P2; the predicted value of the training image is obtained from the second convolution output result of the N-th layer.
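Steps P1 to P3 can be sketched as a loop over layers. To stay short, the sketch below replaces each convolution with a matrix-vector product and binarizes inputs with a sign function; both are simplifying assumptions of this illustration, as are the function names and the dictionary layout of a layer. The activation, BN and Softmax stages applied after the N-th layer are omitted.

```python
def sign(v):
    """Binarize a value to +1 or -1 (zero maps to +1 in this sketch)."""
    return 1 if v >= 0 else -1


def pick_binary_weights(ref1, ref2, prob1, prob2):
    """P1: per position, take the reference element with the larger probability value."""
    return [
        [r1 if p1 >= p2 else r2 for r1, r2, p1, p2 in zip(rr1, rr2, rp1, rp2)]
        for rr1, rr2, rp1, rp2 in zip(ref1, ref2, prob1, prob2)
    ]


def forward(x, layers):
    """Run P1-P3 over all layers; each layer is a dict with ref1, ref2, prob1, prob2, alpha."""
    for layer in layers:                                   # P3: advance to the next layer
        w_bin = pick_binary_weights(layer["ref1"], layer["ref2"],
                                    layer["prob1"], layer["prob2"])
        x_bin = [sign(v) for v in x]                       # binary input of this layer
        # P2: scaled product of binary weights and binary input (stand-in for convolution)
        x = [layer["alpha"] * sum(w * a for w, a in zip(row, x_bin)) for row in w_bin]
    return x


# Hypothetical single-layer network.
ones = [[1, 1], [1, 1]]
minus_ones = [[-1, -1], [-1, -1]]
layer0 = {"ref1": ones, "ref2": minus_ones,
          "prob1": [[0.9, 0.1], [0.6, 0.6]],
          "prob2": [[0.1, 0.9], [0.4, 0.4]],
          "alpha": 0.5}
output = forward([0.3, -2.0], [layer0])
```

Only the probability matrices and scaling factors are stored as trainable quantities here; the binary weights are recomputed from them on every pass.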

It should be understood that the training image in the embodiment of FIG. 8 corresponds to the images used in each training iteration in FIG. 7. For the specific training process of the embodiment of FIG. 8, refer to the description of the corresponding process in FIG. 7, which is not repeated here.

Please refer to FIG. 9-A to FIG. 9-C, which are schematic diagrams of the distributions of features extracted by different network models in the embodiments of the present application. FIG. 9-A shows the distribution of features extracted by the first layer and the last layer of the teacher network. FIG. 9-B shows the distribution of features extracted by the first layer and the last layer of the binary neural network of the embodiments of the present application. FIG. 9-C shows the distribution of features extracted by the first layer and the last layer of a binary neural network obtained with the BiDet (An Efficient Binarized Object Detector) method.

It can be seen that the distribution of features extracted by the binary neural network obtained with the embodiments of the present application is close to that of the teacher network, whereas the distribution of features extracted by the binary neural network obtained with the BiDet method differs considerably from that of the teacher network. This indicates that the binary neural network model trained with the method of the embodiments of the present application performs close to the teacher network model, and that its prediction accuracy is higher than that of other binary neural network models in the prior art.

Please refer to FIG. 10, which is a schematic diagram of the angle between the feature matrix and the weight matrix in an embodiment of the present application. This angle is the angle between the weight matrix and the input matrix in any layer of the neural network in the above embodiments. In FIG. 10, the matrices of the teacher network and of the binary neural networks are simplified to three-dimensional vectors. Part (a) of FIG. 10 shows the angle θ between the weight vector w and the input vector a in the teacher network, together with their magnitudes. Part (d) shows the angle between the binary weight vector and the binary input vector of the binary neural network of the embodiments of the present application, together with their magnitudes. Parts (b) and (c) show the angle between the binary weight vector and the binary input vector, and their magnitudes, for other binary neural networks. Comparing (b), (c) and (d) with (a) shows the following. With the method of the embodiments of the present application, shown in (d), the angle between the quantized binary weight vector and the binary input vector, and their magnitudes, are essentially the same as the angle θ and the magnitudes of the corresponding vectors in the teacher network. With the method shown in (b), the quantized binary weight vector and binary input vector coincide, and the magnitude of the quantized binary weight vector is larger than that of the weight vector in the teacher network. With the method shown in (c), the quantized binary weight vector and binary input vector also coincide, that is, the angle between them is zero, and the magnitude of the binary weight vector is far smaller than that of the weight vector in the teacher network.

It can be seen that, in the embodiments of the present application, introducing the angle loss term into the target loss function minimizes the difference between the angle between the feature matrix and the weight matrix in the target binary neural network and the corresponding angle in the teacher network, so that the performance of the trained target binary neural network, that is, its prediction accuracy, approaches that of the teacher network.
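The effect illustrated in FIG. 10 can be reproduced numerically with a small sketch. The code below computes the angle between a weight vector and an input vector, then the angle after a plain sign binarization of both. The example vectors, the function names and the absolute difference used as an "angle gap" are assumptions of this illustration; the gap is not formula (2) of the application, which is not reproduced here.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    denom = math.sqrt(dot(u, u) * dot(v, v))
    if denom == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    cos_theta = max(-1.0, min(1.0, dot(u, v) / denom))  # clamp against rounding error
    return math.acos(cos_theta)


def sign_vector(u):
    """Plain sign binarization (zero maps to +1 in this sketch)."""
    return [1.0 if x >= 0 else -1.0 for x in u]


# Hypothetical three-dimensional teacher vectors with all-positive entries.
w = [0.9, 0.1, 0.2]
a = [0.1, 0.8, 0.3]

theta_teacher = angle_between(w, a)
theta_binarized = angle_between(sign_vector(w), sign_vector(a))
angle_gap = abs(theta_teacher - theta_binarized)
```

Here both vectors binarize to the same all-ones vector, so the angle collapses to zero even though the original vectors are far from parallel, which is the kind of coincidence shown in parts (b) and (c) of FIG. 10.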

FIG. 11 is a schematic flowchart of the image processing method of the present application. The method 1100 in FIG. 11 includes step 1110 and step 1120.

In some examples, the method 1100 may be executed by devices such as the execution device 110 in FIG. 1, the chip shown in FIG. 3, or the execution device 210 in FIG. 4.

Step 1110: Obtain an image to be processed.

Step 1120: Perform image processing on the image to be processed using the target binary neural network model to obtain a predicted value of the image to be processed.

The target binary neural network model is obtained through K training iterations. In the (j+1)-th of the K training iterations, the binary neural network model M_j is trained using the (j+1)-th batch of images and the target loss function to obtain the binary neural network model M_{j+1}. The binary neural network model M_j is the student network in a knowledge distillation framework; the teacher network in the knowledge distillation framework is a trained neural network model; the teacher network and the student network each contain N layers of neural networks, where N is a positive integer. The target loss function includes an angle loss term; K is a positive integer, and j is an integer greater than or equal to zero and less than or equal to K. The angle loss term describes the difference between a first angle corresponding to the i-th layer of the teacher network and a second angle corresponding to the i-th layer of the student network. The first angle is obtained from the weight matrix of the i-th layer of the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer of the teacher network. The second angle is obtained from the binary weight matrix of the i-th layer of the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer of the student network. i is a positive integer less than or equal to N.

It can be seen that the method of the embodiments of the present application can be used for any of image classification, object detection or image segmentation. Applying the image processing method of the embodiments of the present application to these three tasks improves the image processing results, which shows that the model generalizes well across tasks.

For the specific training process of the target binary neural network model, refer to the descriptions of method 700 in FIG. 7 and method 800 in FIG. 8, which are not repeated here.

Optionally, methods 700 and 800 may be executed by a CPU, by a CPU and a GPU together, or, instead of a GPU, by another processor suitable for neural network computation, which is not limited in the present application.

The image processing may include image classification, image segmentation, object detection or other related image processing, which is not specifically limited in the present application. The application of method 1100 to image classification, image segmentation and object detection is described below.

Image classification: The image to be processed is input into the target binary neural network model. The backbone network of the model extracts features from the image to obtain its feature vector, and the corresponding calculations are performed layer by layer on this feature vector to obtain the predicted value of the image. The predicted value may be a multidimensional vector in which each element corresponds to one image category and represents the probability that the image belongs to that category.
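Reading a class label out of such a prediction vector amounts to taking the position of the largest probability. The sketch below assumes a hypothetical helper `predicted_class` and illustrative category names; neither comes from the application.

```python
def predicted_class(prediction, class_names):
    """Return (category, probability) for the most probable image category."""
    if len(prediction) != len(class_names):
        raise ValueError("prediction and class_names must have the same length")
    best = max(range(len(prediction)), key=lambda k: prediction[k])
    return class_names[best], prediction[best]


# Hypothetical three-category prediction vector.
label, probability = predicted_class([0.1, 0.7, 0.2], ["cat", "dog", "bird"])
```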

Image segmentation: The image to be processed is input into the target binary neural network model. The backbone network of the model extracts features from the image to obtain its feature vector, and the corresponding calculations are performed layer by layer on this feature vector to obtain multiple predicted values, which correspond one-to-one to the pixels of the image. Each predicted value is a multidimensional vector in which each element corresponds to one image category and represents the probability that the corresponding pixel belongs to that category.
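For segmentation, the same selection is applied independently at every pixel, turning the grid of per-pixel prediction vectors into a grid of category indices. The function name `label_map` and the example values below are hypothetical.

```python
def label_map(pixel_predictions):
    """Map a 2-D grid of per-pixel probability vectors to a grid of category indices."""
    return [
        [max(range(len(vec)), key=lambda k: vec[k]) for vec in row]
        for row in pixel_predictions
    ]


# Hypothetical 2x2 image with three categories per pixel.
predictions = [
    [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]],
    [[0.1, 0.1, 0.8], [0.4, 0.4, 0.2]],
]
segmentation = label_map(predictions)
```

When two categories tie at a pixel, this sketch keeps the lower category index, which is a property of Python's `max` rather than something specified by the application.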

目标检测：将待处理图像输入目标二值神经网络模型，模型中的主干网络对待处理图像的特征进行提取，得到待处理图像的特征向量，模型首先会基于提取的特征向量对待处理图像中的目标物体进行识别并分割，得到目标物体对应目标区域，最后输出与目标区域对应的多维向量，该多维向量表示的含义与上述图像分类中的含义相同，此处不再赘述。Target detection: The image to be processed is input into the target binary neural network model. The backbone network in the model extracts the features of the image to be processed and obtains the feature vector of the image to be processed. The model first identifies and segments the target object in the image to be processed based on the extracted feature vector, obtains the target area corresponding to the target object, and finally outputs a multidimensional vector corresponding to the target area. The meaning of the multidimensional vector is the same as that in the above image classification, which will not be repeated here.
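The prediction formats described above for image classification and image segmentation can be illustrated with a minimal sketch. It assumes the model's last layer produces raw scores that a softmax turns into class probabilities; the source only states that each element of the output vector is a probability, so the softmax, the class names and the numeric values below are illustrative assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (assumed output layer)."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

# Image classification: one probability vector for the whole image.
class_names = ["cat", "dog", "car"]      # hypothetical image categories
image_scores = [2.0, 0.5, -1.0]          # hypothetical raw scores
image_probs = softmax(image_scores)
predicted_class = class_names[argmax(image_probs)]

# Image segmentation: one probability vector per pixel.
def label_map(pixel_probs):
    """pixel_probs[r][c] holds the class probabilities of pixel (r, c)."""
    return [[argmax(p) for p in row] for row in pixel_probs]

pixel_probs = [                          # hypothetical 2x2 image, 2 categories
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
labels = label_map(pixel_probs)
```

For target detection, the same per-vector interpretation would apply to each detected target region rather than to the whole image or to each pixel.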

可以理解，图7和图8所描述的实施例为该二值神经网络模型的训练阶段(如图1所示的训练设备120执行的阶段)，具体训练是采用由图7或图8所示的实施例或该实施例基础上任意一种可能的实现方式进行的；而图11所描述的实施例则可以理解为是该二值神经网络模型的应用阶段(如图1所示的执行设备110执行的阶段)，具体可以体现为采用由图7或图8所示实施例训练得到的目标二值神经网络模型，并根据用户输入的待处理图像，得到待处理图像的预测值。It can be understood that the embodiments described in Figures 7 and 8 are the training stage of the binary neural network model (the stage executed by the training device 120 shown in Figure 1), and the specific training is carried out using the embodiments shown in Figure 7 or 8 or any possible implementation method based on the embodiments; and the embodiment described in Figure 11 can be understood as the application stage of the binary neural network model (the stage executed by the execution device 110 shown in Figure 1), which can be specifically embodied by using the target binary neural network model trained by the embodiments shown in Figures 7 or 8, and obtaining the predicted value of the image to be processed based on the image to be processed input by the user.

Please refer to FIG. 12, which is a schematic flowchart of another image processing method 1200 provided in an embodiment of the present application. The method 1200 in FIG. 12 includes step 1210 and step 1220.

在一些示例中，该方法1200可以由图1中的执行设备110、图3所示的芯片以及图4中的执行设备210等设备执行。In some examples, the method 1200 may be executed by a device such as the execution device 110 in FIG. 1 , the chip shown in FIG. 3 , and the execution device 210 in FIG. 4 .

步骤1210，获取待处理图像。Step 1210, obtaining an image to be processed.

Step 1220: perform image processing on the image to be processed using the target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained by training the initial binary neural network model M₀ in the knowledge distillation framework with the target loss function; the initial binary neural network model M₀ is the student network in the knowledge distillation framework, and the teacher network in the knowledge distillation framework is a trained neural network model. The target loss function includes an angle loss term, which describes the difference between the angle between the feature matrix and the weight matrix in the teacher network and the angle between the feature matrix and the weight matrix in the student network.
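The angle loss term can be illustrated with a minimal sketch. It is not the formula of the present application: it assumes that, for each layer, the feature matrix and the weight matrix have been flattened to vectors of equal length (for example one receptive-field patch and one kernel), that the angle is measured through its cosine, and that the per-layer differences are squared and summed. The function names and sample values are hypothetical.

```python
import math

def cosine(features, weights):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * w for a, w in zip(features, weights))
    norm_a = math.sqrt(sum(a * a for a in features))
    norm_w = math.sqrt(sum(w * w for w in weights))
    return dot / (norm_a * norm_w)

def sign(vector):
    """Binarize to +1/-1 (zero is mapped to +1 by assumption)."""
    return [1.0 if v >= 0 else -1.0 for v in vector]

def angle_loss(teacher_layers, student_layers):
    """Sum over layers of the squared difference between teacher and student cosines.

    Each argument is a list of (features, weights) pairs, one pair per layer.
    """
    loss = 0.0
    for (a_t, w_t), (a_s, w_s) in zip(teacher_layers, student_layers):
        loss += (cosine(a_t, w_t) - cosine(a_s, w_s)) ** 2
    return loss

# One hypothetical layer: real-valued teacher, sign-binarized student.
teacher = [([0.5, -1.2, 0.3, 0.8], [0.4, -0.9, 0.1, 0.7])]
student = [(sign(teacher[0][0]), sign(teacher[0][1]))]

loss = angle_loss(teacher, student)          # small positive value
self_loss = angle_loss(teacher, teacher)     # identical networks give zero
```

Under these assumptions the term is zero when the student reproduces the teacher's angles exactly and grows as binarization distorts them.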

可以看出，在本申请实施例中，由于第一方面中的方法在训练时引入知识蒸馏框架，并在目标损失函数中引入相应的角度损失项，因而通过第一方面中的方法训练得到的目标二值神经网络模型，相对于现有的二值神经网络模型而言，模型精度有了较大提升；同时，由于二值神经网络相比于神经网络模型而言模型参数更少，更加轻量化，因而在嵌入式设备中有良好的应用前景。It can be seen that in the embodiments of the present application, since the method in the first aspect introduces a knowledge distillation framework during training and introduces a corresponding angle loss term in the target loss function, the target binary neural network model trained by the method in the first aspect has a greatly improved model accuracy compared to the existing binary neural network model; at the same time, since the binary neural network has fewer model parameters and is more lightweight than the neural network model, it has good application prospects in embedded devices.

Please refer to Table 1, which reports the detection results and model-related performance, for target detection, of the teacher network and of the binary neural network models trained with different methods on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) dataset.

在此次仿真实验中，目标检测框架Framework分别采用快速区域卷积神经网络(Faster Region-based Convolutional Neural Networks，Faster RCNN)和单阶段检测器(Single Shot Multi-Box Detector，SSD)进行；其中，Faster RCNN为一种通用的两阶段目标检测框架，SSD为一种通用的单阶段目标检测框架。In this simulation experiment, the target detection framework uses Faster Region-based Convolutional Neural Networks (Faster RCNN) and Single Shot Multi-Box Detector (SSD) respectively; among them, Faster RCNN is a general two-stage target detection framework, and SSD is a general single-stage target detection framework.

With the short side and long side of the Faster RCNN input set to 600 and 1000 pixels respectively (that is, the input in Table 1 is 600×1000), the backbone network (Backbone) is one of three residual networks (ResNet): ResNet-18, ResNet-34 and ResNet-50. In the frameworks containing the different backbone networks, the model is then quantized with different quantization methods (Quantization Method), which include: Real-valued, in which the numerical type in the network is 32-bit floating point; DoReFa-Net (Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients), in which the numerical type in the network after quantization is 4-bit floating point; Bi-Real-Net (Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm); BiDet (An Efficient Binarized Object Detector); ReActNet (Towards Precise Binary Neural Network with Generalized Activation Functions); and LWS-Det (layer-wise search). After quantization by Bi-Real-Net, BiDet, ReActNet or LWS-Det, the numerical type in the network is 1-bit integer; LWS-Det is the quantization method used in the embodiments of the present application. Table 1 also lists, for each quantization method, the memory usage (Memory Usage), the giga floating-point operations per second (GFLOPs) and the mean average precision (mAP).

It can be seen that, when tested on the PASCAL VOC dataset, the models trained in the Real-valued mode reach accuracies of 76.4%, 77.8% and 79.5% respectively. Based on the distillation method of the embodiments of the present application, the binary target detection model LWS-Det was then trained with ResNet-18/34/50 as the backbone network and reaches accuracies of 73.2%, 75.8% and 76.9% on the test set, while greatly accelerating computation and reducing storage by factors of 6.79/5.88/5.57 respectively. Compared with other binary quantization methods, LWS-Det shows a significant performance improvement. With the ResNet-18 backbone network, and with the same memory and computing resource usage, LWS-Det improves mAP by 12.3%, 10.5% and 3.6% over Bi-Real-Net, BiDet and ReActNet respectively. Similarly, with the ResNet-34 backbone network, LWS-Det exceeds Bi-Real-Net, BiDet and ReActNet by 12.7%, 10.0% and 3.5% in mAP respectively. With the ResNet-50 backbone network, LWS-Det exceeds Bi-Real-Net and ReActNet by 11.2% and 3.8% in mAP respectively. In addition, with the ResNet-34 backbone network, LWS-Det uses fewer GFLOPs and less memory than the 4-bit DoReFa-Net method yet achieves a 0.2% higher mAP, which indicates that the improvement achieved by the embodiments of the present application is significant.
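The figures quoted above can be cross-checked with a short calculation. This sketch assumes the stated improvements are absolute differences in mAP (percentage points); the Bi-Real-Net values it prints are therefore implied by the text, not copied from Table 1.

```python
# mAP values (%) quoted in the paragraph above, keyed by backbone network.
real_valued = {"ResNet-18": 76.4, "ResNet-34": 77.8, "ResNet-50": 79.5}
lws_det = {"ResNet-18": 73.2, "ResNet-34": 75.8, "ResNet-50": 76.9}

# Gap between the full-precision model and the binary LWS-Det model.
gap_to_real_valued = {
    name: round(real_valued[name] - lws_det[name], 1) for name in real_valued
}

# Improvement of LWS-Det over Bi-Real-Net as stated in the text
# (assumed to be percentage points).
gain_over_bi_real = {"ResNet-18": 12.3, "ResNet-34": 12.7, "ResNet-50": 11.2}
implied_bi_real = {
    name: round(lws_det[name] - gain_over_bi_real[name], 1) for name in lws_det
}

for name in real_valued:
    print(name, gap_to_real_valued[name], implied_bi_real[name])
```

Under that assumption the binary model stays within 2.0 to 3.2 points of the full-precision model on all three backbones.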

When the short side and long side of the SSD input are both 300 pixels (that is, the input in Table 1 is 300×300), the backbone network (Backbone) is VGG-16, and the quantization method is Real-valued, DoReFa-Net, Bi-Real-Net, BiDet, ReActNet or LWS-Det, the numerical type of the quantized model (W/A), the memory usage (Memory Usage), the GFLOPs and the mAP for each quantization method can be found in Table 1 and are not repeated here.

可以看出，LWS-Det在基于VGG-16骨干网络的SSD框架上可以实现14.76％的计算加速和4.81％的存储压缩。相比Real-valued而言，mAP性能差距很小(约2.9％)。与Bi-Real-Net、BiDet和ReActNet相比，LWS-Det可以在相同的GFLOPs和内存使用情况下mAP性能分别提高7.6％、5.4％和3.0％。LWS-Det的mAP性能比4比特DoReFa-Net提高了2.2％，并且明显降低了GFLOPs和内存使用量。It can be seen that LWS-Det can achieve 14.76% computational acceleration and 4.81% storage compression on the SSD framework based on the VGG-16 backbone network. Compared with Real-valued, the mAP performance gap is very small (about 2.9%). Compared with Bi-Real-Net, BiDet, and ReActNet, LWS-Det can improve the mAP performance by 7.6%, 5.4%, and 3.0% respectively under the same GFLOPs and memory usage. The mAP performance of LWS-Det is 2.2% higher than that of the 4-bit DoReFa-Net, and significantly reduces GFLOPs and memory usage.

In summary, compared with the binary neural networks on the detection frameworks described above, LWS-Det achieves state-of-the-art performance and comes close to the performance of the full-precision Real-valued models. This is demonstrated by a large number of experiments, which verify the advantages of LWS-Det and show its superiority and generality in different application scenarios.

表1为本申请实施例中二值神经网络与其他网络在PASCAL VOC数据集上的实验结果对比表Table 1 is a comparison table of experimental results of binary neural network and other networks on PASCAL VOC dataset in the embodiment of this application

请参见图13，图13是本申请实施例提供的一种二值神经网络模型的训练装置1300示意图。该装置1300包括确定单元1310、训练单元1320和决策单元1330；其中，Please refer to FIG. 13 , which is a schematic diagram of a training device 1300 for a binary neural network model provided in an embodiment of the present application. The device 1300 includes a determination unit 1310 , a training unit 1320 and a decision unit 1330 ; wherein,

确定单元1310，用于执行步骤S1。The determining unit 1310 is configured to execute step S1.

训练单元1320，用于执行步骤S2。The training unit 1320 is used to execute step S2.

决策单元1330，用于执行步骤S3。The decision unit 1330 is used to execute step S3.

Step S1: Determine the knowledge distillation framework, in which the teacher network is a trained neural network model and the student network is an initial binary neural network model M₀; the teacher network and the student network each include N layers of neural networks, where N is a positive integer.

Step S2: Use the (j+1)-th batch of images and the target loss function to train the binary neural network model M_j to obtain the binary neural network model M_j+1, where the binary neural network model M_j is obtained by training on the j-th batch of images and j is a positive integer. The target loss function includes an angle loss term, which describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network. The first angle is obtained based on the weight matrix of the i-th layer neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer neural network in the teacher network. The second angle is obtained based on the binary weight matrix of the i-th layer neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network in the student network. Here i is a positive integer less than or equal to N.

Step S3: When the preset condition is met, use the binary neural network model M_j+1 as the target binary neural network model; otherwise let j = j+1 and repeat step S2.
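The control flow of steps S1 to S3 can be sketched as a simple loop. This is an illustration under stated assumptions rather than the implementation of the present application: `train_step` and `is_done` are hypothetical callables standing in for one distillation update and for the preset condition, and the toy model is a single number so that the sketch stays runnable.

```python
def train(initial_model, batches, train_step, is_done):
    """Iterate S2 and S3 until the preset condition holds.

    S1 is assumed to be done already: initial_model plays the role of the
    student M_0, and train_step is assumed to have access to the teacher.
    """
    model = initial_model                    # M_j, starting from M_0
    for j, batch in enumerate(batches):      # batch is the (j+1)-th batch
        model = train_step(model, batch)     # S2: M_j -> M_j+1
        if is_done(model, j + 1):            # S3: preset condition met?
            break                            # M_j+1 is the target model
    return model

# Toy stand-ins (hypothetical): the "model" is a number pulled halfway
# towards the batch mean, and training stops after three updates.
def toy_step(model, batch):
    return model + 0.5 * (sum(batch) / len(batch) - model)

def stop_after_three(model, steps_done):
    return steps_done >= 3

batches = [[1.0, 1.0]] * 5
target_model = train(0.0, batches, toy_step, stop_after_three)
```

In practice the preset condition could be an iteration budget or a loss threshold; the source does not restrict it here.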

In a feasible implementation, in the aspect of inputting the (j+1)-th batch of images into the binary neural network model M_j to obtain the predicted values of the (j+1)-th batch of images, the training unit 1320 is specifically configured to perform the following steps. P1: obtain the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model M_j, where the element at any position in the probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the reference weight matrix. P2: obtain the second convolution output result of the i-th layer neural network from the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network and the binary weight matrix. P3: let i = i+1 and repeat steps P1-P2, then obtain the predicted values of the (j+1)-th batch of images based on the second convolution output result of the N-th layer neural network.

In a feasible implementation, the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix. In the aspect of obtaining the binary weight matrix of the i-th layer neural network based on the reference weight matrix and the probability matrix corresponding to the i-th layer neural network in the binary neural network model M_j, the training unit 1320 is specifically configured to: determine the element at any position in the target binary weight matrix based on the first probability value that corresponds, in the first probability matrix, to the element at that position in the first reference weight matrix, and the second probability value that corresponds, in the second probability matrix, to the element at that position in the second reference weight matrix. The element at any position in the first probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the first reference weight matrix; the element at any position in the second probability matrix represents the probability that the element at that position in the binary weight matrix takes the value of the element at that position in the second reference weight matrix.
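The selection of each binary weight from two reference weight matrices can be sketched as follows. The sketch assumes that the first reference matrix holds -1, the second holds +1, and that the element with the larger probability is chosen (ties go to the first matrix); the source states only that the element is determined from the two probability values, so this selection rule and all values are illustrative assumptions.

```python
def select_binary_weights(ref_first, ref_second, prob_first, prob_second):
    """Pick, position by position, the reference element with the larger probability."""
    result = []
    for r1_row, r2_row, p1_row, p2_row in zip(
        ref_first, ref_second, prob_first, prob_second
    ):
        row = [
            r1 if p1 >= p2 else r2           # assumed rule: higher probability wins
            for r1, r2, p1, p2 in zip(r1_row, r2_row, p1_row, p2_row)
        ]
        result.append(row)
    return result

# Hypothetical 2x2 layer: candidates -1 and +1 with complementary probabilities.
ref_first = [[-1, -1], [-1, -1]]
ref_second = [[1, 1], [1, 1]]
prob_first = [[0.7, 0.2], [0.5, 0.1]]
prob_second = [[0.3, 0.8], [0.5, 0.9]]

binary_weights = select_binary_weights(
    ref_first, ref_second, prob_first, prob_second
)
```

A training procedure might instead sample from the two probabilities; the deterministic choice above is only the simplest variant.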

在一种可行的实施方式中，在上述根据第j+1批图像在第i层神经网络的二值输入矩阵和二值权重矩阵，得到第i层神经网络的第二卷积输出结果的方面，训练单元1320具体用于：基于第j+1批图像中每张图像在第i层神经网络中的二值输入矩阵和二值权重矩阵分别进行卷积运算，得到每张图像的参考特征矩阵；利用第i层神经网络的权重缩放尺度因子对每张图像的参考特征矩阵进行缩放，得到第二卷积输出结果。In a feasible implementation, in the aspect of obtaining the second convolution output result of the i-th layer neural network based on the binary input matrix and the binary weight matrix of the j+1-th batch of images in the i-th layer neural network, the training unit 1320 is specifically used to: perform convolution operations based on the binary input matrix and the binary weight matrix of each image in the j+1-th batch of images in the i-th layer neural network, respectively, to obtain a reference feature matrix for each image; and scale the reference feature matrix of each image using the weight scaling factor of the i-th layer neural network to obtain the second convolution output result.
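The two operations described above, a convolution between binary matrices followed by scaling, can be sketched for a single-channel layer. The sketch assumes a stride of 1, no padding, a scalar scaling factor `alpha`, and the cross-correlation form of convolution commonly used in neural networks; the function names and sample values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Cross-correlation with stride 1 and no padding on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def binary_layer(batch_inputs, binary_weight, alpha):
    """Convolve each binary input with the binary weights, then scale by alpha."""
    outputs = []
    for binary_input in batch_inputs:
        reference = conv2d_valid(binary_input, binary_weight)  # reference feature matrix
        outputs.append([[alpha * v for v in row] for row in reference])
    return outputs

# Hypothetical batch with one 3x3 binary input and a 2x2 binary kernel.
batch = [[[1, -1, 1], [-1, 1, -1], [1, 1, 1]]]
kernel = [[1, -1], [-1, 1]]
scaled = binary_layer(batch, kernel, alpha=0.5)
```

Because both operands contain only +1 and -1, the inner sum can be implemented on suitable hardware with XNOR and bit-count operations, which is where binary networks obtain their speed advantage.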

请参见图14，图14是本申请实施例提供的另一种模型训练装置1400示意图。该装置1400包括训练单元1410和决策单元1420。该模型包括教师网络和学生网络，教师网络为训练好的神经网络模型，学生网络为二值神经网络模型，教师网络和学生网络分别包含N层神经网络，N为正整数。Please refer to Figure 14, which is a schematic diagram of another model training device 1400 provided in an embodiment of the present application. The device 1400 includes a training unit 1410 and a decision unit 1420. The model includes a teacher network and a student network, the teacher network is a trained neural network model, the student network is a binary neural network model, and the teacher network and the student network each include N layers of neural networks, where N is a positive integer.

训练单元1410，用于利用教师网络和目标损失函数对二值神经网络模型进行训练；其中，目标损失函数包括角度损失项，角度损失项用于描述教师网络中第i层神经网络对应的第一角度和学生网络中第i层神经网络对应的第二角度之间的差异；第一角度是基于教师网络中第i层神经网络的权重矩阵和教师网络中第i层神经网络中的输入矩阵得到的；第二角度是基于学生网络中第i层神经网络的二值权重矩阵和学生网络中第i层神经网络中的二值输入矩阵得到的；i为小于或等于N的正整数。Training unit 1410 is used to train a binary neural network model using a teacher network and a target loss function; wherein the target loss function includes an angle loss term, which is used to describe the difference between a first angle corresponding to the i-th layer of the neural network in the teacher network and a second angle corresponding to the i-th layer of the neural network in the student network; the first angle is obtained based on the weight matrix of the i-th layer of the neural network in the teacher network and the input matrix in the i-th layer of the neural network in the teacher network; the second angle is obtained based on the binary weight matrix of the i-th layer of the neural network in the student network and the binary input matrix in the i-th layer of the neural network in the student network; i is a positive integer less than or equal to N.

决策单元1420，用于重复执行上述步骤，直到满足迭代终止条件，得到目标二值神经网络模型。The decision unit 1420 is used to repeatedly execute the above steps until the iteration termination condition is met to obtain the target binary neural network model.

Please refer to FIG. 15, which is a schematic structural diagram of an image processing device 1500 provided in an embodiment of the present application. The device 1500 includes an acquisition unit 1510 and a processing unit 1520.

获取单元1510，用于获取待处理图像。The acquisition unit 1510 is used to acquire the image to be processed.

The processing unit 1520 is configured to perform image processing on the image to be processed using the target binary neural network model to obtain a predicted value of the image to be processed. The target binary neural network model is obtained through K rounds of training. In the (j+1)-th of the K rounds, the binary neural network model M_j is trained using the (j+1)-th batch of images and the target loss function to obtain the binary neural network model M_j+1. The binary neural network model M_j is the student network in the knowledge distillation framework; the teacher network in the knowledge distillation framework is a trained neural network model; the teacher network and the student network each include N layers of neural networks, where N is a positive integer. The target loss function includes an angle loss term; K is a positive integer, and j is an integer greater than or equal to zero and less than or equal to K. The angle loss term describes the difference between a first angle corresponding to the i-th layer neural network in the teacher network and a second angle corresponding to the i-th layer neural network in the student network. The first angle is obtained based on the weight matrix corresponding to the i-th layer neural network in the teacher network and the input matrix of the (j+1)-th batch of images at the i-th layer neural network in the teacher network. The second angle is obtained based on the binary weight matrix corresponding to the i-th layer neural network in the student network and the binary input matrix of the (j+1)-th batch of images at the i-th layer neural network in the student network. Here i is a positive integer less than or equal to N.

具体地，图像处理装置1500可以用于处理图11中所述的图像处理方法1100的对应步骤，此处不再赘述。Specifically, the image processing device 1500 can be used to process the corresponding steps of the image processing method 1100 described in FIG. 11 , which will not be described in detail here.

请参见图16，图16是本申请实施例提供的一种模型训练装置1600的硬件结构示意图。图16所示的模型训练装置1600(该装置1600具体可以是一种计算机设备)包括存储器1601、处理器1602、通信接口1603以及总线1604。其中，存储器1601、处理器1602、通信接口1603通过总线1604实现彼此之间的通信连接。Please refer to Figure 16, which is a schematic diagram of the hardware structure of a model training device 1600 provided in an embodiment of the present application. The model training device 1600 shown in Figure 16 (the device 1600 can be a computer device) includes a memory 1601, a processor 1602, a communication interface 1603 and a bus 1604. Among them, the memory 1601, the processor 1602, and the communication interface 1603 are connected to each other through the bus 1604.

存储器1601可以是只读存储器(read only memory，ROM)，静态存储设备，动态存储设备或者随机存取存储器(random access memory，RAM)。存储器1601可以存储程序，当存储器1601中存储的程序被处理器1602执行时，处理器1602和通信接口1603用于执行本申请实施例的二值神经网络模型的训练方法的各个步骤。The memory 1601 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 1601 may store a program. When the program stored in the memory 1601 is executed by the processor 1602, the processor 1602 and the communication interface 1603 are used to execute the various steps of the training method of the binary neural network model of the embodiment of the present application.

处理器1602可以采用通用的中央处理器(central processing unit，CPU)，微处理器，应用专用集成电路(application specific integrated circuit，ASIC)，图形处理器(graphics processing unit，GPU)或者一个或多个集成电路，用于执行相关程序，以实现本申请实施例中二值神经网络模型的训练装置中的单元所需执行的功能，或者执行本申请方法实施例的模型训练方法。Processor 1602 can adopt a general central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits to execute relevant programs to implement the functions required to be performed by the units in the training device of the binary neural network model in the embodiment of the present application, or to execute the model training method of the method embodiment of the present application.

The processor 1602 may also be an integrated circuit chip with signal processing capability. During implementation, each step of the training method of the binary neural network model of the present application may be completed by an integrated logic circuit of hardware in the processor 1602 or by instructions in the form of software. The processor 1602 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1601; the processor 1602 reads the information in the memory 1601 and, in combination with its hardware, completes the functions to be performed by the units included in the training device of the binary neural network model of the embodiments of the present application, or executes the training method of the binary neural network model of the method embodiments of the present application.

The communication interface 1603 uses a transceiving apparatus, for example but not limited to a transceiver, to implement communication between the device 1600 and other devices or a communication network. For example, training data can be obtained through the communication interface 1603.

The bus 1604 may include a path for transmitting information between the components of the device 1600 (for example, the memory 1601, the processor 1602 and the communication interface 1603).

请参见图17，图17是本申请实施例提供的图像处理装置1700的硬件结构示意图。其中，图像处理装置1700可以是汽车、摄像头、电脑、手机、可穿戴设备或其它可能的终端设备，本申请对此不限定。图17所示的图像处理装置1700(该装置1700具体可以是一种计算机设备)包括存储器1701、处理器1702、通信接口1703以及总线1704。其中，存储器1701、处理器1702、通信接口1703通过总线1704实现彼此之间的通信连接。Please refer to Figure 17, which is a schematic diagram of the hardware structure of the image processing device 1700 provided in an embodiment of the present application. Among them, the image processing device 1700 can be a car, a camera, a computer, a mobile phone, a wearable device or other possible terminal devices, which is not limited by the present application. The image processing device 1700 shown in Figure 17 (the device 1700 can specifically be a computer device) includes a memory 1701, a processor 1702, a communication interface 1703 and a bus 1704. Among them, the memory 1701, the processor 1702, and the communication interface 1703 realize communication connection with each other through the bus 1704.

存储器1701可以是只读存储器(read only memory，ROM)，静态存储设备，动态存储设备或者随机存取存储器(random access memory，RAM)。存储器1701可以存储程序，当存储器1701中存储的程序被处理器1702执行时，处理器1702和通信接口1703用于执行本申请实施例的图像处理方法的各个步骤。The memory 1701 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 1701 may store a program. When the program stored in the memory 1701 is executed by the processor 1702, the processor 1702 and the communication interface 1703 are used to execute each step of the image processing method of the embodiment of the present application.

处理器1702可以采用通用的中央处理器(central processing unit，CPU)，微处理器，应用专用集成电路(application specific integrated circuit，ASIC)，图形处理器(graphics processing unit，GPU)或者一个或多个集成电路，用于执行相关程序，以实现本申请实施例的图像处理装置中的单元所需执行的功能，或者执行本申请方法实施例的图像处理方法。Processor 1702 can adopt a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits to execute relevant programs to implement the functions required to be performed by the units in the image processing device of the embodiment of the present application, or to execute the image processing method of the method embodiment of the present application.

The processor 1702 may also be an integrated circuit chip with signal processing capability. During implementation, each step of the image processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 1702 or by instructions in the form of software. The processor 1702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1701; the processor 1702 reads the information in the memory 1701 and, in combination with its hardware, completes the functions to be performed by the units included in the image processing device of the embodiments of the present application, or executes the image processing method of the method embodiments of the present application.

The communication interface 1703 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the device 1700 and other devices or communication networks. For example, training data can be obtained through the communication interface 1703.

The bus 1704 may include a path for transmitting information between the components of the device 1700 (e.g., the memory 1701, the processor 1702, and the communication interface 1703).

It should be noted that although the device 1600 and the device 1700 shown in FIG. 16 and FIG. 17 show only a memory, a processor, and a communication interface, those skilled in the art should understand that, in a specific implementation, the device 1600 and the device 1700 also include other components necessary for normal operation. Those skilled in the art should also understand that, according to specific needs, the device 1600 and the device 1700 may further include hardware components that implement other additional functions. In addition, those skilled in the art should understand that the device 1600 and the device 1700 may include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 16 or FIG. 17.

It can be understood that the device 1600 corresponds to the training device 120 in FIG. 1, and the device 1700 corresponds to the execution device 110 in FIG. 1. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the above functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An image processing method, characterized in that the method comprises:

Acquiring an image to be processed;

Performing image processing on the image to be processed using a target binary neural network model to obtain a feature vector of the image to be processed;

Based on the feature vector, the target object in the image to be processed is identified and segmented to obtain a target area corresponding to the target object and a multidimensional vector corresponding to the target area, wherein each element in the multidimensional vector corresponds to an image category, and each element represents a probability that the image to be processed belongs to the image category corresponding to that element;

The target binary neural network model is obtained through K rounds of training, and in the j+1-th round of the K rounds of training: the binary neural network model M_j is trained using the j+1-th batch of images and the target loss function to obtain the binary neural network model M_{j+1}; the binary neural network model M_j is the student network in the knowledge distillation framework; the teacher network in the knowledge distillation framework is a trained neural network model, and the teacher network and the student network each include N layers of neural networks, where N is a positive integer; the target loss function includes an angle loss term; K is a positive integer, and j is an integer greater than or equal to zero and less than or equal to K;

The angle loss term is used to describe the difference between a first angle corresponding to the i-th layer of the neural network in the teacher network and a second angle corresponding to the i-th layer of the neural network in the student network; the first angle is obtained based on the weight matrix corresponding to the i-th layer of the neural network in the teacher network and the input matrix of the j+1-th batch of images in the i-th layer of the neural network in the teacher network; the second angle is obtained based on the binary weight matrix corresponding to the i-th layer of the neural network in the student network and the binary input matrix of the j+1-th batch of images in the i-th layer of the neural network in the student network; i is a positive integer less than or equal to N.
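The angle loss term described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the weight matrix and the input matrix of layer i are assumed to be flattened into equal-length vectors (for example, one filter and one receptive-field patch), the angle is taken as the arccosine of their cosine similarity, and the difference between the two angles is measured as a squared difference. The function names `angle_between` and `angle_loss` are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot leave the domain of acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def angle_loss(w_teacher, a_teacher, w_student_bin, a_student_bin):
    """Squared difference between the first (teacher) and second (student) angle.

    Assumption: the squared difference is only one possible measure; the claim
    requires only that the term describe the difference between the angles.
    """
    first_angle = angle_between(w_teacher, a_teacher)
    second_angle = angle_between(w_student_bin, a_student_bin)
    return (first_angle - second_angle) ** 2


# Teacher: full-precision weights and inputs of layer i (flattened).
w_t, a_t = [0.8, -0.3, 0.5, 0.1], [1.2, -0.4, 0.9, 0.2]
# Student: binary (+1/-1) weights and inputs of the same layer.
w_s, a_s = [1.0, -1.0, 1.0, 1.0], [1.0, -1.0, 1.0, 1.0]
loss = angle_loss(w_t, a_t, w_s, a_s)
```

In this sketch the loss is zero when the teacher and student angles coincide and grows as the binarized layer rotates the weight/input relationship away from that of the teacher.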

2. The method according to claim 1, characterized in that the target loss function also includes a convolution result loss term; wherein the convolution result loss term is used to describe the difference between the first convolution output result of the i-th layer neural network in the teacher network and the second convolution output result of the i-th layer neural network in the student network;

The first convolution output result is obtained based on the weight matrix of the i-th layer neural network in the teacher network and the input matrix of the i-th layer neural network of the j+1-th batch of images in the teacher network; the second convolution output result is obtained based on the binary weight matrix and the corresponding weight scaling factor corresponding to the i-th layer neural network in the student network, and the binary input matrix of the j+1-th batch of images in the i-th layer neural network in the student network.
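A possible form of the convolution result loss term is sketched below in Python. The sketch assumes that both convolution outputs are already available as flat lists of equal length, that the student output is the binary convolution result multiplied by a single scalar weight scaling factor `alpha`, and that the difference is measured as a mean squared error. The name `conv_result_loss` and the choice of mean squared error are assumptions made for illustration.

```python
def conv_result_loss(teacher_out, student_bin_out, alpha):
    """Mean squared difference between teacher and scaled student outputs.

    teacher_out:     first convolution output result (full precision).
    student_bin_out: result of convolving the binary input with the binary
                     weights, before scaling.
    alpha:           weight scaling factor of the layer (assumed scalar).
    """
    if len(teacher_out) != len(student_bin_out):
        raise ValueError("outputs must have the same number of elements")
    # Second convolution output result: scaled binary convolution.
    scaled = [alpha * y for y in student_bin_out]
    squared = [(t - s) ** 2 for t, s in zip(teacher_out, scaled)]
    return sum(squared) / len(squared)


# With alpha = 0.5 the scaled student output matches the teacher exactly.
matched = conv_result_loss([1.5, -0.5], [3.0, -1.0], 0.5)
# Without scaling, the second element differs by 1.
mismatched = conv_result_loss([1.0, 0.0], [1.0, 1.0], 1.0)
```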

3. The method according to claim 1 or 2, characterized in that the target loss function also includes a weight loss term;

Wherein the weight loss term is used to describe the difference between the weight matrix of the i-th layer neural network in the teacher network and the binary weight matrix of the i-th layer neural network in the student network.

4. The method according to claim 1 or 2, characterized in that the step of training the binary neural network model M_j using the j+1-th batch of images and the target loss function to obtain the binary neural network model M_{j+1} comprises:

Inputting the j+1-th batch of images into the binary neural network model M_j to obtain predicted values of the j+1-th batch of images;

Updating the parameters of each layer of the neural network in the binary neural network model M_j based on the predicted values of the j+1-th batch of images, the labels of the j+1-th batch of images and the target loss function, to obtain the binary neural network model M_{j+1}.

5. The method according to claim 4, characterized in that the step of inputting the j+1-th batch of images into the binary neural network model M_j to obtain the predicted values of the j+1-th batch of images comprises:

P1: Based on the reference weight matrix and probability matrix corresponding to the i-th layer neural network in the binary neural network model M_j, the binary weight matrix of the i-th layer neural network is obtained; wherein the element at any position in the probability matrix is used to represent the probability value of the element at any position in the binary weight matrix taking the element at any position in the reference weight matrix;

P2: Obtaining a second convolution output result of the i-th layer of the neural network according to the binary input matrix of the j+1-th batch of images in the i-th layer of the neural network and the binary weight matrix;

P3: Let i=i+1, and repeat steps P1-P2 to obtain the predicted value of the j+1th batch of images based on the second convolution output result of the Nth layer neural network.

6. The method according to claim 5, characterized in that the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix; the step of obtaining the binary weight matrix of the i-th layer of the neural network based on the reference weight matrix and probability matrix corresponding to the i-th layer of the neural network in the binary neural network model M_j comprises:

Determining an element at any position in the target binary weight matrix based on a first probability value corresponding to an element at any position in the first reference weight matrix in the first probability matrix and a second probability value corresponding to an element at any position in the second reference weight matrix in the second probability matrix;

Wherein the element at any position in the first probability matrix is used to represent the probability value of the element at any position in the binary weight matrix taking the element at any position in the first reference weight matrix; the element at any position in the second probability matrix is used to represent the probability value of the element at any position in the binary weight matrix taking the element at any position in the second reference weight matrix.
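The selection of binary weights from two reference weight matrices and two probability matrices might be sketched as follows in Python. The claim does not fix the selection rule or the contents of the reference matrices, so the sketch makes two assumptions: the first reference matrix is all +1 and the second is all -1, and each element of the binary weight matrix is taken from whichever reference matrix has the larger probability value at that position. The name `binarize_weights` is hypothetical.

```python
def binarize_weights(ref_first, ref_second, prob_first, prob_second):
    """Pick each binary weight from the more probable reference matrix.

    Assumption: a deterministic "larger probability wins" rule; ties go to
    the first reference matrix. Sampling is another possible rule.
    """
    rows, cols = len(ref_first), len(ref_first[0])
    return [
        [
            ref_first[r][c]
            if prob_first[r][c] >= prob_second[r][c]
            else ref_second[r][c]
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Assumed reference matrices: all +1 and all -1.
ref_pos = [[1, 1], [1, 1]]
ref_neg = [[-1, -1], [-1, -1]]
# Probability that each position takes the value from ref_pos / ref_neg.
p_pos = [[0.9, 0.2], [0.5, 0.4]]
p_neg = [[0.1, 0.8], [0.5, 0.6]]
w_bin = binarize_weights(ref_pos, ref_neg, p_pos, p_neg)
```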

7. The method according to claim 5 or 6, characterized in that the step of obtaining the second convolution output result of the i-th layer of the neural network according to the binary input matrix of the j+1-th batch of images in the i-th layer of the neural network and the binary weight matrix comprises:

Performing convolution operations on the binary input matrix of each image in the j+1th batch of images in the i-th layer of the neural network and the binary weight matrix to obtain a reference feature matrix for each image;

The reference feature matrix of each image is scaled using the weight scaling factor of the i-th layer neural network to obtain the second convolution output result.
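The two steps of claim 7, a binary convolution followed by scaling, can be sketched in Python as below. The sketch assumes a single-channel 2-D input, no padding, stride 1, a scalar weight scaling factor `alpha`, and the cross-correlation form of convolution used by most deep learning frameworks. The function name `binary_conv2d_scaled` is hypothetical.

```python
def binary_conv2d_scaled(a_bin, w_bin, alpha):
    """Convolve a +1/-1 input with a +1/-1 kernel, then scale by alpha.

    Returns the scaled output (the "second convolution output result").
    Assumptions: valid padding, stride 1, cross-correlation, scalar alpha.
    """
    kh, kw = len(w_bin), len(w_bin[0])
    out_h = len(a_bin) - kh + 1
    out_w = len(a_bin[0]) - kw + 1
    # Reference feature matrix: unscaled binary convolution result.
    reference = [
        [
            sum(
                a_bin[r + i][c + j] * w_bin[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
    # Scale every element with the layer's weight scaling factor.
    return [[alpha * v for v in row] for row in reference]


a_bin = [[1, -1, 1], [1, 1, -1], [-1, 1, 1]]
w_bin = [[1, -1], [-1, 1]]
out = binary_conv2d_scaled(a_bin, w_bin, 0.25)
```

Because both operands contain only +1 and -1, the inner sum could be replaced by XNOR and popcount operations on packed bits in an optimized implementation; the plain multiplication here is kept for readability.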

8. The method of claim 7, wherein the parameter comprises at least one of the probability matrix or the weight scaling factor.

9. An image processing method, characterized in that the method comprises:

Acquiring an image to be processed;

Based on the feature vector, the target object in the image to be processed is identified and segmented to obtain a target area corresponding to the target object and a multidimensional vector corresponding to the target area, wherein each element in the multidimensional vector corresponds to an image category, and each element represents a probability that the image to be processed belongs to the image category corresponding to that element; wherein the target binary neural network model is obtained by training the initial binary neural network model M_0 in the knowledge distillation framework through a target loss function, wherein the initial binary neural network model M_0 is a student network in the knowledge distillation framework, and the teacher network in the knowledge distillation framework is a trained neural network model; the target loss function includes an angle loss term, and the angle loss term is used to describe the difference between the angle between the feature matrix and the weight matrix in the teacher network and the angle between the feature matrix and the weight matrix in the student network.

10. An image processing device, characterized in that the device comprises:

An acquisition unit, used for acquiring an image to be processed;

A processing unit, used to perform image processing on the image to be processed using a target binary neural network model to obtain a feature vector of the image to be processed;

The processing unit is further used to identify and segment the target object in the image to be processed based on the feature vector, to obtain a target area corresponding to the target object, and a multidimensional vector corresponding to the target area, wherein each element in the multidimensional vector corresponds to an image category, and each element is used to represent a probability value that the image to be processed is the image category corresponding to each element;

11. The device according to claim 10, characterized in that the target loss function also includes a convolution result loss term; wherein the convolution result loss term is used to describe the difference between the first convolution output result of the i-th layer neural network in the teacher network and the second convolution output result of the i-th layer neural network in the student network;

12. The device according to claim 10 or 11, characterized in that the target loss function also includes a weight loss term;

13. The device according to claim 10 or 11, characterized in that the processing unit is specifically used for:

14. The device according to claim 13, characterized in that, in the aspect of inputting the j+1-th batch of images into the binary neural network model M_j to obtain the predicted values of the j+1-th batch of images, the processing unit is specifically used to:

P1: Obtain the binary weight matrix of the i-th layer of the neural network based on the reference weight matrix and probability matrix corresponding to the i-th layer of the neural network in the binary neural network model M_j;

P2: Obtain the second convolution output result of the i-th layer neural network according to the binary input matrix of the j+1-th batch of images in the i-th layer neural network and the binary weight matrix; wherein the element at any position in the probability matrix is used to represent the probability value of the element at any position in the binary weight matrix taking the element at any position in the reference weight matrix;

P3: Let i=i+1, repeat steps P1-P2, and obtain the predicted value of the j+1th batch of images based on the second convolution output result of the Nth layer neural network.

15. The device according to claim 14, characterized in that the reference weight matrix includes a first reference weight matrix and a second reference weight matrix, and the probability matrix includes a first probability matrix and a second probability matrix; in the aspect of obtaining the binary weight matrix of the i-th layer of the neural network based on the reference weight matrix and probability matrix corresponding to the i-th layer of the neural network in the binary neural network model M_j, the processing unit is specifically used to:

16. The device according to claim 14 or 15, characterized in that, in the aspect of obtaining the second convolution output result of the i-th layer of the neural network according to the binary input matrix of the j+1-th batch of images in the i-th layer of the neural network and the binary weight matrix, the processing unit is specifically used to:

17. The apparatus of claim 16, wherein the parameter comprises at least one of the probability matrix or the weight scaling factor.

18. An image processing device, characterized in that the device comprises:

An acquisition unit, used for acquiring an image to be processed;

Wherein the target binary neural network model is obtained by training the initial binary neural network model M_0 in the knowledge distillation framework through the target loss function, the initial binary neural network model M_0 is the student network in the knowledge distillation framework, and the teacher network in the knowledge distillation framework is a trained neural network model; the target loss function includes an angle loss term, and the angle loss term is used to describe the difference between the angle between the feature matrix and the weight matrix in the teacher network and the angle between the feature matrix and the weight matrix in the student network.

19. A chip system, characterized in that the chip system comprises a processor and a memory; wherein:

The memory is used to store the target binary neural network model and program instructions;

The processor is used to read the program instructions to call the target binary neural network model to execute the method as described in any one of claims 1 to 9.

20. A terminal device, characterized in that the terminal device comprises the chip system as described in claim 19, and a discrete device coupled to the chip system; wherein the terminal device comprises a car, a camera, a computer, a mobile phone or a wearable device.

21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program codes for execution by a device, the program codes comprising code for executing the method according to any one of claims 1 to 9.