[go: up one dir, main page]

CN116168400A - Knowledge distillation method, device, electronic device and storage medium - Google Patents


Info

Publication number
CN116168400A
CN116168400A (application number CN202310126575.7A)
Authority
CN
China
Prior art keywords
network
processed
classified
loss function
features
Prior art date
Legal status
Granted
Application number
CN202310126575.7A
Other languages
Chinese (zh)
Other versions
CN116168400B (en)
Inventor
肖雪丽
陈国华
邵向潮
李惠仪
廖常辉
冷颖雄
谢洁芳
周彦吉
叶海珍
邓茵
刘贯科
钟荣富
戴喜良
Current Assignee
Guangdong Power Grid Co Ltd
Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Dongguan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority: CN202310126575.7A
Publication of application: CN116168400A
Application granted; publication: CN116168400B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244: Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455: Discrimination between machine-print, hand-print and cursive writing
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/191: Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173: Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a knowledge distillation method and apparatus, an electronic device, and a storage medium. The method obtains the features to be processed of an image to be processed, extracted by the feature extraction layer of each network to be classified, together with the classification result produced by that network's classification layer from those features, where the networks to be classified include a teacher network and a student network; determines feature weights of each network to be classified from its features to be processed; determines, from the features to be processed, feature weights, and classification results of the different networks, a target loss function for knowledge distillation between the teacher network and the student network; and adjusts the network parameters of the student network according to the target loss function. The embodiments improve classification accuracy and speed, and improve the practical applicability of the classification network.

Description

Knowledge distillation method, device, electronic device and storage medium

Technical Field

Embodiments of the present application relate to image processing technology, and in particular to a knowledge distillation method and apparatus, an electronic device, and a storage medium.

Background

During the production and construction of power grid projects, case files are stored in the archive management system as scanned copies. When archivists verify the various approval procedures, an important part of the check is confirming whether the signatures, dates, and approval opinions in the relevant documents are handwritten or machine printed. Manual verification is not only inefficient but also error-prone due to visual fatigue.

In the prior art, a deep learning model can be used to determine whether the text in a specified region of a power grid project case file is handwritten or machine printed.

However, a deep learning model with few network layers cannot meet the accuracy requirement, while deepening the network slows inference and consumes more resources, resulting in poor practical applicability.

Summary of the Invention

The present application provides a knowledge distillation method and apparatus, an electronic device, and a storage medium, so as to improve classification accuracy and speed and to improve the practical applicability of the classification network.

In a first aspect, an embodiment of the present application provides a knowledge distillation method, including:

obtaining the features to be processed of an image to be processed, extracted by the feature extraction layer of a network to be classified, and the classification result obtained by the classification layer of that network from those features, where the networks to be classified include a teacher network and a student network;

determining feature weights of each network to be classified according to its features to be processed;

determining, according to the features to be processed, feature weights, and classification results of the different networks to be classified, a target loss function for knowledge distillation between the teacher network and the student network; and

adjusting the network parameters of the student network according to the target loss function.

In a second aspect, an embodiment of the present application further provides a knowledge distillation apparatus, including:

a parameter acquisition module, configured to obtain the features to be processed of an image to be processed, extracted by the feature extraction layer of a network to be classified, and the classification result obtained by the classification layer of that network from those features, where the networks to be classified include a teacher network and a student network;

a feature weight determination module, configured to determine feature weights of each network to be classified according to its features to be processed;

a target loss function determination module, configured to determine, according to the features to be processed, feature weights, and classification results of the different networks to be classified, a target loss function for knowledge distillation between the teacher network and the student network; and

a network parameter adjustment module, configured to adjust the network parameters of the student network according to the target loss function.

In a third aspect, an embodiment of the present application further provides an electronic device, including:

one or more processors; and

a storage apparatus configured to store one or more programs,

where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any knowledge distillation method provided in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform any knowledge distillation method provided in the embodiments of the present application.

In the present application, the features to be processed of an image to be processed, extracted by the feature extraction layer of each network to be classified, are obtained together with the classification result produced by that network's classification layer, where the networks to be classified include a teacher network and a student network; this allows the corresponding layers of the student network to be supervised by the teacher network's feature extraction layer and classification layer. Feature weights of each network to be classified are determined from its features to be processed, so that the learning of important features is reinforced and the classification accuracy of the student network is improved. A target loss function for knowledge distillation between the teacher network and the student network is determined from the features to be processed, feature weights, and classification results of the different networks, and the network parameters of the student network are adjusted according to it. Because the target loss function covers both the feature extraction layer and the classification layer, the student network learns the parameters of both layers of the teacher network simultaneously during distillation, which improves its consistency with the teacher network and hence its classification accuracy and speed; at the same time, since the student network itself has fewer layers, it consumes fewer resources, further improving applicability. The technical solution of the present application therefore solves the problem that a shallow network cannot meet the accuracy requirement while a deeper network slows inference, consumes more resources, and has poor applicability, achieving improved classification accuracy and speed and improved applicability of the classification network.

Brief Description of the Drawings

FIG. 1 is a flowchart of a knowledge distillation method in Embodiment 1 of the present application;

FIG. 2a is a flowchart of a knowledge distillation method in Embodiment 2 of the present application;

FIG. 2b is a schematic diagram of a feature weight determination process in Embodiment 2 of the present application;

FIG. 3a is a flowchart of a knowledge distillation method in Embodiment 3 of the present application;

FIG. 3b is a schematic diagram of a target loss function determination process in Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a knowledge distillation apparatus in Embodiment 4 of the present application;

FIG. 5 is a schematic structural diagram of an electronic device in Embodiment 5 of the present application.

Detailed Description

To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.

It should be noted that the terms "first", "second", and the like in the specification, the claims, and the accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprising" and "having", and any variants thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.

Embodiment 1

FIG. 1 is a flowchart of a knowledge distillation method provided in Embodiment 1 of the present application. This embodiment is applicable to performing knowledge distillation on a classification network when classifying handwritten text in power grid construction archives. The method may be executed by a knowledge distillation apparatus, which may be implemented in software and/or hardware and configured in an electronic device, for example a server.

Referring to the knowledge distillation method shown in FIG. 1, the method includes the following steps:

S110: Obtain the features to be processed of an image to be processed, extracted by the feature extraction layer of a network to be classified, and the classification result obtained by the classification layer of that network from those features, where the networks to be classified include a teacher network and a student network.

The network to be classified may be a network capable of distinguishing handwritten from machine-printed text, used to classify handwriting in archive documents of power grid construction projects. For example, it may be a convolutional neural network, a residual network, or a deep belief network. Specifically, the networks to be classified include a teacher network and a student network: the teacher network is an already trained classification network, i.e., a network with fixed parameters, used to train the student network; the student network is the network to be trained, i.e., the network whose parameters are adjusted, and it has fewer layers.

Knowledge distillation transfers knowledge from a trained teacher network to obtain a student network better suited to inference, where the teacher network may be a deep network and the student network a shallow one. For example, the teacher network may be ResNet-34 (a 34-layer residual network) and the student network ResNet-18 (an 18-layer residual network). Specifically, after the teacher network is built, it is trained on a sample training set and its parameters are then fixed; the trained teacher network is then used to supervise the training of the student network, so that the student network, while using far fewer parameters, also learns the strong performance of the teacher network and achieves higher inference accuracy.
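The freeze-the-teacher, supervise-the-student procedure described above can be illustrated with a deliberately tiny sketch. This is not the patent's networks: the "teacher" below is a fixed scalar function standing in for a trained ResNet-34, and the "student" is a pair of trainable scalars standing in for ResNet-18.

```python
# Toy knowledge distillation: the "teacher" is already trained (fixed),
# and the "student" is fitted to reproduce the teacher's outputs.
def teacher(x):
    # Stands in for a trained, frozen deep network.
    return 2.0 * x + 1.0

def student(w, b, x):
    # Stands in for the shallow student network with trainable parameters.
    return w * x + b

samples = [0.0, 1.0, 2.0, 3.0]
w, b, lr = 0.0, 0.0, 0.05

for _ in range(500):
    # Distillation loss: mean squared error between student and teacher outputs.
    grad_w = sum(2 * (student(w, b, x) - teacher(x)) * x for x in samples) / len(samples)
    grad_b = sum(2 * (student(w, b, x) - teacher(x)) for x in samples) / len(samples)
    w -= lr * grad_w  # gradient descent updates the student only;
    b -= lr * grad_b  # the teacher's parameters stay fixed

print(w, b)  # the student converges toward the teacher's behavior
```

In the real setting the teacher's "outputs" are its feature maps and class probabilities rather than scalars, but the structure of the loop is the same.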

The feature extraction layer may be a convolutional layer of the network to be classified, used to extract features from the input image to be processed. For example, it may be the last of several convolutional layers, i.e., with no further convolutional layer between it and the output layer. The image to be processed may be sample data, for example a signature image from a power grid construction archive document. The features to be processed of the image are the output of the feature extraction layer for that image and represent the features it extracted. The classification layer may be the output layer of the network, used to determine the classification result from the features obtained by the feature extraction layer; for example, it may be a fully connected layer. The classification result is the result of classifying the features to be processed. For example, when the network classifies whether a signature image is handwritten, the classification result may be either handwritten or machine printed.
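As a minimal illustration of the two quantities collected in S110, a network can be split into a feature stage and a classification stage that are read out separately. All shapes and filter values below are invented for the example and are not taken from the patent.

```python
import math

def extract_features(image_row, filters):
    # "Feature extraction layer": dot products of the input with each filter,
    # standing in for the output of the last convolutional layer.
    return [sum(p * f for p, f in zip(image_row, filt)) for filt in filters]

def classify(features, weights):
    # "Classification layer": a linear layer followed by softmax,
    # standing in for the fully connected output layer.
    logits = [sum(f * w for f, w in zip(features, col)) for col in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]  # e.g. P(handwritten), P(machine-printed)

image_row = [0.2, 0.9, 0.4]                    # toy 1x3 "image"
filters = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # two invented filters
fc_weights = [[1.0, -1.0], [-1.0, 1.0]]        # invented classifier weights

features = extract_features(image_row, filters)  # features to be processed
probs = classify(features, fc_weights)           # classification result
```

Both `features` and `probs` would be collected from the teacher network and from the student network, giving the four inputs used in the following steps.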

S120: Determine feature weights of each network to be classified according to its features to be processed.

The feature weights are weights determined from the features to be processed and are used to supervise the training of the student network, so that the student network learns richer discriminative features from the teacher network. For example, the feature weights of each network to be classified may be determined through an attention mechanism, so that the student network pays more attention to specific information in the teacher network, such as edge features. Specifically, based on the attention mechanism, the feature weights of the teacher network are obtained from its feature extraction layer, and the feature weights of the student network are obtained from its feature extraction layer.
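One simple way to realize such attention-derived feature weights is a softmax over the absolute activations of a feature map, so that strongly activated positions (for instance, stroke edges) receive larger weights. The patent does not fix a formula here, so this is a sketch under that assumption:

```python
import math

def attention_weights(feature_map):
    # feature_map: 2D list of activations from a feature extraction layer.
    # Each spatial position gets a weight proportional to exp(|activation|),
    # so strongly activated regions are emphasized during distillation.
    flat = [abs(v) for row in feature_map for v in row]
    m = max(flat)
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    weights = [e / total for e in exps]
    rows, cols = len(feature_map), len(feature_map[0])
    return [[weights[r * cols + c] for c in range(cols)] for r in range(rows)]

fmap = [[0.1, 2.0], [0.3, 0.2]]  # toy 2x2 feature map
w = attention_weights(fmap)       # weights sum to 1; position (0, 1) dominates
```

The same function would be applied to the teacher's and the student's feature maps to obtain each network's feature weights.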

S130: Determine, according to the features to be processed, feature weights, and classification results of the different networks to be classified, a target loss function for knowledge distillation between the teacher network and the student network.

The target loss function is used by the student network to learn from the teacher network by distillation; the loss function supervises the learning of the student network. For example, the loss function may include at least one of KL divergence (Kullback-Leibler divergence), mean square error (MSE), the L1 loss (mean absolute error), and a deep mutual learning loss. The target loss function may be the total loss function of the distillation network formed by the networks to be classified.
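The candidate losses listed above can be written out directly. The temperature-softened softmax below is a common choice in distillation and is included here as an assumption, since the paragraph does not specify one:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, a common distillation trick.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student distribution q is from teacher p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    # Mean square error.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def l1(a, b):
    # Mean absolute error.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

teacher_probs = softmax([3.0, 1.0], temperature=2.0)
student_probs = softmax([2.0, 1.5], temperature=2.0)
cls_loss = kl_divergence(teacher_probs, student_probs)
```

Any of these terms, alone or combined, can serve as a component of the total distillation loss.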

Specifically, the loss function with which the student network learns from the teacher network's feature extraction layer may be determined from the features to be processed and the feature weights of the different networks to be classified, and the loss function with which it learns from the teacher network's classification layer may be determined from the networks' classification results; the target loss function is obtained as the weighted sum of the two loss functions.
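That weighted combination can be sketched as follows. The modulation of the feature error by the feature weights and the balance coefficient `alpha` are illustrative assumptions; the patent states only that the target loss is a weighted sum of the two terms.

```python
def feature_loss(teacher_feats, student_feats, feature_weights):
    # Feature-extraction-layer term: squared error per feature,
    # modulated by the feature weights so important features count more.
    return sum(w * (t - s) ** 2
               for w, t, s in zip(feature_weights, teacher_feats, student_feats))

def classification_loss(teacher_probs, student_probs):
    # Classification-layer term: squared error between class distributions
    # (a KL divergence would be an equally valid choice here).
    return sum((t - s) ** 2 for t, s in zip(teacher_probs, student_probs))

def target_loss(t_feats, s_feats, weights, t_probs, s_probs, alpha=0.5):
    # Weighted sum of the two supervision signals.
    return (alpha * feature_loss(t_feats, s_feats, weights)
            + (1 - alpha) * classification_loss(t_probs, s_probs))

loss = target_loss([1.0, 2.0], [0.8, 2.1], [0.7, 0.3],
                   [0.9, 0.1], [0.8, 0.2])
```

When the student's features and classification results match the teacher's exactly, the target loss is zero; otherwise both layers contribute to the gradient signal.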

S140: Adjust the network parameters of the student network according to the target loss function.

The network parameters are the parameters learned during training of the student network; for example, they may be convolution kernels. After the value of the target loss function is obtained, the whole distillation network is trained by applying the gradient descent algorithm to the target loss function, and the network parameters of the student network are adjusted.
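A single parameter update of the kind described can be made concrete with a generic gradient-descent step; the finite-difference gradient and the toy quadratic loss below are purely illustrative stand-ins for backpropagation through the target loss.

```python
def numerical_grad(loss_fn, w, eps=1e-6):
    # Central-difference estimate of d(loss)/dw.
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

def sgd_step(loss_fn, w, lr=0.1):
    # One gradient-descent update of a student parameter.
    return w - lr * numerical_grad(loss_fn, w)

# Toy target loss: distance of the student parameter from the value
# that best matches the teacher (here, 2.0).
loss_fn = lambda w: (w - 2.0) ** 2

w = 0.0
for _ in range(100):
    w = sgd_step(loss_fn, w)
```

Repeating this step drives the target loss toward its minimum, i.e., moves the student parameter toward the teacher-matching value.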

Power grid construction archives contain a large number of project case files, many of whose documents must carry the handwritten signature of the responsible person to be valid. The case files are stored in the archive department as scanned copies, and the department then verifies the various approval procedures; an important part of this check is confirming whether the signatures, dates, and approval opinions in the relevant documents are handwritten or machine printed, and raising an alert for any non-compliant, machine-printed procedure. A project under review typically produces thousands of documents, and each document typically contains many items to be checked; at present this is done manually, which is not only inefficient but also error-prone due to visual fatigue. Using computer technology to locate the items to be checked and automatically classify whether they are handwritten would greatly reduce the staff workload. At present there is no automatic handwriting classification model for power grid construction project review.

Distinguishing machine-printed from handwritten text can be abstracted as a binary image classification task. In the prior art, much work has been done on image classification, and deep-learning-based models have become the mainstream; classic networks include the LeNet-5 convolutional neural network (5 layers), AlexNet (8 layers), VGG-16 (16 layers), GoogLeNet (22 layers), and ResNet-101 (101 layers). As the number of network layers grows, classification accuracy increases and the algorithm becomes more robust, but the number of parameters also grows, which slows the model and consumes more computing resources, hindering practical deployment of the technology. Conversely, a shallow network model yields poor classification accuracy and likewise fails to meet deployment requirements.

In the technical solution of this embodiment, the features to be processed of an image, extracted by the feature extraction layer of each network to be classified, are obtained together with the classification result produced by that network's classification layer, where the networks to be classified include a teacher network and a student network, so that the corresponding layers of the student network can subsequently be supervised by the teacher network's feature extraction layer and classification layer. Feature weights of each network to be classified are determined from its features to be processed, reinforcing the learning of important features and improving the classification accuracy of the student network. A target loss function for knowledge distillation between the teacher network and the student network is determined from the features to be processed, feature weights, and classification results of the different networks, and the network parameters of the student network are adjusted according to it. Because the target loss function covers both the feature extraction layer and the classification layer, the student network learns the parameters of both layers of the teacher network simultaneously during distillation, improving its consistency with the teacher network and thus its classification accuracy and speed; since the student network itself has fewer layers and consumes fewer resources, applicability is also improved. The technical solution of the present application therefore solves the problem that a shallow network cannot meet the accuracy requirement while a deeper network slows inference, consumes more resources, and has poor applicability, achieving improved classification accuracy and speed and improved applicability of the classification network.

Embodiment 2

FIG. 2a is a flowchart of a knowledge distillation method provided in Embodiment 2 of the present application. The technical solution of this embodiment further refines the technical solution described above.

Further, "determining the feature weights of the network to be classified according to the to-be-processed features of the network to be classified" is refined into: "extracting edge features of the to-be-processed features of the network to be classified; and determining the feature weights of the network to be classified from the edge features through a weight mapping function", so as to determine the feature weights of the network to be classified.

Referring to Fig. 2a, a knowledge distillation method includes:

S210. Obtain the to-be-processed features of the to-be-processed picture extracted by the feature extraction layer of the network to be classified, and the classification result obtained by the classification layer of the network to be classified classifying the to-be-processed features; the networks to be classified include a teacher network and a student network.

S220. Extract edge features of the to-be-processed features of the network to be classified.

An edge feature may be a discontinuity in the to-be-processed features and is used to determine the feature weights. Exemplarily, the edge features may be extracted with one of the Roberts operator, the Sobel operator, the Laplacian edge detection operator, and the like (all of which are edge detection operators). Edge features are important features of a picture, and learning them can improve the accuracy of picture classification. For handwritten and machine-printed characters, edge features are an important classification basis, so extracting edge features improves classification accuracy.

In an optional embodiment, extracting the edge features of the to-be-processed features of the network to be classified includes: determining frequency-domain features of the to-be-processed features through a Fourier transform; determining high-frequency features of the frequency-domain features through an inverse Fourier transform; and determining the edge features of the high-frequency features through the Sobel operator.

A frequency-domain feature may be the counterpart of a to-be-processed feature in the frequency domain, that is, the feature data obtained after transforming the to-be-processed feature from the spatial domain to the frequency domain. Specifically, the frequency-domain features can be obtained by applying a Fourier transform to the to-be-processed features. High-frequency features may be features that change sharply in the picture and are used to extract edge features. Specifically, the high-frequency features can be obtained by applying an inverse Fourier transform to the frequency-domain features. The Sobel operator is easy to implement in the spatial domain, detects edges well, and is relatively insensitive to noise, so the edge features of the high-frequency features are extracted with the Sobel operator.

The frequency-domain features of the to-be-processed features are determined through the Fourier transform, and the high-frequency features of the frequency-domain features are determined through the inverse Fourier transform; obtaining the high-frequency features of the to-be-processed features via the Fourier transform and its inverse facilitates subsequent edge feature extraction and improves its efficiency. The edge features of the high-frequency features are then determined through the Sobel operator; since the Sobel operator is easy to implement, robust to noise, and accurate, this improves the efficiency and precision of edge feature extraction.
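The FFT → high-frequency → Sobel pipeline described above can be sketched as follows. This is a minimal illustration with NumPy; the single-channel input, the `cutoff` radius of the high-pass mask, and the explicit Sobel kernels are illustrative assumptions, not values fixed by this application:

```python
import numpy as np

def edge_features(feat: np.ndarray, cutoff: int = 4) -> np.ndarray:
    """Fourier transform -> keep high frequencies -> inverse transform -> Sobel magnitude."""
    h, w = feat.shape
    # Frequency-domain features: 2-D FFT, shifted so low frequencies sit in the middle.
    freq = np.fft.fftshift(np.fft.fft2(feat))
    # Suppress the low-frequency centre; what remains are the high-frequency components.
    cy, cx = h // 2, w // 2
    freq[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0
    high = np.real(np.fft.ifft2(np.fft.ifftshift(freq)))
    # Sobel gradients of the high-frequency map give the edge features.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(high, 1, mode="edge")
    gx = np.zeros_like(high)
    gy = np.zeros_like(high)
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(window * kx)
            gy[i, j] = np.sum(window * ky)
    return np.hypot(gx, gy)
```

Feeding a feature map with a sharp step through this function produces strong responses near the discontinuity, which is exactly the behaviour the weight mapping in S230 exploits.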

S230. Determine the feature weights of the network to be classified from the edge features through a weight mapping function.

The weight mapping function may be a function that maps the edge features to a weight matrix and is used to determine the feature weights from the edge features. Exemplarily, the weight mapping function may be a softmax function, which converts the edge features into a weight matrix with values between 0 and 1 so that the network to be classified pays more attention to the edge features in the picture. In the subsequent distillation learning process, the feature weights can make the student network focus on learning the teacher network's method of extracting edge features, so that the student network learns more robust edge features.

Fig. 2b is a schematic diagram of a feature weight determination process. As shown in Fig. 2b, the frequency-domain features are obtained by applying a Fourier transform to the to-be-processed features; the high-frequency features are obtained by applying an inverse Fourier transform to the frequency-domain features; the high-frequency features are processed with the Sobel operator to obtain the edge features; and the edge features are mapped through the weight mapping function to obtain the feature weights.
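A minimal sketch of such a weight mapping function, assuming a softmax taken over all spatial positions of the edge-feature map (normalising over the whole map is one plausible reading, not mandated by the text):

```python
import numpy as np

def edge_weight_map(edge: np.ndarray) -> np.ndarray:
    """Map an edge-feature map to a weight matrix with entries in (0, 1) via softmax."""
    flat = edge.ravel() - edge.max()           # subtract the max for numerical stability
    weights = np.exp(flat) / np.exp(flat).sum()
    return weights.reshape(edge.shape)         # stronger edges receive larger weights
```

Positions with stronger edge responses receive larger weights, so the feature loss built later emphasises edge regions of the to-be-processed features.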

S240. Determine the target loss function for knowledge distillation between the teacher network and the student network according to the to-be-processed features, feature weights, and classification results of the different networks to be classified.

S250. Adjust the network parameters of the student network according to the target loss function.

In the technical solution of this embodiment, the edge features of the to-be-processed features of the network to be classified are extracted, and the feature weights of the network to be classified are determined from the edge features through the weight mapping function. Edge features are an important basis for picture classification; by extracting them and mapping them to feature weights, the teacher network's feature weights can supervise the student network's feature weights during knowledge distillation, so that the student network learns the finer edge features of the teacher network. This improves the accuracy of the student network's feature extraction, provides a more accurate basis for subsequent classification, and improves classification accuracy.

Embodiment Three

Fig. 3a is a flowchart of a knowledge distillation method provided in Embodiment 3 of the present application. The technical solution of this embodiment is further refined on the basis of the above technical solution.

Further, "determining the target loss function for knowledge distillation between the teacher network and the student network according to the to-be-processed features, feature weights, and classification results of the different networks to be classified" is refined into: "determining a feature loss function according to the to-be-processed features and feature weights of the different networks to be classified; determining a classification loss function according to the classification results of the different networks to be classified; and determining the target loss function for knowledge distillation between the teacher network and the student network according to the feature loss function and the classification loss function", so as to refine how the target loss function is determined.

Referring to Fig. 3a, a knowledge distillation method includes:

S310. Obtain the to-be-processed features of the to-be-processed picture extracted by the feature extraction layer of the network to be classified, and the classification result obtained by the classification layer of the network to be classified classifying the to-be-processed features; the networks to be classified include a teacher network and a student network.

S320. Determine the feature weights of the network to be classified according to its to-be-processed features.

S330. Determine a feature loss function according to the to-be-processed features and feature weights of the different networks to be classified.

The feature loss function may be a loss function determined from the to-be-processed features and feature weights of the teacher network and the student network; it is used to supervise the student network's learning of the parameters of the teacher network's feature extraction layer, improving the feature extraction ability of the student network's feature extraction layer.

There may be at least one feature loss function. Exemplarily, there may be two: a loss function that includes the feature weights and one that does not. Specifically, the feature loss function may be a mean squared error function.

In an optional embodiment, determining the feature loss function according to the to-be-processed features and feature weights of the different networks to be classified includes: determining a first loss function according to the to-be-processed features and corresponding feature weights of the different networks to be classified; determining a second loss function according to the to-be-processed features of the different networks to be classified; and determining the feature loss function according to the first loss function and the second loss function.

The first loss function may be a loss function that includes the feature weights and is used to supervise the student network in learning the edge features of the to-be-processed features. Exemplarily, the first loss function may be a mean squared error function: the product of the teacher network's to-be-processed features and their corresponding feature weights, and the product of the student network's to-be-processed features and their corresponding feature weights, are used as the two arguments of the mean squared error function to determine the first loss function. Through the first loss function, the student network can be supervised to learn richer edge features, the purpose being to enable the student network to learn the features of the teacher network and even, ideally, exceed the teacher's performance.

The second loss function may be a loss function that does not include the feature weights and is used to supervise the student network in learning the to-be-processed features themselves. Exemplarily, the second loss function may be a mean squared error function: the teacher network's to-be-processed features and the student network's to-be-processed features are used as the two arguments of the mean squared error function to determine the second loss function. In feature learning, the last feature extraction layer of the student network should stay as consistent as possible with that of the teacher network; the second loss function therefore prevents the student network from deliberately learning fine edge details at the cost of deviating from the parameter distribution characteristics of the teacher model.

The weighted sum of the first loss function and the second loss function is used as the feature loss function. The weighting coefficients of the first and second loss functions can be adjusted based on experience and experiments and are not specifically limited in this application. Optionally, the sum of the first loss function and the second loss function may be used as the feature loss function, that is, both weights are 1.

Determining the first loss function from the to-be-processed features and corresponding feature weights of the different networks to be classified supervises the student network in learning rich edge features; determining the second loss function from the to-be-processed features of the different networks to be classified supervises the student network in staying maximally consistent with the teacher network's feature extraction layer; and determining the feature loss function from the first and second loss functions lets the student network learn the teacher network's feature extraction performance while learning more edge features.
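Under these definitions, the feature loss can be sketched as a weighted sum of two mean-squared-error terms. The array shapes and the default weights of 1 are illustrative assumptions:

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two feature maps."""
    return float(np.mean((a - b) ** 2))

def feature_loss(f_teacher, f_student, w_teacher, w_student, alpha=1.0, beta=1.0):
    """First loss: feature-weighted maps (emphasises edges). Second loss: raw maps."""
    loss1 = mse(w_teacher * f_teacher, w_student * f_student)
    loss2 = mse(f_teacher, f_student)
    return alpha * loss1 + beta * loss2
```

With `alpha = beta = 1` this reduces to the plain sum mentioned above; other coefficients can be tuned experimentally.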

In an optional embodiment, determining the first loss function according to the to-be-processed features and corresponding feature weights of the different networks to be classified includes: obtaining a teacher loss parameter from the to-be-processed features of the teacher network and their corresponding feature weights; obtaining a student loss parameter from the to-be-processed features of the student network and their corresponding feature weights; and determining the first loss function from the teacher loss parameter and the student loss parameter through a mean squared loss function.

The teacher loss parameter may be a calculation parameter of the first loss function determined from the teacher network; specifically, it may be the product of the teacher network's to-be-processed features and the corresponding feature weights. The student loss parameter may be a calculation parameter of the first loss function determined from the student network; specifically, it may be the product of the student network's to-be-processed features and the corresponding feature weights. The teacher loss parameter and the student loss parameter are used as the two arguments of the mean squared loss function to determine the first loss function.

By obtaining the teacher loss parameter from the teacher network's to-be-processed features and corresponding feature weights, obtaining the student loss parameter from the student network's to-be-processed features and corresponding feature weights, and determining the first loss function from them through the mean squared loss function, the feature weights act as weighting coefficients on the to-be-processed features, so learning can focus on edge features, improving the ability to extract them and the accuracy of subsequent classification.

In an optional embodiment, obtaining the student loss parameter from the to-be-processed features of the student network and the corresponding feature weights includes: using the difference between an identity matrix and the student network's corresponding feature weights as the weighting weights of the student network; and determining the student loss parameter from the student network's to-be-processed features and the weighting weights.

Using the difference between the identity matrix and the student network's corresponding feature weights as the student network's weighting weights is equivalent to reducing the weight of the edge-detail texture regions of the to-be-processed features in the student network and increasing the weight of the background regions. Exemplarily, when the student network is used to classify handwritten and machine-printed characters, this weighting ensures that smaller weights are assigned at font edges and larger weights to background regions; during supervised training, to make the first loss function smaller, the student network is then pushed to pay more attention to the edge-detail texture information of the to-be-processed features, learn richer features at the edges, and weaken its learning of the background regions, which optimizes the student network. The product of the student network's to-be-processed features and the weighting weights is determined as the student loss parameter.

By using the difference between the identity matrix and the student network's corresponding feature weights as the student network's weighting weights, and determining the student loss parameter from the student network's to-be-processed features and these weighting weights, the student network is pushed to pay more attention to the edge-detail texture information of the to-be-processed features, optimizing the student network.
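One way to read this optional refinement is sketched below. Treating the "identity matrix" element-wise as an all-ones matrix is an interpretive assumption made for this illustration:

```python
import numpy as np

def first_loss_background_weighted(f_teacher, f_student, w_teacher, w_student):
    """Teacher term weighted by its edge weights; student term by (1 - W_s)."""
    teacher_param = w_teacher * f_teacher
    # Difference between an all-ones matrix and the student's feature weights:
    # small weights at font edges, larger weights on the background regions.
    student_param = (np.ones_like(w_student) - w_student) * f_student
    return float(np.mean((teacher_param - student_param) ** 2))
```

Because the student term is down-weighted exactly where its edge response is strong, shrinking this loss rewards the student for producing strong, teacher-aligned responses in edge regions.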

S340. Determine a classification loss function according to the classification results of the different networks to be classified.

The classification loss function may be a loss function determined from the classification results of the teacher network and the student network; it is used to supervise the student network's learning of the parameters of the teacher network's classification layer, improving the classification accuracy of the student network's classification layer. Exemplarily, the classification loss function may be a KL divergence loss function: the classification result of the teacher network and that of the student network are used as the two arguments of the KL divergence loss function to determine the classification loss function.
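A minimal sketch of such a KL divergence loss over class probabilities. It assumes both networks' classification results are already probability distributions; the clipping epsilon is an illustrative numerical guard:

```python
import numpy as np

def kl_divergence_loss(p_teacher: np.ndarray, p_student: np.ndarray,
                       eps: float = 1e-12) -> float:
    """KL(teacher || student): penalises the student for diverging from the teacher."""
    p = np.clip(p_teacher, eps, 1.0)
    q = np.clip(p_student, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))
```

The loss is zero when the two distributions coincide and grows as the student's predicted class probabilities drift from the teacher's.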

S350. Determine the target loss function for knowledge distillation between the teacher network and the student network according to the feature loss function and the classification loss function.

The target loss function for knowledge distillation between the teacher network and the student network is determined from the weighted sum of the feature loss function and the classification loss function. The weighting coefficients of the feature loss function and the classification loss function can be adjusted based on experience or experiments and are not specifically limited in this application.
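The combination step can be sketched as follows; the coefficient names and defaults are illustrative, since the application leaves the weights to experience or experiment:

```python
def target_loss(feature_loss_value: float, classification_loss_value: float,
                lam: float = 1.0, mu: float = 1.0) -> float:
    """Weighted sum of the feature loss and the classification loss."""
    return lam * feature_loss_value + mu * classification_loss_value
```

During training, the gradient of this scalar with respect to the student network's parameters drives the parameter update in S360.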

Fig. 3b is a schematic diagram of a target loss function determination process. As shown in Fig. 3b, the to-be-processed picture is input into the teacher network and the student network simultaneously, and the to-be-processed features output by the last convolutional layer of each network are obtained, as shown in the left dashed box of Fig. 3b; the second loss function is determined from these to-be-processed features. The corresponding to-be-processed features are then weighted according to the teacher network's feature weights and the student network's weighting weights, as shown by the multipliers in Fig. 3b: the weighting coefficient of the upper multiplier is the feature weight corresponding to the teacher network, and that of the lower multiplier is the difference between the identity matrix and the student network's corresponding feature weights, that is, the weighting weights. The weighted to-be-processed features obtained after the weighting operation, shown in the right dashed box of Fig. 3b, are used to determine the first loss function. The classification results output by the classification layers of the student network and the teacher network are obtained to determine the classification loss function, and the target loss function shown in Fig. 3b is obtained from the first loss function, the second loss function, and the classification loss function.

S360. Adjust the network parameters of the student network according to the target loss function.

In the technical solution of this embodiment, determining the feature loss function according to the to-be-processed features and feature weights of the different networks to be classified lets the student network learn the teacher network's feature extraction ability; determining the classification loss function according to the classification results of the different networks to be classified lets the student network learn the teacher network's ability to classify based on the extracted to-be-processed features; and determining the target loss function for knowledge distillation from the feature loss function and the classification loss function lets the student network learn more robust to-be-processed features that favour picture classification, improving the student network's learning ability and classification accuracy, so that excellent classification performance can be achieved even with a small student network.

Embodiment Four

Fig. 4 is a schematic structural diagram of a knowledge distillation device provided in Embodiment 4 of the present application. This embodiment is applicable to performing knowledge distillation on a classification network when classifying handwritten characters in power grid construction archives. The specific structure of the knowledge distillation device is as follows:

a parameter acquisition module 410, configured to obtain the to-be-processed features of the to-be-processed picture extracted by the feature extraction layer of the network to be classified, and the classification result obtained by the classification layer of the network to be classified classifying the to-be-processed features, the networks to be classified including a teacher network and a student network;

a feature weight determination module 420, configured to determine the feature weights of the network to be classified according to its to-be-processed features;

a target loss function determination module 430, configured to determine the target loss function for knowledge distillation between the teacher network and the student network according to the to-be-processed features, feature weights, and classification results of the different networks to be classified; and

a network parameter adjustment module 440, configured to adjust the network parameters of the student network according to the target loss function.
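The four modules above can be sketched as pluggable callables wired together; the interfaces below are hypothetical, since the application does not fix concrete signatures:

```python
class KnowledgeDistillationDevice:
    """Minimal sketch of the device: four modules composed as callables."""

    def __init__(self, acquire_parameters, determine_feature_weights,
                 determine_target_loss, adjust_network_parameters):
        self.acquire_parameters = acquire_parameters                # module 410
        self.determine_feature_weights = determine_feature_weights  # module 420
        self.determine_target_loss = determine_target_loss          # module 430
        self.adjust_network_parameters = adjust_network_parameters  # module 440

    def distill_step(self, picture):
        # 410: to-be-processed features and classification results per network.
        features, predictions = self.acquire_parameters(picture)
        # 420: feature weights for each network to be classified.
        weights = {name: self.determine_feature_weights(f)
                   for name, f in features.items()}
        # 430: target loss from features, weights, and classification results.
        loss = self.determine_target_loss(features, weights, predictions)
        # 440: adjust the student network's parameters from the target loss.
        self.adjust_network_parameters(loss)
        return loss
```

Each module can be swapped independently (e.g. a different weight mapping in module 420) without changing the distillation loop.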

In the technical solution of this embodiment, the parameter acquisition module obtains the to-be-processed features of the to-be-processed picture extracted by the feature extraction layer of the network to be classified, together with the classification result obtained by the classification layer classifying the to-be-processed features; the networks to be classified include a teacher network and a student network, which facilitates subsequent supervised learning of the corresponding layers of the student network based on the teacher network's feature extraction layer and classification layer. The feature weight determination module determines the feature weights of the network to be classified according to its to-be-processed features; the feature weights strengthen the learning of important features and improve the classification accuracy of the subsequent student network. The target loss function determination module determines the target loss function for knowledge distillation between the teacher network and the student network according to the to-be-processed features, feature weights, and classification results of the different networks to be classified, and the network parameter adjustment module adjusts the network parameters of the student network according to the target loss function. Because the target loss function includes loss functions for both the feature extraction layer and the classification layer, the student network can learn the parameters of the teacher network's feature extraction layer and classification layer simultaneously during knowledge distillation, improving the consistency between the student network and the teacher network and the classification accuracy and speed of the student network; meanwhile, since the student network itself has fewer layers, it occupies fewer resources, which improves applicability. The technical solution of the present application therefore solves the problems that a shallow network cannot meet accuracy requirements while a deeper network slows model inference, occupies more resources, and has poor applicability, achieving the effects of improving classification accuracy and speed and improving the applicability of the classification network.

Optionally, the feature weight determination module 420 includes:

an edge feature extraction unit, configured to extract the edge features of the to-be-processed features of the network to be classified; and

an edge feature mapping unit, configured to determine the feature weights of the network to be classified from the edge features through the weight mapping function.

Optionally, the edge feature extraction unit includes:

a frequency-domain feature determination subunit, configured to determine the frequency-domain features of the to-be-processed features through a Fourier transform;

a high-frequency feature determination subunit, configured to determine the high-frequency features of the frequency-domain features through an inverse Fourier transform; and

an edge feature determination subunit, configured to determine the edge features of the high-frequency features through the Sobel operator.

Optionally, the target loss function determination module 430 includes:

a feature loss function determination unit, configured to determine the feature loss function according to the to-be-processed features and feature weights of the different networks to be classified;

a classification loss function determination unit, configured to determine the classification loss function according to the classification results of the different networks to be classified; and

a target loss function combination unit, configured to determine the target loss function for knowledge distillation between the teacher network and the student network according to the feature loss function and the classification loss function.

Optionally, the feature loss function determination unit includes:

a first loss function determination subunit, configured to determine a first loss function according to the to-be-processed features and the corresponding feature weights of the different networks to be classified;

a second loss function determination subunit, configured to determine a second loss function according to the to-be-processed features of the different networks to be classified;

a feature loss function combination subunit, configured to determine the feature loss function from the first loss function and the second loss function.

Optionally, the first loss function determination subunit is specifically configured to: obtain a teacher loss parameter from the to-be-processed features of the teacher network and the corresponding feature weights; obtain a student loss parameter from the to-be-processed features of the student network and the corresponding feature weights; and determine the first loss function from the teacher loss parameter and the student loss parameter through a mean-square loss function.

Optionally, the first loss function determination subunit is further specifically configured to: use the difference between an identity matrix and the feature weights corresponding to the student network as weighted weights of the student network; and determine the student loss parameter from the to-be-processed features of the student network and the weighted weights.
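The first-loss computation described above can be sketched as follows: the teacher features are weighted by the teacher's feature weights, the student features by the identity matrix minus the student's feature weights, and a mean-square loss is taken over the difference. Applying the weights by matrix multiplication is an assumption; the patent only states that the features and weights are combined.

```python
import numpy as np

def first_loss(teacher_feat: np.ndarray, w_teacher: np.ndarray,
               student_feat: np.ndarray, w_student: np.ndarray) -> float:
    """Sketch of the first loss function for one pair of feature maps."""
    # Teacher loss parameter: teacher features weighted by teacher feature weights.
    teacher_param = w_teacher @ teacher_feat
    # Student weighted weight: identity matrix minus the student feature weights.
    identity = np.eye(w_student.shape[0])
    student_param = (identity - w_student) @ student_feat
    # Mean-square loss between the two loss parameters.
    diff = teacher_param - student_param
    return float(np.mean(diff ** 2))
```

The (I − W) weighting pushes the student to place feature energy where its own edge weights are currently low, complementing the teacher's edge-weighted target.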

The knowledge distillation apparatus provided in the embodiments of this application can execute the knowledge distillation method provided in any embodiment of this application, and has the functional modules and beneficial effects corresponding to that method.

Embodiment 5

FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 5 of this application. As shown in FIG. 5, the electronic device includes a processor 510, a memory 520, an input device 530, and an output device 540. The number of processors 510 in the electronic device may be one or more; one processor 510 is taken as an example in FIG. 5. The processor 510, memory 520, input device 530, and output device 540 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.

As a computer-readable storage medium, the memory 520 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the knowledge distillation method in the embodiments of this application (for example, the parameter acquisition module 410, the feature weight determination module 420, the target loss function determination module 430, and the network parameter adjustment module 440). By running the software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the electronic device, that is, it implements the knowledge distillation method described above.

The memory 520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 520 may include high-speed random-access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 520 may further include memory located remotely from the processor 510, and such remote memory may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 can be used to receive input character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 540 may include a display device such as a display screen.

Embodiment 6

Embodiment 6 of this application further provides a storage medium containing computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to perform a knowledge distillation method, the method including: obtaining the to-be-processed features of a to-be-processed picture extracted by the feature extraction layer of a network to be classified, and the classification result obtained by the classification layer of the network to be classified classifying the to-be-processed features, where the network to be classified includes a teacher network and a student network; determining the feature weights of the network to be classified according to its to-be-processed features; determining the target loss function used when the teacher network and the student network perform knowledge distillation, according to the to-be-processed features, feature weights, and classification results of the different networks to be classified; and adjusting the network parameters of the student network according to the target loss function.
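As a toy end-to-end illustration of the method's final step (adjusting the student network's parameters to reduce a distillation loss), the following sketch distills fixed teacher logits into a linear student by gradient descent on a temperature-softened cross-entropy. The linear student, temperature, learning rate, and iteration count are all assumptions for illustration; the patent's actual target loss additionally includes the edge-weighted feature term.

```python
import numpy as np

def softmax(z, t: float = 1.0) -> np.ndarray:
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))                # stand-in for extracted features
teacher_logits = x @ rng.normal(size=(4, 3))  # fixed teacher outputs
W_student = np.zeros((4, 3))                # student network parameters

p_t = softmax(teacher_logits, t=2.0)
initial_kl = kl(p_t, softmax(x @ W_student, t=2.0))
for _ in range(300):
    p_s = softmax(x @ W_student, t=2.0)
    # Gradient of the soft cross-entropy w.r.t. the student logits (up to a
    # constant factor), averaged over the batch.
    grad_logits = (p_s - p_t) / len(x)
    W_student -= 0.5 * (x.T @ grad_logits)  # adjust the network parameters
final_kl = kl(p_t, softmax(x @ W_student, t=2.0))
```

After training, the student's softened outputs move toward the teacher's, which is the behavior the target loss function is designed to induce.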

Of course, in the storage medium containing computer-executable instructions provided in the embodiments of this application, the computer-executable instructions are not limited to the method operations described above; they can also perform related operations in the knowledge distillation method provided in any embodiment of this application.

Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus the necessary general-purpose hardware, and of course it can also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random-access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments of this application.

It is worth noting that, in the above embodiments of the knowledge distillation apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not used to limit the protection scope of this application.

Note that the above are only preferred embodiments of this application and the technical principles applied. Those skilled in the art will understand that this application is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of this application. Therefore, although this application has been described in some detail through the above embodiments, it is not limited to them; it may include more other equivalent embodiments without departing from the concept of this application, and its scope is determined by the scope of the appended claims.

Claims (10)

1. A knowledge distillation method, characterized by comprising:
obtaining to-be-processed features of a to-be-processed picture extracted by a feature extraction layer of a network to be classified, and a classification result obtained by a classification layer of the network to be classified classifying the to-be-processed features, wherein the network to be classified comprises a teacher network and a student network;
determining feature weights of the network to be classified according to the to-be-processed features of the network to be classified;
determining a target loss function used when the teacher network and the student network perform knowledge distillation, according to the to-be-processed features, feature weights, and classification results of the different networks to be classified; and
adjusting network parameters of the student network according to the target loss function.

2. The method according to claim 1, wherein determining the feature weights of the network to be classified according to the to-be-processed features of the network to be classified comprises:
extracting edge features of the to-be-processed features of the network to be classified; and
determining the feature weights of the network to be classified from the edge features through a weight mapping function.

3. The method according to claim 2, wherein extracting the edge features of the to-be-processed features of the network to be classified comprises:
determining frequency-domain features of the to-be-processed features through a Fourier transform;
determining high-frequency features of the frequency-domain features through an inverse Fourier transform; and
determining the edge features of the high-frequency features through a Sobel operator.

4. The method according to claim 1, wherein determining the target loss function used when the teacher network and the student network perform knowledge distillation, according to the to-be-processed features, feature weights, and classification results of the different networks to be classified, comprises:
determining a feature loss function according to the to-be-processed features and feature weights of the different networks to be classified;
determining a classification loss function according to the classification results of the different networks to be classified; and
determining, from the feature loss function and the classification loss function, the target loss function used when the teacher network and the student network perform knowledge distillation.

5. The method according to claim 4, wherein determining the feature loss function according to the to-be-processed features and feature weights of the different networks to be classified comprises:
determining a first loss function according to the to-be-processed features and the corresponding feature weights of the different networks to be classified;
determining a second loss function according to the to-be-processed features of the different networks to be classified; and
determining the feature loss function from the first loss function and the second loss function.

6. The method according to claim 5, wherein determining the first loss function according to the to-be-processed features and the corresponding feature weights of the different networks to be classified comprises:
obtaining a teacher loss parameter from the to-be-processed features of the teacher network and the corresponding feature weights;
obtaining a student loss parameter from the to-be-processed features of the student network and the corresponding feature weights; and
determining the first loss function from the teacher loss parameter and the student loss parameter through a mean-square loss function.

7. The method according to claim 6, wherein obtaining the student loss parameter from the to-be-processed features of the student network and the corresponding feature weights comprises:
using the difference between an identity matrix and the feature weights corresponding to the student network as weighted weights of the student network; and
determining the student loss parameter from the to-be-processed features of the student network and the weighted weights.

8. A knowledge distillation apparatus, characterized by comprising:
a parameter acquisition module, configured to obtain to-be-processed features of a to-be-processed picture extracted by a feature extraction layer of a network to be classified, and a classification result obtained by a classification layer of the network to be classified classifying the to-be-processed features, wherein the network to be classified comprises a teacher network and a student network;
a feature weight determination module, configured to determine feature weights of the network to be classified according to the to-be-processed features of the network to be classified;
a target loss function determination module, configured to determine a target loss function used when the teacher network and the student network perform knowledge distillation, according to the to-be-processed features, feature weights, and classification results of the different networks to be classified; and
a network parameter adjustment module, configured to adjust network parameters of the student network according to the target loss function.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the knowledge distillation method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the knowledge distillation method according to any one of claims 1 to 7.
CN202310126575.7A 2023-02-15 2023-02-15 Knowledge distillation methods, apparatus, electronic devices and storage media Active CN116168400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310126575.7A CN116168400B (en) 2023-02-15 2023-02-15 Knowledge distillation methods, apparatus, electronic devices and storage media


Publications (2)

Publication Number Publication Date
CN116168400A true CN116168400A (en) 2023-05-26
CN116168400B CN116168400B (en) 2026-01-06

Family

ID=86410979


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365564A1 (en) * 2017-06-15 2018-12-20 TuSimple Method and device for training neural network
CN114663848A (en) * 2022-03-23 2022-06-24 京东鲲鹏(江苏)科技有限公司 A target detection method and device based on knowledge distillation
CN114912612A (en) * 2021-06-25 2022-08-16 江苏大学 Bird identification method, device, computer equipment and storage medium
CN114972861A (en) * 2022-05-24 2022-08-30 国网重庆市电力公司电力科学研究院 Countermeasure sample generation method, device, equipment and storage medium
CN114997365A (en) * 2022-05-16 2022-09-02 深圳市优必选科技股份有限公司 Knowledge distillation method and device for image data, terminal equipment and storage medium


Non-Patent Citations (1)

Title
GAO XUAN; RAO PENG; LIU GAORUI: "Real-time human action recognition based on feature distillation", Industrial Control Computer, no. 08, 25 August 2020 (2020-08-25), pages 100-102 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant