CN111507403B - Image classification method, device, computer equipment and storage medium

Info

Publication number: CN111507403B
Application number: CN202010303814.8A
Authority: CN
Inventors: 李岩; 康斌
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2024-11-05
Anticipated expiration: 2040-04-17
Also published as: CN111507403A

Abstract

The present application relates to the field of artificial intelligence and provides an image classification method, device, computer equipment and storage medium. The method obtains an image to be classified and inputs at least two image features of the image into at least two image classifiers, respectively. The at least two image classifiers correspond to at least two classification levels, and the image features input to the image classifiers of adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between those features. A hierarchical classification result of the image is then obtained from the classification results that the image classifiers output at their respective classification levels. By applying the similarity constraint relationship, the scheme reduces the similarity between the image features fed to the classifiers of adjacent classification levels, so that classifiers at different levels attend to different features of the image, which improves the accuracy of hierarchical image classification.

Description

Image classification method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and in particular to an image classification method, device, computer equipment and storage medium.

Background Art

With the development of artificial intelligence, techniques have emerged for classifying images using deep learning, such as deep neural networks. For example, an image classifier can be built on a deep neural network to classify input images. In traditional image classification tasks, all image categories are generally treated as having equal status, that is, no distinction in level is drawn among the categories. In that setting, an image classifier handles relatively simple tasks easily, such as distinguishing car images from images of other categories such as cats or dogs. Hierarchical image classification, by contrast, first determines whether an image belongs to the animal category or a non-animal category, and then further distinguishes, for example, cats and dogs within the animal category, thereby completing a hierarchical classification task.

Hierarchical image classification methods in the conventional art often feed the image directly into, for example, two different image classifiers, one trained to classify major categories and the other trained to classify the subcategories under those major categories. However, the accuracy of hierarchical classification performed in this way is low.

Summary of the Invention

In view of this, it is necessary to provide an image classification method, device, computer equipment and storage medium that address the above technical problem.

An image classification method, comprising:

obtaining an image to be classified;

inputting at least two image features of the image to be classified into at least two image classifiers, respectively, wherein the at least two image classifiers correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between the image features; and

obtaining a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

An image classification device, comprising:

an image acquisition module, configured to obtain an image to be classified;

a feature input module, configured to input at least two image features of the image to be classified into at least two image classifiers, respectively, wherein the at least two image classifiers correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between the image features; and

a result acquisition module, configured to obtain a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

A computer device, comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the following steps:

obtaining an image to be classified; inputting at least two image features of the image to be classified into at least two image classifiers, respectively, wherein the at least two image classifiers correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between the image features; and obtaining a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image classification method described above.

With the above image classification method, device, computer equipment and storage medium, an image to be classified is obtained and at least two of its image features are input, respectively, into at least two image classifiers. The at least two image classifiers correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between those features. The hierarchical classification result of the image is then obtained from the classification results that the image classifiers output at the corresponding classification levels. By imposing the similarity constraint relationship between image features, the scheme reduces, as far as possible, the similarity between the image features input to the classifiers of adjacent classification levels. The classifiers of different classification levels can therefore attend to different image features of the same image and classify the image at their own levels according to those features. This improves the accuracy of hierarchical image classification and allows the classification tasks at multiple classification levels to be completed at the same time.

Brief Description of the Drawings

FIG. 1 is a diagram of an application environment of an image classification method in one embodiment;

FIG. 2 is a schematic diagram of an image classification task in one embodiment;

FIG. 3 is a schematic flowchart of an image classification method in one embodiment;

FIG. 4 is a schematic flowchart of the steps for constructing an image classifier in one embodiment;

FIG. 5 is a schematic diagram of the principle of image classification in one embodiment;

FIG. 6 is a schematic flowchart of the steps for obtaining sample image features in one embodiment;

FIG. 7 is a schematic diagram of an interface for displaying image information in one embodiment;

FIG. 8 is a schematic diagram of the principle of image classification in an application example;

FIG. 9 is a structural block diagram of an image classification device in one embodiment;

FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

To make the purpose, technical solution and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.

The image classification method provided by the present application can be applied in the application environment shown in FIG. 1, which is a diagram of an application environment of the image classification method in one embodiment. The terminal 110 can communicate with the server 120 over a network. The terminal 110 may be, but is not limited to, a personal computer, laptop, smartphone, tablet computer or portable wearable device, and the server 120 may be implemented as a standalone server or as a server cluster composed of multiple servers.

The image classification method provided by the present application relates to the field of artificial intelligence. Artificial intelligence (AI) comprises the theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results.

Artificial intelligence technology includes computer vision (CV), which refers to machine vision in which terminal devices such as cameras and computers take the place of the human eye to identify, track and measure targets, and which further performs graphics processing so that the computer produces images better suited to human observation or to transmission to instruments for inspection. Computer vision can include technologies such as image recognition and image classification, for example identifying whether an image shows a car, a cat or a dog.

Combining machine learning with computer vision allows a terminal device to classify images intelligently based on the image classification knowledge it has learned. Further, the terminal device can perform hierarchical classification of the images to be classified.

The hierarchical classification task is explained with reference to FIG. 2, a schematic diagram of an image classification task in one embodiment that shows the difference between ordinary classification and hierarchical classification. In an ordinary classification task, all image categories have equal status and the image classifier draws no distinction in level among them; it directly identifies an image as a cat image, a dog image, a bicycle image, a car image, and so on. In practice, however, the relationships between categories differ. Among the four categories cat, dog, car and bicycle, for example, cats and dogs both belong to the animal category and are closely related, whereas the animal category is comparatively distant from the vehicle category to which cars and bicycles belong. A hierarchical classification task can therefore first determine whether an image belongs to the animal category or to a non-animal category such as the vehicle category, and then further identify a cat, dog, bicycle or car within the animal or vehicle category. Here the animal and vehicle categories belong to the same classification level, which can be called the major-class classification level; cats and dogs under the animal category, and bicycles and cars under the vehicle category, belong to another classification level, which can be called the subclass classification level. The image classification method provided by the present application can obtain the classification result of the image to be classified at each classification level and thereby obtain its hierarchical classification result, for example the result that the image belongs to the animal category and is a cat image.
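As an illustration only, the two-level hierarchy described with reference to FIG. 2 could be represented as a simple mapping. The names `HIERARCHY`, `PARENT` and `hierarchical_label` are hypothetical and do not come from the application; this is a minimal sketch of the label structure, not of the classification method itself.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy mirroring the FIG. 2 example in the text:
# major classes at the first level, subclasses at the second level.
HIERARCHY: Dict[str, List[str]] = {
    "animal": ["cat", "dog"],
    "vehicle": ["bicycle", "car"],
}

# Invert the hierarchy so that each subclass maps to its major class.
PARENT: Dict[str, str] = {
    sub: major for major, subs in HIERARCHY.items() for sub in subs
}


def hierarchical_label(subclass: str) -> Tuple[str, str]:
    """Return the (major class, subclass) pair for a known subclass."""
    return PARENT[subclass], subclass
```

Under these assumptions, `hierarchical_label("cat")` yields the pair of labels that the text calls the hierarchical classification result of a cat image.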

The image classification method provided by the present application can be applied to content review and content understanding tasks for any kind of content that contains images. For video services, for example, a frame extraction strategy can be used so that the method supports content review and understanding of the video.
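The application does not specify how frames are extracted from a video. One simple possibility, shown here purely as an assumption, is to sample frames at a fixed interval and treat each sampled frame as an image to be classified. The function name `sample_frame_indices` and the `step` parameter are hypothetical.

```python
from typing import List


def sample_frame_indices(total_frames: int, step: int) -> List[int]:
    """Pick every `step`-th frame index, starting from frame 0.

    Fixed-interval sampling is an assumed strategy; the source only says
    that a frame extraction strategy can be used.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(0, total_frames, step))
```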

Specifically, the image classification method provided by the present application may be executed by the terminal 110 or the server 120 alone, or by the terminal 110 and the server 120 in cooperation.

Execution by the terminal 110 alone is described first. The terminal 110 can obtain an image to be classified and input at least two image features of the image into at least two image classifiers, respectively. The at least two image classifiers can be pre-configured on the terminal 110 and correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between the image features. Finally, the terminal 110 can obtain the classification results of the image at the corresponding classification levels from the image classifiers, and obtain the hierarchical classification result of the image from those classification results.

The method can also be executed by the terminal 110 and the server 120 in cooperation. Specifically, the terminal 110 can obtain the image to be classified and send it to the server 120, and the server 120 inputs at least two image features of the image into at least two image classifiers, respectively. The at least two image classifiers can be pre-configured on the server 120 and correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between the image features. The server 120 can then send the classification results that the image classifiers output for the image at the corresponding classification levels to the terminal 110, and the terminal 110 can obtain the hierarchical classification result of the image from those classification results.

In one embodiment, an image classification method is provided, as shown in FIG. 3, which is a schematic flowchart of the image classification method in one embodiment. The method is described using its application to the terminal 110 in FIG. 1 as an example, and includes the following steps:

Step S301, obtaining an image to be classified;

In this step, the terminal 110 can obtain an image to be classified. The image may be captured by the terminal 110 through an image acquisition device such as a camera, or may be an image pre-stored in an electronic gallery of the terminal 110, and it may contain objects such as animals or plants. Specifically, the terminal 110 can capture an image of a cat in real time through its built-in camera, and that image can serve as the image to be classified.

Step S302, inputting at least two image features of the image to be classified into at least two image classifiers, respectively;

In this step, the terminal 110 can input at least two image features of the image to be classified into at least two image classifiers, respectively. The at least two image classifiers can be pre-configured on the terminal 110 and correspond respectively to at least two classification levels, that is, different image classifiers correspond to different classification levels. With reference to FIG. 2, in a hierarchical classification task the terminal 110 can be configured with, for example, two image classifiers: one classifies at the level of "animals, vehicles", and the other classifies at the level of "cats, dogs and so on under animals; bicycles, cars and so on under vehicles".

Further, the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship, which is used to reduce the similarity between those image features. A three-level classification task serves as an example. Let the first classification level be: plant or animal; the second classification level, taking animals as an example: mammal or reptile; and the third classification level, taking reptiles as an example: lizard or snake. In this case two similarity constraint relationships need to be added. The first is applied between the image features input to the image classifiers of the first and second classification levels, and the second is applied between the image features input to the image classifiers of the second and third classification levels. Image features can usually be represented as vectors, so the similarity constraint relationship can be imposed by reducing the similarity between the vectors, which in turn reduces the similarity between the image features input to the image classifiers of adjacent classification levels.

By way of example, the similarity constraint relationship may include a mutual information constraint relationship or an orthogonal constraint relationship. For the mutual information constraint relationship, the mutual information between the image features input to the image classifiers of adjacent classification levels can be computed, and the similarity between the two image features is reduced by minimizing that mutual information. Similarly, for the orthogonal constraint relationship, the similarity between the two image features is reduced by making the image features input to the image classifiers of adjacent classification levels orthogonal to each other.
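The text does not give a formula for the orthogonal constraint. One common way to encourage orthogonality, used here as an assumption rather than as the application's definition, is to penalize the squared cosine similarity between the feature vectors of adjacent levels; the penalty is zero exactly when each adjacent pair is orthogonal (or one vector is zero). The function names below are hypothetical, and the mutual information variant is not sketched because it requires an estimator that the text does not describe.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def orthogonality_penalty(level_features: Sequence[Sequence[float]]) -> float:
    """Sum of squared cosine similarities over adjacent classification levels.

    level_features[k] is the feature vector fed to the classifier of level k.
    Only adjacent pairs (k, k + 1) are penalized, matching the text's
    statement that K levels need K - 1 constraints.
    """
    return sum(
        cosine_similarity(level_features[k], level_features[k + 1]) ** 2
        for k in range(len(level_features) - 1)
    )
```

In a training setup, a term like this would typically be added to the classification losses with some weight, so that lowering the total loss also pushes the adjacent-level features apart; the weighting is likewise an assumption.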

In this way, the terminal 110 can decouple the two image features input to the image classifiers corresponding to adjacent classification levels, removing the similar parts of the different features so that the two features correspond to different characteristics of the image. Each image classifier can then attend to a different type of characteristic based on its decoupled input feature, use the image feature best suited to its classification level to classify the image more accurately at that level, and complete the classification tasks of multiple classification levels at the same time.

Step S303, obtaining a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

In this step, the terminal 110 can obtain the classification result output by each image classifier, that is, the classification result of the image to be classified at each classification level. With three image classifiers, for example, the terminal 110 can obtain the three classification results that the classifiers output for the three classification levels. The terminal 110 can use all three results as the hierarchical classification result of the image, or select the result of one classification level as the hierarchical classification result it requires. The terminal 110 can thus complete the classification tasks for the image at multiple classification levels at the same time.
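Step S303 can be pictured as collecting one prediction per classification level. The sketch below assumes that each classifier returns a score per label and that the highest-scoring label is taken as that level's result; both the score dictionaries and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def top_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score at one classification level."""
    return max(scores, key=lambda label: scores[label])


def hierarchical_result(per_level_scores: Sequence[Dict[str, float]]) -> List[str]:
    """Combine the per-level predictions, ordered from coarsest to finest level."""
    return [top_label(scores) for scores in per_level_scores]
```

Selecting a single level's result, as the text also allows, amounts to indexing into the returned list.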

In the above image classification method, the terminal 110 obtains an image to be classified and inputs at least two of its image features into at least two image classifiers, respectively. The at least two image classifiers correspond respectively to at least two classification levels, and the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship that is used to reduce the similarity between those features. The terminal 110 can then obtain the hierarchical classification result of the image from the classification results that the image classifiers output at the corresponding classification levels. By imposing the similarity constraint relationship between image features, the scheme reduces, as far as possible, the similarity between the image features input to the classifiers of adjacent classification levels. The classifiers of different classification levels can therefore attend to different image features of the same image and classify the image at their own levels according to those features. This improves the accuracy of hierarchical image classification and allows the classification tasks at multiple classification levels to be completed at the same time.

In one embodiment, inputting the at least two image features of the image to be classified into the at least two image classifiers in step S302 may include:

obtaining the at least two image features through a pre-built feature extractor and inputting them into the at least two image classifiers, respectively.

In this embodiment, the terminal 110 can use a pre-built feature extractor to obtain at least two image features from the image to be classified and input them into the aforementioned at least two image classifiers. The feature extractor is constructed together with the at least two image classifiers on the basis of the aforementioned similarity constraint relationship, and can be implemented with a neural network model. The terminal 110 can input the image to be classified into the neural-network-based feature extractor, divide the image features output by the last convolutional layer of the feature extractor into the aforementioned at least two image features, and input them into the at least two image classifiers, respectively.
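The text says that the features output by the last convolutional layer are divided into at least two image features, but it does not say how. A minimal sketch, assuming the output has been flattened to a vector and is cut into equal contiguous parts, one per classification level (the function name and the equal-split rule are assumptions):

```python
from typing import List, Sequence


def split_features(features: Sequence[float], num_levels: int) -> List[List[float]]:
    """Split a flat feature vector into `num_levels` equal contiguous parts."""
    if num_levels < 2:
        raise ValueError("hierarchical classification needs at least two levels")
    if len(features) % num_levels != 0:
        raise ValueError("feature length must be divisible by the number of levels")
    size = len(features) // num_levels
    return [list(features[i * size:(i + 1) * size]) for i in range(num_levels)]
```

Part `k` of the result would then be passed to the image classifier of classification level `k`.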

In this embodiment, the terminal 110 extracts features from the image to be classified using a feature extractor that was trained in advance, jointly with the at least two image classifiers, under the aforementioned similarity constraint relationship. This yields multiple image features that can be used by the respective image classifiers, without recalculating the similarity between image features each time an image is classified, which improves image classification efficiency.

In one embodiment, the feature extractor may further include a feature extraction network and an encoder, and the step of obtaining at least two image features through the pre-built feature extractor may specifically include:

inputting the image to be classified into the feature extraction network to obtain initial image features output by the feature extraction network; inputting the initial image features into the encoder to obtain encoded initial image features output by the encoder; and obtaining the at least two image features based on the encoded initial image features.

In this embodiment, the feature extractor may further include a feature extraction network and an encoder, both of which can be implemented with a neural network model, such as a ResNet residual network model.

The feature extraction network can be used to obtain the image features of the image to be classified in a preliminary way, but the features it produces are often high-dimensional and contain redundant information. The terminal 110 can therefore first input the image to be classified into the feature extraction network to obtain the initial image features, and then input the initial image features into the encoder. The encoder maps the image features to encoded features, which reduces the feature dimension and removes redundant information from the initial image features. The terminal 110 then takes the encoded initial image features output by the encoder and obtains the at least two image features from them. The feature extraction network, the encoder and the at least two image classifiers are all trained on the basis of the aforementioned similarity constraint relationship. With the scheme of this embodiment, when classifying an image the terminal 110 can directly use the trained feature extraction network to obtain the initial image features, reduce their dimension with the encoder, and divide the encoded initial image features into at least two image features. Because the image features input to the image classifiers of adjacent classification levels are subject to the constraint relationship, both the efficiency and the accuracy of image classification are improved.

In one embodiment, as shown in FIG. 4, which is a flow chart of the steps for constructing the image classifiers in one embodiment, before the at least two image features are obtained through the pre-constructed feature extractor, the feature extractor and the image classifiers may be constructed through the following steps:

Step S401: obtain a sample image, and obtain the classification labels of the sample image at at least two classification levels as the true classification labels of the at least two image classifiers;

In this step, the terminal 110 can obtain sample images, generally a plurality of them. The terminal 110 also needs to obtain the classification labels corresponding to each sample image, which include the classification label of the sample image at each classification level. With reference to FIG. 2, the terminal 110 can obtain a cat image as a sample image, and also needs to obtain the classification labels of the cat image at the two classification levels, namely "animal" and "cat". Furthermore, the terminal 110 can use the classification labels of the sample image at the at least two classification levels as the true classification labels of the at least two image classifiers, for training the image classifiers and the feature extractor.

Step S402: input the sample image into the feature extractor, and obtain at least two sample image features of the same dimension according to the image features of the sample image output by the feature extractor;

Referring to FIG. 5, which is a schematic diagram of the principle of image classification in one embodiment, the terminal 110 inputs the sample image into the feature extractor, and the feature extractor outputs the image features of the sample image. The terminal 110 then divides these sample image features into at least two sample image features of the same dimension. Specifically, the terminal 110 can divide the sample image features into image feature A and image feature B of the same dimension. That is, assuming the dimension of the sample image features is 2d, the terminal 110 splits them into image feature A and image feature B, each of dimension d. For example, assuming the image features of the sample image output by the feature extractor form a 2048-dimensional vector, the first half of the vector, that is, dimensions 0 to 1023, corresponds to image feature A, and the second half, that is, dimensions 1024 to 2047, corresponds to image feature B.
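The split described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the function name `split_features` and the stand-in vector are assumptions made for the example, and only the 2048-to-2×1024 arithmetic comes from the text.

```python
from typing import List, Sequence


def split_features(features: Sequence[float], parts: int = 2) -> List[List[float]]:
    """Split a flat feature vector into `parts` contiguous chunks of equal size."""
    if parts < 2:
        raise ValueError("at least two feature groups are required")
    if len(features) % parts != 0:
        raise ValueError("feature length must be divisible by the number of parts")
    chunk = len(features) // parts
    return [list(features[i * chunk:(i + 1) * chunk]) for i in range(parts)]


# Stand-in for a 2048-dimensional feature vector (each value is just its index).
sample_feature = [float(i) for i in range(2048)]
feature_a, feature_b = split_features(sample_feature)
```

Here `feature_a` holds dimensions 0 to 1023 and `feature_b` holds dimensions 1024 to 2047, matching the example in the text.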

In one embodiment, inputting the sample image into the feature extractor in step S402 may specifically include: preprocessing the sample image to obtain a sample image whose image size is a preset image size; and inputting the sample image of the preset image size into the feature extractor.

In this embodiment, the terminal 110 can preprocess the sample image before inputting it into the feature extractor: the image size of the sample image is adjusted to a preset image size by scaling, and the sample image of the preset image size is then input into the feature extractor. Specifically, training models such as image classifiers and feature extractors usually requires images of a fixed size. This embodiment can therefore scale a sample image of any size to, for example, 256×256, and then randomly crop a 224×224 image from it as the sample image of the preset image size used for training.
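The resize-then-crop preprocessing can be illustrated as follows. This is a sketch under stated assumptions: the image is represented as a plain list of pixel rows, nearest-neighbour sampling is chosen only for brevity (the text does not specify an interpolation method), and all function names are hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Scale an image to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def random_crop(image: Image, crop_h: int, crop_w: int, rng: random.Random) -> Image:
    """Cut a crop_h x crop_w window at a uniformly random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def preprocess(image: Image, rng: random.Random) -> Image:
    """Resize to 256x256, then randomly crop 224x224, as in the text."""
    return random_crop(resize_nearest(image, 256, 256), 224, 224, rng)


# Toy 300x400 input; any input size works.
toy = [[(r + c) % 256 for c in range(400)] for r in range(300)]
out = preprocess(toy, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the crop position reproducible, which is convenient when debugging a training pipeline.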

Step S403: input the at least two sample image features into the at least two image classifiers respectively, and obtain the predicted classification labels of the sample image at the corresponding classification levels output by the at least two image classifiers;

In this step, referring to FIG. 5, the terminal 110 may input image feature A into image classifier A and image feature B into image classifier B. Image classifier A and image classifier B classify the image at different classification levels. During model construction, image classifier A predicts a classification result from the input image feature A to obtain predicted classification label A; likewise, image classifier B obtains predicted classification label B from the input image feature B. A predicted classification label may correspond to the probability of belonging to a certain category at the corresponding classification level. For example, with reference to FIG. 2, the predicted classification label may be the probability that the sample image belongs to "animal" or to "vehicle" at the "animal, vehicle" classification level.
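To make the notion of a predicted classification label as a probability value concrete, the following sketch shows one common way a classifier head can turn a feature into per-class probabilities. The single linear layer followed by softmax, and all weights shown, are assumptions for illustration; the text does not fix the classifier structure.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_classifier(feature, weights, bias) -> List[float]:
    """One fully connected layer followed by softmax (hypothetical classifier head)."""
    logits = [
        sum(w * f for w, f in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical weights for a two-class level ("animal", "vehicle").
feature_a = [1.0, 0.0, 2.0]
weights_a = [[0.5, 0.1, 0.4], [-0.3, 0.2, 0.1]]
bias_a = [0.0, 0.0]
probs = linear_classifier(feature_a, weights_a, bias_a)
```

With these toy weights the first class ("animal") receives the larger probability; a second classifier of the same shape would process image feature B for the other level.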

Step S404: construct a similarity constraint relationship between the sample image features input into the image classifiers corresponding to adjacent classification levels;

In this step, the terminal 110 constructs the similarity constraint relationship between the sample image features input into the image classifiers corresponding to adjacent classification levels.

Step S405: train the feature extractor and the at least two image classifiers based on the true classification labels, the predicted classification labels and the similarity constraint relationship, thereby constructing the feature extractor and the at least two image classifiers.

In the technical solution of this embodiment, the terminal 110 can jointly train the feature extractor and the at least two image classifiers based on the true classification labels of the sample images, the predicted classification labels and the similarity constraint relationship, thereby constructing the feature extractor and the at least two image classifiers. The trained feature extractor can then obtain, from an image to be classified, at least two image features that satisfy the similarity constraint relationship and input them into the at least two image classifiers, achieving fast and accurate classification of the image to be classified.

In one embodiment, as shown in FIG. 6, which is a flow chart of the steps for obtaining sample image features in one embodiment, the feature extractor may include a feature extraction network and an encoder. The step in step S402 of inputting the sample image into the feature extractor and obtaining at least two sample image features of the same dimension according to the image features of the sample image output by the feature extractor may include:

Step S601: input the sample image into the feature extraction network to obtain the initial sample image features output by the feature extraction network;

Step S602: input the initial sample image features into the encoder to obtain the encoded initial sample image features output by the encoder;

Step S603: split the encoded initial sample image features into at least two sample image features of the same dimension.

In this embodiment, the terminal 110 can obtain the aforementioned at least two sample image features using the feature extraction network and the encoder included in the feature extractor. Referring to FIG. 5, which is a schematic diagram of the principle of image classification in one embodiment, the terminal 110 can first input the sample image into the feature extraction network, which performs a first pass of feature extraction on the sample image, and obtain the initial sample image features output by the feature extraction network. As described in the embodiment above, the image features obtained by the feature extraction network often contain redundant information and have a relatively high dimension. The terminal 110 therefore inputs the initial sample image features into the encoder, which maps the image features to encoded features so as to reduce the dimension of the initial sample image features and remove their redundant information. Finally, the terminal 110 splits the encoded initial sample image features output by the encoder into the aforementioned at least two sample image features of the same dimension. With the solution of this embodiment, the feature extraction network, the encoder and the at least two image classifiers can be trained together based on the similarity constraint relationship, so that after training they act as a single image classification tool that classifies images quickly and accurately.

In one embodiment, the step in step S405 of training the feature extractor and the at least two image classifiers based on the true classification labels, the predicted classification labels and the similarity constraint relationship may include:

According to the true classification labels and the predicted classification labels, a first loss function is constructed for each of the at least two classification levels, yielding at least two first loss functions; according to the similarity constraint relationship, a second loss function is constructed; and based on the at least two first loss functions and the second loss function, the feature extractor and the at least two image classifiers are trained so as to maximize the at least two first loss functions and the second loss function.

This embodiment provides a specific way of training the feature extractor and the at least two image classifiers. Specifically, the terminal 110 can construct first loss functions based on the true classification labels and the predicted classification labels. There may be several first loss functions, one for each classification level; for example, with three classification levels there are three first loss functions. In addition, the terminal 110 constructs a second loss function according to the similarity constraint relationship, that is, from the similarity constraint relationship between the image features input into the image classifiers corresponding to adjacent classification levels. Accordingly, with two classification levels one second loss function is constructed, and with three classification levels two second loss functions are constructed. The terminal 110 then uses the at least two first loss functions and the second loss function to train the feature extractor and the at least two image classifiers so as to maximize the at least two first loss functions and the second loss function.
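The counting rule above (one first loss per level, one second loss per pair of adjacent levels) can be checked with a short sketch. The level names and the helper `adjacent_level_pairs` are hypothetical and serve only to illustrate the bookkeeping.

```python
from typing import List, Tuple


def adjacent_level_pairs(levels: List[str]) -> List[Tuple[str, str]]:
    """Pairs of neighbouring levels; each pair gets one similarity (second) loss."""
    return list(zip(levels, levels[1:]))


levels = ["super", "middle", "sub"]  # hypothetical three-level hierarchy
pairs = adjacent_level_pairs(levels)
num_first_losses = len(levels)    # one classification loss per level
num_second_losses = len(pairs)    # one similarity loss per adjacent pair
```

For n classification levels this gives n first loss functions and n - 1 second loss functions, which agrees with the two-level and three-level cases stated in the text.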

Specifically, an image classification task with two classification levels, corresponding to super-category classification and subcategory classification, is used for illustration. Let the super-category label be y_super and the subcategory label be y_sub. The first loss functions corresponding to the two classification levels are:

Here, L_super denotes the first loss function of the super category, L_sub denotes the first loss function of the subcategory, C_super denotes the total number of super categories, and C_sub denotes the total number of subcategories. The remaining symbols denote, respectively, the true classification label of the super category, the true classification label of the subcategory, the predicted classification label of the super category (the probability that the image to be classified belongs to category i at the super-category level), and the predicted classification label of the subcategory (the probability that the image to be classified belongs to category i at the subcategory level).
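The formulas for L_super and L_sub are not reproduced in this text. One form consistent with the definitions above is sketched below. The symbols y_i^{super}, y_i^{sub}, p_i^{super} and p_i^{sub} for the true and predicted labels are assumed notation, and the sign is an assumption chosen to match the statement that the objective is maximized; the conventional cross-entropy loss is the negative of each expression and would be minimized instead.

```latex
L_{super} = \sum_{i=1}^{C_{super}} y_i^{super} \log p_i^{super},
\qquad
L_{sub} = \sum_{i=1}^{C_{sub}} y_i^{sub} \log p_i^{sub}
```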

In addition, let the image features input into the image classifiers of the super category and the subcategory be E_α(x) and E_β(x), respectively. The similarity constraint relationship imposed on the two image features may be a mutual information constraint relationship or an orthogonality constraint relationship. Taking the mutual information constraint as an example, the corresponding second loss function is:

Here, L_mul denotes the second loss function, and r in the mutual information constraint denotes a gradient reversal layer, whose role is to multiply the gradient by -1 when the network back-propagates gradients, that is, to "reverse the gradient". The symbol ⊙ denotes element-wise multiplication. Note that, without further restriction, the value of L_mul can reach negative infinity. To prevent the image classifiers and the feature extractor from learning this degenerate case, L2 normalization can be applied to the decoupled features E_α(x) and E_β(x) to limit their value range. In this way, the second loss function L_mul attains its minimum value of -1 only when the two image features are exactly the same, that is, E_α(x) = E_β(x), and attains its maximum value of 0 when the two image features are completely orthogonal and distinct from each other. Because a gradient reversal layer is used, minimizing the mutual information loss after gradient reversal is equivalent to maximizing the second loss function, that is, requiring the similarity between the two image features to be reduced as much as possible.
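The two boundary values mentioned above can be checked numerically. The sketch below assumes that L_mul is the negative sum of the element-wise product of the L2-normalized features; this form is inferred from the description and is not confirmed by the text, and the function names are hypothetical. The gradient reversal layer acts as the identity in the forward pass, so it does not appear in the value computation.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def similarity_loss(feat_a: Sequence[float], feat_b: Sequence[float]) -> float:
    """Assumed form of L_mul: negative sum of the element-wise product of unit vectors."""
    a, b = l2_normalize(feat_a), l2_normalize(feat_b)
    return -sum(x * y for x, y in zip(a, b))


same = similarity_loss([3.0, 4.0], [3.0, 4.0])        # identical features
orthogonal = similarity_loss([1.0, 0.0], [0.0, 2.0])  # orthogonal features
```

Under this assumed form, identical features give a value of -1 and orthogonal features give a value of 0 (up to floating-point rounding), matching the two cases described in the text.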

Finally, the two first loss functions and the one second loss function together train the entire network cooperatively by maximizing L = L_super + L_sub + L_mul. This includes training the feature extractor and the at least two image classifiers; when the feature extractor includes a feature extraction network and an encoder, the feature extraction network, the encoder and the at least two image classifiers are trained together.

The image to be classified is sent to a server, so that the server inputs the at least two image features of the image to be classified into the at least two image classifiers correspondingly and obtains the classification results of the image to be classified at the corresponding classification levels output by the image classifiers; and the classification results obtained by the server are received.

Referring to FIG. 1, in this embodiment the image classification processing can be handed over to the server 120. Specifically, the terminal 110 can obtain the image to be classified and send it to the server 120, on which the at least two image classifiers described above can be configured in advance. After receiving the image to be classified, the server 120 inputs the at least two image features of the image to be classified into the at least two image classifiers correspondingly, obtains the classification results of the image to be classified at the corresponding classification levels output by the image classifiers, and sends the classification results to the terminal 110. The terminal 110 receives the classification results sent by the server 120.

With the technical solution of this embodiment, the terminal 110 can transfer the image classification task to the server 120 for processing, which reduces the data processing load on the terminal 110.

In one embodiment, as shown in FIG. 7, which is a schematic diagram of an interface for displaying image information in one embodiment, after step S303 (obtaining the hierarchical classification result of the image to be classified according to the classification results of the image to be classified at the corresponding classification levels output by the image classifiers), the method may further include the following steps:

Obtain image classification information carrying the hierarchical classification result; and display the image classification information on the image to be classified.

In this embodiment, the terminal 110 can display the classification result of the image to be classified directly on that image. Referring to FIG. 7, the terminal 110 can display the image to be classified 700. After obtaining the hierarchical classification result of the image to be classified 700, the terminal 110 can display the image classification information carrying the hierarchical classification result in the information display area 710. The hierarchical classification result of the image to be classified 700 may include the super-category classification result A1 and the subcategory classification result B2 of the image. Specifically, assuming the image to be classified 700 is a cat image, the image classification information displayed by the terminal 110 may include the super-category classification result "animal" and the subcategory classification result "cat". With the technical solution of this embodiment, the hierarchical classification result can be overlaid on the image to be classified, which improves the efficiency of presenting the hierarchical classification result.

To illustrate the technical solution provided by the present application more clearly, the principle of image classification is described in detail below with reference to FIG. 8, which is a schematic diagram of the principle of image classification in an application example.

In general, the input image x can be of any size, whereas training the model (including the feature extraction network, the encoder, and the super-category and subcategory classifiers) generally requires images of a fixed size. An image of any size can therefore first be resized to 256×256, after which a 224×224 image is randomly cropped from it as the image to be processed. Next, the feature extraction network extracts the image features f(x). An encoder then maps the image features to the encoded features E(x), which can be regarded as the image features before decoupling. The encoded features are then divided into two parts, corresponding to two decoupled features. The first part, E_α(x), is used to train the super-category classifier, and the second part, E_β(x), is used to train the subcategory classifier. At the same time, a mutual information constraint is imposed between the two decoupled features to reduce the similarity between them. It should be noted that the two parts of features extracted from the image to be processed and input into the super-category classifier and the subcategory classifier need not have any correlation with each other. That is, only the image to be processed needs to be provided, with no special prior processing of its image features. Inputting the image to be processed into the model is sufficient to train the feature extraction network, the encoder and the super-category and subcategory classifiers, and based on the trained model, hierarchical classification is achieved simply by inputting the image to be classified.
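The forward path described above (f(x) mapped to E(x), split into E_α(x) and E_β(x), then classified at each level) can be sketched end to end. This is a toy illustration under stated assumptions: the feature extraction network is replaced by a ready-made vector `f_x`, each layer is a bias-free fully connected layer, and all weights are hand-picked and hypothetical rather than trained.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(weights: Matrix, x: Sequence[float]) -> Vector:
    """Fully connected layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits: Sequence[float]) -> Vector:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_forward(f_x, enc_w, super_w, sub_w) -> Tuple[Vector, Vector]:
    """f(x) -> E(x) -> [E_alpha(x); E_beta(x)] -> per-level probabilities."""
    encoded = dense(enc_w, f_x)  # E(x), dimension 2d
    d = len(encoded) // 2
    e_alpha, e_beta = encoded[:d], encoded[d:]
    return softmax(dense(super_w, e_alpha)), softmax(dense(sub_w, e_beta))


# Toy, hand-picked weights (hypothetical; real weights come from training).
f_x = [1.0, 2.0, 3.0]
enc_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
super_w = [[1.0, 0.0], [0.0, 1.0]]
sub_w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
super_probs, sub_probs = hierarchical_forward(f_x, enc_w, super_w, sub_w)
```

The hierarchical classification result would then be read off as the most probable class at each level, for example `super_probs.index(max(super_probs))` for the super category.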

Specifically, assume the input image is x, its super-category label is y_super, and its subcategory label is y_sub. A feature extraction network is used to extract image features. The structure of the feature extraction network is not specifically limited; for example, various neural network models can be used. In general, the output of the last convolutional layer of the neural network can be taken as the extracted image feature f(x). Next, an encoder maps the extracted image feature f(x) to the encoded feature E(x), whose dimension is 2d. The encoder can be a single fully connected layer. The 2d-dimensional encoded feature E(x) is then split into two parts of the same dimension: E(x) → [E_α(x); E_β(x)]. The features E_α(x) and E_β(x) are both d-dimensional and are used for super-category classification and subcategory classification, respectively. The corresponding loss functions are:

Here, L_super denotes the loss function of the super category, L_sub denotes the loss function of the subcategory, C_super denotes the total number of super categories, and C_sub denotes the total number of subcategories. The remaining symbols denote, respectively, the true classification label of the super category, the true classification label of the subcategory, the predicted classification label of the super category (the probability that the image belongs to category i at the super-category level), and the predicted classification label of the subcategory (the probability that the image belongs to category i at the subcategory level).
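As a numerical illustration of the per-level classification terms, the sketch below evaluates the sum of y_i · log(p_i) over the classes of one level. This sign convention is an assumption chosen to agree with an objective that is maximized; the usual cross-entropy loss is the negative of this value. The function name, the class lists and the probabilities are hypothetical.

```python
import math
from typing import Sequence


def label_log_likelihood(true_label: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of y_i * log(p_i) over the classes of one level (assumed sign convention)."""
    eps = 1e-12  # guard against log(0)
    return sum(y * math.log(max(p, eps)) for y, p in zip(true_label, predicted))


# Cat image with one-hot labels: "animal" out of (animal, vehicle),
# and "cat" out of a hypothetical subcategory list (cat, dog, car).
l_super = label_log_likelihood([1.0, 0.0], [0.9, 0.1])
l_sub = label_log_likelihood([1.0, 0.0, 0.0], [0.7, 0.2, 0.1])
```

Both values are negative and approach 0 as the predicted probability of the true class approaches 1, so maximizing them pushes each classifier toward the true label.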

In addition, to ensure that the features E_α(x) and E_β(x) learn image features that are as different as possible, a mutual information constraint relationship is imposed between the two image features:

Here, L_mul denotes the mutual information loss function, and r in the mutual information constraint denotes a gradient reversal layer, whose role is to multiply the gradient by -1 when the network back-propagates gradients, that is, to "reverse the gradient". The symbol ⊙ denotes element-wise multiplication. Note that, without further restriction, the value of L_mul can reach negative infinity. To prevent the network from learning this degenerate case during training, L2 normalization can be applied to the decoupled features E_α(x) and E_β(x) to limit their value range.
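The formula for L_mul is not reproduced in this text. One form consistent with the description (element-wise product, gradient reversal layer r, value -1 for identical unit-length features and 0 for orthogonal ones) is sketched below. The summation over the d feature components and the index k are assumptions; r is the identity in the forward pass and multiplies the gradient by -1 in the backward pass, and both features are taken to be L2-normalized.

```latex
L_{mul} = -\sum_{k=1}^{d} \Big[\, r\big(E_\alpha(x)\big) \odot r\big(E_\beta(x)\big) \Big]_k,
\qquad
\lVert E_\alpha(x) \rVert_2 = \lVert E_\beta(x) \rVert_2 = 1
```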

With this normalization, the mutual information loss function L_mul attains its minimum value of -1 only when the two image features are exactly the same, that is, E_α(x) = E_β(x), and attains its maximum value of 0 when the two image features are completely orthogonal and distinct from each other. Because a gradient reversal layer is used, minimizing the mutual information loss after gradient reversal is equivalent to maximizing the mutual information loss function, that is, requiring the similarity between the two image features to be reduced as much as possible. Finally, the three loss functions can be used together to train the entire network (including the feature extraction network, the encoder, and the super-category and subcategory classifiers): L = L_super + L_sub + L_mul.
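The behaviour of a gradient reversal layer can be illustrated without a deep learning framework. The sketch below hand-codes the forward and backward passes for one pair of unit-length features; the class `GradientReversal`, the assumed form L_mul = -sum(r(a) ⊙ r(b)) and the example vectors are all illustrative assumptions, not the patented implementation.

```python
from typing import List, Sequence


class GradientReversal:
    """Identity in the forward pass; multiplies incoming gradients by -1 in the backward pass."""

    def forward(self, values: Sequence[float]) -> List[float]:
        return list(values)

    def backward(self, upstream_grad: Sequence[float]) -> List[float]:
        return [-g for g in upstream_grad]


grl = GradientReversal()
feat_a = [0.6, 0.8]  # already unit length
feat_b = [0.8, 0.6]  # already unit length

# Forward: assumed L_mul = -sum(r(a) * r(b)); r leaves the values unchanged.
ra, rb = grl.forward(feat_a), grl.forward(feat_b)
l_mul = -sum(x * y for x, y in zip(ra, rb))

# Backward: d(l_mul)/d(ra) = -rb, which the reversal layer flips to +rb.
grad_wrt_ra = [-y for y in rb]
grad_wrt_feat_a = grl.backward(grad_wrt_ra)
```

The sign flip means that the layers upstream of r receive the opposite update direction from the one the raw loss would give, which is how the text relates minimizing and maximizing the same term.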

The technical solution provided by the above application example decouples the image features of an image into two parts suited to super-category classification and subcategory classification respectively, and uses a mutual information constraint to reduce the similarity between the two parts, so that they attend to different characteristics of the image as far as possible and the hierarchical classification task is completed more effectively.

It should be understood that, although the steps in the flow charts of FIGS. 3 to 6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 3 to 6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 9, which is a structural block diagram of an image classification device in one embodiment, an image classification device is provided. The device may be implemented as a software module, a hardware module, or a combination of the two, forming part of a computer device. The device 900 specifically includes:

An image acquisition module 901, configured to acquire an image to be classified;

A feature input module 902, configured to input at least two image features of the image to be classified into at least two image classifiers correspondingly, where the at least two image classifiers correspond to at least two classification levels respectively, and a similarity constraint relationship holds between the image features input into the image classifiers corresponding to adjacent classification levels, the similarity constraint relationship being used to reduce the similarity between the image features;

a result acquisition module 903, configured to obtain a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.
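The role of the result acquisition module can be illustrated with a minimal Python sketch. It assumes that each image classifier outputs a score per label and that the hierarchical result is simply the top-scoring label at each level, ordered from coarse to fine; the function name `hierarchical_result`, the level names and the scores are all hypothetical and are not taken from the application.

```python
from typing import Dict, List, Tuple


def hierarchical_result(
    level_scores: Dict[str, Dict[str, float]],
    level_order: List[str],
) -> List[Tuple[str, str]]:
    """Pick the top-scoring label at each level, ordered coarse to fine."""
    result = []
    for level in level_order:
        scores = level_scores[level]
        # Assumption: the classification result at a level is the argmax label.
        best_label = max(scores, key=scores.get)
        result.append((level, best_label))
    return result


# Hypothetical classifier outputs for two classification levels.
scores = {
    "large_category": {"animal": 0.9, "vehicle": 0.1},
    "sub_category": {"cat": 0.7, "dog": 0.2, "car": 0.1},
}
print(hierarchical_result(scores, ["large_category", "sub_category"]))
```

Keeping the level order explicit, rather than relying on dictionary ordering, makes the coarse-to-fine structure of the result part of the call itself.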

In one embodiment, the feature input module 902 is further configured to obtain the at least two image features through a pre-constructed feature extractor and input them into the at least two image classifiers correspondingly; the feature extractor and the at least two image classifiers are constructed based on the similarity constraint relationship.

In one embodiment, the feature extractor includes a feature extraction network and an encoder. The feature input module 902 is further configured to: input the image to be classified into the feature extraction network to obtain initial image features output by the feature extraction network; input the initial image features into the encoder to obtain encoded initial image features output by the encoder; and obtain the at least two image features based on the encoded initial image features.
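The data flow from initial image features through the encoder to the per-classifier features can be sketched as follows. This is a minimal illustration under stated assumptions: the encoder is replaced by a plain linear projection, the initial features are a hand-written list standing in for the output of the feature extraction network, and the names `encode` and `split_equal` are hypothetical. The equal-dimension split and the lower encoded dimension follow the description of the training stage and the claims.

```python
from typing import List, Sequence


def encode(features: Sequence[float],
           weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear stand-in for the encoder: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def split_equal(features: Sequence[float], parts: int) -> List[List[float]]:
    """Split a feature vector into `parts` chunks of the same dimension."""
    if parts < 2 or len(features) % parts != 0:
        raise ValueError("feature length must divide evenly into >= 2 parts")
    size = len(features) // parts
    return [list(features[i * size:(i + 1) * size]) for i in range(parts)]


# Stand-in for the initial image features (6 dimensions).
initial = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Hypothetical encoder weights mapping 6 dimensions down to 4.
weights = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
encoded = encode(initial, weights)
coarse_feature, fine_feature = split_equal(encoded, 2)
print(coarse_feature, fine_feature)
```

Each chunk would then be passed to the image classifier for its own classification level.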

In one embodiment, the apparatus 900 may further include:

a classifier construction module, configured to: acquire a sample image, and acquire classification labels of the sample image at at least two classification levels as the true classification labels for the at least two image classifiers; input the sample image into the feature extractor, and obtain at least two sample image features of the same dimension from the image features of the sample image output by the feature extractor; input the at least two sample image features into the at least two image classifiers respectively, and obtain the predicted classification labels, output by the at least two image classifiers, of the sample image at the corresponding classification levels; construct a similarity constraint relationship between the sample image features input to the image classifiers corresponding to adjacent classification levels; and train the feature extractor and the at least two image classifiers based on the true classification labels, the predicted classification labels and the similarity constraint relationship, so as to construct the feature extractor and the at least two image classifiers.

In one embodiment, the feature extractor includes a feature extraction network and an encoder. The classifier construction module is further configured to: input the sample image into the feature extraction network to obtain initial sample image features output by the feature extraction network; input the initial sample image features into the encoder to obtain encoded initial sample image features output by the encoder; and split the initial sample image features into at least two sample image features of the same dimension.

In one embodiment, the classifier construction module is further configured to: construct, according to the true classification labels and the predicted classification labels, a first loss function corresponding to each classification level, obtaining at least two first loss functions; construct a second loss function according to the similarity constraint relationship; and train the feature extractor and the at least two image classifiers based on the at least two first loss functions and the second loss function, such that the at least two first loss functions and the second loss function are maximized.
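How the per-level first loss terms and the second loss term might be computed and combined can be sketched as below. The form of the first loss is not named in this passage, so softmax cross-entropy is used purely as an assumption; the second loss is passed in as a precomputed number, and the weighting parameter `weight` is hypothetical. The sketch only evaluates the terms and leaves out the optimization step that the embodiment describes.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one classifier's raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], true_index: int) -> float:
    """Assumed form of a first loss term: negative log-probability of the true label."""
    return -math.log(softmax(logits)[true_index])


def combined_objective(first_losses: Sequence[float],
                       second_loss: float,
                       weight: float = 1.0) -> float:
    """Sum of the per-level first losses plus a weighted second loss."""
    return sum(first_losses) + weight * second_loss


# Hypothetical logits from the two classifiers for one sample image.
large_loss = cross_entropy([2.0, 0.5], true_index=0)
sub_loss = cross_entropy([0.1, 1.5, 0.3], true_index=1)
print(combined_objective([large_loss, sub_loss], second_loss=0.2))
```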

In one embodiment, the similarity constraint relationship includes a mutual information constraint relationship or an orthogonality constraint relationship.
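As an illustration of the orthogonality option, the following Python sketch scores how far two equal-dimension feature vectors are from being orthogonal. The squared cosine similarity used here is an assumption chosen for simplicity and is not the formula of the application; the function name `orthogonality_penalty` is hypothetical. The value is 0 for orthogonal features and approaches 1 as the features become parallel.

```python
import math
from typing import Sequence


def orthogonality_penalty(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared cosine similarity between two feature vectors (assumed form)."""
    if len(a) != len(b):
        raise ValueError("features must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no direction; treat it as not similar.
        return 0.0
    return (dot / (norm_a * norm_b)) ** 2


print(orthogonality_penalty([1.0, 0.0], [0.0, 1.0]))  # orthogonal features
print(orthogonality_penalty([1.0, 2.0], [2.0, 4.0]))  # parallel features
```

Squaring the cosine makes the score insensitive to sign, so features pointing in opposite directions count as similar as well.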

In one embodiment, the classifier construction module is further configured to: preprocess the sample image to obtain a sample image whose size is a preset image size; and input the sample image of the preset image size into the feature extractor.
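The preprocessing step can be sketched as a resize to a preset image size. The passage does not specify the resampling method, so nearest-neighbor sampling on a nested list of pixel values is used here as an assumption; the function name `resize_nearest` and the example sizes are hypothetical.

```python
from typing import List, Sequence


def resize_nearest(image: Sequence[Sequence[int]],
                   out_h: int, out_w: int) -> List[List[int]]:
    """Resize a 2-D grid of pixel values to (out_h, out_w) by nearest neighbor."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output coordinate back to its source pixel.
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


sample = [[1, 2],
          [3, 4]]
# Assumed preset image size of 4 x 4 for illustration.
for row in resize_nearest(sample, 4, 4):
    print(row)
```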

an information display module, configured to acquire image classification information carrying the hierarchical classification result, and to display the image classification information on the image to be classified.

In one embodiment, the feature input module 902 is further configured to: send the image to be classified to a server, so that the server inputs the at least two image features of the image to be classified into the at least two image classifiers correspondingly and obtains the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels; and receive the classification results obtained by the server.

For specific limitations on the image classification apparatus, refer to the limitations on the image classification method above, which are not repeated here. The modules of the image classification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 10, which is an internal structure diagram of a computer device in one embodiment. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through Wi-Fi, a carrier network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements an image classification method. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a key, trackball or touchpad provided on the housing of the computer device; or an external keyboard, touchpad or mouse.

Those skilled in the art will understand that the structure shown in FIG. 10 is merely a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided. The storage medium stores a computer program that, when executed by a processor, implements the steps of the above method embodiments.

Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.

The above embodiments describe only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. A method of classifying images, the method comprising:

acquiring a sample image, together with a classification label of large-category classification and a classification label of sub-category classification of the sample image, wherein the classification label of the large-category classification and the classification label of the sub-category classification are used respectively as the true classification label of an image classifier for large-category classification and the true classification label of an image classifier for sub-category classification;

inputting the sample image into a feature extraction network of a feature extractor to obtain initial sample image features output by the feature extraction network; inputting the initial sample image features into an encoder of the feature extractor to obtain encoded initial sample image features output by the encoder; splitting the initial sample image features into a first partial feature and a second partial feature of the same dimension;

inputting the first partial feature and the second partial feature into the image classifier for large-category classification and the image classifier for sub-category classification respectively, and acquiring a predicted classification label of the sample image in the large-category classification, output by the image classifier for large-category classification, and a predicted classification label of the sample image in the sub-category classification, output by the image classifier for sub-category classification;

constructing a similarity constraint relationship between the first partial feature and the second partial feature;

constructing first loss functions corresponding respectively to the large-category classification and the sub-category classification according to the true classification labels and the predicted classification labels, to obtain two first loss functions; constructing a second loss function according to the similarity constraint relationship, wherein the similarity constraint relationship is a mutual information constraint relationship, and the second loss function is given by a formula (not reproduced in this text) whose symbols denote, respectively, the second loss function, the gradient reversal layer, an element-wise multiplication operation, the first partial feature and the second partial feature;

training the feature extractor and the two image classifiers based on the two first loss functions and the second loss function such that the two first loss functions and the second loss function are maximized, thereby constructing the feature extractor, the image classifier for large-category classification and the image classifier for sub-category classification;

acquiring an image to be classified;

inputting the image to be classified into the feature extraction network of the pre-constructed feature extractor to obtain initial image features output by the feature extraction network, inputting the initial image features into the encoder of the feature extractor, and performing encoding mapping on the initial image features through the encoder to obtain encoded features from which redundant information has been removed, wherein the feature dimension of the encoded features is lower than the feature dimension of the initial image features; dividing the encoded features into two image features of the same dimension;

inputting the two image features into two image classifiers correspondingly, wherein the two image classifiers correspond respectively to the two classification levels of large-category classification and sub-category classification;

and acquiring a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

2. The method of claim 1, wherein the feature extractor is implemented based on a neural network model.

3. The method of claim 1, wherein the inputting the sample image to a feature extraction network of a feature extractor comprises:

preprocessing the sample image to obtain a sample image with an image size being a preset image size;

and inputting the sample image with the preset image size into a feature extraction network of the feature extractor.

4. The method according to claim 1, wherein the step of obtaining the hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification level output by the image classifier comprises:

acquiring image classification information carrying the hierarchical classification result;

and displaying the image classification information on the image to be classified.

5. The method of claim 1, wherein the inputting the two image features into the two image classifiers comprises:

sending the image to be classified to a server, so that the server inputs the two image features of the image to be classified into the two image classifiers correspondingly to obtain the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels;

and receiving the classification result obtained by the server.

6. An image classification apparatus, the apparatus comprising:

a classifier construction module, configured to: acquire a sample image, together with a classification label of large-category classification and a classification label of sub-category classification of the sample image, which are used respectively as the true classification label of an image classifier for large-category classification and the true classification label of an image classifier for sub-category classification; input the sample image into a feature extraction network of a feature extractor to obtain initial sample image features output by the feature extraction network; input the initial sample image features into an encoder of the feature extractor to obtain encoded initial sample image features output by the encoder; split the initial sample image features into a first partial feature and a second partial feature of the same dimension; input the first partial feature and the second partial feature into the image classifier for large-category classification and the image classifier for sub-category classification respectively, and acquire a predicted classification label of the sample image in the large-category classification, output by the image classifier for large-category classification, and a predicted classification label of the sample image in the sub-category classification, output by the image classifier for sub-category classification; construct a similarity constraint relationship between the first partial feature and the second partial feature; construct first loss functions corresponding respectively to the large-category classification and the sub-category classification according to the true classification labels and the predicted classification labels, to obtain two first loss functions; and construct a second loss function according to the similarity constraint relationship, wherein the similarity constraint relationship is a mutual information constraint relationship, and the second loss function is given by a formula (not reproduced in this text) whose symbols denote, respectively, the second loss function, the gradient reversal layer, an element-wise multiplication operation, the first partial feature and the second partial feature;

an image acquisition module, configured to acquire an image to be classified;

a feature input module, configured to: input the image to be classified into the feature extraction network of the feature extractor to obtain initial image features output by the feature extraction network; input the initial image features into the encoder of the feature extractor, and perform encoding mapping on the initial image features through the encoder to obtain encoded features from which redundant information has been removed, wherein the feature dimension of the encoded features is lower than the feature dimension of the initial image features; divide the encoded features into two image features of the same dimension; and input the two image features into two image classifiers correspondingly, wherein the two image classifiers correspond respectively to the two classification levels of large-category classification and sub-category classification;

a result acquisition module, configured to acquire the hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.

7. The apparatus of claim 6, wherein the feature extractor is implemented based on a neural network model.

8. The apparatus of claim 6, wherein the classifier construction module is further to:

9. The apparatus of claim 6, wherein the apparatus further comprises:

the information display module is used for acquiring image classification information carrying the hierarchical classification result;

10. The apparatus of claim 6, wherein the feature input module is further configured to:

and receiving the classification result obtained by the server.

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.

12. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 5.