CN114550236B - Training method, device, equipment and storage medium for image recognition and model thereof

Info

Publication number: CN114550236B
Application number: CN202210082039.7A
Authority: CN
Inventors: 杨馥魁; 温圣召; 韩钧宇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2023-08-15
Anticipated expiration: 2042-01-24
Also published as: CN114550236A

Abstract

The disclosure provides a training method, device, equipment and storage medium for image recognition and a model thereof, relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and computer vision, and can be applied to scenes such as face recognition and face image recognition. The image recognition method comprises the following steps: carrying out feature extraction processing on an image to obtain local features of the image, wherein the local features are used for expressing features in an area of the image; acquiring global features of the image, wherein the global features are used for expressing inter-region features of the image; and acquiring an image recognition result of the image based on the local feature and the global feature. The present disclosure can improve image recognition accuracy.

Description

Image recognition and its model training method, device, equipment and storage medium

Technical field

The present disclosure relates to the field of artificial intelligence, specifically to deep learning and computer vision, and can be applied to scenarios such as face recognition and face image recognition. In particular, it relates to a method, device, equipment, and storage medium for image recognition and for training an image recognition model.

Background

Face recognition is a biometric technology that identifies a person based on facial feature information. Typically, a face recognition model is used to recognize an input face image and obtain a face recognition result.

Summary

The present disclosure provides a method, device, equipment, and storage medium for image recognition and for training an image recognition model.

According to one aspect of the present disclosure, an image recognition method is provided, including: performing feature extraction processing on an image to obtain local features of the image, the local features being used to express intra-regional features of the image; obtaining global features of the image, the global features being used to express inter-regional features of the image; and obtaining an image recognition result of the image based on the local features and the global features.

According to another aspect of the present disclosure, a method for training an image recognition model is provided, including: using an initial image recognition model to perform feature extraction processing on an input image sample to obtain local features of the image sample, the local features being used to express intra-regional features of the image sample; obtaining global features of the image sample, the global features being used to express inter-regional features of the image sample; obtaining a predicted recognition result of the image sample based on the local features and the global features; constructing a loss function based on the predicted recognition result and the real recognition result corresponding to the image sample; and adjusting the parameters of the initial image recognition model based on the loss function to generate the final image recognition model.

According to another aspect of the present disclosure, an image recognition device is provided, including: a first acquisition module, configured to perform feature extraction processing on an image to obtain local features of the image, the local features being used to express intra-regional features of the image; a second acquisition module, configured to obtain global features of the image, the global features being used to express inter-regional features of the image; and a recognition module, configured to obtain an image recognition result of the image based on the local features and the global features.

According to another aspect of the present disclosure, a training device for an image recognition model is provided, including: a first acquisition module, configured to use an initial image recognition model to perform feature extraction processing on an input image sample to obtain local features of the image sample, the local features being used to express intra-regional features of the image sample; a second acquisition module, configured to obtain global features of the image sample, the global features being used to express inter-regional features of the image sample; a prediction module, configured to obtain a predicted recognition result of the image sample based on the local features and the global features; a construction module, configured to construct a loss function based on the predicted recognition result and the real recognition result corresponding to the image sample; and a generation module, configured to adjust the parameters of the initial image recognition model based on the loss function to generate the final image recognition model.

According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of the above aspects.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method according to any one of the above aspects.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the method according to any one of the above aspects.

According to the technical solution of the present disclosure, image recognition accuracy can be improved.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief description of the drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;

FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;

FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;

FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;

FIG. 8 is a schematic diagram according to an eighth embodiment of the present disclosure;

FIG. 9 is a schematic diagram of an electronic device used to implement the image recognition method or the image recognition model training method of the embodiments of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure. This embodiment provides an image recognition method, which includes:

101. Perform feature extraction processing on an image to obtain local features of the image, where the local features are used to express the intra-regional features of each region of the image.

102. Obtain global features of the image, where the global features are used to express the inter-regional features of the respective regions.

103. Obtain an image recognition result of the image based on the local features and the global features.

The method of this embodiment may be executed by an image recognition device, which may be located in an electronic device. The electronic device may be a user terminal or a server. The user terminal may include a personal computer (PC), a mobile device, a smart home device, a wearable device, and the like. Mobile devices include, for example, mobile phones, laptops, and tablets; smart home devices include, for example, smart speakers and smart TVs; wearable devices include, for example, smart watches and smart glasses. The server may be a local server or a cloud server.

Specifically, taking face recognition as the example of image recognition, as shown in FIG. 2, a user can install an application (APP) 201 capable of face recognition on a mobile device (for example, a mobile phone). The APP 201 can capture a face image through a face acquisition device (for example, a camera) on the mobile device. If the mobile device itself has face recognition capability, the processor of the mobile device can then perform face recognition on the captured face image to obtain a face recognition result. Alternatively, the APP 201 can send the face image to the server 202, which can be located on a local server or in the cloud. The APP 201 and the server 202 can exchange data over a communication network. The server 202 can perform face recognition on the received face image to obtain a face recognition result and return the result to the APP 201.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

During image recognition, feature extraction processing can first be performed on the image to obtain image features of the image, which may specifically be a feature map. For the purpose of distinction, because the image is processed as a whole at this stage, the features extracted here can be regarded as intra-regional features and are called local features. When the features are a feature map, the local features can be called a local feature map. Correspondingly, if a set of features can express the inter-regional features between different regions of the image, those features can be called global features.

Taking a face image as the image and a local feature map as the local features, as shown in FIG. 2, the face image can be input into a convolutional neural network (CNN), which performs feature extraction processing on the face image. The output of the convolutional neural network is the local feature map corresponding to the input face image. The network structure of the CNN can be, for example, VGG or ResNet.

FIG. 2 takes face recognition on the server as an example. It can be understood that face recognition can also be performed locally on the user terminal. The CNN that the server uses to extract the local feature map can be called a face recognition model, so the face recognition model can be used to obtain the local feature map, which is then processed further to obtain the face recognition result.

In the related art, after the local feature map of a face image is obtained, the face recognition result is generally obtained directly from that local feature map. That is, the face image features are only local features; because they reflect only local information, the accuracy of this face recognition approach is insufficient.

In this embodiment, both the local features and the global features of the image are obtained, where the global features are used to express the inter-regional features of the image. Local features consider only intra-regional features and can be regarded as local information, while global features consider inter-regional features and can be regarded as global information. The image features of this embodiment therefore include local features and global features, reflecting both the local and the global information of the image to be recognized. This improves the expressive power of the image features and in turn can improve image recognition accuracy.

FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure. This embodiment provides an image recognition method, taking face recognition as an example and using a face recognition model to obtain the local feature map of a face image. The method of this embodiment includes:

301. Use a face recognition model to perform feature extraction processing on an input face image and output a local feature map of the face image.

As shown in FIG. 4, a CNN model can be used to perform feature extraction processing on the face image to obtain the local feature map of the face image.

The dimensions of the local feature map F are (W, H, C).

Here, W, H, and C are all positive integers. C is the number of channels, for example 128. W and H are the numbers of pixels along the width and height of the local feature map on each channel. For example, W = H = 112 means that on each channel the local feature map is 112 pixels wide and 112 pixels high.

The image recognition method provided by the embodiments of the present disclosure can be applied to various image recognition scenarios, such as face recognition and animal or plant species recognition.

Taking the face recognition scenario as an example, the image is a face image, and performing feature extraction processing on the image to obtain the local feature map of the image includes: using a face recognition model to perform feature extraction processing on the input face image and output the local feature map of the face image.

That is, the local feature map can be obtained with a face recognition model, which can be a CNN model such as VGG or ResNet.

Because a face recognition model is usually a deep neural network model, such as the CNN model mentioned above, and CNN models offer advantages such as high accuracy, a more accurate local feature map can be extracted.

302. Obtain a block feature map based on the local feature map.

Region partitioning can be performed on the local feature map to obtain multiple image blocks, and a block feature map that includes the multiple image blocks is determined based on them.

Referring to FIG. 4, after the local feature map F is obtained, region partitioning can be performed on it. The size of each image block can be set in advance. Assuming the size of each image block is w*h, where w and h are positive integers, the number of image blocks is N = (W//w)*(H//h).

Here, w and h are preset values, for example w = h = 2; * denotes multiplication;

// denotes integer division. When setting w and h, values that divide W and H evenly can be chosen.

Assuming W = H = 112 and w = h = 2, the map is divided into N = 56*56 image blocks.

After the multiple image blocks are obtained, the block feature map T can be obtained based on them.

Referring to FIG. 4, the dimensions of the block feature map T are (N, P, C), where N is the number of image blocks, P is the number of pixels in each image block, that is, P = w*h, and C is the number of channels. Based on the above example, N = 56*56, P = 2*2, and C = 128.

When the block feature map is obtained from the local feature map, each channel can be processed separately, so the local feature map and the block feature map have the same number of channels. For each channel, the W*H local feature map can be divided into image blocks of size w*h, giving N = (W//w)*(H//h) image blocks of w*h pixels each. For each image block, the w*h pixels can be converted into a w*h-dimensional vector. For example, if each image block is a matrix of h rows and w columns, the rows can be arranged in order from the first row to the h-th row, each row containing w elements, to form a w*h-dimensional vector. This yields a P-dimensional vector for each image block. Since there are N image blocks, an N*P matrix (or image) is obtained for each channel. The N*P images of the C channels together form a block feature map with dimensions (N, P, C).
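The partitioning described above can be sketched with NumPy as follows. This is a minimal illustration and not the implementation of the disclosure: the function name `partition_blocks`, the use of the first array axis as the width axis, and the flattening of each block in array order are assumptions made for this example.

```python
import numpy as np

def partition_blocks(F, w, h):
    """Split a (W, H, C) feature map into blocks of w x h pixels.

    Returns an array of shape (N, P, C) with N = (W // w) * (H // h)
    and P = w * h. Hypothetical helper; the axis layout is an assumption.
    """
    W, H, C = F.shape
    if W % w != 0 or H % h != 0:
        raise ValueError("w and h must divide W and H evenly")
    nw, nh = W // w, H // h
    # (W, H, C) -> (nw, w, nh, h, C) -> (nw, nh, w, h, C)
    blocks = F.reshape(nw, w, nh, h, C).transpose(0, 2, 1, 3, 4)
    # Flatten the block grid into N and the pixels of each block into P.
    return blocks.reshape(nw * nh, w * h, C)
```

With W = H = 112, w = h = 2, and C = 128, the result has shape (3136, 4, 128), which corresponds to N = 56*56 and P = 2*2 in the example above.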

303. Obtain a region similarity matrix based on the block feature map.

The region similarity matrix indicates the similarity between the multiple image blocks, that is, the inter-regional similarity of the regions. As shown in FIG. 4, the dimensions of the region similarity matrix M are (N, N).

Determining the region similarity matrix based on the block feature map may include:

performing a first shape conversion (reshape) on the block feature map to obtain a first matrix; performing a second shape conversion on the block feature map to obtain a second matrix; and taking the product of the first matrix and the second matrix as the region similarity matrix; where the number of rows of the first matrix and the number of columns of the second matrix are both equal to the number of image blocks.

Specifically, the first matrix can be expressed as: reshape(T, [N, P*C]);

The second matrix can be expressed as: reshape(T, [P*C, N]);

Here, reshape(T, [N, P*C]) means converting the block feature map T with dimensions (N, P, C) into a matrix with dimensions (N, P*C), that is, a matrix with N rows and P*C columns. The shape conversion can be implemented with existing techniques. For example, for each row of T, the pixel values of all channels can be concatenated to obtain the P*C elements of the corresponding row of the first matrix; processing every row in this way yields the first matrix with N rows and P*C columns. The second matrix is obtained in a similar way, giving a matrix with P*C rows and N columns.

After the first matrix and the second matrix are obtained, their product can be taken as the region similarity matrix. Expressed as a formula:

M = reshape(T, [N, P*C]) * reshape(T, [P*C, N]).

By applying the two shape conversions to the block feature map, a first matrix with N rows and a second matrix with N columns are obtained, where N is the number of image blocks. Based on the first matrix and the second matrix, the region similarity between every pair of image blocks can be calculated, and these pairwise similarities form the region similarity matrix, which can therefore express the relationships between different regions.
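The matrix product above can be sketched as follows. This is an illustrative sketch only: the name `region_similarity` is hypothetical, and the second matrix is built here as the transpose of the first matrix. That is an assumption, chosen so that entry (i, j) of M is the inner product of the flattened features of blocks i and j; the text itself only specifies a shape conversion to (P*C, N).

```python
import numpy as np

def region_similarity(T):
    """Compute an (N, N) similarity matrix from a block feature map (N, P, C)."""
    N, P, C = T.shape
    first = T.reshape(N, P * C)   # first matrix: N rows, P*C columns
    # Assumption: take the transpose as the second matrix (P*C rows, N columns),
    # so that M[i, j] is the dot product of block i and block j.
    second = first.T
    return first @ second
```

Under this assumption M is symmetric, and its diagonal holds the squared norm of each block's flattened feature vector.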

304. Obtain a weighted feature map based on the region similarity matrix and the block feature map.

As shown in FIG. 4, after the region similarity matrix is obtained, a weighted calculation can be performed on the region similarity matrix and the block feature map to obtain the weighted feature map B. The dimensions of B are the same as those of the block feature map T, that is, (N, P, C).

Specifically, the region similarity matrix can be normalized to obtain a normalized region similarity matrix, and the product of the normalized region similarity matrix and the block feature map is taken as the weighted feature map.

The normalization can use the softmax function, and the weighted feature map can be calculated as:

B = ∑softmax(M) * T

where M is the region similarity matrix, T is the block feature map, and B is the weighted feature map.

softmax is the softmax function, and ∑ is the summation function.

The normalized M is an N*N matrix. In the summation, for each image block, that is, for each row (or each column) of the normalized M, the element values of that row are added together and the sum is used as the weighting coefficient of the image block corresponding to that row. In the multiplication, for each channel of T, the row-wise weighting coefficients are multiplied by the elements of the corresponding row of T on that channel (N rows in total, P elements per row), which yields B with dimensions (N, P, C).

Because M contains similarity information between different regions, B, which is obtained based on M, contains inter-regional information. That is, B carries rich global information and establishes relationships between different regions.

By obtaining the weighted feature map from the region similarity matrix and the block feature map, a weighted feature map containing global information is obtained, from which the global feature map can then be derived.
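A minimal NumPy sketch of the weighting step as described above is given below. The function name `weighted_feature_map` is hypothetical, and the text does not state over which entries softmax normalizes. This sketch assumes the normalization runs over all entries of M; with a per-row softmax every row would sum to 1 and the row sums would no longer distinguish the blocks.

```python
import numpy as np

def weighted_feature_map(M, T):
    """Weight each block of T (N, P, C) by the row sums of softmax(M)."""
    # Assumption: softmax over all N*N entries, shifted by the max for stability.
    e = np.exp(M - M.max())
    S = e / e.sum()
    # One weighting coefficient per image block: the sum of its row.
    weights = S.sum(axis=1)              # shape (N,)
    # Broadcast the coefficient over the P pixels and C channels of each block.
    return weights[:, None, None] * T    # shape (N, P, C)
```

Other readings of the formula are possible, for example a matrix product of a row-normalized M with the flattened T; the variant shown follows the row-sum description in the text.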

305. Obtain a global feature map based on the weighted feature map.

Shape conversion (reshape) can be applied to the weighted feature map, and the reshaped weighted feature map is used as the global feature map. That is, the weighted feature map B is converted into a feature map whose dimensions match those of the local feature map, and that feature map is used as the global feature map.

The dimensions of the global feature map are (W, H, C).

Because the global feature map is obtained by reshaping the weighted feature map, and the weighted feature map contains rich global information, the global feature map also contains rich global information. This improves the expressive power of the image features and the accuracy of image recognition.

In addition, the global feature map is obtained from the local feature map, specifically by partitioning the local feature map into regions, so the global feature map can be obtained accurately and efficiently.
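The conversion of B back to dimensions (W, H, C) can be sketched as follows. The text only requires a shape conversion; this sketch additionally assumes that the conversion should undo the block layout, with the first axis as the width axis and blocks enumerated in grid order, so that each value returns to its original spatial position. The name `merge_blocks` is hypothetical.

```python
import numpy as np

def merge_blocks(B, W, H, w, h):
    """Rearrange a weighted feature map (N, P, C) into a (W, H, C) map."""
    N, P, C = B.shape
    nw, nh = W // w, H // h
    if N != nw * nh or P != w * h:
        raise ValueError("B does not match the given map and block sizes")
    # (N, P, C) -> (nw, nh, w, h, C) -> (nw, w, nh, h, C) -> (W, H, C)
    return B.reshape(nw, nh, w, h, C).transpose(0, 2, 1, 3, 4).reshape(W, H, C)
```

A plain `B.reshape(W, H, C)` would also produce the required dimensions but would not restore the spatial arrangement of the blocks under this layout.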

306. Obtain a fused feature map based on the local feature map and the global feature map.

The local feature map and the global feature map can be merged along the channel dimension, and the merged feature map is used as the fused feature map.

For example, referring to FIG. 4, the local feature map and the global feature map both have dimensions (W, H, C). After merging along the channel dimension, a fused feature map A with dimensions (W, H, 2*C) is obtained.

The fused feature map can then be used as the final image feature, and image recognition is performed based on it.

Compared with the usual scheme in which image features contain only local features, the image features in this embodiment fuse local features and global features, which improves the expressive power of the image features and the accuracy of image recognition.
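Merging along the channel dimension corresponds to a concatenation on the last axis, sketched below. The function name `fuse_features` is an assumption for illustration.

```python
import numpy as np

def fuse_features(local_map, global_map):
    """Concatenate two (W, H, C) feature maps into one (W, H, 2*C) map."""
    if local_map.shape != global_map.shape:
        raise ValueError("local and global feature maps must have the same shape")
    # The first C channels hold the local features, the last C the global ones.
    return np.concatenate([local_map, global_map], axis=-1)
```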

307、基于所述融合特征图，获取所述人脸图像的人脸识别结果。307. Acquire a face recognition result of the face image based on the fused feature map.

其中，获取人脸图像对应的融合特征图后，可以将所述融合特征图转换为待识别特征向量；确定所述待识别特征向量与预存的多个用户的候选特征向量中各个候选特征向量之间的向量相似度；基于所述向量相似度，确定所述人脸图像属于的用户。Wherein, after the fusion feature map corresponding to the face image is obtained, the fusion feature map can be converted into a feature vector to be identified; The vector similarity among them; based on the vector similarity, determine the user to which the face image belongs.

其中，如图4所示，融合特征图A的维度为(W，H，C)，可以将其展平(flatten)为W*H*C维的向量，该向量可以称为待识别特征向量。Among them, as shown in Figure 4, the dimension of the fusion feature map A is (W, H, C), which can be flattened into a vector of W*H*C dimension, which can be called the feature vector to be identified .

展平可以采用各种相关技术实现，比如，针对每个通道，可以按行排列为W*H维的向量，再将C个通道中每个通道对应的W*H维的向量排列为W*H*C维的向量。Flattening can be achieved using various related technologies. For example, for each channel, it can be arranged in rows as W*H-dimensional vectors, and then the W*H-dimensional vectors corresponding to each channel in the C channels can be arranged as W* A vector of H*C dimensions.

数据库中可以预存多个用户中各个用户的特征向量，各个用户的特征向量可以称为候选特征向量，比如，数据库中存在1万个用户的候选特征向量。The feature vectors of each of the multiple users may be pre-stored in the database, and the feature vectors of each user may be called candidate feature vectors. For example, there are 10,000 candidate feature vectors of users in the database.

可以计算待识别特征向量与各个候选特征向量之间的向量相似度，其中，可以采用各种相关技术中的向量相似度的计算方式，比如，计算向量间的余弦相似度等。之后，可以基于向量相似度获取人脸识别结果，比如，将向量相似度最高的候选特征向量对应的用户，作为识别出的人脸图像属于的用户。The vector similarity between the feature vector to be identified and each candidate feature vector can be calculated, wherein the vector similarity calculation methods in various related technologies can be used, for example, the cosine similarity between vectors can be calculated. Afterwards, the face recognition result can be obtained based on the vector similarity, for example, the user corresponding to the candidate feature vector with the highest vector similarity is used as the user to which the recognized face image belongs.
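As an illustration of the matching step, the following sketch computes the cosine similarity between one query vector and a small set of candidates, then picks the best match. The function name `identify_user`, the user identifiers, and the vectors are invented for the example, and all vectors are assumed to be non-zero.

```python
import numpy as np

def identify_user(query: np.ndarray, candidates: np.ndarray, user_ids: list):
    """Return the user whose candidate vector has the highest cosine similarity."""
    # Normalize so that a dot product equals the cosine similarity.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = candidates @ query            # one similarity per candidate
    best = int(np.argmax(sims))
    return user_ids[best], float(sims[best])

# Hypothetical database of three users with 3-dimensional feature vectors.
candidates = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.6, 0.8, 0.0]])
user_ids = ["user_a", "user_b", "user_c"]
query = np.array([0.5, 0.9, 0.0])

user, score = identify_user(query, candidates, user_ids)
print(user)  # user_c
```

In practice a minimum similarity threshold might also be applied so that faces of unknown users are rejected; the text above does not specify one, so none is shown here.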

通过基于向量间相似度确定图像识别结果，可以简便快速地获取图像识别结果。By determining the image recognition result based on the similarity between vectors, the image recognition result can be obtained easily and quickly.

In this embodiment, a face recognition model is used to obtain the local feature map of the face image; region blocking is performed on the local feature map to obtain a block feature map; a global feature map is obtained based on the block feature map; the local feature map and the global feature map are fused to obtain a fused feature map; and the face recognition result is obtained based on the fused feature map. In this way, the local information and the global information of the face image can be combined during face recognition, which improves the expressive ability of the image features of the face image and thereby the face recognition accuracy.

The image recognition process has been described above. As described, an image recognition model can be used to obtain the local feature map. For the training process of the image recognition model, refer to the following embodiments.

图5是根据本公开第五实施例的示意图。本实施例提供一种图像识别模型的训练方法，该方法包括：FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure. This embodiment provides a training method for an image recognition model, the method comprising:

501、采用初始的图像识别模型，对输入的图像样本进行特征提取处理，以获取所述图像样本的局部特征，所述局部特征用于表达所述图像样本的区域内特征。501. Using an initial image recognition model, perform feature extraction processing on an input image sample to obtain local features of the image sample, where the local features are used to express features within a region of the image sample.

502、获取所述图像样本的全局特征，所述全局特征用于表达所述图像样本的区域间特征。502. Acquire a global feature of the image sample, where the global feature is used to express an inter-region feature of the image sample.

503、基于所述局部特征和所述全局特征，获取所述图像样本的预测识别结果。503. Acquire a predicted recognition result of the image sample based on the local feature and the global feature.

504、基于所述预测识别结果和所述图像样本对应的真实识别结果，构建损失函数。504. Construct a loss function based on the predicted recognition result and the real recognition result corresponding to the image sample.

505、基于所述损失函数，调整所述初始的图像识别模型的参数，以生成最终的图像识别模型。505. Adjust parameters of the initial image recognition model based on the loss function to generate a final image recognition model.

The executor of this embodiment may be a training device for an image recognition model. The training device may be located in an electronic device, which may be a user terminal or a server. The user terminal may include a personal computer (PC), a mobile device, a smart home device, a wearable device, and the like. Mobile devices include, for example, mobile phones, laptops, and tablets; smart home devices include, for example, smart speakers and smart TVs; wearable devices include, for example, smart watches and smart glasses. The server may be a local server, a cloud server, or the like.

其中，模型训练阶段所采用的图像可以称为图像样本，图像样本可以从已有数据集中获取，已有数据集比如为ImageNet。Wherein, the images used in the model training stage may be called image samples, and the image samples may be obtained from existing datasets, such as ImageNet.

In addition, the image recognition results of the image samples can be labeled in advance, manually or by other means, for example by labeling the user identifier of the user to which an image sample belongs. The pre-labeled image recognition result can be called the real recognition result corresponding to the image sample.

与模型应用阶段(即上述的图像识别过程)类似，采用图像识别模型对图像样本进行处理后，可以获取图像识别结果，该图像识别结果可以称为预测识别结果。Similar to the model application stage (that is, the above-mentioned image recognition process), after the image sample is processed by the image recognition model, an image recognition result can be obtained, and the image recognition result can be called a predictive recognition result.

初始的图像识别模型的模型参数可以是人工设置的，或者，初始的图像识别模型可以采用已有的预训练模型。The model parameters of the initial image recognition model can be manually set, or the initial image recognition model can use an existing pre-trained model.

针对从图像样本到预测识别结果的过程，可以包括如下实施例，关于具体内容，可以参见上述关于图像识别过程的实施例，在此不再详述。For the process from image samples to predicted recognition results, the following embodiments may be included. For specific content, reference may be made to the above-mentioned embodiments on the image recognition process, which will not be described in detail here.

In this embodiment, the predicted recognition result can be obtained based on the global features and the local features, so that both the local information and the global information of the image sample are taken into account. This improves the accuracy of the predicted recognition result and, in turn, the accuracy of the image recognition model.

In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the acquiring the global feature of the image includes: performing region blocking on the local feature map to obtain multiple image blocks; determining a block feature map based on the multiple image blocks, the block feature map including the multiple image blocks; determining a region similarity matrix based on the block feature map, the region similarity matrix indicating the similarity between the multiple image blocks; and obtaining the global feature map based on the region similarity matrix and the block feature map.

The global feature map is obtained from the local feature map, specifically from the result of region blocking performed on the local feature map, so the global feature map can be obtained accurately and efficiently. In addition, since the region similarity matrix can express inter-region information, a global feature map that can express inter-region information can be obtained based on the region similarity matrix.
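One possible way to perform region blocking is sketched below. It assumes a non-overlapping grid of equally sized blocks and a local feature map stored with shape (W, H, C); the block size, the block ordering, and the function name `partition_into_blocks` are assumptions made for this sketch rather than details taken from the passage above.

```python
import numpy as np

def partition_into_blocks(local_map: np.ndarray, block_w: int, block_h: int) -> np.ndarray:
    """Split a (W, H, C) map into N non-overlapping blocks of shape (block_w, block_h, C)."""
    w, h, c = local_map.shape
    if w % block_w or h % block_h:
        raise ValueError("block size must evenly divide the feature map")
    nw, nh = w // block_w, h // block_h
    # Separate each spatial axis into (block index, offset inside the block).
    blocks = local_map.reshape(nw, block_w, nh, block_h, c)
    # Bring the two block indices together, then merge them into one axis of length N.
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    return blocks.reshape(nw * nh, block_w, block_h, c)

# 4 x 4 single-channel map split into four 2 x 2 blocks.
local_map = np.arange(16, dtype=float).reshape(4, 4, 1)
blocks = partition_into_blocks(local_map, 2, 2)
print(blocks.shape)  # (4, 2, 2, 1)
```

Here the stacked array of shape (N, block_w, block_h, C) plays the role of the block feature map, with N the number of image blocks.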

In some embodiments, the determining the region similarity matrix based on the block feature map includes: performing first shape conversion processing on the block feature map to obtain a first matrix; performing second shape conversion processing on the block feature map to obtain a second matrix; and using the product of the first matrix and the second matrix as the region similarity matrix; wherein the number of rows of the first matrix and the number of columns of the first matrix are both the number of the multiple image blocks.

By performing two kinds of shape conversion processing on the block feature map, a first matrix with N rows and a second matrix with N columns can be obtained, where N is the number of image blocks. Based on the first matrix and the second matrix, the region similarity between every pair of image blocks can be computed, and these pairwise region similarities make up the region similarity matrix.
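The two shape conversions and the matrix product can be illustrated as follows. The sketch assumes the block feature map is stored as an array of shape (N, block_w, block_h, C); that layout and the function name `region_similarity` are assumptions for illustration.

```python
import numpy as np

def region_similarity(block_map: np.ndarray) -> np.ndarray:
    """Compute an (N, N) matrix of pairwise dot-product similarities between blocks."""
    n = block_map.shape[0]
    first = block_map.reshape(n, -1)       # first shape conversion: N rows
    second = block_map.reshape(n, -1).T    # second shape conversion: N columns
    return first @ second                  # entry (i, j) compares block i with block j

# Hypothetical block feature map: N = 4 blocks of size 2 x 2 with 3 channels.
block_map = np.random.rand(4, 2, 2, 3)
sim = region_similarity(block_map)
print(sim.shape)  # (4, 4)
```

Each entry of the result is the inner product of two flattened blocks, so the matrix is symmetric and its size depends only on the number of blocks, not on the block size or channel count.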

In some embodiments, the obtaining the global feature map based on the region similarity matrix and the block feature map includes: normalizing the region similarity matrix to obtain a normalized region similarity matrix; using the product of the normalized region similarity matrix and the block feature map as a weighted feature map; and performing shape conversion processing on the weighted feature map to obtain the global feature map, wherein the dimensions of the global feature map are consistent with the dimensions of the local feature map.

By obtaining the weighted feature map based on the region similarity matrix and the block feature map, a weighted feature map containing global information can be obtained. Since the global feature map is obtained by performing shape conversion processing on the weighted feature map, and the weighted feature map contains rich global information, the global feature map also contains rich global information. This improves the expressive ability of the image features and thereby the image recognition accuracy.
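A minimal sketch of the normalization, weighting, and shape conversion steps follows. The passage above does not name the normalization, so a row-wise softmax is assumed here; the (N, block_w, block_h, C) block layout, the `grid` argument, and the function names are likewise assumptions for illustration.

```python
import numpy as np

def softmax_rows(matrix: np.ndarray) -> np.ndarray:
    """Row-wise softmax (assumed normalization); each row sums to 1."""
    shifted = matrix - matrix.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def global_feature_map(block_map: np.ndarray, grid: tuple) -> np.ndarray:
    """block_map: (N, bw, bh, C) with N = nw * nh; returns a (nw*bw, nh*bh, C) map."""
    n, bw, bh, c = block_map.shape
    nw, nh = grid
    if n != nw * nh:
        raise ValueError("grid does not match the number of blocks")
    flat = block_map.reshape(n, -1)                 # (N, D)
    sim = softmax_rows(flat @ flat.T)               # normalized region similarity, (N, N)
    weighted = sim @ flat                           # weighted feature map, (N, D)
    # Shape conversion: put every weighted block back at its grid position.
    blocks = weighted.reshape(nw, nh, bw, bh, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(nw * bw, nh * bh, c)

block_map = np.random.rand(4, 2, 2, 3)              # 2 x 2 grid of 2 x 2 blocks
global_map = global_feature_map(block_map, grid=(2, 2))
print(global_map.shape)  # (4, 4, 3)
```

Every output block is a similarity-weighted mixture of all input blocks, which is how information from other regions enters each location, and the output has the same dimensions as the local feature map from which the blocks were cut.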

一些实施例中，所述局部特征为局部特征图，所述全局特征为全局特征图，所述基于所述局部特征和所述全局特征，获取所述图像样本的预测识别结果，包括：对所述局部特征图和所述全局特征图进行融合处理，以获取融合特征图；基于所述融合特征图，获取所述图像样本的预测识别结果。In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the obtaining the predicted recognition result of the image sample based on the local feature and the global feature includes: performing fusion processing on the local feature map and the global feature map to obtain a fusion feature map; based on the fusion feature map, obtaining a prediction recognition result of the image sample.

In some embodiments, the image sample is a face image sample, and the obtaining the predicted recognition result of the image sample based on the fused feature map includes: converting the fused feature map into a feature vector to be identified; determining the vector similarity between the feature vector to be identified and each candidate feature vector among pre-stored candidate feature vectors of multiple users; and determining, based on the vector similarity, user information of the user to which the face image sample belongs, and using the user information as the predicted recognition result.

The user information may be a user identifier, a probability value that the user is the correct recognition result, or the like. Taking the probability value as an example, the real recognition result indicates whether the user is the correct recognition result and may be 1 (correct recognition result) or 0 (not the correct recognition result), while the predicted recognition result is generally a value between 0 and 1.

通过基于向量间相似度确定预测识别结果，可以简便快速地获取预测识别结果。By determining the predicted recognition result based on the similarity between vectors, the predicted recognition result can be obtained easily and quickly.

获取预测识别结果和真实识别结果后，可以构建损失函数。After obtaining the predicted recognition results and the real recognition results, the loss function can be constructed.

For example, referring to FIG. 6, after a face image sample undergoes convolution, region blocking, and other processing, a fused feature map can be obtained. A predicted recognition result can be obtained based on the fused feature map, and a loss function can be constructed based on the predicted recognition result and the real recognition result. The loss function is, for example, a classification loss function, which may specifically be a cross-entropy function.
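For illustration, a cross-entropy loss over predicted class probabilities can be computed as in the sketch below. The multi-class formulation with integer user labels, the clipping constant `eps`, and the sample values are assumptions made for the example.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cross-entropy between predicted probabilities and integer class labels."""
    # Pick, for every sample, the probability assigned to its true class.
    picked = probs[np.arange(len(labels)), labels]
    # Clip to avoid log(0) when a probability is exactly zero.
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

# Two samples, three candidate users; each row is a predicted distribution.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])   # real recognition results as class indices
loss = cross_entropy(probs, labels)
print(round(loss, 4))
```

The loss approaches zero as the probability assigned to the real recognition result approaches 1, and grows as that probability shrinks.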

获取损失函数后，可以基于该损失函数调整图像识别模型的模型参数，其中，可以采用反向传播(Back Propagation，BP)算法等进行模型参数的调整。模型参数调整过程可以设置最大迭代次数，在模型参数的调整次数达到最大迭代次数时，将此时的模型参数对应的模型，作为最终的图像识别模型。After the loss function is obtained, the model parameters of the image recognition model can be adjusted based on the loss function, wherein the model parameters can be adjusted by using a back propagation (Back Propagation, BP) algorithm or the like. The model parameter adjustment process can set a maximum number of iterations, and when the number of adjustments of the model parameters reaches the maximum number of iterations, the model corresponding to the model parameters at this time is used as the final image recognition model.
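The parameter adjustment loop with a maximum number of iterations can be sketched as follows. As a stand-in for the full image recognition model, the sketch trains a single linear layer with softmax output on already-extracted feature vectors, using the analytic cross-entropy gradient that back propagation would produce for one layer. The function name, learning rate, iteration count, and data are all hypothetical.

```python
import numpy as np

def train_linear_classifier(features, labels, num_classes, max_iters=200, lr=0.5):
    """Gradient descent on cross-entropy loss, stopped after max_iters updates."""
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.01, size=(features.shape[1], num_classes))
    one_hot = np.eye(num_classes)[labels]
    for _ in range(max_iters):
        # Forward pass: logits -> softmax probabilities.
        logits = features @ weights
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Backward pass: gradient of the mean cross-entropy w.r.t. the weights.
        grad = features.T @ (probs - one_hot) / len(labels)
        weights -= lr * grad
    # The parameters after the last iteration define the final model.
    return weights

# Toy, linearly separable feature vectors for two users.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
weights = train_linear_classifier(features, labels, num_classes=2)
pred = np.argmax(features @ weights, axis=1)
print(pred)  # [0 0 1 1]
```

A fixed iteration budget is the stopping rule described in the text; other criteria, such as stopping when the loss stops decreasing, could be used instead.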

图7是本公开第七实施例的示意图，本实施例提供一种图像识别装置，该装置700包括：第一获取模块701、第二获取模块702和识别模块703。FIG. 7 is a schematic diagram of a seventh embodiment of the present disclosure. This embodiment provides an image recognition device, and the device 700 includes: a first acquisition module 701 , a second acquisition module 702 and an identification module 703 .

第一获取模块701用于对图像进行特征提取处理，以获取所述图像的局部特征，所述局部特征用于表达所述图像的区域内特征；第二获取模块702用于获取所述图像的全局特征，所述全局特征用于表达所述图像的区域间特征；识别模块703用于基于所述局部特征和所述全局特征，获取所述图像的图像识别结果。The first acquisition module 701 is used to perform feature extraction processing on the image to acquire the local features of the image, and the local features are used to express the features in the region of the image; the second acquisition module 702 is used to acquire the features of the image A global feature, the global feature is used to express the inter-regional feature of the image; the recognition module 703 is used to obtain an image recognition result of the image based on the local feature and the global feature.

In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the second acquisition module 702 is further configured to: perform region blocking on the local feature map to obtain multiple image blocks; determine a block feature map based on the multiple image blocks, the block feature map including the multiple image blocks; determine a region similarity matrix based on the block feature map, the region similarity matrix indicating the similarity between the multiple image blocks; and obtain the global feature map based on the region similarity matrix and the block feature map.

由于区域相似度矩阵可以表达区域间信息，因此，基于区域相似度矩阵可以获得能够表达区域间信息的全局特征图。Since the regional similarity matrix can express inter-regional information, a global feature map that can express inter-regional information can be obtained based on the regional similarity matrix.

In some embodiments, the second acquisition module 702 is further configured to: perform first shape conversion processing on the block feature map to obtain a first matrix; perform second shape conversion processing on the block feature map to obtain a second matrix; and use the product of the first matrix and the second matrix as the region similarity matrix; wherein the number of rows of the first matrix and the number of columns of the first matrix are both the number of the multiple image blocks.

In some embodiments, the second acquisition module 702 is further configured to: normalize the region similarity matrix to obtain a normalized region similarity matrix; use the product of the normalized region similarity matrix and the block feature map as a weighted feature map; and perform shape conversion processing on the weighted feature map to obtain the global feature map, wherein the dimensions of the global feature map are consistent with the dimensions of the local feature map.

In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the recognition module 703 is further configured to: fuse the local feature map and the global feature map to obtain a fused feature map; and obtain the image recognition result of the image based on the fused feature map.

In some embodiments, the image is a face image, and the recognition module 703 is further configured to: convert the fused feature map into a feature vector to be identified; determine the vector similarity between the feature vector to be identified and each candidate feature vector among pre-stored candidate feature vectors of multiple users; and determine, based on the vector similarity, the user to which the face image belongs.

一些实施例中，所述第一获取模块701进一步用于：采用图像识别模型，对输入的所述图像进行特征提取处理，以输出所述图像的局部特征。In some embodiments, the first acquisition module 701 is further configured to: use an image recognition model to perform feature extraction processing on the input image, so as to output local features of the image.

由于人脸识别模型通常为深度神经网络模型，比如，上述的CNN模型，CNN模型具有准确度高等优点，从而可以提取出更准确的局部特征。Since the face recognition model is usually a deep neural network model, such as the above-mentioned CNN model, the CNN model has the advantages of high accuracy, so that more accurate local features can be extracted.

FIG. 8 is a schematic diagram of an eighth embodiment of the present disclosure. This embodiment provides a training device for an image recognition model, and the device 800 includes: a first acquisition module 801, a second acquisition module 802, a prediction module 803, a construction module 804, and a generation module 805.

The first acquisition module 801 is configured to use an initial image recognition model to perform feature extraction processing on an input image sample, so as to obtain local features of the image sample, the local features expressing intra-region features of the image sample. The second acquisition module 802 is configured to obtain global features of the image sample, the global features expressing inter-region features of the image sample. The prediction module 803 is configured to obtain a predicted recognition result of the image sample based on the local features and the global features. The construction module 804 is configured to construct a loss function based on the predicted recognition result and the real recognition result corresponding to the image sample. The generation module 805 is configured to adjust the parameters of the initial image recognition model based on the loss function, so as to generate the final image recognition model.

In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the second acquisition module 802 is further configured to: perform region blocking on the local feature map to obtain multiple image blocks; determine a block feature map based on the multiple image blocks, the block feature map including the multiple image blocks; determine a region similarity matrix based on the block feature map, the region similarity matrix indicating the similarity between the multiple image blocks; and obtain the global feature map based on the region similarity matrix and the block feature map.

In some embodiments, the second acquisition module 802 is further configured to: perform first shape conversion processing on the block feature map to obtain a first matrix; perform second shape conversion processing on the block feature map to obtain a second matrix; and use the product of the first matrix and the second matrix as the region similarity matrix; wherein the number of rows of the first matrix and the number of columns of the first matrix are both the number of the multiple image blocks.

In some embodiments, the second acquisition module 802 is further configured to: normalize the region similarity matrix to obtain a normalized region similarity matrix; use the product of the normalized region similarity matrix and the block feature map as a weighted feature map; and perform shape conversion processing on the weighted feature map to obtain the global feature map, wherein the dimensions of the global feature map are consistent with the dimensions of the local feature map.

In some embodiments, the local feature is a local feature map, the global feature is a global feature map, and the prediction module 803 is further configured to: fuse the local feature map and the global feature map to obtain a fused feature map; and obtain the predicted recognition result of the image sample based on the fused feature map.

In some embodiments, the image sample is a face image sample, and the prediction module 803 is further configured to: convert the fused feature map into a feature vector to be identified; determine the vector similarity between the feature vector to be identified and each candidate feature vector among pre-stored candidate feature vectors of multiple users; and determine, based on the vector similarity, user information of the user to which the face image sample belongs, and use the user information as the predicted recognition result.

可以理解的是，本公开实施例中的“第一”、“第二”等只是用于区分，不表示重要程度高低、时序先后等。It can be understood that the "first" and "second" in the embodiments of the present disclosure are only used for distinction, and do not indicate the level of importance, sequence of time, and the like.

可以理解的是，本公开实施例中，不同实施例中的相同或相似内容可以相互参考。It can be understood that, in the embodiments of the present disclosure, the same or similar content in different embodiments may refer to each other.

根据本公开的实施例，本公开还提供了一种电子设备、一种可读存储介质和一种计算机程序产品。According to the embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.

图9示出了可以用来实施本公开的实施例的示例电子设备900的示意性框图。电子设备旨在表示各种形式的数字计算机，诸如，膝上型计算机、台式计算机、工作台、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置，诸如，个人数字助理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例，并且不意在限制本文中描述的和/或者要求的本公开的实现。FIG. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the electronic device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

电子设备900中的多个部件连接至I/O接口905，包括：输入单元906，例如键盘、鼠标等；输出单元907，例如各种类型的显示器、扬声器等；存储单元908，例如磁盘、光盘等；以及通信单元909，例如网卡、调制解调器、无线通信收发机等。通信单元909允许电子设备900通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Multiple components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard, a mouse, etc.; an output unit 907, such as various types of displays, speakers, etc.; a storage unit 908, such as a magnetic disk, an optical disk etc.; and a communication unit 909, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 901 executes the methods and processes described above, such as the image recognition method or the training method for the image recognition model. For example, in some embodiments, the image recognition method or the training method for the image recognition model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed on the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the image recognition method or the training method for the image recognition model described above can be performed. Alternatively, in other embodiments, the computing unit 901 may be configured in any other appropriate manner (for example, by means of firmware) to execute the image recognition method or the training method for the image recognition model.

Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor, and can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

在本公开的上下文中，机器可读介质可以是有形的介质，其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备，或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed herein.

The specific implementations described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims

1. An image recognition method, comprising:

carrying out feature extraction processing on an image to obtain local features of the image, wherein the local features are used for expressing features in an area of the image;

acquiring global features of the image, wherein the global features are used for expressing inter-region features of the image;

acquiring an image recognition result of the image based on the local feature and the global feature;

wherein the local feature is a local feature map, the global feature is a global feature map, and the acquiring global features of the image includes:

performing regional blocking processing on the local feature map to obtain a plurality of image blocks;

determining a block feature map based on the plurality of image blocks, the block feature map comprising the plurality of image blocks;

determining a region similarity matrix based on the block feature map, wherein the region similarity matrix is used for indicating the similarity among the plurality of image blocks; wherein the determining a region similarity matrix based on the block feature map includes: performing first shape conversion processing on the block feature map to obtain a first matrix; performing second shape conversion processing on the block feature map to obtain a second matrix; and taking the product of the first matrix and the second matrix as the region similarity matrix, wherein the number of rows of the first matrix and the number of columns of the second matrix are each equal to the number of the plurality of image blocks;

acquiring the global feature map based on the region similarity matrix and the block feature map; the dimension of the global feature map is consistent with the dimension of the local feature map, the global feature map is obtained by performing shape conversion on a weighted feature map, and the weighted feature map is obtained by performing weighted calculation on the block feature map and the region similarity matrix.
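
As an illustration only, the blocking and similarity steps recited in claim 1 might be sketched in NumPy as follows. The function names, the square non-overlapping block layout, and the choice of the second shape conversion as the transpose of the first are assumptions made for this sketch; the claim only requires that the first matrix has N rows and the second matrix has N columns, where N is the number of image blocks.

```python
import numpy as np

def split_into_blocks(local_map, block):
    """Split a (C, H, W) local feature map into N non-overlapping blocks.

    Returns a block feature map of shape (N, C, block, block), where
    N = (H // block) * (W // block). Square blocks are an assumption.
    """
    c, h, w = local_map.shape
    assert h % block == 0 and w % block == 0, "H and W must be multiples of block"
    gh, gw = h // block, w // block
    x = local_map.reshape(c, gh, block, gw, block)
    x = x.transpose(1, 3, 0, 2, 4)            # (gh, gw, C, block, block)
    return x.reshape(gh * gw, c, block, block)

def region_similarity(block_map):
    """Compute an (N, N) region similarity matrix from the block feature map."""
    n = block_map.shape[0]
    first = block_map.reshape(n, -1)          # first shape conversion: (N, D)
    second = first.T                          # second shape conversion (assumed): (D, N)
    return first @ second                     # entry (i, j) compares block i with block j
```

With this choice, each entry of the matrix is the inner product of two flattened blocks, so the matrix is symmetric.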

2. The method according to claim 1, wherein the weighted feature map is obtained by performing weighted calculation on the block feature map and the region similarity matrix, and specifically includes:

normalizing the region similarity matrix to obtain a normalized region similarity matrix;

and taking the product of the normalized region similarity matrix and the block feature map as the weighted feature map.
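
The normalization and weighting of claim 2, followed by the shape conversion back to the dimensions of the local feature map, could look roughly like the sketch below. The row-wise softmax is an assumed normalization (the claim does not name one), and `weighted_global_map` and `grid_hw` are hypothetical names introduced for illustration.

```python
import numpy as np

def softmax_rows(m):
    """Row-wise softmax; an assumed choice of normalization."""
    m = m - m.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(m)
    return e / e.sum(axis=1, keepdims=True)

def weighted_global_map(block_map, grid_hw):
    """Turn an (N, C, b, b) block feature map into a (C, H, W) global feature map.

    grid_hw = (gh, gw) is the block grid, with N = gh * gw, H = gh * b, W = gw * b.
    """
    n, c, b, _ = block_map.shape
    gh, gw = grid_hw
    flat = block_map.reshape(n, -1)           # (N, D)
    sim = flat @ flat.T                       # region similarity matrix, (N, N)
    weights = softmax_rows(sim)               # normalized region similarity matrix
    weighted = weights @ flat                 # weighted feature map, (N, D)
    x = weighted.reshape(gh, gw, c, b, b).transpose(2, 0, 3, 1, 4)
    return x.reshape(c, gh * b, gw * b)       # same dimensions as the local feature map
```

Because each row of the normalized matrix sums to one, every output block is a convex combination of all input blocks, which is how inter-region information enters each location.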

3. The method of claim 1, wherein the local feature is a local feature map and the global feature is a global feature map, the acquiring an image recognition result of the image based on the local feature and the global feature comprising:

carrying out fusion processing on the local feature map and the global feature map to obtain a fusion feature map;

and acquiring an image recognition result of the image based on the fusion feature map.

4. The method according to claim 3, wherein the image is a face image, and the acquiring an image recognition result of the image based on the fusion feature map includes:

converting the fusion feature map into feature vectors to be identified;

determining the vector similarity between the feature vector to be identified and each candidate feature vector in the pre-stored candidate feature vectors of a plurality of users;

and determining the user to which the face image belongs based on the vector similarity.
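
A minimal sketch of the fusion and matching steps of claims 3 and 4 is given below. Several choices here are assumptions not stated in the claims: fusion is taken to be an element-wise sum, the feature vector is obtained by simply flattening the fusion feature map, the vector similarity is cosine similarity, and the gallery of candidate vectors is a plain dictionary keyed by hypothetical user identifiers.

```python
import numpy as np

def fuse(local_map, global_map):
    """Fuse two same-shaped feature maps (element-wise sum is an assumption)."""
    return local_map + global_map

def to_unit_vector(feature_map):
    """Flatten to a 1-D vector and L2-normalize it."""
    v = np.asarray(feature_map, dtype=float).reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def identify(fused_map, gallery):
    """Return the best-matching user and all cosine similarity scores.

    gallery maps a hypothetical user id to that user's pre-stored candidate vector.
    """
    query = to_unit_vector(fused_map)
    scores = {user: float(query @ to_unit_vector(vec)) for user, vec in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

A deployed system would likely also apply a minimum-similarity threshold before accepting the best match; that is omitted here.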

5. A method according to any one of claims 1-3, wherein said subjecting the image to a feature extraction process to obtain local features of the image comprises:

adopting an image recognition model to perform feature extraction processing on the input image so as to output local features of the image.

6. A training method of an image recognition model, comprising:

performing feature extraction processing on an input image sample by adopting an initial image recognition model to obtain local features of the image sample, wherein the local features are used for expressing features in a region of the image sample;

acquiring global features of the image sample, wherein the global features are used for expressing inter-region features of the image sample;

based on the local features and the global features, obtaining a prediction recognition result of the image sample;

constructing a loss function based on the predicted recognition result and a real recognition result corresponding to the image sample;

based on the loss function, adjusting parameters of the initial image recognition model to generate a final image recognition model;

determining a region similarity matrix based on the block feature map, wherein the region similarity matrix is used for indicating the similarity among the plurality of image blocks; wherein the determining a region similarity matrix based on the block feature map includes: performing first shape conversion processing on the block feature map to obtain a first matrix; performing second shape conversion processing on the block feature map to obtain a second matrix; and taking the product of the first matrix and the second matrix as the region similarity matrix;

the number of rows of the first matrix and the number of columns of the second matrix are each equal to the number of the plurality of image blocks.
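
The loss construction and parameter adjustment recited in claim 6 can be illustrated with a deliberately small stand-in. In the sketch below, a single linear classifier replaces the image recognition model, the loss is softmax cross-entropy, and the update is plain gradient descent; all three are assumptions chosen for brevity, and the names `cross_entropy` and `train_step` are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy between predicted logits and true class indices."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def train_step(weights, features, labels, lr=0.1):
    """One gradient-descent update of a linear classifier (stand-in for the model)."""
    logits = features @ weights
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0      # d(loss)/d(logits), up to 1/batch
    grad = features.T @ probs / len(labels)
    return weights - lr * grad
```

Here `features` plays the role of the fused local and global features of a batch of image samples, and `labels` the real recognition results; repeating `train_step` lowers the loss on that batch.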

7. The method of claim 6, wherein the weighted feature map is obtained by performing weighted calculation on the block feature map and the region similarity matrix, and specifically includes:

8. The method according to any one of claims 6-7, wherein the local feature is a local feature map and the global feature is a global feature map, the obtaining a predictive recognition result of the image sample based on the local feature and the global feature comprises:

and acquiring a prediction recognition result of the image sample based on the fusion feature map.

9. The method of claim 8, wherein the image sample is a face image sample, and the obtaining a prediction recognition result of the image sample based on the fusion feature map comprises:

converting the fusion feature map into feature vectors to be identified;

and determining user information of a user to which the face image sample belongs based on the vector similarity, and taking the user information as the prediction recognition result.

10. An image recognition apparatus comprising:

the first acquisition module is used for carrying out feature extraction processing on the image so as to acquire local features of the image, wherein the local features are used for expressing the features in the region of the image;

the second acquisition module is used for acquiring global features of the image, wherein the global features are used for expressing inter-region features of the image;

the identification module is used for acquiring an image identification result of the image based on the local feature and the global feature;

wherein the local feature is a local feature map, the global feature is a global feature map, and the second obtaining module is further configured to:

determining a region similarity matrix based on the block feature map, wherein the region similarity matrix is used for indicating the similarity among the plurality of image blocks;

acquiring the global feature map based on the region similarity matrix and the block feature map; the dimension of the global feature map is consistent with the dimension of the local feature map, the global feature map is obtained by performing shape conversion on a weighted feature map, and the weighted feature map is obtained by performing weighted calculation on the block feature map and the region similarity matrix;

wherein the second acquisition module is further configured to:

performing first shape conversion processing on the block feature map to obtain a first matrix;

performing second shape conversion processing on the block feature map to obtain a second matrix;

taking the product of the first matrix and the second matrix as the region similarity matrix;

the number of rows of the first matrix and the number of columns of the second matrix are the number of the plurality of image blocks.

11. The apparatus of claim 10, wherein the second acquisition module is further configured to:

12. The apparatus of claim 10, wherein the local feature is a local feature map and the global feature is a global feature map, and the identification module is further configured to:

13. The apparatus of claim 12, wherein the image is a face image, and the identification module is further configured to:

converting the fusion feature map into feature vectors to be identified;

14. The apparatus of any of claims 10-13, wherein the first acquisition module is further configured to:

15. A training device for an image recognition model, comprising:

the first acquisition module is used for carrying out feature extraction processing on an input image sample by adopting an initial image recognition model so as to acquire local features of the image sample, wherein the local features are used for expressing the features in the region of the image sample;

the second acquisition module is used for acquiring global features of the image sample, wherein the global features are used for expressing inter-region features of the image sample;

the prediction module is used for acquiring a prediction recognition result of the image sample based on the local feature and the global feature;

the construction module is used for constructing a loss function based on the prediction recognition result and the real recognition result corresponding to the image sample;

the generation module is used for adjusting parameters of the initial image recognition model based on the loss function so as to generate a final image recognition model;

wherein the second acquisition module is further configured to:

16. The apparatus of claim 15, wherein the second acquisition module is further configured to:

17. The apparatus of any of claims 15-16, wherein the local feature is a local feature map and the global feature is a global feature map, and the prediction module is further configured to:

18. The apparatus of claim 17, wherein the image sample is a face image sample, and the prediction module is further configured to:

converting the fusion feature map into feature vectors to be identified;

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.