
CN115393896A - Cross-modal pedestrian re-identification method, device and medium for infrared and visible light - Google Patents


Info

Publication number
CN115393896A
Authority
CN
China
Prior art keywords: modal, image, cross, pedestrian, visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211011952.4A
Other languages
Chinese (zh)
Inventor
朱锦雷
井焜
刘辰飞
张传锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Synthesis Electronic Technology Co Ltd
Original Assignee
Synthesis Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Synthesis Electronic Technology Co Ltd filed Critical Synthesis Electronic Technology Co Ltd
Priority to CN202211011952.4A
Publication of CN115393896A
Legal status: Pending (current)



Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this specification disclose a cross-modal pedestrian re-identification method, device and medium for infrared and visible light, relating to the technical field of image recognition. The method includes: collecting, by a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, where the pedestrian image includes any one or more of a visible light image and an infrared image; inputting the pedestrian image into a pre-trained cross-modal feature generation model to generate pedestrian cross-modal features corresponding to the pedestrian image, where the cross-modal feature generation model is trained on a dual-modal image dataset and its loss functions include a center maximum mean discrepancy loss, an intra-class heterogeneous center loss and a cross-modal triplet loss; and comparing the pedestrian cross-modal features with features in a pre-built feature library to obtain the specified features in the library that meet the requirements, so as to determine the identification result of the pedestrian to be identified according to those specified features.

Description

A cross-modal pedestrian re-identification method, device and medium for infrared and visible light

Technical Field

This specification relates to the technical field of image recognition, and in particular to a cross-modal pedestrian re-identification method, device and medium for infrared and visible light.

Background Art

Person re-identification (Re-ID) is an AI technique that uses computer vision to retrieve whether a specific pedestrian appears in an image or video sequence, and it has important application significance and prospects in surveillance scenarios such as smart cities. Cross-modal pedestrian recognition mainly refers to the unified representation and recognition of targets such as pedestrians by a cross-modal neural network, based on multi-modal images such as visible light, infrared light and thermal imaging. Visible light cameras have a very limited ability to capture effective information in poor lighting environments, such as nighttime, which restricts their practical applicability; by contrast, infrared cameras depend little on lighting conditions and can capture more information in dark environments. Because of their price advantage, dual-mode cameras (with RGB and infrared modes) have been widely deployed in many surveillance systems. Using dual-mode cameras can effectively overcome the limitations of single-modal pedestrian matching, so research on cross-modal pedestrian re-identification across visible light and infrared images is of great significance for practical applications.

In poor lighting environments, especially at night, visible light and infrared imaging can effectively complement each other. Existing cross-modal pedestrian recognition methods are generally based on either modality-shared feature learning or modal image generation. Modality-shared feature learning maps images of different modalities into the same feature space and learns the features the images have in common, but it ignores the relationship between modality (the whole) and identity (the individual), so the extracted features may still carry modality discrepancies. Modal image generation supplements the missing information by generating images of a specific modality; however, the generated images are usually unstable and their texture may be severely degraded. In summary, accurate results cannot be obtained when performing cross-modal pedestrian recognition with these approaches.

Summary of the Invention

One or more embodiments of this specification provide a cross-modal pedestrian re-identification method, device and medium for infrared and visible light, which are used to solve the following technical problem: accurate recognition results cannot be obtained when performing cross-modal pedestrian recognition.

One or more embodiments of this specification adopt the following technical solutions:

One or more embodiments of this specification provide a cross-modal pedestrian re-identification method for infrared and visible light. The method includes: collecting, by a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, where the pedestrian image includes any one or more of a visible light image and an infrared image; inputting the pedestrian image into a pre-trained cross-modal feature generation model to generate pedestrian cross-modal features corresponding to the pedestrian image, where the cross-modal feature generation model is trained on a dual-modal image dataset and its loss functions include a center maximum mean discrepancy (CMMD) loss, an intra-class heterogeneous center (ICHC) loss and a cross-modal triplet (CMT) loss; and comparing the pedestrian cross-modal features with features in a pre-built feature library to obtain the specified features in the library that meet the requirements, so as to determine the identification result of the pedestrian to be identified according to the specified features, where the feature library includes feature information and identity information of multiple pedestrians.

Further, before inputting the pedestrian image into the pre-trained cross-modal feature generation model to generate the pedestrian cross-modal features corresponding to the pedestrian image, the method further includes: constructing the dual-modal image dataset, where the dataset includes multiple groups of dual-modal images of multiple sample individuals, and each group of dual-modal images includes multiple infrared sample images and multiple visible light sample images; and training a pre-built feature generation model with the dual-modal images in the dataset to determine model parameters, so as to obtain the cross-modal feature generation model under those model parameters.

Further, training the pre-built feature generation model with the dual-modal images in the dataset specifically includes: inputting the dual-modal images in the dataset into the feature generation model, and shortening the distance between the feature centers of the dual-modal images in a Hilbert space, where the feature centers are the centers of the pedestrian features belonging to the different modalities in the dual-modal images; and distinguishing the sample individuals in the dual-modal images individually through a Transformer feature generation network, so as to train the feature generation model and generate cross-modal features of the sample individuals.

Further, shortening the distance between the feature centers of the dual-modal images in the Hilbert space specifically includes: acquiring the visible light sample images, the infrared sample images, a visible light feature extraction function and an infrared feature extraction function of the dual-modal images; defining the center maximum mean discrepancy loss based on the visible light sample images, the infrared sample images, the visible light feature extraction function and the infrared feature extraction function; and mapping the feature centers of the different modalities into the Hilbert space through the center maximum mean discrepancy loss, thereby shortening the center distance between the feature centers of the two modalities of the dual-modal images.

Further, after mapping the feature centers of the different modalities into the Hilbert space through the center maximum mean discrepancy loss and shortening the center distance between the feature centers of the two modalities of the dual-modal images, the method further includes: obtaining the cross-modal class mean center position of the visible light sample images and the cross-modal class mean center position of the infrared sample images, respectively; defining a first center covariance loss based on these two center positions, where the center covariance loss comprises the visible light center covariance loss between the visible light features of the visible light sample images and the cross-modal class mean center of the visible light sample images, and the infrared center covariance loss between the infrared features of the infrared sample images and the cross-modal class mean center of the infrared sample images; defining an overall cross-modal class mean center position of the dual-modal images, and computing, based on the per-modality and overall center positions, a second center covariance loss between the cross-modal class mean centers of the two modalities and the overall cross-modal class mean center; determining the intra-class heterogeneous center loss according to the first center covariance loss and the second center covariance loss; and controlling, through the intra-class heterogeneous center loss, the distance between the sample individuals of each modality and the feature centers of the two modalities of the dual-modal images.

Further, distinguishing the sample individuals in the dual-modal images individually through the Transformer feature generation network to train the feature generation model and generate the cross-modal features of the sample individuals specifically includes: obtaining, from the dual-modal image dataset, a visible light sample image and an infrared sample image corresponding to the same sample individual; extracting the visible light features of the visible light sample image and the infrared features of the infrared sample image; inputting the visible light features and the infrared features into the Transformer feature generation network to generate output visible light features and output infrared features; and constraining the visible light features and the infrared features through the pre-generated cross-modal triplet loss, so as to generate the cross-modal features of the sample individual.

Further, the center maximum mean discrepancy loss is:

$$L_{CMMD}=\left\|\frac{1}{P}\sum_{i=1}^{P}\psi\Big(\frac{1}{K}\sum_{k=1}^{K}f_{V}\big(x_{i,k}^{V}\big)\Big)-\frac{1}{P}\sum_{i=1}^{P}\psi\Big(\frac{1}{K}\sum_{k=1}^{K}f_{I}\big(x_{i,k}^{I}\big)\Big)\right\|_{\mathcal{H}}^{2}$$

where $L_{CMMD}$ is the center maximum mean discrepancy loss, $x_{i,k}^{V}$ is a visible light sample image and $f_{V}$ is the visible light feature extraction function, $x_{i,k}^{I}$ is a near-infrared sample image and $f_{I}$ is the near-infrared feature extraction function, $\psi$ is the reproducing kernel Hilbert space mapping function, $P$ is the number of identity classes of the selected sample individuals, and $K$ is the number of dual-modal images of the sample individuals corresponding to each identity.

Further, before constraining the visible light features and the infrared features through the pre-generated cross-modal triplet loss to generate the cross-modal features of the sample individual, the method further includes: obtaining, from the dual-modal image dataset, a specified infrared sample image and a specified visible light sample image of a specified sample individual, and obtaining a preset infrared sample image and a preset visible light sample image of a preset sample individual whose identity differs from that of the specified sample individual; forming an infrared triplet loss based on the specified infrared sample image and the preset infrared sample image; forming a visible light triplet loss based on the specified visible light sample image and the preset visible light sample image; and defining, within a batch processing group, the cross-modal triplet loss of same-identity sample individuals according to the infrared triplet loss and the visible light triplet loss.

One or more embodiments of this specification provide a cross-modal pedestrian re-identification device for infrared and visible light, including:

at least one processor; and,

a memory communicatively connected to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

collect, by a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, where the pedestrian image includes any one or more of a visible light image and an infrared image; input the pedestrian image into a pre-trained cross-modal feature generation model to generate pedestrian cross-modal features corresponding to the pedestrian image, where the cross-modal feature generation model is trained on a dual-modal image dataset and its loss functions include the center maximum mean discrepancy loss, the intra-class heterogeneous center loss and the cross-modal triplet loss; and compare the pedestrian cross-modal features with the features in a pre-built feature library to obtain the specified features in the library that meet the requirements, so as to determine the identification result of the pedestrian to be identified according to the specified features, where the feature library includes feature information and identity information of multiple pedestrians.

One or more embodiments of this specification provide a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:

collect, by a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, where the pedestrian image includes any one or more of a visible light image and an infrared image; input the pedestrian image into a pre-trained cross-modal feature generation model to generate pedestrian cross-modal features corresponding to the pedestrian image, where the cross-modal feature generation model is trained on a dual-modal image dataset and its loss functions include the center maximum mean discrepancy loss, the intra-class heterogeneous center loss and the cross-modal triplet loss; and compare the pedestrian cross-modal features with the features in a pre-built feature library to obtain the specified features in the library that meet the requirements, so as to determine the identification result of the pedestrian to be identified according to the specified features, where the feature library includes feature information and identity information of multiple pedestrians.

At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects: by combining the center maximum mean discrepancy loss, the intra-class heterogeneous center loss and the cross-modal triplet loss, the model is comprehensively optimized in terms of both modality and identity, the distance between the two modal distributions is fully reduced, the accuracy of the generated cross-modal features is guaranteed, and the ability to identify pedestrians is enhanced, allowing accurate recognition of pedestrian identities.

Brief Description of the Drawings

In order to explain more clearly the technical solutions in the embodiments of this specification or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

Fig. 1 is a schematic flowchart of a cross-modal pedestrian re-identification method for infrared and visible light provided by an embodiment of this specification;

Fig. 2 is a schematic diagram of a method for shortening the spatial distance and fusing cross-modal features provided by an embodiment of this specification;

Fig. 3 is a schematic structural diagram of a framework for generating cross-modal pedestrian features of infrared and visible light provided by an embodiment of this specification;

Fig. 4 is a schematic structural diagram of a cross-modal pedestrian re-identification device for infrared and visible light provided by an embodiment of this specification.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be described clearly and completely below in conjunction with the drawings in the embodiments of this specification. Obviously, the described embodiments are only some of the embodiments of this specification rather than all of them. Based on the embodiments of this specification, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of this specification.

An embodiment of this specification provides a cross-modal pedestrian re-identification method for infrared and visible light. It should be noted that the execution subject in the embodiments of this specification may be a server or any device with data processing capability. Fig. 1 is a schematic flowchart of a cross-modal pedestrian re-identification method for infrared and visible light provided by an embodiment of this specification. As shown in Fig. 1, the method mainly includes the following steps:

Step S101: collect, by a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified.

The pedestrian image includes any one or more of a visible light image and an infrared image.

In one embodiment of this specification, the dual-mode acquisition device can be understood as a dual-mode camera, which has two modes: a visible light mode and an infrared mode. To improve the accuracy of pedestrian re-identification and avoid the visible light camera being affected by dark environments, a dual-mode camera is used to collect pedestrian images of the pedestrian to be identified.

Step S102: input the pedestrian image into a pre-trained cross-modal feature generation model to generate pedestrian cross-modal features corresponding to the pedestrian image.

The cross-modal feature generation model is trained on a dual-modal image dataset, and its loss functions include the center maximum mean discrepancy loss, the intra-class heterogeneous center loss and the cross-modal triplet loss.

In one embodiment of this specification, the pedestrian image corresponding to the pedestrian to be identified is input into the pre-trained cross-modal feature generation model to output the pedestrian cross-modal features corresponding to the pedestrian image. It should be noted that, when extracting features, the cross-modal generation model of this embodiment first shortens the distance between the two modalities in the Hilbert space: the center maximum mean discrepancy loss clusters samples of the same identity across the two modalities, and the intra-class heterogeneous center loss computes the center-distance loss of each individual's cross-modal features, thereby pulling the centers of the two modalities closer together. Then a Transformer feature generation network together with the cross-modal triplet loss distinguishes pedestrian image individuals of different identities, pulling cross-modal images of the same identity closer and pushing cross-modal images of different identities apart.

Before the pedestrian image is input into the pre-trained cross-modal feature generation model to generate the pedestrian cross-modal features corresponding to the pedestrian image, the method further includes: constructing the dual-modal image dataset, where the dataset includes multiple groups of dual-modal images of multiple sample individuals, and each group of dual-modal images includes multiple infrared sample images and multiple visible light sample images; and training the pre-built feature generation model with the dual-modal images in the dataset to determine the model parameters, so as to obtain the cross-modal feature generation model under those model parameters.

In one embodiment of this specification, a large dual-modal image dataset is obtained. The dataset includes multiple groups of dual-modal images of multiple sample individuals; each group includes multiple infrared sample images and multiple visible light sample images, and each sample individual has a distinct identity. The feature generation model is trained with the dual-modal images in the dataset to determine the model parameters, yielding the cross-modal feature generation model under parameters that meet the requirements.
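For illustration only, the following Python sketch shows one way such a dual-modal dataset could be organized for training; the class name DualModalDataset and the visible_paths/infrared_paths fields are assumptions introduced here, not part of the patent.

```python
# Minimal sketch of a dual-modal (visible + infrared) dataset grouped per
# identity; names and layout are illustrative assumptions, not from the patent.
import random
from PIL import Image
from torch.utils.data import Dataset

class DualModalDataset(Dataset):
    def __init__(self, identities, transform=None):
        # identities: list of dicts such as
        # {"pid": 0, "visible_paths": [...], "infrared_paths": [...]}
        self.identities = identities
        self.transform = transform

    def __len__(self):
        return len(self.identities)

    def __getitem__(self, idx):
        ident = self.identities[idx]
        vis = Image.open(random.choice(ident["visible_paths"])).convert("RGB")
        ir = Image.open(random.choice(ident["infrared_paths"])).convert("RGB")
        if self.transform is not None:
            vis, ir = self.transform(vis), self.transform(ir)
        return vis, ir, ident["pid"]
```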

Training the pre-built feature generation model with the dual-modal images in the dataset specifically includes: inputting the dual-modal images in the dataset into the feature generation model, and shortening the distance between the feature centers of the dual-modal images in the Hilbert space, where the feature centers are the centers of the pedestrian features belonging to the different modalities in the dual-modal images; and distinguishing the sample individuals in the dual-modal images individually through the Transformer feature generation network, so as to train the feature generation model and generate the cross-modal features of the sample individuals.

Shortening the distance between the feature centers of the dual-modal images in the Hilbert space specifically includes: acquiring the visible light sample images, the infrared sample images, the visible light feature extraction function and the infrared feature extraction function of the dual-modal images; defining the center maximum mean discrepancy loss based on them; and mapping the feature centers of the different modalities into the Hilbert space through the center maximum mean discrepancy loss, shortening the center distance between the feature centers of the two modalities of the dual-modal images. The center maximum mean discrepancy loss is:

$$L_{CMMD}=\left\|\frac{1}{P}\sum_{i=1}^{P}\psi\Big(\frac{1}{K}\sum_{k=1}^{K}f_{V}\big(x_{i,k}^{V}\big)\Big)-\frac{1}{P}\sum_{i=1}^{P}\psi\Big(\frac{1}{K}\sum_{k=1}^{K}f_{I}\big(x_{i,k}^{I}\big)\Big)\right\|_{\mathcal{H}}^{2}$$

where $L_{CMMD}$ is the center maximum mean discrepancy loss, $x_{i,k}^{V}$ is a visible light sample image and $f_{V}$ is the visible light feature extraction function, $x_{i,k}^{I}$ is a near-infrared sample image and $f_{I}$ is the near-infrared feature extraction function, and $\psi$ is the reproducing kernel Hilbert space mapping function.

In one embodiment of this specification, each modality is treated as a whole, and the centers of the two modalities are pulled together. If the Maximum Mean Discrepancy (MMD) loss were used directly to reduce the inter-domain difference, then, since every sample participates in shortening the distance between the two distributions, the MMD loss would fluctuate violently during training and be overly sensitive to local samples, which is not conducive to stably pulling the centers of the two modalities together. Because the sample center contains information about each sample to a certain extent and can therefore represent the distribution, a Center Maximum Mean Discrepancy (CMMD) loss is designed to measure the center distance between two modality-different but distribution-related modalities. P sample individuals, corresponding to P identity classes, are randomly drawn from the dataset, and K sample images are selected for each identity to form a batch processing group (Batch), with the two modal batches sharing the same identity classes. That is, P is the number of identity classes of the selected sample individuals, and K is the number of dual-modal images of the sample individuals per identity.
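As a hedged sketch of the P×K batch construction just described, the helper below samples P identities and K images per identity and modality; the function name and the dataset layout are assumptions made for illustration.

```python
# Sketch: build a batch of P identities with K images per identity in each
# modality, so the visible and infrared halves share the same identity classes.
import random

def sample_pk_batch(dataset, p, k):
    """dataset: mapping pid -> {"visible": [paths], "infrared": [paths]}."""
    pids = random.sample(list(dataset.keys()), p)
    visible_batch, infrared_batch, labels = [], [], []
    for pid in pids:
        # choices() samples with replacement, so identities with fewer than
        # k images per modality remain usable.
        visible_batch += random.choices(dataset[pid]["visible"], k=k)
        infrared_batch += random.choices(dataset[pid]["infrared"], k=k)
        labels += [pid] * k
    return visible_batch, infrared_batch, labels
```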

CMMD is mainly concerned with the distance between feature centers, and its loss function is defined as:

$$L_{CMMD}=\frac{1}{P^{2}}\sum_{i=1}^{P}\sum_{j=1}^{P}\Big[KN\big(c_{i}^{V},c_{j}^{V}\big)+KN\big(c_{i}^{I},c_{j}^{I}\big)-2\,KN\big(c_{i}^{V},c_{j}^{I}\big)\Big]$$

where $c_{i}^{V}=\frac{1}{K}\sum_{k=1}^{K}f_{V}(x_{i,k}^{V})$ and $c_{i}^{I}=\frac{1}{K}\sum_{k=1}^{K}f_{I}(x_{i,k}^{I})$ are the per-identity feature centers, $L_{CMMD}$ is the center maximum mean discrepancy loss, $x_{i,k}^{V}$ is a visible light sample image and $f_{V}$ is the visible light feature extraction function, $x_{i,k}^{I}$ is a near-infrared sample image and $f_{I}$ is the near-infrared feature extraction function, $\psi$ is the reproducing kernel Hilbert space mapping function implicitly defined by the kernel, and $KN$ is the Gaussian kernel function, defined as:

$$KN(x,y)=\exp\!\Big(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\Big)$$

where $\sigma$ is a hyperparameter.
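A minimal PyTorch sketch of a CMMD-style loss consistent with the description above: per-identity feature centers are compared with a Gaussian-kernel MMD. This is one illustrative reading, with σ and the mean reduction as assumptions, not the patent's reference implementation.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # KN(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), computed pairwise.
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def cmmd_loss(feat_v, feat_i, labels, sigma=1.0):
    """feat_v, feat_i: (P*K, D) features; labels: (P*K,) identity ids shared
    by the visible and infrared halves of the batch."""
    centers_v, centers_i = [], []
    for pid in labels.unique():
        mask = labels == pid
        centers_v.append(feat_v[mask].mean(dim=0))  # per-identity visible center
        centers_i.append(feat_i[mask].mean(dim=0))  # per-identity infrared center
    cv, ci = torch.stack(centers_v), torch.stack(centers_i)
    # Empirical squared MMD between the two sets of modality centers.
    return (gaussian_kernel(cv, cv, sigma).mean()
            + gaussian_kernel(ci, ci, sigma).mean()
            - 2 * gaussian_kernel(cv, ci, sigma).mean())
```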

After the feature centers of the different modalities are mapped into the Hilbert space through the center maximum mean discrepancy loss and the center distance between the feature centers of the two modalities of the dual-modal images is shortened, the method further includes: obtaining the cross-modal class mean center position of the visible light sample images and that of the infrared sample images, respectively; defining a first center covariance loss based on these two center positions, where the center covariance loss comprises the visible light center covariance loss between the visible light features of the visible light sample images and the cross-modal class mean center of the visible light sample images, and the infrared center covariance loss between the infrared features of the infrared sample images and the cross-modal class mean center of the infrared sample images; defining the overall cross-modal class mean center position of the dual-modal images, and computing the second center covariance loss between the cross-modal class mean centers of the two modalities and the overall cross-modal class mean center; determining the intra-class heterogeneous center loss according to the first and second center covariance losses; and controlling, through the intra-class heterogeneous center loss, the distance between the sample individuals of each modality and the feature centers of the two modalities of the dual-modal images.

$L_{CMMD}$ controls the center distance between the data of the two modalities; next, the distance from each class instance to its class center within each modality is controlled.

First, taking the images of both modalities in the dual-modal images as a whole, the overall cross-modal class mean center $C_{i}$ of the dual-modal images is defined as:

$$C_{i}=\frac{1}{2K}\Big(\sum_{k=1}^{K}f_{V}\big(x_{i,k}^{V}\big)+\sum_{k=1}^{K}f_{I}\big(x_{i,k}^{I}\big)\Big)$$

The cross-modal class mean center $C_{i}^{V}$ of the visible light sample images is defined as:

$$C_{i}^{V}=\frac{1}{K}\sum_{k=1}^{K}f_{V}\big(x_{i,k}^{V}\big)$$

and the cross-modal class mean center $C_{i}^{I}$ of the infrared sample images as:

$$C_{i}^{I}=\frac{1}{K}\sum_{k=1}^{K}f_{I}\big(x_{i,k}^{I}\big)$$

From the cross-modal class mean centers of the visible light and infrared sample images, the first center covariance loss $L_{ICHC\text{-}A}$ is defined as:

$$L_{ICHC\text{-}A}=\sum_{i=1}^{P}\sum_{k=1}^{K}\Big(\big\|f_{V}\big(x_{i,k}^{V}\big)-C_{i}^{V}\big\|_{2}^{2}+\big\|f_{I}\big(x_{i,k}^{I}\big)-C_{i}^{I}\big\|_{2}^{2}\Big)$$

That is, the center covariance loss is the sum of the visible light center covariance loss between the visible light features of the visible light sample images and their cross-modal class mean center, and the infrared center covariance loss between the infrared features of the infrared sample images and their cross-modal class mean center.

From the overall cross-modal class mean center and the cross-modal class mean centers of the visible light and infrared sample images, the second center covariance loss $L_{ICHC\text{-}B}$ between the per-modality centers and the overall center is computed as:

$$L_{ICHC\text{-}B}=\sum_{i=1}^{P}\Big(\big\|C_{i}^{V}-C_{i}\big\|_{2}^{2}+\big\|C_{i}^{I}-C_{i}\big\|_{2}^{2}\Big)$$

The sum of the first center covariance loss and the second center covariance loss is taken as the intra-class heterogeneous center loss: $L_{ICHC}=L_{ICHC\text{-}A}+L_{ICHC\text{-}B}$.
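The two covariance terms above translate directly into a few lines of PyTorch. The sketch below reuses the per-batch center computation; plain summation without normalization follows the equations as reconstructed here and is an assumption.

```python
import torch

def ichc_loss(feat_v, feat_i, labels):
    """Intra-class heterogeneous center loss (sketch).
    feat_v, feat_i: (P*K, D) features; labels: (P*K,) shared identity ids."""
    loss_a = feat_v.new_zeros(())
    loss_b = feat_v.new_zeros(())
    for pid in labels.unique():
        mask = labels == pid
        c_v = feat_v[mask].mean(dim=0)    # visible class center C_i^V
        c_i = feat_i[mask].mean(dim=0)    # infrared class center C_i^I
        c_all = 0.5 * (c_v + c_i)         # overall cross-modal center C_i
        # L_ICHC-A: pull instances toward their per-modality class centers.
        loss_a = loss_a + ((feat_v[mask] - c_v).pow(2).sum()
                           + (feat_i[mask] - c_i).pow(2).sum())
        # L_ICHC-B: pull per-modality centers toward the overall center.
        loss_b = loss_b + ((c_v - c_all).pow(2).sum()
                           + (c_i - c_all).pow(2).sum())
    return loss_a + loss_b
```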

In one embodiment of this specification, the center maximum mean discrepancy loss and the intra-class heterogeneous center loss are combined as the loss for shortening the distance between the centers of the two modalities. The final loss is $L_{Whole}=L_{CMMD}+L_{ICHC}$, where $L_{Whole}$ is the final loss function.

To further control instance feature similarity, a cross-entropy loss $L_{CE}$ is introduced:

$$L_{CE}=-\sum_{i}\log p_{y_{i}}\big(x_{i}\big)$$

where $x_{i}$ is an input sample, $y_{i}$ is the corresponding label, and $p_{j}(x_{i})$ is the $j$-th probability value of the $1\times n$ FC feature. The modal distance loss $L_{W}$ is then defined as $L_{W}=L_{CE}+\lambda L_{Whole}$, where $\lambda$ is a hyperparameter.
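Combining the pieces, here is a sketch of the stage-one objective under the definitions above, reusing the cmmd_loss and ichc_loss sketches; the classifier head, the value of λ and summing the two modality cross-entropies are assumptions.

```python
import torch.nn.functional as F

def stage_one_loss(logits_v, logits_i, feat_v, feat_i, labels, lam=0.1, sigma=1.0):
    """L_W = L_CE + lambda * (L_CMMD + L_ICHC); labels must be class indices."""
    l_ce = F.cross_entropy(logits_v, labels) + F.cross_entropy(logits_i, labels)
    l_whole = cmmd_loss(feat_v, feat_i, labels, sigma) + ichc_loss(feat_v, feat_i, labels)
    return l_ce + lam * l_whole
```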

Distinguishing the sample individuals in the dual-modal images individually through the Transformer feature generation network to train the feature generation model and generate the cross-modal features of the sample individuals specifically includes: obtaining, from the dual-modal image dataset, the visible light sample image and the infrared sample image corresponding to the same sample individual; extracting the visible light features of the visible light sample image and the infrared features of the infrared sample image; inputting the visible light features and the infrared features into the Transformer feature generation network to generate output visible light features and output infrared features; and constraining the visible light features and the infrared features through the pre-generated cross-modal triplet loss to generate the cross-modal features of the sample individual.

Before the visible light features and the infrared features are constrained through the pre-generated cross-modal triplet loss to generate the cross-modal features of the sample individual, the method further includes: obtaining, from the dual-modal image dataset, a specified infrared sample image and a specified visible light sample image of a specified sample individual, and obtaining a preset infrared sample image and a preset visible light sample image of a preset sample individual whose identity differs from that of the specified sample individual; forming the infrared triplet loss based on the specified infrared sample image and the preset infrared sample image; forming the visible light triplet loss based on the specified visible light sample image and the preset visible light sample image; and defining, within the batch processing group, the cross-modal triplet loss of same-identity sample individuals according to the infrared triplet loss and the visible light triplet loss.

In one embodiment of this specification, to reduce the distance between cross-modal images with the same identity, a Transformer feature generation network is introduced for feature generation. The Transformer feature generation network mainly consists of an Encoder and a Decoder. During training, features $f_{i}^{V}$ and $f_{i}^{I}$ of the same identity but different modalities are input into the fusion network, which generates an output feature from $f_{i}^{V}$ and, likewise, an output feature from $f_{i}^{I}$; the generated features are then constrained with the cross-modal triplet loss.
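A minimal sketch of such an Encoder-Decoder fusion module in PyTorch follows; the feature dimension, the number of heads/layers and the use of standard nn.Transformer blocks are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Sketch: generate a cross-modal feature from same-identity features."""
    def __init__(self, dim=2048, n_heads=8, n_layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=n_layers)
        self.decoder = nn.TransformerDecoder(dec, num_layers=n_layers)

    def forward(self, src_feat, tgt_feat):
        # src_feat, tgt_feat: (B, 1, dim) single-token feature sequences.
        memory = self.encoder(src_feat)
        return self.decoder(tgt_feat, memory)

# Usage sketch: generate features across modalities for the same identities.
gen = FeatureGenerator()
f_v = torch.randn(4, 1, 2048)   # visible-light features
f_i = torch.randn(4, 1, 2048)   # infrared features, same identities
f_gen = gen(f_v, f_i)           # generated feature, constrained by the CMT loss
```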

The cross-modal triplet (CMT) loss judges the distance between cross-modal images of different identities. For an image $x_{i}$ in one modality, the CMT loss pulls in the features of images that share the identity of $x_{i}$ and pushes away the features of images with different identities. Given a visible light sample $x_{a}^{V}$, an infrared sample $x_{p}^{I}$ with the same identity and an infrared sample $x_{n}^{I}$ with a different identity are randomly selected to form the visible light triplet loss $L_{V2I}$; the infrared triplet loss $L_{I2V}$ is obtained in the same way. The two losses can be expressed as:

$$L_{V2I}=\sum\max\Big(0,\;\gamma+\big\|f_{V}\big(x_{a}^{V}\big)-f_{I}\big(x_{p}^{I}\big)\big\|_{2}-\big\|f_{V}\big(x_{a}^{V}\big)-f_{I}\big(x_{n}^{I}\big)\big\|_{2}\Big)$$

$$L_{I2V}=\sum\max\Big(0,\;\gamma+\big\|f_{I}\big(x_{a}^{I}\big)-f_{V}\big(x_{p}^{V}\big)\big\|_{2}-\big\|f_{I}\big(x_{a}^{I}\big)-f_{V}\big(x_{n}^{V}\big)\big\|_{2}\Big)$$

where $\gamma$ is a hyperparameter. Within the batch processing group, the cross-modal triplet loss of same-identity sample individuals is defined according to the sum of the infrared triplet loss and the visible light triplet loss; the cross-modal triplet loss $L_{CMT}$ of same-identity sample individuals is as follows:

$$L_{CMT}=L_{V2I}+L_{I2V}$$
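A hedged PyTorch sketch of the CMT loss as reconstructed above: batch-hard mining of positives and negatives is an assumption made for compactness; the patent text describes random selection within the batch.

```python
import torch
import torch.nn.functional as F

def cmt_loss(feat_v, feat_i, labels, margin=0.3):
    """L_CMT = L_V2I + L_I2V over a batch with shared identity labels."""
    same = labels.unsqueeze(1) == labels.unsqueeze(0)   # (N, N) identity mask

    def one_direction(anchor, gallery):
        dist = torch.cdist(anchor, gallery)                  # pairwise L2 distances
        hardest_pos = (dist * same).max(dim=1).values        # farthest same identity
        hardest_neg = (dist + 1e9 * same).min(dim=1).values  # closest other identity
        return F.relu(margin + hardest_pos - hardest_neg).mean()

    return one_direction(feat_v, feat_i) + one_direction(feat_i, feat_v)
```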

In one embodiment of this specification, considering the cross-entropy loss between instances of different identities, the identity loss $L_{I}$ is defined as $L_{I}=L_{CE}+\beta L_{CMT}$, where $\beta$ is a hyperparameter.

Fig. 2 is a schematic diagram of a method for shortening the spatial distance and fusing cross-modal features provided by an embodiment of this specification, and Fig. 3 is a schematic structural diagram of a framework for generating cross-modal pedestrian features of infrared and visible light provided by an embodiment of this specification. As shown in Figs. 2 and 3, when the modalities are pulled together as wholes, CMMD maps the modality centers into the reproducing kernel Hilbert space and then shortens the distance between the two modality centers, bringing the distributions of the two modal images closer. In individual identification, after the Transformer transformation and individual cross-modal feature fusion, the ICHC loss, based on the identity consistency of the samples, treats the samples of each identity as one class, so that all samples with the same identity move toward their sample center regardless of which modality they belong to. Combining these two losses, the network can be comprehensively optimized in terms of both modality and identity, fully reducing the distance between the two modal distributions while enhancing individual identification capability.

As can be seen from the above discussion, generating cross-modal features involves two stages. The first stage shortens the distance between the feature centers of the dual-modal images in the Hilbert space, constraining the modality discrepancies that may exist in the shared features; the second stage generates shared embedded features through the Transformer feature generation network and the cross-modal triplet loss, enhancing individual discrimination capability.

In one embodiment of this specification, during model training, the center maximum mean discrepancy loss, the intra-class heterogeneous center loss, the cross-entropy loss and their combinations are used to constrain the CNN feature extraction networks for visible light and infrared images, constraining the modal spatial distance loss, the intra-modal covariance loss, the inter-modal covariance loss and the intra-modal identity cross-entropy loss, thereby implementing the first training stage. Here CNN refers to Convolutional Neural Networks. In the second stage, the cross-modal triplet loss is applied to the Transformer feature generation network and all CNN feature extraction networks to implement the second training stage; the two training stages are executed alternately.

In one embodiment of this specification, all loss functions of the first stage (Stage 1) and the second stage (Stage 2) may be combined to train the network parameters, or Stage 1 and Stage 2 may alternately execute the training process on their respective scopes.
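As an illustration of the alternating schedule, the loop below switches between the two stages per epoch, reusing the stage_one_loss, cmt_loss and FeatureGenerator sketches above; the optimizer settings, the epoch-level alternation and the single shared classifier are all assumptions.

```python
import itertools
import torch
import torch.nn.functional as F

def train(backbone_v, backbone_i, generator, classifier, loader, epochs=60,
          lam=0.1, beta=0.5):
    params = itertools.chain(backbone_v.parameters(), backbone_i.parameters(),
                             generator.parameters(), classifier.parameters())
    opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)
    for epoch in range(epochs):
        for vis, ir, labels in loader:
            feat_v, feat_i = backbone_v(vis), backbone_i(ir)   # (B, D) features
            if epoch % 2 == 0:
                # Stage 1: modality alignment, L_W = L_CE + lambda * L_Whole.
                loss = stage_one_loss(classifier(feat_v), classifier(feat_i),
                                      feat_v, feat_i, labels, lam)
            else:
                # Stage 2: CMT loss on the generator and CNN feature extractors.
                f_gen = generator(feat_v.unsqueeze(1), feat_i.unsqueeze(1)).squeeze(1)
                loss = (F.cross_entropy(classifier(f_gen), labels)
                        + beta * cmt_loss(feat_v, feat_i, labels))
            opt.zero_grad()
            loss.backward()
            opt.step()
```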

Step S103: compare the pedestrian cross-modal features with the features in the pre-built feature library to obtain the specified features in the library that meet the requirements, so as to determine the identification result of the pedestrian to be identified according to those specified features.

The feature library includes feature information and identity information of multiple pedestrians.

After the pedestrian cross-modal features are obtained, the cross-modal features of the pedestrian to be identified are compared with the features in the feature library to find the specified features in the library that meet the requirements. It should be noted that if the cosine distance between a specified feature and the pedestrian's cross-modal features is less than or equal to a preset threshold, the specified feature meets the requirements. The pedestrian identity corresponding to that feature in the feature library is then taken as the identity information of the pedestrian to be identified.
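For concreteness, here is a sketch of the cosine-distance lookup against the feature library; the threshold value and the library layout are assumptions, not specified by the patent.

```python
import torch
import torch.nn.functional as F

def match_identity(query_feat, library_feats, library_ids, threshold=0.3):
    """query_feat: (D,); library_feats: (N, D); library_ids: N identity records."""
    sims = F.cosine_similarity(query_feat.unsqueeze(0), library_feats, dim=1)
    dists = 1.0 - sims                    # cosine distance = 1 - cosine similarity
    best = torch.argmin(dists).item()
    if dists[best] <= threshold:          # feature "meets the requirements"
        return library_ids[best]
    return None                           # no library feature within the threshold
```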

Through the above technical solution, combining the center maximum mean discrepancy loss, the intra-class heterogeneous center loss and the cross-modal triplet loss allows the model to be comprehensively optimized in terms of both modality and identity, fully reducing the distance between the two modal distributions. This guarantees the accuracy of the generated cross-modal features while enhancing pedestrian identification capability, so that pedestrian identities can be recognized accurately.

An embodiment of this specification further provides a cross-modal pedestrian re-identification device for infrared and visible light. As shown in Fig. 4, the device includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

collect, through a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, wherein the pedestrian image includes any one or more of a visible light image and an infrared image; input the pedestrian image into a pre-trained cross-modal feature generation model to generate the pedestrian cross-modal features corresponding to the pedestrian image, wherein the cross-modal feature generation model is trained using a bimodal image dataset, and its loss functions include the center maximum average difference loss function, the intra-class heterogeneous center loss function, and the cross-modal triplet loss function; and compare the pedestrian cross-modal features with the features in a pre-built feature library to obtain the specified features in the feature library that meet the requirements, so as to determine the identity recognition result of the pedestrian to be identified according to the specified features that meet the requirements, wherein the feature library includes feature information and identity information of multiple pedestrians.

An embodiment of this specification further provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:

collect, through a dual-mode acquisition device, a pedestrian image corresponding to a pedestrian to be identified, wherein the pedestrian image includes any one or more of a visible light image and an infrared image; input the pedestrian image into a pre-trained cross-modal feature generation model to generate the pedestrian cross-modal features corresponding to the pedestrian image, wherein the cross-modal feature generation model is trained using a bimodal image dataset, and its loss functions include the center maximum average difference loss function, the intra-class heterogeneous center loss function, and the cross-modal triplet loss function; and compare the pedestrian cross-modal features with the features in a pre-built feature library to obtain the specified features in the feature library that meet the requirements, so as to determine the identity recognition result of the pedestrian to be identified according to the specified features that meet the requirements, wherein the feature library includes feature information and identity information of multiple pedestrians.

Each embodiment in this specification is described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus, device, and non-volatile computer storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.

Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.

The above descriptions are merely one or more embodiments of this specification and are not intended to limit this specification. Various modifications and variations of the one or more embodiments of this specification will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the one or more embodiments of this specification shall fall within the scope of the claims of this specification.

Claims (10)

1. An infrared and visible light cross-modal pedestrian re-identification method, characterized by comprising the following steps:
acquiring a pedestrian image corresponding to a pedestrian to be identified through a dual-mode acquisition device, wherein the pedestrian image comprises any one or more of a visible light image and an infrared image;
inputting the pedestrian image into a pre-trained cross-modal feature generation model, and generating a pedestrian cross-modal feature corresponding to the pedestrian image, wherein the cross-modal feature generation model is trained by using a bimodal image data set, and the loss function comprises a central maximum average difference loss function, an intra-class heterogeneous central loss function and a cross-modal triplet loss function;
comparing the pedestrian cross-modal characteristics with characteristics in a pre-constructed characteristic library to obtain specified characteristics meeting requirements in the characteristic library, and determining the identity recognition result of the pedestrian to be recognized according to the specified characteristics meeting the requirements, wherein the characteristic library comprises characteristic information and identity information of a plurality of pedestrians.
2. The method according to claim 1, wherein the pedestrian image is input to a pre-trained cross-modal feature generation model, and before the pedestrian cross-modal feature corresponding to the pedestrian image is generated, the method further comprises:
constructing the bimodal image dataset comprising a plurality of sets of bimodal images of a plurality of sample individuals, wherein each set of bimodal images comprises a plurality of infrared sample images and a plurality of visible light sample images;
and training a pre-constructed feature generation model by using the bimodal images in the data set, and determining model parameters to obtain a cross-modal feature generation model under the model parameters.
3. The method according to claim 2, wherein the training of the pre-constructed feature generation model using the bimodal images in the dataset comprises:
inputting the bimodal images in the data set into the feature generation model, and shortening the distance between feature centers in the bimodal images in a Hilbert space, wherein the feature centers are feature centers of pedestrian features belonging to different modalities in the bimodal images;
and through a Transformer feature generation network, sample individuals in the bimodal images are distinguished independently, so that the feature generation model is trained, and cross-modal features of the sample individuals are generated.
4. The method according to claim 3, wherein shortening the distance between feature centers in the bimodal image in the Hilbert space comprises:
acquiring a visible light sample image, an infrared sample image, a visible light sample feature extraction function and an infrared sample feature extraction function in the bimodal image;
defining the central maximum average difference loss function based on the visible light sample image, the infrared sample image, the visible light sample feature extraction function, and the infrared sample feature extraction function;
and mapping the feature centers under different modalities to the Hilbert space through the central maximum average difference loss function, and shortening the center distance between the feature centers of the two modalities in the bimodal image.
5. The method according to claim 4, wherein after the feature centers of the different modalities are mapped to the Hilbert space by the central maximum average difference loss function and the center distance between the feature centers of the two modalities in the bimodal image is shortened, the method further comprises:
respectively acquiring the cross-modal class-mean center position of the visible light sample image and the cross-modal class-mean center position of the infrared sample image;
defining a first center covariance loss based on the cross-modal class-mean center position of the visible light sample image and the cross-modal class-mean center position of the infrared sample image, wherein the first center covariance loss comprises a visible light center covariance loss between the visible light features in the visible light sample image and the cross-modal class-mean center position of the visible light sample image, and an infrared center covariance loss between the infrared features in the infrared sample image and the cross-modal class-mean center position of the infrared sample image;
defining an overall cross-modal class-mean center position in the bimodal image, and calculating a second center covariance loss between the cross-modal class-mean center positions under the two modalities and the overall cross-modal class-mean center position, based on the overall cross-modal class-mean center position, the cross-modal class-mean center position of the visible light sample image, and the cross-modal class-mean center position of the infrared sample image;
determining the intra-class heterogeneous center loss function according to the first center covariance loss and the second center covariance loss;
and controlling, through the intra-class heterogeneous center loss function, the distance between the sample individuals in each modality and the feature centers of the two modalities in the bimodal image.
6. The method according to claim 3, wherein the sample individuals in the bimodal image are individually distinguished through the Transformer feature generation network, so as to train the feature generation model and generate the cross-modal features of the sample individuals, which specifically comprises:
acquiring a visible light sample image and an infrared sample image corresponding to the same sample individual in the bimodal image dataset;
extracting visible light sample features of the visible light sample image and infrared features of the infrared sample image;
inputting the visible light sample features and the infrared features into the Transformer feature generation network to generate output visible light sample features and output infrared features;
and constraining the visible light sample features and the infrared features through the pre-generated cross-modal triplet loss function to generate the cross-modal features of the sample individual.
7. The method according to claim 5, wherein the central maximum average difference loss function is:

$$L_{\mathrm{CMMD}} = \left\| \frac{1}{P}\sum_{p=1}^{P} \psi\Big(\frac{1}{K}\sum_{k=1}^{K} f_{v}\big(x_{p,k}^{V}\big)\Big) - \frac{1}{P}\sum_{p=1}^{P} \psi\Big(\frac{1}{K}\sum_{k=1}^{K} f_{I}\big(x_{p,k}^{I}\big)\Big) \right\|_{\mathcal{H}}^{2}$$

wherein $L_{\mathrm{CMMD}}$ is the central maximum average difference loss function; $\{x_{p,k}^{V}\}$ are the visible light sample images and $f_{v}$ is the visible light sample feature extraction function; $\{x_{p,k}^{I}\}$ are the near-infrared sample images and $f_{I}$ is the near-infrared sample feature extraction function; $\psi$ denotes the reproducing kernel Hilbert space mapping function; $P$ denotes the number of identity classes of the selected sample individuals; and $K$ denotes the number of bimodal images of the sample individuals corresponding to each identity.
8. The method according to claim 6, wherein before the visible light sample features and the infrared features are constrained by the pre-generated cross-modal triplet loss function to generate the cross-modal features of the sample individual, the method further comprises:
acquiring a specified infrared sample image and a specified visible light sample image of a specified sample individual in the bimodal image dataset, and acquiring a preset infrared sample image and a preset visible light sample image of a preset sample individual whose identity differs from that of the specified sample individual;
forming an infrared triplet loss function based on the specified infrared sample image and the preset infrared sample image;
forming a visible light triplet loss function based on the specified visible light sample image and the preset visible light sample image;
and defining, within a batch processing group, the cross-modal triplet loss function of the same-identity sample individuals according to the infrared triplet loss function and the visible light triplet loss function.
9. An infrared visible cross-modal pedestrian re-identification device, the device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring a pedestrian image corresponding to a pedestrian to be identified through a dual-mode acquisition device, wherein the pedestrian image comprises any one or more of a visible light image and an infrared image;
inputting the pedestrian image into a pre-trained cross-modal feature generation model, and generating a pedestrian cross-modal feature corresponding to the pedestrian image, wherein the cross-modal feature generation model is trained by using a bimodal image data set, and the loss function comprises a central maximum average difference loss function, an intra-class heterogeneous central loss function and a cross-modal triplet loss function;
comparing the pedestrian cross-modal characteristics with characteristics in a pre-constructed characteristic library to obtain specified characteristics meeting requirements in the characteristic library, and determining the identity recognition result of the pedestrian to be recognized according to the specified characteristics meeting the requirements, wherein the characteristic library comprises characteristic information and identity information of a plurality of pedestrians.
10. A non-transitory computer storage medium storing computer-executable instructions configured to:
acquiring a pedestrian image corresponding to a pedestrian to be identified through a dual-mode acquisition device, wherein the pedestrian image comprises any one or more of a visible light image and an infrared image;
inputting the pedestrian image into a pre-trained cross-modal feature generation model, and generating a pedestrian cross-modal feature corresponding to the pedestrian image, wherein the cross-modal feature generation model is trained by using a bimodal image data set, and the loss function comprises a central maximum average difference loss function, an intra-class heterogeneous central loss function and a cross-modal triplet loss function;
comparing the pedestrian cross-modal characteristics with characteristics in a pre-constructed characteristic library to obtain specified characteristics meeting requirements in the characteristic library, and determining the identity recognition result of the pedestrian to be recognized according to the specified characteristics meeting the requirements, wherein the characteristic library comprises characteristic information and identity information of a plurality of pedestrians.
CN202211011952.4A 2022-08-23 2022-08-23 Cross-mode pedestrian re-identification method, device and medium of infrared visible light Pending CN115393896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211011952.4A CN115393896A (en) 2022-08-23 2022-08-23 Cross-mode pedestrian re-identification method, device and medium of infrared visible light


Publications (1)

Publication Number Publication Date
CN115393896A true CN115393896A (en) 2022-11-25

Family

ID=84120518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211011952.4A Pending CN115393896A (en) 2022-08-23 2022-08-23 Cross-mode pedestrian re-identification method, device and medium of infrared visible light

Country Status (1)

Country Link
CN (1) CN115393896A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743128A (en) * 2022-03-09 2022-07-12 华侨大学 A multimodal Siberian tiger re-identification method and device based on heterogeneous neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIA SUN et al.: "Visible-infrared cross-modality person re-identification based on whole-individual training", NEUROCOMPUTING, vol. 440, 14 June 2021 (2021-06-14), pages 1 - 11, XP086547510, DOI: 10.1016/j.neucom.2021.01.073 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373138A (en) * 2023-11-02 2024-01-09 厦门熵基科技有限公司 Cross-modal living fusion detection method and device, storage medium and computer equipment
CN117373138B (en) * 2023-11-02 2024-11-08 厦门熵基科技有限公司 Cross-modal living fusion detection method and device, storage medium and computer equipment
CN119540987A (en) * 2024-09-26 2025-02-28 北京理工大学 A visible light and infrared cross-modal pedestrian re-identification method based on intermediate domain
CN119625811A (en) * 2025-02-13 2025-03-14 成都合盛智联科技有限公司 A reliable identification method and system for large-scale event attendees

Similar Documents

Publication Publication Date Title
CN112016401B (en) Cross-mode pedestrian re-identification method and device
Liu et al. Transferring deep representation for NIR-VIS heterogeneous face recognition
CN111178208B (en) Pedestrian detection method, device and medium based on deep learning
WO2021017303A1 (en) Person re-identification method and apparatus, computer device and storage medium
CN115393896A (en) Cross-mode pedestrian re-identification method, device and medium of infrared visible light
CN112766158A (en) Multi-task cascading type face shielding expression recognition method
CN108182441A (en) Parallel multichannel convolutive neural network, construction method and image characteristic extracting method
EP3399460A1 (en) Captioning a region of an image
US8130285B2 (en) Automated searching for probable matches in a video surveillance system
WO2018196396A1 (en) Person re-identification method based on consistency constraint feature learning
CN112597866B (en) A Visible-Infrared Cross-modal Person Re-identification Method Based on Knowledge Distillation
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN114973031B (en) Visible light-thermal infrared image target detection method under unmanned aerial vehicle visual angle
CN110222718B (en) Image processing method and device
Huang et al. Adaptively weighted k-tuple metric network for kinship verification
CN117333908A (en) Cross-modal pedestrian re-identification method based on posture feature alignment
Samadiani et al. A multiple feature fusion framework for video emotion recognition in the wild
Zhu et al. Facial emotion recognition using a novel fusion of convolutional neural network and local binary pattern in crime investigation
CN112560710A (en) Method for constructing finger vein recognition system and finger vein recognition system
Shao et al. Identity and kinship relations in group pictures
CN115292533A (en) Cross-modal pedestrian retrieval method driven by visual positioning
CN112001438B (en) Multi-mode data clustering method for automatically selecting clustering number
Echoukairi et al. Improved Methods for Automatic Facial Expression Recognition.
Zia et al. An adaptive training based on classification system for patterns in facial expressions using SURF descriptor templates
Masaki et al. Distant traffic light recognition using semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination