CN113971814A - Data processing method and device, electronic equipment and computer storage medium - Google Patents
- Publication number
- CN113971814A (application CN202010641928.3A)
- Authority
- CN
- China
- Prior art keywords
- sample image
- data
- sample
- target
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present invention provide a data processing method and apparatus, an electronic device, and a computer storage medium. The data processing method includes: acquiring a first data model and a sample image, where the first data model includes a machine learning model; obtaining multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image; and training the first data model at least according to the sample image and the multimodal supervision data to obtain a second data model. This data processing method can improve the training effect of the data model.
Description
Technical Field
Embodiments of the present invention relate to the field of computer technologies, and in particular to a data processing method and apparatus, an electronic device, and a computer storage medium.
Background
With the continuous development of artificial intelligence and machine learning, machine learning techniques are applied ever more widely in daily life. Neural network models trained with machine learning can complete some everyday tasks quickly and efficiently. For example, in the field of pedestrian re-identification for video surveillance, a trained pedestrian re-identification model can process surveillance video to realize functions such as tracking the movement trajectory of a specific person. However, the application scenarios of existing pedestrian re-identification models are rather limited: a model trained for scene A cannot be transferred directly to scene B.
To solve this problem, in the prior art the sample images used for training in scene B are usually labeled manually to obtain a batch of supervision data, and this manually labeled supervision data is then used to train a pedestrian re-identification model suitable for scene B. In other words, machine learning models in the prior art generally require manually labeled supervision data, which makes training costly, difficult, and hard to generalize.
Summary of the Invention
In view of this, embodiments of the present invention provide a data processing solution to at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, a data processing method is provided, including: acquiring a first data model and a sample image, where the first data model includes a machine learning model; obtaining multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image; and training the first data model at least according to the sample image and the multimodal supervision data to obtain a second data model.
According to a second aspect of the embodiments of the present invention, a data processing method is provided, including: obtaining multimodal supervision data corresponding to a sample image at least according to the spatiotemporal information of the sample image; and training a data model to be trained at least according to the sample image and the multimodal supervision data to obtain a target data model, where the data model to be trained includes a machine learning model.
According to a third aspect of the embodiments of the present invention, a data processing method is provided, including: receiving a model training request, sent by a client by calling a preset training interface, for requesting training of a first data model; acquiring, according to the model training request, a sample image and a first data model for training; obtaining the multimodal supervision data corresponding to the sample image by the data processing method of the first aspect; and training the first data model using the sample image and the multimodal supervision data.
According to a fourth aspect of the embodiments of the present invention, a data processing apparatus is provided, including: a first acquisition module configured to acquire a first data model and a sample image, where the first data model includes a machine learning model; a second acquisition module configured to obtain multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image; and a training module configured to train the first data model at least according to the sample image and the multimodal supervision data to obtain a second data model.
According to a fifth aspect of the embodiments of the present invention, a data processing apparatus is provided, including: a third acquisition module configured to obtain multimodal supervision data corresponding to a sample image at least according to the spatiotemporal information of the sample image; and a second training module configured to train a data model to be trained at least according to the sample image and the multimodal supervision data to obtain a target data model, where the data model to be trained includes a machine learning model.
According to a sixth aspect of the embodiments of the present invention, an electronic device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; and the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the data processing method of the first, second, or third aspect.
According to a seventh aspect of the embodiments of the present invention, a computer storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the data processing method of the first, second, or third aspect.
According to the data processing solution provided by the embodiments of the present invention, the multimodal supervision data is mined from the sample images according to, among other things, their spatiotemporal information, so the supervision data used to train the first data model requires no manual labeling. This reduces the difficulty and labor cost of transferring a data model from a source scene to a target scene, makes it possible to train the data model without manually labeling sample images of the target scene, and, because multimodal supervision data is used during training, improves the training effect of the data model.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention or the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments recorded herein; those of ordinary skill in the art may further derive other drawings from them.
FIG. 1a is a flowchart of the steps of a data processing method according to Embodiment 1 of the present invention;
FIG. 1b is a schematic diagram of an example scenario in the embodiment shown in FIG. 1a;
FIG. 1c is a schematic diagram of a usage scenario in the embodiment shown in FIG. 1a;
FIG. 2 is a flowchart of the steps of a data processing method according to Embodiment 2 of the present invention;
FIG. 3 is a flowchart of the steps of a data processing method according to Embodiment 3 of the present invention;
FIG. 4a is a flowchart of the steps of a data processing method according to Embodiment 4 of the present invention;
FIG. 4b is a schematic diagram of an example scenario in the embodiment shown in FIG. 4a;
FIG. 5 is a flowchart of the steps of a data processing method according to Embodiment 5 of the present invention;
FIG. 6a is a flowchart of the steps of a data processing method according to Embodiment 6 of the present invention;
FIG. 6b is a schematic diagram of a connection between a SaaS platform and a client according to Embodiment 6 of the present invention;
FIG. 7 is a structural block diagram of a data processing apparatus according to Embodiment 7 of the present invention;
FIG. 8 is a structural block diagram of a data processing apparatus according to Embodiment 8 of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to Embodiment 9 of the present invention.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these technical solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
Specific implementations of the embodiments of the present invention are further described below with reference to the accompanying drawings.
Embodiment 1
Referring to FIG. 1a, a flowchart of the steps of a data processing method according to Embodiment 1 of the present invention is shown.
In this embodiment, the data processing method can be applied to the training of pedestrian re-identification models, so that a pedestrian re-identification model for a source scene can be trained for use in a target scene without manually labeling the sample images of the target scene. The data processing method can be deployed on a server side (including servers and the cloud) and executed by the server side.
Of course, in other embodiments, the data processing method can be applied to other suitable fields, such as the training of other neural network models for image processing. Moreover, the data processing method can be executed by any suitable execution subject, such as a terminal device, which is not limited in this embodiment.
Step S102: Acquire a first data model and a sample image.
When the first data model is transferred, the source scene may refer to the scene from which the training samples used to initially train the first data model were taken. For example, if first data model A is a pedestrian re-identification model trained on surveillance images of city a, the source scene is city a.
In contrast, the target scene is the scene to which the first data model needs to be transferred. If first data model A needs to be used for pedestrian re-identification on surveillance video of city b, the target scene is city b. Because factors such as architectural style and clothing style differ between city b and city a, transferring first data model A directly to the target scene yields results that cannot meet the requirements.
The first data model may be any suitable machine learning model (such as a neural network model). For example, the first data model may be the aforementioned pedestrian re-identification model, a neural network model for face recognition, and so on. A pedestrian re-identification model is a neural network model that can retrieve surveillance video for a specific pedestrian.
For first data models with different functions, different types of images may be used as sample images. If the first data model is a pedestrian re-identification model, the sample images may be surveillance images from surveillance video, which include captured human body images of pedestrians.
Step S104: Obtain multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image.
The spatiotemporal information includes at least the acquisition time and acquisition location of a sample image. For example, when surveillance images are collected by surveillance cameras, the timestamp of each surveillance image (i.e., its acquisition time) is recorded, along with the latitude and longitude of each surveillance camera; that latitude and longitude serve as the acquisition location of all surveillance images collected by that camera.
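As an illustration only, the spatiotemporal information described above could be attached to each sample image as a small record; the field names and values below are hypothetical, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SampleMeta:
    """Spatiotemporal metadata for one surveillance sample image."""
    image_id: str
    camera_id: str      # number of the surveillance camera that captured it
    timestamp: float    # acquisition time, e.g. Unix seconds from the frame's timestamp
    lat: float          # camera latitude  -> acquisition location
    lng: float          # camera longitude

# Every image from the same camera shares that camera's latitude/longitude.
m = SampleMeta("img_001", "cam_07", 1_593_993_600.0, 30.25, 120.16)
```

A record like this is all the mining steps below need: the timestamp for the same-time check and the camera position for the distance check.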
In multimodal supervision data, "multimodal" means that information from multiple modalities is combined, such as face information combined with spatiotemporal information. The multimodal supervision data includes first supervision data and/or second supervision data. The first supervision data may be strongly supervised data, where "strong supervision" means that the pseudo-labels established from the multimodal information have very high accuracy (e.g., 100% or approximately 100%). The second supervision data may be weakly supervised data, where "weak supervision" means that the pseudo-labels established from the multimodal information have relatively low accuracy (usually below 100%, or not approximately 100%).
Those skilled in the art may obtain the multimodal supervision data in any suitable manner as needed.
Taking strongly supervised multimodal supervision data as an example, the acquisition process is as follows:
For a target sample image Q among the sample images, a sample image W containing the same face as Q is determined according to the face information in each sample image. Since W and Q contain the same face, they can be considered to contain the same person, so the pseudo-label of W is a positive sample.
According to the spatiotemporal information of the sample images, a sample image E is determined whose acquisition time is the same as that of Q but whose acquisition location is farther from Q's than a set distance threshold (which can be determined as needed). Since one person cannot appear at two locations at the same moment, E and Q cannot contain the same person, so the pseudo-label of E is a negative sample. Strongly supervised multimodal supervision data can thus be obtained from sample images W and E.
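A minimal sketch of the strong-supervision mining just described, assuming each image carries a recognized face ID (or None), a timestamp, and a position; all names, the planar distance, and the toy data are illustrative, not the patent's implementation:

```python
import math

def dist_km(p, q):
    # Toy planar distance in km; a real system would use geodesic distance
    # between the cameras' latitude/longitude coordinates.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mine_pairs(images, dist_threshold_km=1.0):
    """images: list of dicts with keys 'id', 'face', 'time', 'pos' (x, y in km).
    Returns (positive_pairs, negative_pairs) as lists of image-id tuples."""
    positives, negatives = [], []
    for i, a in enumerate(images):
        for b in images[i + 1:]:
            # Same recognized face -> same person -> positive pseudo-label.
            if a["face"] is not None and a["face"] == b["face"]:
                positives.append((a["id"], b["id"]))
            # Same time, far apart -> cannot be the same person -> negative.
            elif a["time"] == b["time"] and dist_km(a["pos"], b["pos"]) > dist_threshold_km:
                negatives.append((a["id"], b["id"]))
    return positives, negatives

imgs = [
    {"id": "Q", "face": "A",  "time": 10, "pos": (0.0, 0.0)},
    {"id": "W", "face": "A",  "time": 99, "pos": (0.2, 0.1)},
    {"id": "E", "face": None, "time": 10, "pos": (5.0, 0.0)},
]
pos, neg = mine_pairs(imgs)
# pos -> [("Q", "W")]; neg -> [("Q", "E")]
```

Here W pairs with Q because both carry face "A", and E pairs with Q as a negative because it was captured at the same time 5 km away.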
Of course, in other embodiments, the multimodal supervision data may be obtained in other ways, which is not limited in this embodiment.
Step S106: Train the first data model at least according to the sample image and the multimodal supervision data to obtain a second data model.
After the multimodal supervision data is obtained, the first data model is trained with the sample images. During training, the multimodal supervision data corresponding to the sample images serves as supervision for computing a loss value, and the parameters of the first data model are updated according to the loss value to obtain a second data model applicable to the target scene.
The implementation of the data processing method is described below with reference to a specific usage scenario:
In this usage scenario, the first data model is a pedestrian re-identification model, and the sample images corresponding to the target scene are surveillance images from surveillance video of city b.
In this scenario, surveillance images from surveillance video obtained from city b are used as sample images. The first data model may be a pedestrian re-identification model trained on surveillance images of city a.
As shown in FIG. 1b and FIG. 1c, in this usage scenario the data processing method is deployed on the server side, which acquires the first data model and the sample images.
For each sample image, its multimodal supervision data is obtained at least according to its spatiotemporal information.
For example, suppose there are N sample images in total. For the i-th sample image I_i, according to its face information, a sample image among the remaining N-1 sample images that contains the same face as I_i (denoted I_j) is determined. Since the human body images in sample images containing the same face are very likely the same person, I_j is a positive sample, and the loss between the feature information output by the data model for I_i and I_j should be small.

In addition to positive samples, sample images whose acquisition time is the same as that of I_i but whose acquisition location is farther away than the set distance threshold can also be determined from the acquisition time and location of I_i. Since one person cannot appear at the same moment at two locations far apart (e.g., 1 km), two sample images captured far from each other cannot contain the same person; the two therefore form a negative sample pair and are negative samples of each other.
The positive and negative samples form the multimodal supervision data. Using the sample images and this multimodal supervision data, the first data model can be trained to obtain a second data model, which can be applied to pedestrian re-identification on surveillance images of city b.
With the trained second data model, different test images can be input into the model; if two test images contain human body images of the same person, the feature information output for the two images is closer in feature space. Thus, for pedestrian re-identification, an image of a specific object is input into the data model to obtain its feature information, and that feature information is then used to retrieve the feature information of other surveillance images. Surveillance images containing the specific object can thereby be obtained, and the movement trajectory of the specific object found.
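The retrieval step described above amounts to nearest-neighbor search over the model's output features. A sketch, assuming cosine similarity as the ranking measure; the embedding values are made up, and a real system would use the trained model's outputs:

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_feat, gallery):
    """gallery: dict mapping image id -> feature vector.
    Returns gallery ids ranked by similarity to the query feature."""
    return sorted(gallery, key=lambda k: cosine(query_feat, gallery[k]), reverse=True)

# Hypothetical features for a query person and three surveillance frames.
query = [0.9, 0.1, 0.0]
gallery = {
    "frame_a": [0.8, 0.2, 0.1],   # same person: close in feature space
    "frame_b": [0.0, 0.1, 0.9],   # different person
    "frame_c": [0.1, 0.9, 0.0],   # different person
}
ranking = retrieve(query, gallery)
# ranking[0] -> "frame_a"
```

Collecting the top-ranked frames across cameras and ordering them by timestamp yields the movement trajectory of the queried person.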
With this embodiment, the multimodal supervision data is mined from the sample images according to, among other things, their spatiotemporal information, so the supervision data used to train the first data model requires no manual labeling. This reduces the difficulty and labor cost of transferring the data model from the source scene to the target scene, makes it possible to train the data model without manually labeling sample images, and, because multimodal supervision data is used during training, improves the training effect of the data model.
Embodiment 2
Referring to FIG. 2, a flowchart of the steps of a data processing method according to Embodiment 2 of the present invention is shown.
In this embodiment, the data processing method includes the aforementioned steps S102 to S106. This embodiment illustrates an implementation of obtaining the first supervision data of the multimodal supervision data, in which step S104 includes sub-step S1041 and sub-step S1042.
Sub-step S1041: Perform face recognition on the sample images to obtain face information corresponding to the sample images.
For example, face recognition is performed on all sample images using any suitable existing face recognition technology to obtain the face information of each sample image.
Sub-step S1042: Determine the first supervision data corresponding to the sample images according to the face information and the spatiotemporal information of the sample images.
In this embodiment, the first supervision data is strongly supervised multimodal supervision data.
In a specific implementation, sub-step S1042 includes:
Sub-step I: According to the face information of the sample images, determine a positive sample set composed of sample images containing the same face.
For example, according to the recognized face information, the sample images belonging to the same person are determined; any two of these can form a positive sample pair.

For instance, if sample images I_i and I_j both include face A, the two form a positive sample pair, where the pair is the k-th positive pair, I_i is the i-th sample image, and I_j is the j-th sample image. Likewise, if sample images I_n and I_m both include face B, they form another positive pair, I_n being the n-th and I_m the m-th sample image.

These positive sample pairs form the positive sample set.
Sub-step II: According to the spatiotemporal information of the sample images, determine a negative sample set composed of sample images satisfying a set condition.
In a specific implementation, sub-step II may be implemented as: determining a target sample image from the sample images; determining, from the sample images other than the target sample image, those whose acquisition time is the same as that of the target sample image and whose acquisition locations are farther away than the set distance threshold as negative sample images; and determining a negative sample set corresponding to the target sample image according to the negative sample images.
The target sample image may be any of the sample images; that is, sub-step II may be performed for every sample image, thereby obtaining its related negative sample pairs and, in turn, the negative sample set.
In this embodiment, each sample image has corresponding spatiotemporal information including acquisition time and acquisition location. The acquisition time can be determined from the timestamp of the sample image. The acquisition location can be determined from the surveillance camera that captured it; for example, each surveillance camera has a corresponding number as well as latitude and longitude information.
For the target sample image, the latitude and longitude of the target surveillance camera can be determined from the camera's number, and the surveillance cameras whose distance from the target camera exceeds the set distance threshold (e.g., one kilometer) can then be found. The sample images collected by these cameras (denoted, for convenience, the sample images in set1) are farther from the target sample image than the set distance threshold.
Then, from the sample images in set1, the sample images whose acquisition time is the same as that of the target sample image are determined. The people in these sample images differ from the person in the target sample image, so they are negative samples of the target sample image. The target sample image I_i and any one of these negative samples can form a negative sample pair; the k-th such pair consists of the target sample image I_i and the n-th negative sample I_n^-.
Since a person's movement speed is limited, an appropriate distance threshold can be set for each mode of movement (e.g., cycling, walking, running), ensuring that the negative samples found contain different people from the target sample image.
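The camera-to-camera distance behind this threshold check can be computed from the recorded latitude/longitude, for instance with the standard haversine great-circle formula; the coordinates below are illustrative, not from the patent:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two latitude/longitude points (degrees)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two cameras roughly 1.1 km apart in latitude: images they capture at the
# same moment qualify as a negative pair under a 1 km threshold.
d = haversine_km(30.250, 120.160, 30.260, 120.160)
```

The threshold passed to such a check could be tuned per movement mode, e.g. larger for cycling than for walking, matching the observation above about limited movement speed.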
根据每个目标样本图像的负样本对可以确定与目标样本图像对应的负样本集合。A set of negative samples corresponding to the target sample image can be determined according to the negative sample pair of each target sample image.
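The spatiotemporal negative-sample mining described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the data layout (dicts with `image`, `camera_id`, `time` fields) and the haversine distance are assumptions for the sketch.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def mine_negative_pairs(target, samples, cameras, dist_threshold_km=1.0):
    """Collect negative pairs (target, s) for samples captured at the same
    time by cameras farther than the distance threshold from the target's
    camera. `cameras` maps camera_id -> (lat, lon)."""
    t_lat, t_lon = cameras[target['camera_id']]
    # set1: cameras more than the threshold away from the target camera
    far_cams = {cid for cid, (lat, lon) in cameras.items()
                if haversine_km(t_lat, t_lon, lat, lon) > dist_threshold_km}
    # same acquisition time + far-away camera => cannot be the same person
    return [(target['image'], s['image']) for s in samples
            if s['camera_id'] in far_cams and s['time'] == target['time']]

cameras = {'c0': (31.0, 121.0), 'c1': (31.1, 121.1), 'c2': (31.0001, 121.0001)}
target = {'image': 'i0', 'camera_id': 'c0', 'time': 5}
samples = [{'image': 'i1', 'camera_id': 'c1', 'time': 5},
           {'image': 'i2', 'camera_id': 'c2', 'time': 5},
           {'image': 'i3', 'camera_id': 'c1', 'time': 6}]
pairs = mine_negative_pairs(target, samples, cameras)
```

Here only `i1` qualifies: `c1` is far enough from `c0`, and the acquisition time matches; `i2` is from a nearby camera and `i3` was captured at a different time.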
Sub-step III: Determine the first supervision data corresponding to the sample image according to the positive sample set and the negative sample set.

In this embodiment, the goal is that, after the data model has been trained with the first supervision data, human-body images of the same person fed into the model yield feature information that lies close together in feature space, so that a specific person can be tracked. To this end, for a given sample image, that sample image, one positive sample from its positive sample set, and one negative sample from its negative sample set together form the corresponding first supervision data.

For example, for sample image I_i, the corresponding first supervision data can be expressed as a triplet (I_i, I_j, I_n⁻), where I_j is a positive sample and I_n⁻ is a negative sample. Because the reliability of these positive and negative sample pairs is very high, with accuracy close to 100%, and because mining them uses face information and spatiotemporal information in addition to the human-body image information in the sample images, the first supervision data can be called strongly supervised multimodal supervision data.

During training, the triplet loss function can be used to train the data model with the first supervision data.
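As a minimal sketch of the triplet loss on the mined triplets (NumPy, toy embeddings; a real implementation would use a deep-learning framework's built-in triplet loss and the model's extracted features):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """L = max(0, ||a - p|| - ||a - n|| + margin).

    Pulls the anchor toward the positive (same person, linked by face
    information) and pushes it away from the negative (different person,
    guaranteed by spatiotemporal information)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy features: anchor close to the positive, far from the negative.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
```

With these toy vectors the positive distance is already smaller than the negative distance by more than the margin, so the loss is zero; swapping the roles of positive and negative produces a positive loss.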
With this embodiment, the multimodal supervision data is mined from the sample images themselves using their spatiotemporal information and other cues, so the supervision data used to train the first data model requires no manual annotation. This lowers the difficulty and labor cost of porting the data model from the source scene to the target scene: the data model can be trained without any manual labeling of sample images, and because multimodal supervision data is used during training, the training effect on the data model is improved.

In addition, using face information and spatiotemporal information guarantees the accuracy of the positive and negative samples, ensuring that the first supervision data is strongly supervised data with accuracy close to 100%, which further improves the training effect on the data model.
Embodiment 3

Referring to FIG. 3, a flowchart of the steps of a data processing method according to Embodiment 3 of the present invention is shown.

In this embodiment, the data processing method includes the aforementioned steps S102 to S106, where step S104 may adopt the implementation described above. This embodiment illustrates one way of obtaining the second supervision data of the multimodal supervision data, namely through sub-steps S1043 to S1045 included in step S104.

It should be noted that, in this embodiment, step S104 may include only sub-steps S1043 to S1045. Alternatively, it may include sub-steps S1041 to S1045; in that case, sub-steps S1043 to S1045 may be executed before, after, or in parallel with sub-steps S1041 and S1042.
Sub-step S1043: Use the first data model to perform feature extraction on the sample image, obtaining feature information that characterizes the human-body image information.

The first data model may be the aforementioned data model pre-trained in the source scene, or it may be a data model that has completed one or more training cycles in the target scene. For example, for first data model A, once it has been trained on N sample images in the first training cycle, model A is considered to have completed that cycle; for convenience of description, the resulting model is denoted A'. In the second training cycle, the training object becomes A'. In other words, the first data model used to extract features from the sample images may differ between training cycles: it may be the data model produced by the previous training cycle.

Sub-step S1044: Sort the features to be sorted according to at least the spatiotemporal information of the sample image and the similarity between the feature information and the features to be sorted, obtaining a target sorting result.
In a specific implementation, sub-step S1044 includes the following sub-steps:

Sub-step A: Determine a target sample image from among the sample images.

The target sample image can be any sample image; for each sample image, the corresponding target sorting result can be obtained through sub-step B.

Sub-step B: Sort the features to be sorted according to their similarity to the feature information of the target sample image, obtaining an initial sorting result.

The features to be sorted may be the feature information of the remaining sample images, i.e., all sample images other than the target sample image. For example, if there are N sample images and the i-th is the target sample image, the other N-1 sample images are the remaining sample images.

For the target sample image, the similarity can be determined by, for example, computing the distance between the target sample image's features and those of each remaining sample image; the features to be sorted can then be ordered by similarity to obtain the initial sorting result. For example, the initial sorting result orders the features from high to low similarity: the higher the similarity, the higher the probability that the remaining sample image corresponding to that feature contains the same person as the target sample image.
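A minimal sketch of sub-step B, assuming cosine similarity over extracted feature vectors (the embodiment leaves the distance measure open):

```python
import numpy as np

def initial_ranking(target_feat, gallery_feats):
    """Rank gallery features by cosine similarity to the target feature.

    Returns gallery indices ordered from most to least similar; higher
    similarity means a higher probability that the two images contain
    the same person."""
    t = target_feat / np.linalg.norm(target_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ t
    return list(np.argsort(-sims))

target = np.array([1.0, 0.0])
gallery = np.array([[0.0, 1.0],    # orthogonal -> least similar
                    [1.0, 0.1],   # nearly parallel -> most similar
                    [1.0, 1.0]])  # in between
order = initial_ranking(target, gallery)
```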
Of course, in other embodiments the features to be sorted may also be ordered in other ways; this embodiment imposes no limitation in this regard.
Sub-step C: Adjust the initial sorting result at least according to the spatiotemporal information of the target sample image to obtain the target sorting result.

In a specific implementation, sub-step C includes the following processes:

Process C1: Perform a first adjustment on the initial sorting result according to the similarity between the face information of the target sample image and the face information of the sample images corresponding to the features to be sorted, obtaining a first pre-adjusted sorting result.

The face information can be obtained by performing face recognition on the target sample image and the remaining sample images. The higher the similarity between the face information of two images, the higher the probability that they contain the same person. Therefore, performing a first adjustment (re-correction and re-ordering) of the initial sorting result according to face information makes the first pre-adjusted sorting result more accurate, moving the remaining sample images that contain the same person as the target sample image closer to the front.

Process C2: Perform a second adjustment on the first pre-adjusted sorting result according to the spatiotemporal information of the target sample image and of the sample images corresponding to the features to be sorted, obtaining a second pre-adjusted sorting result.

Besides adjusting the ordering by face information, a second adjustment can be made using spatiotemporal information (i.e., re-correcting and re-ordering the first pre-adjusted sorting result), using spatiotemporal information to compensate for the shortcomings of face information and thereby improve sorting accuracy. When adjusting by spatiotemporal information, the features of remaining sample images whose acquisition time is the same as that of the target sample image but whose acquisition location is far from it can be moved toward the back of the order, because such images are very unlikely to contain the same person as the target sample image.

Process C3: Perform a third adjustment on the second pre-adjusted sorting result according to the human-body attribute information of the target sample image and of the sample images corresponding to the features to be sorted, obtaining the target sorting result.

To further improve sorting accuracy, the ordering can also be adjusted by human-body attribute information in addition to face and spatiotemporal information. Human-body attribute information includes appearance cues such as whether a hat is worn, hat color, clothing color, height, and so on. Attribute information enables a good re-ordering of features for which the face is blurry and spatiotemporal information cannot reliably determine whether the same person is present, making the target sorting result more accurate.

For example, for a target sample image, the faces in some remaining sample images may be too blurry for the face information to determine whether the two images contain the same person, and the spatiotemporal information of the two images may also be inconclusive. In that case, a third adjustment (re-correction and re-ordering) of the second pre-adjusted sorting result can be performed based on human-body attribute information, yielding a more accurate target sorting result.
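One possible realization of the three-stage adjustment (processes C1 to C3) is score fusion followed by a hard spatiotemporal veto. The fusion weights and score functions below are assumptions for illustration; the embodiment only specifies that face, spatiotemporal, and attribute information are applied in turn.

```python
import numpy as np

def multimodal_rerank(initial_sims, face_sims, spacetime_ok, attr_sims,
                      w_face=0.5, w_attr=0.3):
    """Sketch of re-ranking: fuse appearance, face and attribute
    similarities (C1, C3), then force spatiotemporally impossible
    candidates (same time, far-away camera) to the back (C2).

    All inputs are arrays aligned with the gallery; spacetime_ok[i] is
    False when candidate i cannot be the same person."""
    score = initial_sims + w_face * face_sims + w_attr * attr_sims
    score = np.where(spacetime_ok, score, -np.inf)  # C2: demote impossible ones
    return list(np.argsort(-score))

sims = np.array([0.9, 0.8, 0.7])   # initial appearance similarity
face = np.array([0.1, 0.9, 0.2])   # face similarity favors candidate 1
ok   = np.array([True, True, False])  # candidate 2 is spatiotemporally impossible
attr = np.array([0.2, 0.8, 0.1])   # attributes also favor candidate 1
order = multimodal_rerank(sims, face, ok, attr)
```

In this toy example the face and attribute evidence promotes candidate 1 above candidate 0 despite a lower appearance similarity, and candidate 2 is pushed to the end.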
Sub-step S1045: Determine the target sorting result as the second supervision data.

The adjusted target sorting result serves as the second supervision data, i.e., the sorting label of the target sample image. Because the accuracy of the target sorting result is not 100%, it is called weakly supervised multimodal supervision data.

Subsequently, a ranking loss function can be used to train the data model with this weakly supervised multimodal supervision data as supervision. When the trained data model is then used, the ranking results it outputs for a test image are more accurate, i.e., images with a higher probability of containing the same person as the test image are ranked higher, so that a specific person's movement trajectory can be determined more accurately and quickly.
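The embodiment names "ranking loss" without specifying its form; a pairwise margin ranking loss is one common choice and is sketched here under that assumption:

```python
import numpy as np

def pairwise_ranking_loss(scores, target_order, margin=0.1):
    """Margin ranking loss over all pairs: for every pair (i, j) where the
    supervision (the target sorting result) says gallery item i should rank
    above item j, penalize the model when score[i] does not exceed score[j]
    by at least the margin."""
    loss, pairs = 0.0, 0
    for a in range(len(target_order)):
        for b in range(a + 1, len(target_order)):
            i, j = target_order[a], target_order[b]  # i should outrank j
            loss += max(0.0, margin - (scores[i] - scores[j]))
            pairs += 1
    return loss / pairs

# Model scores that agree with the supervised order [2, 0, 1] by a wide margin.
scores = np.array([0.5, 0.2, 0.9])
```

With the agreeing order `[2, 0, 1]` the loss is zero; supervising with a conflicting order such as `[1, 0, 2]` yields a positive loss that would drive a gradient update.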
With this embodiment, the multimodal supervision data is mined from the sample images themselves using their spatiotemporal information and other cues, so the supervision data used to train the first data model requires no manual annotation. This lowers the difficulty and labor cost of porting the data model from the source scene to the target scene: the data model can be trained without manually labeling the target scene's sample images, and because multimodal supervision data is used during training, the training effect on the data model is improved.

In addition, face information and spatiotemporal information are used to adjust the initial sorting result, yielding a more accurately ordered target sorting result. With this target sorting result as weakly supervised multimodal supervision data, the second data model trained on it outputs more accurate orderings when used.
Embodiment 4

Referring to FIG. 4a, a flowchart of the steps of a data processing method according to Embodiment 4 of the present invention is shown.

In this embodiment, the data processing method includes the aforementioned steps S102 to S106, where step S104 may adopt any of the implementations described above. In this embodiment, before step S106, the method further includes steps S104a and S104b.

Step S104a: Use the first data model to perform feature extraction on the sample image.

It should be noted that this step can be executed at any suitable time before step S106; it may be executed before, after, or in parallel with step S104.

The first data model may be the aforementioned data model pre-trained in the source scene, or it may be a data model that has completed one or more training cycles in the target scene. For example, for first data model A, once it has been trained on N sample images in the first training cycle, model A is considered to have completed that cycle; for convenience of description, the resulting model is denoted A'. In the second training cycle, the training object becomes A'. In other words, the data model used to extract features from the sample images may differ between training cycles: it may be the first data model produced by the previous training cycle.

Step S104b: Perform clustering on the feature information of the sample images to obtain the category each sample image belongs to, and determine that category as the sample image's single-modality supervision data.

Those skilled in the art may use any appropriate clustering method, such as the K-means clustering algorithm. The category obtained through clustering is the pseudo-label corresponding to the sample image.

For example, clustering the feature information of N sample images yields K categories. If sample images 1, 2, and 5 belong to the same category (denoted label1), then the pseudo-label of sample image 1 is label1, and this pseudo-label is its single-modality supervision data.
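A minimal K-means pseudo-labeling sketch (plain NumPy; in practice a library implementation such as scikit-learn's `KMeans` would be used, and the features would come from the first data model):

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=20, seed=0):
    """Cluster feature vectors with plain K-means and return each sample's
    cluster index as its pseudo-label (the single-modality supervision data)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest center
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned samples
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels

# Two well-separated toy "identities" in feature space.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = kmeans_pseudo_labels(feats, k=2)
```

Samples 0 and 1 receive the same pseudo-label, as do samples 2 and 3; those pseudo-labels then serve as class targets for the classification loss described below.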
Because the reliability of the feature information and of the clustering is relatively low and 100% accuracy cannot be guaranteed, this is called weakly supervised supervision data.

Subsequently, this single-modality supervision data can be used with a classification loss function to train the data model.
After the supervision data is obtained, it can be used to train the first data model so that the resulting second data model can adapt to the target scene. In this embodiment, step S106 includes the following sub-steps:

Sub-step S1061: Configure training weights for the first supervision data, the second supervision data, and the single-modality supervision data.

To improve the training effect, the multiple kinds of supervision data can act on the first data model simultaneously, making the trained first data model perform better. In this embodiment, training with each kind of supervision data can be regarded as one sub-training framework, and the three sub-training frameworks run simultaneously; training the data model is thus multi-task learning, with three sub-training frameworks updating the same first data model at the same time.

To ensure that the different kinds of supervision data influence the training result to different degrees during this process, training weights are configured for the first supervision data, the second supervision data, and the single-modality supervision data, e.g., a training weight of 0.7 for the first supervision data, 0.2 for the second supervision data, and 0.1 for the single-modality supervision data.

Of course, those skilled in the art can configure any appropriate training weights as needed; this embodiment imposes no limitation in this regard.

Sub-step S1062: Perform multi-task training on the first data model using the sample images, the weight-configured first supervision data, second supervision data, and single-modality supervision data, and the loss function corresponding to each kind of supervision data.

For example, one training iteration within a training cycle proceeds as follows: select one kind of supervision data from among the first, second, and single-modality supervision data according to the training weights. If the first supervision data is selected, take the sample image corresponding to the selected first supervision data as input to the first data model, use the first supervision data as supervision, compute the loss value with the triplet loss function, and update the parameters of the first data model according to the loss value.

After one iteration is completed, the weighted selection among the first, second, and single-modality supervision data can be repeated, and so on until the training cycle is complete. The loss function used in each iteration corresponds to the selected supervision data: the triplet loss function when the first supervision data is selected, the ranking loss function when the second supervision data is selected, and the classification loss function when the single-modality supervision data is selected.
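The weighted selection loop of sub-step S1062 can be sketched as follows. `model_step` is a hypothetical stand-in for one forward/backward pass with the loss function matching the selected supervision type; the supervision names are illustrative.

```python
import random

def multitask_train(model_step, supervision, weights, iters_per_cycle=100, seed=0):
    """Weighted multi-task training loop: at each iteration, pick one kind
    of supervision data according to its training weight and apply the
    update rule (loss function) associated with it.

    `supervision` maps a name ('triplet', 'ranking', 'cluster') to its data;
    `model_step(kind, batch)` stands in for one parameter update."""
    rng = random.Random(seed)
    kinds = list(weights)
    probs = [weights[k] for k in kinds]
    counts = {k: 0 for k in kinds}
    for _ in range(iters_per_cycle):
        kind = rng.choices(kinds, weights=probs, k=1)[0]
        model_step(kind, supervision[kind])  # triplet / ranking / classification loss
        counts[kind] += 1
    return counts

weights = {'triplet': 0.7, 'ranking': 0.2, 'cluster': 0.1}
sup = {k: [] for k in weights}
counts = multitask_train(lambda kind, batch: None, sup, weights)
```

Over a cycle, the strongly supervised triplet updates dominate (weight 0.7), while the weakly supervised ranking and pseudo-label updates contribute proportionally less, matching the example weights of sub-step S1061.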
The implementation of the data processing is described below in connection with a specific usage scenario.

In this usage scenario, the data model can be trained in the source scene, and the first data model obtained from that training serves as the initial model. Of course, other existing models can also be used; this usage scenario imposes no limitation in this regard.

For the target scene to which the model is to be ported, the corresponding sample images are obtained, e.g., surveillance images from surveillance video.

As shown in FIG. 4b, this usage scenario avoids the prior-art approach of manually annotating a batch of sample images in the new scene for training, which is costly, difficult, and hard to generalize. Instead, the data processing method of this usage scenario uses the initial model trained in the source scene to mine additional information from the target scene's sample images as supervision data, achieving automatic acquisition of supervision data; this supervision data is then used to train and update the initial model, yielding a data model that performs better in the target scene.

In this usage scenario, three kinds of information are mined: strongly supervised first supervision data, weakly supervised second supervision data, and single-modality supervision data. Training the first data model with these three kinds of supervision data overcomes two prior-art problems. First, generating pseudo-supervision data from the human-body images in unlabeled sample images yields a second data model whose accuracy is too low to meet practical needs. Second, automatically generating fake target-scene human-body images from source-scene human-body images and training the target scene on them gives poor training results.

Because the supervision data obtained using multimodal information is more accurate, it helps improve the effect of the trained data model; the result is better than training with single-modality pseudo-labels.

The process of obtaining supervision data and training with it is described below with reference to FIG. 4b:
As shown in FIG. 4b, in this usage scenario the first data model is a pedestrian re-identification model.

After the sample images of the target scene are obtained, in the first aspect the strongly supervised first supervision data is mined from them. This supervision data combines face information and spatiotemporal information, so the first supervision data obtained is strongly supervised multimodal supervision data.

Specifically, face recognition is performed on all sample images of the target scene (i.e., the pedestrian images contained in the surveillance video) to obtain face information. Using the face information, the sample images belonging to the same person are found and associated; these images are used to compose positive sample pairs. If sample images I_i and I_j contain the same person, the two constitute a positive sample pair.

Since every sample image carries its acquisition time and the identifier of the surveillance camera that captured it, and every surveillance camera has latitude and longitude coordinates, for each sample image such as I_i the corresponding camera identifier can be determined and its coordinates obtained. Using the coordinates, all surveillance cameras more than one kilometer from that camera are found and recorded as set1. Among all sample images captured by the cameras in set1, those whose acquisition time matches that of I_i are found; these images cannot possibly contain the same person as I_i. Negative sample pairs are built from them, which may be written as (I_i, I_n⁻).
The positive sample pairs from face information and the negative sample pairs from spatiotemporal information together form the strongly supervised multimodal supervision data. Combined with the sample images, they form training data, e.g., triplets (I_i, I_j, I_n⁻), and the first data model is trained on this data with the triplet loss function. Because the reliability of these sample pairs can be very high, with accuracy close to 100%, they are called strongly supervised multimodal supervision data.
In the second aspect, the second supervision data, i.e., weakly supervised multimodal supervision data, is mined from the sample images.

Specifically, based on the first data model trained on the source-scene data, feature extraction is performed on the sample images to obtain feature information.

The following process is performed on each sample image in the target scene to obtain the corresponding second supervision data.

For example, a target sample image is determined from the target scene; the other sample images are the remaining sample images. The feature information of the target sample image and the features to be sorted of the remaining sample images are used to produce an initial sorting result.

Using the face information of the target sample image, the initial sorting result is re-corrected and re-ordered, yielding a new, more accurate first pre-adjusted sorting result.

Using the spatiotemporal information of the target sample image, the first pre-adjusted sorting result is re-corrected and re-ordered, yielding a new, more accurate second pre-adjusted sorting result.

Using the human-body attribute information of the target sample image, the second pre-adjusted sorting result is re-corrected and re-ordered, yielding a new, more accurate target sorting result.

The target sorting result serves as the sorting label of the target sample image. Weakly supervised training data can then be built from the target sample image and its target sorting result. Because the accuracy of the target sorting result is not 100%, it is called weakly supervised training data; training is performed on this training data with the ranking loss function.

In the third aspect, weakly supervised single-modality supervision data is mined in the target scene. For example, the first data model trained on the source scene is used to extract features from the target scene's sample images, obtaining feature information. These features are clustered with a clustering method, and the category obtained by clustering serves as the supervision data (i.e., the pseudo-label).

Weakly supervised training data can then be built from the pseudo-labels obtained by clustering and the sample images. Because the reliability of the feature information and of the categories is relatively low and 100% accuracy cannot be guaranteed, this is called weakly supervised training data; the data model is trained on it with a classification loss function.
在获得监督数据后,每个类型的监督数据看成是一个子训练框架。将三个子训练框架同时进行,把整个训练过程看成是一个多任务学习的过程,即同时训练三个子框架,更新同一个模型。通过训练权重调节不同训练框架对数据模型的影响比重。最后得到需要的新的目标场景的第二数据模型(如行人再识别模型)。After obtaining supervised data, each type of supervised data is treated as a sub-training framework. The three sub-training frameworks are carried out at the same time, and the whole training process is regarded as a multi-task learning process, that is, three sub-frames are trained at the same time and the same model is updated. Adjust the proportion of the impact of different training frameworks on the data model by training weights. Finally, the required second data model of the new target scene (such as a pedestrian re-identification model) is obtained.
在完成一个训练周期后,利用训练好的第一数据模型代替源场景的第一数据模型,重复上述过程,直至完成需要的训练周期数获得第二数据模型。After one training cycle is completed, the first data model of the source scene is replaced by the trained first data model, and the above process is repeated until the required number of training cycles is completed to obtain the second data model.
上述过程中,不仅仅是用了目标场景的样本图像(包含人体图像)来训练第一数据模型,而且引入了人脸信息和时空信息等多种模态信息来提取监督数据,并形成训练数据。In the above process, not only are the sample images of the target scene (containing human body images) used to train the first data model, but multiple modalities of information, such as face information and spatiotemporal information, are also introduced to extract supervision data and form training data.
而且,将多种途径得到的监督数据,根据准确率的不同,进行模块化,分到三个不同的子训练框架中,以多任务学习的方法融合到了一起,一起来更新训练第一数据模型。相比于以前只用一种监督数据训练得到的模型,效果上要更好,而且可以不需要在目标场景进行人工标注,直接能自动提高数据模型的训练效果。Moreover, the supervision data obtained through the various channels is modularized according to its accuracy, divided into three different sub-training frameworks, and fused together through multi-task learning to jointly update and train the first data model. Compared with a model previously trained with only one type of supervision data, the effect is better, and the training effect of the data model can be improved automatically without manual annotation in the target scene.
通过本实施例,多模态监督数据是根据样本图像的时空信息等从样本图像中挖掘出的,因而使得对第一数据模型进行训练用的监督数据无需人工手动标注,从而降低了将数据模型从源场景转用到目标场景的过程中的难度和人工成本,实现了不需要对目标场景的样本图像进行人工标注,就可以对数据模型进行训练的目的,而且由于训练时使用了多模态监督数据,因而可以提升对数据模型进行训练的效果。With this embodiment, the multimodal supervision data is mined from the sample images according to, among other things, the spatiotemporal information of the sample images, so the supervision data used to train the first data model does not need manual annotation. This reduces the difficulty and labor cost of transferring the data model from the source scene to the target scene, and achieves the goal of training the data model without manually labeling sample images of the target scene. Moreover, since multimodal supervision data is used during training, the training effect of the data model can be improved.
实施例五Embodiment 5
参照图5,示出了根据本发明实施例五的数据处理方法的步骤流程图。Referring to FIG. 5 , a flowchart of steps of a data processing method according to Embodiment 5 of the present invention is shown.
在本实施例中,数据处理方法包括以下步骤:In this embodiment, the data processing method includes the following steps:
步骤S502:至少根据样本图像的时空信息,获得与所述样本图像对应的多模态监督数据。Step S502: Obtain multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image.
针对不同的使用需求,可以使用不同的样本图像。例如,针对行人再识别场景中的使用需求,可以使用监控图像作为样本图像。针对物体识别场景中的使用需求,可以使用包含不同的物体的图像作为样本图像,等等。Different sample images can be used for different usage requirements. For example, monitoring images can be used as sample images for use in pedestrian re-identification scenarios. For use in object recognition scenarios, images containing different objects can be used as sample images, and so on.
以行人再识别场景为例,样本图像的时空信息至少包括样本图像的采集时间和采集位置。采集时间可以是拍摄该样本图像时创建的时间戳。采集位置可以是拍摄该样本图像的图像采集设备所在的位置。Taking the pedestrian re-identification scene as an example, the spatiotemporal information of a sample image includes at least the capture time and capture location of the sample image. The capture time may be the timestamp created when the sample image was taken. The capture location may be the location of the image capture device that took the sample image.
多模态监督数据中多模态是指多种模态的信息相结合,比如人脸信息和时空信息相结合。多模态监督数据包括第一监督数据和/或第二监督数据。第一监督数据可以是强监督的监督数据,强监督是指这些监督数据的基于多模态的信息建立的伪标签准确率非常高(如为100%或满足近似100%的条件)。第二监督数据可以是弱监督的监督数据,弱监督是指基于多模态的信息建立的伪标签准确率相对低(通常是不到100%或者不满足近似100%的条件)。In multimodal supervised data, multimodality refers to the combination of information from multiple modalities, such as the combination of face information and spatiotemporal information. The multimodal supervision data includes first supervision data and/or second supervision data. The first supervised data may be strongly supervised supervised data, and strong supervision means that the accuracy of pseudo-labels established based on multimodal information of these supervised data is very high (eg, 100% or approximately 100%). The second supervised data may be weakly supervised supervised data. Weak supervision means that the accuracy of pseudo-labels established based on multimodal information is relatively low (usually less than 100% or does not meet the condition of approximately 100%).
本领域技术人员可以采用任何适当的方式获得样本图像对应的多模态监督数据,本实施例对此不作限制。Those skilled in the art may obtain the multimodal supervision data corresponding to the sample image in any appropriate manner, which is not limited in this embodiment.
以获得第一监督数据为例,第一监督数据用于指示两个样本图像中是否包含相同的目标对象。为此,针对目标样本图像Q,根据目标样本图像Q的人脸信息和剩余的样本图像的人脸信息,可以确定与目标样本图像Q包含相同人脸的样本图像,由于通常情况下相同人脸表示为同一个人,因此包含相同人脸的样本图像与目标样本图像Q组成正样本对,各包含相同人脸的样本图像的第一监督数据为正样本。Taking the first supervision data as an example, the first supervision data is used to indicate whether two sample images contain the same target object. To this end, for a target sample image Q, the sample images containing the same face as the target sample image Q can be determined according to the face information of the target sample image Q and the face information of the remaining sample images. Since, under normal circumstances, the same face indicates the same person, each sample image containing the same face forms a positive sample pair with the target sample image Q, and the first supervision data of each such sample image is a positive sample.
同样针对目标样本图像Q,根据目标样本图像Q的时空信息和剩余的样本图像的时空信息,可以确定与目标样本图像Q在同一时刻采集,但采集位置之间的距离大于设定距离阈值(设定距离阈值可以根据需要确定)的样本图像,由于在通常情况下,同一个人在某一时刻不可能同时出现在两个位置,因此,采集位置之间的距离大于设定距离阈值的样本图像与目标样本图像Q组成负样本对,各采集位置之间的距离大于设定距离阈值的样本图像的第一监督数据为负样本。Also for the target sample image Q, according to the spatiotemporal information of the target sample image Q and of the remaining sample images, the sample images that were captured at the same time as the target sample image Q but whose capture locations are farther apart than a set distance threshold (the threshold can be chosen as needed) can be determined. Since, under normal circumstances, the same person cannot appear in two places at the same moment, each such sample image forms a negative sample pair with the target sample image Q, and the first supervision data of each sample image whose capture location is farther away than the set distance threshold is a negative sample.
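上述正负样本对的挖掘规则可以示意如下(字段名与距离阈值均为假设):The above mining rule for positive and negative sample pairs can be sketched as follows (the field names and the distance threshold are assumptions):

```python
def mine_strong_pairs(target, candidates, dist_threshold=500.0):
    """Mining rule sketched from the text: a candidate with the same
    face id forms a positive pair with the target; a candidate captured
    at the same time but at a location farther away than
    `dist_threshold` forms a negative pair, because one person cannot
    appear in two distant places at the same moment."""
    positives, negatives = [], []
    for cand in candidates:
        if cand["face_id"] is not None and cand["face_id"] == target["face_id"]:
            positives.append(cand["img"])
        elif cand["time"] == target["time"]:
            dx = cand["loc"][0] - target["loc"][0]
            dy = cand["loc"][1] - target["loc"][1]
            if (dx * dx + dy * dy) ** 0.5 > dist_threshold:
                negatives.append(cand["img"])
    return positives, negatives
```

由该规则得到的正/负样本对即构成强监督的第一监督数据。The positive/negative pairs produced by this rule constitute the strongly supervised first supervision data.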
通过这种方式,由于在确定监督数据时至少综合了样本图像的时空信息,因此可以获得多模态监督数据,不仅实现了自动获取监督数据,而且无需依赖人工标注,减少了获取监督数据的成本。In this way, since at least the spatiotemporal information of the sample images is integrated when determining the supervision data, multimodal supervision data can be obtained, which not only realizes automatic acquisition of supervision data but also avoids reliance on manual annotation, reducing the cost of obtaining supervision data.
步骤S504:至少根据所述样本图像和所述多模态监督数据,对待训练的数据模型进行训练,并得到目标数据模型。Step S504: Train the data model to be trained according to at least the sample image and the multimodal supervision data, and obtain a target data model.
其中,所述待训练的数据模型包括机器学习模型。针对不同的使用场景,其可以是不同形式的机器学习模型。如在行人再识别场景中,其可以是卷积神经网络模型,或者结合注意力机制的卷积神经网络模型,等等。Wherein, the data model to be trained includes a machine learning model. It can be a different form of machine learning model for different usage scenarios. For example, in a pedestrian re-identification scenario, it can be a convolutional neural network model, or a convolutional neural network model combined with an attention mechanism, and so on.
在使用样本图像对待训练的数据模型进行训练的过程中,以样本图像对应的多模态监督数据作为监督,并计算损失值,进而根据损失值对待训练的数据模型中的参数进行更新,以获得满足使用场景需求的目标数据模型。In the process of training the data model to be trained using the sample images, the multimodal supervision data corresponding to the sample images is used as supervision, a loss value is calculated, and the parameters of the data model to be trained are then updated according to the loss value, so as to obtain a target data model that meets the needs of the usage scenario.
例如,在训练时,样本图像Q和样本图像W组成正样本对,对应的第一监督数据指示样本图像W为正样本。则将样本图像Q和样本图像W输入到待训练的数据模型中,获得待训练的数据模型的输出数据,根据第一监督数据和输出数据计算损失值,并根据损失值调整待训练的数据模型的参数。再将其他正样本对输入到调参后的待训练的数据模型中,如此重复直至满足训练终止条件,并获得目标数据模型。For example, during training, a sample image Q and a sample image W form a positive sample pair, and the corresponding first supervision data indicates that the sample image W is a positive sample. The sample image Q and the sample image W are then input into the data model to be trained to obtain its output data, a loss value is calculated according to the first supervision data and the output data, and the parameters of the data model to be trained are adjusted according to the loss value. Other positive sample pairs are then input into the adjusted data model, and this is repeated until the training termination condition is met and the target data model is obtained.
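上述基于样本对与第一监督数据计算损失的步骤,可用一个简化的对比损失示意(专利并未限定具体损失函数,此处的函数与 margin 取值仅为说明):The step of computing a loss from a sample pair and the first supervision data can be illustrated with a simplified contrastive loss (the patent does not prescribe a specific loss function; the function and the margin value here are for illustration only):

```python
def pair_contrastive_loss(emb_q, emb_w, is_positive, margin=1.0):
    """Simplified contrastive loss over one sample pair: a positive pair
    (same person, per the first supervision data) is pulled together,
    while a negative pair is pushed at least `margin` apart in the
    embedding space produced by the model."""
    dist = sum((a - b) ** 2 for a, b in zip(emb_q, emb_w)) ** 0.5
    if is_positive:
        return dist * dist
    return max(0.0, margin - dist) ** 2
```

训练时将该损失对模型参数求梯度并更新,即对应文中"根据损失值调整待训练的数据模型的参数"。Taking gradients of this loss with respect to the model parameters corresponds to "adjusting the parameters of the data model according to the loss value" in the text.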
可选地,在本实施例中,针对行人再识别场景训练的目标数据模型可以应用到对目标对象的路径识别中。例如,通过下述步骤实现对目标对象的移动路径的识别。Optionally, in this embodiment, the target data model trained for the pedestrian re-identification scene can be applied to the path recognition of the target object. For example, the recognition of the movement path of the target object is achieved through the following steps.
步骤S506:获取视频数据中的多帧图像,所述多帧图像中至少部分帧图像包含目标对象。Step S506: Acquire multiple frames of images in the video data, where at least some of the frame images in the multiple frames of images include the target object.
多帧图像中的至少部分帧图像可以是一帧图像或一帧以上的图像。以视频数据是监控视频为例,监控视频中可以有一帧图像或一帧以上的图像拍摄到目标对象。At least part of the frame images in the multi-frame images may be one frame image or more than one frame image. Taking the video data as surveillance video as an example, in the surveillance video, there may be one frame of image or more than one frame of images to capture the target object.
步骤S508:使用所述目标数据模型对所述多帧图像进行对象识别。Step S508: Use the target data model to perform object recognition on the multi-frame images.
在进行对象识别时,将多帧图像输入到目标数据模型中,使用目标数据模型识别出包含目标对象的目标帧图像作为识别结果。During object recognition, multiple frames of images are input into the target data model, and the target frame images containing the target object are identified by using the target data model as the recognition result.
步骤S510:根据所述识别结果,确定目标对象的移动路径。Step S510: Determine the movement path of the target object according to the recognition result.
针对识别结果中包含的各目标帧图像,按照各目标帧图像的采集时间对各目标帧图像的采集位置进行组合,从而形成目标对象的移动路径。For each target frame image included in the recognition result, the collection positions of each target frame image are combined according to the collection time of each target frame image, thereby forming a moving path of the target object.
例如,目标帧图像包括图像1、图像2、图像3和图像7,分别对应的采集时间为t1时刻、t2时刻、t3时刻和t7时刻,对应的采集位置为P1位置、P2位置、P3位置和P2位置,则基于目标帧图像确定目标对象的移动路径可以表示为:P1—P2—P3—P2。For example, the target frame images include image 1, image 2, image 3 and image 7, with corresponding capture times t1, t2, t3 and t7 and corresponding capture locations P1, P2, P3 and P2. The movement path of the target object determined from the target frame images can then be expressed as: P1—P2—P3—P2.
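按采集时间组合采集位置得到移动路径的步骤,可以示意如下(字典键名为假设,输出与文中示例一致):The step of combining capture locations by capture time to obtain the movement path can be sketched as follows (the dict keys are assumptions; the output matches the example in the text):

```python
def movement_path(target_frames):
    """Sort the recognized target-frame images by capture time, then
    join their capture locations in order to form the movement path of
    the target object."""
    ordered = sorted(target_frames, key=lambda frame: frame["time"])
    return "—".join(frame["loc"] for frame in ordered)
```

对文中示例(t1/P1、t2/P2、t3/P3、t7/P2)调用该函数即得到路径 P1—P2—P3—P2。Applying it to the example above (t1/P1, t2/P2, t3/P3, t7/P2) yields the path P1—P2—P3—P2.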
通过该数据处理方法可以基于样本图像的时空信息等挖掘出样本图像的多模态监督数据,因而使得对待训练的数据模型进行训练用的监督数据无需人工手动标注,从而降低了训练出目标数据模型的难度和人工成本,实现了不需要对样本图像进行人工标注,就可以对数据模型进行训练的目的,而且由于训练时使用了多模态监督数据,因而可以提升对数据模型进行训练的效果。With this data processing method, the multimodal supervision data of the sample images can be mined based on, among other things, their spatiotemporal information, so the supervision data used to train the data model does not need manual annotation. This reduces the difficulty and labor cost of training the target data model and achieves the goal of training the data model without manually labeling the sample images. Moreover, since multimodal supervision data is used during training, the training effect of the data model can be improved.
实施例六Embodiment 6
参照图6a,示出了本发明实施例六的一种数据处理方法的步骤流程图。Referring to Fig. 6a, a flowchart of steps of a data processing method according to Embodiment 6 of the present invention is shown.
本实施例中,以将数据处理方法部署于服务端(如云端或服务器或SaaS平台),根据客户端请求对第一数据模型进行训练为例,对本发明实施例提供的数据处理方法进行说明。In this embodiment, the data processing method provided by the embodiment of the present invention is described by taking the example of deploying the data processing method on the server (eg, cloud or server or SaaS platform) and training the first data model according to the client's request.
本实施例的数据处理方法包括以下步骤:The data processing method of this embodiment includes the following steps:
步骤S602:接收客户端通过调用预设的训练接口发送的、用于请求对第一数据模型进行训练的模型训练请求。Step S602: Receive a model training request for requesting to train the first data model and sent by the client by calling a preset training interface.
以方法部署于SaaS平台为例,训练接口可以是SaaS平台预设的用于接收客户端的模型训练请求的接口。该接口可以根据需要配置为任何适当的形式,本实施例对此不作限制。Taking deployment of the method on a SaaS platform as an example, the training interface may be an interface preset on the SaaS platform for receiving model training requests from clients. The interface can be configured in any appropriate form as needed, which is not limited in this embodiment.
第一数据模型可以是实施例一至四中任一所述的模型。所述模型训练请求可以为任意适当形式的请求。The first data model may be the model described in any one of the first to fourth embodiments. The model training request may be any suitable form of request.
步骤S604:根据所述模型训练请求,获取用于训练的样本图像。Step S604: Obtain sample images for training according to the model training request.
第一数据模型可以部署于SaaS平台,也可以由SaaS平台通过网络从客户端或者第三方获取,本实施例对此不作限制。The first data model may be deployed on the SaaS platform, or may be acquired by the SaaS platform from a client or a third party through a network, which is not limited in this embodiment.
以第一数据模型部署于SaaS平台为例,获取样本图像时,通过SaaS平台接收客户端的模型训练请求对第一数据模型进行训练时,可以根据所述模型训练请求,从SaaS平台本地获取用于训练的样本图像。此种情况下,SaaS平台本地存储有适用的样本图像,则可直接获得,由此可提高对模型训练的速度和效率。Taking deployment of the first data model on the SaaS platform as an example, when the first data model is trained upon the SaaS platform receiving a model training request from the client, the sample images used for training can be obtained locally from the SaaS platform according to the model training request. In this case, the SaaS platform stores applicable sample images locally and can obtain them directly, thereby improving the speed and efficiency of model training.
在另一种可行方式中,通过SaaS平台接收客户端的模型训练请求对第一数据模型进行训练时,可以由SaaS平台根据所述模型训练请求,从第三方采集用于训练的样本图像。如通过网络从第三方网站或从第三方应用提供的数据接口获取样本图像。此种情况下,SaaS平台通过第三方获得样本图像,无需本地存储,节省了SaaS平台的存储资源。In another feasible manner, when the first data model is trained by receiving a model training request from the client through the SaaS platform, the SaaS platform may collect sample images for training from a third party according to the model training request. Such as obtaining sample images from third-party websites or data interfaces provided by third-party applications through the network. In this case, the SaaS platform obtains sample images from a third party without local storage, which saves the storage resources of the SaaS platform.
在再一种可行方式中,通过SaaS平台接收客户端的模型训练请求对第一数据模型进行训练时,可以由SaaS平台根据所述模型训练请求,从所述客户端获取用于训练的样本图像。此种情况下,在客户端中存储有样本图像,SaaS平台从客户端获得样本图像,可以训练出更符合客户端需求的第二数据模型。In yet another feasible manner, when the first data model is trained upon the SaaS platform receiving a model training request from the client, the SaaS platform may obtain the sample images used for training from the client according to the model training request. In this case, the sample images are stored in the client; the SaaS platform obtains them from the client and can thus train a second data model that better meets the client's needs.
步骤S606:获得所述样本图像对应的多模态监督数据,并使用所述样本图像和所述多模态监督数据对第一数据模型进行训练。Step S606: Obtain multimodal supervision data corresponding to the sample image, and use the sample image and the multimodal supervision data to train a first data model.
例如,可以通过实施例一至四中任一的方式获得样本图像对应的多模态监督数据,并使用样本图像和多模态监督数据对第一数据模型进行训练,以获得第二数据模型,在此不再赘述。For example, the multimodal supervision data corresponding to the sample image can be obtained by any one of Embodiments 1 to 4, and the first data model can be trained by using the sample image and the multimodal supervision data to obtain the second data model. This will not be repeated here.
以下,以第一数据模型部署在SaaS平台为示例,对上述过程进行示例性说明,如图6b所示。Hereinafter, the above process is exemplarily described by taking the first data model deployed on the SaaS platform as an example, as shown in FIG. 6b.
图6b中,客户端向SaaS平台发送模型训练请求;SaaS平台在接收到该请求后,由处理设备从本地存储设备中获取用于训练第一数据模型的样本图像;SaaS平台基于获取的样本图像,至少根据样本图像的时空信息获得样本图像的多模态监督数据。使用样本图像和多模态监督数据对第一数据模型进行训练;SaaS平台在完成对第一数据模型的训练获得第二数据模型后,向客户端发送训练完成消息。后续,客户端若有需求,则可向SaaS平台发送视频数据或待检测图像,以获得相应的目标对象的移动路径。In Fig. 6b, the client sends a model training request to the SaaS platform. After receiving the request, the SaaS platform's processing device obtains sample images for training the first data model from the local storage device. Based on the obtained sample images, the SaaS platform obtains their multimodal supervision data at least according to their spatiotemporal information, and trains the first data model using the sample images and the multimodal supervision data. After the training of the first data model is completed and the second data model is obtained, the SaaS platform sends a training completion message to the client. Subsequently, the client may, if needed, send video data or images to be detected to the SaaS platform to obtain the movement path of the corresponding target object.
以上,以第一数据模型部署于SaaS平台为例,但本领域技术人员应当明了,对于第一数据模型部署于其它形式的服务端的情况,本实施例的方案同样适用。The above takes the deployment of the first data model on a SaaS platform as an example, but those skilled in the art should understand that the solution of this embodiment is equally applicable when the first data model is deployed on other forms of servers.
可见,通过本实施例,将第一数据模型及其训练均部署于服务端,由服务端根据客户端的请求对第一数据模型进行训练,实现了对客户端资源或性能无要求条件下的第一数据模型的训练,保证了训练效果和效率。It can be seen that, with this embodiment, the first data model and its training are both deployed on the server, and the server trains the first data model according to the client's request. This enables the first data model to be trained without imposing any requirements on client resources or performance, and ensures the training effect and efficiency.
实施例七Embodiment 7
参照图7,示出了根据本发明实施例七的数据处理装置的结构框图。Referring to FIG. 7 , it shows a structural block diagram of a data processing apparatus according to Embodiment 7 of the present invention.
本实施例中,数据处理装置包括:第一获取模块702,用于获取第一数据模型及样本图像,其中,所述第一数据模型包括机器学习模型;第二获取模块704,用于至少根据所述样本图像的时空信息,获得与所述样本图像对应的多模态监督数据;第一训练模块706,用于至少根据所述样本图像和所述多模态监督数据,对所述第一数据模型进行训练,得到第二数据模型。In this embodiment, the data processing apparatus includes: a first acquisition module 702, configured to acquire a first data model and sample images, where the first data model includes a machine learning model; a second acquisition module 704, configured to obtain multimodal supervision data corresponding to the sample images at least according to the spatiotemporal information of the sample images; and a first training module 706, configured to train the first data model at least according to the sample images and the multimodal supervision data to obtain a second data model.
可选地,第二获取模块704包括:人脸识别模块7041,用于对所述样本图像进行人脸识别,获得所述样本图像对应的人脸信息;确定模块7042,用于根据所述人脸信息和所述样本图像的时空信息,确定与所述样本图像对应的第一监督数据。Optionally, the second acquisition module 704 includes: a face recognition module 7041, configured to perform face recognition on the sample images to obtain the face information corresponding to the sample images; and a determination module 7042, configured to determine the first supervision data corresponding to the sample images according to the face information and the spatiotemporal information of the sample images.
可选地,确定模块7042用于根据所述样本图像的人脸信息,确定包含相同人脸的样本图像组成的正样本集合;根据所述样本图像的时空信息,确定满足设定条件的样本图像组成的负样本集合;根据所述正样本集合和所述负样本集合,确定与所述样本图像对应的第一监督数据。Optionally, the determination module 7042 is configured to: determine, according to the face information of the sample images, a positive sample set composed of sample images containing the same face; determine, according to the spatiotemporal information of the sample images, a negative sample set composed of sample images that meet a set condition; and determine the first supervision data corresponding to the sample images according to the positive sample set and the negative sample set.
可选地,确定模块7042用于在所述根据所述样本图像的时空信息,确定满足设定条件的样本图像组成的负样本集合时,从样本图像中确定目标样本图像;从除所述目标样本图像之外的样本图像中,与所述目标样本图像的采集时间相同、且采集位置之间的距离大于设定距离阈值的样本图像确定为负样本图像;根据所述负样本图像确定与所述目标样本图像对应的负样本集合。Optionally, the determination module 7042 is configured, when determining the negative sample set composed of sample images that meet the set condition according to the spatiotemporal information of the sample images, to: determine a target sample image from the sample images; determine, from the sample images other than the target sample image, the sample images whose capture time is the same as that of the target sample image and whose capture locations are farther apart than the set distance threshold as negative sample images; and determine the negative sample set corresponding to the target sample image according to the negative sample images.
可选地,确定模块7042包括:第一特征提取模块7042a,用于使用所述第一数据模型对所述样本图像进行特征提取,获取用于表征人体图像信息的特征信息;排序模块7042b,用于至少根据所述样本图像的时空信息、和所述特征信息与待排序特征之间的相似度,对待排序特征信息进行排序,获得目标排序结果;监督确定模块7042c,用于将所述目标排序结果确定为第二监督数据。Optionally, the determination module 7042 includes: a first feature extraction module 7042a, configured to perform feature extraction on the sample images using the first data model to acquire feature information representing human body image information; a sorting module 7042b, configured to sort the feature information to be sorted at least according to the spatiotemporal information of the sample images and the similarity between the feature information and the features to be sorted, to obtain a target sorting result; and a supervision determination module 7042c, configured to determine the target sorting result as the second supervision data.
可选地,所述排序模块7042b用于从样本图像中确定目标样本图像;根据目标样本图像的特征信息与待排序特征之间的相似度,对待排序特征信息进行排序,获得初始排序结果;至少根据所述目标样本图像的时空信息,对初始排序结果进行调整,获得所述目标排序结果。Optionally, the sorting module 7042b is configured to: determine a target sample image from the sample images; sort the feature information to be sorted according to the similarity between the feature information of the target sample image and the features to be sorted, to obtain an initial sorting result; and adjust the initial sorting result at least according to the spatiotemporal information of the target sample image, to obtain the target sorting result.
可选地,所述排序模块7042b用于在至少根据所述目标样本图像的时空信息,对初始排序结果进行调整,获得所述目标排序结果时,根据所述目标样本图像的人脸信息和所述待排序特征对应的样本图像的人脸信息之间的相似度,对所述初始排序结果进行第一调整,获得第一预调整排序结果;根据所述目标样本图像的时空信息和所述待排序特征对应的样本图像的时空信息,对第一预调整排序结果进行第二调整,获得第二预调整排序结果;根据所述目标样本图像的人体属性信息和所述待排序特征对应的样本图像的人体属性信息,对所述第二预调整排序结果进行第三调整,获得所述目标排序结果。Optionally, the sorting module 7042b is configured, when adjusting the initial sorting result at least according to the spatiotemporal information of the target sample image to obtain the target sorting result, to: perform a first adjustment on the initial sorting result according to the similarity between the face information of the target sample image and the face information of the sample images corresponding to the features to be sorted, to obtain a first pre-adjusted sorting result; perform a second adjustment on the first pre-adjusted sorting result according to the spatiotemporal information of the target sample image and of the sample images corresponding to the features to be sorted, to obtain a second pre-adjusted sorting result; and perform a third adjustment on the second pre-adjusted sorting result according to the human body attribute information of the target sample image and of the sample images corresponding to the features to be sorted, to obtain the target sorting result.
可选地,所述装置还包括:第二特征提取模块708,用于在第一训练模块706至少根据所述样本图像和所述多模态监督数据,对所述第一数据模型进行训练,得到第二数据模型之前,使用所述第一数据模型对所述样本图像进行特征提取;聚类模块710,用于对所述样本图像的特征信息进行聚类处理,获得所述样本图像的所属类别,并将所述所属类别确定为所述样本图像的单模态监督数据。Optionally, the apparatus further includes: a second feature extraction module 708, configured to perform feature extraction on the sample images using the first data model before the first training module 706 trains the first data model at least according to the sample images and the multimodal supervision data to obtain the second data model; and a clustering module 710, configured to perform clustering processing on the feature information of the sample images to obtain the category of each sample image, and to determine the category as the unimodal supervision data of the sample image.
可选地,所述多模态监督数据包括第一监督数据和第二监督数据;第一训练模块706包括:权重配置模块7061,用于对所述第一监督数据、所述第二监督数据和所述单模态监督数据进行训练权重配置;模型第一训练模块7062,用于使用所述样本图像、进行了权重配置的所述第一监督数据、所述第二监督数据、所述单模态监督数据和监督数据对应的损失函数,对所述数据模型进行多任务训练。Optionally, the multimodal supervision data includes first supervision data and second supervision data. The first training module 706 includes: a weight configuration module 7061, configured to configure training weights for the first supervision data, the second supervision data and the unimodal supervision data; and a model first training module 7062, configured to perform multi-task training on the data model using the sample images, the weight-configured first supervision data, second supervision data and unimodal supervision data, and the loss functions corresponding to the supervision data.
本实施例的数据处理装置用于实现前述多个方法实施例中相应的数据处理方法,并具有相应的方法实施例的有益效果,在此不再赘述。此外,本实施例的数据处理装置中的各个模块的功能实现均可参照前述方法实施例中的相应部分的描述,在此亦不再赘述。The data processing apparatus in this embodiment is used to implement the corresponding data processing methods in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here. In addition, for the function implementation of each module in the data processing apparatus of this embodiment, reference may be made to the descriptions of the corresponding parts in the foregoing method embodiments, and details are not repeated here.
实施例八Embodiment 8
参照图8,示出了根据本发明实施例八的一种数据处理装置的结构框图。Referring to FIG. 8 , a structural block diagram of a data processing apparatus according to Embodiment 8 of the present invention is shown.
本实施例中,数据处理装置包括:In this embodiment, the data processing device includes:
第三获取模块802,用于至少根据样本图像的时空信息,获得与所述样本图像对应的多模态监督数据;A third obtaining module 802, configured to obtain multimodal supervision data corresponding to the sample image at least according to the spatiotemporal information of the sample image;
第二训练模块804,用于至少根据所述样本图像和所述多模态监督数据,对待训练的数据模型进行训练,并得到目标数据模型,其中,所述待训练的数据模型包括机器学习模型。The second training module 804 is configured to train a data model to be trained according to at least the sample image and the multimodal supervision data, and obtain a target data model, wherein the data model to be trained includes a machine learning model .
可选地,所述装置还包括:Optionally, the device further includes:
第四获取模块806,用于获取视频数据中的多帧图像,所述多帧图像中至少部分帧图像包含目标对象;a fourth acquisition module 806, configured to acquire multiple frames of images in the video data, at least some of the frame images in the multiple frames of images include a target object;
对象识别模块808,用于使用所述目标数据模型对所述多帧图像进行对象识别;an object recognition module 808, configured to perform object recognition on the multi-frame images using the target data model;
路径确定模块810,用于根据所述识别结果,确定目标对象的移动路径。The path determination module 810 is configured to determine the moving path of the target object according to the recognition result.
本实施例的数据处理装置用于实现前述多个方法实施例中相应的数据处理方法,并具有相应的方法实施例的有益效果,在此不再赘述。此外,本实施例的数据处理装置中的各个模块的功能实现均可参照前述方法实施例中的相应部分的描述,在此亦不再赘述。The data processing apparatus in this embodiment is used to implement the corresponding data processing methods in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here. In addition, for the function implementation of each module in the data processing apparatus of this embodiment, reference may be made to the descriptions of the corresponding parts in the foregoing method embodiments, and details are not repeated here.
实施例九Embodiment 9
参照图9,示出了根据本发明实施例九的一种电子设备的结构示意图,本发明具体实施例并不对电子设备的具体实现做限定。Referring to FIG. 9, a schematic structural diagram of an electronic device according to Embodiment 9 of the present invention is shown. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
如图9所示,该电子设备可以包括:处理器(processor)902、通信接口(Communications Interface)904、存储器(memory)906、以及通信总线908。As shown in FIG. 9 , the electronic device may include: a processor (processor) 902 , a communication interface (Communications Interface) 904 , a memory (memory) 906 , and a communication bus 908 .
其中:Wherein:
处理器902、通信接口904、以及存储器906通过通信总线908完成相互间的通信。The processor 902, the communication interface 904 and the memory 906 communicate with each other through the communication bus 908.
通信接口904,用于与其它电子设备或服务器进行通信。The communication interface 904 is configured to communicate with other electronic devices or servers.
处理器902,用于执行程序910,具体可以执行上述数据处理方法实施例中的相关步骤。The processor 902 is configured to execute the program 910, and may specifically execute the relevant steps in the foregoing data processing method embodiments.
具体地,程序910可以包括程序代码,该程序代码包括计算机操作指令。Specifically, the program 910 may include program code, and the program code includes computer operation instructions.
处理器902可能是中央处理器CPU,或者是特定集成电路ASIC(Application Specific Integrated Circuit),或者是被配置成实施本发明实施例的一个或多个集成电路。智能设备包括的一个或多个处理器,可以是同一类型的处理器,如一个或多个CPU;也可以是不同类型的处理器,如一个或多个CPU以及一个或多个ASIC。The processor 902 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the smart device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
存储器906,用于存放程序910。存储器906可能包含高速RAM存储器,也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。The memory 906 is configured to store the program 910. The memory 906 may include a high-speed RAM memory, and may further include a non-volatile memory, for example, at least one disk memory.
第一实施方式:First Embodiment:
程序910具体可以用于使得处理器902执行以下操作:获取第一数据模型及样本图像,所述第一数据模型包括机器学习模型;至少根据所述样本图像的时空信息,获得与所述样本图像对应的多模态监督数据;至少根据所述样本图像和所述多模态监督数据,对所述第一数据模型进行训练,得到第二数据模型。The
在一种可选的实施方式中,程序910还用于使得处理器902在至少根据所述样本图像的时空信息,获得与所述样本图像对应的多模态监督数据时,对所述样本图像进行人脸识别,获得所述样本图像对应的人脸信息;根据所述人脸信息和所述样本图像的时空信息,确定与所述样本图像对应的第一监督数据。In an optional implementation manner, the
在一种可选的实施方式中,程序910还用于使得处理器902在根据所述人脸信息和所述样本图像的时空信息,确定与所述样本图像对应的第一监督数据时,根据所述样本图像的人脸信息,确定包含相同人脸的样本图像组成的正样本集合;根据所述样本图像的时空信息,确定满足设定条件的样本图像组成的负样本集合;根据所述正样本集合和所述负样本集合,确定与所述样本图像对应的第一监督数据。In an optional implementation manner, the
在一种可选的实施方式中,程序910还用于使得处理器902在根据所述样本图像的时空信息,确定满足设定条件的样本图像组成的负样本集合时,从样本图像中确定目标样本图像;从除所述目标样本图像之外的样本图像中,与所述目标样本图像的采集时间相同、且采集位置之间的距离大于设定距离阈值的样本图像确定为负样本图像;根据所述负样本图像确定与所述目标样本图像对应的负样本集合。In an optional implementation manner, the
在一种可选的实施方式中,程序910还用于使得处理器902在至少根据所述样本图像的时空信息,获得与所述样本图像对应的多模态监督数据时,使用所述第一数据模型对所述样本图像进行特征提取,获取用于表征人体图像信息的特征信息;至少根据所述样本图像的时空信息、和所述特征信息与待排序特征之间的相似度,对待排序特征信息进行排序,获得目标排序结果;将所述目标排序结果确定为第二监督数据。In an optional implementation manner, the
在一种可选的实施方式中,程序910还用于使得处理器902在至少根据所述样本图像的时空信息、和所述特征信息与待排序特征之间的相似度,对待排序特征信息进行排序,获得目标排序结果时,从样本图像中确定目标样本图像;根据目标样本图像的特征信息与待排序特征之间的相似度,对待排序特征信息进行排序,获得初始排序结果;至少根据所述目标样本图像的时空信息,对初始排序结果进行调整,获得所述目标排序结果。In an optional implementation manner, the
在一种可选的实施方式中,程序910还用于使得处理器902在至少根据所述目标样本图像的时空信息,对初始排序结果进行调整,获得所述目标排序结果时,根据所述目标样本图像的人脸信息和所述待排序特征对应的样本图像的人脸信息之间的相似度,对所述初始排序结果进行第一调整,获得第一预调整排序结果;根据所述目标样本图像的时空信息和所述待排序特征对应的样本图像的时空信息,对第一预调整排序结果进行第二调整,获得第二预调整排序结果;根据所述目标样本图像的人体属性信息和所述待排序特征对应的样本图像的人体属性信息,对所述第二预调整排序结果进行第三调整,获得所述目标排序结果。In an optional implementation manner, the
In an optional implementation, the program 910 is further used to cause the processor 902, before training the first data model at least according to the sample images and the multimodal supervision data to obtain the second data model, to: use the first data model to perform feature extraction on the sample images; and perform clustering on the feature information of the sample images to obtain the category to which each sample image belongs, determining that category as single-modal supervision data for the sample images.
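The clustering step that turns extracted features into single-modal (pseudo-label) supervision could look like the minimal k-means sketch below. The patent does not specify a clustering algorithm; a production system would more likely use a library clusterer such as scikit-learn's KMeans or DBSCAN.

```python
import math
import random

def cluster_pseudo_labels(features, k=2, iters=20, seed=0):
    """Assign each feature vector a cluster id, used as its single-modal
    (pseudo-label) supervision data. Plain k-means over lists of floats."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(list(features), k)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each sample joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(f, centers[j]))
                  for f in features]
        # Update step: recompute each centre from its members.
        for j in range(k):
            members = [f for f, lbl in zip(features, labels) if lbl == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```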
In an optional implementation, the multimodal supervision data includes first supervision data and second supervision data; the program 910 is further used to cause the processor 902, when training the first data model at least according to the sample images and the multimodal supervision data to obtain the second data model, to: configure training weights for the first supervision data, the second supervision data, and the single-modal supervision data; and perform multi-task training on the data model using the sample images, the weight-configured first supervision data, second supervision data, and single-modal supervision data, and the loss functions corresponding to the supervision data.
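The weight configuration and loss combination for multi-task training can be sketched as a weighted sum of the per-supervision losses. The weight values below are illustrative assumptions; the patent states only that training weights are configured for the three supervision signals.

```python
def multitask_loss(losses, weights):
    """Combine the per-supervision losses (first supervision data, second
    supervision data, single-modal supervision data) into one training
    objective using the configured training weights."""
    assert set(losses) == set(weights), "each supervision signal needs a weight"
    return sum(weights[name] * loss for name, loss in losses.items())

# Hypothetical weight configuration and per-task loss values:
weights = {"first": 1.0, "second": 0.5, "single_modal": 0.3}
losses = {"first": 0.8, "second": 0.4, "single_modal": 1.0}
total = multitask_loss(losses, weights)  # 1.0*0.8 + 0.5*0.4 + 0.3*1.0 = 1.3
```

In a full training loop, `total` would be the scalar backpropagated through the shared feature extractor, so that all three supervision signals shape the same model.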
Second Embodiment:
The program 910 may specifically be used to cause the processor 902 to perform the following operations: obtaining multimodal supervision data corresponding to sample images at least according to spatiotemporal information of the sample images; and training a data model to be trained at least according to the sample images and the multimodal supervision data to obtain a target data model, wherein the data model to be trained comprises a machine learning model.
In an optional implementation, the program 910 is further used to cause the processor 902 to: acquire multiple frame images from video data, at least some of which contain a target object; use the target data model to perform object recognition on the multiple frame images; and determine a movement path of the target object according to the recognition results.
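Determining the movement path from per-frame recognition results can be as simple as collecting the target's detections in chronological order. The detection format below (frame index mapped to `(object_id, (x, y))` tuples produced by the target data model) is a hypothetical simplification.

```python
def movement_path(frame_detections, target_id):
    """Extract the chronological movement path of one target object from
    per-frame recognition results: a dict mapping frame index to a list
    of (object_id, (x, y)) detections."""
    path = []
    for frame in sorted(frame_detections):        # chronological order
        for obj_id, pos in frame_detections[frame]:
            if obj_id == target_id:
                path.append((frame, pos))
    return path
```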
Third Embodiment:
The program 910 may specifically be used to cause the processor 902 to perform the following operations: receiving a model training request, sent by a client through a preset training interface, requesting training of a first data model; acquiring sample images for training according to the model training request; obtaining the multimodal supervision data corresponding to the sample images through the data processing method of the first embodiment; and training the first data model using the sample images and the multimodal supervision data.
In an optional implementation, the program 910 is further used to cause the processor 902, when acquiring the sample images according to the model training request, to: acquire the sample images locally from a SaaS platform according to the model training request; or have the SaaS platform acquire the sample images from a third party according to the model training request; or have the SaaS platform acquire the sample images from the client according to the model training request.
For the specific implementation of each step in the program 910, reference may be made to the corresponding descriptions of the corresponding steps and units in the above data processing method embodiments, which will not be repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices and modules described above, reference may be made to the descriptions of the corresponding processes in the foregoing method embodiments, which will not be repeated here.
It should be pointed out that, as required by the implementation, each component/step described in the embodiments of the present invention may be split into more components/steps, and two or more components/steps or partial operations of components/steps may be combined into new components/steps, so as to achieve the purpose of the embodiments of the present invention.
The above methods according to embodiments of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code downloaded over a network, originally stored in a remote recording medium or non-transitory machine-readable medium and to be stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the data processing methods described herein are implemented. Furthermore, when a general-purpose computer accesses code for implementing the data processing methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing the data processing methods shown herein.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
The above embodiments are intended only to illustrate the embodiments of the present invention, not to limit them. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention; therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the patent protection scope of the embodiments of the present invention shall be defined by the claims.
Claims (17)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010641928.3A CN113971814B (en) | 2020-07-06 | 2020-07-06 | Data processing method, device, electronic device and computer storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010641928.3A CN113971814B (en) | 2020-07-06 | 2020-07-06 | Data processing method, device, electronic device and computer storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113971814A true CN113971814A (en) | 2022-01-25 |
| CN113971814B CN113971814B (en) | 2025-10-10 |
Family
ID=79584534
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010641928.3A Active CN113971814B (en) | 2020-07-06 | 2020-07-06 | Data processing method, device, electronic device and computer storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113971814B (en) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106815566A (en) * | 2016-12-29 | 2017-06-09 | 天津中科智能识别产业技术研究院有限公司 | A kind of face retrieval method based on multitask convolutional neural networks |
| CN108229321A (en) * | 2017-11-30 | 2018-06-29 | 北京市商汤科技开发有限公司 | Human face recognition model and its training method and device, equipment, program and medium |
| CN109740479A (en) * | 2018-12-25 | 2019-05-10 | 苏州科达科技股份有限公司 | A kind of vehicle recognition methods, device, equipment and readable storage medium storing program for executing again |
| CN110110598A (en) * | 2019-04-01 | 2019-08-09 | 桂林电子科技大学 | The pedestrian of a kind of view-based access control model feature and space-time restriction recognition methods and system again |
| CN110263697A (en) * | 2019-06-17 | 2019-09-20 | 哈尔滨工业大学(深圳) | Pedestrian re-identification method, device and medium based on unsupervised learning |
| CN110516586A (en) * | 2019-08-23 | 2019-11-29 | 深圳力维智联技术有限公司 | A kind of facial image clustering method, system, product and medium |
| CN110533106A (en) * | 2019-08-30 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Image classification processing method, device and storage medium |
| CN110728216A (en) * | 2019-09-27 | 2020-01-24 | 西北工业大学 | Unsupervised pedestrian re-identification method based on pedestrian attribute adaptive learning |
| CN111209776A (en) * | 2018-11-21 | 2020-05-29 | 杭州海康威视系统技术有限公司 | Peer identification method, device, processing server, storage medium and system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113971814B (en) | 2025-10-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11354901B2 (en) | Activity recognition method and system | |
| CN109644255B (en) | Method and apparatus for annotating a video stream comprising a set of frames | |
| CN108922622B (en) | Animal health monitoring method, device and computer readable storage medium | |
| KR102340626B1 (en) | Target tracking method, apparatus, electronic device and storage medium | |
| CN109145759B (en) | Vehicle attribute identification method, device, server and storage medium | |
| CN111291633B (en) | A real-time pedestrian re-identification method and device | |
| CN107005679B (en) | A cloud service-based intelligent target recognition device, system and method | |
| CN109086873B (en) | Training method, identification method, device and processing device of recurrent neural network | |
| CN112001347B (en) | An action recognition method based on human skeleton shape and detection target | |
| CN111723773B (en) | Method and device for detecting carryover, electronic equipment and readable storage medium | |
| CN107316035A (en) | Object identifying method and device based on deep learning neutral net | |
| WO2020094088A1 (en) | Image capturing method, monitoring camera, and monitoring system | |
| CN110163041A (en) | Video pedestrian recognition methods, device and storage medium again | |
| CN115116129B (en) | Video motion detection method, device, equipment and storage medium | |
| CN111008621B (en) | Object tracking method and device, computer equipment and storage medium | |
| CN110443824A (en) | Method and apparatus for generating information | |
| CN110569911A (en) | Image recognition method, device, system, electronic equipment and storage medium | |
| WO2025194760A1 (en) | Facial recognition method and apparatus, and electronic device and storage medium | |
| US20230386185A1 (en) | Statistical model-based false detection removal algorithm from images | |
| CN111429476A (en) | Method and device for determining action track of target person | |
| CN110163052A (en) | Video actions recognition methods, device and machinery equipment | |
| CN109902550A (en) | The recognition methods of pedestrian's attribute and device | |
| WO2023279799A1 (en) | Object identification method and apparatus, and electronic system | |
| CN118762397A (en) | A gait recognition method and system based on pedestrian temporal profile reconstruction and restoration | |
| CN114092746A (en) | Multi-attribute identification method and device, storage medium and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |