CN111539992A

CN111539992A - Image processing method, device, electronic device and storage medium

Info

Publication number: CN111539992A
Application number: CN202010357593.2A
Authority: CN
Inventors: 李通; 金晟; 刘文韬; 钱晨
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2020-04-29
Filing date: 2020-04-29
Publication date: 2020-08-14
Also published as: TW202141340A; WO2021218293A1; JP2022534666A

Abstract

The embodiment of the disclosure discloses an image processing method and device, electronic equipment and a storage medium. The method comprises the following steps: obtaining a plurality of frame images; performing limb key point detection processing on a target object in a first image in the multi-frame images to obtain first key point information corresponding to part of limbs of the target object; determining second key point information corresponding to the part of the limb of the target object in a second image based on the first key point information; the first image is any one of the multi-frame images; the second image is a frame of image after the first image.

Description

Image processing method, device, electronic device and storage medium

技术领域technical field

本公开涉及计算机视觉技术领域，具体涉及一种图像处理方法、装置、电子设备和存储介质。The present disclosure relates to the technical field of computer vision, and in particular, to an image processing method, apparatus, electronic device and storage medium.

背景技术Background technique

目标跟踪技术通常基于肢体检测算法和肢体关键点检测算法，利用肢体检测算法检测出的人体，以及肢体关键点检测算法检测出的人体关键点，实现目标跟踪。但是目前的肢体检测算法和肢体关键点检测算法无法适应只有上半身肢体的场景，从而导致只有上半身肢体的目标无法进行跟踪。The target tracking technology is usually based on the limb detection algorithm and the limb key point detection algorithm, using the human body detected by the limb detection algorithm and the key points of the human body detected by the limb key point detection algorithm to achieve target tracking. However, the current limb detection algorithm and limb key point detection algorithm cannot adapt to the scene with only the upper body limbs, so that the target with only the upper body limbs cannot be tracked.

发明内容SUMMARY OF THE INVENTION

本公开实施例提供一种图像处理方法、装置、电子设备和存储介质。Embodiments of the present disclosure provide an image processing method, apparatus, electronic device, and storage medium.

本公开实施例提供了一种图像处理方法，所述方法包括：获得多帧图像；对所述多帧图像中的第一图像中的目标对象进行肢体关键点检测处理，获得所述目标对象的部分肢体对应的第一关键点信息；基于所述第一关键点信息确定第二图像中的所述目标对象的所述部分肢体对应的第二关键点信息；其中，所述第一图像为所述多帧图像中的任意一帧图像；所述第二图像为所述第一图像后的一帧图像。An embodiment of the present disclosure provides an image processing method, the method includes: obtaining multiple frames of images; performing limb key point detection processing on a target object in a first image in the multiple frame images, and obtaining an image of the target object. first key point information corresponding to a part of the limb; determining second key point information corresponding to the part of the limb of the target object in the second image based on the first key point information; wherein the first image is the any one frame of images in the multiple frames of images; the second image is a frame of images after the first image.

上述方案中，所述对第一图像中的目标对象进行肢体关键点检测处理，获得所述目标对象的部分肢体对应的第一关键点信息，包括：对所述第一图像中的所述目标对象进行肢体检测处理，确定所述目标对象的第一区域；所述第一区域包括所述目标对象的部分肢体所在区域；对所述第一区域对应的像素点进行肢体关键点检测处理，获得所述目标对象的所述部分肢体对应的第一关键点信息。In the above solution, performing limb key point detection processing on the target object in the first image to obtain first key point information corresponding to part of the limbs of the target object includes: detecting the target object in the first image The object is subjected to limb detection processing to determine the first region of the target object; the first region includes the region where part of the limbs of the target object are located; the pixel points corresponding to the first region are subjected to limb key point detection processing to obtain The first key point information corresponding to the part of the limb of the target object.

上述方案中，所述基于所述第一关键点信息确定第二图像中的所述目标对象的所述部分肢体对应的第二关键点信息，包括：基于所述第一关键点信息在所述第一图像中确定第二区域；所述第二区域大于所述目标对象的第一区域；所述第一区域包括所述目标对象的部分肢体所在区域；根据所述第二区域或所述第一关键点信息，确定所述第二图像中与所述第二区域的位置范围对应的第三区域；对所述第二图像中的所述第三区域内的像素点进行肢体关键点检测处理，获得所述部分肢体对应的第二关键点信息。In the above solution, the determining, based on the first key point information, the second key point information corresponding to the part of the limb of the target object in the second image includes: based on the first key point information, in the A second area is determined in the first image; the second area is larger than the first area of the target object; the first area includes the area where part of the body of the target object is located; according to the second area or the first area a key point information, determine a third area in the second image corresponding to the position range of the second area; perform limb key point detection processing on the pixels in the third area in the second image , to obtain the second key point information corresponding to the part of the limb.

上述方案中，所述对所述第一图像中的所述目标对象进行肢体检测处理，包括：利用肢体检测网络对所述第一图像中的所述目标对象进行肢体检测处理；其中，所述肢体检测网络采用第一类样本图像训练得到；所述第一类样本图像中标注有目标对象的检测框；所述检测框的标注范围包括所述目标对象的部分肢体所在区域。In the above solution, the performing limb detection processing on the target object in the first image includes: using a limb detection network to perform limb detection processing on the target object in the first image; wherein, the The limb detection network is obtained by training the first-type sample images; the first-type sample images are marked with a detection frame of the target object; the marked range of the detection frame includes the region where part of the target object's limbs are located.

上述方案中，所述对所述第一区域对应的像素点进行肢体关键点检测处理，包括：利用肢体关键点检测网络对所述第一区域对应的像素点进行肢体关键点检测处理；其中，所述肢体关键点检测网络采用第二类样本图像训练得到；所述第二类样本图像中标注有包括所述目标对象的部分肢体的关键点。In the above solution, performing limb key point detection processing on the pixels corresponding to the first area includes: using a limb key point detection network to perform limb key point detection processing on the pixels corresponding to the first area; wherein, The limb key point detection network is obtained by training the second type of sample images; the second type of sample images are marked with key points including part of the limbs of the target object.

上述方案中，所述目标对象的部分肢体包括以下至少之一：头部、颈部、肩部、胸部、腰部、髋部、手臂、手部；所述第一关键点信息和所述第二关键点信息包括头部、颈部、肩部、胸部、腰部、髋部、手臂和手部中的至少一个肢体的轮廓关键点信息和/骨骼关键点信息。In the above solution, the partial limbs of the target object include at least one of the following: head, neck, shoulder, chest, waist, hip, arm, hand; the first key point information and the second The key point information includes contour key point information and/or bone key point information of at least one limb among the head, neck, shoulder, chest, waist, hip, arm, and hand.

上述方案中，所述方法还包括：响应于获得所述目标对象的部分肢体对应的第一关键点信息的情况，为所述目标对象分配跟踪标识；基于对所述多帧图像的处理过程中分配的所述跟踪标识的数量，确定所述多帧图像中的目标对象的数量。In the above solution, the method further includes: in response to obtaining the first key point information corresponding to part of the limbs of the target object, assigning a tracking identifier to the target object; The number of the assigned tracking identifiers determines the number of target objects in the multi-frame images.

上述方案中，所述方法还包括：基于所述第二关键点信息确定所述目标对象的姿态；基于所述目标对象的姿态确定对应于所述目标对象的交互指令。In the above solution, the method further includes: determining the gesture of the target object based on the second key point information; determining an interaction instruction corresponding to the target object based on the gesture of the target object.

本公开实施例还提供了一种图像处理装置，所述装置包括：获取单元、检测单元和跟踪确定单元；其中，所述获取单元，用于获得多帧图像；所述检测单元，用于对所述多帧图像中的第一图像中的目标对象进行肢体关键点检测处理，获得所述目标对象的部分肢体对应的第一关键点信息；所述跟踪确定单元，用于基于所述第一关键点信息确定第二图像中的所述目标对象的所述部分肢体对应的第二关键点信息；其中，所述第一图像为所述多帧图像中的任意一帧图像；所述第二图像为所述第一图像后的一帧图像。The embodiment of the present disclosure further provides an image processing apparatus, the apparatus includes: an acquisition unit, a detection unit, and a tracking determination unit; wherein, the acquisition unit is used for acquiring multiple frames of images; the detection unit is used for Performing limb key point detection processing on the target object in the first image of the multi-frame images to obtain first key point information corresponding to part of the limbs of the target object; The key point information determines the second key point information corresponding to the part of the limb of the target object in the second image; wherein, the first image is any one frame of images in the multi-frame images; the second image The image is a frame of image after the first image.

上述方案中，所述检测单元包括：肢体检测模块和肢体关键点检测模块；其中，所述肢体检测模块，用于对所述第一图像中的所述目标对象进行肢体检测处理，确定所述目标对象的第一区域；所述第一区域包括所述目标对象的部分肢体所在区域；所述肢体关键点检测模块，用于对所述第一区域对应的像素点进行肢体关键点检测处理，获得所述目标对象的所述部分肢体对应的第一关键点信息。In the above solution, the detection unit includes: a limb detection module and a limb key point detection module; wherein, the limb detection module is configured to perform limb detection processing on the target object in the first image, and determine the The first area of the target object; the first area includes the area where part of the limbs of the target object are located; the limb key point detection module is configured to perform limb key point detection processing on the pixel points corresponding to the first area, Obtain first key point information corresponding to the part of the limb of the target object.

上述方案中，所述跟踪确定单元，用于基于所述第一关键点信息在所述第一图像中确定第二区域；所述第二区域大于所述目标对象的第一区域；所述第一区域包括所述目标对象的部分肢体所在区域；根据所述第二区域或所述第一关键点信息，确定所述第二图像中与所述第二区域的位置范围对应的第三区域；对所述第二图像中的所述第三区域内的像素点进行肢体关键点检测处理，获得所述部分肢体对应的第二关键点信息。In the above solution, the tracking determination unit is configured to determine a second area in the first image based on the first key point information; the second area is larger than the first area of the target object; the first area An area includes the area where part of the limb of the target object is located; according to the second area or the first key point information, determine a third area in the second image corresponding to the position range of the second area; Perform limb key point detection processing on the pixels in the third area in the second image to obtain second key point information corresponding to the part of the limb.

上述方案中，所述肢体检测模块，用于利用肢体检测网络对所述第一图像中的所述目标对象进行肢体检测处理；其中，所述肢体检测网络采用第一类样本图像训练得到；所述第一类样本图像中标注有目标对象的检测框；所述检测框的标注范围包括所述目标对象的部分肢体所在区域。In the above solution, the limb detection module is used to perform limb detection processing on the target object in the first image by using a limb detection network; wherein, the limb detection network is obtained by training the first type of sample images; A detection frame of the target object is marked in the sample image of the first type; the marked range of the detection frame includes the area where part of the limb of the target object is located.

上述方案中，所述肢体关键点检测模块，用于利用肢体关键点检测网络对所述第一区域对应的像素点进行肢体关键点检测处理；其中，所述肢体关键点检测网络采用第二类样本图像训练得到；所述第二类样本图像中标注有包括所述目标对象的部分肢体的关键点。In the above solution, the limb key point detection module is configured to use the limb key point detection network to perform limb key point detection processing on the pixel points corresponding to the first area; wherein, the limb key point detection network adopts the second type. The sample images are obtained through training; the second type of sample images are marked with key points including part of the limbs of the target object.

上述方案中，所述装置还包括分配单元和统计单元；其中，所述分配单元，用于响应于所述检测单元获得所述目标对象的部分肢体对应的第一关键点信息的情况，为所述目标对象分配跟踪标识；所述统计单元，用于基于对所述多帧图像的处理过程中分配的所述跟踪标识的数量，确定所述多帧图像中的目标对象的数量。In the above solution, the device further includes an allocation unit and a statistics unit; wherein, the allocation unit is configured to respond to the situation that the detection unit obtains the first key point information corresponding to the partial limbs of the target object. The target object is assigned a tracking identifier; the statistics unit is configured to determine the number of target objects in the multi-frame image based on the number of the tracking identifiers assigned during the processing of the multi-frame image.

上述方案中，所述装置还包括确定单元，用于基于所述第二关键点信息确定所述目标对象的姿态；基于所述目标对象的姿态确定对应于所述目标对象的交互指令。In the above solution, the apparatus further includes a determination unit, configured to determine the gesture of the target object based on the second key point information; and determine the interaction instruction corresponding to the target object based on the gesture of the target object.

本公开实施例还提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现本公开实施例所述的图像处理方法的步骤。The embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the steps of the image processing method described in the embodiments of the present disclosure.

本公开实施例还提供了一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现本公开实施例所述的图像处理方法的步骤。Embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the methods described in the embodiments of the present disclosure when the processor executes the program. The steps of an image processing method.

本公开实施例提供的图像处理方法、装置、电子设备和存储介质，通过对待处理的多帧图像中的第一图像中的目标对象的部分肢体的关键点进行识别，并基于识别出的部分肢体的关键点确定在后的第二图像中的目标对象的部分肢体的关键点，从而实现了在图像中具有目标对象的部分肢体(例如上半身)的场景下的目标跟踪。The image processing method, device, electronic device, and storage medium provided by the embodiments of the present disclosure identify the key points of the partial limbs of the target object in the first image of the multi-frame images to be processed, and identify the partial limbs based on the identified partial limbs. The key points of the target object determine the key points of the partial limbs of the target object in the second image, so as to realize the target tracking in the scene with the partial limbs (eg upper body) of the target object in the image.

附图说明Description of drawings

图1为本公开实施例的图像处理方法的流程示意图一；FIG. 1 is a schematic flowchart 1 of an image processing method according to an embodiment of the present disclosure;

图2为本公开实施例的图像处理方法中的肢体关键点检测处理方法的流程示意图；2 is a schematic flowchart of a method for detecting and processing limb key points in an image processing method according to an embodiment of the present disclosure;

图3为本公开实施例的图像处理方法中的肢体关键点跟踪方法的一种流程示意图；3 is a schematic flowchart of a method for tracking limb key points in an image processing method according to an embodiment of the present disclosure;

图4为本公开实施例的图像处理方法的流程示意图二；4 is a second schematic flowchart of an image processing method according to an embodiment of the present disclosure;

图5为本公开实施例的图像处理装置的组成结构示意图一；FIG. 5 is a schematic diagram 1 of a composition structure of an image processing apparatus according to an embodiment of the present disclosure;

图6为本公开实施例的图像处理装置的组成结构示意图二；FIG. 6 is a second schematic diagram of the composition and structure of an image processing apparatus according to an embodiment of the present disclosure;

图7为本公开实施例的图像处理装置的组成结构示意图三；FIG. 7 is a schematic diagram 3 of a composition structure of an image processing apparatus according to an embodiment of the present disclosure;

图8为本公开实施例的图像处理装置的组成结构示意图四；FIG. 8 is a fourth schematic diagram of the composition and structure of an image processing apparatus according to an embodiment of the disclosure;

图9为本公开实施例的电子设备的硬件组成结构示意图。FIG. 9 is a schematic structural diagram of a hardware composition of an electronic device according to an embodiment of the disclosure.

具体实施方式Detailed ways

下面结合附图及具体实施例对本公开作进一步详细的说明。The present disclosure will be described in further detail below with reference to the accompanying drawings and specific embodiments.

以下描述中，为了说明而不是为了限定，提出了诸如特定系统结构、接口、技术之类的具体细节，以便透彻理解本申请。In the following description, for purposes of illustration and not limitation, specific details such as specific system structures, interfaces, techniques, etc. are set forth in order to provide a thorough understanding of the present application.

本文中术语“和/或”，仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本文中字符“/”，一般表示前后关联对象是一种“或”的关系。此外，本文中的“多”表示两个或者多于两个。The term "and/or" in this article is only an association relationship to describe the associated objects, indicating that there can be three kinds of relationships, for example, A and/or B, it can mean that A exists alone, A and B exist at the same time, and A and B exist independently B these three cases. In addition, the character "/" in this document generally indicates that the related objects are an "or" relationship. Also, "multiple" herein means two or more than two.

本公开实施例提供了一种图像处理方法。图1为本公开实施例的图像处理方法的流程示意图一；如图1所示，所述方法包括：Embodiments of the present disclosure provide an image processing method. FIG. 1 is a schematic flowchart 1 of an image processing method according to an embodiment of the present disclosure; as shown in FIG. 1 , the method includes:

步骤101：获得多帧图像；Step 101: obtain multiple frames of images;

步骤102：对多帧图像中的第一图像中的目标对象进行肢体关键点检测处理，获得目标对象的部分肢体对应的第一关键点信息；Step 102: Perform limb key point detection processing on the target object in the first image in the multi-frame images to obtain first key point information corresponding to part of the limbs of the target object;

步骤103：基于第一关键点信息确定第二图像中的目标对象的部分肢体对应的第二关键点信息；其中，第一图像为多帧图像中的任意一帧图像；第二图像为第一图像后的一帧图像。Step 103: Determine the second key point information corresponding to the part of the limb of the target object in the second image based on the first key point information; wherein, the first image is any one frame of images in the multi-frame images; the second image is the first image A frame of image after the image.

本实施例的图像处理方法可应用于图像处理装置中，图像处理装置可设置于个人计算机、服务器等具有处理功能的电子设备内，或者由处理器执行计算机程序实现。The image processing method in this embodiment can be applied to an image processing apparatus, and the image processing apparatus can be set in an electronic device with processing functions such as a personal computer, a server, or the like, or implemented by a processor executing a computer program.

本实施例中，上述多帧图像可以为由电子设备内置或外接的摄像设备采集的连续的视频，或者也可以是接收的由其它电子设备传输的视频等。在一些应用场景中，上述多帧图像可以为监控摄像头采集的监控视频，以对监控视频中的各个目标对象进行跟踪。在另一些应用场景中，上述多帧图像也可以是本地或其他视频库中存储的视频，以对视频中的各个目标对象进行跟踪。在又一些应用场景中，本实施例的图像处理方法可应用于虚拟现实(Virtual Reality，VR)、增强现实(Augmented Reality，AR)或者体感游戏等应用场景中；则上述多帧图像还可以为虚拟现实或增强现实场景中采集到的操作者的图像，可通过对图像中的操作者的姿态的识别，控制虚拟现实场景或增强现实场景中的虚拟对象的动作；或者还可以为体感游戏中采集到的参与游戏的目标对象(如多个用户)的图像等。In this embodiment, the above-mentioned multi-frame images may be continuous videos collected by a built-in or external camera device of the electronic device, or may also be received videos transmitted by other electronic devices, or the like. In some application scenarios, the above-mentioned multi-frame images may be surveillance videos collected by surveillance cameras, so as to track each target object in the surveillance videos. In other application scenarios, the above-mentioned multi-frame images may also be videos stored locally or in other video libraries, so as to track each target object in the video. In still other application scenarios, the image processing method of this embodiment may be applied to application scenarios such as virtual reality (Virtual Reality, VR), augmented reality (Augmented Reality, AR), or somatosensory games; the above-mentioned multi-frame images may also be The image of the operator collected in the virtual reality or augmented reality scene can control the action of the virtual object in the virtual reality scene or the augmented reality scene by recognizing the gesture of the operator in the image; The collected images of target objects (such as multiple users) participating in the game, etc.

在一些应用场景中，图像处理装置可与一个或多个监控摄像头建立通信连接，将实时获得监控摄像头采集的监控视频作为待处理的多帧图像。在另一些应用场景中，图像处理装置也可从自身存储的视频中获取视频作为待处理的多帧图像，或者，也可从其他电子设备存储的视频中获取视频作为待处理的多帧图像等等。在又一些应用场景中，图像处理装置也可置于游戏设备中，在游戏设备的处理器执行计算机程序从而实现游戏操作者操作过程中，将输出显示的图像作为待处理的多帧图像，对图像中的目标对象(目标对象对应于游戏操作者)进行跟踪。In some application scenarios, the image processing apparatus may establish a communication connection with one or more surveillance cameras, and obtain surveillance video captured by the surveillance cameras in real time as multi-frame images to be processed. In other application scenarios, the image processing apparatus can also obtain video from the video stored by itself as the multi-frame image to be processed, or can also obtain the video from the video stored by other electronic devices as the multi-frame image to be processed, etc. Wait. In still other application scenarios, the image processing device can also be placed in the game device, and the output and displayed image is taken as the multi-frame image to be processed during the execution of the computer program by the processor of the game device to realize the operation of the game operator. The target object in the image (the target object corresponds to the game operator) is tracked.

本实施例中，待处理的多帧图像中可包括目标对象，目标对象可以为一个或多个；在一些应用场景中，目标对象可以是真实人物；在另一些应用场景中，目标对象也可以是根据实际追踪需要而确定的其他对象，例如虚拟人物或其他虚拟对象等。In this embodiment, the multi-frame images to be processed may include target objects, and the target objects may be one or more; in some application scenarios, the target objects may be real people; in other application scenarios, the target objects may also be It is other objects determined according to the actual tracking needs, such as avatars or other virtual objects.

本实施例中，多帧图像中的每一帧图像可称为帧图像，是组成视频(即待处理图像)的最小单位，可以理解，多帧图像为一组时间连续的帧图像，按照各个帧图像的采集时间形成上述多帧图像，各个帧图像对应的时间参数是连续的。In this embodiment, each frame of images in the multi-frame images may be called a frame image, which is the smallest unit that constitutes a video (ie, an image to be processed). The acquisition time of the frame images forms the above-mentioned multi-frame images, and the time parameters corresponding to each frame image are continuous.

示例性的，以目标对象为真实人物为例，在多帧图像中包括目标对象的情况下，上述多帧图像对应的时间范围内可包括一个或多个目标对象，也可以是上述多帧图像的时间范围内的部分时间范围内包括一个或多个目标对象，本实施例中对此不作限定。Exemplarily, taking the target object as a real person as an example, in the case where the target object is included in the multi-frame image, the time range corresponding to the multi-frame image may include one or more target objects, or the multi-frame image may include one or more target objects. A part of the time range within the time range includes one or more target objects, which is not limited in this embodiment.

本实施例中，上述第一图像为多帧图像中的任意一帧图像，第二图像为第一图像后的一帧图像。其中，在一些可选的实施例中，第二图像可以是与第一图像时间连续的、在后的一帧图像。例如，多帧图像包括10帧图像，上述第一图像为10帧图像中的第2帧图像，则上述第二图像为第3帧图像。在另一些可选实施例中，第二图像也可以是第一图像后的、与第一图像相距预设数量帧图像的一帧图像。例如，多帧图像包括20帧图像，上述第一图像为20帧图像中的第2帧图像，假设预设数量帧图像为3帧图像，则上述第二图像可以为20帧图像中的第6帧图像。其中，上述预设数量可依据实际情况预先设定，例如预设数量可依据目标对象的移动速度预先设定。这种实施方式能够有效的减小数据处理量，从而减轻图像处理装置的消耗。In this embodiment, the above-mentioned first image is any one frame of images among the multiple frames of images, and the second image is an image of one frame after the first image. Wherein, in some optional embodiments, the second image may be a subsequent frame of images temporally continuous with the first image. For example, the multi-frame images include 10 frames of images, the above-mentioned first image is the second frame of images in the 10-frame images, and the above-mentioned second image is the third frame of images. In some other optional embodiments, the second image may also be a frame of images after the first image and separated from the first image by a preset number of frames of images. For example, the multi-frame images include 20 frames of images, and the above-mentioned first image is the second frame of images in the 20 frames of images. Assuming that the preset number of frames of images is 3 frames of images, the above-mentioned second image may be the 6th image in the 20 frames of images. frame image. Wherein, the above-mentioned preset number can be preset according to the actual situation, for example, the preset number can be preset according to the moving speed of the target object. This embodiment can effectively reduce the amount of data processing, thereby reducing the consumption of the image processing device.

本实施例中，图像处理装置可通过肢体关键点检测网络对第一图像中的目标对象进行肢体关键点检测处理，获得目标对象的部分肢体对应的第一关键点信息。本实施例中，上述目标对象的部分肢体包括以下至少之一：头部、颈部、肩部、胸部、腰部、髋部、手臂、手部。相应的，目标对象的部分肢体对应的第一关键点信息和第二关键点信息包括目标对象的头部、颈部、肩部、胸部、腰部、髋部、手臂、手部中的至少一个肢体的轮廓关键点信息和/骨骼关键点信息。In this embodiment, the image processing apparatus may perform limb key point detection processing on the target object in the first image through the limb key point detection network, and obtain first key point information corresponding to some limbs of the target object. In this embodiment, the partial limbs of the target object include at least one of the following: a head, a neck, a shoulder, a chest, a waist, a hip, an arm, and a hand. Correspondingly, the first key point information and the second key point information corresponding to part of the limbs of the target object include at least one limb among the head, neck, shoulder, chest, waist, hip, arm, and hand of the target object. The contour key point information and/or bone key point information of .

示例性的，本实施例中目标对象的部分肢体为目标对象的上半身肢体，以便于能够识别出多帧图像中具有上半身的目标对象，从而实现仅具有上半身或具有全身的目标对象的追踪。Exemplarily, part of the limbs of the target object in this embodiment are the upper body limbs of the target object, so that the target object with the upper body in the multi-frame images can be identified, so as to realize the tracking of the target object with only the upper body or the whole body.

示例性的，上述第一关键点信息和第二关键点信息对应的关键点可以包括：头部的至少一个关键点、肩部的至少一个关键点、手臂的至少一个关键点、胸部的至少一个关键点、髋部的至少一个关键点和腰部的至少一个关键点；可选的，上述第一关键点信息和第二关键点信息对应的关键点还可以包括手部的至少一个关键点。图像处理装置是否能够获得手部的关键点，取决于用于训练肢体关键点检测网络的样本图像中是否标注了手部的关键点；在样本图像中标注了手部的关键点的情况下，则可通过肢体关键点检测网络检测到手部的关键点。Exemplarily, the key points corresponding to the first key point information and the second key point information may include: at least one key point of the head, at least one key point of the shoulder, at least one key point of the arm, and at least one key point of the chest. Key points, at least one key point of the hip, and at least one key point of the waist; optionally, the key points corresponding to the first key point information and the second key point information may also include at least one key point of the hand. Whether the image processing device can obtain the key points of the hand depends on whether the key points of the hand are marked in the sample image used for training the limb key point detection network; in the case where the key points of the hand are marked in the sample image, Then the key points of the hand can be detected by the limb key point detection network.

在一些可选实施例中，在上述目标对象的部分肢体包括头部的情况下，第一关键点信息和第二关键点信息中可包括至少一个器官的关键点信息，至少一个器官的关键点信息可包括以下至少之一：鼻子关键点信息、眉心关键点信息、嘴部关键点信息。In some optional embodiments, when part of the limbs of the target object includes the head, the first key point information and the second key point information may include key point information of at least one organ, and key point information of at least one organ The information may include at least one of the following: nose key point information, eyebrow key point information, and mouth key point information.

在一些可选实施例中，在上述目标对象的部分肢体包括手臂的情况下，第一关键点信息和第二关键点信息中可包括手肘部关键点信息。In some optional embodiments, in the case that the part of the limb of the target object includes an arm, the first key point information and the second key point information may include elbow key point information.

在一些可选实施例中，在上述目标对象的部分肢体包括手部的情况下，第一关键点信息和第二关键点信息中可包括手腕关键点信息。可选地，第一关键点信息和第二关键点信息中还可包括手部的轮廓关键点信息。In some optional embodiments, when part of the limbs of the target object include hands, the first key point information and the second key point information may include wrist key point information. Optionally, the first key point information and the second key point information may further include contour key point information of the hand.

在一些可选实施例中，在上述目标对象的部分肢体包括髋部的情况下，第一关键点信息和第二关键点信息中可包括左髋关键点信息和右髋关键点信息。可选地，第一关键点信息和第二关键点信息中还可包括脊柱根部关键点信息。In some optional embodiments, when part of the limbs of the target object includes a hip, the first key point information and the second key point information may include left hip key point information and right hip key point information. Optionally, the first key point information and the second key point information may further include spine root key point information.

其中，上述第一关键点信息具体可以包括关键点的坐标。上述第一关键点信息可以包括轮廓关键点的坐标和/或骨骼关键点的坐标。可以理解，通过轮廓关键点的坐标能够形成对应的部分肢体的轮廓边缘；通过骨骼关键点的坐标能够形成对应的部分肢体的骨骼。The above-mentioned first key point information may specifically include the coordinates of the key points. The above-mentioned first key point information may include the coordinates of contour key points and/or the coordinates of bone key points. It can be understood that the contour edges of the corresponding partial limbs can be formed by the coordinates of the contour key points; the bones of the corresponding partial limbs can be formed by the coordinates of the bone key points.

图2为本公开实施例的图像处理方法中的肢体关键点检测处理方法的流程示意图；在一些可选的实施例中，步骤102可参照图2所示，包括：FIG. 2 is a schematic flowchart of a method for detecting and processing limb key points in an image processing method according to an embodiment of the present disclosure; in some optional embodiments, step 102 may refer to FIG. 2 , including:

步骤1021：对第一图像中的目标对象进行肢体检测处理，确定目标对象的第一区域；第一区域包括上述目标对象的部分肢体所在区域；Step 1021: Perform limb detection processing on the target object in the first image to determine a first region of the target object; the first region includes the region where part of the limbs of the target object are located;

步骤1022：对第一区域对应的像素点进行肢体关键点检测处理，获得目标对象的部分肢体对应的第一关键点信息。Step 1022 : Perform limb key point detection processing on the pixels corresponding to the first area to obtain first key point information corresponding to part of the limbs of the target object.

本实施例中，首先对第一图像中的各个目标对象进行肢体检测，确定各个目标对象的第一区域，例如可确定各个目标对象的上半身对应的第一区域或者各个目标对象的全身对应的第一区域。实际应用中，可通过标识目标对象的检测框(例如矩形框)表示部分肢体对应的第一区域，例如，通过各个矩形框标识出第一图像中的各个人物的上半身。In this embodiment, limb detection is first performed on each target object in the first image, and the first area of each target object is determined. For example, the first area corresponding to the upper body of each target object or the first area corresponding to the whole body of each target object can be determined. an area. In practical applications, the first region corresponding to a part of the limb may be represented by a detection frame (eg, a rectangular frame) that identifies the target object, for example, the upper body of each person in the first image is identified by each rectangular frame.

在一些可选的实施例中，上述对第一图像中的目标对象进行肢体检测处理，包括：利用肢体检测网络对第一图像中的目标对象进行肢体检测处理；其中，上述肢体检测网络采用第一类样本图像训练得到；第一类样本图像中标注有目标对象的检测框；检测框的标注范围包括目标对象的部分肢体所在区域；目标对象的部分肢体可以是目标对象的上半身肢体。In some optional embodiments, performing limb detection processing on the target object in the first image includes: using a limb detection network to perform limb detection processing on the target object in the first image; wherein the limb detection network uses the first limb detection network. One class of sample images is obtained by training; the first class of sample images is marked with a detection frame of the target object; the labeling range of the detection frame includes the area where part of the target object's limbs are located; and the part of the target object's limbs may be the upper body limbs of the target object.

本实施例中，可通过预先训练好的肢体检测网络对第一图像进行肢体检测，确定目标对象的第一区域，也即获得第一图像中各个目标对象的检测框。上述检测框可标识目标对象的部分肢体或全部肢体，也即通过肢体检测网络可检测获得目标对象的全部肢体或者上半身肢体。其中，上述肢体检测网络可采用任意一种能够检测目标对象肢体的网络结构，本实施例中对此不做限定。In this embodiment, a pre-trained limb detection network can be used to perform limb detection on the first image to determine the first region of the target object, that is, to obtain detection frames of each target object in the first image. The above-mentioned detection frame can identify part or all of the limbs of the target object, that is, all limbs or upper body limbs of the target object can be detected and obtained through the limb detection network. The above-mentioned limb detection network may adopt any network structure capable of detecting limbs of the target object, which is not limited in this embodiment.

示例性的，以通过肢体检测网络检测得到目标对象的部分肢体的检测框为例，可通过肢体检测网络对第一图像进行特征提取，基于提取到的特征确定第一图像中的各个目标对象的部分肢体的中心点以及对应于各个目标对象的部分肢体的检测框的高度和宽度，基于各个目标对象的部分肢体的中心点以及对应的高度和宽度，可确定各个目标对象的部分肢体的检测框。Exemplarily, taking the detection frame of the partial limbs of the target object detected by the limb detection network as an example, the limb detection network can be used to perform feature extraction on the first image, and the characteristics of each target object in the first image can be determined based on the extracted features. The center point of the partial limb and the height and width of the detection frame corresponding to the partial limb of each target object. Based on the center point and the corresponding height and width of the partial limb of each target object, the detection frame of the partial limb of each target object can be determined. .

本实施例中，肢体检测网络可采用标注有目标对象的检测框的第一类样本图像训练获得；其中，检测框的标注范围包括目标对象的部分肢体，可以理解，第一类样本图像中可仅标注有目标对象的部分肢体(例如目标对象的上半身肢体)的检测框，也可以标注有目标对象完整肢体的检测框。示例性的，以检测框的标注范围为目标对象的部分肢体为例，可利用肢体检测网络提取第一类样本图像的特征数据，基于特征数据确定第一类样本图像中各个目标对象的部分肢体的预测中心点以及对应部分肢体的预测检测框的高度和宽度，基于上述部分肢体的预测中心点以及对应的高度和宽度确定各个部分肢体对应的预测检测框；根据预测检测框以及标注的部分肢体的检测框确定损失，基于损失调整肢体检测网络的网络参数。In this embodiment, the limb detection network can be obtained by training the first type of sample images marked with the detection frame of the target object; wherein, the marked range of the detection frame includes part of the limbs of the target object. It can be understood that the first type of sample images may be The detection frame marked only with part of the limbs of the target object (for example, the upper body limbs of the target object) may also be marked with the detection frame of the complete limbs of the target object. Exemplarily, taking the labeling range of the detection frame as the part of the limb of the target object as an example, the limb detection network can be used to extract the feature data of the first type of sample images, and based on the feature data, the partial limbs of each target object in the first type of sample images can be determined. Based on the predicted center point of the partial limbs and the height and width of the predicted detection frame of the corresponding part of the limb, the predicted detection frame corresponding to each partial limb is determined based on the predicted center point of the above-mentioned partial limb and the corresponding height and width; According to the predicted detection frame and the marked part of the limb The detection frame determines the loss, and the network parameters of the limb detection network are adjusted based on the loss.

在一些可选的实施例中，上述对第一区域对应的像素点进行肢体关键点检测处理，包括：利用肢体关键点检测网络对第一区域对应的像素点进行肢体关键点检测处理；其中，上述肢体关键点检测网络采用第二类样本图像训练得到；第二类样本图像中标注有目标对象的关键点；上述关键点的标注范围包括目标对象的部分肢体。In some optional embodiments, performing limb key point detection processing on the pixels corresponding to the first area includes: using a limb key point detection network to perform limb key point detection processing on the pixels corresponding to the first area; wherein, The above-mentioned limb key point detection network is obtained by training the second type of sample images; the second type of sample images are marked with key points of the target object; the marked range of the above-mentioned key points includes part of the limbs of the target object.

本实施例中，可通过预先训练好的肢体关键点检测网络对第一区域对应的像素点进行肢体关键点检测，确定各个目标对象的部分肢体的第一关键点信息。示例性的，上述第一区域可包括目标对象的部分肢体，可将各个目标对象的检测框对应的像素点输入至肢体关键点检测网络，得到各个目标对象的部分肢体对应的第一关键点信息。其中，上述肢体关键点检测网络可采用任意一种能够检测肢体关键点的网络结构，本实施例中对此不做限定。In this embodiment, a pre-trained limb key point detection network may be used to perform limb key point detection on the pixel points corresponding to the first area to determine the first key point information of part of the limbs of each target object. Exemplarily, the above-mentioned first area may include part of the limbs of the target object, and the pixel points corresponding to the detection frame of each target object may be input into the limb key point detection network to obtain the first key point information corresponding to the partial limbs of each target object. . The above-mentioned limb key point detection network may adopt any network structure capable of detecting limb key points, which is not limited in this embodiment.

本实施例中，肢体关键点检测网络可采用标注有目标对象的关键点的第二类样本图像训练获得，其中，关键点的标注范围包括目标对象的部分肢体，可以理解，第二类样本图像中可仅标注有目标对象的部分肢体(例如目标对象的上半身肢体)的关键点，也可以标注有目标对象的完整肢体的关键点。示例性的，以第二类样本图像中标注有目标对象的部分肢体的关键点为例，可利用肢体关键点检测网络提取第二类样本图像的特征数据，基于特征数据确定第二类样本图像中各个目标对象的部分肢体的预测关键点；基于上述预测关键点和标注的关键点确定损失，基于损失调整肢体关键点检测网络的网络参数。In this embodiment, the limb key point detection network can be obtained by training the second type of sample images marked with the key points of the target object, wherein the marked range of the key points includes part of the limbs of the target object. Only the key points of part of the limbs of the target object (for example, the upper body limbs of the target object) can be marked, or the key points of the complete limbs of the target object can be marked. Exemplarily, taking the key points of part of the limbs of the target object marked in the second type of sample image as an example, the limb key point detection network can be used to extract the feature data of the second type of sample image, and the second type of sample image can be determined based on the feature data. The prediction key points of some limbs of each target object in the above; determine the loss based on the above predicted key points and the marked key points, and adjust the network parameters of the limb key point detection network based on the loss.

图3为本公开实施例的图像处理方法中的肢体关键点跟踪方法的一种流程示意图；在一些可选的实施例中，步骤103可参照图3所示，方法包括：FIG. 3 is a schematic flowchart of a method for tracking limb key points in an image processing method according to an embodiment of the present disclosure; in some optional embodiments, step 103 may refer to FIG. 3 , and the method includes:

步骤1031：基于第一关键点信息在第一图像中确定第二区域；第二区域大于目标对象的第一区域；第一区域包括上述目标对象的部分肢体所在区域；Step 1031: Determine a second area in the first image based on the first key point information; the second area is larger than the first area of the target object; the first area includes the area where part of the limbs of the target object are located;

步骤1032：根据第二区域或第一关键点信息，确定第二图像中与第二区域的位置范围对应的第三区域；Step 1032: Determine a third area in the second image corresponding to the position range of the second area according to the second area or the first key point information;

步骤1033：对第二图像中的第三区域内的像素点进行肢体关键点检测处理，获得部分肢体对应的第二关键点信息。Step 1033: Perform limb key point detection processing on the pixels in the third area in the second image to obtain second key point information corresponding to some limbs.

本实施例中，针对第一图像中的一个目标对象，基于该目标对象的部分肢体的第一关键点信息确定一个区域，该区域可以为包含该目标对象的部分肢体的所有关键点的最小区域。示例性的，若该区域为矩形区域，则该矩形区域为包含该目标对象的部分肢体的所有关键点的最小区域。则上述第二区域为在第一图像中、对第一区域进行放大处理得到的区域。In this embodiment, for a target object in the first image, an area is determined based on the first key point information of the partial limb of the target object, and the area may be the smallest area including all the key points of the partial limb of the target object . Exemplarily, if the area is a rectangular area, the rectangular area is the smallest area including all the key points of the partial limbs of the target object. Then the above-mentioned second area is an area obtained by enlarging the first area in the first image.

示例性的，若第一区域为矩形为例，假设上述第一区域的高度为H，宽度为W，则可以该区域的中心点为中心、以该区域的四边朝向远离中心点的方向延伸，例如在高度方向上，分别向远离中心点的方向延伸H/4，在宽度方向上，分别向远离中心点的方向延伸W/4，则上述第二区域可通过第一图像中、以上述中心点为中心，高度为3H/2、宽度为3W/2的矩形区域表示。Exemplarily, if the first area is a rectangle as an example, assuming that the height of the first area is H and the width is W, the center point of the area can be the center, and the four sides of the area can be extended in the direction away from the center point, For example, in the height direction, extending H/4 in the direction away from the center point, and in the width direction, extending W/4 in the direction away from the center point, the second area can pass through the center in the first image. The point is the center, and it is represented by a rectangular area with a height of 3H/2 and a width of 3W/2.

则本实施例中，可依据第二区域在第一图像中的位置范围，或者第一关键点信息在第一图像中的位置范围，确定第二图像中、与上述位置范围对应的第三区域。In this embodiment, the third region in the second image corresponding to the above-mentioned position range may be determined according to the position range of the second region in the first image, or the position range of the first key point information in the first image. .

在一些可选实施例中，根据第二区域，确定第二图像中与第二区域的位置范围对应的第三区域，还可以包括：对第二区域对应的像素点进行肢体关键点检测处理，获得第三关键点信息；确定第三关键点信息在第一图像中的位置范围，基于上述位置范围确定第二图像中、与上述位置范围对应的第三区域。In some optional embodiments, determining a third area in the second image corresponding to the position range of the second area according to the second area may further include: performing limb key point detection processing on the pixels corresponding to the second area, Obtaining third key point information; determining a position range of the third key point information in the first image, and determining a third area in the second image corresponding to the position range based on the position range.

示例性的，本实施例中，依旧采用肢体关键点检测网络对第二区域对应的像素点进行肢体关键点检测处理，可以将第一图像中、扩大后的上述第二区域对应的像素点作为肢体关键点检测网络的输入数据，输出第三关键点信息，上述第三关键点信息作为第二图像中的目标对象的预测关键点信息，也即本申请实施例通过对前一帧图像中的目标对象的所在区域进行扩大处理(例如对前一帧图像中的目标对象的部分肢体所在区域进行扩大处理)，通过对扩大后的区域进行肢体关键点检测，将获得的关键点作为当前帧图像(即第一图像)之后的一帧图像(即第二图像)中、对应于目标对象(例如目标对象的部分肢体)的预测关键点。进一步基于预测出的位置范围，对第二图像中的第三区域对应的像素点进行肢体关键点检测处理，检测到的关键点信息即为上述目标对象的部分肢体对应的第二关键点信息。Exemplarily, in this embodiment, the limb key point detection network is still used to perform limb key point detection processing on the pixels corresponding to the second area, and the pixels corresponding to the enlarged second area in the first image may be used as The input data of the limb key point detection network outputs third key point information, and the third key point information is used as the predicted key point information of the target object in the second image, that is, the embodiment of the present application uses The area where the target object is located is enlarged (for example, the area where part of the body of the target object is located in the previous frame image is enlarged), and the key points obtained are used as the current frame image by performing limb key point detection on the enlarged area. The predicted key points corresponding to the target object (eg, part of the limbs of the target object) in a frame of images (ie, the second image) following the first image (ie, the first image). Further based on the predicted position range, limb key point detection processing is performed on the pixels corresponding to the third area in the second image, and the detected key point information is the second key point information corresponding to the partial limbs of the target object.

在另一些可选实施例中，步骤103还可包括：基于第一图像、目标对象的第一区域和目标追踪网络，确定第二图像中的目标对象的预测区域，基于第二图像中的上述预测区域的像素点进行肢体关键点检测处理，得到目标对象的部分肢体对应的第二关键点信息；其中，目标追踪网络采用多帧样本图像训练得到；多帧样本图像中至少包括第一样本图像和第二样本图像，第二样本图像为第一样本图像后的一帧图像；第一样本图像中标注有目标对象的位置，第二样本图像中标注有目标对象的位置。示例性的，多帧样本图像中均标注有目标对象的检测框，通过检测框表示目标对象在样本图像中的位置；检测框的标注范围包括目标对象的部分肢体所在区域；目标对象的部分肢体可以是目标对象的上半身肢体。In other optional embodiments, step 103 may further include: based on the first image, the first area of the target object, and the target tracking network, determining the predicted area of the target object in the second image, based on the above-mentioned Perform limb key point detection processing on the pixels in the prediction area to obtain second key point information corresponding to part of the limbs of the target object; wherein, the target tracking network is obtained by training multiple frames of sample images; the multi-frame sample images include at least the first sample The image and the second sample image, the second sample image is a frame image after the first sample image; the position of the target object is marked in the first sample image, and the position of the target object is marked in the second sample image. Exemplarily, a detection frame of the target object is marked in the multi-frame sample images, and the position of the target object in the sample image is represented by the detection frame; the labeling range of the detection frame includes the area where part of the limb of the target object is located; part of the limb of the target object Can be the upper body limb of the target subject.

本实施例中，可利用上一帧图像(即第一图像)以及图像中的目标对象的位置、通过预先训练好的目标追踪网络确定下一帧图像(即第二图像)中该目标对象的预测位置。示例性的，可将包含有目标对象的检测框的第一图像输入至目标追踪网络，得到第二图像中的目标对象的预测位置；再对第二图像中的预测位置处的像素点进行肢体关键点检测处理，得到目标对象的部分肢体在第二图像中的第二关键点信息。其中，上述目标跟踪网络可采用任意一种能够实现目标跟踪的网络结构，本实施例中对此不做限定。In this embodiment, the position of the target object in the previous frame of image (ie, the first image) and the target object in the image can be used to determine the position of the target object in the next frame of image (ie, the second image) through a pre-trained target tracking network. Predicted location. Exemplarily, the first image containing the detection frame of the target object can be input into the target tracking network to obtain the predicted position of the target object in the second image; The key point detection process obtains second key point information of part of the limbs of the target object in the second image. The above target tracking network may adopt any network structure capable of implementing target tracking, which is not limited in this embodiment.

本实施例中，目标追踪网络可采用标注有目标对象的位置(例如包含目标对象的检测框，或者包含目标对象的部分肢体的检测框)的多帧样本图像训练获得。示例性的，以多帧样本图像中至少包括第一图像和第二图像为例，可利用目标追踪网络对第一样本图像进行处理，第一样本图像中标注有目标对象的位置，处理结果为该目标对象在第二样本图像中的预测位置；则可根据上述预测位置和第二图像中目标对象的标注位置确定损失，基于损失调整目标追踪网络的网络参数。In this embodiment, the target tracking network can be obtained by training multiple frames of sample images marked with the position of the target object (for example, a detection frame including the target object, or a detection frame including a part of a limb of the target object). Exemplarily, taking a multi-frame sample image including at least a first image and a second image as an example, the target tracking network can be used to process the first sample image, and the position of the target object is marked in the first sample image. The result is the predicted position of the target object in the second sample image; then the loss can be determined according to the predicted position and the marked position of the target object in the second image, and the network parameters of the target tracking network can be adjusted based on the loss.

需要说明的是，在基于第一关键点信息确定第二图像中的目标对象的部分肢体对应的第二关键点信息后，可基于第二图像中的目标对象的部分肢体对应的第二关键点信息进一步确定在后图像中的目标对象的部分肢体对应的关键点信息，以此类推，直至无法在后一帧图像中检测出目标对象的部分肢体对应的关键点信息，此时，可表明待处理的多帧图像中已不包括上述目标对象，即目标对象已移出待处理的多帧图像的视野范围内。It should be noted that, after determining the second key point information corresponding to the partial limb of the target object in the second image based on the first key point information, the second key point corresponding to the partial limb of the target object in the second image may be used. The information further determines the key point information corresponding to the part of the limb of the target object in the subsequent image, and so on, until the key point information corresponding to the part of the limb of the target object cannot be detected in the next frame of image. The above-mentioned target object is not included in the processed multi-frame image, that is, the target object has moved out of the field of view of the multi-frame image to be processed.

在一些可选实施例中，图像处理装置也可针对每一帧图像中的目标对象进行肢体检测，得到每一帧图像中的目标对象所在的区域。将检测到的目标对象作为追踪对象，从而可确定当前帧图像中是否出现新的目标对象；在当前帧图像中出现新的目标对象的情况下，将新的目标对象作为追踪对象，将新的目标对象对应的第一区域内的像素点进行肢体关键点检测处理，即针对新的目标对象执行本公开实施例中步骤103的处理。示例性的，图像处理装置可每隔预设时间或者每隔预设数量的图像帧执行图像中的目标对象的肢体检测处理，从而实现每隔一段时间检测图像中是否有新的目标对象出现，对新的目标对象进行跟踪。In some optional embodiments, the image processing apparatus may also perform limb detection on the target object in each frame of image, to obtain the region where the target object is located in each frame of image. The detected target object is used as the tracking object, so as to determine whether a new target object appears in the current frame image; if a new target object appears in the current frame image, the new target object is used as the tracking object, and the new target object is used as the tracking object. The pixel points in the first area corresponding to the target object are subjected to limb key point detection processing, that is, the processing of step 103 in the embodiment of the present disclosure is performed for a new target object. Exemplarily, the image processing apparatus may perform the limb detection processing of the target object in the image every preset time or every preset number of image frames, so as to detect whether a new target object appears in the image at regular intervals, A new target object is tracked.

在本公开的一些可选实施例中，上述方法还包括：响应于获得目标对象的部分肢体对应的第一关键点信息的情况，为目标对象分配跟踪标识；基于对多帧图像的处理过程中分配的跟踪标识的数量，确定多帧图像中的目标对象的数量。In some optional embodiments of the present disclosure, the above method further includes: in response to obtaining the first key point information corresponding to part of the limbs of the target object, assigning a tracking identifier to the target object; The number of assigned tracking identities determines the number of target objects in the multi-frame image.

本实施例中，图像处理装置在待处理的多帧图像中的首帧图像中检测到目标对象，即获得目标对象的部分肢体对应的第一关键点信息时，为目标对象分配一个跟踪标识，该跟踪标识与该目标对象建立关联，直至在对该目标对象进行跟踪的过程中，无法跟踪到该目标对象。In this embodiment, when the image processing device detects the target object in the first frame of the multi-frame images to be processed, that is, when obtaining the first key point information corresponding to part of the limbs of the target object, a tracking identifier is allocated to the target object, The tracking identifier is associated with the target object until the target object cannot be tracked during the process of tracking the target object.

在一些可选实施例中，图像处理装置也可针对每一帧图像中的目标对象进行肢体检测，得到每一帧图像中的目标对象的部分肢体对应的区域，将检测到的目标对象作为追踪对象。基于此，图像处理装置对待处理图像中的首帧图像进行检测，为检测到的目标对象分配跟踪标识。之后，该跟踪标识一直跟随该目标对象，直至无法跟踪到该目标对象。若在某一帧图像中检测到新的目标对象，则为该新的目标对象分配跟踪标识，重复执行上述方案。可以理解，在同一时刻检测到的各个目标对象对应于不同的跟踪标识；在连续的时间范围内跟踪到的目标对象对应于相同的跟踪标识；在非连续的时间范围内分别检测到的目标对象对应于不同的跟踪标识。In some optional embodiments, the image processing apparatus may also perform limb detection on the target object in each frame of image, obtain the area corresponding to part of the limb of the target object in each frame of image, and use the detected target object as the tracking object. Based on this, the image processing apparatus detects the first frame of the image to be processed, and assigns a tracking identifier to the detected target object. After that, the tracking identifier follows the target object until the target object cannot be tracked. If a new target object is detected in a certain frame of image, a tracking identifier is assigned to the new target object, and the above scheme is repeated. It can be understood that each target object detected at the same time corresponds to different tracking identifiers; the target objects tracked in a continuous time range correspond to the same tracking identifier; the target objects detected respectively in a non-continuous time range Corresponds to different trace identifiers.

例如，若某一帧图像，分别检测到三个目标对象，则针对三个目标对象分别分配一个跟踪标识，每个目标对象分别对应一个跟踪标识。For example, if three target objects are detected in a certain frame of image, respectively, a tracking mark is allocated to the three target objects, and each target object corresponds to a tracking mark respectively.

又例如，针对5分钟的多帧图像，在第一个1分钟内检测到三个目标对象，分别为三个目标对象分配一个跟踪标识，例如可记为标识1、标识2和标识3；在第二个1分钟内，上述三个目标对象中的第一个目标对象消失，则在当前1分钟内，只有两个目标对象，分别对应的跟踪标识为标识2和标识3；在第三个1分钟内，上述第一个目标对象又出现在图像中，即相比于在先图像中、检测到新的目标对象，尽管该目标对象是第一个1分钟内出现过的目标对象(即第一个目标对象)，依旧为该目标对象分配标识4作为跟踪标识，以此类推。For another example, for a 5-minute multi-frame image, three target objects are detected in the first 1 minute, and a tracking identification is assigned to the three target objects, for example, it can be recorded as identification 1, identification 2 and identification 3; In the second 1 minute, the first target object among the above three target objects disappears, then in the current 1 minute, there are only two target objects, and the corresponding tracking IDs are ID 2 and ID 3; Within 1 minute, the above-mentioned first target object appeared in the image again, that is, compared with the previous image, a new target object was detected, although the target object was the target object that appeared in the first 1 minute (i.e. The first target object), the target object is still assigned identifier 4 as the tracking identifier, and so on.

基于此，本实施例的技术方案可基于多帧图像处理过程中对应的跟踪标识的数量，确定多帧图像中出现过的目标对象的数量。示例性的，多帧图像中出现过的目标对象的数量指的是多帧图像对应的时间范围内出现过的目标对象的次数。Based on this, the technical solution of this embodiment can determine the number of target objects that have appeared in the multi-frame images based on the number of corresponding tracking marks in the multi-frame image processing process. Exemplarily, the number of target objects that appear in the multi-frame images refers to the number of times the target objects appear in the time range corresponding to the multi-frame images.

采用本公开实施例的技术方案，通过对待处理的多帧图像中的第一图像中的目标对象的部分肢体的关键点进行识别，并基于识别出的部分肢体的关键点确定在后的第二图像中的目标对象的部分肢体的关键点，从而实现了在图像中仅具有目标对象的部分肢体(例如上半身)的场景下的目标跟踪，也即本公开实施例的技术方案能够同时适应完整肢体场景和部分肢体(例如上半身)场景，实现了图像中的目标跟踪。By adopting the technical solutions of the embodiments of the present disclosure, the key points of the partial limbs of the target object in the first image in the multi-frame images to be processed are identified, and the following second limbs are determined based on the identified key points of the partial limbs. The key points of the partial limbs of the target object in the image, so as to realize the target tracking in the scene with only partial limbs (eg upper body) of the target object in the image, that is, the technical solutions of the embodiments of the present disclosure can simultaneously adapt to the complete limbs Scenes and partial limb (eg upper body) scenes, enabling target tracking in images.

本公开实施例还提供了一种图像处理方法。图4为本公开实施例的图像处理方法的流程示意图二；如图4所示，所述方法包括：The embodiments of the present disclosure also provide an image processing method. FIG. 4 is a second schematic flowchart of an image processing method according to an embodiment of the present disclosure; as shown in FIG. 4 , the method includes:

步骤201：获得多帧图像；Step 201: obtain multiple frames of images;

步骤202：对多帧图像中的第一图像中的目标对象进行肢体关键点检测处理，获得目标对象的部分肢体对应的第一关键点信息；Step 202: Perform limb key point detection processing on the target object in the first image in the multi-frame images, and obtain first key point information corresponding to part of the limbs of the target object;

步骤203：基于第一关键点信息确定第二图像中的目标对象的部分肢体对应的第二关键点信息；其中，第一图像为多帧图像中的任意一帧图像；第二图像为第一图像后的一帧图像；Step 203: Determine the second key point information corresponding to the part of the limb of the target object in the second image based on the first key point information; wherein, the first image is any one frame of images in the multi-frame images; the second image is the first image A frame of image after the image;

步骤204：基于第二关键点信息确定目标对象的姿态；基于目标对象的姿态确定对应于目标对象的交互指令。Step 204: Determine the gesture of the target object based on the second key point information; determine the interaction instruction corresponding to the target object based on the gesture of the target object.

本实施例步骤201至步骤203的具体阐述可参照步骤101至步骤103的描述，这里不再赘述。For a specific description of steps 201 to 203 in this embodiment, reference may be made to the descriptions of steps 101 to 103, and details are not repeated here.

本实施例可通过追踪到的目标对象、进而基于该目标对象的第二关键点信息确定目标对象的姿态，并基于目标对象的姿态确定各姿态对应的交互指令。之后，对各个姿态对应的交互指令进行响应。In this embodiment, the gesture of the target object can be determined based on the tracked target object and then based on the second key point information of the target object, and the interaction instruction corresponding to each gesture can be determined based on the gesture of the target object. After that, respond to the interaction instructions corresponding to each gesture.

本实施例适用于动作交互场景，图像处理装置可基于各姿态确定对应的交互指令，响应上述交互指令；响应上述交互指令，例如可以为开启或关闭图像处理装置自身或者图像处理装置所在的电子设备自身的某些功能等；或者，响应上述交互指令还可以是将上述交互指令发送给其他电子设备，其他电子设备接收到上述交互指令，基于交互指令开启或关闭某些功能，换句话说，上述交互指令也可以用于开启或关闭其他电子设备的对应功能。This embodiment is applicable to action interaction scenarios. The image processing apparatus can determine the corresponding interaction instruction based on each gesture, and respond to the above interaction instruction; in response to the above interaction instruction, for example, the image processing apparatus itself or the electronic device where the image processing apparatus is located can be turned on or off. Some functions of itself; or, in response to the above-mentioned interactive instructions, the above-mentioned interactive instructions can also be sent to other electronic devices, and other electronic devices receive the above-mentioned interactive instructions, and turn on or off certain functions based on the interactive instructions. In other words, the above-mentioned interactive instructions The interactive instructions can also be used to turn on or turn off corresponding functions of other electronic devices.

本实施例还适用于虚拟现实、增强现实或者体感游戏等各种应用场景。图像处理装置可基于各种交互指令执行相应的处理，处理包括但不限于控制虚拟现实或增强现实场景中、针对虚拟对象执行相应动作；控制体感游戏场景中、针对目标对象对应的虚拟角色执行相应的动作。一些示例中，若方法应用于增强现实或虚拟现实等场景，则图像处理装置基于交互指令执行的相应处理可以包括控制虚拟目标对象在真实场景或虚拟场景中执行与交互指令相应的动作。This embodiment is also applicable to various application scenarios such as virtual reality, augmented reality, or somatosensory games. The image processing device can perform corresponding processing based on various interactive instructions, including but not limited to controlling virtual reality or augmented reality scenes to perform corresponding actions on virtual objects; controlling somatosensory game scenes to perform corresponding actions on virtual characters corresponding to target objects. Actions. In some examples, if the method is applied to scenes such as augmented reality or virtual reality, the corresponding processing performed by the image processing apparatus based on the interaction instruction may include controlling the virtual target object to perform actions corresponding to the interaction instruction in the real scene or the virtual scene.

采用本公开实施例的技术方案，一方面实现了在图像中仅具有目标对象的部分肢体(例如上半身)的场景下的目标跟踪，也即本公开实施例的技术方案能够同时适应完整肢体场景和部分肢体(例如上半身)场景，实现了图像中的目标跟踪；另一方面，在目标跟踪过程中检测跟踪到的目标对象的关键点信息，并基于目标对象的关键点信息确定跟踪到的目标对象的姿态，基于目标对象的姿态确定对应的交互指令，实现了在特定应用场景(例如虚拟现实场景、增强现实场景、体感游戏场景等交互场景)中的人机交互，提升用户的交互体验。By adopting the technical solutions of the embodiments of the present disclosure, on the one hand, target tracking in a scene where only part of the limbs (such as the upper body) of the target object is in the image is realized, that is, the technical solutions of the embodiments of the present disclosure can be adapted to the scene of the complete limbs and the scene of the entire body at the same time. Some limbs (such as the upper body) scene, realize the target tracking in the image; on the other hand, during the target tracking process, the key point information of the tracked target object is detected, and the tracked target object is determined based on the key point information of the target object Based on the gesture of the target object, the corresponding interaction instruction is determined, which realizes the human-computer interaction in specific application scenarios (such as virtual reality scenarios, augmented reality scenarios, somatosensory game scenarios and other interactive scenarios) and improves the user's interactive experience.

本公开实施例还提供了一种图像处理装置。图5为本公开实施例的图像处理装置的组成结构示意图一；如图5所示，所述装置包括：获取单元31、检测单元32和跟踪确定单元33；其中，Embodiments of the present disclosure also provide an image processing apparatus. FIG. 5 is a schematic diagram 1 of the composition structure of an image processing apparatus according to an embodiment of the present disclosure; as shown in FIG. 5 , the apparatus includes: an acquisition unit 31 , a detection unit 32 and a tracking determination unit 33 ; wherein,

上述获取单元31，用于获得多帧图像；The above-mentioned obtaining unit 31 is used to obtain multiple frames of images;

上述检测单元32，用于对多帧图像中的第一图像中的目标对象进行肢体关键点检测处理，获得上述目标对象的部分肢体对应的第一关键点信息；The above-mentioned detection unit 32 is configured to perform limb key point detection processing on the target object in the first image in the multi-frame images, and obtain first key point information corresponding to part of the limbs of the above-mentioned target object;

上述跟踪确定单元33，用于基于上述第一关键点信息确定第二图像中的上述目标对象的上述部分肢体对应的第二关键点信息；其中，上述第一图像为上述多帧图像中的任意一帧图像；上述第二图像为上述第一图像后的一帧图像。The above-mentioned tracking determination unit 33 is configured to determine, based on the above-mentioned first key point information, second key point information corresponding to the above-mentioned part of the limb of the above-mentioned target object in the second image; wherein, the above-mentioned first image is any of the above-mentioned multiple frames of images. A frame of image; the second image is a frame of image after the first image.

在本公开的一些可选实施例中，如图6所示，上述检测单元32包括：肢体检测模块321和肢体关键点检测模块322；其中，In some optional embodiments of the present disclosure, as shown in FIG. 6 , the foregoing detection unit 32 includes: a limb detection module 321 and a limb key point detection module 322; wherein,

上述肢体检测模块321，用于对上述第一图像中的目标对象进行肢体检测处理，确定目标对象的第一区域；第一区域包括目标对象的部分肢体所在区域；The above-mentioned limb detection module 321 is used to perform limb detection processing on the target object in the above-mentioned first image, and determine the first area of the target object; the first area includes the area where part of the limbs of the target object are located;

上述肢体关键点检测模块322，用于对上述第一区域对应的像素点进行肢体关键点检测处理，获得上述目标对象的上述部分肢体对应的第一关键点信息。The limb key point detection module 322 is configured to perform limb key point detection processing on the pixel points corresponding to the first region to obtain first key point information corresponding to the partial limb of the target object.

在本公开的一些可选实施例中，上述跟踪确定单元33，用于基于上述第一关键点信息在第一图像中确定第二区域；上述第二区域大于上述目标对象的第一区域；第一区域包括目标对象的部分肢体所在区域；根据第二区域或第一关键点信息，确定第二图像中与第二区域的位置范围对应的第三区域；对第二图像中的第三区域内的像素点进行肢体关键点检测处理，获得上述部分肢体对应的第二关键点信息。In some optional embodiments of the present disclosure, the above-mentioned tracking determination unit 33 is configured to determine a second area in the first image based on the above-mentioned first key point information; the above-mentioned second area is larger than the above-mentioned first area of the target object; An area includes the area where part of the limb of the target object is located; according to the second area or the first key point information, determine the third area in the second image corresponding to the position range of the second area; Perform limb key point detection processing on the pixel points of the above-mentioned parts, and obtain the second key point information corresponding to the above-mentioned part of the limbs.

在本公开的一些可选实施例中，上述肢体检测模块321，用于利用肢体检测网络对上述第一图像中的上述目标对象进行肢体检测处理；其中，上述肢体检测网络采用第一类样本图像训练得到；上述第一类样本图像中标注有目标对象的检测框；检测框的标注范围包括目标对象的部分肢体所在区域。In some optional embodiments of the present disclosure, the above-mentioned limb detection module 321 is configured to use a limb detection network to perform limb detection processing on the above-mentioned target object in the above-mentioned first image; wherein, the above-mentioned limb detection network adopts the first type of sample images Training is obtained; the first type of sample image is marked with a detection frame of the target object; the labelled range of the detection frame includes the area where part of the limbs of the target object are located.

在本公开的一些可选实施例中，上述肢体关键点检测模块322，用于利用肢体关键点检测网络对上述第一区域对应的像素点进行肢体关键点检测处理；其中，上述肢体关键点检测网络采用第二类样本图像训练得到；上述第二类样本图像中标注有包括所述目标对象的部分肢体的关键点。In some optional embodiments of the present disclosure, the above-mentioned limb key point detection module 322 is configured to use a limb key point detection network to perform limb key point detection processing on the pixels corresponding to the above-mentioned first area; wherein, the above-mentioned limb key point detection The network is obtained by training the sample images of the second type; the above-mentioned sample images of the second type are marked with key points including part of the limbs of the target object.

在本公开的一些可选实施例中，上述目标对象的部分肢体包括以下至少之一：头部、颈部、肩部、胸部、腰部、髋部、手臂、手部；上述第一关键点信息和上述第二关键点信息包括头部、颈部、肩部、胸部、腰部、髋部、手臂和手部中的至少一个肢体的轮廓关键点信息和/骨骼关键点信息。In some optional embodiments of the present disclosure, the partial limbs of the target object include at least one of the following: head, neck, shoulder, chest, waist, hip, arm, and hand; the first key point information And the above-mentioned second key point information includes contour key point information and/or bone key point information of at least one limb among the head, neck, shoulder, chest, waist, hip, arm and hand.

在本公开的一些可选实施例中，如图7所示，上述装置还包括：分配单元34和统计单元35；其中，In some optional embodiments of the present disclosure, as shown in FIG. 7 , the above-mentioned apparatus further includes: an allocation unit 34 and a statistics unit 35; wherein,

上述分配单元34，用于响应于上述检测单元32获得目标对象的部分肢体对应的第一关键点信息的情况，为目标对象分配跟踪标识；The above-mentioned allocation unit 34 is configured to allocate a tracking identifier to the target object in response to the situation in which the above-mentioned detection unit 32 obtains the first key point information corresponding to part of the limbs of the target object;

上述统计单元35，用于基于对多帧图像的处理过程中分配的跟踪标识的数量，确定多帧图像中的目标对象的数量。The above-mentioned statistical unit 35 is configured to determine the number of target objects in the multi-frame images based on the number of tracking identifiers allocated in the process of processing the multi-frame images.

在本公开的一些可选实施例中，如图8所示，上述装置还包括确定单元36，用于基于第二关键点信息确定目标对象的姿态；基于目标对象的姿态确定对应于目标对象的交互指令。In some optional embodiments of the present disclosure, as shown in FIG. 8 , the above-mentioned apparatus further includes a determination unit 36, configured to determine the posture of the target object based on the second key point information; determine the posture corresponding to the target object based on the posture of the target object interactive instructions.

本公开实施例中，上述图像处理装置中的获取单元31、检测单元32(包括肢体检测模块321和肢体关键点检测模块322)、跟踪确定单元33、分配单元34、统计单元35和确定单元36，在实际应用中均可由中央处理器(CPU，Central Processing Unit)、数字信号处理器(DSP，Digital Signal Processor)、微控制单元(MCU，Microcontroller Unit)或可编程门阵列(FPGA，Field－Programmable Gate Array)实现。In the embodiment of the present disclosure, the acquisition unit 31 , the detection unit 32 (including the limb detection module 321 and the limb key point detection module 322 ), the tracking determination unit 33 , the allocation unit 34 , the statistics unit 35 and the determination unit 36 in the above-mentioned image processing apparatus , in practical applications, it can be composed of a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a microcontroller unit (MCU, Microcontroller Unit) or a programmable gate array (FPGA, Field-Programmable). Gate Array) implementation.

需要说明的是：上述实施例提供的图像处理装置在进行图像处理时，仅以上述各程序模块的划分进行举例说明，实际应用中，可以根据需要而将上述处理分配由不同的程序模块完成，即将装置的内部结构划分成不同的程序模块，以完成以上描述的全部或者部分处理。另外，上述实施例提供的图像处理装置与图像处理方法实施例属于同一构思，其具体实现过程详见方法实施例，这里不再赘述。It should be noted that: when the image processing apparatus provided in the above-mentioned embodiments performs image processing, only the division of the above-mentioned program modules is used as an example for illustration. That is, the internal structure of the device is divided into different program modules to complete all or part of the processing described above. In addition, the image processing apparatus and the image processing method embodiments provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.

本公开实施例还提供了一种电子设备。图9为本公开实施例的电子设备的硬件组成结构示意图；如图9所示，电子设备40可包括存储器42、处理器41及存储在存储器42上并可在处理器41上运行的计算机程序，上述处理器41执行上述程序时实现本公开实施例上述图像处理方法的步骤。Embodiments of the present disclosure also provide an electronic device. FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the disclosure; as shown in FIG. 9 , the electronic device 40 may include a memory 42 , a processor 41 and a computer program stored in the memory 42 and running on the processor 41 , when the processor 41 executes the program, the steps of the image processing method in the embodiment of the present disclosure are implemented.

可以理解，电子设备40中的各个组件可通过总线系统43耦合在一起。可理解，总线系统43用于实现这些组件之间的连接通信。总线系统43除包括数据总线之外，还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见，在图9中将各种总线都标为总线系统43。It will be appreciated that the various components in electronic device 40 may be coupled together by bus system 43 . It is understood that the bus system 43 is used to implement the connection communication between these components. In addition to the data bus, the bus system 43 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, the various buses are designated as bus system 43 in FIG. 9 .

可以理解，存储器42可以是易失性存储器或非易失性存储器，也可包括易失性和非易失性存储器两者。其中，非易失性存储器可以是只读存储器(ROM，Read Only Memory)、可编程只读存储器(PROM，Programmable Read-Only Memory)、可擦除可编程只读存储器(EPROM，Erasable Programmable Read-Only Memory)、电可擦除可编程只读存储器(EEPROM，Electrically Erasable Programmable Read-Only Memory)、磁性随机存取存储器(FRAM，ferromagnetic random access memory)、快闪存储器(Flash Memory)、磁表面存储器、光盘、或只读光盘(CD-ROM，Compact Disc Read-Only Memory)；磁表面存储器可以是磁盘存储器或磁带存储器。易失性存储器可以是随机存取存储器(RAM，Random AccessMemory)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器(SRAM，Static Random Access Memory)、同步静态随机存取存储器(SSRAM，Synchronous Static Random Access Memory)、动态随机存取存储器(DRAM，Dynamic Random Access Memory)、同步动态随机存取存储器(SDRAM，SynchronousDynamic Random Access Memory)、双倍数据速率同步动态随机存取存储器(DDRSDRAM，Double Data Rate Synchronous Dynamic Random Access Memory)、增强型同步动态随机存取存储器(ESDRAM，Enhanced Synchronous Dynamic Random Access Memory)、同步连接动态随机存取存储器(SLDRAM，SyncLink Dynamic Random Access Memory)、直接内存总线随机存取存储器(DRRAM，Direct Rambus Random Access Memory)。本公开实施例描述的存储器42旨在包括但不限于这些和任意其它适合类型的存储器。It will be appreciated that the memory 42 may be either volatile memory or non-volatile memory, and may include both volatile and non-volatile memory. Among them, the non-volatile memory may be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-only memory) Only Memory), Electrically Erasable Programmable Read-Only Memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), Magnetic Random Access Memory (FRAM, ferromagnetic random access memory), Flash Memory (Flash Memory), Magnetic Surface Memory , CD-ROM, or Compact Disc Read-Only Memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a tape memory. The volatile memory may be Random Access Memory (RAM), which is used as an external cache memory. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory Memory (DRAM, Dynamic Random Access Memory), Synchronous Dynamic Random Access Memory (SDRAM, SynchronousDynamic Random Access Memory), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), Synchronous Link Dynamic Random Access Memory (SLDRAM, SyncLink Dynamic Random Access Memory), Direct Memory Bus Random Access Memory (DRRAM, Direct Rambus Random Access Memory) . The memory 42 described in the embodiments of the present disclosure is intended to include, but not be limited to, these and any other suitable types of memory.

上述本公开实施例揭示的方法可以应用于处理器41中，或者由处理器41实现。处理器41可能是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器41中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器41可以是通用处理器、DSP，或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。处理器41可以实现或者执行本公开实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本公开实施例所公开的方法的步骤，可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于存储介质中，该存储介质位于存储器42，处理器41读取存储器42中的信息，结合其硬件完成前述方法的步骤。The methods disclosed in the above embodiments of the present disclosure may be applied to the processor 41 or implemented by the processor 41 . The processor 41 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in the processor 41 or an instruction in the form of software. The above-mentioned processor 41 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. The processor 41 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the methods disclosed in combination with the embodiments of the present disclosure can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory 42, and the processor 41 reads the information in the memory 42, and completes the steps of the foregoing method in combination with its hardware.

在示例性实施例中，电子设备40可以被一个或多个应用专用集成电路(ASIC，Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD，ProgrammableLogic Device)、复杂可编程逻辑器件(CPLD，Complex Programmable Logic Device)、FPGA、通用处理器、控制器、MCU、微处理器(Microprocessor)、或其他电子元件实现，用于执行前述方法。In an exemplary embodiment, the electronic device 40 may be implemented by one or more of Application Specific Integrated Circuit (ASIC, Application Specific Integrated Circuit), DSP, Programmable Logic Device (PLD, Programmable Logic Device), Complex Programmable Logic Device (CPLD, Complex Programmable Logic Device), FPGA, general-purpose processor, controller, MCU, microprocessor (Microprocessor), or other electronic components implemented for performing the aforementioned method.

在示例性实施例中，本公开实施例还提供了一种计算机可读存储介质，例如包括计算机程序的存储器42，上述计算机程序可由电子设备40的处理器41执行，以完成前述方法所述步骤。计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、Flash Memory、磁表面存储器、光盘、或CD-ROM等存储器；也可以是包括上述存储器之一或任意组合的各种设备，如移动电话、计算机、平板设备、个人数字助理等。In an exemplary embodiment, the embodiment of the present disclosure further provides a computer-readable storage medium, such as a memory 42 including a computer program, and the computer program can be executed by the processor 41 of the electronic device 40 to complete the steps of the aforementioned method. . The computer-readable storage medium can be memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM; it can also be various devices including one or any combination of the above-mentioned memories, such as Mobile phones, computers, tablet devices, personal digital assistants, etc.

本公开实施例还提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现本公开实施例所述图像处理方法的步骤。The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the steps of the image processing methods described in the embodiments of the present disclosure.

本申请所提供的几个方法实施例中所揭露的方法，在不冲突的情况下可以任意组合，得到新的方法实施例。The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined under the condition of no conflict to obtain new method embodiments.

本申请所提供的几个产品实施例中所揭露的特征，在不冲突的情况下可以任意组合，得到新的产品实施例。The features disclosed in the several product embodiments provided in this application can be combined arbitrarily without conflict to obtain a new product embodiment.

本申请所提供的几个方法或设备实施例中所揭露的特征，在不冲突的情况下可以任意组合，得到新的方法实施例或设备实施例。The features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

在本申请所提供的几个实施例中，应该理解到，所揭露的设备和方法，可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，如：多个单元或组件可以结合，或可以集成到另一个系统，或一些特征可以忽略，或不执行。另外，所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口，设备或单元的间接耦合或通信连接，可以是电性的、机械的或其它形式的。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined, or Can be integrated into another system, or some features can be ignored, or not implemented. In addition, the coupling, or direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical or other forms. of.

上述作为分离部件说明的单元可以是、或也可以不是物理上分开的，作为单元显示的部件可以是、或也可以不是物理单元，即可以位于一个地方，也可以分布到多个网络单元上；可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。The unit described above as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit, that is, it may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本公开各实施例中的各功能单元可以全部集成在一个处理单元中，也可以是各单元分别单独作为一个单元，也可以两个或两个以上单元集成在一个单元中；上述集成的单元既可以采用硬件的形式实现，也可以采用硬件加软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present disclosure may be all integrated into one processing unit, or each unit may be separately used as a unit, or two or more units may be integrated into one unit; the above integration The unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.

本领域普通技术人员可以理解：实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成，前述的程序可以存储于一计算机可读取存储介质中，该程序在执行时，执行包括上述方法实施例的步骤；而前述的存储介质包括：移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。Those of ordinary skill in the art can understand that all or part of the steps of implementing the above method embodiments can be completed by program instructions related to hardware, the aforementioned program can be stored in a computer-readable storage medium, and when the program is executed, execute It includes the steps of the above method embodiments; and the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk or an optical disk and other media that can store program codes.

或者，本公开上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时，也可以存储在一个计算机可读取存储介质中。基于这样的理解，本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本公开各个实施例所述方法的全部或部分。而前述的存储介质包括：移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。Alternatively, if the above-mentioned integrated units of the present disclosure are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure essentially or the parts that make contributions to the prior art can be embodied in the form of a software product, and the computer software product is stored in a storage medium and includes several instructions for A computer device (which may be a personal computer, a server, or a network device, etc.) is caused to execute all or part of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk or an optical disk and other mediums that can store program codes.

以上所述，仅为本公开的具体实施方式，但本公开的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本公开揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本公开的保护范围之内。因此，本公开的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited to this. should be included within the scope of protection of the present disclosure. Therefore, the protection scope of the present disclosure should be based on the protection scope of the claims.

Claims

1. an image processing method, is characterized in that, described method comprises:

Get multiple frames of images;

Performing limb key point detection processing on the target object in the first image of the multi-frame images to obtain first key point information corresponding to part of the limbs of the target object;

The second key point information corresponding to the part of the limb of the target object in the second image is determined based on the first key point information; wherein, the first image is any one frame of images in the multiple frames of images ; the second image is a frame of image after the first image.

2 . The method according to claim 1 , wherein, performing limb key point detection processing on the target object in the first image to obtain first key point information corresponding to part of the limbs of the target object, comprising: 2 .

Perform limb detection processing on the target object in the first image, and determine a first area of the target object; the first area includes the area where part of the limbs of the target object are located;

Perform limb key point detection processing on the pixel points corresponding to the first area to obtain first key point information corresponding to the part of the limb of the target object.

3 . The method according to claim 1 , wherein the determining, based on the first key point information, the second key point information corresponding to the part of the limb of the target object in the second image comprises: 3 .

A second area is determined in the first image based on the first key point information; the second area is larger than the first area of the target object; the first area includes the area where part of the target object's limbs are located ;

determining a third area in the second image corresponding to the position range of the second area according to the second area or the first key point information;

Perform limb key point detection processing on the pixels in the third area in the second image to obtain second key point information corresponding to the part of the limb.

4. The method according to claim 2 or 3, wherein the performing limb detection processing on the target object in the first image comprises:

Use a limb detection network to perform limb detection processing on the target object in the first image;

The limb detection network is obtained by training the first type of sample images; the first type of sample images are marked with a detection frame of the target object; the marked range of the detection frame includes the area where part of the limbs of the target object are located.

5. The method according to claim 2 or 3, characterized in that, performing limb key point detection processing on the pixels corresponding to the first area, comprising:

Use the limb key point detection network to perform limb key point detection processing on the pixels corresponding to the first area;

Wherein, the limb key point detection network is obtained by training the second type of sample images; the second type of sample images are marked with key points including part of the limbs of the target object.

6. The method according to any one of claims 1 to 5, wherein the partial limbs of the target object comprise at least one of the following: head, neck, shoulder, chest, waist, hip, arm ,hand;

The first key point information and the second key point information include contour key point information and/or bone key information of at least one limb in the head, neck, shoulder, chest, waist, hip, arm and hand. point information.

7. The method according to any one of claims 1 to 6, wherein the method further comprises:

In response to obtaining the first key point information corresponding to the partial limbs of the target object, assigning a tracking identifier to the target object;

The number of target objects in the multi-frame images is determined based on the number of the tracking identifiers allocated during the processing of the multi-frame images.

8. The method according to any one of claims 1 to 7, wherein the method further comprises:

determining the pose of the target object based on the second key point information;

An interaction instruction corresponding to the target object is determined based on the gesture of the target object.

9. An image processing device, characterized in that the device comprises: an acquisition unit, a detection unit, and a tracking determination unit; wherein,

The acquisition unit is used to acquire multiple frames of images;

The detection unit is configured to perform limb key point detection processing on the target object in the first image of the multi-frame images, and obtain first key point information corresponding to part of the limbs of the target object;

The tracking determination unit is configured to determine, based on the first key point information, second key point information corresponding to the part of the limb of the target object in the second image; wherein the first image is the multiple Any frame image in the frame images; the second image is a frame image after the first image.

10. The device according to claim 9, wherein the detection unit comprises: a limb detection module and a limb key point detection module; wherein,

The limb detection module is configured to perform limb detection processing on the target object in the first image, and determine a first region of the target object; the first region includes the region where part of the limbs of the target object are located ;

The limb key point detection module is configured to perform limb key point detection processing on the pixel points corresponding to the first area to obtain first key point information corresponding to the part of the limb of the target object.

11. The apparatus according to claim 9, wherein the tracking determination unit is configured to determine a second area in the first image based on the first key point information; the second area is larger than the the first area of the target object; the first area includes the area where part of the body of the target object is located; according to the second area or the first key point information, determine the difference between the second image and the first The third area corresponding to the position range of the second area; performing limb key point detection processing on the pixel points in the third area in the second image to obtain second key point information corresponding to the part of the limb.

12. The device according to claim 10 or 11, wherein the limb detection module is configured to perform limb detection processing on the target object in the first image by using a limb detection network;

13. The device according to claim 10 or 11, wherein the limb key point detection module is used to perform limb key point detection processing on the pixels corresponding to the first area by using a limb key point detection network;

14. The device according to any one of claims 9 to 13, wherein part of the limbs of the target object comprises at least one of the following: head, neck, shoulder, chest, waist, hip, arm ,hand;

15. The device according to any one of claims 9 to 14, wherein the device further comprises an allocation unit and a statistics unit; wherein,

the assigning unit, configured to assign a tracking identifier to the target object in response to the detection unit obtaining the first key point information corresponding to part of the limbs of the target object;

The statistics unit is configured to determine the number of target objects in the multi-frame images based on the number of the tracking identifiers allocated during the processing of the multi-frame images.

16. The apparatus according to any one of claims 9 to 15, wherein the apparatus further comprises a determination unit, configured to determine the pose of the target object based on the second key point information; based on the target The pose determination of the object corresponds to the interaction instruction of the target object.

17. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.

18. An electronic device comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor implements any one of claims 1 to 8 when executing the program the steps of the method.