
CN111401318A - Action recognition method and device - Google Patents

Action recognition method and device

Info

Publication number
CN111401318A
CN111401318A
Authority
CN
China
Prior art keywords
action
recognition
virtual object
key point
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010292042.2A
Other languages
Chinese (zh)
Other versions
CN111401318B (en)
Inventor
周明才
周大江
朱世艾
杜志军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Digital Service Technology Co ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010292042.2A
Publication of CN111401318A
Application granted
Publication of CN111401318B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/11 Hand-related biometrics; Hand pose recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

This specification provides an action recognition method and device. The action recognition method includes: acquiring an image frame captured by an image acquisition device; segmenting the action region of a target object in the image frame to obtain an intermediate image; inputting the intermediate image into an action recognition model for key-point recognition and coordinate mapping, to obtain the action frame nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points; generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong; and performing action recognition on the three-dimensional virtual object based on an action recognition data set, to determine the action type of the target object.

Figure 202010292042

Description

Action recognition method and device

Technical Field

This specification relates to the technical field of gesture recognition, and in particular to an action recognition method. This specification also relates to an action recognition device, a computing device, and a computer-readable storage medium.

Background

With the development of Internet technology, AR interaction based on gesture recognition has been widely applied on mobile terminals: users can operate related applications on a mobile terminal through gestures, without touching the terminal. However, gesture recognition is usually implemented either by collecting a large number of gesture images in advance as training samples, training a gesture detector offline, and then deploying the detector online, or by constructing a two-dimensional gesture image from two-dimensional hand key points and recognizing the gesture from that image. Both approaches suffer from high cost and low generality and flexibility. Moreover, because the human hand has many degrees of freedom, the same gesture may appear to be of different types under different viewing angles, which further degrades recognition accuracy.

Summary

In view of this, the embodiments of this specification provide an action recognition method. This specification also relates to an action recognition device, a computing device, and a computer-readable storage medium, so as to solve the technical defects in the prior art.

According to a first aspect of the embodiments of this specification, an action recognition method is provided, including:

acquiring an image frame captured by an image acquisition device;

segmenting the action region of a target object in the image frame to obtain an intermediate image;

inputting the intermediate image into an action recognition model for key-point recognition and coordinate mapping, to obtain the action frame nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points;

generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong; and

performing action recognition on the three-dimensional virtual object based on an action recognition data set, to determine the action type of the target object.
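As a rough illustration of how these five steps fit together, the following Python sketch stubs out each stage. Every function name and body here is an assumption for illustration, not the implementation described by this specification:

```python
# Hypothetical sketch of the five-step pipeline; all stages are stand-ins.

def segment_action_region(frame):
    # Step 2: crop the target object's action region into an intermediate image.
    return frame  # placeholder: a real system would run a segmentation model

def recognize_keypoints(intermediate):
    # Step 3: the model returns (action_frame_node_id, (x, y, z)) pairs;
    # 21 points is an assumed hand-keypoint count.
    return [(i, (float(i), 0.0, 0.0)) for i in range(21)]

def build_virtual_object(keypoints):
    # Step 4: attach each 3-D coordinate to its node in the virtual action frame.
    return {node: xyz for node, xyz in keypoints}

def classify(virtual_object, rules):
    # Step 5: rule-based matching against the action recognition data set.
    for action_type, rule in rules.items():
        if rule(virtual_object):
            return action_type
    return "unknown"

def recognize(frame, rules):
    intermediate = segment_action_region(frame)
    keypoints = recognize_keypoints(intermediate)
    obj = build_virtual_object(keypoints)
    return classify(obj, rules)

rules = {"demo": lambda obj: len(obj) == 21}
print(recognize(frame=None, rules=rules))  # a frame stub is enough here
```

Because the final step consults a rule set rather than a classifier, new action types can be added by extending `rules` without retraining the model, which is the extensibility point the summary emphasizes.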

Optionally, the key-point recognition performed by the action recognition model includes:

recognizing, in the intermediate image, the action key points corresponding to the target object, and determining the key-point labels of the action key points;

determining, based on a correspondence between key-point labels and the node labels of the action frame nodes, the target node label corresponding to each key-point label; and

determining, according to the target node label, the action frame node corresponding to the action key point to which the key-point label belongs.
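The label correspondence described above can be pictured as a simple table lookup; the table contents and label names below are illustrative assumptions, not values from this specification:

```python
# Assumed correspondence between key-point labels and action frame node labels.
KEYPOINT_TO_NODE = {
    "wrist": "node_0",
    "thumb_tip": "node_4",
    "index_tip": "node_8",
}

def node_for_keypoint(keypoint_label):
    # Returns the target node label for a recognized key-point label.
    return KEYPOINT_TO_NODE[keypoint_label]

print(node_for_keypoint("index_tip"))  # node_8
```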

Optionally, the coordinate mapping performed by the action recognition model includes:

determining the position information of the action key points in the intermediate image; and

mapping out, based on the position information, the three-dimensional coordinate information corresponding to the action key points.
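The specification leaves the exact position-to-3-D mapping to the model. Purely as one concrete illustration, a pinhole-camera back-projection turns an image position plus an estimated depth into a camera-space coordinate; all intrinsic parameters below are assumed values:

```python
# Illustrative pinhole back-projection: image position (u, v) plus an
# estimated depth -> camera-space (x, y, z). Intrinsics are assumptions.

def backproject(u, v, depth, fx=500.0, fy=500.0, cx=160.0, cy=120.0):
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# The principal point maps onto the optical axis.
print(backproject(160.0, 120.0, 0.5))  # (0.0, 0.0, 0.5)
```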

Optionally, generating the three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong includes:

determining, according to the action key points, the correspondence between the action frame nodes and the three-dimensional coordinate information, and determining, based on the virtual action frame, the connection relationships of the action frame nodes; and

connecting the three-dimensional coordinate information according to the connection relationships, to generate the three-dimensional virtual object.
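The connection step above amounts to wiring the mapped 3-D coordinates along the node pairs fixed by the virtual action frame. In the sketch below, the edge list is a hypothetical fragment, not the actual skeleton of the specification:

```python
# Assumed fragment of a virtual action frame: which node pairs are connected.
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3)]

def connect(coords, edges=SKELETON_EDGES):
    # coords: {node_id: (x, y, z)}; returns the line segments that make up
    # the three-dimensional virtual object.
    return [(coords[a], coords[b]) for a, b in edges]

coords = {i: (float(i), 0.0, 0.0) for i in range(4)}
segments = connect(coords)
print(len(segments))  # 3
```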

Optionally, after the step of generating the three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong, and before the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set to determine the action type of the target object, the method further includes:

detecting the number of generated three-dimensional virtual objects;

when the number of objects is greater than a preset number threshold, normalizing the multiple three-dimensional virtual objects corresponding to the number of objects, to obtain candidate virtual objects; and

selecting a target virtual object from the candidate virtual objects as the three-dimensional virtual object.
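A minimal sketch of this multi-object branch, under stated assumptions: normalization is taken to be rescaling each candidate to unit size, and the selection criterion (keep the candidate whose original was largest) is an assumption, since the specification does not fix one:

```python
import math

def scale_of(obj):
    # obj: list of (x, y, z) points; use the max pairwise distance as scale.
    return max(math.dist(p, q) for i, p in enumerate(obj) for q in obj[i + 1:])

def normalize(obj):
    s = scale_of(obj)
    return [(x / s, y / s, z / s) for x, y, z in obj]

def select_target(objects, preset_threshold=1):
    if len(objects) > preset_threshold:
        candidates = [normalize(o) for o in objects]
        # Assumed criterion: keep the candidate whose original was largest.
        best = max(range(len(objects)), key=lambda i: scale_of(objects[i]))
        return candidates[best]
    return objects[0]

small = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
big = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(select_target([small, big]))  # the normalized larger candidate
```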

Optionally, before the step of acquiring the image frame captured by the image acquisition device, the method further includes:

receiving a click instruction submitted by a user through an action interaction page; and

displaying at least one action image frame to the user according to the click instruction, the action image frame containing a display region corresponding to a demonstration action.

Optionally, performing action recognition on the three-dimensional virtual object based on the action recognition data set, and determining the action type of the target object, includes:

receiving the action recognition data set issued by the server for the action image frame, the action recognition data set carrying an action recognition rule matching the demonstration action;

judging, according to the action recognition rule, whether the action of the three-dimensional virtual object matches the demonstration action; and

if so, determining the demonstration action type of the demonstration action according to the action recognition data set, and taking the demonstration action type as the action type.
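The server-issued rule check above can be sketched as follows. The data-set layout, the rule body, and the type name are all illustrative assumptions:

```python
# Assumed shape of a server-issued action recognition data set: a
# demonstration action type plus a rule over the 3-D virtual object.
recognition_dataset = {
    "demo_action_type": "number_8",
    # Illustrative rule: thumb tip and index tip are horizontally far apart.
    "rule": lambda obj: abs(obj["thumb_tip"][0] - obj["index_tip"][0]) > 0.1,
}

def recognize_against_demo(virtual_object, dataset):
    # If the virtual object satisfies the rule, the demonstration action
    # type becomes the recognized action type; otherwise no match.
    if dataset["rule"](virtual_object):
        return dataset["demo_action_type"]
    return None

obj = {"thumb_tip": (0.3, 0.0, 0.0), "index_tip": (0.0, 0.1, 0.0)}
print(recognize_against_demo(obj, recognition_dataset))  # number_8
```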

Optionally, after the sub-step of determining the demonstration action type of the demonstration action according to the action recognition data set and taking the demonstration action type as the action type, the method further includes:

displaying, to the user through the action interaction page, recommendation information matching the action type.

Optionally, after the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set to determine the action type of the target object, the method further includes:

matching the action type against the demonstration action type of the demonstration action;

if the matching succeeds, displaying match-success information to the user through the action interaction page; and

if the matching fails, displaying reminder information to the user through the action interaction page, the reminder information carrying an action strategy.

Optionally, when the target object is a hand, segmenting the action region of the target object in the image frame to obtain the intermediate image includes:

detecting the feature region corresponding to the hand in the image frame; and

cropping the image frame according to the feature region, to obtain the intermediate image containing the hand features;

correspondingly, the action type is a gesture action type.
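The detect-then-crop step above can be sketched as follows; the frame is a nested list standing in for pixel data, and the detector is a stub, since the specification does not prescribe a particular detection model:

```python
def detect_hand_region(frame):
    # Placeholder detector: returns (top, bottom, left, right) of the
    # hand's bounding box. A real system would run a detection model.
    return (1, 3, 1, 3)

def crop(frame, region):
    # Crop the frame to the feature region -> the intermediate image.
    top, bottom, left, right = region
    return [row[left:right] for row in frame[top:bottom]]

# A 4x4 "frame" whose pixel at (row r, col c) holds the value 10*r + c.
frame = [[c + 10 * r for c in range(4)] for r in range(4)]
intermediate = crop(frame, detect_hand_region(frame))
print(intermediate)  # [[11, 12], [21, 22]]
```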

Optionally, performing action recognition on the three-dimensional virtual object based on the action recognition data set, and determining the action type of the target object, includes:

parsing the action recognition data set to obtain action elements;

determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements; and

determining the intermediate action type as the action type of the target object.

Optionally, determining the intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements includes:

computing the action angles between the virtual sub-objects of the three-dimensional virtual object; and

checking the action angles against the action elements, and determining the intermediate action type of the three-dimensional virtual object according to the check result.
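The angle check above can be made concrete with standard vector geometry: the angle between two virtual sub-objects (e.g. two finger bones sharing a joint) follows from their direction vectors, and an action element supplies the expected range. The 80 to 100 degree range below is an illustrative assumption:

```python
import math

def angle_deg(v1, v2):
    # Angle between two direction vectors, in degrees.
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

def matches_element(v1, v2, min_deg=80.0, max_deg=100.0):
    # Check the computed action angle against an assumed action element.
    return min_deg <= angle_deg(v1, v2) <= max_deg

print(round(angle_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)), 1))  # 90.0
```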

Optionally, determining the intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements includes:

checking, according to the action elements, the action positions of the virtual sub-objects of the three-dimensional virtual object; and

determining the intermediate action type of the three-dimensional virtual object based on the check result.

Optionally, the action recognition data set is composed of action recognition packages issued by the server, where each action recognition package contains recognition rules for recognizing the three-dimensional virtual object.

According to a second aspect of the embodiments of this specification, an action recognition device is provided, including:

an acquisition module, configured to acquire an image frame captured by an image acquisition device;

a processing module, configured to segment the action region of a target object in the image frame to obtain an intermediate image;

a recognition module, configured to input the intermediate image into an action recognition model for key-point recognition and coordinate mapping, to obtain the action frame nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points;

a generation module, configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong; and

a determination module, configured to perform action recognition on the three-dimensional virtual object based on an action recognition data set, to determine the action type of the target object.

According to a third aspect of the embodiments of this specification, a computing device is provided, including:

a memory and a processor;

the memory being configured to store computer-executable instructions, and the processor being configured to execute the computer-executable instructions to:

acquire an image frame captured by an image acquisition device;

segment the action region of a target object in the image frame to obtain an intermediate image;

input the intermediate image into an action recognition model for key-point recognition and coordinate mapping, to obtain the action frame nodes corresponding to the recognized action key points and the three-dimensional coordinate information mapped from the action key points;

generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong; and

perform action recognition on the three-dimensional virtual object based on an action recognition data set, to determine the action type of the target object.

According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided, storing computer-executable instructions that, when executed by a processor, implement the steps of the action recognition method.

In the action recognition method provided by an embodiment of this specification, during recognition of the action posed by the user, image segmentation is performed on the acquired image frame to obtain an intermediate image containing the target object; the intermediate image is then input into the action recognition model to obtain action frame nodes and three-dimensional coordinate information, and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong. The action type can thus be recognized from the generated three-dimensional virtual object, effectively solving the problem that different image capture angles yield non-standard image frames and degrade the accuracy of action-type recognition. At the same time, performing action recognition on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object makes the method more general, flexible, and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new action type can be recognized simply by extending the action recognition data set, broadening the range of applicable scenarios.

Brief Description of the Drawings

FIG. 1 is a flowchart of an action recognition method provided by an embodiment of this specification;

FIG. 2 is a schematic diagram of an intermediate image in an action recognition method provided by an embodiment of this specification;

FIG. 3 is a schematic diagram of an action frame in an action recognition method provided by an embodiment of this specification;

FIG. 4 is a schematic diagram of a three-dimensional virtual object in an action recognition method provided by an embodiment of this specification;

FIG. 5 is a processing flowchart of an action recognition method applied to a gesture recognition scenario, provided by an embodiment of this specification;

FIG. 6 is a schematic structural diagram of an action recognition device provided by an embodiment of this specification;

FIG. 7 is a structural block diagram of a computing device provided by an embodiment of this specification.

Detailed Description

Many specific details are set forth in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its substance; this specification is therefore not limited by the specific implementations disclosed below.

The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a", "the", and "said" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, a first may also be termed a second, and similarly, a second may be termed a first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".

This specification provides an action recognition method, and also relates to an action recognition device, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

FIG. 1 shows a flowchart of an action recognition method provided according to an embodiment of this specification, which specifically includes the following steps:

Step 102: Acquire an image frame captured by an image acquisition device.

In practical applications, recognizing the gestures posed by a user in order to provide corresponding services allows a service platform to deliver those services through convenient operations, effectively improving the user experience; the accuracy of gesture recognition is the foundation on which such services rest. Gesture recognition is usually implemented either by training a gesture detector or by recognizing gestures from two-dimensional hand key points. Although both can achieve recognition, they have shortcomings in flexibility and accuracy: when the captured gesture images are affected by the capture angle, recognition accuracy drops considerably, and when new gestures are added, the gesture detector must be retrained before it can recognize them, so these approaches are rather limited in the scenarios they support.

In order to improve the accuracy of action recognition, and to keep recognizing actions even when new actions are added, the action recognition method provided by this embodiment performs image segmentation on the acquired image frame during recognition of the action posed by the user to obtain an intermediate image containing the target object; the intermediate image is then input into the action recognition model to obtain action frame nodes and three-dimensional coordinate information, and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong. The action type can thus be recognized from the generated three-dimensional virtual object, effectively solving the problem that different capture angles yield non-standard image frames and degrade the accuracy of action-type recognition. At the same time, performing action recognition on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object makes the method more general, flexible, and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new action type can be recognized simply by extending the action recognition data set, broadening the range of applicable scenarios.

In specific implementations, the action recognition method provided in this specification can recognize hand gestures. For example, if the user forms the number "8" with a hand, recognizing the user's hand posture identifies the gesture type as the number "8". It can also recognize body movements: for example, if the user forms the letter "Y" with their body, recognizing the body movement identifies it as the letter "Y". It can further recognize leg movements: for example, if the user forms the Chinese character "人" with their legs, recognizing the leg movement identifies it as the Chinese character "人".

On this basis, action recognition means recognizing the type of action the user poses with a hand, body, or legs; correspondingly, the target object is the user's hand, body, or legs. When the target object is a hand, the action recognition method recognizes the type of action posed by the user's hand, and the recognized action type is the type of the gesture the user has formed, such as the number "8" or the number "1". When the target object is the body, the method recognizes the type of action posed by the user's body, and the recognized action type is the type the posed shape belongs to, such as the letter "Y" or the Chinese character "个".

This embodiment describes the action recognition method with the target object being the user's hand; when the target object is the body or legs, reference may be made to the related description of this embodiment, which is not repeated here. It should be noted that, regardless of whether the target object is the body or the hand, the specific descriptions may be referred to interchangeably.

Specifically, the image acquisition device refers to a device, such as a mobile phone or a camera, that captures images of the action posed by the user; correspondingly, the image frame refers to an image obtained by capturing the action posed by the user. When a captured image frame contains the action posed by the user, subsequent action recognition can be performed; alternatively, a large number of image frames may be captured by the image acquisition device, and those containing the action posed by the user are selected as the image frames for subsequent action recognition.

In addition, before action recognition is performed, the user needs to be informed of the action types that can be recognized. For example, in a payment scenario, the gestures that can be recognized are "1" and "2", where gesture "1" indicates confirming the payment amount and gesture "2" indicates abandoning the payment. The gestures that the payment platform can recognize must be communicated to the user before the gesture posed by the user can be recognized as "1" or "2" and used for subsequent operations. Based on this, the actions the user needs to pose are displayed to the user through an action interaction page. In one or more implementations of this embodiment, the specific implementation is as follows:

A click instruction submitted by the user through the action interaction page is received;

At least one action image frame is displayed to the user according to the click instruction, where the action image frame contains a display area corresponding to a displayed action.

In practical applications, the action interaction page refers to a page displayed to the user through a terminal device, on which action interaction can be performed, that is, the action posed by the user can be recognized. Based on this, a click instruction submitted by the user through the action interaction page is received; at this point it can be determined that the user wants to obtain a corresponding service by posing a corresponding action, so at least one action image frame is displayed to the user according to the click instruction. The action image frame contains a display area corresponding to a displayed action; the displayed action informs the user of the action that needs to be posed and may be a hand gesture or a body movement; correspondingly, the display area is the region of the action image frame corresponding to the displayed action.

For example, in a mini-program selection scenario, after the user submits a click instruction on the action interaction page corresponding to the mini-program selection item, the gesture action image frame corresponding to each mini program is displayed to the user through that page. When the user poses the same gesture as the displayed action in a gesture action image frame, the page can jump to the corresponding mini-program page. Based on this, after the user submits the click instruction, the following are displayed to the user according to the click instruction: action image frame a corresponding to mini program a, containing displayed action a; action image frame b corresponding to mini program b, containing displayed action b; and action image frame c corresponding to mini program c, containing displayed action c. After the user poses an action with the hand, the user's hand action is subsequently recognized and the page jumps to the corresponding mini program according to the recognition result; if the recognition result matches none of the three actions a, b, and c, no processing is performed.

By displaying action image frames containing display areas with displayed actions to the user before the image frame is acquired, the user can be informed of the recognizable action types, which makes it convenient for the user to pose the corresponding action, allows accurate action recognition to provide the corresponding service, and effectively improves the user experience.

Step 104: Perform segmentation processing on the action area of the target object in the image frame to obtain an intermediate image.

Specifically, on the basis of acquiring the image frame as described above, image segmentation processing is further performed on the image frame to obtain an intermediate image containing the action area of the target object, and the subsequent action recognition process is then performed according to the intermediate image.

Further, when the target object is the hand, the feature area corresponding to the hand needs to be segmented and cropped to obtain an intermediate image containing the hand, which is used for subsequent recognition of the hand action type. In one or more implementations of this embodiment, the specific implementation is as follows:

The feature area corresponding to the hand in the image frame is detected;

The image frame is cropped according to the feature area to obtain the intermediate image containing hand features;

Correspondingly, the action type is a gesture action type.

In practical applications, after the image frame is acquired, the hand in the image frame needs to be detected. When it is detected that the image frame contains the feature area corresponding to the hand, the image frame is cropped according to the feature area to obtain the intermediate image containing hand features, which is used for subsequent recognition of the hand action type.
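As an illustrative sketch of this detect-then-crop step (the detector below is a hypothetical stub standing in for any real hand detector; names such as `detect_hand_bbox` and `crop_to_intermediate` are assumptions, not part of the embodiment):

```python
import numpy as np

def detect_hand_bbox(frame: np.ndarray):
    """Hypothetical detector stub: return (x, y, w, h) of the hand
    feature area, or None if no hand is present in the frame."""
    # A real system would run a trained hand detector here; for the
    # sketch we pretend the hand occupies a fixed central region.
    h, w = frame.shape[:2]
    return (w // 4, h // 4, w // 2, h // 2)

def crop_to_intermediate(frame: np.ndarray):
    """Crop the image frame to the detected hand feature area,
    producing the intermediate image used for gesture recognition."""
    bbox = detect_hand_bbox(frame)
    if bbox is None:
        return None  # no hand: skip subsequent action recognition
    x, y, bw, bh = bbox
    return frame[y:y + bh, x:x + bw]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
intermediate = crop_to_intermediate(frame)
print(intermediate.shape)  # (240, 320, 3)
```

Cropping before recognition keeps the downstream model's input focused on the hand region rather than the full frame.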

Step 106: Input the intermediate image into an action recognition model for key point recognition and coordinate mapping, and obtain the action frame nodes corresponding to the recognized action key points, as well as the three-dimensional coordinate information to which the action key points are mapped.

Specifically, on the basis of obtaining the intermediate image through image segmentation processing as described above, key point recognition and coordinate mapping are further performed for the action key points corresponding to the target object in the intermediate image. Based on this, the intermediate image is input into the action recognition model for key point recognition and coordinate mapping, and the action key points corresponding to the target object in the intermediate image, the action frame nodes corresponding to the action key points, and the three-dimensional coordinate information to which the action key points are mapped, all output by the action recognition model, are obtained.

In practical applications, the action key points refer to the key points that can be recognized in the action posed by the user, and the action key points are used for subsequent action recognition of the target object. The action frame nodes refer to the nodes contained in a virtual action frame; correspondingly, the virtual action frame can be understood as a three-dimensional model, that is, a model structure in three-dimensional space that can be adjusted through three-dimensional coordinate information to construct the three-dimensional virtual object, which can improve the accuracy of action recognition. The three-dimensional virtual object is a three-dimensional model consistent with the action of the target object.

Referring to FIG. 2, when the target object is the hand, the intermediate image shown in FIG. 2(a), which contains the feature area corresponding to the hand, is obtained by performing image segmentation processing on the image frame. This image is input into the action recognition model for key point recognition and coordinate mapping to obtain the recognized action key points corresponding to the hand. There are 21 recognizable action key points on the hand; their distribution is shown in FIG. 2(b). By performing key point recognition on the intermediate image shown in FIG. 2(a), these 21 action key points are obtained, namely action key point 0, action key point 1, ..., and action key point 20. The action frame nodes contained in the virtual action frame, shown in FIG. 3, are action frame node a, action frame node b, ..., and action frame node u, 21 action frame nodes in total. The action recognition model can identify the action frame node corresponding to each action key point, determining that action key point 0 corresponds to action frame node a, action key point 1 corresponds to action frame node b, ..., and action key point 20 corresponds to action frame node u;

Based on this, in order to improve the accuracy of subsequent action recognition, a three-dimensional virtual object needs to be constructed in three-dimensional space. At this point, the three-dimensional coordinate information to which each action key point is mapped is determined, and the three-dimensional virtual object is subsequently generated according to the three-dimensional coordinate information and the action frame nodes corresponding to the action key points.

In one or more implementations of this embodiment, the key point recognition performed by the action recognition model includes:

Recognizing the action key points corresponding to the target object in the intermediate image, and determining the key point labels of the action key points;

Determining the target node labels corresponding to the key point labels based on the correspondence between key point labels and the node labels of the action frame nodes;

Determining, according to the target node labels, the action frame nodes corresponding to the action key points to which the key point labels belong.
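The three label-mapping steps above amount to a table lookup. A minimal sketch, assuming the 21-label scheme of the example in this embodiment (key point labels 0–20, node labels a–u); the table and function names are illustrative:

```python
import string

# Pre-established correspondence between key point labels (0..20)
# and the node labels (a..u) of the virtual action frame.
LABEL_TO_NODE = {i: string.ascii_lowercase[i] for i in range(21)}

def frame_node_for_keypoint(keypoint_label: int) -> str:
    """Return the action frame node label corresponding to a key
    point label, per the pre-established correspondence."""
    return LABEL_TO_NODE[keypoint_label]

print(frame_node_for_keypoint(0))   # a
print(frame_node_for_keypoint(20))  # u
```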

In one or more implementations of this embodiment, the coordinate mapping performed by the action recognition model includes:

Determining the position information of the action key points in the intermediate image;

Mapping out the three-dimensional coordinate information corresponding to the action key points based on the position information.

In practical applications, in the process of performing key point recognition and coordinate mapping, the action recognition model actually recognizes each action key point corresponding to the target object in the intermediate image and determines the key point label of each action key point, then determines the target node label corresponding to each key point label according to the pre-established correspondence between key point labels and the node labels of the action frame nodes, and finally determines, according to the target node label, the action frame node corresponding to the action key point to which the key point label belongs;

Based on this, after each action key point is recognized, the position information of the action key point in the intermediate image is determined, and the three-dimensional coordinate information corresponding to the action key point is then mapped out based on the position information. The three-dimensional coordinate information to which an action key point is mapped can be understood as assigning a depth coordinate to each action key point according to its position information, thereby mapping out the three-dimensional coordinate information.

Referring to FIG. 2, after the intermediate image shown in FIG. 2(a) is input into the action recognition model, the 21 action key points corresponding to the hand are recognized in the intermediate image, and the key point labels of the action key points are determined to be label 0, label 1, ..., and label 20, while the node labels of the action frame nodes are label a, label b, ..., and label u. Based on the correspondence between key point labels and the node labels of the action frame nodes, the node label corresponding to each key point label is determined, and from the node labels the action frame node corresponding to each action key point can be determined: action key point 0 corresponds to action frame node a, action key point 1 corresponds to action frame node b, ..., and action key point 20 corresponds to action frame node u;

Based on this, the position information of each action key point in the intermediate image is also determined: the position information corresponding to action key point 0 is (x1, y1), the position information corresponding to action key point 1 is (x2, y2), ..., and the position information corresponding to action key point 20 is (x20, y20). Three-dimensional coordinate information is then mapped out for each action key point based on the position information; correspondingly, the three-dimensional coordinate information to which action key point 0 is mapped is (x1, y1, z1), the three-dimensional coordinate information to which action key point 1 is mapped is (x2, y2, z2), ..., and the three-dimensional coordinate information to which action key point 20 is mapped is (x20, y20, z20), to be used for the subsequent construction of the three-dimensional virtual object, that is, the three-dimensional hand model corresponding to the hand.

In practical applications, the process of mapping out the three-dimensional coordinate information corresponding to the action key points according to the position information may be as follows: first, a two-dimensional coordinate system is constructed based on the intermediate image, and the two-dimensional coordinate information of each action key point in this two-dimensional coordinate system is determined; then, the transformation relationship between the two-dimensional coordinate system and the three-dimensional coordinate system constructed based on the virtual action frame is determined; finally, based on this transformation relationship, the three-dimensional coordinate information to which the two-dimensional coordinate information is mapped in the three-dimensional coordinate system can be determined and used as the three-dimensional coordinate information to which each action key point is mapped.
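The 2D-to-3D mapping described above can be sketched as attaching a depth value to each 2D key point position. The depth input here stands in for whatever per-key-point depth the model predicts, and the simple scale transformation is an assumption for illustration, not the transformation prescribed by the embodiment:

```python
def map_to_3d(points_2d, depths, scale=1.0):
    """Map 2D key point positions in the intermediate image's
    coordinate system to 3D coordinates in the virtual action
    frame's coordinate system by attaching a depth value to each
    point.  `depths` stands in for the per-key-point depth the
    model is assumed to predict; `scale` stands in for the 2D-to-3D
    transformation relationship."""
    return [(x * scale, y * scale, z)
            for (x, y), z in zip(points_2d, depths)]

points_2d = [(10.0, 20.0), (15.0, 25.0)]
depths = [0.5, 0.7]
print(map_to_3d(points_2d, depths))
# [(10.0, 20.0, 0.5), (15.0, 25.0, 0.7)]
```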

In addition, the action key points of the target object can be set according to the actual application scenario. After the action key points are set, as long as each action key point is recognized by the action recognition model, the three-dimensional virtual object corresponding to the action of the target object can subsequently be constructed for action recognition.

In summary, by recognizing, through the action recognition model, the action frame node corresponding to each action key point as well as the mapped three-dimensional coordinate information, a virtual object can subsequently be constructed in three-dimensional space according to the three-dimensional coordinate information for action recognition. This avoids situations where viewing-angle problems affect recognition accuracy and can effectively improve the accuracy of action recognition.

Step 108: Generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong.

Specifically, on the basis of obtaining the action frame nodes corresponding to the action key points and the mapped three-dimensional coordinate information as described above, the three-dimensional virtual object is further generated according to the three-dimensional coordinate information, the action frame nodes, and the virtual action frame to which they belong. The three-dimensional virtual object refers to a three-dimensional "model" consistent with the action posed by the target object; generating the three-dimensional virtual object avoids situations where occlusion or viewing-angle problems in the action affect accuracy.

Further, in the process of generating the three-dimensional virtual object, the correspondence between the three-dimensional coordinate information and the action frame nodes needs to be determined so that the virtual frame nodes can be adjusted to generate the three-dimensional virtual object. In one or more implementations of this embodiment, the specific implementation is as follows:

Determining the correspondence between the action frame nodes and the three-dimensional coordinate information according to the action key points, and determining the connection relationships of the action frame nodes based on the virtual action frame;

Performing connection processing on the three-dimensional coordinate information according to the connection relationships to generate the three-dimensional virtual object.

In practical applications, after the action frame nodes corresponding to the action key points and the three-dimensional coordinate information to which the action key points are mapped are determined, the correspondence between the action frame nodes and the three-dimensional coordinate information can be determined. At the same time, the connection relationships of the action frame nodes can be determined according to the virtual action frame, that is, the action frame nodes form the virtual action frame only when connected according to the connection relationships;

Based on this, by connecting the three-dimensional coordinate information according to the connection relationships, a three-dimensional virtual object matching the action posed by the target object can be generated.

Following the above example, on the basis of determining that action key point 0 corresponds to action frame node a, action key point 1 corresponds to action frame node b, ..., and action key point 20 corresponds to action frame node u, and that the three-dimensional coordinate information to which action key point 0 is mapped is (x1, y1, z1), the three-dimensional coordinate information to which action key point 1 is mapped is (x2, y2, z2), ..., and the three-dimensional coordinate information to which action key point 20 is mapped is (x20, y20, z20), it can be determined that action frame node a corresponds to the three-dimensional coordinate information (x1, y1, z1), action frame node b corresponds to the three-dimensional coordinate information (x2, y2, z2), ..., and action frame node u corresponds to the three-dimensional coordinate information (x20, y20, z20);

Further, while the correspondence between each action frame node and the three-dimensional coordinate information is determined, the connection relationships of the action frame nodes can be determined according to the virtual action frame as (a, b, c, d, e), (a, f, g, h, i), (a, j, k, l, m), (a, n, o, p, q), and (a, r, s, t, u). By performing connection processing on the three-dimensional coordinate information mapped from each action key point according to these connection relationships, the skeleton corresponding to the hand shown in FIG. 4(a) can be constructed, and the region corresponding to the palm of the hand is then constructed based on the connection relationships. As shown in FIG. 4(b), the palm region is constructed from a, b, c, d, e, and f. At this point, it can be determined that the skeleton corresponding to the gesture posed by the user's hand is preliminarily complete; after rendering and generation processing, the three-dimensional virtual object, that is, the three-dimensional model corresponding to the gesture posed by the user shown in FIG. 4(c), is obtained. Afterwards, the action of the three-dimensional virtual object is recognized to determine the action type of the gesture posed by the user.
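The connection processing above can be sketched by expanding the five connection chains into skeleton edges. The chain list follows the example in this embodiment (nodes a–u, each chain starting at wrist node a); the function name is illustrative:

```python
# Five finger chains of the virtual action frame, as in the example;
# each chain starts at the wrist node 'a'.
CHAINS = [
    ("a", "b", "c", "d", "e"),
    ("a", "f", "g", "h", "i"),
    ("a", "j", "k", "l", "m"),
    ("a", "n", "o", "p", "q"),
    ("a", "r", "s", "t", "u"),
]

def skeleton_edges(chains):
    """Expand each connection chain into (node, node) edges, i.e. the
    segments to connect when assembling the hand skeleton."""
    edges = []
    for chain in chains:
        edges.extend(zip(chain, chain[1:]))
    return edges

edges = skeleton_edges(CHAINS)
print(len(edges))  # 20 edges for the 21-node hand skeleton
print(edges[0])    # ('a', 'b')
```

With the node-to-coordinate correspondence from the previous step, each edge can then be drawn between the two mapped 3D coordinates to assemble the skeleton of FIG. 4(a).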

In summary, while the correspondence between the action frame nodes and the three-dimensional coordinate information is determined, the connection relationships of the action frame nodes in the virtual action frame are determined, and the three-dimensional virtual object can be generated by performing connection processing on the three-dimensional coordinate information according to the connection relationships. This makes it possible to generate, in three-dimensional space, a three-dimensional "model" consistent with the action of the target object; performing action recognition on the three-dimensional virtual object can then accurately identify the action type of the target object, effectively improving recognition accuracy.

In addition, when the three-dimensional virtual object is generated, since the three-dimensional coordinate information is obtained based on the intermediate image, there may be cases where one action key point corresponds to two or more pieces of three-dimensional coordinate information. If a three-dimensional virtual object is generated according to such three-dimensional coordinate information, two or more possibilities exist. For example, if the three-dimensional coordinate information to which action key point 20 of the hand is mapped includes both (x20, y20, z20) and (x21, y21, z21), two three-dimensional virtual objects are generated, one with the little finger curled into the palm and one with the little finger straightened. If gesture recognition continued, the action types of two different gestures would be recognized. To avoid this, normalization processing can be performed on the constructed three-dimensional virtual objects. In one or more implementations of this embodiment, the specific implementation is as follows:

Detecting the number of generated three-dimensional virtual objects;

When the number of objects is greater than a preset number threshold, performing standardization processing on the multiple three-dimensional virtual objects corresponding to the number of objects to obtain candidate virtual objects;

Selecting a target virtual object from the candidate virtual objects as the three-dimensional virtual object.

In practical applications, on the basis of generating the three-dimensional virtual object, the number of three-dimensional virtual objects is further detected. When the number of objects is greater than the preset number threshold, multiple three-dimensional virtual objects have been generated; standardization processing is first performed on the multiple three-dimensional virtual objects to obtain candidate virtual objects corresponding to the number of objects, and then, among the multiple candidate virtual objects, the candidate virtual object that best matches the action of the target object is determined as the target virtual object, which is used as the three-dimensional virtual object for subsequent action recognition. When the number of objects is less than or equal to the number threshold, a single three-dimensional virtual object has been generated, and the following step 110 is executed. In a specific implementation, the preset number threshold is 1.

For example, suppose three three-dimensional virtual objects matching the user's hand action are generated, namely three-dimensional virtual object 1, three-dimensional virtual object 2, and three-dimensional virtual object 3, where the action of three-dimensional virtual object 1 is the thumb and index finger straightened with the other three fingers curled into the palm, the action of three-dimensional virtual object 2 is the thumb, index finger, and middle finger straightened with the other two fingers curled into the palm, and the action of three-dimensional virtual object 3 is the thumb, index finger, and little finger straightened with the other two fingers curled into the palm. At this point, it can be determined that three gestures exist;

Further, by calculating the similarity between each of the three gestures and the gesture in the intermediate image corresponding to the hand, the three-dimensional virtual object corresponding to the gesture with the highest similarity can be selected as the three-dimensional virtual object for subsequent recognition. If the calculation determines that three-dimensional virtual object 1 has the highest similarity to the gesture in the intermediate image, three-dimensional virtual object 1 is used as the target three-dimensional virtual object for subsequent gesture recognition.

In practical applications, performing standardization processing on the three-dimensional virtual objects specifically means making the multiple three-dimensional virtual objects face the same direction by means of rotation, scaling, translation, and the like; after standardization, it is more convenient to select the target virtual object from the candidates as the three-dimensional virtual object.
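One way to sketch this standardize-then-select step, assuming each candidate is represented by an array of its key point coordinates; the similarity measure used here (Euclidean distance after centering and scale normalization) is an assumption for illustration, not the measure prescribed by the embodiment:

```python
import numpy as np

def standardize(points: np.ndarray) -> np.ndarray:
    """Translate key points to their centroid and normalize scale so
    that all candidate objects are comparable in a common pose."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered)
    return centered / scale if scale > 0 else centered

def select_target(candidates, reference):
    """Pick the index of the candidate whose standardized key points
    are closest to the (standardized) reference pose extracted from
    the intermediate image."""
    ref = standardize(reference)
    dists = [np.linalg.norm(standardize(c) - ref) for c in candidates]
    return int(np.argmin(dists))

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
cand_a = ref * 2.0  # same pose at a different scale
cand_b = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
print(select_target([cand_b, cand_a], ref))  # 1
```

Note that `cand_a` is selected despite its different scale, because standardization removes scale and translation before the comparison.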

In summary, after the three-dimensional virtual object is generated, if multiple three-dimensional virtual objects exist, candidate virtual objects can be obtained through standardization processing, and the target virtual object is then selected from the candidate virtual objects as the three-dimensional virtual object for subsequent recognition, which can further improve the accuracy of action recognition.

Step 110: Perform action recognition on the three-dimensional virtual object based on an action recognition data set, and determine the action type of the target object.

Specifically, on the basis of generating the three-dimensional virtual object as described above, action recognition is performed on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object. The action recognition data set is composed of action recognition packages delivered by a server, and an action recognition package contains recognition rules for recognizing the three-dimensional virtual object; the server refers to the server of the platform related to action recognition.

In practical applications, the action recognition data set is composed of multiple action recognition packages, each of which contains at least one action recognition rule, and each action recognition rule can recognize one action type. For example, an action recognition rule may be a rule for judging whether each fingertip is in the palm region and whether each finger is straightened; performing action recognition on the three-dimensional virtual object based on this rule determines that the thumb and index finger of the three-dimensional virtual object are straightened while the fingertips of the other three fingers are in the palm, so the action type of the three-dimensional virtual object is the number type "8", and it can be determined that the user's gesture is "8".
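A minimal sketch of such a recognition rule, assuming each finger of the three-dimensional virtual object has already been summarized as a straight/curled flag (how the flags are derived from the skeleton is outside this sketch, and the rule table below is illustrative):

```python
# Hypothetical rule table: gesture label -> which fingers are straight
# (thumb, index, middle, ring, little); the rest are curled into the palm.
RULES = {
    "8": (True, True, False, False, False),   # thumb + index straight
    "1": (False, True, False, False, False),  # index straight only
    "2": (False, True, True, False, False),   # index + middle straight
}

def recognize(straight_flags):
    """Match the per-finger straight/curled pattern of the 3D virtual
    object against the rule table; return None if no rule matches."""
    for label, pattern in RULES.items():
        if tuple(straight_flags) == pattern:
            return label
    return None

print(recognize((True, True, False, False, False)))  # 8
print(recognize((False, True, True, False, False)))  # 2
```

Each action recognition package delivered by the server would contribute entries of this kind, so new action types can be recognized without changing the matching logic.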

Further, the action type is recognized according to the action recognition data set. In one or more implementations of this embodiment, the specific implementation is as follows:

Parse the action recognition data set to obtain action elements;

Determine an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements;

Determine the intermediate action type as the action type of the target object.

In practical applications, the action recognition data set is first parsed to obtain action elements. An action element specifically refers to an element used for performing action recognition on the three-dimensional virtual object, that is, a condition for judging the action type of the three-dimensional virtual object. Action recognition is performed on the three-dimensional virtual object according to the action elements; the intermediate action type of the three-dimensional virtual object can then be determined from the recognition result, and the intermediate action type is determined as the action type of the target object.

Furthermore, in the process of performing action recognition on the three-dimensional virtual object based on the action elements to determine its intermediate action type, a first approach is to compute the action angles between the virtual sub-objects of the three-dimensional virtual object, and a second approach is to compute the positions of the virtual sub-objects in the three-dimensional virtual object. In one or more implementations of this embodiment, the process of determining the intermediate action type in the first approach includes:

Compute the action angles between the virtual sub-objects in the three-dimensional virtual object;

Detect the action angles based on the action elements, and determine the intermediate action type of the three-dimensional virtual object according to the detection result.

In practical applications, a virtual sub-object specifically refers to a sub-unit composing the three-dimensional virtual object. For example, when the target object is a hand, the generated three-dimensional virtual object is a three-dimensional hand object, and the virtual sub-objects are the fingers and the palm of the three-dimensional hand object; when the target object is a body, the generated three-dimensional virtual object is a three-dimensional body object, and the virtual sub-objects are the limbs and the torso of the three-dimensional body object.

On this basis, the posture of each virtual sub-object is determined by computing the action angles between the virtual sub-objects in the three-dimensional virtual object; the action angles are then detected based on the action elements, and the intermediate action type of the three-dimensional virtual object can be determined according to the detection result.

For example, the three-dimensional virtual object is a stereoscopic model constructed from a hand, in which case the virtual sub-objects are the fingers and the palm. By computing the first action angles between each finger and the palm and the second action angles between fingers, it is determined that the angle between each of the pinky, ring and middle fingers and the palm is 0, and the angle between the thumb and the index finger is 90 degrees. The action angles are then detected against the action elements: when the pinky, ring and middle fingers each form an angle of 0 with the palm and the thumb and index finger form an angle of 90 degrees, the gesture has the same action elements as gesture type "8", so the intermediate action type of the three-dimensional virtual object is determined to be gesture type "8" according to the detection result.
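The action angle between two virtual sub-objects can be computed from their 3D direction vectors. The sketch below assumes each sub-object is reduced to a base-to-tip direction vector (the specific joint coordinates are illustrative values, not from this specification):

```python
import math

def angle_deg(v1, v2):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

def direction(base, tip):
    """Direction vector of a virtual sub-object, base joint -> tip joint."""
    return tuple(t - b for t, b in zip(tip, base))

# Assumed 3D coordinates for two sub-objects (thumb and index finger).
thumb = direction(base=(0.0, 0.0, 0.0), tip=(1.0, 0.0, 0.0))
index = direction(base=(0.0, 0.0, 0.0), tip=(0.0, 1.0, 0.0))

print(round(angle_deg(thumb, index)))  # 90
```

A rule for gesture type "8" would then check that this thumb-index angle is near 90 degrees while the finger-to-palm angles of the other three fingers are near 0, typically with a tolerance band rather than exact equality.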

In one or more implementations of this embodiment, the process of determining the intermediate action type in the second approach includes:

Detect the action positions of the virtual sub-objects in the three-dimensional virtual object according to the action elements;

Determine the intermediate action type of the three-dimensional virtual object based on the detection result.

In practical applications, an action position specifically refers to the position of a virtual sub-object. For example, when the three-dimensional virtual object is generated from a hand, the virtual sub-objects are the fingers and the palm. It is determined that the action positions of the pinky, ring and middle fingers are in the palm, while the action positions of the thumb and index finger are outside the palm. The action positions are then detected against the action elements: when the pinky, ring and middle fingers are positioned in the palm and the thumb and index finger are positioned outside the palm, the gesture has the same action elements as gesture type "8", so the intermediate action type of the three-dimensional virtual object is determined to be gesture type "8" according to the detection result.
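One simple way to test "fingertip in the palm" is to project the fingertip onto the palm plane and run a standard point-in-polygon check against the palm outline. This is a sketch under that assumption; the palm outline coordinates are illustrative:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is 2D point pt inside polygon poly?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count crossings of a horizontal ray to the right of pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Assumed palm outline (projected onto the palm plane) and two fingertips.
palm = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), palm))  # True  -> fingertip in the palm
print(point_in_polygon((6, 2), palm))  # False -> fingertip outside the palm
```

Running this test for each fingertip yields exactly the in-palm/outside-palm pattern the rule above compares against the action elements of gesture type "8".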

In summary, in the process of determining the intermediate action type of the three-dimensional virtual object, the intermediate action type can be detected by computing either the action angles or the action positions of the virtual sub-objects in the three-dimensional virtual object, which effectively improves the accuracy of action type recognition.

In addition, on the basis of displaying an action image frame to the user through the action interaction page, action recognition at this point requires recognizing the action of the target object according to the recognition rules associated with the displayed action in the action image frame, so as to judge whether the action posed by the user is correct. If it is correct, the action type is the same as the action type of the displayed action, and further processing can proceed. In one or more implementations of this embodiment, the specific implementation is as follows:

Receive the action recognition data set delivered by the server for the action image frame, where the action recognition data set carries action recognition rules matching the displayed action;

Judge, according to the action recognition rules, whether the action of the three-dimensional virtual object matches the displayed action;

If so, determine the display action type of the displayed action according to the action recognition data set, and use the display action type as the action type;

If not, perform no processing.

Further, after the display action type is used as the action type, recommendation information matching the action type may be displayed to the user through the action interaction page.

In practical applications, upon receiving the action recognition data set delivered by the server for the action image frame, whether the action of the three-dimensional virtual object matches the displayed action is judged according to the action recognition rules carried in the action recognition data set. If they match, the action posed by the user through the target object is the same as the displayed action, and the display action type is used as the action type. If they do not match, the action posed by the user differs from the displayed action, and the user needs to pose the action again until it is correct, after which the display action type is used as the action type.

On this basis, it can now be determined that the action type posed by the user is correct, and recommendation information matching the action type is displayed to the user through the action interaction page.

For example, in a payment scenario, posing the gesture "1" indicates proceeding with payment, while posing the gesture "2" indicates abandoning payment. The two gestures and their meanings are displayed to the user on the mobile phone; the gesture posed by the user is captured by the phone's camera, and the corresponding three-dimensional virtual object is generated. Then, according to the action recognition rules in the recognition data set delivered by the server (rules corresponding to gesture "1" and gesture "2"), whether the action of the three-dimensional virtual object matches gesture "1" or gesture "2" is judged.

When it matches gesture "1", the user's gesture type is determined to be "1", indicating payment; gesture type "1" is determined as the user's gesture type, and a payment-success reminder is displayed to the user through the payment page. When it matches gesture "2", the user's gesture type is determined to be "2", indicating abandonment of payment; gesture type "2" is determined as the user's gesture type, and a payment-abandoned reminder is displayed to the user through the payment page.

When it matches neither gesture "1" nor gesture "2", it is determined that the gesture posed by the user's hand is erroneous and the gesture type cannot be correctly recognized, so a reminder to repeat gesture recognition is displayed to the user through the payment page.
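The dispatch logic of this payment example can be sketched as follows (the function name and reminder strings are illustrative assumptions, not part of this specification):

```python
def handle_payment_gesture(gesture_type):
    """Map a recognized gesture type to the reminder shown on the payment page."""
    if gesture_type == "1":
        return "Payment successful"      # gesture "1" -> proceed with payment
    if gesture_type == "2":
        return "Payment abandoned"       # gesture "2" -> give up payment
    return "Please repeat the gesture"   # unrecognized -> retry reminder

print(handle_payment_gesture("1"))  # Payment successful
print(handle_payment_gesture("3"))  # Please repeat the gesture
```

The fall-through branch corresponds to the case where the pose matches neither delivered rule, so the page prompts the user to repeat gesture recognition.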

In summary, after the recognition of the target object's action is completed, corresponding recommendation information is displayed to the user according to the action type, and when the action cannot be recognized, reminder information is displayed to the user through the action interaction page, which effectively improves the user experience and makes it convenient for the user to obtain relevant recommendation information.

In addition, on the basis of displaying an action image frame to the user through the action interaction page, when the action type is recognized, the action type is matched against the action type of the displayed action, and corresponding reminder information is displayed according to the matching result. In one or more implementations of this embodiment, the specific implementation is as follows:

Match the action type against the display action type of the displayed action;

If the matching succeeds, display successful-matching information to the user through the action interaction page;

If the matching fails, display reminder information to the user through the action interaction page, where the reminder information carries an action strategy.

In practical applications, the action strategy specifically refers to information reminding the user how to pose the action correctly so that matching can succeed.

For example, in an applet-selection scenario, the gesture posed by the user is captured by the phone's camera, and the corresponding three-dimensional virtual object is generated. According to the action recognition data set, the action type of the three-dimensional virtual object is recognized as gesture "2", so the gesture posed by the user is determined to be "2". When the applet-selection scenario contains a displayed action corresponding to gesture "2", a successful-matching reminder is displayed to the user through the action interaction page on the phone, and the page jumps to the applet corresponding to gesture "2". When the applet-selection scenario contains no displayed action corresponding to gesture "2", a matching-failure reminder is displayed to the user through the action interaction page on the phone, and the gesture types available in the applet are shown to the user to help the user pose an action.

In the action recognition method provided in this embodiment, in the process of recognizing the action posed by the user, image segmentation is performed on the acquired image frame to obtain an intermediate image containing the target object; the intermediate image is then input into the action recognition model to obtain action frame nodes and three-dimensional coordinate information, and the three-dimensional virtual object is generated according to the three-dimensional coordinate information, the action frame nodes and the virtual action frame to which they belong. Action type recognition can thus be performed on the generated three-dimensional virtual object, which effectively solves the problem that different image acquisition angles make image frame acquisition insufficiently standard and degrade the accuracy of action type recognition. Meanwhile, action recognition is performed on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object, which makes the method more general, flexible and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new action type can be recognized simply by extending the action recognition data set, making the applicable scenarios much broader.

The action recognition method is further described below with reference to FIG. 5, taking the application of the action recognition method provided in this specification in a gesture recognition scenario as an example. FIG. 5 shows a processing flowchart of an action recognition method applied in a gesture recognition scenario according to an embodiment of this specification, which specifically includes the following steps:

Step 502: Receive a click instruction submitted by the user through the action interaction page.

Specifically, to make it convenient for users to use the corresponding payment software, the payment platform provides a gesture recognition service. The user can perform the corresponding payment operation by posing gestures on the action interaction page: when the user poses "1", the payment account balance can be displayed; when the user poses "2", the payment details can be displayed; and so on, with different gestures corresponding to different payment information.

On this basis, when the user submits a click instruction, the user's gesture needs to be recognized so that the corresponding payment information can be displayed to the user.

Step 504: Display an action image frame through the action interaction page according to the click instruction, where the action image frame contains a display area corresponding to the displayed action.

Specifically, the action image frame contains display areas corresponding to gesture "1" and gesture "2"; the action interaction page informs the user of the gestures that can be posed, which helps the user pose gestures correctly and facilitates subsequent gesture recognition.

Step 506: Acquire an image frame captured by the mobile phone.

Step 508: Perform segmentation processing on the feature area of the gesture in the image frame to obtain an intermediate image containing the gesture.

Step 510: Input the intermediate image into the action recognition model for key point recognition and coordinate mapping, and obtain the gesture frame nodes corresponding to the recognized gesture key points as well as the three-dimensional coordinate information mapped from the gesture key points.

Specifically, for key point recognition and coordinate mapping, reference may be made to the relevant descriptions in the foregoing embodiments, which are not repeated here.

Step 512: Determine the correspondence between the gesture frame nodes and the three-dimensional coordinate information according to the gesture key points, and determine the connection relationship of the gesture frame nodes based on the virtual gesture frame.

Step 514: Connect the three-dimensional coordinate information according to the connection relationship to obtain multiple three-dimensional virtual gestures.
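Steps 512 and 514 can be sketched as building skeleton segments from the mapped 3D coordinates according to a connection relationship. The joint names, coordinates and the single-finger topology below are simplifying assumptions; a full virtual gesture frame would enumerate every joint pair of the hand:

```python
# Simplified connection relationship for one finger chain (an assumption;
# the actual virtual gesture frame covers the whole hand).
CONNECTIONS = [("wrist", "index_base"), ("index_base", "index_mid"),
               ("index_mid", "index_tip")]

# Assumed 3D coordinates mapped from the gesture key points (step 512
# establishes this node -> coordinate correspondence).
coords = {
    "wrist":      (0.0, 0.0, 0.0),
    "index_base": (0.0, 1.0, 0.0),
    "index_mid":  (0.0, 2.0, 0.2),
    "index_tip":  (0.0, 2.8, 0.5),
}

def connect(coords, connections):
    """Step 514: produce the 3D segments that make up the virtual skeleton."""
    return [(coords[a], coords[b]) for a, b in connections]

skeleton = connect(coords, CONNECTIONS)
print(len(skeleton))  # 3 segments for this simplified chain
```

Repeating this per detected hand yields the multiple three-dimensional virtual gestures that step 516 then standardizes.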

Step 516: Standardize the multiple three-dimensional virtual gestures, and determine the target three-dimensional virtual gesture according to the standardization result.
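The specification does not fix the standardization or selection criterion, so the sketch below is one plausible reading: translate each gesture to a common origin and scale it to unit span, and pick as the target the gesture with the largest original span (e.g. the hand closest to the camera). Both choices are assumptions:

```python
def normalize(points):
    """Shift a gesture so its first point is the origin and scale it to
    unit span (one simple standardization; the criterion used in this
    embodiment is not specified)."""
    ox, oy, oz = points[0]
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in points]
    span = max(max(abs(c) for c in p) for p in shifted) or 1.0
    return [(x / span, y / span, z / span) for x, y, z in shifted]

def pick_target(gestures):
    """Pick the gesture with the largest original span as the target
    (an assumed selection rule)."""
    def span(points):
        ox, oy, oz = points[0]
        return max(max(abs(x - ox), abs(y - oy), abs(z - oz))
                   for x, y, z in points)
    return max(gestures, key=span)

g_small = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
g_large = [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
target = normalize(pick_target([g_small, g_large]))
print(target)  # the larger gesture, rescaled to unit span
```

After standardization, all candidate gestures live in a comparable coordinate frame, which is what allows the angle and position rules of the recognition step to be applied uniformly.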

Step 518: Receive the gesture recognition rules delivered by the server for the action image frame.

Step 520: Perform gesture recognition on the target three-dimensional virtual gesture according to the gesture recognition rules, and determine the user's gesture type.

Step 522: Judge whether the gesture type matches the gesture type of the displayed action; if not, perform step 524; if so, perform step 526.

Step 524: Display reminder information to the user.

Step 526: Display recommendation information matching the gesture type to the user.

Specifically, gesture recognition is performed on the three-dimensional virtual gesture according to the gesture recognition rules. When the user's gesture type is determined to be "1", the payment account balance is displayed to the user; when the user's gesture is determined not to match the gesture type of the displayed action, the gesture posed by the user is problematic, so reminder information is displayed to remind the user to pose the gesture again.

The action recognition method provided in this embodiment enables action type recognition through the generated three-dimensional virtual object, which effectively solves the problem that different image acquisition angles make image frame acquisition insufficiently standard and degrade the accuracy of action type recognition. Meanwhile, action recognition is performed on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object, which makes the method more general, flexible and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new action type can be recognized simply by extending the action recognition data set, making the applicable scenarios much broader.

Corresponding to the above method embodiments, this specification further provides embodiments of an action recognition apparatus. FIG. 6 shows a schematic structural diagram of an action recognition apparatus according to an embodiment of this specification. As shown in FIG. 6, the apparatus includes:

an acquisition module 602, configured to acquire an image frame captured by an image acquisition device;

a processing module 604, configured to perform segmentation processing on the action area of the target object in the image frame to obtain an intermediate image;

a recognition module 606, configured to input the intermediate image into an action recognition model for key point recognition and coordinate mapping, and obtain the action frame nodes corresponding to the recognized action key points as well as the three-dimensional coordinate information mapped from the action key points;

a generation module 608, configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes and the virtual action frame to which they belong;

a determination module 610, configured to perform action recognition on the three-dimensional virtual object based on an action recognition data set, and determine the action type of the target object.

In an optional embodiment, the recognition module 606 includes:

a key point label determination unit, configured to recognize, in the intermediate image, the action key points corresponding to the target object, and determine the key point labels of the action key points;

a node label determination unit, configured to determine, based on the correspondence between key point labels and the node labels of action frame nodes, the target node label corresponding to each key point label;

an action frame node determination unit, configured to determine, according to the target node label, the action frame node corresponding to the action key point to which the key point label belongs.

In an optional embodiment, the recognition module 606 includes:

a position information determination unit, configured to determine the position information of the action key points in the intermediate image;

a coordinate information mapping unit, configured to map, based on the position information, the three-dimensional coordinate information corresponding to the action key points.

In an optional embodiment, the generation module 608 includes:

a connection relationship determination unit, configured to determine the correspondence between the action frame nodes and the three-dimensional coordinate information according to the action key points, and determine the connection relationship of the action frame nodes based on the virtual action frame;

a three-dimensional virtual object generation unit, configured to connect the three-dimensional coordinate information according to the connection relationship to generate the three-dimensional virtual object.

In an optional embodiment, the action recognition apparatus further includes:

a detection module, configured to detect the number of generated three-dimensional virtual objects;

a standardization processing module, configured to, when the number of objects is greater than a preset number threshold, standardize the multiple three-dimensional virtual objects corresponding to the number of objects to obtain candidate virtual objects;

a selection module, configured to select a target virtual object from the candidate virtual objects as the three-dimensional virtual object.

In an optional embodiment, the action recognition apparatus further includes:

an instruction receiving module, configured to receive a click instruction submitted by the user through the action interaction page;

a display module, configured to display at least one action image frame to the user according to the click instruction, where the action image frame contains a display area corresponding to the displayed action.

In an optional embodiment, the determination module 610 includes:

a receiving unit, configured to receive the action recognition data set delivered by the server for the action image frame, where the action recognition data set carries action recognition rules matching the displayed action;

a judgment unit, configured to judge, according to the action recognition rules, whether the action of the three-dimensional virtual object matches the displayed action;

if so, an action type determination unit is run;

the action type determination unit being configured to determine the display action type of the displayed action according to the action recognition data set, and use the display action type as the action type.

In an optional embodiment, the action recognition apparatus further includes:

a recommendation information display module, configured to display, through the action interaction page, recommendation information matching the action type to the user.

In an optional embodiment, the action recognition apparatus further includes:

a matching module, configured to match the action type against the display action type of the displayed action;

if the matching succeeds, a first information display module is run, the first information display module being configured to display successful-matching information to the user through the action interaction page;

if the matching fails, a second information display module is run, the second information display module being configured to display reminder information to the user through the action interaction page, where the reminder information carries an action strategy.

In an optional embodiment, when the target object is a hand, the processing module 604 includes:

a feature area detection unit, configured to detect the feature area corresponding to the hand in the image frame;

an image cropping unit, configured to crop the image frame according to the feature area to obtain the intermediate image containing hand features;

correspondingly, the action type is a gesture action type.

In an optional embodiment, the determination module 610 includes:

a parsing unit, configured to parse the action recognition data set to obtain action elements;

a recognition unit, configured to determine an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object with the action elements;

a determination unit, configured to determine the intermediate action type as the action type of the target object.

In an optional embodiment, the recognition unit includes:

a computation sub-module, configured to compute the action angles between the virtual sub-objects in the three-dimensional virtual object;

a first determination sub-module, configured to detect the action angles based on the action elements, and determine the intermediate action type of the three-dimensional virtual object according to the detection result.

一个可选的实施例中,所述识别单元,包括:In an optional embodiment, the identifying unit includes:

检测子模块,被配置为根据所述动作要素对所述三维虚拟对象中各个虚拟子对象的动作位置进行检测;a detection sub-module, configured to detect the action position of each virtual sub-object in the three-dimensional virtual object according to the action element;

第二确定子模块,被配置为基于检测结果确定所述三维虚拟对象的所述中间动作类型。The second determination submodule is configured to determine the intermediate action type of the three-dimensional virtual object based on the detection result.
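The position-based variant can be sketched in the same spirit; representing an action element as an axis-aligned 3-D box for each virtual sub-object is an assumption made for illustration.

```python
def within_position_rule(position, rule):
    """position: (x, y, z) of a virtual sub-object.
    rule: [(lo, hi), (lo, hi), (lo, hi)] -- one per axis (assumed element shape).
    Returns True when the sub-object lies inside the box on every axis."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(position, rule))
```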

一个可选的实施例中，所述动作识别数据集是由服务端下发的动作识别包组成，其中，所述动作识别包中包含对所述三维虚拟对象进行识别的识别规则。In an optional embodiment, the action recognition data set is composed of an action recognition package delivered by a server, wherein the action recognition package contains recognition rules for recognizing the three-dimensional virtual object.
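One plausible shape for such a server-delivered recognition package is a declarative rule list. The JSON layout, field names and action types below are assumptions chosen to illustrate why new action types can be recognized without retraining the model: classification is driven by data, not by model weights.

```python
import json

# Hypothetical recognition package as the server might deliver it.
RECOGNITION_PACKAGE = json.loads("""
{
  "rules": [
    {"action_type": "thumbs_up", "joint": "thumb", "min_angle": 150, "max_angle": 180},
    {"action_type": "fist",      "joint": "thumb", "min_angle": 0,   "max_angle": 60}
  ]
}
""")

def classify(joint_angles, package):
    """Return the first action type whose rule the measured angles satisfy, else None.

    joint_angles: {joint_name: angle_in_degrees} measured on the 3-D virtual object.
    """
    for rule in package["rules"]:
        angle = joint_angles.get(rule["joint"])
        if angle is not None and rule["min_angle"] <= angle <= rule["max_angle"]:
            return rule["action_type"]
    return None
```

Adding a new action type then amounts to appending one more rule object to the delivered package, which matches the extensibility argument made in the summary below.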

本说明书提供的动作识别装置，通过对用户摆放的动作进行识别的过程中，对获取到的图像帧进行图像分割处理，获得包含目标对象的中间图像，再将所述中间图像输入到动作识别模型，获得动作组架节点以及三维坐标信息，根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成所述三维虚拟对象，实现了可以通过生成的三维虚拟对象进行动作类型的识别，有效的解决不同图像采集角度导致图像帧采集不够标准而影响动作类型识别准确率的问题，同时根据所述动作识别数据集对所述三维虚拟对象进行动作识别，确定所述目标对象的动作类型，使得在动作识别的场景中具有更加通用、灵活以及更好的拓展性，能够实现在添加新动作类型的情况下，无需对模型进行重新训练，通过增加动作识别数据集的方式即可实现对新动作类型的识别，使得应用场景变的更加广泛。In the process of recognizing the action posed by the user, the action recognition device provided in this specification performs image segmentation on the acquired image frame to obtain an intermediate image containing the target object, inputs the intermediate image into the action recognition model to obtain the action frame nodes and the three-dimensional coordinate information, and generates the three-dimensional virtual object according to the three-dimensional coordinate information, the action frame nodes and the virtual action frame to which they belong. Recognizing the action type from the generated three-dimensional virtual object effectively solves the problem that different image acquisition angles produce non-standard image frames and thus reduce the accuracy of action type recognition. At the same time, performing action recognition on the three-dimensional virtual object according to the action recognition data set to determine the action type of the target object makes the scheme more general, flexible and extensible in action recognition scenarios: when a new action type is added, the model does not need to be retrained, and the new type can be recognized simply by extending the action recognition data set, which broadens the range of applicable scenarios.

上述为本实施例的一种动作识别装置的示意性方案。需要说明的是，该动作识别装置的技术方案与上述的动作识别方法的技术方案属于同一构思，动作识别装置的技术方案未详细描述的细节内容，均可以参见上述动作识别方法的技术方案的描述。The above is a schematic solution of a motion recognition apparatus according to this embodiment. It should be noted that the technical solution of the motion recognition device and the technical solution of the above-mentioned motion recognition method belong to the same concept, and details not described in the technical solution of the motion recognition device can be found in the description of the technical solution of the above-mentioned motion recognition method.

图7示出了根据本说明书一实施例提供的一种计算设备700的结构框图。该计算设备700的部件包括但不限于存储器710和处理器720。处理器720与存储器710通过总线730相连接，数据库750用于保存数据。FIG. 7 shows a structural block diagram of a computing device 700 according to an embodiment of the present specification. Components of the computing device 700 include, but are not limited to, the memory 710 and the processor 720. The processor 720 is connected to the memory 710 through the bus 730, and the database 750 is used for storing data.

计算设备700还包括接入设备740，接入设备740使得计算设备700能够经由一个或多个网络760通信。这些网络的示例包括公用交换电话网（PSTN）、局域网（LAN）、广域网（WAN）、个域网（PAN）或诸如因特网的通信网络的组合。接入设备740可以包括有线或无线的任何类型的网络接口（例如，网络接口卡（NIC））中的一个或多个，诸如IEEE 802.11无线局域网（WLAN）无线接口、全球微波互联接入（Wi-MAX）接口、以太网接口、通用串行总线（USB）接口、蜂窝网络接口、蓝牙接口、近场通信（NFC）接口，等等。Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. Access device 740 may include one or more of any type of network interface (e.g., a network interface card (NIC)), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and the like.

在本说明书的一个实施例中,计算设备700的上述部件以及图7中未示出的其他部件也可以彼此相连接,例如通过总线。应当理解,图7所示的计算设备结构框图仅仅是出于示例的目的,而不是对本说明书范围的限制。本领域技术人员可以根据需要,增添或替换其他部件。In one embodiment of the present specification, the above-described components of computing device 700 and other components not shown in FIG. 7 may also be connected to each other, such as through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 7 is only for the purpose of example, rather than limiting the scope of the present specification. Those skilled in the art can add or replace other components as required.

计算设备700可以是任何类型的静止或移动计算设备，包括移动计算机或移动计算设备（例如，平板计算机、个人数字助理、膝上型计算机、笔记本计算机、上网本等）、移动电话（例如，智能手机）、可佩戴的计算设备（例如，智能手表、智能眼镜等）或其他类型的移动设备，或者诸如台式计算机或PC的静止计算设备。计算设备700还可以是移动式或静止式的服务器。Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet computer, personal digital assistant, laptop computer, notebook computer, netbook, etc.), a mobile phone (e.g., a smartphone), a wearable computing device (e.g., a smart watch, smart glasses, etc.) or another type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.

其中,处理器720用于执行如下计算机可执行指令:The processor 720 is configured to execute the following computer-executable instructions:

获取图像采集设备采集的图像帧;Obtain the image frame collected by the image acquisition device;

对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像;Segmenting the action area of the target object in the image frame to obtain an intermediate image;

将所述中间图像输入动作识别模型进行关键点识别和坐标映射,获得识别出的动作关键点对应的动作组架节点,以及所述动作关键点映射的三维坐标信息;Inputting the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, to obtain the action frame node corresponding to the identified action key point, and the three-dimensional coordinate information of the action key point mapping;

根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象;Generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs;

基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型。Action recognition is performed on the three-dimensional virtual object based on the action recognition data set, and the action type of the target object is determined.
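To make the fourth of the steps above concrete, the sketch below generates the three-dimensional virtual object by connecting node coordinates according to a virtual action frame. The rig's edge list and node names are hypothetical stand-ins for whatever skeleton topology the model is trained with.

```python
# Hypothetical virtual action frame: which node connects to which (assumed rig).
SKELETON_EDGES = [("wrist", "thumb_base"), ("thumb_base", "thumb_tip")]

def build_virtual_object(keypoints_3d, edges=SKELETON_EDGES):
    """Connect the mapped 3-D coordinates of the action frame nodes
    according to the rig's edge list, yielding the 3-D virtual object
    as a list of line segments. Edges whose endpoints were not
    recognized in the image frame are skipped."""
    return [
        (keypoints_3d[a], keypoints_3d[b])
        for a, b in edges
        if a in keypoints_3d and b in keypoints_3d
    ]
```

The returned segment list is what the rule-based recognition step then measures (angles between segments, positions of sub-objects) to decide the action type.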

上述为本实施例的一种计算设备的示意性方案。需要说明的是,该计算设备的技术方案与上述的动作识别方法的技术方案属于同一构思,计算设备的技术方案未详细描述的细节内容,均可以参见上述动作识别方法的技术方案的描述。The above is a schematic solution of a computing device according to this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned motion recognition method belong to the same concept, and the details not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the above-mentioned motion recognition method.

本说明书一实施例还提供一种计算机可读存储介质,其存储有计算机指令,该指令被处理器执行时以用于:An embodiment of the present specification further provides a computer-readable storage medium, which stores computer instructions, which, when executed by a processor, are used for:

获取图像采集设备采集的图像帧;Obtain the image frame collected by the image acquisition device;

对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像;Segmenting the action area of the target object in the image frame to obtain an intermediate image;

将所述中间图像输入动作识别模型进行关键点识别和坐标映射,获得识别出的动作关键点对应的动作组架节点,以及所述动作关键点映射的三维坐标信息;Inputting the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, to obtain the action frame node corresponding to the identified action key point, and the three-dimensional coordinate information of the action key point mapping;

根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象;Generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs;

基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型。Action recognition is performed on the three-dimensional virtual object based on the action recognition data set, and the action type of the target object is determined.

上述为本实施例的一种计算机可读存储介质的示意性方案。需要说明的是,该存储介质的技术方案与上述的动作识别方法的技术方案属于同一构思,存储介质的技术方案未详细描述的细节内容,均可以参见上述动作识别方法的技术方案的描述。The above is a schematic solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above-mentioned motion recognition method belong to the same concept, and the details not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned motion recognition method.

上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of the present specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. Additionally, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

所述计算机指令包括计算机程序代码，所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括：能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，Random Access Memory）、电载波信号、电信信号以及软件分发介质等。需要说明的是，所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减，例如在某些司法管辖区，根据立法和专利实践，计算机可读介质不包括电载波信号和电信信号。The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.

需要说明的是，对于前述的各方法实施例，为了简便描述，故将其都表述为一系列的动作组合，但是本领域技术人员应该知悉，本说明书并不受所描述的动作顺序的限制，因为依据本说明书，某些步骤可以采用其它顺序或者同时进行。其次，本领域技术人员也应该知悉，说明书中所描述的实施例均属于优选实施例，所涉及的动作和模块并不一定都是本说明书所必须的。It should be noted that, for ease of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that this specification is not limited by the described sequence of actions, because according to this specification, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by this specification.

在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

以上公开的本说明书优选实施例只是用于帮助阐述本说明书。可选实施例并没有详尽叙述所有的细节,也不限制该发明仅为所述的具体实施方式。显然,根据本说明书的内容,可作很多的修改和变化。本说明书选取并具体描述这些实施例,是为了更好地解释本说明书的原理和实际应用,从而使所属技术领域技术人员能很好地理解和利用本说明书。本说明书仅受权利要求书及其全部范围和等效物的限制。The preferred embodiments of the present specification disclosed above are provided only to aid in the elaboration of the present specification. Alternative embodiments are not intended to exhaust all details, nor do they limit the invention to only the described embodiments. Obviously, many modifications and variations are possible in light of the content of this specification. These embodiments are selected and described in this specification to better explain the principles and practical applications of this specification, so that those skilled in the art can well understand and utilize this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims (17)

1.一种动作识别方法,包括:1. An action recognition method, comprising:
获取图像采集设备采集的图像帧;Obtaining the image frame collected by the image acquisition device;
对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像;Segmenting the action area of the target object in the image frame to obtain an intermediate image;
将所述中间图像输入动作识别模型进行关键点识别和坐标映射,获得识别出的动作关键点对应的动作组架节点,以及所述动作关键点映射的三维坐标信息;Inputting the intermediate image into the action recognition model to perform key point recognition and coordinate mapping, to obtain the action frame node corresponding to the identified action key point and the three-dimensional coordinate information to which the action key point is mapped;
根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象;Generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs;
基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型。Performing action recognition on the three-dimensional virtual object based on the action recognition data set, and determining the action type of the target object.

2.根据权利要求1所述的动作识别方法,所述动作识别模型进行关键点识别,包括:2. The action recognition method according to claim 1, wherein the key point recognition performed by the action recognition model comprises:
在所述中间图像中识别出所述目标对象对应的所述动作关键点,并确定所述动作关键点的关键点标签;Identifying the action key point corresponding to the target object in the intermediate image, and determining the key point label of the action key point;
基于关键点标签与动作组架节点的节点标签的对应关系,确定所述关键点标签对应的目标节点标签;Determining the target node label corresponding to the key point label based on the correspondence between key point labels and the node labels of action frame nodes;
根据所述目标节点标签确定所述关键点标签所属的动作关键点对应的所述动作组架节点。Determining, according to the target node label, the action frame node corresponding to the action key point to which the key point label belongs.

3.根据权利要求2所述的动作识别方法,所述动作识别模型进行坐标映射,包括:3. The action recognition method according to claim 2, wherein the coordinate mapping performed by the action recognition model comprises:
确定所述动作关键点在所述中间图像中的位置信息;Determining the position information of the action key point in the intermediate image;
基于所述位置信息映射出所述动作关键点对应的所述三维坐标信息。Mapping out, based on the position information, the three-dimensional coordinate information corresponding to the action key point.

4.根据权利要求1所述的动作识别方法,所述根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象,包括:4. The action recognition method according to claim 1, wherein generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs comprises:
根据所述动作关键点确定所述动作组架节点与所述三维坐标信息的对应关系,以及基于所述虚拟动作组架确定所述动作组架节点的连接关系;Determining the correspondence between the action frame nodes and the three-dimensional coordinate information according to the action key points, and determining the connection relationship of the action frame nodes based on the virtual action frame;
将所述三维坐标信息按照所述连接关系进行连接处理,生成所述三维虚拟对象。Connecting the three-dimensional coordinate information according to the connection relationship to generate the three-dimensional virtual object.

5.根据权利要求1所述的动作识别方法,所述根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象步骤执行之后,且所述基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型步骤执行之前,还包括:5. The action recognition method according to claim 1, further comprising, after the step of generating a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs, and before the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set to determine the action type of the target object:
检测生成的所述三维虚拟对象的对象数目;Detecting the number of generated three-dimensional virtual objects;
在所述对象数目大于预设数目阈值的情况下,对所述对象数目对应的多个三维虚拟对象进行标准化处理,获得待选择虚拟对象;When the number of objects is greater than a preset number threshold, performing standardization processing on the plurality of three-dimensional virtual objects corresponding to the number of objects to obtain virtual objects to be selected;
在所述待选择虚拟对象中选择目标虚拟对象作为所述三维虚拟对象。Selecting a target virtual object from the virtual objects to be selected as the three-dimensional virtual object.

6.根据权利要求1所述的动作识别方法,所述获取图像采集设备采集的图像帧步骤执行之前,还包括:6. The action recognition method according to claim 1, further comprising, before the step of acquiring the image frame collected by the image acquisition device:
接收用户通过动作交互页面提交的点击指令;Receiving a click instruction submitted by a user through an action interaction page;
根据所述点击指令向所述用户展示至少一个动作图像帧;所述动作图像帧中包含展示动作对应的展示区域。Displaying at least one action image frame to the user according to the click instruction, wherein the action image frame includes a display area corresponding to a display action.

7.根据权利要求6所述的动作识别方法,所述基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型,包括:7. The action recognition method according to claim 6, wherein performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object comprises:
接收服务端针对所述动作图像帧下发的所述动作识别数据集;所述动作识别数据中携带有与所述展示动作匹配的动作识别规则;Receiving the action recognition data set delivered by the server for the action image frame, wherein the action recognition data carries an action recognition rule matching the display action;
根据所述动作识别规则,判断所述三维虚拟对象的动作与所述展示动作是否匹配;Determining, according to the action recognition rule, whether the action of the three-dimensional virtual object matches the display action;
若是,根据所述动作识别数据集确定所述展示动作的展示动作类型,并将所述展示动作类型作为所述动作类型。If so, determining the display action type of the display action according to the action recognition data set, and taking the display action type as the action type.

8.根据权利要求7所述的动作识别方法,所述根据所述动作识别数据集确定所述展示动作的展示动作类型,并将所述展示动作类型作为所述动作类型子步骤执行之后,还包括:8. The action recognition method according to claim 7, further comprising, after the sub-step of determining the display action type of the display action according to the action recognition data set and taking the display action type as the action type:
通过所述动作交互页面向所述用户展示与所述动作类型匹配的推荐信息。Displaying, to the user through the action interaction page, recommendation information matching the action type.

9.根据权利要求6所述的动作识别方法,所述基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型步骤执行之后,还包括:9. The action recognition method according to claim 6, further comprising, after the step of performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object:
将所述动作类型与所述展示动作的展示动作类型进行匹配;Matching the action type with the display action type of the display action;
若匹配成功,则通过所述动作交互页面向所述用户展示成功匹配信息;If the matching succeeds, displaying successful-match information to the user through the action interaction page;
若匹配失败,则通过所述动作交互页面向所述用户展示提醒信息,所述提醒信息中携带有动作策略。If the matching fails, displaying reminder information to the user through the action interaction page, wherein the reminder information carries an action strategy.

10.根据权利要求1所述的动作识别方法,在所述目标对象为手部的情况下,所述对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像,包括:10. The action recognition method according to claim 1, wherein, when the target object is a hand, segmenting the action area of the target object in the image frame to obtain an intermediate image comprises:
检测所述图像帧中所述手部对应的特征区域;Detecting the feature area corresponding to the hand in the image frame;
按照所述特征区域对所述图像帧进行裁剪,获得包含手部特征的所述中间图像;Cropping the image frame according to the feature area to obtain the intermediate image containing hand features;
相应的,所述动作类型为手势动作类型。Correspondingly, the action type is a gesture action type.

11.根据权利要求1所述的动作识别方法,所述基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型,包括:11. The action recognition method according to claim 1, wherein performing action recognition on the three-dimensional virtual object based on the action recognition data set and determining the action type of the target object comprises:
对所述动作识别数据集进行解析,获得动作要素;Parsing the action recognition data set to obtain action elements;
基于所述动作要素对所述三维虚拟对象进行动作识别的识别结果,确定所述三维虚拟对象的中间动作类型;Determining an intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object using the action elements;
将所述中间动作类型确定为所述目标对象的所述动作类型。Determining the intermediate action type as the action type of the target object.

12.根据权利要求11所述的动作识别方法,所述基于所述动作要素对所述三维虚拟对象进行动作识别的识别结果,确定所述三维虚拟对象的中间动作类型,包括:12. The action recognition method according to claim 11, wherein determining the intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object using the action elements comprises:
计算所述三维虚拟对象中各个虚拟子对象之间的动作角度;Calculating the action angle between the virtual sub-objects in the three-dimensional virtual object;
基于所述动作要素对所述动作角度进行检测,根据检测结果确定所述三维虚拟对象的所述中间动作类型。Detecting the action angle based on the action elements, and determining the intermediate action type of the three-dimensional virtual object according to the detection result.

13.根据权利要求11所述的动作识别方法,所述基于所述动作要素对所述三维虚拟对象进行动作识别的识别结果,确定所述三维虚拟对象的中间动作类型,包括:13. The action recognition method according to claim 11, wherein determining the intermediate action type of the three-dimensional virtual object based on the result of performing action recognition on the three-dimensional virtual object using the action elements comprises:
根据所述动作要素对所述三维虚拟对象中各个虚拟子对象的动作位置进行检测;Detecting the action position of each virtual sub-object in the three-dimensional virtual object according to the action elements;
基于检测结果确定所述三维虚拟对象的所述中间动作类型。Determining the intermediate action type of the three-dimensional virtual object based on the detection result.

14.根据权利要求1所述的动作识别方法,所述动作识别数据集是由服务端下发的动作识别包组成,其中,所述动作识别包中包含对所述三维虚拟对象进行识别的识别规则。14. The action recognition method according to claim 1, wherein the action recognition data set is composed of an action recognition package delivered by a server, and the action recognition package contains recognition rules for recognizing the three-dimensional virtual object.

15.一种动作识别装置,包括:15. An action recognition device, comprising:
获取模块,被配置为获取图像采集设备采集的图像帧;an acquisition module, configured to acquire the image frame collected by the image acquisition device;
处理模块,被配置为对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像;a processing module, configured to segment the action area of the target object in the image frame to obtain an intermediate image;
识别模块,被配置为将所述中间图像输入动作识别模型进行关键点识别和坐标映射,获得识别出的动作关键点对应的动作组架节点,以及所述动作关键点映射的三维坐标信息;a recognition module, configured to input the intermediate image into the action recognition model for key point recognition and coordinate mapping, and obtain the action frame node corresponding to the identified action key point and the three-dimensional coordinate information to which the action key point is mapped;
生成模块,被配置为根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象;a generating module, configured to generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs;
确定模块,被配置为基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型。a determining module, configured to perform action recognition on the three-dimensional virtual object based on the action recognition data set, and determine the action type of the target object.

16.一种计算设备,包括:16. A computing device, comprising:
存储器和处理器;a memory and a processor;
所述存储器用于存储计算机可执行指令,所述处理器用于执行所述计算机可执行指令:The memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to:
获取图像采集设备采集的图像帧;obtain the image frame collected by the image acquisition device;
对所述图像帧中的目标对象的动作区域进行分割处理,获得中间图像;segment the action area of the target object in the image frame to obtain an intermediate image;
将所述中间图像输入动作识别模型进行关键点识别和坐标映射,获得识别出的动作关键点对应的动作组架节点,以及所述动作关键点映射的三维坐标信息;input the intermediate image into the action recognition model for key point recognition and coordinate mapping, and obtain the action frame node corresponding to the identified action key point and the three-dimensional coordinate information to which the action key point is mapped;
根据所述三维坐标信息、所述动作组架节点及其所属的虚拟动作组架生成三维虚拟对象;generate a three-dimensional virtual object according to the three-dimensional coordinate information, the action frame node and the virtual action frame to which it belongs;
基于动作识别数据集对所述三维虚拟对象进行动作识别,确定所述目标对象的动作类型。perform action recognition on the three-dimensional virtual object based on the action recognition data set, and determine the action type of the target object.

17.一种计算机可读存储介质,其存储有计算机指令,该指令被处理器执行时实现权利要求1至14任意一项所述动作识别方法的步骤。17. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the action recognition method according to any one of claims 1 to 14.
CN202010292042.2A 2020-04-14 2020-04-14 Action recognition method and device Active CN111401318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010292042.2A CN111401318B (en) 2020-04-14 2020-04-14 Action recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010292042.2A CN111401318B (en) 2020-04-14 2020-04-14 Action recognition method and device

Publications (2)

Publication Number Publication Date
CN111401318A true CN111401318A (en) 2020-07-10
CN111401318B CN111401318B (en) 2022-10-04

Family

ID=71433225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010292042.2A Active CN111401318B (en) 2020-04-14 2020-04-14 Action recognition method and device

Country Status (1)

Country Link
CN (1) CN111401318B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899235A (en) * 2020-07-21 2020-11-06 北京灵汐科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112083800A (en) * 2020-07-24 2020-12-15 青岛小鸟看看科技有限公司 Gesture recognition method and system based on adaptive finger joint rule filtering
CN112560622A (en) * 2020-12-08 2021-03-26 中国联合网络通信集团有限公司 Virtual object motion control method and device and electronic equipment
CN113900521A (en) * 2021-09-30 2022-01-07 上海千丘智能科技有限公司 Interactive method and system for multi-person behavior training
CN113918769A (en) * 2021-10-11 2022-01-11 平安国际智慧城市科技股份有限公司 Method, device and equipment for marking key actions in video and storage medium
CN114385004A (en) * 2021-12-15 2022-04-22 北京五八信息技术有限公司 Interaction method and device based on augmented reality, electronic equipment and readable medium
CN115830196A (en) * 2022-12-09 2023-03-21 支付宝(杭州)信息技术有限公司 Virtual image processing method and device
CN116824686A (en) * 2022-03-21 2023-09-29 腾讯科技(深圳)有限公司 An action recognition method and related device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140104168A1 (en) * 2012-10-12 2014-04-17 Microsoft Corporation Touchless input
US20140375539A1 (en) * 2013-06-19 2014-12-25 Thaddeus Gabara Method and Apparatus for a Virtual Keyboard Plane
US20150205484A1 (en) * 2012-07-27 2015-07-23 Nec Solution Innovators, Ltd. Three-dimensional user interface apparatus and three-dimensional operation method
US20160078679A1 (en) * 2014-09-12 2016-03-17 General Electric Company Creating a virtual environment for touchless interaction
CN106886741A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of gesture identification method of base finger identification
CN108256461A (en) * 2018-01-11 2018-07-06 深圳市鑫汇达机械设计有限公司 A kind of gesture identifying device for virtual reality device
CN109858524A (en) * 2019-01-04 2019-06-07 北京达佳互联信息技术有限公司 Gesture identification method, device, electronic equipment and storage medium
CN110163048A (en) * 2018-07-10 2019-08-23 腾讯科技(深圳)有限公司 Identification model training method, recognition methods and the equipment of hand key point
CN110221690A (en) * 2019-05-13 2019-09-10 Oppo广东移动通信有限公司 Gesture interaction method and device, storage medium, communication terminal based on AR scene
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150205484A1 (en) * 2012-07-27 2015-07-23 Nec Solution Innovators, Ltd. Three-dimensional user interface apparatus and three-dimensional operation method
US20140104168A1 (en) * 2012-10-12 2014-04-17 Microsoft Corporation Touchless input
US20140375539A1 (en) * 2013-06-19 2014-12-25 Thaddeus Gabara Method and Apparatus for a Virtual Keyboard Plane
US20160078679A1 (en) * 2014-09-12 2016-03-17 General Electric Company Creating a virtual environment for touchless interaction
CN106886741A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of gesture identification method of base finger identification
CN108256461A (en) * 2018-01-11 2018-07-06 深圳市鑫汇达机械设计有限公司 A kind of gesture identifying device for virtual reality device
CN110163048A (en) * 2018-07-10 2019-08-23 腾讯科技(深圳)有限公司 Identification model training method, recognition methods and the equipment of hand key point
CN109858524A (en) * 2019-01-04 2019-06-07 北京达佳互联信息技术有限公司 Gesture identification method, device, electronic equipment and storage medium
CN110221690A (en) * 2019-05-13 2019-09-10 Oppo广东移动通信有限公司 Gesture interaction method and device, storage medium, communication terminal based on AR scene
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899235A (en) * 2020-07-21 2020-11-06 北京灵汐科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112083800A (en) * 2020-07-24 2020-12-15 青岛小鸟看看科技有限公司 Gesture recognition method and system based on adaptive finger joint rule filtering
CN112083800B (en) * 2020-07-24 2024-04-30 青岛小鸟看看科技有限公司 Gesture recognition method and system based on adaptive finger joint rule filtering
CN112560622A (en) * 2020-12-08 2021-03-26 中国联合网络通信集团有限公司 Virtual object motion control method and device and electronic equipment
CN112560622B (en) * 2020-12-08 2023-07-21 中国联合网络通信集团有限公司 Virtual object motion control method, device and electronic equipment
CN113900521A (en) * 2021-09-30 2022-01-07 上海千丘智能科技有限公司 Interactive method and system for multi-person behavior training
CN113918769A (en) * 2021-10-11 2022-01-11 平安国际智慧城市科技股份有限公司 Method, device and equipment for marking key actions in video and storage medium
CN113918769B (en) * 2021-10-11 2024-06-04 平安国际智慧城市科技股份有限公司 Method, device, equipment and storage medium for marking key actions in video
CN114385004A (en) * 2021-12-15 2022-04-22 北京五八信息技术有限公司 Interaction method and device based on augmented reality, electronic equipment and readable medium
CN116824686A (en) * 2022-03-21 2023-09-29 腾讯科技(深圳)有限公司 Action recognition method and related device
CN115830196A (en) * 2022-12-09 2023-03-21 支付宝(杭州)信息技术有限公司 Virtual image processing method and device
CN115830196B (en) * 2022-12-09 2024-04-05 支付宝(杭州)信息技术有限公司 Virtual image processing method and device

Also Published As

Publication number Publication date
CN111401318B (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN111401318A (en) Action recognition method and device
US20230161419A1 (en) Hand pose estimation from stereo cameras
US10470510B1 (en) Systems and methods for full body measurements extraction using multiple deep learning networks for body feature measurements
US10962404B2 (en) Systems and methods for weight measurement from user photos using deep learning networks
JP7265034B2 (en) Method and apparatus for human body detection
CN110163059B (en) Multi-person posture recognition method and device and electronic equipment
CN114402369A (en) Human body posture recognition method and device, storage medium and electronic equipment
WO2021252201A2 (en) 3d reconstruction using wide-angle imaging devices
JP6046501B2 (en) Feature point output device, feature point output program, feature point output method, search device, search program, and search method
CN115713808A (en) Gesture recognition system based on deep learning
JP6720353B2 (en) Processing method and terminal
CN107479715A (en) Method and device for realizing virtual reality interaction by using gesture control
CN114694257B (en) Multi-user real-time three-dimensional action recognition evaluation method, device, equipment and medium
CN108629824B (en) Image generation method and device, electronic equipment and computer readable medium
CN111274602B (en) Image characteristic information replacement method, device, equipment and medium
CN113158912A (en) Gesture recognition method and device, storage medium and electronic equipment
CN116612495B (en) Image processing method and device
Narayan et al. Sign language recognition using deep learning
CN115482573A (en) Facial expression recognition method, device and equipment and readable storage medium
CN112464842A (en) Age display method and device, storage medium and electronic equipment
Oh et al. Mobile augmented reality system for Design Drawing visualization
CN115953833A (en) Machine learning-based dynamic gesture recognition method, system, device and storage medium
CN119421046A (en) Photographing method, device and electronic device
CN116777869A (en) Image detection methods, devices, equipment and storage media based on artificial intelligence
HK40028375A (en) Method and apparatus for recognizing face attribute, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 Zhejiang Province, Hangzhou City, Xihu District, Xixi Road 543-569 (continuous odd numbers) Building 1, Building 2, 5th Floor, Room 518

Patentee after: Alipay (Hangzhou) Digital Service Technology Co., Ltd.

Country or region after: China

Address before: 801-11, Section B, 8th floor, 556 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province, 310013

Patentee before: Alipay (Hangzhou) Information Technology Co., Ltd.

Country or region before: China