
CN119964191A - A gesture recognition method and its device, equipment and storage medium - Google Patents

A gesture recognition method and its device, equipment and storage medium

Info

Publication number
CN119964191A
CN119964191A (Application CN202311436504.3A)
Authority
CN
China
Prior art keywords
features, image, sample, posture, key point
Prior art date
Legal status
Pending
Application number
CN202311436504.3A
Other languages
Chinese (zh)
Inventor
钟盛涛
程虎
林垠
殷保才
殷兵
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202311436504.3A
Publication of CN119964191A

Landscapes

  • Image Analysis (AREA)

Abstract


The present application discloses a gesture recognition method and its device, equipment, and storage medium. The gesture recognition method comprises: obtaining an image to be recognized containing a target; extracting features of the image to be recognized using a gesture recognition model to obtain image features and key point features; and performing recognition based on the image features and key point features to obtain a gesture recognition result. The above scheme can realize accurate gesture recognition.

Description

Gesture recognition method and device, equipment and storage medium thereof
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a gesture recognition method, a device, an apparatus, and a storage medium thereof.
Background
In existing image recognition technology, when the posture of a person in an image is to be determined, a deep learning scheme is generally used to extract modal features of the person's posture, and recognition and classification are performed based on those features; however, the final recognition result is often inaccurate. For example, in schemes for identifying abnormal sitting postures of children, posture classification is usually performed using only the key point information obtained by 2D or 3D human pose estimation. Because only the key point modality is used to recognize the child's sitting posture, the recognition result is often inaccurate, and problems such as limited effective information and insufficient robustness arise.
Disclosure of Invention
The application aims to provide at least a gesture recognition method and a corresponding device, equipment and storage medium capable of accurately recognizing gestures.
A first aspect of the application provides a gesture recognition method, which comprises: obtaining an image to be recognized containing a target; extracting features of the image to be recognized by using a gesture recognition model to obtain image features and key point features; and performing recognition based on the image features and the key point features to obtain a gesture recognition result.
In some embodiments, the gesture recognition model comprises a network feature extraction module, an image feature extraction module and a key point feature extraction module. Extracting features of the image to be recognized to obtain the image features and the key point features comprises: performing feature extraction on the image to be recognized by using the network feature extraction module to obtain initial features; performing convolution processing on the initial features by using the image feature extraction module to obtain the image features; and extracting key point information from the initial features by using the key point feature extraction module to obtain the key point features.
In some embodiments, the gesture recognition model comprises a fusion module and a recognition module. Performing recognition based on the image features and the key point features to obtain the gesture recognition result comprises: inputting the image features and the key point features into the fusion module for feature fusion to obtain gesture features; and inputting the gesture features into the recognition module for gesture recognition to determine the gesture recognition result corresponding to the image to be recognized.
In some embodiments, inputting the image features and the key point features into the fusion module for feature fusion to obtain the gesture features comprises performing weighted fusion of the image features and the key point features by using the fusion module to obtain the gesture features; and/or, inputting the gesture features into the recognition module for gesture recognition to determine the gesture recognition result corresponding to the image to be recognized comprises matching, by using the recognition module, the gesture features against preset gesture features in a database to determine the gesture category to which the gesture features belong and, in turn, the gesture recognition result corresponding to the gesture features.
The key point features are the position information of the key points of the target in the image to be identified, and/or the gesture recognition result is the recognition result of the sitting posture of the target.
In some embodiments, the method further comprises: obtaining a sample image; performing feature extraction on the sample image by using the gesture recognition model to obtain image sample features and key point sample features, and obtaining gesture sample features based on the image sample features and the key point sample features; recognizing the image sample features to obtain a first predicted gesture category; recognizing the gesture sample features to obtain a gesture recognition sample result, the gesture recognition sample result including a second predicted gesture category; and adjusting parameters of the gesture recognition model by using the first predicted gesture category and a reference gesture category annotated for the sample image, the key point sample features and reference key point information annotated for the sample image, and the second predicted gesture category and the reference gesture category.
In some embodiments, adjusting the parameters of the gesture recognition model comprises: determining a first loss value based on the difference between the first predicted gesture category and the reference gesture category; determining a second loss value based on the difference between the key point sample features and the reference key point information; determining a third loss value based on the difference between the second predicted gesture category and the reference gesture category; and adjusting the parameters of the gesture recognition model by using the first loss value, the second loss value and the third loss value.
In some embodiments, the gesture recognition model comprises a network feature extraction module, an image feature extraction module, a classification module, a key point feature extraction module, a fusion module and a recognition module. Performing feature extraction on the sample image to obtain the image sample features and the key point sample features comprises: performing feature extraction on the sample image by using the network feature extraction module to obtain initial sample features; performing convolution processing on the initial sample features by using the image feature extraction module to obtain the image sample features; and extracting key point information from the initial sample features by using the key point feature extraction module to obtain the key point sample features. Obtaining the gesture sample features based on the image sample features and the key point sample features comprises fusing the image sample features and the key point sample features by using the fusion module to obtain the gesture sample features. Recognizing the gesture sample features to obtain the gesture recognition sample result comprises recognizing the gesture sample features by using the recognition module. Recognizing the image sample features to obtain the first predicted gesture category comprises recognizing and classifying the image sample features by using the classification module to determine the first predicted gesture category corresponding to the image sample features. Adjusting the parameters of the gesture recognition model by using the first loss value, the second loss value and the third loss value comprises: adjusting parameters of the network feature extraction module, the classification module and the image feature extraction module by using the first loss value; adjusting parameters of the network feature extraction module and the key point feature extraction module by using the second loss value; and adjusting parameters of the network feature extraction module, the image feature extraction module, the key point feature extraction module, the fusion module and the recognition module by using the third loss value.
In some embodiments, adjusting the parameters of the gesture recognition model by using the first loss value, the second loss value and the third loss value comprises: fusing the first loss value, the second loss value and the third loss value to generate a fourth loss value; performing weighted fusion of the second loss value and the fourth loss value to generate a comprehensive loss value; and adjusting all parameters of the gesture recognition model by using the comprehensive loss value.
A second aspect of the application provides a training method for a gesture recognition model, comprising: obtaining a sample image; performing feature extraction on the sample image by using the gesture recognition model to obtain image sample features and key point sample features, and obtaining gesture sample features based on the image sample features and the key point sample features; recognizing the image sample features to obtain a first predicted gesture category; recognizing the gesture sample features to obtain a gesture recognition sample result, the gesture recognition sample result including a second predicted gesture category; and adjusting parameters of the gesture recognition model by using the first predicted gesture category and a reference gesture category annotated for the sample image, the key point sample features and reference key point information annotated for the sample image, and the second predicted gesture category and the reference gesture category.
A third aspect of the application provides a gesture recognition device comprising an acquisition module, a feature extraction module and a gesture recognition module. The acquisition module is used to acquire an image to be recognized containing a target; the feature extraction module is used to perform feature extraction on the image to be recognized by using a gesture recognition model to obtain at least image features and key point features; and the gesture recognition module is used to perform recognition based on at least the image features and the key point features to obtain a gesture recognition result.
A fourth aspect of the application provides a model training device comprising a sample acquisition module, a sample feature extraction module, an image feature recognition module, a sample gesture recognition module and an adjustment module. The sample acquisition module is used to acquire a sample image; the sample feature extraction module is used to perform feature extraction on the sample image by using a gesture recognition model to obtain image sample features and key point sample features, and to obtain gesture sample features based on the image sample features and the key point sample features; the image feature recognition module is used to recognize the image sample features to obtain a first predicted gesture category; the sample gesture recognition module is used to recognize the gesture sample features to obtain a gesture recognition sample result, the gesture recognition sample result including a second predicted gesture category; and the adjustment module is used to adjust parameters of the gesture recognition model by using the first predicted gesture category and the reference gesture category annotated for the sample image, the key point sample features and the reference key point information annotated for the sample image, and the second predicted gesture category and the reference gesture category.
A fifth aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the gesture recognition method in the first aspect or implement the training method of the gesture recognition model in the second aspect.
A sixth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the gesture recognition method in the first aspect described above, or implement the training method of the gesture recognition model in the second aspect described above.
According to the above scheme, feature extraction is performed on the image to be recognized containing the target by using the gesture recognition model to obtain the image features and the key point features, and gesture recognition is performed on the image features and the key point features to obtain the final gesture recognition result, so gesture recognition can be realized using multi-modal features. Moreover, the scheme extracts the multi-modal features with a single model, which reduces the time spent on feature extraction and the processing resources occupied; and because only one model needs to be trained, the training time and the number of model parameters to be adjusted are greatly reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flow chart of an embodiment of a gesture recognition method of the present application;
FIG. 2 is a schematic diagram of a framework of one embodiment of a gesture recognition model of the present application;
FIG. 3 is a flow chart of an embodiment of a training method of the gesture recognition model of the present application;
FIG. 4 is a schematic diagram of a frame of an embodiment of a gesture recognition apparatus of the present application;
FIG. 5 is a schematic diagram of a frame of an embodiment of the model training apparatus of the present application;
FIG. 6 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 7 is a schematic diagram of a frame of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a gesture recognition method according to the present application.
Specifically, the method may include the steps of:
Step S110, obtaining an image to be identified containing the target.
The gesture recognition method can be applied to application scenes such as gait analysis, video monitoring, augmented reality, man-machine interaction, finance, mobile payment, entertainment, games, sports science and the like, multiple modal information in the image to be recognized is obtained by using the gesture recognition method, and target gesture recognition is carried out based on the multiple modal information so as to monitor or classify the gesture of the target, so that a gesture recognition result is obtained, wherein the gesture recognition result is a recognition result about the gesture of the target. The image to be identified may be a 2D image, a 3D image, a thermodynamic diagram, or the like, and the detected target may be a human or an animal, which is not particularly limited herein.
In some embodiments, to acquire a 2D image containing the target, the target may be captured with a camera. To acquire a 3D image containing the target, a 3D stereo camera may be used to capture the target, or structured light may be used to scan the target.
In addition, the acquired image to be identified can be obtained by shooting by using equipment or can be selected from a database.
Step S120, extracting features of the image to be recognized by using the gesture recognition model to obtain image features and key point features.
In the present application, the multi-modal features extracted from the image to be recognized are the key point features and the image features. Furthermore, to improve gesture recognition accuracy, additional modal features such as target action features may be added on the basis of the key point features and the image features. It can be understood that the application may perform gesture recognition using the two modalities of key point features and image features, or may add further modalities on that basis, which is not specifically limited herein.
The key point features may specifically be position information of the key points of the target in the image to be recognized. For example, if the image to be recognized is a 2D image, the key point features are the coordinate information of the target's key points in the 2D image; if the image to be recognized is a 3D image, the key point features include the depth information of the key points in addition to the coordinate information. The gesture recognition result may be a recognition result concerning the sitting posture of the target. The image features are features characterizing the pixel conditions of the image to be recognized, such as pixel values of a plurality of statistical regions in the image to be recognized.
In some embodiments, the target to be recognized is a person. The gesture recognition model performs feature extraction on the image to be recognized containing the person to obtain image features, where the image features include pixel features of the image to be recognized, identity information of the human body in the image, and the like. The gesture recognition model then performs human skeleton key point detection on the image to be recognized to obtain key point features. In the process of acquiring the key point features, target detection is first performed on the image to be recognized to determine the number of persons in the image and whether there is contact or occlusion between them, and then the skeleton key points of each single person are detected to obtain the key point features. The gesture recognition model used is a deep neural network trained on multiple tasks. Compared with extracting only single-modality features as in the prior art, this can effectively improve the accuracy of gesture recognition.
In some embodiments, the gesture recognition model includes two branches: one is an image branch and the other is a pose estimation branch. The image features can be obtained with the image branch, and the key point features with the pose estimation branch. Specifically, referring to fig. 2, fig. 2 is a schematic frame diagram of an embodiment of the gesture recognition model of the present application. The gesture recognition model includes a network feature extraction module 210, an image feature extraction module 220, and a key point feature extraction module 230. The image feature extraction module 220 is located in the image branch, the key point feature extraction module 230 is located in the pose estimation branch, and the steps performed by each module are described in steps S121 to S123 below.
Step S121, performing feature extraction on the image to be recognized by using the network feature extraction module to obtain initial features.
In some embodiments, after the image to be identified is input to the gesture recognition model, the gesture recognition model performs feature extraction on the image to be identified by using the network feature extraction module 210 to obtain initial features. The initial characteristics comprise action information, image characteristics, key point characteristics and the like of a human body in the image to be identified.
Step S122, performing convolution processing on the initial features by using the image feature extraction module to obtain the image features.
In some embodiments, the acquired initial features are input into an image branch, and the image feature extraction module 220 is utilized to convolve the initial features multiple times to obtain image features with high correlation with the human body posture from a plurality of image features in the initial features.
Step S123, extracting key point information from the initial features by using the key point feature extraction module to obtain the key point features.
In some embodiments, the obtained initial features are input into the pose estimation branch, and human pose estimation is performed on the initial features by the key point feature extraction module 230 to extract human key point information and obtain the key point features. Specifically, when the recognition target is a human body, the pose estimation branch may adopt a SimDR framework based on 1D heat maps to perform human pose estimation, so as to obtain 2D human key point coordinate information as the key point features. It will be appreciated that the human pose estimation algorithm is not specifically limited herein.
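For illustration only, a minimal sketch of the two-branch structure described above is given below. It assumes a PyTorch implementation with a generic convolutional backbone as the network feature extraction module and a SimDR-style head that outputs one 1D coordinate vector per key point per axis; the layer sizes, the backbone choice and the module names are illustrative assumptions, not the claimed architecture.

```python
# Minimal sketch of the two-branch gesture recognition model (assumed PyTorch).
# Layer sizes and the backbone choice are illustrative, not taken from the patent.
import torch
import torch.nn as nn

class GestureRecognitionModel(nn.Module):
    def __init__(self, num_keypoints=17, img_size=256, feat_dim=256):
        super().__init__()
        # Network feature extraction module (210): shared backbone producing initial features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Image branch (220): further convolutions keeping pose-relevant image features.
        self.image_branch = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Pose estimation branch (230): SimDR-style head, one 1D vector per key point per axis.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.kpt_x = nn.Linear(feat_dim, num_keypoints * img_size)  # horizontal coordinate bins
        self.kpt_y = nn.Linear(feat_dim, num_keypoints * img_size)  # vertical coordinate bins
        self.num_keypoints, self.img_size = num_keypoints, img_size

    def forward(self, x):
        initial = self.backbone(x)                # initial features
        image_feat = self.image_branch(initial)   # image features, shape (B, feat_dim)
        pooled = self.pool(initial).flatten(1)
        kx = self.kpt_x(pooled).view(-1, self.num_keypoints, self.img_size)
        ky = self.kpt_y(pooled).view(-1, self.num_keypoints, self.img_size)
        return image_feat, (kx, ky)               # key point features as per-axis 1D logits

model = GestureRecognitionModel()
image_feat, (kx, ky) = model(torch.randn(1, 3, 256, 256))
```

The fusion and recognition modules described in the following steps would consume the two outputs of this sketch.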
Step S130, performing recognition based on the image features and the key point features to obtain a gesture recognition result.
In some embodiments, after the image features and the key point features are obtained by using the gesture recognition model, the image features and the key point features are analyzed, and the gesture recognition result about the target is obtained by combining the analysis result of the image features and the analysis result of the key point features.
With continued reference to fig. 2, the gesture recognition model includes a fusion module 240 and a recognition module 250. The steps of acquiring the gesture recognition result are steps S131 to S132.
Step S131, inputting the image features and the key point features into the fusion module for feature fusion to obtain gesture features.
In some embodiments, after the image features and the key point features are obtained, they are input into the fusion module 240 of the gesture recognition model for fusion, so as to obtain new gesture features. The fusion module 240 may perform weighted fusion of the image features and the key point features, combining them according to the weights of the image features and the key point features within the gesture features, to obtain the gesture features.
Specifically, concat fusion may be used for the weighted fusion of the image features and the key point features, or other fusion algorithms may be used, which are not specifically limited herein. However, simple concatenation tends to let the branch with the stronger back-propagated gradient dominate, so that the fused features collapse toward a single modality; therefore, in this embodiment, an SE (Squeeze-and-Excitation) attention mechanism may be adopted for feature fusion.
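As an illustration of what such an SE-style weighted fusion might look like, the sketch below learns per-channel weights over the concatenated modalities. The feature dimensions, the reduction ratio and the assumption that the key point features arrive as a flat vector are placeholders, not the patent's exact design.

```python
# Sketch of an SE (Squeeze-and-Excitation) style fusion of image and key point features.
# Dimensions and the reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, img_dim=256, kpt_dim=128, reduction=4):
        super().__init__()
        fused_dim = img_dim + kpt_dim
        # Squeeze-and-excitation gate: re-weights each channel of the concatenated features,
        # so the weaker modality is not drowned out by the stronger one.
        self.gate = nn.Sequential(
            nn.Linear(fused_dim, fused_dim // reduction), nn.ReLU(),
            nn.Linear(fused_dim // reduction, fused_dim), nn.Sigmoid(),
        )

    def forward(self, image_feat, keypoint_feat):
        fused = torch.cat([image_feat, keypoint_feat], dim=1)   # concat fusion
        weights = self.gate(fused)                               # learned channel weights
        return fused * weights                                   # weighted gesture features

fusion = SEFusion()
gesture_feat = fusion(torch.randn(2, 256), torch.randn(2, 128))
```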
Step S132, inputting the gesture features into the recognition module for gesture recognition to determine the gesture recognition result corresponding to the image to be recognized.
In some embodiments, after the gesture feature is obtained, the gesture feature may be matched with a preset gesture feature in the database by using the recognition module 250, so as to determine a gesture category to which the gesture feature belongs, and further determine a gesture recognition result corresponding to the gesture feature.
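The patent only states that the gesture features are matched against preset gesture features in a database; one plausible reading is a nearest-neighbour match. The sketch below uses cosine similarity, and both the metric and the category names are assumptions for illustration.

```python
# Sketch of matching a gesture feature against preset category features in a database.
# Cosine-similarity nearest neighbour is an assumed matching rule; the category names
# are placeholders.
import torch
import torch.nn.functional as F

preset_features = {                     # hypothetical database of preset gesture features
    "upright": torch.randn(384),
    "lying_on_desk": torch.randn(384),
    "hunchback": torch.randn(384),
}

def match_gesture(gesture_feat: torch.Tensor) -> str:
    names = list(preset_features.keys())
    bank = torch.stack([preset_features[n] for n in names])              # (K, D)
    sims = F.cosine_similarity(gesture_feat.unsqueeze(0), bank, dim=1)   # similarity per category
    return names[int(sims.argmax())]                                     # gesture category

print(match_gesture(torch.randn(384)))
```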
In a specific application scenario, the gesture recognition method is used to identify abnormal sitting postures of children so that their sitting postures can be corrected. Cameras are installed in classrooms and capture the children's sitting postures in real time; the system analyzes the captured video, preprocesses each frame with a top-down human pose estimation pipeline, crops out the region where each child is located as an image to be recognized, and inputs it into the gesture recognition model, which recognizes the sitting posture in the image to be recognized.
After the gesture recognition model receives the image to be recognized, the network feature extraction module 210 extracts initial features from it. The initial features are then input into the image branch and the human pose estimation branch of the gesture recognition model: the image feature extraction module 220 in the image branch convolves the initial features to extract the image features, and the key point feature extraction module 230 in the human pose estimation branch extracts key point information from the initial features, thereby completing the human pose estimation task and extracting the key point features.
The obtained image features and key point features are input into the fusion module 240 for weighted fusion to obtain the gesture features. The gesture features are input into the recognition module 250, which matches them against preset gesture features in a database to determine the gesture category to which they belong and, in turn, the gesture recognition result, so the child's sitting posture category is obtained. If the child's sitting posture is an abnormal one, such as resting the face on a hand, lying on the desk, hunching the back or leaning back, the system calculates the child's position in the classroom from the position of that child's cropped image within the original frame and prompts the teacher that the sitting posture of the child at that position is abnormal. If the sitting posture is normal, no reminder is needed.
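The end-to-end classroom flow can be summarized in a short sketch. Here `person_detector` and `gesture_model` stand in for the trained detector and gesture recognition model, and the set of abnormal categories is an illustrative assumption.

```python
# Sketch of the top-down classroom pipeline: detect each child, crop the region,
# classify the sitting posture, and report abnormal postures with their location.
# `person_detector` and `gesture_model` are placeholders for the trained components.
ABNORMAL = {"face_on_hand", "lying_on_desk", "hunchback", "leaning_back"}

def monitor_frame(frame, person_detector, gesture_model):
    alerts = []
    for box in person_detector(frame):              # (x1, y1, x2, y2) per detected child
        x1, y1, x2, y2 = box
        crop = frame[y1:y2, x1:x2]                  # image to be recognized
        category = gesture_model.recognize(crop)    # gesture recognition result
        if category in ABNORMAL:
            # The child's position in the classroom follows from the crop's location in the frame.
            alerts.append({"box": box, "posture": category})
    return alerts                                   # reported to the teacher
```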
In addition, to improve the accuracy of gesture recognition, the gesture recognition model may be trained before the gesture recognition method is formally applied, so that it is deployed in a specific application scenario only after it reaches the expected performance. Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of the training method of the gesture recognition model of the present application. Specifically, the method may include the following steps:
Step S310, acquiring a sample image.
In some embodiments, the acquired sample image may be an image in a database whose gesture recognition result is already known.
In other embodiments, when the target is a person, a camera assembly of an acquisition device may be used to capture color (RGB) images, and the human bounding boxes, joint coordinates and corresponding gesture categories, including walking, lying on the desk, hunching the back, sitting and the like, are annotated in the color images. The color images are the sample images.
In addition, the open-source Anchor-free (no prior box) NanoDet object detection algorithm may be adopted, and a human detection model may be trained by combining the public COCO data set with the collected human detection data set. This algorithm achieves a good balance between detection accuracy and speed and provides strong support for the subsequent pipeline. NanoDet is an ultra-fast, lightweight Anchor-free object detection model for mobile devices.
Step S320, performing feature extraction on the sample image by using the gesture recognition model to obtain image sample features and key point sample features, and obtaining gesture sample features based on the image sample features and the key point sample features.
In some embodiments, after the sample image is acquired, it is input into the gesture recognition model to train the model. Specifically, the gesture recognition model is composed of a plurality of modules, so a plurality of modules are used to make predictions on the sample image during training. With continued reference to fig. 2, after the gesture recognition model receives the sample image, the network feature extraction module 210 performs feature extraction on it to obtain initial sample features. The image feature extraction module 220 then convolves the initial sample features to obtain the image sample features. Meanwhile, the key point feature extraction module 230 extracts key point information from the initial sample features to obtain the key point sample features; here the key point feature extraction module 230 may adopt SimDR (Simple Disentangled coordinate Representation), a 1D-heat-map-based pose estimation framework, as its key point branch to perform target pose estimation and obtain the key point sample features.
After the image sample features and the key point sample features are obtained, they are fused by the fusion module 240 of the gesture recognition model to obtain the gesture sample features.
Step S330, recognizing the image sample features to obtain a first predicted gesture category.
In some embodiments, after the image sample features are obtained, in order to later compute a loss for the image branch, the classification module 260 may be used to recognize and classify the image sample features to determine the first predicted gesture category corresponding to the image sample features.
Step S340, recognizing the gesture sample features to obtain a gesture recognition sample result, the gesture recognition sample result including a second predicted gesture category.
In some embodiments, after the gesture sample features are obtained, they are recognized by the recognition module 250 to obtain a gesture recognition sample result. The gesture recognition sample result includes a second predicted gesture category, from which it can be determined which gesture the gesture sample features belong to, for example the walking category or the sitting category.
Step S350, adjusting the parameters of the gesture recognition model by using the first predicted gesture category and the reference gesture category annotated for the sample image, the key point sample features and the reference key point information annotated for the sample image, and the second predicted gesture category and the reference gesture category.
In some embodiments, the sample image is recognized with the gesture recognition model; after the predicted recognition result is obtained, it is compared with the actual annotation to determine the loss values of the gesture recognition model, and the loss values are back-propagated so that the parameters of the gesture recognition model are adjusted.
The first loss value may be determined from the difference between the first predicted gesture category and the reference gesture category; specifically, the first loss value may be calculated with a cross-entropy loss to supervise the image branch. It will be appreciated that other loss functions may also be used when calculating the first loss value, which is not specifically limited herein.
The second loss value is determined from the difference between the key point sample features and the reference key point information; specifically, the second loss value may be calculated with the KL divergence to supervise the pose estimation task of the pose estimation branch. It will be appreciated that other loss functions may also be used when calculating the second loss value, which is not specifically limited herein.
The third loss value is determined from the difference between the second predicted gesture category and the reference gesture category; specifically, the third loss value may be calculated with a cross-entropy loss to supervise the whole network through the fused features. It will be appreciated that other loss functions may also be used when calculating the third loss value, which is not specifically limited herein.
The parameters of the gesture recognition model are then adjusted using the first loss value, the second loss value and the third loss value.
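A minimal sketch of the three losses is given below, assuming cross entropy for the image-branch and fused-feature predictions and a SimDR-style KL divergence on 1D key point distributions. The Gaussian target encoding, the sigma value, and showing only the horizontal axis (the vertical axis would be handled analogously) are illustrative assumptions.

```python
# Sketch of the three training losses: cross entropy on the image-branch prediction
# (first loss), KL divergence between predicted 1D keypoint distributions and
# Gaussian-smoothed 1D targets (second loss, SimDR-style), and cross entropy on the
# prediction from the fused features (third loss).
import torch
import torch.nn.functional as F

def keypoint_targets_1d(coords, num_bins, sigma=2.0):
    """coords: (B, K) ground-truth coordinates along one axis -> (B, K, num_bins) distributions."""
    bins = torch.arange(num_bins).float()
    dist = torch.exp(-((bins - coords.unsqueeze(-1)) ** 2) / (2 * sigma ** 2))
    return dist / dist.sum(dim=-1, keepdim=True)

def training_losses(img_logits, fused_logits, kpt_logits_x, gt_x, labels, num_bins=256):
    loss1 = F.cross_entropy(img_logits, labels)                   # first loss value
    log_pred = F.log_softmax(kpt_logits_x, dim=-1)
    target = keypoint_targets_1d(gt_x, num_bins)
    loss2 = F.kl_div(log_pred, target, reduction="batchmean")     # second loss value
    loss3 = F.cross_entropy(fused_logits, labels)                 # third loss value
    return loss1, loss2, loss3
```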
In some embodiments, after obtaining the first loss value, the second loss value, and the third loss value, parameters of the network feature extraction module 210, the classification module 260, and the image feature extraction module 220 in the gesture recognition model are adjusted using the first loss value. Parameters of the network feature extraction module 210 and the keypoint feature extraction module 230 in the gesture recognition model are adjusted using the second loss value. Parameters of the network feature extraction module 210, the image feature extraction module 220, the key point feature extraction module 230, the fusion module 240, and the recognition module 250 in the gesture recognition model are adjusted using the third loss value.
In other embodiments, because the image features often carry much richer information than the key point features, it is hard to avoid the image features dominating the fused features. Therefore, during training, before the parameters of the gesture recognition model are adjusted, the second loss value corresponding to the key point features is given a weight in the loss fusion, so that even when the key point features carry much less information than the image features, a certain gradient still flows back to the key point feature extraction module 230; this ensures that the key point feature extraction module 230 keeps being updated and greatly alleviates the feature collapse problem. In addition to supervising the whole network with the classification loss on the fused features, classification losses are also added to the individual image features and key point features, and the three are summed to form the overall classification loss. The individual classification loss of each modal branch updates the parameters of that branch's feature extraction module and affects the parameters shared by the two branches, while the classification loss on the fusion module 240 updates all the parameters in the network.
Specifically, the first loss value, the second loss value and the third loss value are fused to generate a fourth loss value; the second loss value and the fourth loss value are then weighted and fused to generate a comprehensive loss value; and all parameters of the gesture recognition model are adjusted using the comprehensive loss value. For example, the first, second and third loss values may be summed to obtain the fourth loss value, and the second and fourth loss values weighted and fused to generate the comprehensive loss value, as in the following formulas:
Loss_Cls = Loss_Img_Cls + Loss_Poss_Cls + Loss_Fuse_Cls    (1)
Loss = α1 × Loss_Poss_Cls + α2 × Loss_Cls    (2)
where Loss_Img_Cls is the first loss value, Loss_Poss_Cls is the second loss value, Loss_Fuse_Cls is the third loss value, Loss_Cls is the fourth loss value, and Loss is the comprehensive loss value.
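Restating formulas (1) and (2) as a small helper makes the weighting explicit. The values of α1 and α2 are hyperparameters; the defaults below are placeholders, not values given in the patent.

```python
# Sketch of combining the losses per formulas (1) and (2). The weights alpha1 and
# alpha2 are hyperparameters; the default values here are placeholders.
def composite_loss(loss_img_cls, loss_poss, loss_fuse_cls, alpha1=1.0, alpha2=1.0):
    loss_cls = loss_img_cls + loss_poss + loss_fuse_cls    # formula (1): fourth loss value
    return alpha1 * loss_poss + alpha2 * loss_cls          # formula (2): comprehensive loss

# total = composite_loss(loss1, loss2, loss3); total.backward() then updates all parameters.
```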
Referring to fig. 4, fig. 4 is a schematic frame diagram of an embodiment of a gesture recognition apparatus according to the present application. The gesture recognition apparatus 400 includes an acquisition module 410, a feature extraction module 420, and a gesture recognition module 430. The acquiring module 410 is configured to acquire an image to be identified including a target. The feature extraction module 420 is configured to perform feature extraction on an image to be identified by using the gesture recognition model, so as to obtain at least an image feature and a key point feature. The gesture recognition module 430 is configured to perform recognition based on at least the obtained image feature and the key point feature, and obtain a gesture recognition result.
Referring to fig. 5, fig. 5 is a schematic diagram of a model training apparatus according to an embodiment of the application. The model training apparatus 500 includes a sample acquisition module 510, a sample feature extraction module 520, an image feature recognition module 530, a sample gesture recognition module 540, and an adjustment module 550. The sample acquisition module 510 is used to acquire a sample image. The sample feature extraction module 520 is configured to perform feature extraction on the sample image with the gesture recognition model to obtain image sample features and key point sample features, and to obtain gesture sample features based on the image sample features and the key point sample features. The image feature recognition module 530 is configured to recognize the image sample features to obtain a first predicted gesture category. The sample gesture recognition module 540 is configured to recognize the gesture sample features to obtain a gesture recognition sample result, where the gesture recognition sample result includes a second predicted gesture category. The adjustment module 550 is configured to adjust the parameters of the gesture recognition model using the first predicted gesture category and the reference gesture category annotated for the sample image, the key point sample features and the reference key point information annotated for the sample image, and the second predicted gesture category and the reference gesture category.
According to the application, the gesture recognition model is utilized to extract the characteristics of the image to be recognized containing the target, so that the image characteristics and the key point characteristics are obtained, and the final gesture recognition result is obtained by carrying out gesture recognition on the image characteristics and the key point characteristics, so that the problem of misrecognition caused by inaccurate single-mode data can be effectively reduced, and the accuracy of image recognition and the robustness of the overall scheme are improved.
In addition, a multi-task-based multi-modal feature extraction method is adopted: features of different modalities can be extracted with a single network, most parameters are effectively shared, the model's inference speed is increased, and the shared training also acts as a regularization constraint that improves recognition accuracy. Meanwhile, the application fuses multi-modal information for gesture recognition, which effectively reduces misrecognition caused by inaccurate single-modality data and greatly improves the robustness of the whole scheme. The multi-task, multi-modal gesture recognition scheme provided by the application combines multi-task learning with multiple modalities and is an accurate, real-time gesture recognition solution.
In the prior art, only single-mode information is used, so that the problem of inaccuracy or low generalization often exists, and the scheme fuses the image and the key point information to carry out gesture recognition, so that the discrimination capability, the robustness and the generalization of the model are greatly improved.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Referring to fig. 6, fig. 6 is a schematic diagram of a frame of an electronic device 60 according to an embodiment of the application. The electronic device 60 comprises a memory 61 and a processor 62 coupled to each other, the processor 62 being adapted to execute program instructions stored in the memory 61 for implementing the steps of any of the gesture recognition method embodiments described above or for implementing the steps of any of the gesture recognition model training method embodiments described above. In one specific implementation scenario, electronic device 60 may include, but is not limited to, a microcomputer, a server, and further, electronic device 60 may also include a mobile device such as a notebook computer, a tablet computer, etc., without limitation.
Specifically, the processor 62 is configured to control itself and the memory 61 to implement the steps of any of the gesture recognition method embodiments described above, or to implement the steps of any of the gesture recognition model training method embodiments described above. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 62 may be implemented jointly by integrated circuit chips.
Referring to FIG. 7, FIG. 7 is a schematic diagram of a computer readable storage medium 70 according to an embodiment of the application. The computer readable storage medium 70 stores program instructions 701 executable by a processor, the program instructions 701 for implementing the steps of any of the gesture recognition method embodiments described above, or implementing the steps of any of the gesture recognition model training method embodiments described above.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Claims (14)

1.一种姿态识别方法,其特征在于,包括:1. A gesture recognition method, comprising: 获取包含目标的待识别图像;Obtain an image to be identified containing a target; 利用姿态识别模型对所述待识别图像进行特征提取,得到图像特征和关键点特征;Extracting features of the image to be recognized by using a posture recognition model to obtain image features and key point features; 基于所述图像特征和关键点特征进行识别,得到姿态识别结果。Recognition is performed based on the image features and key point features to obtain a gesture recognition result. 2.根据权利要求1所述的方法,其特征在于,所述姿态识别模型包括网络特征提取模块、图像特征提取模块和关键点特征提取模块;所述利用姿态识别模型对所述待识别图像进行特征提取,得到图像特征和关键点特征,包括:2. The method according to claim 1 is characterized in that the posture recognition model includes a network feature extraction module, an image feature extraction module and a key point feature extraction module; the step of extracting features of the image to be recognized by using the posture recognition model to obtain image features and key point features comprises: 利用所述网络特征提取模块对所述待识别图像进行特征提取,得到初始特征;Using the network feature extraction module to extract features from the image to be identified to obtain initial features; 利用所述图像特征提取模块对所述初始特征进行卷积处理,得到所述图像特征;Using the image feature extraction module to perform convolution processing on the initial features to obtain the image features; 利用所述关键点特征提取模块对所述初始特征进行关键点信息提取,得到所述关键点特征。The key point feature extraction module is used to extract key point information from the initial features to obtain the key point features. 3.根据权利要求1所述的方法,其特征在于,所述姿态识别模型包括融合模块和识别模块;所述基于所述图像特征和关键点特征进行识别,得到姿态识别结果,包括:3. The method according to claim 1, characterized in that the posture recognition model includes a fusion module and a recognition module; the recognition based on the image features and key point features to obtain the posture recognition result includes: 将所述图像特征和关键点特征输入至所述融合模块中进行特征融合,得到姿态特征;Inputting the image features and key point features into the fusion module for feature fusion to obtain posture features; 将所述姿态特征输入至所述识别模块中进行姿态识别,确定所述待识别图像对应的所述姿态识别结果。The posture feature is input into the recognition module for posture recognition, and the posture recognition result corresponding to the image to be recognized is determined. 4.根据权利要求3所述的方法,其特征在于,所述将所述图像特征和关键点特征输入至所述融合模块中进行特征融合,得到姿态特征,包括:4. The method according to claim 3, characterized in that the step of inputting the image features and key point features into the fusion module for feature fusion to obtain posture features comprises: 利用所述融合模块对所述图像特征和关键点特征进行加权融合,得到姿态特征;Using the fusion module to perform weighted fusion on the image features and key point features to obtain posture features; 和/或,所述将所述姿态特征输入至所述识别模块中进行姿态识别,确定所述待识别图像对应的所述姿态识别结果,包括:And/or, inputting the posture feature into the recognition module for posture recognition, and determining the posture recognition result corresponding to the image to be recognized, comprises: 利用所述识别模块将所述姿态特征与数据库中的预设姿态特征进行匹配,以确定所述姿态特征所属的姿态类别,进而确定所述姿态特征对应的所述姿态识别结果。The recognition module is used to match the posture feature with a preset posture feature in a database to determine the posture category to which the posture feature belongs, and then determine the posture recognition result corresponding to the posture feature. 5.根据权利要求1所述的方法,其特征在于,所述关键点特征为所述目标的关键点在所述待识别图像中的位置信息;和/或,所述姿态识别结果为关于所述目标的坐姿的识别结果。5. The method according to claim 1 is characterized in that the key point feature is the position information of the key point of the target in the image to be identified; and/or the posture recognition result is the recognition result of the sitting posture of the target. 6.根据权利要求1所述的方法,其特征在于,所述方法还包括:6. 
The method according to claim 1, characterized in that the method further comprises: 获取样本图像;Get a sample image; 利用姿态识别模型对所述样本图像进行特征提取,得到图像样本特征和关键点样本特征,并基于所述图像样本特征和关键点样本特征,得到姿态样本特征;Extracting features from the sample image using a posture recognition model to obtain image sample features and key point sample features, and obtaining posture sample features based on the image sample features and key point sample features; 对所述图像样本特征进行识别,得到第一预测姿态类别;Identifying features of the image sample to obtain a first predicted posture category; 对所述姿态样本特征进行识别,得到姿态识别样本结果,所述姿态识别样本结果包括第二预测姿态类别;Identify the gesture sample features to obtain a gesture recognition sample result, wherein the gesture recognition sample result includes a second predicted gesture category; 利用所述第一预测姿态类别与所述样本图像标注的参考姿态类别、所述关键点样本特征与所述样本图像标注的参考关键点信息以及所述第二预测姿态类别与所述参考姿态类别,调整所述姿态识别模型的参数。The parameters of the posture recognition model are adjusted using the first predicted posture category and the reference posture category annotated by the sample image, the key point sample features and the reference key point information annotated by the sample image, and the second predicted posture category and the reference posture category. 7.根据权利要求6所述的方法,其特征在于,所述利用所述第一预测姿态类别与所述样本图像标注的参考姿态类别、所述关键点样本特征与所述样本图像标注的参考关键点信息以及所述第二预测姿态类别与所述参考姿态类别,调整所述姿态识别模型的参数,包括:7. The method according to claim 6, characterized in that the adjusting the parameters of the posture recognition model by using the first predicted posture category and the reference posture category annotated by the sample image, the key point sample features and the reference key point information annotated by the sample image, and the second predicted posture category and the reference posture category comprises: 基于所述第一预测姿态类别与所述参考姿态类别之间的差异,确定第一损失值,基于所述关键点样本特征与所述参考关键点信息之间的差异,确定第二损失值,以及基于所述第二预测姿态类别与所述参考姿态类别之间的差异,确定第三损失值;Determining a first loss value based on a difference between the first predicted posture category and the reference posture category, determining a second loss value based on a difference between the key point sample feature and the reference key point information, and determining a third loss value based on a difference between the second predicted posture category and the reference posture category; 利用所述第一损失值、第二损失值和第三损失值,调整所述姿态识别模型的参数。The first loss value, the second loss value and the third loss value are used to adjust parameters of the gesture recognition model. 8.根据权利要求7所述的方法,其特征在于,所述姿态识别模型包括:网络特征提取模块、图像特征提取模块、分类模块、关键点特征提取模块、融合模块和识别模块;8. 
The method according to claim 7, characterized in that the posture recognition model comprises: a network feature extraction module, an image feature extraction module, a classification module, a key point feature extraction module, a fusion module and a recognition module; 所述利用姿态识别模型对所述样本图像进行特征提取,得到图像样本特征和关键点样本特征,包括:The method of extracting features from the sample image using the posture recognition model to obtain image sample features and key point sample features includes: 利用所述网络特征提取模块对所述样本图像进行特征提取,得到初始样本特征;Using the network feature extraction module to extract features from the sample image to obtain initial sample features; 利用所述图像特征提取模块对所述初始样本特征进行卷积处理,得到所述图像样本特征;Using the image feature extraction module to perform convolution processing on the initial sample features to obtain the image sample features; 利用所述关键点特征提取模块对所述初始样本特征进行关键点信息提取,得到所述关键点样本特征;Extracting key point information from the initial sample features using the key point feature extraction module to obtain the key point sample features; 所述基于所述图像样本特征和关键点样本特征,得到姿态样本特征,包括:The obtaining of posture sample features based on the image sample features and the key point sample features includes: 利用所述融合模块对所述图像样本特征和关键点样本特征进行融合,得到所述姿态样本特征;Using the fusion module to fuse the image sample features and the key point sample features to obtain the posture sample features; 所述对所述姿态样本特征进行识别,得到姿态识别样本结果,包括:The step of identifying the gesture sample features to obtain a gesture recognition sample result includes: 利用所述识别模块对所述姿态样本特征进行识别,得到所述姿态识别样本结果;Using the recognition module to recognize the features of the gesture sample to obtain the gesture recognition sample result; 所述对所述图像样本特征进行识别,得到第一预测姿态类别,包括:The step of identifying the image sample feature to obtain a first predicted posture category includes: 利用分类模块对所述图像样本特征进行识别、分类,确定所述图像样本特征对应的所述第一预测姿态类别;Using a classification module to identify and classify the image sample features, and determine the first predicted posture category corresponding to the image sample features; 所述利用所述第一损失值、第二损失值和第三损失值,调整所述姿态识别模型的参数,包括:The adjusting the parameters of the gesture recognition model by using the first loss value, the second loss value and the third loss value includes: 利用所述第一损失值,调整所述网络特征提取模块、分类模块、图像特征提取模块的参数;Using the first loss value, adjusting the parameters of the network feature extraction module, the classification module, and the image feature extraction module; 利用所述第二损失值,调整所述网络特征提取模块和关键点特征提取模块的参数;Using the second loss value, adjusting the parameters of the network feature extraction module and the key point feature extraction module; 利用所述第三损失值,调整所述网络特征提取模块、图像特征提取模块、关键点特征提取模块、融合模块和识别模块的参数。The third loss value is used to adjust the parameters of the network feature extraction module, the image feature extraction module, the key point feature extraction module, the fusion module and the recognition module. 9.根据权利要求7所述的方法,其特征在于,所述利用所述第一损失值、第二损失值和第三损失值,调整所述姿态识别模型的参数,包括:9. The method according to claim 7, wherein the adjusting the parameters of the gesture recognition model by using the first loss value, the second loss value and the third loss value comprises: 利用所述第一损失值、第二损失值和第三损失值进行融合,生成第四损失值;The first loss value, the second loss value and the third loss value are combined to generate a fourth loss value; 对所述第二损失值和所述第四损失值进行加权融合,生成综合损失值;Performing weighted fusion on the second loss value and the fourth loss value to generate a comprehensive loss value; 利用所述综合损失值,调整所述姿态识别模型的所有参数。All parameters of the gesture recognition model are adjusted using the comprehensive loss value. 10.一种姿态识别模型的训练方法,其特征在于,包括:10. 
10. A training method for a posture recognition model, characterized by comprising:
acquiring a sample image;
performing feature extraction on the sample image by using a posture recognition model to obtain image sample features and key point sample features, and obtaining posture sample features based on the image sample features and the key point sample features;
recognizing the image sample features to obtain a first predicted posture category;
recognizing the posture sample features to obtain a posture recognition sample result, the posture recognition sample result comprising a second predicted posture category;
adjusting parameters of the posture recognition model by using the first predicted posture category and a reference posture category annotated for the sample image, the key point sample features and reference key point information annotated for the sample image, and the second predicted posture category and the reference posture category.

11. A posture recognition device, characterized by comprising:
an acquisition module, configured to acquire an image to be recognized that contains a target;
a feature extraction module, configured to perform feature extraction on the image to be recognized by using a posture recognition model to obtain at least image features and key point features;
a posture recognition module, configured to perform recognition based on at least the obtained image features and key point features to obtain a posture recognition result.

12. A model training device, characterized by comprising:
a sample acquisition module, configured to acquire a sample image;
a sample feature extraction module, configured to perform feature extraction on the sample image by using a posture recognition model to obtain image sample features and key point sample features, and to obtain posture sample features based on the image sample features and the key point sample features;
an image feature recognition module, configured to recognize the image sample features to obtain a first predicted posture category;
a sample posture recognition module, configured to recognize the posture sample features to obtain a posture recognition sample result, the posture recognition sample result comprising a second predicted posture category;
an adjustment module, configured to adjust parameters of the posture recognition model by using the first predicted posture category and a reference posture category annotated for the sample image, the key point sample features and reference key point information annotated for the sample image, and the second predicted posture category and the reference posture category.
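To make the loss routing of claims 8 to 10 concrete, the following is a minimal training-step sketch for the model above. The choice of cross-entropy for the category losses, mean-squared error for the keypoint loss, and the particular weights w2 and w4 are assumptions for illustration; the claims only state which losses are computed and how they are combined.

```python
# One illustrative training step for the training method of claims 8-10.
# Loss functions (cross-entropy / MSE) and the fusion weights are assumptions;
# the claims only specify which losses exist and how they are combined.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, ref_keypoints, ref_category,
                  w2=0.5, w4=1.0):
    cls_logits, keypoint_feat, rec_logits = model(image)

    # First loss: first predicted posture category vs. annotated reference category.
    loss1 = F.cross_entropy(cls_logits, ref_category)
    # Second loss: keypoint sample features vs. annotated reference keypoint information.
    loss2 = F.mse_loss(keypoint_feat, ref_keypoints.flatten(1))
    # Third loss: second predicted posture category vs. reference category.
    loss3 = F.cross_entropy(rec_logits, ref_category)

    # Claim 9 variant: fuse the three losses into a fourth loss, then take a
    # weighted fusion of the second and fourth losses as the comprehensive loss.
    loss4 = loss1 + loss2 + loss3
    total = w2 * loss2 + w4 * loss4

    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```

In this sketch the first loss only influences the backbone, the image branch and the classification head, the second loss only the backbone and the keypoint branch, and the third loss every module except the classification head, which mirrors the per-module parameter adjustment listed in claim 8; the comprehensive loss follows the claim 9 variant, where the second loss is additionally weighted on top of the fused fourth loss.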
13. An electronic device, characterized by comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the posture recognition method according to any one of claims 1 to 9, and/or the training method of the posture recognition model according to claim 10.

14. A computer-readable storage medium having program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the posture recognition method according to any one of claims 1 to 9, and/or the training method of the posture recognition model according to claim 10.
CN202311436504.3A 2023-10-30 2023-10-30 A gesture recognition method and its device, equipment and storage medium Pending CN119964191A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311436504.3A CN119964191A (en) 2023-10-30 2023-10-30 A gesture recognition method and its device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311436504.3A CN119964191A (en) 2023-10-30 2023-10-30 A gesture recognition method and its device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN119964191A (en) 2025-05-09

Family

ID=95588324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311436504.3A Pending CN119964191A (en) 2023-10-30 2023-10-30 A gesture recognition method and its device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119964191A (en)

Similar Documents

Publication Publication Date Title
CN107609383B (en) 3D face identity authentication method and device
CN107748869B (en) 3D face identity authentication method and device
CN108829900B (en) Face image retrieval method and device based on deep learning and terminal
CN112597941A (en) Face recognition method and device and electronic equipment
CN112818722B (en) Modular dynamic configurable living body face recognition system
Seow et al. Neural network based skin color model for face detection
CN104599287B (en) Method for tracing object and device, object identifying method and device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
US20170270355A1 (en) Method and Apparatus for Pattern Tracking
CN106897658A (en) The discrimination method and device of face live body
CN108229375B (en) Method and device for detecting face image
CN112257617B (en) Multi-modal target recognition method and system
CN118411602A (en) An improved YOLOv8 forest fire detection method
KR20220160303A (en) System for Analysing and Generating Gaze Detection Data by Gaze Detection of User
CN114863405B (en) A clothing identification method, device, terminal and storage medium
CN119152581B (en) Pedestrian re-identification method, device and equipment based on multi-mode semantic information
Yuan et al. Ear detection based on CenterNet
WO2025077282A1 (en) Liveness detection method and apparatus, computer device, and storage medium
CN113657155A (en) Behavior detection method and device, computer equipment and storage medium
CN114694243A (en) Fall detection method and device, electronic equipment and storage medium
CN119964191A (en) A gesture recognition method and its device, equipment and storage medium
CN104751144A (en) Frontal face quick evaluation method for video surveillance
CN115116136A (en) A kind of abnormal behavior detection method, device and medium
CN114511877A (en) Behavior recognition method and device, storage medium and terminal
CN113822222A (en) Human face anti-cheating method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination