
CN113064483A - A gesture recognition method and related device - Google Patents

A gesture recognition method and related device

Info

Publication number
CN113064483A
Authority
CN
China
Prior art keywords
gesture
target
types
circling
gesture recognition
Prior art date
Legal status
Pending
Application number
CN202110221938.6A
Other languages
Chinese (zh)
Inventor
夏朝阳
丁根明
徐丰
Current Assignee
Huawei Technologies Co Ltd
Fudan University
Original Assignee
Huawei Technologies Co Ltd
Fudan University
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Fudan University filed Critical Huawei Technologies Co Ltd
Priority to CN202110221938.6A
Publication of CN113064483A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract


The embodiment of the present application discloses a gesture recognition method, which relates to the fields of artificial intelligence and gesture recognition. The method includes: acquiring a current target application scenario; acquiring, from a plurality of gesture recognition models used for recognizing gestures in different application scenarios, a target gesture recognition model corresponding to the target application scenario; and recognizing, through the target gesture recognition model, the target gesture type corresponding to the gesture data. In the present application, a target gesture recognition model is trained separately for the target application scenario. The target gesture recognition model is used to recognize the gesture types that need to be recognized in the target application scenario; it only needs to recognize those gesture types correctly and does not need to recognize gesture categories outside them, which reduces the confusion of gesture recognition and improves its accuracy.

Figure 202110221938

Description

Gesture recognition method and related device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a gesture recognition method and related apparatus.
Background
With the rapid development of smart homes and wearable devices in recent years, computing devices are ubiquitous and integrated into people's living environments. To make interaction between people and computing devices easier, an interaction mode is needed that is more natural than the traditional contact-based human-computer interaction and that frees the user, as much as possible, from dependence on input devices, such as the air gesture. The air gesture is a non-contact gesture that lets the user operate with bare hands, and it is a natural human-computer interaction mode that imposes no inconvenience on the user's gesture interaction.
The air gesture expresses the user's interaction intention naturally through movements of the fingers, wrist and arm, mainly including finger swinging, fist making and palm rotation, and offers a wider interaction space, higher flexibility and a better interaction experience. Depending on the sensing device, gesture recognition technologies mainly fall into three types: those based on computer vision, on ultrasonic waves, and on electromagnetic wave signals.
In existing implementations, gesture types can be recognized by a pre-trained neural network model: the model obtains the data acquired by a radar device, performs feature extraction and segmentation, and recognizes the gesture category. However, as the demand for gesture control grows, the number of gesture types keeps increasing and the distinction between different gesture types becomes smaller and smaller, so accurate gesture type recognition becomes relatively difficult, and gesture misrecognition often occurs when gesture types are recognized with the existing neural network model.
Disclosure of Invention
In a first aspect, the present application provides a gesture recognition method applied to a computing device that serves a plurality of gesture-controlled application scenes, where the gesture types to be recognized by the plurality of gesture-controlled application scenes are not exactly the same. The plurality of gesture-controlled application scenes may be different application interfaces in the same application (for example, an application scene may be an audio playing interface, a video playing interface, or an application navigation interface), different applications on the same terminal device (for example, an application scene may be an audio playing application, a chat application, a video playing application, or a game application), or different terminal devices in the same physical area (for example, an application scene may be a terminal device such as a television, a game console, or an air conditioner).
The method comprises the following steps:
acquiring a target application scene; and acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each of the plurality of gesture recognition models corresponds to one type of application scene and is used to recognize the gesture types that need to be recognized in its corresponding type of application scene;
in this embodiment of the application, a corresponding gesture recognition model is trained for each type of application scene, and the gesture recognition model corresponding to each type of application scene can recognize the gesture types that the application scene needs to recognize.
Acquiring gesture data in the target application scene; and identifying a target gesture type corresponding to the gesture data through the target gesture identification model.
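Purely as an illustration of the per-scene model selection described above, the sketch below organizes the models as a registry keyed by application scene. The scene names, model files and the GestureModel/predict helpers are hypothetical placeholders, not part of this application.

```python
# Hypothetical sketch of the scene -> model dispatch described above.
# Scene identifiers, file names and the GestureModel API are assumptions.
from typing import Dict

class GestureModel:
    def __init__(self, path: str):
        self.path = path  # e.g. weights of a small scene-specific network

    def predict(self, gesture_data) -> str:
        # Placeholder: run the scene-specific classifier on the gesture data.
        raise NotImplementedError

# One model per type of application scene, each trained only on the
# gesture types that its scene needs to recognize.
MODEL_REGISTRY: Dict[str, GestureModel] = {
    "video_playing_interface": GestureModel("video_gestures.bin"),
    "audio_playing_interface": GestureModel("audio_gestures.bin"),
    "app_navigation_interface": GestureModel("nav_gestures.bin"),
}

def recognize(target_scene: str, gesture_data) -> str:
    """Select the target gesture recognition model and classify the gesture."""
    model = MODEL_REGISTRY[target_scene]   # acquire the target model
    return model.predict(gesture_data)     # identify the target gesture type
```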
In this embodiment, the gesture data may be obtained by processing a reflected signal of a received radar signal, where the category of the radar signal may include, but is not limited to, a Continuous Wave (CW) signal and a chirp signal.
In the existing implementation, gesture types can be recognized by a pre-trained neural network model: the model obtains the data acquired by a radar device, performs feature extraction and segmentation, and recognizes the gesture category. To recognize all gesture types required across the various application scenes, the neural network model must be able to recognize all of those gesture types. The model therefore needs to recognize a relatively large number of gesture types (for example, Y gesture types). However, if the target application scene only needs to respond to a subset of them (for example, M gesture types, with M smaller than Y), then while the target application scene is active the model only needs to recognize those M gesture types correctly. Performing gesture recognition with the existing implementation therefore has the following problems:
On the one hand, gesture misrecognition may occur.
For example, suppose the Y gesture types contain two very similar gestures, such as a circle-drawing gesture and an X-drawing gesture, while the M gesture types include only one of them, for example the circle-drawing gesture but not the X-drawing gesture (or vice versa). If the neural network model must be able to recognize both of these very similar gestures, it has to distinguish between them. Because the similarity between the gestures is very high, during training the model is very likely to fall into local optima when its parameters are updated, and it may confuse the two gestures, for example recognizing a circle-drawing gesture as an X-drawing gesture, or an X-drawing gesture as a circle-drawing gesture. If the target application scene only needs to respond to the circle-drawing gesture and not to the X-drawing gesture, and the user performs an X-drawing gesture, the neural network model may recognize it as a circle-drawing gesture, resulting in gesture misrecognition.
On the other hand, the original neural network model may not be able to recognize newly added gesture types.
Because the neural network model capable of recognizing the Y gesture types is pre-trained, as the number of application scenes grows and gesture types become more varied, some application scenes will require responses to gesture types that are not among the Y gesture types. For example, when the terminal device installs new applications that respond to gesture types outside the Y gesture types, the pre-trained model cannot be used directly. In that case, the original neural network model must be trained further, or a new model must be retrained to recognize both the original Y gesture classes and the newly added gesture classes. Because the number of gesture categories to be recognized is large, the model has many parameters, and the training process requires substantial computation and time. In addition, if a newly added gesture category is highly similar to one or more of the original Y gesture categories, the recognition confusion described above may also occur. However, if a target gesture recognition model is trained separately for a target application interface and is used only to recognize the M gesture types, then, taking the example in which the M gesture types include a circle-drawing gesture but not an X-drawing gesture, the model does not need to distinguish the circle-drawing gesture from the X-drawing gesture; it only needs to recognize the circle-drawing gesture correctly. Since the X-drawing gesture never has to be recognized, the model converges quickly during parameter updates and can recognize the circle-drawing gesture accurately. Moreover, because M is far smaller than Y, the model has few parameters, and the computation and time required for training are much less than for training a neural network model that recognizes all Y gesture types. When a new application scene appears (that is, no model that can recognize the gesture categories required by the new scene is stored on the terminal device or the cloud-side server), only a neural network model that recognizes the gesture categories required by that new scene needs to be trained.
In this embodiment, a target gesture recognition model is trained separately for the target application scene. The target gesture recognition model is used to recognize the gesture types that the target application scene needs to recognize; it only has to recognize those gesture types correctly and does not need to recognize any other gesture categories. As a result, the neural network model converges quickly during parameter updates, the confusion in gesture recognition is reduced, and the accuracy of gesture recognition is improved. Moreover, because the number of gesture categories to be recognized is small, the model has few parameters, and the computation and time required for training are small. In addition, when a new application scene appears, only a neural network model that recognizes the gesture categories required by that new scene needs to be trained.
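As a rough illustration of why a per-scene model can stay small, the sketch below trains a classifier on only the M gesture types of one scene. The feature layout, class names and the use of a scikit-learn MLP are assumptions for illustration, not the training procedure of this application.

```python
# Hypothetical per-scene training sketch; feature layout and the choice of
# an MLP classifier are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Suppose each training sample is a fixed-length feature vector extracted from
# radar gesture data, labeled only with the M types this scene must recognize.
M_CLASSES = ["wave_left", "wave_right", "draw_circle"]   # M = 3, assumed names
X_train = np.random.randn(300, 64)                       # placeholder features
y_train = np.random.choice(M_CLASSES, size=300)          # placeholder labels

# A small model suffices because it never has to separate gestures outside
# the M types (e.g. it need not tell a circle from an "X" it will never see).
scene_model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
scene_model.fit(X_train, y_train)
```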
In one possible implementation, the target application scenario requires the recognition of M gesture types; the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model is used for recognizing corresponding gesture types according to the acquired gesture data in a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
The computing device may obtain the gesture recognition model corresponding to the target application scene (the target gesture recognition model) from a local storage or a cloud storage. The local storage or the cloud storage may store a plurality of gesture recognition models, where each gesture recognition model corresponds to one type of application scene, different types of application scenes require different gesture categories to be recognized, and each gesture recognition model can recognize the gesture categories required by its corresponding application scene. Specifically, the plurality of gesture recognition models further include a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to respond to N gesture types; the first gesture recognition model is used to recognize the N gesture types that the first application scene needs to recognize, and the M gesture types are not identical to the N gesture types. In one implementation, the M gesture types are part of the N gesture types, or the N gesture types are part of the M gesture types, or none of the M gesture types belongs to the N gesture types (that is, the intersection of the M gesture types and the N gesture types is an empty set), or the M and N gesture types share some gesture types while the M gesture types also contain gesture types not included in the N gesture types.
In one possible implementation, the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and the M gesture types include the first gesture and do not include the second gesture.
Here, the similarity between the first gesture and the second gesture being greater than a preset value means that the two gestures are very similar; specifically, it can be understood as their trajectories being similar. If the application scene is not sensitive to the amplitude of the gesture trajectory, that is, however large the user makes the gesture, gestures whose trajectory shapes are similar are treated as the same kind of gesture and trigger the same response, then a similarity greater than the preset value can be understood as the trajectory shapes being similar. For example, any two of a large-amplitude circle-drawing gesture, a small-amplitude circle-drawing gesture, a large-amplitude X-drawing gesture and a small-amplitude X-drawing gesture may be regarded as gestures whose similarity is greater than the preset value. If the application scene is sensitive to the trajectory amplitude of the gesture, that is, gestures of the same shape but different amplitudes are treated as different kinds of gestures and recognized differently, then a similarity greater than the preset value can be understood as the trajectory shapes being similar and the trajectory amplitudes being similar; for example, a circle-drawing gesture and an X-drawing gesture of similar amplitude may be regarded as gestures whose similarity is greater than the preset value.
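One possible way to picture the "similarity greater than a preset value" criterion is as a distance between normalized trajectories, with an amplitude-insensitive option for scenes that ignore trajectory amplitude. The normalization, the score formula and the threshold below are assumptions for illustration only.

```python
# Illustrative (assumed) trajectory-similarity check between two gestures.
import numpy as np

def trajectory_similarity(traj_a: np.ndarray, traj_b: np.ndarray,
                          amplitude_sensitive: bool = False) -> float:
    """Return a similarity in (0, 1]; trajectories are (T, 2) arrays of angles."""
    a = traj_a - traj_a.mean(axis=0)
    b = traj_b - traj_b.mean(axis=0)
    if not amplitude_sensitive:
        # Scene ignores amplitude: rescale both tracks to unit size first.
        a = a / (np.abs(a).max() + 1e-9)
        b = b / (np.abs(b).max() + 1e-9)
    dist = np.linalg.norm(a - b) / len(a)
    return 1.0 / (1.0 + dist)

PRESET_VALUE = 0.9  # assumed threshold: above this, gestures count as "very similar"
```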
In one possible implementation, when the first gesture recognition model processes the gesture data, it determines that the gesture data corresponds to a first gesture type or that the gesture data is unrecognizable, where the first gesture type is different from the target gesture type.
In some implementations, the recognition result for the same gesture data may differ between application scenes (different gesture categories may be recognized, or no gesture may be recognized at all). For example, for interface A, when the user performs a circling gesture, the circling gesture needs to be recognized and responded to, while for interface B the same circling gesture requires no response. As another example, interface A may need to distinguish the amplitude of the user's circling gesture and recognize it differently: when the user performs a large-amplitude circling gesture, that gesture must be recognized and its corresponding response made, and when the user performs a small-amplitude circling gesture, that gesture must be recognized and its corresponding response made; for interface B, when the user performs a circling gesture of any amplitude, only the circling gesture itself needs to be recognized and the same response made. By classifying the models according to application scenes, the same gesture can be recognized with different results for different applications.
In one possible implementation, the plurality of gesture recognition models are neural network models.
In one possible implementation, the training samples of the target gesture recognition model include gesture data with sample labels of M gesture types;
the training sample of the first gesture recognition model comprises gesture data with sample labels of N gesture types.
In one possible implementation, the target gesture type is a circle-drawing gesture, the method further comprising: determining a motion characteristic of the circling gesture based on the gesture data, the motion characteristic comprising an angular change of the circling gesture over time in an azimuth angle or a pitch angle; acquiring the number of wave crests and wave troughs in the angle change; and determining the circling times of the circling gesture according to the number of the wave crests or the wave troughs.
The target gesture type can be recognized as a circling gesture through the target gesture recognition model, and the circling times of the circling gesture can be determined based on the motion characteristics of gesture data.
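A minimal sketch of the crest/trough counting is shown below, using SciPy's peak finder on the azimuth (or pitch) angle sequence; the sampling layout and the absence of height/width thresholds are simplifying assumptions, not prescribed by this application.

```python
# Assumed sketch: count circles from the wave crests and wave troughs of the
# azimuth (or pitch) angle sequence of a circling gesture.
import numpy as np
from scipy.signal import find_peaks

def count_circles(angle_over_time: np.ndarray) -> int:
    crests, _ = find_peaks(angle_over_time)     # wave crests
    troughs, _ = find_peaks(-angle_over_time)   # wave troughs
    # One crest to the next adjacent crest corresponds to one circle,
    # and likewise for troughs; take the larger of the two counts.
    circles_from_crests = max(len(crests) - 1, 0)
    circles_from_troughs = max(len(troughs) - 1, 0)
    return max(circles_from_crests, circles_from_troughs)

# Example: three sine-like periods of the azimuth trace.
t = np.linspace(0, 3 * 2 * np.pi, 300)
print(count_circles(30 * np.sin(t)))  # 2: three crests -> two crest-to-crest intervals
```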
In another implementation, second gesture data in the target application scene may be acquired; motion characteristics of a third gesture are determined according to the second gesture data, where the motion characteristics include the angular change of the third gesture over time in the azimuth angle or the pitch angle; and if a preset condition is met, the gesture type of the third gesture is determined to be a circling gesture. The preset condition includes that the difference between the times corresponding to a first wave crest and a second wave crest in the angular change is larger than a target time threshold, that at least one of the azimuth angle or the pitch angle corresponding to the first wave crest and the second wave crest is larger than a target angle threshold, and that the deviation of the angular change from a sine wave is within a preset range, where the first wave crest and the second wave crest are adjacent wave crests in the angular change.
In this embodiment, whether the gesture type is a circling gesture is determined directly by analyzing the motion characteristics of the gesture data, without a neural network, which greatly reduces the required time overhead and allows the circling gesture type to be determined quickly.
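For illustration, the preset condition in the preceding paragraphs might be checked roughly as below. The concrete thresholds, the frame rate and the way the sine-wave deviation is measured are all assumptions, not values taken from this application.

```python
# Assumed sketch of the rule-based (non-neural-network) circling check:
# adjacent crest spacing above a time threshold, crest angle above an angle
# threshold, and the angle curve close enough to a sine wave.
import numpy as np
from scipy.signal import find_peaks

def looks_like_circling(angle: np.ndarray, frame_rate: float,
                        time_thresh_s: float = 0.3,       # assumed
                        angle_thresh_deg: float = 20.0,   # assumed
                        max_sine_dev: float = 0.3) -> bool:  # assumed
    crests, _ = find_peaks(angle)
    if len(crests) < 2:
        return False
    first, second = crests[0], crests[1]                  # adjacent wave crests
    if (second - first) / frame_rate <= time_thresh_s:    # crest spacing too short
        return False
    if max(angle[first], angle[second]) <= angle_thresh_deg:
        return False
    # Compare the crest-to-crest segment against one cosine period of the same span.
    seg = angle[first:second + 1]
    amp = (seg.max() - seg.min()) / 2.0
    mean = (seg.max() + seg.min()) / 2.0
    ref = mean + amp * np.cos(np.linspace(0, 2 * np.pi, len(seg)))
    deviation = np.mean(np.abs(seg - ref)) / (2 * amp + 1e-9)
    return deviation <= max_sine_dev
```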
In some scenes, the application scene may need to recognize certain periodic gestures and perform an application operation based on the category of the periodic gesture and some of its motion characteristics, where the application operation is related to the motion characteristics of the gesture. For example, the video playing interface may recognize a circling gesture and perform a volume-adjustment operation, where the volume adjustment is related to the number of circles drawn and the adjustment process is continuous and smooth, for example adjusting the volume by one unit for every 30 degrees drawn. In this case, real-time gesture data (specifically, angle data of the gesture motion, such as the azimuth angle and the pitch angle) needs to be acquired so that the recognized smooth application operation can be performed. If a neural network model is still used for gesture recognition here, only the gesture category can be obtained, and the motion data of the gesture cannot be acquired in real time.
In one possible implementation, the method further comprises: acquiring the number of wave crests and wave troughs in the angle change;
and determining the circling times of the circling gesture according to the number of the wave crests and the wave troughs.
Specifically, to obtain the number of circles drawn by the circling gesture, M wave crests and N wave troughs of the angular change may be obtained, and the number of circles drawn by the circling gesture is determined based on the M wave crests and the N wave troughs, where the interval from one wave crest to the next adjacent wave crest corresponds to one circle, and the interval from one wave trough to the next adjacent wave trough corresponds to one circle.
In one possible implementation, the N wave troughs include a first wave trough and a second wave trough, the motion characteristic includes the angular change over time in the azimuth angle or the pitch angle of the target gesture within a target time window, the second wave trough corresponds to the last frame within the target time window, and the difference between the second wave trough and the first wave trough is smaller than a preset angle; or,
the M wave crests include a first wave crest and a second wave crest, the motion characteristic includes the angular change over time in the azimuth angle or the pitch angle of the target gesture within a target time window, the second wave crest corresponds to the last frame within the target time window, and the difference between the second wave crest and the first wave crest is smaller than a preset angle.
Taking the preset angle equal to 1/10 of the wave crest height as an example, in this embodiment, peak searching may be performed within a sliding window on the one-dimensional (1D) time-domain features of the azimuth angle and the pitch angle as they change over time, to obtain parameters such as the number of wave crests, the wave crest positions, the wave crest heights and the wave crest widths of the features. Feature segmentation is performed using the peak-searching result, and periodic-gesture determination is performed based on peak searching and harmonic-distortion calculation. After a periodic gesture (for example, the circling gesture mentioned above) is determined, feature segmentation, periodic-gesture recognition and counting of the periodic gesture may be performed according to the peak-searching result.
In one possible implementation, the angular change over time in the azimuth angle or the pitch angle comprises a first angular change over time in the azimuth angle of the circling gesture and a second angular change over time in the pitch angle of the circling gesture; the method further comprises: determining the circling direction of the circling gesture according to the first angular change and the second angular change.
Specifically, the first angular change includes a first sub-angular change and a second sub-angular change, and the second angular change includes a third sub-angular change and a fourth sub-angular change, where the first sub-angular change and the third sub-angular change occur in the same time period, and the second sub-angular change and the fourth sub-angular change occur in the same time period. The third sub-angular change is the angular change from one wave crest to the adjacent wave trough in the second angular change, and the fourth sub-angular change is the angular change from one wave trough to the adjacent wave crest in the second angular change. The circling direction of the target gesture is determined to be clockwise based on the first sub-angular change increasing first and then decreasing while the second sub-angular change decreases first and then increases; the circling direction is determined to be anticlockwise based on the first sub-angular change decreasing first and then increasing while the second sub-angular change increases first and then decreases. In other words, clockwise circling is characterized in that the azimuth angle increases first and then decreases from a pitch-angle wave crest to the adjacent wave trough, and decreases first and then increases from a pitch-angle wave trough to the adjacent wave crest; anticlockwise circling is characterized in that the azimuth angle decreases first and then increases from a pitch-angle wave crest to the adjacent wave trough, and increases first and then decreases from a pitch-angle wave trough to the adjacent wave crest.
In a second aspect, the present application provides a gesture recognition apparatus, which is applied to a plurality of gesture-controlled application scenarios, where types of gestures to be recognized are not identical, the apparatus including:
the acquisition module is used for acquiring a target application scene;
the obtaining module is further configured to obtain a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used to identify a gesture type to be identified in the corresponding one type of application scene;
the acquisition module is further used for acquiring gesture data in the target application scene;
and the gesture recognition module is used for recognizing the target gesture type corresponding to the gesture data based on the target gesture recognition model.
In one possible implementation, the target application scenario requires the recognition of M gesture types;
the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
In one possible implementation, the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and the M gesture types include the first gesture and do not include the second gesture.
In one possible implementation, the plurality of gesture recognition models are neural network models.
In one possible implementation, the training samples of the target gesture recognition model include gesture data with sample labels of M gesture types;
the training sample of the first gesture recognition model comprises gesture data with sample labels of N gesture types.
In a possible implementation, the obtaining module is further configured to obtain gesture data in the target application scenario, specifically:
the acquisition module is further used for acquiring radar reflection information in the target application scene;
the acquisition module is further used for acquiring gesture data based on the radar reflection information.
In one possible implementation, the gesture recognition module is further configured to:
determining motion features of a target gesture based on the gesture data, the motion features including angular variation of the target gesture over time in azimuth or pitch;
the acquisition module is further used for acquiring the number of wave crests and wave troughs in the angle change;
the device further comprises:
a circling number determination module configured to: and determining the circling times of the circling gesture according to the number of the wave crests and the wave troughs.
In one possible implementation, the angular change in azimuth or pitch over time comprises a first angular change in azimuth of the circling gesture over time and a second angular change in pitch over time of the target gesture;
the device further comprises: a circling direction determination module to:
and determining the circling direction of the circling gesture according to the first angle change and the second angle change.
In a third aspect, an embodiment of the present application provides a gesture recognition apparatus, including: one or more processors and memory; wherein the memory has stored therein computer readable instructions; the one or more processors read the computer-readable instructions to cause the computer apparatus to implement the first aspect and any optional method thereof as described above.
In a fourth aspect, the present application provides a computer-readable storage medium, which is characterized by comprising computer-readable instructions, when the computer-readable instructions are executed on a computer device, the computer device is caused to execute the first aspect and any optional method thereof.
In a fifth aspect, the present application provides a computer program product, which is characterized by comprising computer readable instructions, when the computer readable instructions are executed on a computer device, the computer device is caused to execute the first aspect and any optional method thereof.
In a sixth aspect, an embodiment of the present application provides a terminal device, including:
a radar device for transmitting a radar signal; receiving a reflected signal of the radar signal;
a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and perform the method of the first aspect and any of the alternatives described above to obtain a target gesture type;
and the application operation module is used for executing corresponding application operation based on the target gesture type.
In a seventh aspect, the present application provides a chip system, which includes a processor configured to support an execution device or a training device in implementing the functions recited in the above aspects, for example, to transmit or process the data or information recited in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
An embodiment of the application provides a gesture recognition method applied to a computing device, where the computing device serves a plurality of gesture-controlled application scenes and the gesture types to be recognized by the plurality of gesture-controlled application scenes are not exactly the same. The method includes: acquiring a target application scene; acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each of the plurality of gesture recognition models corresponds to one type of application scene and is used to recognize the gesture types that need to be recognized in its corresponding type of application scene; acquiring gesture data in the target application scene; and recognizing a target gesture type corresponding to the gesture data based on the target gesture recognition model. In this embodiment, a target gesture recognition model is trained separately for the target application scene. The target gesture recognition model is used to recognize the gesture types that the target application scene needs to recognize; it only has to recognize those gesture types correctly and does not need to recognize any other gesture categories, so the neural network model converges quickly during parameter updates, the confusion in gesture recognition is reduced, and the accuracy of gesture recognition is improved. Moreover, because the number of gesture categories to be recognized is small, the model has few parameters, and the computation and time required for training are small. In addition, when a new application scene appears, only a neural network model that recognizes the gesture categories required by that new scene needs to be trained.
Drawings
Fig. 1a is a scene schematic provided in an embodiment of the present application;
Fig. 1b is a scene schematic provided in an embodiment of the present application;
Fig. 1c is a scene schematic provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a radar apparatus provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of an embodiment of a gesture recognition method provided in an embodiment of the present application;
Fig. 6 is an application interaction diagram provided in an embodiment of the present application;
Fig. 7a is an application interaction diagram provided in an embodiment of the present application;
Fig. 7b is an application interaction diagram provided in an embodiment of the present application;
Fig. 8 is an application interaction diagram provided in an embodiment of the present application;
Fig. 9 is an application interaction diagram provided in an embodiment of the present application;
Fig. 10 is an application interaction diagram provided in an embodiment of the present application;
Fig. 11 is a schematic diagram of a 3D point cloud provided in an embodiment of the present application;
Fig. 12 is a schematic diagram of the time-domain amplitude variation of a chirp signal;
Fig. 13 is a schematic diagram of a frame signal including K chirp signals;
Fig. 14 is a schematic diagram of a process for generating a range-Doppler spectrum provided in an embodiment of the present application;
Fig. 15 is a schematic diagram of a 3D point cloud provided in an embodiment of the present application;
Fig. 16 is a schematic diagram of an angular change of a gesture provided in an embodiment of the present application;
Fig. 17 is a schematic diagram of an angular change of a gesture provided in an embodiment of the present application;
Fig. 18 is a schematic diagram of an angular change of a gesture provided in an embodiment of the present application;
Fig. 19 is a gesture recognition rate diagram provided in an embodiment of the present application;
Fig. 20 is a gesture recognition rate diagram provided in an embodiment of the present application;
Fig. 21 is a schematic structural diagram of a gesture recognition apparatus provided in an embodiment of the present application;
Fig. 22 is a schematic structural diagram of a gesture recognition apparatus provided in an embodiment of the present application;
Fig. 23 is a schematic structural diagram of a server provided in an embodiment of the present application;
Fig. 24 is a schematic structural diagram of a chip provided in an embodiment of the present application.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems. First, an application scenario of the embodiment of the present application is introduced:
the embodiment of the application can be applied to gesture control application scenes.
In one implementation, the application scene controlled by the gesture can be different applications (such as an audio playing application, a chat application, a video playing application or a game application) of the same terminal device; in one implementation, the gesture-controlled application scenes may be application interfaces (e.g., audio playing interfaces, video playing interfaces, application navigation interfaces, etc.) in the same application; in one implementation, the application scenarios of gesture control may be different terminal devices (e.g., television, game console, air conditioner, etc.) in the same physical area.
When performing gesture recognition, a radar apparatus on the terminal device may transmit a radar signal and receive the reflected signal of that radar signal. If the reflected signal is the reflection produced by a user's gesture, it may be referred to as gesture data; the gesture data may be the reflected signal itself or data obtained by performing certain signal processing (for example, analog-to-digital conversion) on the reflected signal. By processing the gesture data, the gesture type can be obtained, and the operation corresponding to the gesture type can then be performed on the terminal.
In one implementation, a computing device (e.g., a processor) may be deployed on the terminal device, and the computing device may perform a process of processing the gesture signals.
In one implementation, the terminal device may send the gesture data to other terminal devices (for convenience of description, it may be referred to as a first device) deployed with the computing device, the first device may perform a process of processing the gesture signal, and after the first device obtains the gesture category, the gesture category may be sent to the terminal device.
In one implementation, the process of processing the gesture signal may be performed by a computing device (e.g., a server) on the cloud side, and after the computing device on the cloud side obtains the gesture category, the gesture category may be sent to the terminal device.
For two cases where the computing device is deployed in the terminal device and the server, the following description is respectively given:
in one application architecture, the computing device is deployed in a terminal device, and the terminal device may also be deployed with a radar apparatus.
Taking an example that a computing device is deployed on a terminal device, please refer to fig. 1a, where fig. 1a is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1a, the computing device 101 may be integrated in a television, and the computing device 101 may also be integrated in other terminal devices (for example, a computer, a smart phone, or a smart watch, and other products that require a user to perform contactless human-computer interaction).
The terminal device 100 in fig. 1a may be provided with a radar apparatus 102. In the scenario shown in FIG. 1a, a user waves a hand in the surveillance area of radar device 102, which forms a motion trajectory in a spatial coordinate system (such as, but not limited to, an x-y-z spatial rectangular coordinate system) as shown in FIG. 1 a. The radar device 102 may transmit a radar signal to the human body and receive a reflected signal of the radar signal.
Depending on the implementation of the radar apparatus 102, the radar signal may have a variety of carriers, such as: when the radar device 102 is a microwave radar, the radar signal is a microwave signal; when the radar device 102 is an ultrasonic radar, the radar signal is an ultrasonic signal; when the radar device 102 is a lidar, the radar signal is a laser signal. It should be noted that, when the radar apparatus 102 is integrated with a plurality of different radars, the radar signal may be a collection of a plurality of radar signals, which is not limited herein.
After receiving the reflected signal, the radar apparatus 102 may process the reflected signal to determine gesture data, and determine a gesture type from the gesture data by the computing device 101, or directly send the reflected signal to the computing device 101, and determine the gesture type from the reflected signal by the computing device 101.
In one application architecture, the computing device may be a processor deployed in a terminal device.
Next, taking the computing device 101 as a processor as an example, the architecture of the terminal device in fig. 1a is described, referring to fig. 2, and fig. 2 is a schematic architecture diagram of a terminal device 100 according to an embodiment of the present disclosure. Therein, the terminal device 100 may generate and transmit radar signals into the area that the terminal device 100 is monitoring. The generation and transmission of signals may be accomplished by a Radio Frequency (RF) signal generator 12, a radar transmit circuit 14, and a transmit antenna 32. Radar transmit circuitry 14 generally includes any circuitry required to generate signals for transmission via transmit antenna 32, such as pulse shaping circuitry, transmit trigger circuitry, RF switching circuitry, or any other suitable transmit circuitry used by terminal device 100. The RF signal generator 12 and the radar transmission circuit 14 may be controlled via a processor 20 which issues command and control signals via control lines 34 so that a desired RF signal having a desired configuration and signal parameters is transmitted at the transmission antenna 32.
Terminal device 100 also receives a return radar signal, which may be referred to as an "echo" or "echo signal" or "reflected signal," at analog processing circuitry 16 via receive antenna 30. Analog processing circuitry 16 generally includes any circuitry required to process signals received via receive antenna 30 (e.g., signal splitting, mixing, heterodyning and/or homodyne conversion, amplification, filtering, receive signal triggering, signal switching and routing, and/or any other suitable radar signal receiving function performed by terminal device 100). Thus, analog processing circuit 16 generates one or more analog signals, e.g., an in-phase (I) analog signal and a quadrature (Q) analog signal, that are processed by terminal device 100. The resulting analog signal is transmitted to and digitized by analog-to-digital converter circuitry (ADC) 18. The digitized signal is then forwarded to processor 20 for reflected signal processing.
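Purely as an illustration of what the digitized I/Q samples might feed downstream, the sketch below forms complex samples and takes a per-chirp range FFT; the array shapes, the DC removal and the windowing are assumptions, not the circuit described here.

```python
# Assumed downstream sketch: combine digitized I and Q samples into complex
# samples and compute a per-chirp range profile with an FFT.
import numpy as np

def range_profiles(i_samples: np.ndarray, q_samples: np.ndarray) -> np.ndarray:
    """i_samples, q_samples: shape (num_chirps, samples_per_chirp)."""
    iq = i_samples + 1j * q_samples           # complex baseband signal
    iq = iq - iq.mean(axis=1, keepdims=True)  # remove per-chirp DC (assumed step)
    window = np.hanning(iq.shape[1])
    return np.fft.fft(iq * window, axis=1)    # range FFT along fast time

# A second FFT across chirps (slow time) would give a range-Doppler spectrum,
# as referenced in Fig. 14 of this application.
```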
The processor 20 may be, among other things, the computing device 101 shown in FIG. 1 a.
The processor 20 may be any of various types of processors capable of processing the digitized received signals and controlling the RF signal generator 12 and the radar transmit circuit 14 to provide the radar operation and functionality of terminal device 100. Thus, the processor 20 may be a digital signal processor (DSP), microprocessor, microcontroller, or other such device. To perform the radar operations and functions of the terminal device 100, the processor 20 interfaces via the system bus 22 with one or more other desired circuits (e.g., one or more memory devices 24 comprised of one or more types of memory, any desired peripheral circuits 26, and any desired input/output circuits 28).
As described above, the processor 20 may interface the RF signal generator 12 and the radar transmission circuit 14 via the control line 34. In an alternative embodiment, the RF signal generator 12 and/or the radar transmit circuit 14 may be connected to the bus 22 such that they may communicate with one or more of the processor 20, the memory device 24, the peripheral circuits 26, and the input/output circuits 28 via the bus 22.
In one application architecture, a computing device is deployed on a server.
Taking an example that a computing device is deployed on a server, please refer to fig. 1b, where fig. 1b is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1b, computing device 101 may be deployed on a server.
The embodiment of the present application also applies to an interactive system between a terminal device and a server; as shown in fig. 1b, the system may include the terminal device and the server. The terminal device may be a mobile terminal, a human-computer interaction device, or a vehicle-mounted visual perception device, such as a mobile phone, a sweeping robot, an intelligent robot, an unmanned vehicle, an intelligent monitor, or an Augmented Reality (AR) wearable device.
Taking a computing device as an example of a processor in a server, the server 200 may include a processor 210 and a transceiver 220, and the transceiver 220 may be connected to the processor 210, as shown in fig. 1 c. Transceiver 220 may include a receiver and a transmitter that may be used to receive or transmit messages or data, and transceiver 220 may be a network card. The server 200 may also include an acceleration component (which may be referred to as an accelerator), which may be a network card when the acceleration component is a network acceleration component. Processor 210 may be the control center for server 200, connecting various portions of the overall server 200, such as transceiver 220, using various interfaces and lines. In the present invention, the processor 210 may be a Central Processing Unit (CPU), and optionally, the processor 210 may include one or more Processing units. The processor 210 may also be a digital signal processor, an application specific integrated circuit, a field programmable gate array, a GPU, other programmable logic device, or the like. The server 200 may further include a memory 230, the memory 230 may be used to store software programs and modules, and the processor 210 executes various functional applications and data processing of the server 200 by reading the software codes and modules stored in the memory 230.
The terminal device provided by the embodiment of the present application is described next in conjunction with the form of a product. The terminal equipment can be a television, a computer, a smart phone or a smart watch and other products which need to be subjected to non-contact human-computer interaction by a user.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a terminal device provided in this embodiment. As shown in fig. 3, the terminal device provided in this embodiment may include a sensor unit and an edge service unit. The sensor unit may include a radar apparatus and an auxiliary sensor; optionally, the radar apparatus may be the millimeter-wave radar shown in fig. 3 or another type of radar apparatus, and the auxiliary sensor may include, but is not limited to, a light sensor (ambient light sensor), a distance sensor (proximity sensor), an acceleration sensor (accelerometer), a gravity sensor (G-sensor), a Hall sensor, an ultraviolet sensor, and the like. It should be understood that the terminal device may not include an auxiliary sensor.
The edge service unit mainly comprises a computing device and a storage device. Data collected by the sensor unit can be transmitted to the edge service unit, and data processing is performed by the computing device included in the edge service unit. It should be understood that the radar apparatus in fig. 3 may be an apparatus that does not include a processor but includes radar signal transmitting and receiving components; the edge service unit may be integrated into the radar apparatus in the sensor unit to serve as the processor of the radar apparatus, or it may not be integrated into the radar apparatus and instead serve as a separate data processing apparatus.
In addition, data can be transmitted between each terminal device and the server; the terminal devices can also send processed data to the server on the cloud side. Specifically, the server can be communicatively connected with a plurality of (n) terminal devices.
Taking a terminal device as a television as an example, as shown in fig. 4, taking a gesture interaction scene of the television as an example, a display screen of the television may display an application interface, for example, in an application scene of video control, the application interface may be a video playing interface, in an application scene of audio control, the application interface may be an audio playing interface, and in an application scene of User Interface (UI) navigation, the application interface may be an application navigation interface.
In the embodiment of the application, a radar device (e.g., a millimeter wave radar) in the sensor unit may be used to detect a gesture in a scene and acquire gesture data, and the edge service unit may be used to determine a gesture type based on a type of an application scene (e.g., a type of an application interface). Or the terminal device may send the gesture data to the cloud-side server, and the cloud computing center on the cloud side (i.e., the computing device in the cloud-side server) determines the gesture type.
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Referring to fig. 5, fig. 5 is a schematic view of an embodiment of a gesture recognition method provided in an embodiment of the present application, where the gesture recognition method provided in the embodiment of the present application may be applied to a terminal device or a computing device in a server, where the terminal device may be a product that requires a user to perform contactless human-computer interaction, such as a television, a computer, a smart phone, or a smart watch. As shown in fig. 5, a gesture recognition method provided in an embodiment of the present application includes:
501. Acquiring a target application scene.
In the embodiment of the application, the computing device needs to identify and acquire the current application scene; if the computing device is deployed in the terminal device, the computing device may identify a currently running application category or a currently applied interface category of the terminal device, so as to determine a current target application scenario of the terminal device.
Taking the granularity for dividing different application scenes as an application interface as an example, the terminal device may include a display screen, the terminal device may open an application program, and correspondingly, an application interface corresponding to the opened application program may be displayed on the display screen of the terminal device.
Because the different types of application interfaces have different functions, the types of gestures to be recognized by the different types of application interfaces may be different, and taking the application interface as a video playing interface as an example, the video playing interface may respond to a swing-up gesture, a swing-down gesture, a swing-left gesture, a swing-right gesture, a swing-forward gesture (or referred to as a click operation), a swing-back gesture (or referred to as a hand-raising operation), and a circle-drawing gesture.
The video playing interface can respond to the left-swing gesture and the right-swing gesture and perform the application operation of playing-progress adjustment, where the left-swing operation corresponds to forward play-progress adjustment and the right-swing operation corresponds to backward play-progress adjustment. For example, as shown in fig. 7a, in front of the display screen displaying the video playing interface of fig. 6, the user can perform a right-swing gesture, and after the terminal device recognizes this right-swing gesture, it can perform the application operation corresponding to the right-swing gesture, that is, the application operation of backward play-progress adjustment.
It should be understood that the magnitude of the left-swing or right-swing gesture may determine the magnitude of the video-progress adjustment in the video playing interface: the greater the gesture magnitude, the greater the video-progress adjustment. The video playing interface may display the adjustment progress in real time (for example, the fast-forward progress bar shown in fig. 7a) while the user adjusts the video playing progress. When the user has adjusted to the desired video playing progress, the user may perform a hand-raising gesture (or referred to as a back-swing gesture), and the video playing interface may fast-forward the video to the adjusted video playing progress in response to the hand-raising gesture; as shown in fig. 8, the video playing interface may fast-forward the video to the adjusted video playing progress (10:03).
The video playing interface can also respond to the circling gesture and perform the application operation of adjusting the playing progress. When the user circles clockwise, the circling gesture of the upper half circle corresponds to backward play-progress adjustment and the circling gesture of the lower half circle corresponds to forward play-progress adjustment; when the user circles counterclockwise, the circling gesture of the lower half circle corresponds to backward play-progress adjustment and the circling gesture of the upper half circle corresponds to forward play-progress adjustment. For example, as shown in fig. 7b, in front of the display screen displaying the video playing interface of fig. 6, the user may perform a clockwise circling gesture; after the clockwise circling gesture is recognized, when the upper half circle is drawn clockwise, the application operation corresponding to that circling gesture is the application operation of backward play-progress adjustment.
It should be understood that the magnitude of the circling gesture can determine the size of the video-progress adjustment in the video playing interface: the larger the magnitude, the larger the video-progress adjustment. While the user adjusts the video playing progress, the video playing interface can display the adjustment progress in real time (for example, the fast-forward progress bar shown in fig. 7b). When the user has adjusted to the video playing progress he wants, the user can perform a hand-raising gesture (or referred to as a back-swing gesture), and the video playing interface can fast-forward the video to the adjusted video playing progress in response to the hand-raising gesture; as shown in fig. 8, the video playing interface can fast-forward the video to the adjusted video playing progress (10:03).
The video playing interface can respond to the upward-swing gesture and the downward-swing gesture and perform the application operation of volume adjustment, where the upward-swing gesture corresponds to turning up the volume and the downward-swing gesture corresponds to turning down the volume. For example, as shown in fig. 9, in front of the display screen displaying the video playing interface of fig. 6 and in the region near its right side, the user can perform an upward-swing gesture; after recognizing this upward-swing gesture, the terminal device can perform the application operation corresponding to the upward-swing gesture, that is, the application operation of turning up the volume. The user can also perform a downward-swing gesture; after recognizing this downward-swing gesture, the terminal device can perform the application operation corresponding to the downward-swing gesture, that is, the application operation of turning down the volume.
The video playing interface can respond to the upward-swing gesture and the downward-swing gesture and perform the application operation of brightness adjustment. For example, as shown in fig. 10, in front of the display screen displaying the video playing interface of fig. 6 and in the region near its left side, the user can perform an upward-swing gesture; after recognizing the upward-swing gesture, the terminal device can perform the application operation corresponding to the upward-swing gesture, that is, the application operation of increasing the brightness. The user can also perform a downward-swing gesture, and after recognizing the downward-swing gesture, the terminal device can perform the application operation corresponding to the downward-swing gesture, that is, the application operation of decreasing the brightness.
It should be understood that the gesture operation described above is merely an illustration, the video playing interface may respond to a part, all or none of the gestures shown above, and the application operation performed is not limited to the above example, for example, the video playing interface may perform volume adjustment instead of progress adjustment in response to a circle-drawing gesture, wherein the number of circles determines the volume adjustment size, and the direction of circles (clockwise or counterclockwise) determines the direction of audio adjustment (volume up or volume down).
In the embodiment of the application, different types of application interfaces can respond to different numbers or different types of gestures and perform corresponding application operations, and the application operations performed in response to the same gesture can also be different. For example, the video playback interface may respond to a top swing gesture, a bottom swing gesture, a left swing gesture, a right swing gesture, a forward swing gesture (or referred to as a tap operation), a back swing gesture (or referred to as a hand raise operation), a circle drawing gesture, for a total of 7 gestures, while the application navigation interface may respond to a left swing gesture, a right swing gesture, a forward swing gesture (or referred to as a tap operation), a back swing gesture (or referred to as a hand raise operation), for a total of 4 gestures.
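The correspondence between interfaces and the gesture sets they respond to can be thought of as a simple lookup table. The Python sketch below illustrates this idea only; the interface names, gesture labels, and the choice of a dictionary are illustrative assumptions, not part of the original description.

```python
# Hypothetical sketch: each application interface declares the gesture types it responds to.
# Interface and gesture names are illustrative only.
GESTURES_PER_INTERFACE = {
    "video_playing_interface": [
        "swing_up", "swing_down", "swing_left", "swing_right",
        "swing_forward", "swing_back", "circle",
    ],  # 7 gesture types
    "app_navigation_interface": [
        "swing_left", "swing_right", "swing_forward", "swing_back",
    ],  # 4 gesture types
}

def gestures_to_recognize(interface_name: str) -> list:
    """Return the gesture types the given interface needs to have recognized."""
    return GESTURES_PER_INTERFACE.get(interface_name, [])
```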
Taking the case where the granularity for dividing different application scenes is an application program as an example, the gesture categories to be recognized by different applications may be different.
Taking the granularity for dividing different application scenes as the terminal device as an example, the gesture categories to be recognized by different terminal devices may be different.
502. Acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used for recognizing the gesture types that need to be recognized in the corresponding type of application scene.
In an existing implementation, gesture types can be recognized through a pre-trained neural network model: the neural network model can take the data acquired by the radar device, perform feature extraction and segmentation, and recognize the gesture category. In order to recognize all gesture types that need to be recognized in every application scenario, the neural network model needs to have the capability of recognizing all of these gesture types. Therefore, the neural network model needs to recognize a large number of gesture types (for example, Y types of gestures). However, if the target application scenario only needs to respond to a part of the gesture types (for example, M types of gestures, where M is smaller than Y), the neural network model only needs to correctly recognize the M types of gestures when in the target application scenario, which may lead to the following problems:
on the one hand, a problem of gesture misrecognition may be caused.
For example, suppose two very similar gestures exist among the Y gesture types, such as a circle-drawing gesture and an X-drawing gesture, and the M gesture types include only one of the two, for example the circle-drawing gesture but not the X-drawing gesture (or vice versa). If the neural network model has the capability of recognizing both of these very similar gestures, it needs to distinguish them; during training, because the similarity between the gestures is very large, the neural network model has a high probability of getting stuck in local optima when its parameters are updated, and recognition of the two very similar gestures may be confused, such as recognizing a circle-drawing gesture as an X-drawing gesture, or recognizing an X-drawing gesture as a circle-drawing gesture. If the target application scene only needs to respond to the circle-drawing gesture and does not need to respond to the X-drawing gesture, and the user performs an X-drawing gesture, the neural network model may recognize it as a circle-drawing gesture, resulting in gesture misrecognition.
On the other hand, the original neural network model may not be able to recognize the new gesture type.
Because the neural network model capable of recognizing Y gesture types is pre-trained, in situations where there are many types of application scenes and many kinds of gestures, some application scenarios may emerge that require responding to gesture types not included in the Y gesture types, for example when the terminal device has newly installed applications that respond to gesture types outside the Y gesture types. The above neural network model capable of recognizing the Y gesture types then cannot be used. In this case, training must be continued on the original neural network model, or a neural network model that can recognize both the original Y gesture categories and the newly added gesture categories must be retrained; because the number of gesture categories to be recognized is large, the number of parameters of the model is large, and the training process needs much computation and time. In addition, if the similarity between a newly added gesture category and one or more gesture categories among the original Y gesture categories is large, the above-mentioned problem of recognition confusion may also occur. However, if a target gesture recognition model is trained separately for a target application interface and is used to recognize the M types of gestures, taking the example where the M types of gestures include the circle-drawing gesture but not the X-drawing gesture, the neural network model does not need to distinguish the circle-drawing gesture from the X-drawing gesture, but only needs to correctly recognize the circle-drawing gesture; since the X-drawing gesture does not need to be recognized, the neural network model can converge quickly when updating its parameters and can accurately recognize the circle-drawing gesture. Moreover, the number of the M gesture types is far smaller than that of the Y gesture types, so the parameter quantity of the model is small, and the computation and time required by the training process are much less than those required to train a neural network model capable of recognizing the Y gesture types. When some new application scene appears (that is, no model that can recognize the gesture categories required by the new application scene is stored in the terminal device or the cloud-side server), only a neural network model that can recognize the gesture categories required by the new application scene needs to be trained.
Therefore, in the embodiment of the application, for each type of application scenario, a corresponding gesture recognition model is trained, where the gesture recognition model corresponding to each type of application scenario can recognize a gesture type that the application scenario needs to be recognized.
Taking a target application scene as an example of a target application interface, in this embodiment of the application, if an application interface (referred to as a target application interface in the following embodiment) is currently displayed on a display screen of a terminal device, a target gesture recognition model corresponding to the target application interface may be obtained from a plurality of gesture recognition models based on that the target application interface currently displayed by the terminal device is used for responding to M types of gestures, and the target gesture recognition model may recognize a gesture type corresponding to gesture data of a user from the M types of gestures according to the gesture data of the user.
In an alternative implementation, the computing device may obtain, from a local storage or a cloud storage, the gesture recognition model corresponding to the target application scenario (the target gesture recognition model). The local storage or the cloud storage may store a plurality of gesture recognition models, where each gesture recognition model corresponds to one type of application scenario, different types of application scenarios require different gesture categories to be recognized, and each gesture recognition model can recognize the gesture categories that its corresponding application scenario requires. Specifically, the plurality of gesture recognition models further include a first gesture recognition model, the first gesture recognition model corresponds to a first application scenario, and the first application scenario needs to respond to N types of gestures; the first gesture recognition model is used for recognizing the N gesture types that need to be recognized in the first application scenario, and the M gesture types are not identical to the N gesture types. In one implementation, the M gesture types are part of the N gesture types, or the N gesture types are part of the M gesture types, or each gesture type of the M gesture types does not belong to the N gesture types (that is, the intersection of the M gesture types and the N gesture types is an empty set), or the M gesture types and the N gesture types include some of the same gesture types while the M gesture types also include gesture types not included in the N gesture types.
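As an illustration of the per-scene model lookup described above, the following Python sketch selects a gesture recognition model according to the current application scene; the storage paths, model file names, and the load_model helper are assumptions made only for this example and are not a required implementation.

```python
# Sketch: one gesture recognition model per application scene, selected at run time.
# The file layout and the load_model() helper are hypothetical.
MODEL_PER_SCENE = {
    "video_playing_interface": "models/video_7_gestures.bin",   # recognizes M gesture types
    "app_navigation_interface": "models/nav_4_gestures.bin",    # recognizes N gesture types
}

def get_target_gesture_model(target_scene: str, load_model):
    """Fetch the gesture recognition model trained for the current application scene."""
    model_path = MODEL_PER_SCENE[target_scene]   # from local storage or cloud storage
    return load_model(model_path)                # model recognizing only the gestures
                                                 # this scene needs to respond to
```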
In one implementation, the N gesture types include a first gesture and a second gesture, the similarity between the first gesture and the second gesture is greater than a preset value, and the M gesture types include the first gesture but do not include the second gesture.
Here, the similarity between the first gesture and the second gesture being greater than a preset value means that the first gesture and the second gesture are very similar gestures; specifically, it can be understood that the trajectories of the first gesture and the second gesture are similar. If the application scene is not sensitive to the trajectory amplitude of the gesture, that is, no matter how large an amplitude the user makes the gesture with, gestures whose trajectory shapes are similar are considered the same kind of gesture and the application scene makes the same response, then the similarity between the first gesture and the second gesture being greater than the preset value can be understood as the trajectory shapes being similar; for example, any two of a large-amplitude circle-drawing gesture, a small-amplitude circle-drawing gesture, a large-amplitude X-drawing gesture, and a small-amplitude X-drawing gesture may be considered gestures whose similarity is greater than the preset value. If the application scene is sensitive to the trajectory amplitude of the gesture, that is, gestures of the same shape made with different amplitudes are considered different types of gestures and the application scene responds to them differently, then the similarity between the first gesture and the second gesture being greater than the preset value can be understood as the trajectory shapes being similar and the trajectory amplitudes being similar; for example, a circling gesture and an X-drawing gesture of similar amplitude may be considered gestures whose similarity is greater than the preset value.
503. Acquiring gesture data in the target application scene.
In an alternative implementation, a radar apparatus on the terminal device may transmit a first radar signal and receive a first reflection signal of the first radar signal.
In this embodiment, the radar device on the terminal device may transmit the first radar signal, and the category of the first radar signal may include, but is not limited to, a Continuous Wave (CW) signal and a chirp signal.
The chirp signal is an electromagnetic signal whose frequency varies with time. Generally, the frequency of a rising chirp signal increases over time, while the frequency of a falling chirp signal decreases over time. The frequency variation of the chirp signal may take many different forms. For example, the frequency of a linear frequency modulated (LFM) signal varies linearly. Other forms of frequency variation in the chirp signal include exponential variation.
In addition to chirp signals in which the frequency varies continuously according to some predetermined function (e.g., a linear or exponential function), a stepped chirp signal in which the frequency varies stepwise may be generated. That is, a typical stepped chirp signal comprises a plurality of frequency steps, where the frequency is constant at each step for some predetermined duration. The stepped chirp signal may also be pulsed on and off, with the pulses being on during some predetermined time period during each step of the chirp scan.
In this embodiment, the radar apparatus may transmit a chirp signal, and the chirp signal may be expressed mathematically as

s(t) = A\cos\!\left(2\pi f_0 t + \pi \frac{B}{t_c} t^2 + \varphi_0\right),

where B is the bandwidth, \varphi_0 is the fixed initial phase, t_c is the period of the chirp signal, A is the amplitude, and f_0 is the starting frequency.
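For illustration only, the following Python sketch generates a sampled LFM chirp according to the expression above; the parameter values (bandwidth, period, sampling rate) are placeholders and do not reflect the radar configuration of this embodiment.

```python
import numpy as np

def lfm_chirp(A=1.0, f0=0.0, B=4e6, t_c=1e-3, phi0=0.0, fs=20e6):
    """Sampled linear-frequency-modulated (chirp) signal
    s(t) = A*cos(2*pi*f0*t + pi*(B/t_c)*t**2 + phi0) for 0 <= t < t_c.
    Parameter defaults are illustrative, not the radar's actual configuration."""
    t = np.arange(0.0, t_c, 1.0 / fs)
    slope = B / t_c                      # frequency sweep rate in Hz/s
    return A * np.cos(2 * np.pi * f0 * t + np.pi * slope * t ** 2 + phi0)
```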
In an embodiment of the application, a radar apparatus generates a first radar signal and transmits the first radar signal through a transmitter into an area being monitored by the radar apparatus. The generation and transmission of signals may be accomplished by the RF signal generator 12, the radar transmission circuitry 14, and the transmit antenna 32 in fig. 2.
In an embodiment of the present application, a transmitter of the radar apparatus transmits a first radar signal, and a receiver of the radar apparatus may receive an echo signal or a first reflected signal from a remote object. The echo signal or first reflected signal is a signal that the transmitted first radar signal strikes the remote object and is reflected by the object.
In the embodiment of the application, after the first reflection signal is received, the gesture data may be acquired according to the first reflection signal. Specifically, the terminal device may transmit the first reflection signal received by the radar device to the computing device, and the computing device may compute the gesture data according to the first reflection signal.
In one implementation, the radar device has certain signal processing capability in addition to the capability of transmitting and receiving signals, and the radar device may perform analog-to-digital conversion on the first reflected signal to obtain a preprocessed first reflected signal (digitized signal).
In particular, the radar device may receive the reflected first radar signal, which may be referred to herein as an "echo", "echo signal", or "first reflected signal", at the analog processing circuit 16 via the receive antenna 30 shown in fig. 2. Analog processing circuitry 16 may include any circuitry required to process signals received via receive antenna 30 (e.g., signal separation, mixing, heterodyne and/or homodyne conversion, amplification, filtering, receive signal triggering, signal switching and routing, and/or any other suitable radar signal receiving function performed by radar device 100). Analog processing circuitry 16 may generate one or more analog signals, such as an in-phase (I) analog signal and a quadrature (Q) analog signal, based on the first reflected signal. The resulting analog signals are transmitted to an analog-to-digital converter (ADC) circuit 18 and digitized by that circuit, yielding a demodulated baseband discrete sampled signal, which may also be referred to as intermediate frequency raw data in the following embodiments. The digitized signal (baseband discrete sampled signal) is then forwarded to a computing device for signal processing to derive gesture data.
For example, a first reflection signal corresponding to a gesture or a non-gesture made by a user may be received by a receiving antenna of the radar apparatus, the first reflection signal may obtain intermediate-frequency raw data after processes of frequency mixing, low-pass filtering, ADC, and the like, and the computing device may perform Fast Fourier Transform (FFT), constant false-alarm rate (CFAR), angle measurement, and the like on the intermediate-frequency raw data to obtain gesture data.
In one implementation, the terminal device may transfer the gesture data processed by the radar apparatus to a computing device deployed by itself, or a computing device deployed by another terminal device, or a computing device deployed by a cloud-side server.
The gesture data may characterize, among other things, motion characteristics of the target gesture, such as distance characteristics, motion rate characteristics, angle characteristics, and so forth. How to acquire gesture data according to the first reflection signal is described next.
In one implementation, a linear frequency modulated continuous wave (LFMCW) signal system may be adopted: a high range resolution is obtained by using the large bandwidth available in the millimeter-wave band, and a high velocity resolution is obtained by using the small wavelength of millimeter waves together with a large modulation period. Based on a two-dimensional real or virtual antenna array, the system has the capability of measuring the azimuth angle and the pitch angle. Within a single frequency-modulation period, the reflected echo containing the target information is mixed (difference-frequency processed) with the transmitted signal to obtain an intermediate frequency signal whose frequency is proportional to the target distance; the intermediate frequency signal is transformed to the frequency domain using a Fast Fourier Transform (FFT) to obtain the distance distribution within the measurable range. Performing an FFT across the signals of multiple frequency-modulation periods, at the same sampling point positions or the same distances, yields the velocity distribution within the measurable range, and the distance and velocity distributions over the multiple frequency-modulation periods form a range-Doppler distribution, or range-Doppler (RD) map.
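As a reading aid, the proportionality between the intermediate-frequency (beat) frequency and the target distance, and the resulting range and velocity resolutions, can be written out explicitly; these are the standard LFMCW relations under a sawtooth modulation assumption, not values specific to this embodiment:

f_{IF} = \frac{2 B d}{c\, t_c} \;\Rightarrow\; d = \frac{c\, t_c}{2B} f_{IF}, \qquad d_{res} = \frac{c}{2B}, \qquad v_{res} = \frac{\lambda}{2 K t_c},

where c is the speed of light, \lambda is the carrier wavelength, and K is the number of frequency-modulation periods (chirps) used for the velocity FFT.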
Target detection is performed on the multi-channel RD maps obtained from all the transmitting and receiving antennas (one method is to perform constant false alarm rate (CFAR) detection on the incoherently superimposed and averaged RD maps of all channels) to obtain all the moving target points in the scene. For example, referring to fig. 11, it can be seen from fig. 11 that a gesture object located near 1 m is making a motion with a radial velocity close to 1 m/s (a gesture motion approaching the radar), and the moving part occupies a plurality of distance values and a plurality of velocity values in the RD map; besides the gesture object, the radar also detects a multipath interference point at a larger distance. Using the spatial phase differences of the plurality of virtual antennas, a suitable angle estimation algorithm is selected to calculate the azimuth angle and the pitch angle of the target points in the RD map.
Illustratively, the first reflected signal may include a plurality of chirp signals, each of which may be processed to obtain a corresponding distance fourier spectrum and a plurality of amplitude data.
In the embodiment of the present application, if r(n) is the baseband discrete sampled signal obtained by receiving and demodulating the reflected chirp signal at a receiving antenna, where n is the sample index within a single chirp signal period, an N_1-point Fast Fourier Transform (FFT) is applied to r(n), yielding R(k):

R(k) = \mathrm{FFT}\left(r(n), N_1\right), \qquad N_1 \ge n;
Namely, a one-dimensional fast Fourier transform (1D-FFT) is performed on the baseband discrete sampled signal obtained by receiving and demodulating the reflected chirp signal to obtain the corresponding range Fourier spectrum (Range-FFT). The Range-FFT consists of a plurality of frequency points (range bins) taken from the positive frequency domain of the complex-valued R(k), where \alpha_i denotes the index of the i-th range bin. The distance spanned by a single range bin may be defined as the range resolution d_{res}; the distance value of the i-th range bin is then

d_i = \alpha_i \times d_{res},

and the maximum detection distance is

d_{max} = \frac{N_1}{2} \times d_{res}.

Further, the Range-FFT can be obtained, where the vertical axis of the Range-FFT represents the signal reflection intensity corresponding to each distance value. The signal reflection intensity may be defined as the modulus of the complex signal (for example, if the complex signal is a + bj, the signal reflection intensity may be represented as \sqrt{a^2 + b^2}). The Range-FFT may thus include N_1/2 distance values and a signal reflection intensity corresponding to each distance value.
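A minimal numpy sketch of the 1D-FFT (Range-FFT) step described above is given below; the function and parameter names are illustrative, and the range resolution d_res is assumed to be known from the radar configuration.

```python
import numpy as np

def range_fft(r_n, n_fft, d_res):
    """Compute the range Fourier spectrum (Range-FFT) of one chirp.
    r_n   : baseband discrete sampled signal of a single chirp period
    n_fft : FFT length N1 (>= number of samples n)
    d_res : range resolution, i.e. the distance spanned by a single range bin
    Returns (distances, intensity): N1/2 distance values and the reflected-signal
    intensity (modulus of the complex spectrum) at each distance value."""
    R_k = np.fft.fft(r_n, n_fft)                 # R(k) = FFT(r(n), N1)
    positive = R_k[: n_fft // 2]                 # keep the positive frequency domain
    intensity = np.abs(positive)                 # |a + bj| = sqrt(a^2 + b^2)
    distances = np.arange(n_fft // 2) * d_res    # d_i = i * d_res
    return distances, intensity
```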
For example, referring to fig. 12, fig. 12 is a schematic diagram of a range Fourier spectrum (Range-FFT) provided in an embodiment of the present application. As shown in fig. 12, the abscissa of the Range-FFT is the distance value d (comprising the distance values d_1, d_2, \ldots, d_{N_1/2}), and the vertical axis represents the reflected signal intensity. It should be noted that the Range-FFT shown in fig. 12 consists of discrete data, which may include a plurality of reflection intensity peaks, such as the reflection intensity peak 401 and the reflection intensity peak 402 shown in fig. 12.
In the embodiment of the present application, after calculating the Range fourier spectrum of one chirp signal, similarly, the computing device may perform 1D-FFT on all K chirp signals in one frame to obtain K Range fourier spectrums Range-FFT, referring to fig. 13, where fig. 13 is a schematic diagram of the Range fourier spectrums Range-FFT provided in the embodiment of the present application, and fig. 13 shows the K Range fourier spectrums Range-FFT corresponding to the K chirp signals.
In this embodiment, the computing device may further calculate a plurality of amplitude data according to the reflected signal, where each amplitude data corresponds to a distance value and is used to represent the motion amplitude at the corresponding distance value. The amplitude data may be defined, for example, as the arc-tangent demodulation of the complex signal (e.g., if the complex signal is a + bj, the amplitude data may be represented as \arctan(b/a)).
In one embodiment, the computing device may calculate a range-doppler spectrum from the first reflected signal, from which a plurality of point cloud data, referred to as range-rate spectrum point clouds, may be derived. Each point cloud data includes a distance value, corresponding doppler velocity, and corresponding signal to noise ratio (SNR).
Specifically, the computing device may perform 1D-FFT computation on all K chirp signals in one frame to obtain K R(k) sequences, and then perform an FFT on the sequence formed by the K values at the same range bin across the R(k) sequences, that is, an FFT in the second dimension (which may be referred to as a 2D-FFT), to obtain the Range-Doppler spectrum. Referring to fig. 14, fig. 14 is a schematic flow chart of generating a Range-Doppler spectrum according to an embodiment of the present application. As shown in fig. 14, first, 1D-FFT computation is performed on the K chirp signals to obtain K 1D-FFT results (Range-FFTs) (left side of fig. 14); then the K 1D-FFT results are arranged by rows to obtain a complex-valued matrix (middle of fig. 14), where the horizontal axis of the matrix is the range-bin sequence representing distances; then an N_2-point FFT, i.e., the 2D-FFT, may be performed on each column to obtain the Range-Doppler spectrum shown on the right side of fig. 14, where the horizontal axis is the range bin and the vertical axis is the Doppler rate value. Each square in the Range-Doppler spectrum corresponds to a distance value and a Doppler rate, and the color depth may also represent the corresponding signal-to-noise ratio; illustratively, the deeper the color, the larger the signal-to-noise ratio.
In the embodiment of the present application, a plurality of point cloud data may be obtained according to the Range-Doppler spectrum, where each point cloud data may include a distance value and a corresponding Doppler rate; specifically, each point cloud data may further include a signal-to-noise ratio, which indicates the signal-to-noise ratio at the corresponding distance value and Doppler rate. That is, each cell in the Range-Doppler spectrum may constitute one point cloud data, and the point cloud data format may be exemplarily defined as α = [r, v, s], where r is the distance value, v is the modulus of the Doppler velocity, and s is the signal-to-noise ratio.
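The following Python sketch illustrates the 2D-FFT step and the extraction of [r, v, s] point cloud data from the Range-Doppler spectrum; the SNR estimate (median noise floor) and the fixed detection threshold are simplifications assumed for this example rather than the CFAR detection used in this embodiment.

```python
import numpy as np

def range_doppler_point_cloud(chirp_matrix, d_res, v_res, snr_threshold_db=12.0):
    """chirp_matrix: K x n array, one row of baseband samples per chirp in a frame.
    Returns a list of point cloud data alpha = [r, v, s]
    (distance, modulus of Doppler velocity, SNR in dB)."""
    K, n = chirp_matrix.shape
    rfft = np.fft.fft(chirp_matrix, axis=1)[:, : n // 2]      # 1D-FFT per chirp (Range-FFT)
    rd = np.fft.fftshift(np.fft.fft(rfft, axis=0), axes=0)    # 2D-FFT across chirps (Doppler)
    power = np.abs(rd) ** 2
    noise = np.median(power)                                   # crude noise-floor estimate
    snr_db = 10.0 * np.log10(power / (noise + 1e-12))
    points = []
    for k in range(K):                                         # Doppler index
        for i in range(n // 2):                                # range-bin index
            if snr_db[k, i] > snr_threshold_db:
                r = i * d_res
                v = abs((k - K // 2) * v_res)                  # modulus of Doppler velocity
                points.append([r, v, snr_db[k, i]])
    return points
```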
In one embodiment, the computing device may calculate a range-Doppler spectrum from the reflected signal, from which a plurality of point cloud data, referred to as a position-rate spectrum point cloud, may be derived. Each point cloud data includes a position coordinate, a corresponding Doppler velocity, and a corresponding signal-to-noise ratio.
Specifically, the computing device may perform 1D-FFT computation on all K chirp signals in one frame to obtain K R(k) sequences, and then perform an FFT on the sequence formed by the K values at the same range bin across the R(k) sequences, that is, an FFT in the second dimension (which may be referred to as a 2D-FFT), to obtain the range-Doppler spectrum. Referring to fig. 14, fig. 14 is a schematic flow diagram of generating a position-Doppler spectrum according to an embodiment of the present application. As shown in fig. 14, first, 1D-FFT computation is performed on the K chirp signals to obtain K 1D-FFT results (Range-FFTs) (left side of fig. 14); then the K 1D-FFT results are arranged by rows to obtain a complex-valued matrix (middle of fig. 14), where the horizontal axis of the matrix is the range-bin sequence representing distance values; then an N_2-point FFT, i.e., the 2D-FFT, may be performed on each column to obtain the range-Doppler spectrum shown on the right side of fig. 14, where the horizontal axis is the distance value and the vertical axis is the Doppler velocity value. Each square in the range-Doppler spectrum corresponds to a position coordinate and a Doppler velocity, and the color depth may also indicate the size of the corresponding signal-to-noise ratio; exemplarily, the deeper the color, the larger the signal-to-noise ratio.
In the embodiment of the present application, when the radar apparatus has a multiple-input multiple-output (MIMO) antenna array, for example a 3 × 4 MIMO antenna array with 3 transmitting antennas and 4 receiving antennas, the radar apparatus may perform angle estimation on the reflected signal: the horizontal two-dimensional coordinate corresponding to each square in the position-Doppler spectrum may be estimated from the estimated horizontal azimuth, and when the antenna array also resolves the vertical (elevation) direction, the three-dimensional coordinate corresponding to each square may be estimated. Taking two-dimensional coordinates as an example, the position-rate spectrum point cloud data format may be exemplarily defined as α = [x, y, v, s], where x and y are the horizontal coordinate positions, the y axis is the horizontal radial direction of the radar apparatus, and the x axis is the horizontal tangential direction of the radar apparatus; taking three-dimensional coordinates as an example, the format may be exemplarily defined as α = [x, y, z, v, s], where z is the vertical (height) coordinate relative to the radar apparatus.
By the above method, a point cloud of 3D spatial positions can be obtained; the point cloud is then clustered to obtain possible gesture object positions, outliers are filtered out, and the point cloud distribution corresponding to a valid gesture object is retained. For example, a valid gesture object may be defined as a target object whose 2D position point cloud is distributed in a cluster within a limited range (e.g., set to 0.8 m × 0.8 m) and for which the gesture motion feature can be detected. As shown in fig. 15, the leftmost cluster in the left diagram of fig. 15 is an invalid gesture object; the gesture motion feature cannot be detected for the right cluster, so it is also an invalid gesture object; the gesture motion feature can be detected for the middle cluster, so it is a valid gesture object.
Alternatively, the clustering algorithm in the embodiment of the present application may be, for example, a K-Means clustering method (K-Means clustering algorithm), a density-based clustering with noise (DBSCAN) algorithm, a balanced iterative reduction and clustering with hierarchical approach (BIRCH) algorithm, a STING algorithm model, etc., which is not limited in any way by the embodiment of the present application.
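As one concrete possibility, the DBSCAN algorithm mentioned above can be used to cluster the horizontal point cloud and keep a candidate gesture object; the eps and min_samples values below are assumptions, and the 0.8 m × 0.8 m extent check follows the example given in this embodiment.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_valid_gesture_cluster(points_xy, eps=0.15, min_samples=5, max_extent=0.8):
    """points_xy: N x 2 numpy array of horizontal point-cloud coordinates (x, y).
    Returns the points of one candidate gesture object, or None if no cluster fits
    within a max_extent x max_extent (e.g. 0.8 m x 0.8 m) bounding box."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xy)
    for label in set(labels) - {-1}:                   # -1 marks outliers, filtered out
        cluster = points_xy[labels == label]
        extent = cluster.max(axis=0) - cluster.min(axis=0)
        if np.all(extent <= max_extent):
            return cluster                             # candidate valid gesture object;
                                                       # the gesture motion feature is
                                                       # still checked separately
    return None
```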
By the method, the gesture data of the target gesture can be obtained.
504. Identifying a target gesture type corresponding to the gesture data through the target gesture recognition model.
After the gesture data is obtained, the gesture data may be input to a target gesture recognition model, and a target gesture type corresponding to the gesture data is determined through the target gesture recognition model.
Optionally, in an implementation, after obtaining the target gesture type, the computing device may trigger the terminal device to perform an application operation corresponding to the target gesture type.
In a possible implementation, the target gesture type corresponding to the gesture data can be recognized as a circling gesture based on the target gesture recognition model, and in this case, the circling times and the circling direction of the circling gesture can be recognized according to the gesture data. The following description is made separately:
in one possible implementation, a motion feature of a target gesture may be determined based on gesture data, the motion feature comprising an angular change of the target gesture over time in azimuth or elevation; acquiring the number of wave crests and wave troughs in the angle change; and determining the circling times of the circling gesture according to the number of the wave crests and the wave troughs.
In order to obtain the number of circles drawn by the circling gesture, the number of wave crests and wave troughs in the angle change can be obtained; and determining the circling times of the circling gesture according to the number of the wave crests and the wave troughs.
In one implementation, M peaks and N troughs of the angle change may be obtained, and the number of times of circling of the circling gesture is determined based on the M peaks and the N troughs, where a peak after and adjacent to a peak corresponds to one circling, and a trough after and adjacent to a trough corresponds to one circling.
In one implementation, the N wave troughs include a first wave trough and a second wave trough, the motion characteristic includes an angular change over time in the target gesture azimuth or pitch angle within a target time window, the second wave trough corresponds to a last frame within the target time window, and a difference value between the second wave trough and the first wave trough is smaller than a preset angle; or, the M peaks include a first peak and a second peak, the motion characteristic includes an angle change with time in the target gesture azimuth angle or the pitch angle in a target time window, the second peak corresponds to a last frame in the target time window, and a difference value between the second peak and the first peak is smaller than a preset angle.
Taking the preset angle as 1/10 of the peak height as an example, in this embodiment, peak searching may be performed, within a sliding window, on the 1-dimensional (1D) time-domain features of the azimuth angle and the pitch angle as they change over time, to obtain parameters such as the number of peaks, the peak positions, the peak heights, and the peak widths of the features. Feature segmentation is performed using the peak-search result, periodic gesture determination is performed based on peak searching and harmonic distortion calculation, and after a periodic gesture (for example, the above-mentioned circling gesture) is determined, feature segmentation, periodic gesture recognition, and counting of the periodic gesture may be performed according to the peak-search result. The technical scheme for segmenting the continuous, periodic circling gesture is as follows: gesture segmentation is performed using the peak-search result of the 1D time-domain feature of the pitch angle within the sliding window. For example, the length of the sliding window shown in fig. 16 is 50 frames, and the segmentation conditions are: two newly added peaks are detected within the sliding window (the lower triangles labeled in fig. 17 are the peak positions; gesture frames that have already been segmented are skipped), and the pitch angle at the last frame (-8° in fig. 17) deviates from the valley value between the two newly added peaks (-9° in fig. 17) by less than 1/10 of the two peak heights (16° and 18°).
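A Python sketch of the peak search and circle counting on the pitch-angle time-domain feature follows, using scipy's find_peaks; the window handling is simplified, and the 1/10-of-peak-height segmentation check mirrors the example above, but the exact thresholds are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def count_circles(pitch_deg, min_peak_height=5.0, min_peak_width=5):
    """pitch_deg: 1D pitch-angle time-domain feature within the sliding window (one value per frame).
    One circle is counted per peak-to-adjacent-peak (or trough-to-adjacent-trough) interval."""
    peaks, _ = find_peaks(pitch_deg, height=min_peak_height, width=min_peak_width)
    troughs, _ = find_peaks(-np.asarray(pitch_deg), height=min_peak_height, width=min_peak_width)
    return max(len(peaks) - 1, len(troughs) - 1, 0)

def segment_ready(pitch_deg, peaks, troughs, ratio=0.1):
    """Segmentation-condition sketch: two newly added peaks lie in the window and the pitch
    angle at the last frame deviates from the valley between them by less than 1/10 of
    the peak heights (the exact form of the check is an assumption)."""
    pitch_deg = np.asarray(pitch_deg, dtype=float)
    if len(peaks) < 2:
        return False
    p1, p2 = peaks[-2], peaks[-1]
    between = troughs[(troughs > p1) & (troughs < p2)]
    if between.size == 0:
        return False
    valley = pitch_deg[between[0]]
    limit = ratio * min(pitch_deg[p1], pitch_deg[p2])
    return abs(pitch_deg[-1] - valley) < limit
```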
In one implementation, the circling direction of the circling gesture, namely clockwise or counterclockwise, can also be acquired.
Specifically, the angular change over time in the azimuth angle or the pitch angle includes a first angular change over time in the azimuth angle of the circling gesture and a second angular change over time in the pitch angle of the target gesture, and the circling direction of the circling gesture can be determined according to the first angular change and the second angular change.
In one implementation, the first angle change includes a first sub-angle change and a second sub-angle change, and the second angle change includes a third sub-angle change and a fourth sub-angle change, where the first sub-angle change and the third sub-angle change are angle changes within the same time period, and the second sub-angle change and the fourth sub-angle change are angle changes within the same time period. The third sub-angle change is the angle change from one peak to the adjacent trough in the second angle change, and the fourth sub-angle change is the angle change from one trough to the adjacent peak in the second angle change. The circling direction of the target gesture can be determined to be clockwise based on the first sub-angle change increasing first and then decreasing and the second sub-angle change decreasing first and then increasing; the circling direction of the target gesture can be determined to be counterclockwise based on the first sub-angle change decreasing first and then increasing and the second sub-angle change increasing first and then decreasing. In this example, clockwise circling is correspondingly characterized in that the azimuth angle from a pitch-angle peak to the adjacent trough increases first and then decreases, and the azimuth angle from a pitch-angle trough to the adjacent peak decreases first and then increases; counterclockwise circling is correspondingly characterized in that the azimuth angle from a pitch-angle peak to the adjacent trough decreases first and then increases, and the azimuth angle from a pitch-angle trough to the adjacent peak increases first and then decreases.
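A sketch of the circling-direction decision described above: the azimuth change over the pitch-angle peak-to-trough span is inspected, and "increases first and then decreases" is detected simply by checking whether the azimuth extremum lies strictly inside the span, which is an assumed simplification rather than the exact rule of this embodiment.

```python
import numpy as np

def _first_rises_then_falls(seg):
    """True if the sequence increases to an interior maximum and then decreases."""
    i = int(np.argmax(seg))
    return 0 < i < len(seg) - 1 and seg[0] < seg[i] and seg[-1] < seg[i]

def circling_direction(azimuth_deg, pitch_peak_idx, pitch_trough_idx):
    """Decide the circling direction from the azimuth change over the pitch-angle
    peak-to-trough span (simplified sketch of the rule above)."""
    lo, hi = sorted((pitch_peak_idx, pitch_trough_idx))
    seg = np.asarray(azimuth_deg[lo:hi + 1], dtype=float)
    if _first_rises_then_falls(seg):
        return "clockwise"          # azimuth increases first and then decreases
    if _first_rises_then_falls(-seg):
        return "counterclockwise"   # azimuth decreases first and then increases
    return "unknown"
```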
In the case that the application scene is an application interface, the application interface may recognize some periodic gestures and perform application operations based on the category of the periodic gesture and some of its motion characteristics, where the application operation is related to the motion characteristics of the gesture. For example, the video playing interface may recognize a circling gesture and perform the application operation of volume adjustment, where the volume adjustment is related to the number of circles drawn and the adjustment process is continuous and smooth, for example, for every 30 degrees drawn, the volume is adjusted by one unit. In this case, real-time gesture data (specifically, angle data of the gesture motion, for example the azimuth angle and the pitch angle) needs to be acquired so that the recognized smooth application operation can be performed. If a neural network model were still used for gesture recognition here, only the gesture category could be acquired, and the motion data of the gesture could not be acquired in real time.
In an optional implementation, during the continuous motion of the user's gesture, the terminal device continuously obtains gesture data and may enter a continuous periodic gesture determination process. Specifically, second gesture data in the target application scene may be obtained; a motion characteristic of a third gesture is determined according to the second gesture data, where the motion characteristic includes the angle change of the third gesture in the azimuth angle or the pitch angle over time; and if a preset condition is met, the gesture type of the third gesture is determined to be a circling gesture. The preset condition includes that the difference between the times corresponding to a first peak and a second peak in the angle change is greater than a target time threshold, that at least one of the azimuth angle or the pitch angle corresponding to the first peak and the second peak is greater than a target angle threshold, and that the difference between the angle change and a sine wave is within a preset range, where the first peak and the second peak are adjacent peaks in the angle change.
Next, it is described how to determine that the difference between the angle change and the sine wave is within a preset range, the difference between the time corresponding to the first peak and the time corresponding to the second peak in the angle change is greater than a target time threshold, and the azimuth angle or the pitch angle corresponding to the first peak and the second peak is greater than a target angle threshold.
Taking the target time threshold as 5 frames and the target angle threshold as 5 degrees as an example: the change of the azimuth angle or the pitch angle of a complete circling gesture over time is close to a periodic sine wave, so sine-wave periodicity or distortion calculation may be performed. For example, an FFT may be performed on the 1D time-domain feature of the azimuth angle or the pitch angle, and peak searching may then be performed on the resulting 1D frequency-domain feature, where the largest peak value is defined as the fundamental peak and the other peak values are defined as harmonic peaks. With reference to the definition of "distortion degree" in signal systems, the total harmonic distortion is defined as:
\mathrm{THD} = \frac{\sqrt{\sum_{i \neq \max} \mathrm{peak}_i^2}}{\mathrm{peak}_{\max}},

where peak_max is defined as the fundamental peak. The smaller the total harmonic distortion, the closer the feature is to a sine wave, i.e., the better the periodicity. When the total harmonic distortion of the azimuth angle or the pitch angle is less than 0.5, the gesture can be determined to be a sine-wave-like gesture; combined with whether two peaks with peak width greater than the threshold 5 and peak height greater than the threshold 5 exist in the azimuth angle and pitch angle features shown in fig. 18, it can be determined whether a circling gesture is present.
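A Python sketch of the total harmonic distortion check follows; the FFT handling, peak-search parameters, and the exact form of the ratio are assumptions consistent with the reconstruction above rather than the embodiment's exact formula.

```python
import numpy as np
from scipy.signal import find_peaks

def total_harmonic_distortion(angle_deg):
    """Total-harmonic-distortion-style measure of the azimuth/pitch time-domain feature.
    The smaller the value, the closer the feature is to a sine wave (better periodicity)."""
    spectrum = np.abs(np.fft.rfft(np.asarray(angle_deg, dtype=float) - np.mean(angle_deg)))
    peaks, props = find_peaks(spectrum, height=0.0)
    if peaks.size == 0:
        return np.inf
    heights = props["peak_heights"]
    fundamental = heights.max()                   # largest peak = fundamental peak
    harmonics = heights[heights < fundamental]    # remaining peaks = harmonic peaks
    return np.sqrt(np.sum(harmonics ** 2)) / fundamental

# Example: the gesture is treated as sine-wave-like (a circling candidate) when THD < 0.5.
```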
In this embodiment, whether the gesture type is a circling gesture is determined directly by analyzing and processing the motion characteristics of the gesture data, without a neural network, so the required time overhead can be greatly reduced and the circling gesture type can be determined quickly.
According to the embodiment, the gesture recognition model for recognizing the circling gesture does not need to be trained in advance, and the effective recognition of the circling gesture can be realized by utilizing the motion characteristics of the circling gesture.
Through scene-based clustering of the 7 types of gestures, namely swinging the hand up, swinging the hand down, swinging the hand left, swinging the hand right, pushing the hand forward, drawing a circle clockwise, and drawing a circle counterclockwise, the 7 types of gestures are divided into 2 types of continuous periodic circling gestures and 5 types of non-continuous, non-periodic, non-circling gestures. For the 2 types of continuous periodic circling gestures, using the segmentation of continuous periodic circling gestures provided by this embodiment, whether a gesture is a valid circling gesture can be judged through total harmonic distortion analysis, and if it is a valid circling gesture, clockwise or counterclockwise is judged through peak-search analysis of the time-frequency features. For the 5 types of non-continuous, non-periodic, non-circling gestures, within the effective beam coverage determined by the radar transmitting and receiving assembly, a plurality of gesture objects respectively perform predefined gestures at a plurality of different positions; features related to the target motion are extracted from the received radar echoes, the motion features related to the gestures are separated, and a gesture feature data set is established and fed into a machine learning or deep learning classifier to train a gesture recognition model; when a gesture is judged to be a non-continuous, non-periodic gesture, the gesture recognition model is called to perform gesture recognition.
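Putting the two branches together, the dispatch logic described above might look like the following sketch; is_periodic_circling, count_circles, and circling_direction stand for the rule-based analyses sketched earlier, and target_model stands for the per-scene gesture recognition model, so all names here are placeholders rather than a prescribed API.

```python
def recognize_gesture(gesture_data, target_model, is_periodic_circling,
                      count_circles, circling_direction):
    """Dispatch between the rule-based circling branch and the learned classifier branch.
    gesture_data is assumed to carry the azimuth/pitch time series and extracted features."""
    if is_periodic_circling(gesture_data):                     # THD + peak-search test
        return {
            "type": "circle",
            "turns": count_circles(gesture_data["pitch_deg"]),
            "direction": circling_direction(
                gesture_data["azimuth_deg"],
                gesture_data["pitch_peak_idx"],
                gesture_data["pitch_trough_idx"],
            ),
        }
    # Non-continuous, non-periodic gestures go to the scene's trained gesture recognition model.
    return {"type": target_model.predict(gesture_data["features"])}
```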
Referring to fig. 19 and fig. 20, it can be seen from the confusion matrices corresponding to gesture recognition of the 7 types of gestures without and with clustering that confusion can be effectively reduced through gesture clustering and the classification accuracy is improved.
The embodiment of the application provides a gesture recognition method, which is applied to a computing device, where the computing device serves a plurality of gesture-controlled application scenes, and the gesture types that need to be recognized in the plurality of gesture-controlled application scenes are not identical. The method includes: acquiring a target application scene; acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used for recognizing the gesture types that need to be recognized in the corresponding type of application scene; acquiring gesture data in the target application scene; and recognizing a target gesture type corresponding to the gesture data based on the target gesture recognition model. In this embodiment, a target gesture recognition model is trained separately for the target application scene; the target gesture recognition model is used for recognizing the gesture types that need to be recognized in the target application scene, and it only needs to recognize those gesture types correctly, without recognizing gesture categories outside them. The neural network model can therefore converge quickly when its parameters are updated, confusion in gesture recognition can be reduced, and the accuracy of gesture recognition is improved. Moreover, because the number of gesture categories to be recognized is small, the parameter quantity of the model is small, and the computation and time required by the training process are small. In addition, when some new application scene appears, only a neural network model capable of recognizing the gesture categories that need to be recognized in the new application scene needs to be trained.
Referring to fig. 21, fig. 21 is a gesture recognition apparatus serving multiple gesture-controlled application scenarios, where types of gestures to be recognized in the multiple gesture-controlled application scenarios are not exactly the same, where the apparatus 2100 includes:
an obtaining module 2101, configured to obtain a target application scene;
the obtaining module 2101 is further configured to obtain a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used to identify a gesture type that needs to be identified in the corresponding one type of application scene;
the obtaining module 2101 is further configured to obtain first gesture data in the target application scenario; in one possible implementation, the target application scenario requires the recognition of M gesture types; the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
In one possible implementation, the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and the M gesture types include the first gesture and do not include the second gesture.
In one possible implementation, the plurality of gesture recognition models are neural network models.
In one possible implementation, the training samples of the target gesture recognition model include gesture data with sample labels of M gesture types; the training sample of the first gesture recognition model comprises gesture data with sample labels of N gesture types.
In a possible implementation, the obtaining module is further configured to obtain gesture data in the target application scenario; specifically, the obtaining module is further configured to obtain radar reflection information in the target application scenario and to obtain the gesture data based on the radar reflection information.
The gesture recognition apparatus further includes:
a gesture recognition module 2102 configured to recognize a target gesture type corresponding to the gesture data based on the target gesture recognition model.
The specific description of the gesture recognition module 2102 can refer to the description of step 502, and is not repeated here.
In one possible implementation, the gesture recognition module 2102 is configured to: determining motion features of a target gesture based on the gesture data, the motion features including angular variation of the target gesture over time in azimuth or pitch;
the obtaining module 2101 is further configured to obtain the number of peaks and troughs in the angle change;
the device further comprises: and the circling frequency determining module is used for determining the circling frequency of the circling gesture according to the number of the wave crests and the wave troughs.
In one possible implementation, the angular change over time in the azimuth angle or the pitch angle includes a first angular change over time of the azimuth angle of the circling gesture and a second angular change over time of the pitch angle of the target gesture. The apparatus further includes: a circling-direction determining module, configured to determine the circling direction of the circling gesture according to the first angular change and the second angular change.
The application further provides a gesture recognition device, which includes a radar device and a processor, where the processor is communicatively connected to the radar device, and the radar device is configured to send a radar signal, receive a reflected signal of the radar signal, and transmit the reflected signal to the processor. It may be understood that, in one possible implementation, the reflected signal may also be pre-processed (for example, by analog-to-digital conversion), and the processed reflected signal is then delivered to the processor.
The processor is used for acquiring a target application scene; acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, wherein each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used for recognizing a gesture type needing to be recognized in the corresponding one type of application scene; acquiring gesture data in the target application scene; and identifying a target gesture type corresponding to the gesture data based on the target gesture identification model.
In one possible implementation, the target application scenario requires the recognition of M gesture types;
the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
In one possible implementation, the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and the M gesture types include the first gesture and do not include the second gesture.
In one possible implementation, the plurality of gesture recognition models are neural network models.
In one possible implementation, the training samples of the target gesture recognition model include gesture data with sample labels of M gesture types;
the training sample of the first gesture recognition model comprises gesture data with sample labels of N gesture types.
In one possible implementation, the processor is configured to acquire radar reflection information in the target application scene, and acquire the gesture data based on the radar reflection information.
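A sketch of one possible way of deriving gesture data from radar reflection information, assuming an FMCW-style frame and range/Doppler FFT preprocessing; the embodiments do not fix this particular preprocessing:

```python
# Sketch: turning raw radar reflection samples into gesture data (illustrative
# assumption: a frame of shape [chirps, samples_per_chirp]; the actual
# preprocessing used by the embodiments may differ).
import numpy as np

def reflection_to_gesture_data(frame: np.ndarray) -> np.ndarray:
    # Range profile per chirp (FFT over fast time), then Doppler FFT over chirps.
    range_fft = np.fft.fft(frame, axis=1)
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
    # Magnitude range-Doppler map as one possible form of "gesture data".
    return np.abs(doppler_fft)

frame = np.random.randn(64, 128)        # 64 chirps x 128 ADC samples (made up)
gesture_data = reflection_to_gesture_data(frame)
print(gesture_data.shape)               # (64, 128)
```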
In one possible implementation, the target gesture category is a circling gesture, and the processor is configured to determine a motion characteristic of the target gesture based on the gesture data, where the motion characteristic includes an angular change of the target gesture in an azimuth angle or a pitch angle over time;
acquiring the number of wave crests and wave troughs in the angle change;
and determining the circling times of the circling gesture according to the number of the wave crests and the wave troughs.
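A sketch of the circle-counting rule, assuming the angle change is available as a one-dimensional azimuth (or pitch) sequence; the counting convention below is one reading of the description:

```python
# Sketch: counting circles from the wave crests and wave troughs of the
# angle-vs-time curve (assumption: `angle` is a 1-D angle sequence in degrees).
import numpy as np
from scipy.signal import find_peaks

def circle_count(angle: np.ndarray) -> int:
    peaks, _ = find_peaks(angle)     # wave crests
    troughs, _ = find_peaks(-angle)  # wave troughs
    # Adjacent crest pairs (or adjacent trough pairs) each correspond to one circle.
    return max(len(peaks) - 1, len(troughs) - 1, 0)

angle = 30 * np.sin(2 * np.pi * np.linspace(0, 3, 300))
print(circle_count(angle))           # adjacent-crest pairs in a 3-cycle sine -> 2
```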
In one possible implementation, the angular change in azimuth or pitch over time comprises a first angular change in azimuth of the circling gesture over time and a second angular change in pitch over time of the target gesture;
and the processor is used for determining the circling direction of the circling gesture according to the first angle change and the second angle change.
In one possible implementation, the processor is configured to determine motion characteristics of a target gesture based on the gesture data, the motion characteristics including the change over time of the azimuth angle or the pitch angle of the target gesture;
if the difference between the angle change and a sine wave is within a preset range, the difference between the time corresponding to a first peak and the time corresponding to a second peak in the angle change is greater than a target time threshold, and the azimuth angle or the pitch angle corresponding to the first peak and the second peak is greater than a target angle threshold, determining that the target gesture type of the target gesture is a circling gesture, where the first peak and the second peak are adjacent peaks in the angle change.
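A sketch of the three-condition circling decision described above; the sine-similarity measure, sampling rate, and thresholds are illustrative assumptions:

```python
# Sketch of the circling-gesture decision (thresholds and the sine-similarity
# measure below are illustrative assumptions, not values from the embodiments).
import numpy as np
from scipy.signal import find_peaks

def is_circling(angle, fs=20.0, max_sine_err=10.0, min_gap_s=0.3, min_peak_deg=15.0):
    """angle: 1-D azimuth or pitch sequence in degrees sampled at fs Hz."""
    peaks, _ = find_peaks(angle)
    if len(peaks) < 2:
        return False
    # Condition 1: the curve is close to a sine wave (here: RMS error against a
    # roughly fitted sine below a preset range -- one possible similarity measure).
    t = np.arange(len(angle)) / fs
    freq = 1.0 / np.mean(np.diff(peaks) / fs)           # rough period from peak spacing
    ref = np.mean(angle) + (angle.max() - angle.min()) / 2 * np.sin(
        2 * np.pi * freq * (t - peaks[0] / fs) + np.pi / 2)
    cond_sine = np.sqrt(np.mean((angle - ref) ** 2)) < max_sine_err
    # Condition 2: adjacent peaks are far enough apart in time.
    cond_time = np.all(np.diff(peaks) / fs > min_gap_s)
    # Condition 3: the peak angles exceed the target angle threshold.
    cond_amp = np.all(angle[peaks] > min_peak_deg)
    return bool(cond_sine and cond_time and cond_amp)
```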
In one possible implementation, the processor is configured to obtain M wave crests and N wave troughs of the angle change; determine the circling times of the circling gesture based on the M wave crests and the N wave troughs, where one wave crest and the wave crest adjacent to it correspond to one circling, and one wave trough and the wave trough adjacent to it correspond to one circling; and according to the circling times, perform the application operation corresponding to the circling gesture on the target application interface.
In one possible implementation, the N wave troughs include a first wave trough and a second wave trough, the motion characteristic includes an angular change over time in the target gesture azimuth or pitch angle within a target time window, the second wave trough corresponds to a last frame within the target time window, and a difference value between the second wave trough and the first wave trough is smaller than a preset angle; or, the M peaks include a first peak and a second peak, the motion characteristic includes an angle change with time in the target gesture azimuth angle or the pitch angle in a target time window, the second peak corresponds to a last frame in the target time window, and a difference value between the second peak and the first peak is smaller than a preset angle.
In one possible implementation, the angular change over time in the azimuth angle or the pitch angle includes a first angular change over time of the azimuth angle of the circling gesture and a second angular change over time of the pitch angle of the target gesture. The first angular change includes a first sub-angular change and a second sub-angular change, and the second angular change includes a third sub-angular change and a fourth sub-angular change; the first sub-angular change and the third sub-angular change are angular changes within the same time period, and the second sub-angular change and the fourth sub-angular change are angular changes within the same time period; the third sub-angular change is the angular change from one peak to an adjacent trough in the second angular change, and the fourth sub-angular change is the angular change from one trough to an adjacent peak in the second angular change. The processor is configured to: determine that the circling direction of the target gesture is clockwise based on the first sub-angular change first increasing and then decreasing and the second sub-angular change first decreasing and then increasing; determine that the circling direction of the target gesture is counterclockwise based on the first sub-angular change first decreasing and then increasing and the second sub-angular change first increasing and then decreasing; and perform, on the target application interface, the application operation corresponding to the circling gesture according to the circling direction.
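For the circling-direction decision, the following sketch uses a signed-area criterion in the azimuth/pitch plane instead of the segment-wise monotonicity test described above; it relies on the same quarter-period phase shift between the two angle changes, and the mapping of the sign to clockwise/counterclockwise is a convention assumed here, not taken from the embodiments:

```python
# Sketch: circling direction from the azimuth and pitch traces, using the sign
# of the area enclosed by the curve (azimuth(t), pitch(t)) as a simpler stand-in
# for the segment-wise increase/decrease test described above.
import numpy as np

def circling_direction(azimuth: np.ndarray, pitch: np.ndarray) -> str:
    """azimuth, pitch: equally long 1-D angle sequences (degrees) over time."""
    # Signed area via Green's theorem: positive for one traversal sense,
    # negative for the other.
    area = 0.5 * np.sum(azimuth[:-1] * np.diff(pitch) - pitch[:-1] * np.diff(azimuth))
    return "clockwise" if area < 0 else "counterclockwise"

t = np.linspace(0, 2 * np.pi, 200)
print(circling_direction(np.cos(t), np.sin(t)))  # counterclockwise in this convention
```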
In one possible implementation, the target gesture type includes a swing up gesture, a swing down gesture, a swing left gesture, a swing right gesture, a push forward gesture, or a circle drawing gesture.
Next, a gesture recognition device provided in an embodiment of the present application is introduced, please refer to fig. 22, and fig. 22 is a schematic structural diagram of the gesture recognition device provided in the embodiment of the present application. Specifically, the gesture recognition apparatus 2200 includes: a receiver 2201, a transmitter 2202, a processor 2203 and a memory 2204 (wherein the number of the processors 2203 in the gesture recognition apparatus 2200 may be one or more, and one processor is taken as an example in fig. 22), wherein the processor 2203 may include an application processor 22031 and a communication processor 22032. In some embodiments of the present application, the receiver 2201, the transmitter 2202, the processor 2203, and the memory 2204 may be connected by a bus or other means.
The memory 2204 may include both read-only memory and random access memory, and provides instructions and data to the processor 2203. A portion of the memory 2204 may also include non-volatile random access memory (NVRAM). The memory 2204 stores operating instructions, executable modules or data structures, a subset thereof, or an extended set thereof for the processor, where the operating instructions may include various operating instructions for performing various operations.
The processor 2203 controls the operation of the radar apparatus (including the antenna, the receiver 2201, and the transmitter 2202). In a particular application, the various components of the radar apparatus are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 2203, or implemented by the processor 2203. The processor 2203 may be an integrated circuit chip having a signal processing capability. In implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 2203 or by instructions in the form of software. The processor 2203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 2203 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 2204, and the processor 2203 reads information in the memory 2204 and completes, in combination with its hardware, the steps of the gesture recognition method provided by the above embodiments.
The receiver 2201 is operable to receive input digital or character information and to generate signal inputs relating to the relevant settings and functional control of the radar apparatus. The transmitter 2202 is operable to output numeric or character information through a first interface; the transmitter 2202 is also operable to send instructions to the disk groups through the first interface to modify data in the disk groups.
It should be understood that, optionally, the gesture recognition apparatus may further include an application operation module, configured to, in the target application scene, execute an application operation corresponding to the target gesture type.
Referring to fig. 23, fig. 23 is a schematic structural diagram of a server provided in an embodiment of the present application. The server may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 2322 (for example, one or more processors), a memory 2332, and one or more storage media 2330 (for example, one or more mass storage devices) storing an application 2342 or data 2344. The memory 2332 and the storage medium 2330 may provide transient or persistent storage. The program stored on the storage medium 2330 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 2322 may be configured to communicate with the storage medium 2330 and execute, on the server 2300, the series of instruction operations in the storage medium 2330.
The server 2300 may also include one or more power supplies 2326, one or more wired or wireless network interfaces 2350, one or more input/output interfaces 2358, and/or one or more operating systems 2341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
In this embodiment, the central processing unit 2322 is configured to execute the gesture recognition method described in the foregoing embodiment.
Embodiments of the present application also provide a computer program product, which when run on a computer, causes the computer to execute the gesture recognition method described in the above embodiments.
Also provided in an embodiment of the present application is a computer-readable storage medium in which a program for signal processing is stored, which, when run on a computer, causes the computer to perform the gesture recognition method described in the above embodiment.
The gesture recognition device provided by the embodiment of the application may specifically be a chip, and the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device executes the gesture recognition method described in the above embodiments, or the chip in the training device executes the gesture recognition method described in the above embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 24, fig. 24 is a schematic structural diagram of a chip provided in the embodiment of the present application, where the chip may be represented as a neural network processor NPU240, and the NPU240 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 2403, and the controller 2404 controls the arithmetic circuit 2403 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 2403 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 2403 is a two-dimensional systolic array. Operational circuit 2403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operational circuitry 2403 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2402 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 2401 and performs matrix operation with the matrix B, and partial results or final results of the obtained matrix are stored in an accumulator (accumulator) 2408.
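A NumPy stand-in for this operation (shapes are illustrative): matrix B is held fixed while the reduction over A's columns accumulates partial results, mirroring the buffered-weight pattern described above:

```python
# Sketch of the matrix operation the arithmetic circuit performs: matrix B is
# buffered in the processing elements while A streams through, with partial
# results added into an accumulator (plain NumPy stand-in for the systolic array).
import numpy as np

A = np.random.randn(4, 8)   # input matrix from input memory 2401 (shapes illustrative)
B = np.random.randn(8, 3)   # weight matrix buffered from weight memory 2402
accumulator = np.zeros((4, 3))

for k in range(A.shape[1]):                    # one reduction step per cycle
    accumulator += np.outer(A[:, k], B[k, :])  # partial result added to the accumulator

assert np.allclose(accumulator, A @ B)
```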
The unified memory 2406 is used for storing input data and output data. The weight data is transferred directly to the weight memory 2402 through a direct memory access controller (DMAC) 2405. The input data is also carried into the unified memory 2406 through the DMAC.
The bus interface unit (BIU) 2410 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 2409.
The bus interface unit 2410 is used by the instruction fetch buffer 2409 to obtain instructions from the external memory, and is further used by the storage unit access controller 2405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2406 or transfer weight data to the weight memory 2402 or transfer input data to the input memory 2401.
The vector calculation unit 2407 includes a plurality of operation processing units, and, if necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 2407 can store the processed output vector to the unified memory 2406. For example, the vector calculation unit 2407 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2403, for example, perform linear interpolation on the feature planes extracted by a convolutional layer, or accumulate vectors of values to generate activation values. In some implementations, the vector calculation unit 2407 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 2403, for example, for use in a subsequent layer of the neural network.
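A small sketch of the kind of post-processing the vector calculation unit performs on the accumulator output; the choice of ReLU and the normalization are illustrative assumptions:

```python
# Sketch: applying a nonlinear function and a normalization to the accumulator
# output before it feeds a subsequent layer (functions chosen for illustration).
import numpy as np

def vector_unit(acc: np.ndarray) -> np.ndarray:
    activated = np.maximum(acc, 0.0)                              # e.g. ReLU
    normalized = (activated - activated.mean()) / (activated.std() + 1e-6)
    return normalized                                             # activation input for the next layer

out = vector_unit(np.random.randn(4, 3))
print(out.shape)
```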
An instruction fetch buffer 2409 connected to the controller 2404 is configured to store instructions used by the controller 2404.
The unified memory 2406, the input memory 2401, the weight memory 2402, and the instruction fetch buffer 2409 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above embodiments may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of programs related to the gesture recognition method described in the above embodiments.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example, analog circuits, digital circuits, or dedicated circuits. However, for the present application, implementation by a software program is generally preferable. Based on such an understanding, the technical solutions of the present application may essentially be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Claims (17)

1. A gesture recognition method is applied to a computing device, the computing device is applied to a plurality of gesture-controlled application scenes, the gesture types needing to be recognized in the plurality of gesture-controlled application scenes are not identical, and the method comprises the following steps:
acquiring a target application scene;
acquiring a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, wherein each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used for recognizing a gesture type needing to be recognized in the corresponding one type of application scene;
acquiring gesture data in the target application scene;
and identifying a target gesture type corresponding to the gesture data based on the target gesture identification model.
2. The method of claim 1,
the target application scene needs to identify M gesture types;
the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
3. The method of claim 2, wherein the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and wherein the M gesture types include the first gesture and do not include the second gesture.
4. The method of any one of claims 1 to 3, wherein the plurality of gesture recognition models are neural network models.
5. The method according to any one of claims 2 to 4,
the training sample of the target gesture recognition model comprises gesture data with M gesture types of sample labels;
the training sample of the first gesture recognition model comprises gesture data with sample labels of N gesture types.
6. The method according to any one of claims 1 to 5, wherein the acquiring gesture data in the target application scene specifically includes:
acquiring radar reflection information in the target application scene;
and acquiring gesture data based on the radar reflection information.
7. The method of any of claims 1 to 6, wherein the target gesture type is a circling gesture, the method further comprising: determining a motion characteristic of the circling gesture based on the gesture data, the motion characteristic comprising an angular change of the circling gesture over time in an azimuth angle or a pitch angle;
acquiring the number of wave crests and wave troughs in the angle change;
and determining the circling times of the circling gesture according to the number of the wave crests or the wave troughs.
8. The method of claim 7, wherein the angular change in azimuth or elevation over time comprises a first angular change in azimuth of the circling gesture over time and a second angular change in elevation of the circling gesture over time;
the method further comprises the following steps:
and determining the circling direction of the circling gesture according to the first angle change and the second angle change.
9. A gesture recognition apparatus applied to a plurality of gesture-controlled application scenarios, wherein gesture types to be recognized in the plurality of gesture-controlled application scenarios are not identical, the apparatus comprising:
the acquisition module is used for acquiring a target application scene;
the obtaining module is further configured to obtain a target gesture recognition model corresponding to the target application scene from a plurality of gesture recognition models, where each gesture recognition model in the plurality of gesture recognition models corresponds to one type of application scene, and each gesture recognition model is used to identify a gesture type to be identified in the corresponding one type of application scene;
the acquisition module is further used for acquiring gesture data in the target application scene;
and the gesture recognition module is used for recognizing the target gesture type corresponding to the gesture data based on the target gesture recognition model.
10. The apparatus of claim 9, wherein the target application scenario requires recognition of M gesture types;
the gesture recognition models further comprise a first gesture recognition model, the first gesture recognition model corresponds to a first application scene, and the first application scene needs to recognize N gesture types;
the M gesture types are not identical to the N gesture types.
11. The apparatus of claim 10, wherein the N gesture types include a first gesture and a second gesture, a similarity of the first gesture and the second gesture is greater than a preset value, and wherein the M gesture types include the first gesture and do not include the second gesture.
12. The apparatus according to claim 11, wherein the obtaining module is further configured to obtain gesture data in the target application scenario, specifically:
the acquisition module is further used for acquiring radar reflection information in the target application scene;
the acquisition module is further used for acquiring gesture data based on the radar reflection information.
13. A gesture recognition device, comprising: one or more processors and memory; wherein the memory has stored therein computer readable instructions;
the one or more processors read the computer-readable instructions to cause the computer device to implement the method of any of claims 1 to 8.
14. A computer readable storage medium comprising computer readable instructions which, when run on a computer device, cause the computer device to perform the method of any of claims 1 to 8.
15. A computer program product comprising computer readable instructions which, when run on a computer device, cause the computer device to perform the method of any one of claims 1 to 8.
16. A gesture recognition apparatus, comprising:
the radar device is used for transmitting radar signals and receiving reflected signals of the radar signals, and the reflected signals are used for acquiring gesture data in a target application scene;
a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and perform the method of any of claims 1 to 8 to obtain a target gesture type.
17. A terminal device, comprising:
the radar device is used for transmitting radar signals and receiving reflected signals of the radar signals, and the reflected signals are used for acquiring gesture data in a target application scene;
a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and perform the method of any of claims 1 to 8 to obtain a target gesture type;
and the application module is used for executing the application operation corresponding to the target gesture type in the target application scene.