WO2019120290A1 - Dynamic gesture recognition method and apparatus, and gesture interaction control method and apparatus - Google Patents
Dynamic gesture recognition method and apparatus, and gesture interaction control method and apparatus
- Publication number
- WO2019120290A1 (PCT/CN2018/122767; CN2018122767W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dynamic gesture
- image
- frame
- dynamic
- gesture recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Definitions
- the embodiments of the present application relate to image processing technologies, and in particular, to a dynamic gesture recognition method and apparatus, and a gesture interaction control method and apparatus.
- Gestures are an important human-computer interaction feature in image and video information.
- The core task of a gesture recognition algorithm is, given a picture containing a hand, to determine the type of the gesture.
- the embodiment of the present application provides a dynamic gesture recognition technical solution and a gesture interaction control technical solution.
- A dynamic gesture recognition method includes: locating a dynamic gesture in a video stream to be detected to obtain a dynamic gesture frame; cropping, from multiple image frames of the video stream, image blocks corresponding to the dynamic gesture frame; generating a detection sequence based on the cropped image blocks; and performing dynamic gesture recognition according to the detection sequence.
- A dynamic gesture recognition modeling method includes: collecting sample video streams of different dynamic gesture types; annotating the dynamic gesture frames of the different dynamic gesture types; cropping, from multiple image frames of the sample video streams, the image blocks corresponding to the annotation information of the dynamic gesture frames to form an image sequence; and training a first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- A dynamic gesture recognition apparatus includes: a gesture locating unit, configured to locate a dynamic gesture in a video stream to be detected to obtain a dynamic gesture frame; a processing unit, configured to crop, from multiple image frames of the video stream, image blocks corresponding to the dynamic gesture frame; a detection sequence generating unit, configured to generate a detection sequence based on the cropped image blocks; and a gesture recognition unit, configured to perform dynamic gesture recognition according to the detection sequence.
- A dynamic gesture recognition model establishing apparatus includes a first dynamic gesture recognition model establishing unit. The first dynamic gesture recognition model establishing unit includes: a sample collection subunit, configured to collect sample video streams of different dynamic gesture types; a gesture frame annotation subunit, configured to annotate the dynamic gesture frames of the different dynamic gesture types; an image sequence forming subunit, configured to crop, from multiple image frames of the sample video streams, the image blocks corresponding to the annotation information of the dynamic gesture frames to form an image sequence; and a training subunit, configured to train the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- A gesture interaction control method includes: acquiring a video stream; determining a dynamic gesture recognition result in the video stream by using the dynamic gesture recognition method described above; and controlling a device to perform an operation corresponding to the dynamic gesture recognition result.
- a gesture interaction control apparatus includes:
- a video stream obtaining module configured to acquire a video stream
- a result obtaining module configured to determine a dynamic gesture recognition result in the video stream by using the dynamic gesture recognition apparatus described above;
- an operation execution module configured to control the device to perform an operation corresponding to the dynamic gesture recognition result.
- An electronic device comprising a processor, the processor comprising the dynamic gesture recognition apparatus described above, or the dynamic gesture recognition modeling apparatus described above, or the gesture interaction control apparatus described above.
- An electronic device comprising: a memory for storing executable instructions; and a processor for communicating with the memory to execute the executable instructions, so as to complete the operations of any one of the above methods.
- A computer readable storage medium for storing computer readable instructions, wherein when the instructions are executed, the operations of the dynamic gesture recognition method described above, or of the dynamic gesture recognition modeling method described above, or of the gesture interaction control method described above, are performed.
- A computer program product comprising computer readable code, wherein when the computer readable code runs on a device, a processor in the device executes instructions for the dynamic gesture recognition method described above, or the dynamic gesture recognition modeling method described above, or the gesture interaction control method described above.
- The dynamic gesture recognition method and apparatus and the gesture interaction control method and apparatus provided by the above embodiments of the present application crop image blocks corresponding to a dynamic gesture frame from multiple image frames of a video stream, and perform dynamic gesture recognition based on the detection sequence generated from those image blocks. Since the recognition is based on the image blocks corresponding to the dynamic gesture frame, a series of changing dynamic gestures can be recognized.
- FIG. 1 is a flowchart of a dynamic gesture recognition method according to an embodiment of the present application.
- FIG. 2 is another flowchart of a dynamic gesture recognition method according to an embodiment of the present application.
- FIG. 3 is a flowchart of establishing a first dynamic gesture recognition model according to an embodiment of the present application.
- FIG. 4 is a flowchart of establishing a second dynamic gesture recognition model according to an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a dynamic gesture recognition apparatus according to an embodiment of the present disclosure.
- FIG. 6 is another schematic structural diagram of a dynamic gesture recognition apparatus according to an embodiment of the present application.
- FIG. 7 is a flowchart of a gesture interaction control method according to an embodiment of the present application.
- FIG. 8 is a flowchart of an application example of a gesture interaction control method according to an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a gesture interaction control apparatus according to an embodiment of the present disclosure.
- FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- Embodiments of the present application can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above, and the like.
- the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
- program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
- program modules may be located on a local or remote computing system storage medium including storage devices.
- The inventors found that current gesture recognition approaches only recognize a single static image, and recognizing a single picture can only identify some simple static gestures, such as scissors, fist, and OK.
- In the process of human-computer interaction, manipulating a machine with static gestures is not as natural as with dynamic gestures, and static gestures also carry less information. Therefore, there is a need for a solution for recognizing dynamic gestures.
- FIG. 1 is a flowchart of a dynamic gesture recognition method according to an embodiment of the present application.
- the method can be performed by any electronic device, such as a terminal device, a server, a mobile device, an in-vehicle device, and the like.
- the method of this embodiment includes S101-S104.
- S101 Locate a dynamic gesture in the video stream to be detected to obtain a dynamic gesture frame.
- the step S101 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a gesture locating unit 501 executed by the processor.
- The dynamic gesture in the embodiments of the present application refers to a gesture composed of a series of actions (which may be continuous or discontinuous), as opposed to a static gesture; it includes, but is not limited to: waving, clicking, a pistol gesture, a grab gesture, and the like.
- For example, waving can implement page turning; clicking can implement operations similar to mouse clicks; the pistol gesture can trigger special effects or connect to games; and grabbing can drag and drop objects, like dragging files with a mouse.
- The dynamic gesture frame refers to a bounding box, such as a rectangular box, covering the series of actions of a dynamic gesture; the gesture images are contained within it.
- The dynamic gesture frame can be determined by first determining a static gesture frame and then enlarging it, thereby ensuring that the dynamic gesture frame covers the other associated static gestures.
- The dynamic gesture frame may be determined as follows: select an image frame containing a static gesture from the image frames of the video stream to be detected, locate the static gesture to determine a static gesture frame, and enlarge the static gesture frame according to a preset magnification ratio to determine the dynamic gesture frame.
- For example, a static gesture frame is determined in a certain image frame of the video stream and enlarged according to a preset magnification ratio (for example, 120%); the enlarged frame is the dynamic gesture frame.
- The static gesture frames of the multiple image frames satisfy the condition that each static gesture frame is located inside the dynamic gesture frame, or is the same as the dynamic gesture frame.
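- As a minimal illustration of this enlargement step (a sketch, not the patent's implementation; the scale factor, frame size, and (x, y, w, h) box format are assumed for the example), a static gesture frame can be scaled about its center and clipped to the image bounds:

```python
def enlarge_box(box, scale=1.2, frame_w=1920, frame_h=1080):
    """Scale a static gesture box about its center by `scale` (e.g. 120%)
    and clip it to the frame, yielding a candidate dynamic gesture box."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0      # center of the static box
    new_w, new_h = w * scale, h * scale    # grow width and height
    nx = max(0.0, cx - new_w / 2.0)
    ny = max(0.0, cy - new_h / 2.0)
    new_w = min(new_w, frame_w - nx)       # stay inside the image
    new_h = min(new_h, frame_h - ny)
    return (nx, ny, new_w, new_h)

print(enlarge_box((100, 100, 200, 200)))   # -> (80.0, 80.0, 240.0, 240.0)
```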
- S102 Crop image blocks corresponding to the dynamic gesture frame from multiple image frames of the video stream.
- The image blocks may be cropped from consecutive frames of the video stream, or from consecutive key frames or sampled frames, as long as they correspond to the dynamic gesture frame.
- step S102 may be performed by a processor invoking a corresponding instruction stored in a memory or by a processing unit 502 being executed by the processor.
- S103 Generate a detection sequence based on the cropped image blocks.
- A cropped image block is typically smaller than the full image frame and contains the region of the dynamic gesture frame in the image.
- The advantage of this processing is that the hand positioning information of the multiple image frames is taken into account, and the portion of each image frame outside the dynamic gesture frame is discarded, achieving a noise-reduction effect.
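- A sketch of the cropping in S102/S103, assuming NumPy arrays in (height, width, channels) layout and the hypothetical box format above; each sampled frame is reduced to the region inside the dynamic gesture frame:

```python
import numpy as np

def crop_detection_sequence(frames, box):
    """Crop every sampled frame to the dynamic gesture frame, discarding
    the background outside the box (the noise-reduction effect above)."""
    x, y, w, h = (int(v) for v in box)
    return [frame[y:y + h, x:x + w] for frame in frames]

frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(5)]  # toy frames
blocks = crop_detection_sequence(frames, (80, 80, 240, 240))
print(blocks[0].shape)  # (240, 240, 3)
```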
- the step S103 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a detection sequence generation unit 503 executed by the processor.
- S104 Perform dynamic gesture recognition according to the detection sequence.
- the step S104 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a gesture recognition unit 504 being executed by the processor.
- In other words, a detection sequence is generated from the cropped image blocks, so the dynamic gesture frame is used to crop out an image block sequence (instead of using the full image frames of the original video stream) for gesture recognition.
- In this way, the image blocks corresponding to the dynamic gesture frame are cropped from multiple image frames of the video stream, and dynamic gesture recognition is performed based on the detection sequence generated from those image blocks. Since the recognition is based on the image blocks corresponding to the dynamic gesture frame, a series of changing dynamic gestures can be recognized.
- Optionally, dynamic gesture recognition is performed based on inter-frame image differences in the detection sequence.
- That is, the image differences between the image frames in the detection sequence are first determined; an image difference sequence is then generated based on these inter-frame image differences; and finally dynamic gesture recognition is performed according to both the detection sequence and the image difference sequence.
- The embodiments of the present application thus provide an alternative in which gesture recognition is performed not only according to the images but also according to the image differences.
- The image difference can also be understood as a pixel difference, obtained by subtracting the pixels at the same positions of two adjacent frames. Since the pixel difference is computed at identical positions across adjacent frames, it captures the change process and trend of the dynamic gesture, enabling better recognition of the dynamic process of the gesture.
- The adjacent-frame difference is only an example; the inter-frame difference is not limited thereto and may also be an image difference between non-adjacent frames, for example between frames a fixed number of frames apart or between random frames. In general, the inter-frame difference is the image difference between two adjacent reference frames in the detection sequence.
- the reference frame is an actual frame or a key frame.
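- A minimal sketch of the inter-frame image difference, assuming NumPy frames; pixels at identical positions of adjacent reference frames are subtracted (a signed dtype keeps negative changes from being clipped):

```python
import numpy as np

def frame_differences(seq):
    """Pixel-wise difference between each pair of adjacent reference frames."""
    return [seq[i + 1].astype(np.int16) - seq[i].astype(np.int16)
            for i in range(len(seq) - 1)]

seq = [np.random.randint(0, 256, (240, 240), dtype=np.uint8) for _ in range(3)]
diffs = frame_differences(seq)   # 3 frames yield 2 difference images
```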
- the types of dynamic gestures may include, but are not limited to, wave, click, pistol gesture, grab gesture, etc.
- The first dynamic gesture recognition model and the second dynamic gesture recognition model may be separately established in advance. The cropped images and the computed image differences are input into the two models respectively, each model outputs probabilities for at least one dynamic gesture type, and the dynamic gesture type with a higher probability (for example, the highest probability) is the current recognition result.
- The recognition may also be performed multiple times (on multiple segments), with the dynamic gesture type determined from the multiple recognition results. For example, after recognition on the first cropped segment of images, the second segment is recognized, and then the third; finally, the dynamic gesture type is determined according to the three recognition results. In this implementation, the foregoing method further includes: cropping a preset number of frames multiple times, computing image differences multiple times, performing dynamic gesture recognition multiple times according to the cropped images and the computed image differences, and determining the final dynamic gesture recognition result according to the dynamic gesture type probabilities obtained from the multiple recognitions.
- For example, the probabilities of each dynamic gesture type over all recognition passes are summed, and the dynamic gesture type whose summed probability is high (i.e., the highest probability, or a probability ranked within the top n from high to low, where n is an integer greater than 1) is taken as the final dynamic gesture recognition result.
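- A toy illustration of this multi-pass fusion (the probabilities and the four-type labelling are invented for the example): per-pass probability vectors are summed and the highest-sum type is taken as the final result:

```python
import numpy as np

segment_probs = np.array([
    [0.70, 0.20, 0.05, 0.05],   # pass 1: P(wave, click, pistol, grab)
    [0.50, 0.40, 0.05, 0.05],   # pass 2
    [0.60, 0.30, 0.05, 0.05],   # pass 3
])
summed = segment_probs.sum(axis=0)    # summed probability per gesture type
final_type = int(np.argmax(summed))   # 0 -> "wave" in this toy labelling
```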
- FIG. 2 is another flowchart of a dynamic gesture recognition method according to an embodiment of the present application. Based on the embodiment of FIG. 1, the embodiment of FIG. 2 introduces a process of detecting a dynamic gesture in a video stream to be detected by using a convolutional neural network as a dynamic gesture recognition model.
- the method of this embodiment includes S201-S204.
- S201 Establish a first dynamic gesture recognition model and a second dynamic gesture recognition model.
- FIG. 3 is a flowchart of establishing a first dynamic gesture recognition model according to an embodiment of the present application.
- FIG. 4 is a flowchart of establishing a second dynamic gesture recognition model according to an embodiment of the present application.
- the process of establishing the first dynamic gesture recognition model includes S301-S304.
- S301 Collect sample video streams of different dynamic gesture types.
- the step S301 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a sample collection sub-unit 6071 operated by the processor.
- A video stream of known dynamic gesture types (e.g., wave, click, pistol, grab) is captured, and the start and end frames of the sample video stream are marked.
- S302 Mark the dynamic gesture frames of the different dynamic gesture types.
- step S302 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a gesture box marking sub-unit 6072 that is executed by the processor.
- A dynamic gesture frame refers to a frame covering the series of actions of a dynamic gesture in the sample video stream, such as a rectangular frame; each static gesture image of the dynamic gesture is contained within it.
- The dynamic gesture frame can be determined by first determining a static gesture frame and then enlarging it, thereby ensuring that the dynamic gesture frame covers the other associated static gestures.
- The dynamic gesture frame may be determined as follows: select any image containing a static gesture from the images of the sample video stream, locate the static gesture to determine a static gesture frame, and enlarge the static gesture frame according to a preset magnification ratio to determine the dynamic gesture frame.
- For example, a static gesture frame is determined in a frame image of the video stream and enlarged according to a preset magnification ratio (for example, 120%); the enlarged frame is the dynamic gesture frame.
- S303 Crop the image blocks corresponding to the annotation information of the dynamic gesture frame from multiple image frames of the sample video stream to form an image sequence.
- the step S303 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by an image sequence forming sub-unit 6073 that is executed by the processor.
- S304 Train the first dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image sequence as the training data.
- the step S304 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a training sub-unit 6074 that is executed by the processor.
- the first dynamic gesture recognition model is established by the following steps:
- the image sequence is divided into at least one segment; for example, the image sequence is equally divided into three segments.
- Five frames of images are extracted (randomly or consecutively) from each segment of image data and stacked to constitute the image training data.
- That is, a total of ten frames of images are extracted (randomly or consecutively) from the segments of image data and stacked to form the image training data.
- The stacked data is a three-dimensional matrix; its three dimensions are the channel, and the height and width of the image, respectively.
- The number of channels of a grayscale image is 1, and the number of channels of an RGB image is 3. The stacking here is channel stacking: for example, five images each with 1 channel become, after stacking, a three-dimensional matrix with 5 channels.
- Finally, the first dynamic gesture recognition model is trained with the dynamic gesture type as the supervision data and the image training data as the training data.
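- A toy illustration of the channel stacking just described, assuming NumPy and an arbitrary 112x112 frame size: five single-channel (grayscale) frames become one three-dimensional matrix with 5 channels:

```python
import numpy as np

frames = [np.random.rand(112, 112).astype(np.float32) for _ in range(5)]
stacked = np.stack(frames, axis=0)   # channel-first layout
print(stacked.shape)                 # (5, 112, 112): channels, height, width
```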
- the process of establishing the second dynamic gesture recognition model includes S401-S406.
- S401 Collect sample video streams of different dynamic gesture types.
- the step S401 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a sample collection sub-unit 6081 that is executed by the processor.
- A video stream of known dynamic gesture types (e.g., wave, click, pistol, grab) is captured, and the start and end frames of the sample video stream are marked.
- S402 Mark a dynamic gesture frame of different dynamic gesture types.
- the step S402 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a gesture box marking sub-unit 6082 that is executed by the processor.
- A dynamic gesture frame refers to a frame covering the series of actions of a dynamic gesture in the sample video stream, such as a rectangular frame; each static gesture image of the dynamic gesture is contained within it.
- The dynamic gesture frame can be determined by first determining a static gesture frame and then enlarging it, thereby ensuring that the dynamic gesture frame covers the other associated static gestures.
- The dynamic gesture frame may be determined as follows: select any image containing a static gesture from the images of the sample video stream, locate the static gesture to determine a static gesture frame, and enlarge the static gesture frame according to a preset magnification ratio to determine the dynamic gesture frame.
- For example, a static gesture frame is determined in a frame image of the video stream and enlarged according to a preset magnification ratio (for example, 120%); the enlarged frame is the dynamic gesture frame.
- S403 Crop the image blocks corresponding to the annotation information of the dynamic gesture frame from multiple image frames of the sample video stream to form an image sequence.
- the step S403 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by an image sequence forming sub-unit 6083 executed by the processor.
- S404 Determine a plurality of inter-frame image differences in the image sequence.
- the step S404 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by an image difference determination sub-unit 6084 that is executed by the processor.
- The image difference can also be understood as a pixel difference, obtained by subtracting the pixels at the same positions of two adjacent frames. Since the pixel difference is computed at identical positions across adjacent frames, it captures the change process and trend of the dynamic gesture, enabling better recognition of the dynamic process of the gesture.
- The adjacent-frame difference is only an example; the inter-frame difference is not limited thereto and may also be an image difference between non-adjacent frames, for example between frames a fixed number of frames apart or between random frames. In general, the inter-frame difference is the image difference between two adjacent reference frames in the sequence.
- the reference frame is an actual frame or a key frame.
- S405 Generate an image difference sequence based on the determined plurality of inter-frame image differences.
- the step S405 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by an image difference sequence determining sub-unit 6085 that is executed by the processor.
- S406 Train the second dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image difference sequence as the training data.
- the step S406 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a training sub-unit 6086 that is executed by the processor.
- Optionally, the second dynamic gesture recognition model is established by the following steps: the image difference sequence is divided into at least one segment; a preset number of frames is extracted from each segment and stacked to form the image difference training data; and the second dynamic gesture recognition model is trained with the dynamic gesture type as the supervision data and the image difference training data as the training data.
- The first dynamic gesture recognition model and the second dynamic gesture recognition model may be implemented based on different networks.
- the first dynamic gesture recognition model and the second dynamic gesture recognition model may be implemented based on a convolutional neural network.
- the first dynamic gesture recognition model may be, but is not limited to, a first neural network model, and the first neural network model is pre-trained based on the sample video stream.
- the first neural network model may include, but is not limited to, a convolutional layer, a non-linear layer (ReLU), a pooling layer, and/or a classification layer, and the like.
- the second dynamic gesture recognition model may be, but is not limited to, a second neural network model, and the second neural network model is pre-trained based on the sample video stream.
- the second neural network model may include, but is not limited to, a convolutional layer, a non-linear layer (ReLU), a pooling layer, and/or a classification layer, and the like.
- In other embodiments, the first dynamic gesture recognition model and the second dynamic gesture recognition model may also be implemented based on a recurrent neural network, a reinforcement learning neural network, or a generative adversarial network, which is not limited in the embodiments of the present application.
- S202 Input the cropped images into the first dynamic gesture recognition model, and input the image differences of adjacent frames into the second dynamic gesture recognition model, to obtain the prediction probabilities of the dynamic gesture types.
- The recognition process of a convolutional neural network may generally include an image feature extraction stage and a feature classification stage.
- For example, images of a preset number of frames (for example, 5 frames) are input into the first dynamic gesture recognition model; the convolutional layers, activation layers, and pooling layers extract the features in the images, and the classifier then classifies these features to finally obtain the prediction probability of each dynamic gesture type.
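- A minimal PyTorch sketch of the stages named above (the patent does not disclose a concrete architecture; the layer sizes and the four-type output are assumptions for the example): convolution, ReLU non-linearity, and pooling extract features from the 5-frame channel stack, and a classifier head outputs per-type probabilities:

```python
import torch
import torch.nn as nn

class GestureNet(nn.Module):
    def __init__(self, in_channels=5, num_types=4):
        super().__init__()
        self.features = nn.Sequential(          # feature extraction stage
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_types)  # classification stage

    def forward(self, x):                        # x: (N, 5, H, W) stacked frames
        f = self.features(x).flatten(1)
        return torch.softmax(self.classifier(f), dim=1)

probs = GestureNet()(torch.randn(1, 5, 112, 112))
print(probs.shape)  # (1, 4): one probability per dynamic gesture type
```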
- S203 Determine a dynamic gesture recognition result according to the prediction probabilities of the dynamic gesture types output by the first dynamic gesture recognition model and the second dynamic gesture recognition model.
- For example, the prediction probabilities for each dynamic gesture type from the first and second dynamic gesture recognition models may be weighted and averaged, and the dynamic gesture type with a higher weighted average probability (e.g., the highest) is determined as the result of this dynamic gesture recognition.
- the weighting coefficients of the two models may be preset.
- That is, the probabilities output by the two models are weighted and averaged according to each model's weighting coefficient, and the dynamic gesture type with the highest weighted average probability is determined as the result.
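- A toy illustration of the weighted averaging (the weights and probabilities are invented for the example):

```python
import numpy as np

w_image, w_diff = 0.6, 0.4                    # assumed preset weighting coefficients
p_image = np.array([0.60, 0.30, 0.05, 0.05])  # first model (image stream)
p_diff = np.array([0.40, 0.50, 0.05, 0.05])   # second model (image-difference stream)
fused = w_image * p_image + w_diff * p_diff
result = int(np.argmax(fused))                # type with highest weighted average
```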
- In other words, the prediction probabilities of the two dynamic gesture recognition models are processed jointly to finally determine the recognition result.
- In this process, recognition is performed on the images and on the image differences separately to obtain a probability for at least one dynamic gesture type, and the dynamic gesture type with a large probability (e.g., the highest probability) is determined as the recognition result.
- The image differences better reflect the temporal correlation between successive images, enabling recognition of dynamic gestures.
- Moreover, the computation per recognition pass can be reduced and the real-time speed of recognition improved, especially for dynamic gestures with a long action time span.
- The foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- FIG. 5 is a schematic structural diagram of a dynamic gesture recognition apparatus according to an embodiment of the present disclosure.
- the apparatus of this embodiment can be used to implement the various method embodiments described above. As shown in FIG. 5, the apparatus of this embodiment includes:
- the gesture locating unit 501 is configured to locate a dynamic gesture in the video stream to be detected to obtain a dynamic gesture frame.
- the processing unit 502 is configured to crop image blocks corresponding to the dynamic gesture frame from multiple image frames of the video stream.
- the detection sequence generating unit 503 is configured to generate a detection sequence based on the cropped image blocks.
- the gesture recognition unit 504 is configured to perform dynamic gesture recognition according to the detection sequence.
- FIG. 6 is another schematic structural diagram of a dynamic gesture recognition apparatus according to an embodiment of the present application.
- the apparatus of this embodiment can be used to implement the various method embodiments described above. As shown in FIG. 6, the apparatus of this embodiment includes:
- the gesture positioning unit 601 is configured to locate a dynamic gesture in the video stream to be detected to obtain a dynamic gesture frame.
- the processing unit 602 is configured to crop image blocks corresponding to the dynamic gesture frame from multiple image frames of the video stream.
- the detection sequence generating unit 603 is configured to generate a detection sequence based on the cropped image blocks.
- the gesture recognition unit 604 is configured to perform dynamic gesture recognition according to the detection sequence.
- the gesture positioning unit 601 includes:
- the static gesture frame locating sub-unit 6011 is configured to perform static gesture positioning on at least one of the multiple image frames of the video stream to obtain the static gesture frame of the at least one image frame;
- the dynamic gesture frame determining sub-unit 6012 is configured to determine a dynamic gesture frame according to the obtained static gesture frame of the at least one image frame.
- In one implementation, the dynamic gesture frame determining sub-unit 6012 is configured to: enlarge the static gesture frame of the at least one image frame to obtain the dynamic gesture frame.
- the static gesture frame of at least one image frame of the video stream satisfies: the static gesture frame is located inside the dynamic gesture frame, or the static gesture frame is the same as the dynamic gesture frame.
- the gesture recognition unit 604 includes:
- the image difference determining sub-unit 6041 is configured to determine a plurality of inter-frame image differences in the detection sequence
- An image difference sequence determining sub-unit 6042 configured to generate an image difference sequence based on the determined plurality of inter-frame image differences
- the dynamic gesture recognition sub-unit 6043 is configured to perform dynamic gesture recognition according to the detection sequence and the image difference sequence.
- the interframe difference is an image difference between two adjacent reference frames in the detection sequence.
- the dynamic gesture recognition sub-unit 6043 is configured to: input the detection sequence into the first dynamic gesture recognition model to obtain a first dynamic gesture category prediction probability output by the first dynamic gesture recognition model; input the image difference sequence into the second dynamic gesture recognition model to obtain a second dynamic gesture category prediction probability output by the second dynamic gesture recognition model; and determine a dynamic gesture recognition result according to the first dynamic gesture category prediction probability and the second dynamic gesture category prediction probability.
- the first dynamic gesture recognition model is a first neural network
- the second dynamic gesture recognition model is a second neural network
- the structures of the first neural network and the second neural network are the same or different.
- the gesture recognition unit 604 further includes:
- the multiple recognition control unit 605 is configured to crop and obtain the detection sequence multiple times, generate the image difference sequence multiple times, and perform dynamic gesture recognition according to the detection sequence and the image difference sequence multiple times;
- the recognition result determining unit 606 is configured to determine a dynamic gesture recognition result according to the probability of the dynamic gesture type obtained by each dynamic gesture recognition.
- the gesture recognition unit 604 further includes: a first dynamic gesture recognition model establishing unit 607; the first dynamic gesture recognition model establishing unit 607 includes:
- a sample collection subunit 6071 configured to collect sample video streams of different dynamic gesture types
- a gesture box marking sub-unit 6072 configured to mark a dynamic gesture box of different dynamic gesture types
- the image sequence constituting sub-unit 6073 is configured to crop the image blocks corresponding to the annotation information of the dynamic gesture frame from multiple image frames of the sample video stream to form an image sequence;
- the training sub-unit 6074 is configured to train the first dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image sequence as the training data.
- the training subunit 6074 is configured to: divide the image sequence into at least one segment; extract a preset number of frames from at least one segment and stack them to form image training data; and train the first dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image training data as the training data.
- the gesture recognition unit 604 further includes: a second dynamic gesture recognition model establishing unit 608; the second dynamic gesture recognition model establishing unit 608 includes:
- the sample collection subunit 6081 is configured to collect sample video streams of different dynamic gesture types
- a gesture box marking sub-unit 6082 configured to mark a dynamic gesture box of different dynamic gesture types
- the image sequence constituting sub-unit 6083 is configured to crop the image blocks corresponding to the annotation information of the dynamic gesture frame from multiple image frames of the sample video stream to form an image sequence;
- the image difference determining subunit 6084 is configured to determine a plurality of inter-frame image differences in the image sequence
- the image difference sequence determining sub-unit 6085 is configured to generate an image difference sequence based on the determined plurality of inter-frame image differences
- the training sub-unit 6086 is configured to train the second dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image difference sequence as the training data.
- the training sub-unit 6086 is configured to: divide the image difference sequence into at least one segment; extract a preset number of frames from at least one segment and stack them to form image difference training data; and train the second dynamic gesture recognition model with the dynamic gesture type as the supervision data and the image difference training data as the training data.
- the dynamic gesture recognition apparatus of the present embodiment can be used to implement the corresponding dynamic gesture recognition method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
- FIG. 7 is a flowchart of a gesture interaction control method according to an embodiment of the present application.
- the method can be performed by any electronic device, such as a terminal device, a server, a mobile device, an in-vehicle device, a drone, a robot, an unmanned vehicle, a television, a vehicle, a home device, or other type of smart device, and the like.
- the gesture interaction control method includes:
- Step S700: Acquire a video stream.
- the step S700 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the video stream acquisition module 100 executed by the processor.
- Step S710: Determine a dynamic gesture recognition result in the video stream by using any one of the dynamic gesture recognition methods described above.
- the step S710 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the result acquisition module 200 executed by the processor.
- Step S720: Control the device to perform an operation corresponding to the dynamic gesture recognition result.
- step S720 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the operation execution module 300 executed by the processor.
- A camera can be installed on the device that needs to perform the control operation, and the video stream can be acquired by the camera in real time. It is also possible to use a video stream already captured by a camera. Video streams can be captured by different cameras.
- the photographing device may include a binocular camera, a depth camera, or a normal camera. Different types of cameras can be used to capture video streams based on the needs of dynamic gesture recognition.
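- As a small sketch of acquiring a video stream, assuming OpenCV (`cv2`): frames can be read in real time from a camera on the controlled device (index 0 here) or from a pre-recorded file:

```python
import cv2

cap = cv2.VideoCapture(0)        # or cv2.VideoCapture("gestures.mp4")
frames = []
while len(frames) < 10 and cap.isOpened():
    ok, frame = cap.read()       # grab one frame at a time
    if not ok:
        break
    frames.append(frame)         # frames are then fed to the recognizer
cap.release()
```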
- In one implementation, step S720 includes: acquiring an operation instruction corresponding to the dynamic gesture recognition result according to a predetermined correspondence between dynamic gesture recognition results and operation instructions; and controlling the device to perform the corresponding operation according to the operation instruction.
- a correspondence between a dynamic gesture recognition result and an operation instruction may be preset.
- a dynamic gesture recognition result may correspond to one operation instruction, or a plurality of dynamic gesture recognition results may correspond to one operation instruction.
- the type and content of the operational instructions can be determined based on the type of device being operated and the operational requirements. The embodiment of the present application does not limit the form and specific content of the operation instruction.
- an operational command can be output to control the device being operated. Since the dynamic gesture recognition result can track the motion in the video stream in real time, the output operation instruction can also track the action of the execution object in the video stream in real time, so that the operator can control the operated device relatively accurately.
- an operation instruction corresponding to the dynamic gesture recognition result may be determined and output according to the correspondence relationship and the dynamic gesture recognition result. Based on the real-time and accuracy of the dynamic gesture recognition result, the operation instruction can also track the action of the execution object in the video stream in real time, so that the operator can operate the device more accurately.
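- A minimal sketch of such a preset correspondence (all names are hypothetical; the real mapping and the device API are application-specific):

```python
GESTURE_TO_INSTRUCTION = {
    "palm_wave_left": "previous_song",
    "palm_wave_right": "next_song",
    "two_finger_poke": "toggle_pause",
    "palm_press_down": "hang_up_call",
}

def dispatch(recognition_result, device):
    """Look up the operation instruction for a recognition result and have
    the operated device execute it (device.execute is assumed here)."""
    instruction = GESTURE_TO_INSTRUCTION.get(recognition_result)
    if instruction is not None:
        device.execute(instruction)
```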
- For example, in a vehicle application, controlling the device to perform the corresponding operation according to the operation instruction may proceed as follows: a monitoring device placed in the vehicle captures surveillance video of the driver or occupants as the video stream, and the captured video stream is dynamically recognized in real time.
- Controlling the device to perform an operation corresponding to the dynamic gesture recognition result then includes: in response to the dynamic gesture recognition result being a predefined dynamic action, controlling the vehicle to perform the operation corresponding to that predefined dynamic action.
- the predefined dynamic action includes a dynamic gesture
- The dynamic gesture may include, but is not limited to, at least one of: single-finger clockwise/counterclockwise rotation, palm waving left/right, two-finger forward poke, thumb and little finger extended, palm pressed down, palm raised up, palm fanning left/right, thumb extended moving left/right, palm making a long slide left/right, a palm-up fist changing to an open palm, a palm-up open palm changing to a fist, a palm-down fist changing to an open palm, a palm-down open palm changing to a fist, single-finger slide, multi-finger pinch, single-finger double click, single-finger click, multi-finger double click, and multi-finger click;
- The operations corresponding to the predefined dynamic actions may include, but are not limited to, at least one of: turning the volume up/down, song switching, song pause/resume, answering or initiating a call, hanging up or rejecting a call, raising or lowering the air conditioner temperature, multi-screen interaction, opening the sunroof, closing the sunroof, locking the doors, unlocking the doors, dragging navigation, zooming out the map, and zooming in the map.
- For example, the single-finger clockwise/counterclockwise rotation dynamic gesture can be used to turn the volume of the vehicle's audio equipment up/down.
- The palm waving left/right dynamic gesture can be used to switch songs on the vehicle's audio equipment.
- The two-finger forward poke dynamic gesture can be used to pause/resume a song on the vehicle's audio equipment.
- The thumb-and-little-finger-extended dynamic gesture can be used to answer or initiate a call on the vehicle's communication device.
- The palm-pressed-down dynamic gesture can be used to hang up or reject a call on the vehicle's communication device.
- The thumb-extended-moving-left/right dynamic gesture can be used to raise or lower the temperature of the vehicle's air conditioner.
- The palm-up fist-to-palm dynamic gesture can be used to open the vehicle's sunroof (for example, by a set length each time, such as 10 cm per operation).
- The palm-down palm-to-fist dynamic gesture can be used to lock the vehicle's doors.
- The palm-down fist-to-palm dynamic gesture can be used to unlock the vehicle's doors.
- The single-finger slide dynamic gesture can be used to drag navigation on the vehicle's navigation device.
- The multi-finger pinch dynamic gesture can be used to zoom out the map on the vehicle's navigation device.
- The single-finger double-click dynamic gesture can be used to zoom in the map on the vehicle's navigation device.
- That is, the windows, doors, or in-vehicle systems of the vehicle can be controlled by means of operation instructions. Different operations can be performed on the vehicle itself or on its in-vehicle systems using the dynamic actions recognized in the video stream. Based on the dynamic action detection method of the embodiments of the present application, an operator can control the vehicle itself or the in-vehicle systems relatively accurately.
- FIG. 8 is a flowchart of an application example of a gesture interaction control method according to an embodiment of the present application. As shown in Figure 8:
- a depth camera can be configured in the vehicle, and the driver's surveillance image is acquired as a video stream using the depth camera.
- the captured surveillance image can be recognized in real time.
- dynamic gesture recognition is performed on the hand motion of the driver.
- Step S810: The first queue and the second queue are initialized to empty, and the dynamic gesture recognition result is also set to empty.
- The frame images in the video stream are added to the first queue sequentially in chronological order.
- Step S820: Detect whether there is a dynamic action in the frame images of the first queue. If there is no dynamic action, proceed to step S830; if there is a dynamic action, proceed to step S840.
- the length of the first queue may be ten frame images.
- a dynamic action is a hand motion.
- the gesture in the image to be recognized may be identified according to the fingers and/or the palm of the hand.
- The motion trajectory and/or switching information of the gesture in at least one frame image may be determined; when the motion trajectory and/or switching information of the gesture across the frame images also match, the dynamic action in the first queue can be detected.
- Step S830: Continue to add at least one frame image of the video stream to the first queue in chronological order, and jump to step S820.
- For example, the eleventh frame image may be added to the rear end of the first queue, and the first frame image at the front of the first queue moved out of the queue.
- The first queue then contains the second to the eleventh frame images. After jumping to step S820, it can be determined whether there is a dynamic action in the first queue at this time. If there is no dynamic action, the twelfth frame image may be added to the first queue and the second frame image removed, and so on, until it is determined according to step S820 that there is a dynamic action in the frame images of the first queue.
- Step S840: Move the frame images in the first queue to the second queue.
- After the move, the first queue is emptied, and the dynamic gesture recognition result is determined according to the frame images in the second queue, yielding the previously detected dynamic gesture recognition result.
- For example, the first queue is empty, and the second queue holds ten frame images, namely the twenty-third to the thirty-second frame images of the video stream.
- the dynamic gesture recognition result may be determined according to an action in the frame image of the second queue.
- The frame images of the video stream on which no dynamic gesture recognition has yet been performed can be used as the frame images to be recognized for subsequent analysis. That is, the frame images starting from the thirty-third frame can be used as the frame images to be recognized, and the process proceeds to the subsequent dynamic action detection in step S850.
- Step S850: Sequentially determine whether at least one frame image to be recognized in the video stream matches the dynamic gesture recognition result. If it matches, proceed to step S860; if not, proceed to step S870.
- For example, whether the thirty-third frame image matches the dynamic gesture recognition result may be determined according to the thirty-third frame image and the frame image at the rear end of the second queue (the thirty-second frame image). It is possible to first determine whether the gesture in the thirty-third frame image is consistent with the gesture in the thirty-second frame image in the second queue.
- If the gestures are consistent, determine whether the motion trajectory and/or switching information of the gesture in the thirty-third frame image match the motion trajectory and/or switching information of the gesture in the dynamic gesture recognition result (where the motion trajectory of the gesture in the frame image is matched against the motion trajectory of the gesture in the dynamic gesture recognition result, and the switching information of the gesture in the frame image is matched against the switching information of the gesture in the dynamic gesture recognition result).
- If the motion trajectory and/or switching information of the gesture also match, it may be determined that the thirty-third frame image matches the dynamic gesture recognition result.
- Step S860: If it matches, add the frame image to be recognized to the second queue.
- For example, the frame images in the second queue are updated to the twenty-fourth to the thirty-third frames.
- Step S870: If it does not match, add the frame image to be recognized to the first queue.
- Step S880: Determine whether the dynamic gesture in the frame images of the first queue matches the dynamic gesture recognition result. If it does not match, proceed to step S890.
- Step S890: If the dynamic gesture in the frame images of the first queue does not match the dynamic gesture recognition result, empty the second queue, move the frame images in the first queue to the second queue, and update the dynamic gesture recognition result according to the action in the updated frame images of the second queue.
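- A control-flow sketch of the two-queue scheme of steps S810-S890; `detect_action`, `matches_result`, and `recognize` are stand-ins for the detection, matching, and recognition steps described above:

```python
from collections import deque

QUEUE_LEN = 10  # e.g. ten frame images, as in the example above

def track_gestures(frames, detect_action, matches_result, recognize):
    first, second, result = deque(maxlen=QUEUE_LEN), deque(), None
    for frame in frames:
        if result is not None and matches_result(frame, second, result):
            second.append(frame)        # frame continues the current action
            continue
        first.append(frame)             # otherwise buffer it for detection
        if detect_action(first):        # dynamic action found in first queue
            second = deque(first)       # move the first queue to the second
            first.clear()
            result = recognize(second)  # update the recognition result
    return result
```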
- The foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- FIG. 9 is a schematic structural diagram of a gesture interaction control apparatus according to an embodiment of the present disclosure. As shown in FIG. 9, the gesture interaction control apparatus includes:
- a video stream obtaining module 100 configured to acquire a video stream
- the result obtaining module 200 is configured to determine a dynamic gesture recognition result in the video stream by using the dynamic gesture recognition device described above;
- the operation execution module 300 is configured to control the device to perform an operation corresponding to the dynamic gesture recognition result.
- the operation execution module 300 includes:
- the operation instruction acquisition submodule is configured to acquire an operation instruction corresponding to the dynamic gesture recognition result according to a predetermined correspondence between dynamic gesture recognition results and operation instructions;
- the operation execution sub-module is configured to control the device to perform a corresponding operation according to the operation instruction.
- In a vehicle application, the operation execution sub-module and the operation execution module 300 are further configured to: in response to the detection result being a predefined dynamic action, control the vehicle to perform the operation corresponding to the predefined dynamic action.
- the predefined dynamic action includes a dynamic gesture
- The dynamic gesture may include, but is not limited to, at least one of: single-finger clockwise/counterclockwise rotation, palm waving left/right, two-finger forward poke, thumb and little finger extended, palm pressed down, palm raised up, palm fanning left/right, thumb extended moving left/right, palm making a long slide left/right, a palm-up fist changing to an open palm, a palm-up open palm changing to a fist, a palm-down fist changing to an open palm, a palm-down open palm changing to a fist, single-finger slide, multi-finger pinch, single-finger double click, single-finger click, multi-finger double click, and multi-finger click;
- The operations corresponding to the predefined dynamic actions may include, but are not limited to, at least one of: turning the volume up/down, song switching, song pause/resume, answering or initiating a call, hanging up or rejecting a call, raising or lowering the air conditioner temperature, multi-screen interaction, opening the sunroof, closing the sunroof, locking the doors, unlocking the doors, dragging navigation, zooming out the map, and zooming in the map.
- The functions or modules provided by the apparatuses of the embodiments of the present application may be used to implement the methods described in the foregoing method embodiments.
- an electronic device comprising a processor, where the processor includes the dynamic gesture recognition apparatus provided by any one of the foregoing embodiments of the present application, or the dynamic gesture recognition modeling apparatus provided by any one of the foregoing embodiments of the present application, or the gesture interaction control apparatus provided by any one of the foregoing embodiments of the present application.
- an electronic device including: a memory for storing executable instructions; and a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the dynamic gesture recognition method, the dynamic gesture recognition modeling method, or the gesture interaction control method provided by any one of the foregoing embodiments of the present application.
- a computer-readable storage medium for storing computer-readable instructions, wherein when the instructions are executed, the operations of the dynamic gesture recognition method, the dynamic gesture recognition modeling method, or the gesture interaction control method provided by any one of the foregoing embodiments of the present application are performed.
- a computer program product comprising computer-readable code, wherein when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the dynamic gesture recognition method, the dynamic gesture recognition modeling method, or the gesture interaction control method provided by any one of the foregoing embodiments of the present application.
- the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
- FIG. 10 shows a schematic structural diagram of an electronic device 1000 suitable for implementing a terminal device or a server according to an embodiment of the present application.
- As shown in FIG. 10, the electronic device 1000 includes one or more processors and a communication unit.
- The one or more processors include, for example, one or more central processing units (CPUs) 1001 and/or one or more graphics processing units (GPUs) 1013; the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1002, or executable instructions loaded into a random access memory (RAM) 1003 from a storage portion 1008.
- The communication unit 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
- The processor may communicate with the read-only memory 1002 and/or the random access memory 1003 to execute executable instructions, connect to the communication unit 1012 via the bus 1004, and communicate with other target devices via the communication unit 1012, thereby completing operations corresponding to any method provided by the embodiments of the present application, for example: locating a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; cropping image blocks corresponding to the dynamic gesture box from multiple image frames of the video stream; generating a detection sequence based on the cropped image blocks; and performing dynamic gesture recognition according to the detection sequence.
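As a rough sketch only, that pipeline might look as follows in Python; the `locate_gesture_box` and `classify` helpers and the fixed sequence length are hypothetical stand-ins for components the disclosure does not name:

```python
import numpy as np

def recognize_from_stream(frames, locate_gesture_box, classify, seq_len=8):
    """Locate the dynamic gesture box, crop every frame to it, sample a
    fixed-length detection sequence, and classify that sequence."""
    x1, y1, x2, y2 = locate_gesture_box(frames)   # box covering the gesture
    crops = [f[y1:y2, x1:x2] for f in frames]     # image block per frame
    # Sample a fixed-length detection sequence from the cropped frames.
    idx = np.linspace(0, len(crops) - 1, seq_len).astype(int)
    detection_sequence = np.stack([crops[i] for i in idx])
    return classify(detection_sequence)           # dynamic gesture category
```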
- In addition, the RAM 1003 may also store various programs and data required for the operation of the device.
- the CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
- ROM 1002 is an optional module.
- The RAM 1003 stores executable instructions, or executable instructions are written into the ROM 1002 at runtime; the executable instructions cause the processor 1001 to perform the operations corresponding to the methods described above.
- An input/output (I/O) interface 1005 is also coupled to bus 1004.
- The communication unit 1012 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) attached to the bus link.
- The following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 1008 including a hard disk and the like; and a communication portion 1009 including a network interface card such as a LAN card or a modem.
- the communication section 1009 performs communication processing via a network such as the Internet.
- A drive 1010 is also coupled to the I/O interface 1005 as needed.
- A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage portion 1008 as needed.
- It should be noted that the architecture shown in FIG. 10 is only one optional implementation.
- In practice, the number and types of components in FIG. 10 may be selected, reduced, added to, or replaced according to actual needs.
- In arrangements of different functional components, separated or integrated implementations may also be used; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication portion may be separated from, or integrated on, the CPU or the GPU; and so on.
- An embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: locating a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; extracting a preset number of image frames cropped to the dynamic gesture box, and computing the image differences between adjacent frames among the extracted images; and performing dynamic gesture recognition according to the cropped images and the image differences of adjacent frames. A sketch of the frame-difference step follows.
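On the simplest reading, an inter-frame image difference is a per-pixel difference of adjacent cropped frames; the sketch below assumes exactly that, since the text does not fix a particular difference operator:

```python
import numpy as np

def image_difference_sequence(cropped_frames: np.ndarray) -> np.ndarray:
    """Compute differences between each pair of adjacent frames.

    cropped_frames: array of shape (T, H, W, C) holding the preset number of
    frames cropped to the dynamic gesture box. Returns (T - 1, H, W, C).
    """
    frames = cropped_frames.astype(np.int16)  # avoid uint8 wraparound
    return np.abs(frames[1:] - frames[:-1]).astype(np.uint8)
```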
- the computer program can be downloaded and installed from the network via the communication portion 1009, and/or installed from the removable medium 1011.
- When the computer program is executed by the central processing unit (CPU) 1001, the above-described functions defined in the methods of the present application are performed.
- the methods and apparatus of the present application may be implemented in a number of ways.
- the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
- The present application may also be implemented as programs recorded in a recording medium, these programs including machine-readable instructions for implementing the methods according to the present application.
- the present application also covers a recording medium storing a program for executing the method according to the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (50)
- A dynamic gesture recognition method, comprising: locating a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; cropping image blocks corresponding to the dynamic gesture box from multiple image frames of the video stream; generating a detection sequence based on the cropped image blocks; and performing dynamic gesture recognition according to the detection sequence.
- The method according to claim 1, wherein locating the dynamic gesture in the video stream to be detected to obtain the dynamic gesture box comprises: performing static gesture localization on at least one of the multiple image frames of the video stream to obtain a static gesture box of the at least one image frame; and determining the dynamic gesture box according to the static gesture box of the at least one image frame.
- The method according to claim 2, wherein determining the dynamic gesture box according to the static gesture box of the at least one image frame comprises: enlarging the static gesture box of the at least one image frame to obtain the dynamic gesture box.
- The method according to claim 2 or 3, wherein the static gesture box of at least one of the multiple image frames of the video stream satisfies: the static gesture box is located within the dynamic gesture box, or the static gesture box is identical to the dynamic gesture box.
- The method according to any one of claims 1-4, wherein performing dynamic gesture recognition according to the detection sequence comprises: determining multiple inter-frame image differences in the detection sequence; generating an image difference sequence based on the multiple inter-frame image differences; and performing dynamic gesture recognition according to the detection sequence and the image difference sequence.
- The method according to claim 5, wherein an inter-frame image difference is the image difference between two adjacent reference frames in the detection sequence.
- The method according to claim 5 or 6, wherein performing dynamic gesture recognition according to the detection sequence and the image difference sequence comprises: inputting the detection sequence into a first dynamic gesture recognition model to obtain a first dynamic gesture category prediction probability output by the first dynamic gesture recognition model; inputting the image difference sequence into a second dynamic gesture recognition model to obtain a second dynamic gesture category prediction probability output by the second dynamic gesture recognition model; and determining the dynamic gesture recognition result according to the first dynamic gesture category prediction probability and the second dynamic gesture category prediction probability.
- The method according to claim 7, wherein the first dynamic gesture recognition model is a first neural network, the second dynamic gesture recognition model is a second neural network, and the structures of the first neural network and the second neural network are identical or different.
- The method according to any one of claims 5-8, further comprising: obtaining the detection sequence by cropping multiple times, generating the image difference sequence multiple times, and performing dynamic gesture recognition multiple times according to the detection sequence and the image difference sequence; and determining the dynamic gesture recognition result according to the probability of the dynamic gesture type obtained from each dynamic gesture recognition.
- The method according to claim 7 or 8, wherein before performing dynamic gesture recognition according to the detection sequence and the image difference sequence, the method further comprises establishing the first dynamic gesture recognition model as follows: collecting sample video streams of different dynamic gesture types; annotating the dynamic gesture boxes of the different dynamic gesture types; cropping, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; and training the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- The method according to claim 10, wherein training the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data comprises: dividing the image sequence into at least one segment; extracting a preset number of image frames from the at least one segment and stacking them to form image training data; and training the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image training data.
- The method according to any one of claims 7, 8, 10, and 11, wherein before performing dynamic gesture recognition according to the detection sequence and the image difference sequence, the method further comprises establishing the second dynamic gesture recognition model as follows: collecting sample video streams of different dynamic gesture types; annotating the dynamic gesture boxes of the different dynamic gesture types; cropping, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; determining multiple inter-frame image differences in the image sequence; generating an image difference sequence based on the multiple inter-frame image differences; and training the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data.
- The method according to claim 12, wherein training the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data comprises: dividing the image difference sequence into at least one segment; extracting a preset number of image frames from the at least one segment and stacking them to form image difference training data; and training the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference training data.
- The method according to any one of claims 1-13, wherein the type of the dynamic gesture includes one or any combination of the following: a waving gesture, a clicking gesture, a finger-gun gesture, and a grabbing gesture.
- A dynamic gesture recognition modeling method, comprising: collecting sample video streams of different dynamic gesture types; annotating the dynamic gesture boxes of the different dynamic gesture types; cropping, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; and training a first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- The method according to claim 15, wherein training the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data comprises: dividing the image sequence into at least one segment; extracting a preset number of image frames from the at least one segment and stacking them to form image training data; and training the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image training data.
- The method according to claim 15 or 16, further comprising: collecting sample video streams of different dynamic gesture types; annotating the dynamic gesture boxes of the different dynamic gesture types; cropping, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; determining multiple inter-frame image differences in the image sequence; generating an image difference sequence based on the determined multiple inter-frame image differences; and training a second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data.
- The method according to claim 17, wherein training the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data comprises: dividing the image difference sequence into at least one segment; extracting a preset number of image frames from the at least one segment and stacking them to form image difference training data; and training the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference training data.
- A dynamic gesture recognition apparatus, comprising: a gesture locating unit configured to locate a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; a processing unit configured to crop image blocks corresponding to the dynamic gesture box from multiple image frames of the video stream; a detection sequence generating unit configured to generate a detection sequence based on the cropped image blocks; and a gesture recognition unit configured to perform dynamic gesture recognition according to the detection sequence.
- The apparatus according to claim 19, wherein the gesture locating unit comprises: a static gesture box locating subunit configured to perform static gesture localization on at least one of the multiple image frames of the video stream to obtain a static gesture box of the at least one image frame; and a dynamic gesture box determining subunit configured to determine the dynamic gesture box according to the static gesture box of the at least one image frame.
- The apparatus according to claim 20, wherein the dynamic gesture box determining subunit is configured to enlarge the static gesture box of the at least one image frame to obtain the dynamic gesture box.
- The apparatus according to claim 20 or 21, wherein the static gesture box of at least one of the multiple image frames of the video stream satisfies: the static gesture box is located within the dynamic gesture box, or the static gesture box is identical to the dynamic gesture box.
- The apparatus according to any one of claims 19-22, wherein the gesture recognition unit comprises: an image difference determining subunit configured to determine multiple inter-frame image differences in the detection sequence; an image difference sequence determining subunit configured to generate an image difference sequence based on the multiple inter-frame image differences; and a dynamic gesture recognition subunit configured to perform dynamic gesture recognition according to the detection sequence and the image difference sequence.
- The apparatus according to claim 23, wherein an inter-frame image difference is the image difference between two adjacent reference frames in the detection sequence.
- The apparatus according to claim 23 or 24, wherein the dynamic gesture recognition subunit is configured to: input the detection sequence into a first dynamic gesture recognition model to obtain a first dynamic gesture category prediction probability output by the first dynamic gesture recognition model; input the image difference sequence into a second dynamic gesture recognition model to obtain a second dynamic gesture category prediction probability output by the second dynamic gesture recognition model; and determine the dynamic gesture recognition result according to the first dynamic gesture category prediction probability and the second dynamic gesture category prediction probability.
- The apparatus according to claim 25, wherein the first dynamic gesture recognition model is a first neural network, the second dynamic gesture recognition model is a second neural network, and the structures of the first neural network and the second neural network are identical or different.
- The apparatus according to any one of claims 23-26, wherein the gesture recognition unit further comprises: a multi-pass recognition control unit configured to obtain the detection sequence by cropping multiple times, generate the image difference sequence multiple times, and perform dynamic gesture recognition multiple times according to the detection sequence and the image difference sequence; and a recognition result determining unit configured to determine the dynamic gesture recognition result according to the probability of the dynamic gesture type obtained from each dynamic gesture recognition.
- The apparatus according to claim 25 or 26, wherein the gesture recognition unit further comprises a first dynamic gesture recognition model establishing unit; the first dynamic gesture recognition model establishing unit comprises: a sample collecting subunit configured to collect sample video streams of different dynamic gesture types; a gesture box annotating subunit configured to annotate the dynamic gesture boxes of the different dynamic gesture types; an image sequence forming subunit configured to crop, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; and a training subunit configured to train the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- The apparatus according to claim 28, wherein the training subunit is configured to: divide the image sequence into at least one segment; extract a preset number of image frames from the at least one segment and stack them to form image training data; and train the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image training data.
- The apparatus according to any one of claims 25, 26, 28, and 29, wherein the gesture recognition unit further comprises a second dynamic gesture recognition model establishing unit; the second dynamic gesture recognition model establishing unit comprises: a sample collecting subunit configured to collect sample video streams of different dynamic gesture types; a gesture box annotating subunit configured to annotate the dynamic gesture boxes of the different dynamic gesture types; an image sequence forming subunit configured to crop, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; an image difference determining subunit configured to determine multiple inter-frame image differences in the image sequence; an image difference sequence determining subunit configured to generate an image difference sequence based on the multiple inter-frame image differences; and a training subunit configured to train the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data.
- The apparatus according to claim 30, wherein the training subunit is configured to: divide the image difference sequence into at least one segment; extract a preset number of image frames from the at least one segment and stack them to form image difference training data; and train the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference training data.
- The apparatus according to any one of claims 14-31, wherein the type of the dynamic gesture includes one or any combination of the following: a waving gesture, a clicking gesture, a finger-gun gesture, and a grabbing gesture.
- A dynamic gesture recognition model establishing apparatus, comprising a first dynamic gesture recognition model establishing unit; the first dynamic gesture recognition model establishing unit comprises: a sample collecting subunit configured to collect sample video streams of different dynamic gesture types; a gesture box annotating subunit configured to annotate the dynamic gesture boxes of the different dynamic gesture types; an image sequence forming subunit configured to crop, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; and a training subunit configured to train a first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image sequence as training data.
- The apparatus according to claim 33, wherein the training subunit is configured to: divide the image sequence into at least one segment; extract a preset number of image frames from the at least one segment and stack them to form image training data; and train the first dynamic gesture recognition model with the dynamic gesture types as supervision data and the image training data.
- The apparatus according to claim 33 or 34, further comprising a second dynamic gesture recognition model establishing unit; the second dynamic gesture recognition model establishing unit comprises: a sample collecting subunit configured to collect sample video streams of different dynamic gesture types; a gesture box annotating subunit configured to annotate the dynamic gesture boxes of the different dynamic gesture types; an image sequence forming subunit configured to crop, from multiple image frames of the sample video streams, image blocks corresponding to the annotation information of the dynamic gesture boxes to form an image sequence; an image difference determining subunit configured to determine multiple inter-frame image differences in the image sequence; an image difference sequence determining subunit configured to generate an image difference sequence based on the determined multiple inter-frame image differences; and a training subunit configured to train the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference sequence as training data.
- The apparatus according to claim 35, wherein the training subunit is configured to: divide the image difference sequence into at least one segment; extract a preset number of image frames from the at least one segment and stack them to form image difference training data; and train the second dynamic gesture recognition model with the dynamic gesture types as supervision data and the image difference training data.
- A gesture interaction control method, comprising: acquiring a video stream; determining the dynamic gesture recognition result in the video stream using the method according to any one of claims 1 to 14; and controlling a device to perform an operation corresponding to the dynamic gesture recognition result.
- The method according to claim 37, wherein controlling the device to perform the operation corresponding to the dynamic gesture recognition result comprises: acquiring, according to a predetermined correspondence between dynamic gesture recognition results and operation instructions, the operation instruction corresponding to the dynamic gesture recognition result; and controlling the device to perform the corresponding operation according to the operation instruction.
- The method according to claim 38, wherein controlling the device to perform the corresponding operation according to the operation instruction comprises: controlling a window, a door, or an in-vehicle system of a vehicle according to the operation instruction.
- The method according to claim 37, wherein controlling the device to perform the operation corresponding to the dynamic gesture recognition result comprises: in response to the dynamic gesture recognition result being a predefined dynamic action, controlling a vehicle to perform an operation corresponding to the predefined dynamic action.
- The method according to claim 40, wherein the predefined dynamic action includes a dynamic gesture, and the dynamic gesture includes at least one of the following: single-finger clockwise/counterclockwise rotation, palm waving left/right, two-finger forward poke, extending the thumb and little finger, pressing down with the palm facing down, raising up with the palm facing up, fanning the palm to the left/right, moving an extended thumb left/right, a long palm slide to the left/right, changing a fist to an open palm with the palm up, changing an open palm to a fist with the palm up, changing an open palm to a fist with the palm down, changing a fist to an open palm with the palm down, a single-finger slide, a multi-finger inward pinch, a single-finger double click, a single-finger single click, a multi-finger double click, and a multi-finger single click; and the operation corresponding to the predefined dynamic action includes at least one of the following: turning the volume up/down, switching songs, pausing/resuming a song, answering or initiating a call, hanging up or rejecting a call, raising or lowering the air conditioner temperature, multi-screen interaction, opening the sunroof, closing the sunroof, locking the doors, unlocking the doors, dragging the navigation view, zooming out the map, and zooming in the map.
- A gesture interaction control apparatus, comprising: a video stream acquisition module configured to acquire a video stream; a result acquisition module configured to determine the dynamic gesture recognition result in the video stream using the apparatus according to any one of claims 19 to 32; and an operation execution module configured to control a device to perform an operation corresponding to the dynamic gesture recognition result.
- The apparatus according to claim 42, wherein the operation execution module comprises: an operation instruction acquisition submodule configured to acquire, according to a predetermined correspondence between dynamic gesture recognition results and operation instructions, the operation instruction corresponding to the dynamic gesture recognition result; and an operation execution submodule configured to control the device to perform the corresponding operation according to the operation instruction.
- The apparatus according to claim 43, wherein the operation execution submodule is configured to control a window, a door, or an in-vehicle system of a vehicle according to the operation instruction.
- The apparatus according to claim 42, wherein the operation execution module is further configured to: in response to the detection result being a predefined dynamic action, control a vehicle to perform an operation corresponding to the predefined dynamic action.
- The apparatus according to claim 45, wherein the predefined dynamic action includes a dynamic gesture, and the dynamic gesture includes at least one of the following: single-finger clockwise/counterclockwise rotation, palm waving left/right, two-finger forward poke, extending the thumb and little finger, pressing down with the palm facing down, raising up with the palm facing up, fanning the palm to the left/right, moving an extended thumb left/right, a long palm slide to the left/right, changing a fist to an open palm with the palm up, changing an open palm to a fist with the palm up, changing an open palm to a fist with the palm down, changing a fist to an open palm with the palm down, a single-finger slide, a multi-finger inward pinch, a single-finger double click, a single-finger single click, a multi-finger double click, and a multi-finger single click; and the operation corresponding to the predefined dynamic action includes at least one of the following: turning the volume up/down, switching songs, pausing/resuming a song, answering or initiating a call, hanging up or rejecting a call, raising or lowering the air conditioner temperature, multi-screen interaction, opening the sunroof, closing the sunroof, locking the doors, unlocking the doors, dragging the navigation view, zooming out the map, and zooming in the map.
- An electronic device, comprising a processor, the processor including the dynamic gesture recognition apparatus according to any one of claims 19 to 32, or the dynamic gesture recognition modeling apparatus according to any one of claims 33 to 36, or the gesture interaction control apparatus according to any one of claims 42 to 46.
- An electronic device, comprising: a memory for storing executable instructions; and a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the dynamic gesture recognition method according to any one of claims 1 to 14, or the dynamic gesture recognition modeling method according to any one of claims 15 to 18, or the gesture interaction control method according to any one of claims 37 to 41.
- A computer-readable storage medium for storing computer-readable instructions, wherein when the instructions are executed, the operations of the dynamic gesture recognition method according to any one of claims 1 to 14, or the dynamic gesture recognition modeling method according to any one of claims 15 to 18, or the gesture interaction control method according to any one of claims 37 to 41 are performed.
- A computer program product comprising computer-readable code, wherein when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the dynamic gesture recognition method according to any one of claims 1 to 14, or the dynamic gesture recognition modeling method according to any one of claims 15 to 18, or the gesture interaction control method according to any one of claims 37 to 41.
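Claims 7 and 25 above require the recognition result to be determined from the category probabilities of both models but do not fix a fusion rule; averaging the two probability vectors, as in this Python sketch, is one assumed possibility:

```python
import numpy as np

def fuse_predictions(p_first: np.ndarray, p_second: np.ndarray) -> int:
    """Fuse the category probabilities from the first model (fed the detection
    sequence) and the second model (fed the image difference sequence), then
    return the index of the most likely dynamic gesture category. Averaging
    is an assumption; the claims only require using both probability vectors."""
    fused = (p_first + p_second) / 2.0
    return int(np.argmax(fused))
```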
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019543878A JP6765545B2 (ja) | 2017-12-22 | 2018-12-21 | 動的ジェスチャ認識方法および装置、ジェスチャ対話制御方法および装置 |
| SG11201909139T SG11201909139TA (en) | 2017-12-22 | 2018-12-21 | Methods and apparatuses for recognizing dynamic gesture, and control methods and apparatuses using gesture interaction |
| US16/530,190 US11221681B2 (en) | 2017-12-22 | 2019-08-02 | Methods and apparatuses for recognizing dynamic gesture, and control methods and apparatuses using gesture interaction |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711417801.8A CN109960980B (zh) | 2017-12-22 | 2017-12-22 | 动态手势识别方法及装置 |
| CN201711417801.8 | 2017-12-22 | ||
| CN201810974244.8 | 2018-08-24 | ||
| CN201810974244.8A CN109144260B (zh) | 2018-08-24 | 2018-08-24 | 动态动作检测方法、动态动作控制方法及装置 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/530,190 Continuation US11221681B2 (en) | 2017-12-22 | 2019-08-02 | Methods and apparatuses for recognizing dynamic gesture, and control methods and apparatuses using gesture interaction |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019120290A1 true WO2019120290A1 (zh) | 2019-06-27 |
Family
ID=66992452
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/122767 Ceased WO2019120290A1 (zh) | 2017-12-22 | 2018-12-21 | 动态手势识别方法和装置、手势交互控制方法和装置 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11221681B2 (zh) |
| JP (1) | JP6765545B2 (zh) |
| SG (1) | SG11201909139TA (zh) |
| WO (1) | WO2019120290A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021089761A (ja) * | 2020-02-14 | 2021-06-10 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッドBeijing Baidu Netcom Science Technology Co., Ltd. | ジェスチャによる電子機器の制御方法及び装置 |
| CN113222582A (zh) * | 2021-05-10 | 2021-08-06 | 广东便捷神科技股份有限公司 | 一种人脸支付零售终端机 |
Families Citing this family (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP6601644B1 (ja) * | 2018-08-03 | 2019-11-06 | Linne株式会社 | 画像情報表示装置 |
| CN109344755B (zh) * | 2018-09-21 | 2024-02-13 | 广州市百果园信息技术有限公司 | 视频动作的识别方法、装置、设备及存储介质 |
| CN110992426B (zh) * | 2019-12-09 | 2024-03-22 | 北京明略软件系统有限公司 | 姿势识别方法和装置、电子设备及存储介质 |
| CN112115801B (zh) * | 2020-08-25 | 2023-11-24 | 深圳市优必选科技股份有限公司 | 动态手势识别方法、装置、存储介质及终端设备 |
| JP2022087700A (ja) * | 2020-12-01 | 2022-06-13 | 京セラドキュメントソリューションズ株式会社 | 電子機器および画像形成装置 |
| CN113239714A (zh) * | 2020-12-07 | 2021-08-10 | 北京理工大学 | 一种融合注意力机制的动态手势实时识别方法 |
| CN112396666A (zh) * | 2020-12-09 | 2021-02-23 | 广西双英集团股份有限公司 | 基于手势识别的装配过程智能控制方法 |
| CN114967905A (zh) * | 2021-02-26 | 2022-08-30 | 广州视享科技有限公司 | 手势控制方法、装置、计算机可读存储介质和电子设备 |
| CN113126753B (zh) * | 2021-03-05 | 2023-04-07 | 深圳点猫科技有限公司 | 一种基于手势关闭设备的实现方法、装置及设备 |
| CN112686231B (zh) * | 2021-03-15 | 2021-06-01 | 南昌虚拟现实研究院股份有限公司 | 动态手势识别方法、装置、可读存储介质及计算机设备 |
| KR102530222B1 (ko) | 2021-03-17 | 2023-05-09 | 삼성전자주식회사 | 이미지 센서 및 이미지 센서의 동작 방법 |
| US12211274B2 (en) * | 2021-04-19 | 2025-01-28 | The Toronto-Dominion Bank | Weakly supervised action selection learning in video |
| CN113448443B (zh) * | 2021-07-12 | 2024-11-15 | 交互未来(北京)科技有限公司 | 一种基于硬件结合的大屏幕交互方法、装置和设备 |
| CN113642413A (zh) * | 2021-07-16 | 2021-11-12 | 新线科技有限公司 | 控制方法、装置、设备及介质 |
| CN114255513A (zh) * | 2021-12-10 | 2022-03-29 | 深圳市鸿合创新信息技术有限责任公司 | 手势识别方法、装置、设备及存储介质 |
| CN114356076B (zh) * | 2021-12-13 | 2023-10-03 | 中国船舶重工集团公司第七0九研究所 | 一种手势操控方法和系统 |
| AU2021290427B2 (en) * | 2021-12-20 | 2023-05-18 | Sensetime International Pte. Ltd. | Object recognition method, apparatus, device and storage medium |
| WO2023118937A1 (en) * | 2021-12-20 | 2023-06-29 | Sensetime International Pte. Ltd. | Object recognition method, apparatus, device and storage medium |
| CN114463839A (zh) * | 2021-12-30 | 2022-05-10 | 浙江大华技术股份有限公司 | 一种手势识别的方法、装置、电子装置和存储介质 |
| US11899846B2 (en) * | 2022-01-28 | 2024-02-13 | Hewlett-Packard Development Company, L.P. | Customizable gesture commands |
| US12073027B2 (en) * | 2022-12-20 | 2024-08-27 | Accenture Global Solutions Limited | Behavior-based standard operating procedure detection |
| US12472988B2 (en) | 2023-08-28 | 2025-11-18 | Wipro Limited | Method and system for predicting gesture of subjects surrounding an autonomous vehicle |
| WO2025159651A1 (ru) * | 2024-01-25 | 2025-07-31 | Публичное Акционерное Общество "Сбербанк России" | Способ и система распознавания жестов |
| CN119323831B (zh) * | 2024-12-18 | 2025-03-28 | 广州炫视智能科技有限公司 | 一种人机交互控制方法及控制系统 |
| CN119840388B (zh) * | 2025-02-28 | 2025-10-03 | 广汽埃安新能源汽车股份有限公司 | 车载空调控制方法、装置、电子设备和存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102426480A (zh) * | 2011-11-03 | 2012-04-25 | 康佳集团股份有限公司 | 一种人机交互系统及其实时手势跟踪处理方法 |
| US20130058565A1 (en) * | 2002-02-15 | 2013-03-07 | Microsoft Corporation | Gesture recognition system using depth perceptive sensors |
| CN103593680A (zh) * | 2013-11-19 | 2014-02-19 | 南京大学 | 一种基于隐马尔科夫模型自增量学习的动态手势识别方法 |
| CN104834894A (zh) * | 2015-04-01 | 2015-08-12 | 济南大学 | 一种结合二进制编码和类-Hausdorff距离的手势识别方法 |
Family Cites Families (48)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| USRE46310E1 (en) * | 1991-12-23 | 2017-02-14 | Blanding Hovenweep, Llc | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
| US6075895A (en) * | 1997-06-20 | 2000-06-13 | Holoplex | Methods and apparatus for gesture recognition based on templates |
| US6501515B1 (en) * | 1998-10-13 | 2002-12-31 | Sony Corporation | Remote control system |
| JP3657463B2 (ja) * | 1999-06-29 | 2005-06-08 | シャープ株式会社 | 動作認識システムおよび動作認識プログラムを記録した記録媒体 |
| US7274800B2 (en) * | 2001-07-18 | 2007-09-25 | Intel Corporation | Dynamic gesture recognition from stereo sequences |
| US6937742B2 (en) * | 2001-09-28 | 2005-08-30 | Bellsouth Intellectual Property Corporation | Gesture activated home appliance |
| US20080065291A1 (en) * | 2002-11-04 | 2008-03-13 | Automotive Technologies International, Inc. | Gesture-Based Control of Vehicular Components |
| US7421727B2 (en) * | 2003-02-14 | 2008-09-02 | Canon Kabushiki Kaisha | Motion detecting system, motion detecting method, motion detecting apparatus, and program for implementing the method |
| US7308112B2 (en) * | 2004-05-14 | 2007-12-11 | Honda Motor Co., Ltd. | Sign based human-machine interaction |
| JP5303355B2 (ja) * | 2009-05-21 | 2013-10-02 | 学校法人 中央大学 | 周期ジェスチャ識別装置、周期ジェスチャ識別方法、周期ジェスチャ識別プログラム、及び記録媒体 |
| KR20110007806A (ko) * | 2009-07-17 | 2011-01-25 | 삼성전자주식회사 | 카메라를 이용하여 손동작을 인식하는 장치 및 방법 |
| US8818027B2 (en) * | 2010-04-01 | 2014-08-26 | Qualcomm Incorporated | Computing device interface |
| US8751215B2 (en) | 2010-06-04 | 2014-06-10 | Microsoft Corporation | Machine based sign language interpreter |
| US20110304541A1 (en) * | 2010-06-11 | 2011-12-15 | Navneet Dalal | Method and system for detecting gestures |
| CN102402680B (zh) | 2010-09-13 | 2014-07-30 | 株式会社理光 | 人机交互系统中手部、指示点定位方法和手势确定方法 |
| WO2012051747A1 (en) * | 2010-10-18 | 2012-04-26 | Nokia Corporation | Method and apparatus for providing hand detection |
| CN102053702A (zh) | 2010-10-26 | 2011-05-11 | 南京航空航天大学 | 动态手势控制系统与方法 |
| US20120268374A1 (en) * | 2011-04-25 | 2012-10-25 | Heald Arthur D | Method and apparatus for processing touchless control commands |
| US8693726B2 (en) * | 2011-06-29 | 2014-04-08 | Amazon Technologies, Inc. | User identification by gesture recognition |
| JP6243112B2 (ja) * | 2011-12-09 | 2017-12-06 | ソニー株式会社 | 情報処理装置、情報処理方法、および記録媒体 |
| EP2650754A3 (en) * | 2012-03-15 | 2014-09-24 | Omron Corporation | Gesture recognition apparatus, electronic device, gesture recognition method, control program, and recording medium |
| US9734393B2 (en) * | 2012-03-20 | 2017-08-15 | Facebook, Inc. | Gesture-based control system |
| TWI489326B (zh) * | 2012-06-05 | 2015-06-21 | Wistron Corp | 操作區的決定方法與系統 |
| JP5935529B2 (ja) * | 2012-06-13 | 2016-06-15 | ソニー株式会社 | 画像処理装置、画像処理方法、およびプログラム |
| US9128528B2 (en) * | 2012-06-22 | 2015-09-08 | Cisco Technology, Inc. | Image-based real-time gesture recognition |
| US9111135B2 (en) * | 2012-06-25 | 2015-08-18 | Aquifi, Inc. | Systems and methods for tracking human hands using parts based template matching using corresponding pixels in bounded regions of a sequence of frames that are a specified distance interval from a reference camera |
| US9098739B2 (en) * | 2012-06-25 | 2015-08-04 | Aquifi, Inc. | Systems and methods for tracking human hands using parts based template matching |
| TWI479430B (zh) * | 2012-10-08 | 2015-04-01 | Pixart Imaging Inc | 以自然影像進行的手勢辨識方法 |
| WO2014088621A1 (en) * | 2012-12-03 | 2014-06-12 | Google, Inc. | System and method for detecting gestures |
| RU2013146529A (ru) | 2013-10-17 | 2015-04-27 | ЭлЭсАй Корпорейшн | Распознавание динамического жеста руки с избирательным инициированием на основе обнаруженной скорости руки |
| JP6460862B2 (ja) * | 2014-03-20 | 2019-01-30 | 国立研究開発法人産業技術総合研究所 | ジェスチャ認識装置、システム及びそのプログラム |
| US9355236B1 (en) * | 2014-04-03 | 2016-05-31 | Fuji Xerox Co., Ltd. | System and method for biometric user authentication using 3D in-air hand gestures |
| US9418319B2 (en) * | 2014-11-21 | 2016-08-16 | Adobe Systems Incorporated | Object detection using cascaded convolutional neural networks |
| US20160162148A1 (en) | 2014-12-04 | 2016-06-09 | Google Inc. | Application launching and switching interface |
| US10097758B2 (en) | 2015-11-18 | 2018-10-09 | Casio Computer Co., Ltd. | Data processing apparatus, data processing method, and recording medium |
| US20170161607A1 (en) * | 2015-12-04 | 2017-06-08 | Pilot Ai Labs, Inc. | System and method for improved gesture recognition using neural networks |
| US10318008B2 (en) * | 2015-12-15 | 2019-06-11 | Purdue Research Foundation | Method and system for hand pose detection |
| CN106934333B (zh) | 2015-12-31 | 2021-07-20 | 芋头科技(杭州)有限公司 | 一种手势识别方法及系统 |
| JP2017191496A (ja) * | 2016-04-14 | 2017-10-19 | 株式会社東海理化電機製作所 | ジェスチャ判定装置 |
| WO2018017399A1 (en) * | 2016-07-20 | 2018-01-25 | Usens, Inc. | Method and system for 3d hand skeleton tracking |
| US20180088671A1 (en) * | 2016-09-27 | 2018-03-29 | National Kaohsiung University Of Applied Sciences | 3D Hand Gesture Image Recognition Method and System Thereof |
| CN106648112A (zh) | 2017-01-07 | 2017-05-10 | 武克易 | 一种体感动作识别方法 |
| CN107169411B (zh) | 2017-04-07 | 2019-10-29 | 南京邮电大学 | 一种基于关键帧和边界约束dtw的实时动态手势识别方法 |
| CN107180224B (zh) | 2017-04-10 | 2020-06-19 | 华南理工大学 | 基于时空滤波和联合空间Kmeans的手指运动检测与定位方法 |
| CN107316022B (zh) | 2017-06-27 | 2020-12-01 | 歌尔光学科技有限公司 | 动态手势识别方法和装置 |
| CN108197596B (zh) | 2018-01-24 | 2021-04-06 | 京东方科技集团股份有限公司 | 一种手势识别方法和装置 |
| US10296102B1 (en) * | 2018-01-31 | 2019-05-21 | Piccolo Labs Inc. | Gesture and motion recognition using skeleton tracking |
| US11967127B2 (en) * | 2018-04-18 | 2024-04-23 | Sony Interactive Entertainment Inc. | Context embedding for capturing image dynamics |
- 2018
  - 2018-12-21 WO PCT/CN2018/122767 patent/WO2019120290A1/zh not_active Ceased
  - 2018-12-21 JP JP2019543878A patent/JP6765545B2/ja active Active
  - 2018-12-21 SG SG11201909139T patent/SG11201909139TA/en unknown
- 2019
  - 2019-08-02 US US16/530,190 patent/US11221681B2/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130058565A1 (en) * | 2002-02-15 | 2013-03-07 | Microsoft Corporation | Gesture recognition system using depth perceptive sensors |
| CN102426480A (zh) * | 2011-11-03 | 2012-04-25 | 康佳集团股份有限公司 | 一种人机交互系统及其实时手势跟踪处理方法 |
| CN103593680A (zh) * | 2013-11-19 | 2014-02-19 | 南京大学 | 一种基于隐马尔科夫模型自增量学习的动态手势识别方法 |
| CN104834894A (zh) * | 2015-04-01 | 2015-08-12 | 济南大学 | 一种结合二进制编码和类-Hausdorff距离的手势识别方法 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021089761A (ja) * | 2020-02-14 | 2021-06-10 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッドBeijing Baidu Netcom Science Technology Co., Ltd. | ジェスチャによる電子機器の制御方法及び装置 |
| JP7146977B2 (ja) | 2020-02-14 | 2022-10-04 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | ジェスチャによる電子機器の制御方法及び装置 |
| CN113222582A (zh) * | 2021-05-10 | 2021-08-06 | 广东便捷神科技股份有限公司 | 一种人脸支付零售终端机 |
| CN113222582B (zh) * | 2021-05-10 | 2022-03-08 | 广东便捷神科技股份有限公司 | 一种人脸支付零售终端机 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP6765545B2 (ja) | 2020-10-07 |
| SG11201909139TA (en) | 2019-10-30 |
| JP2020508511A (ja) | 2020-03-19 |
| US11221681B2 (en) | 2022-01-11 |
| US20190354194A1 (en) | 2019-11-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019120290A1 (zh) | 动态手势识别方法和装置、手势交互控制方法和装置 | |
| JP7073522B2 (ja) | 空中手書きを識別するための方法、装置、デバイス及びコンピュータ読み取り可能な記憶媒体 | |
| US11610148B2 (en) | Information processing device and information processing method | |
| CN108874126B (zh) | 基于虚拟现实设备的交互方法及系统 | |
| WO2021115181A1 (zh) | 手势识别方法、手势控制方法、装置、介质与终端设备 | |
| CN106997236B (zh) | 基于多模态输入进行交互的方法和设备 | |
| US10095033B2 (en) | Multimodal interaction with near-to-eye display | |
| CN107643828B (zh) | 车辆、控制车辆的方法 | |
| US20170161555A1 (en) | System and method for improved virtual reality user interaction utilizing deep-learning | |
| US20110261213A1 (en) | Real time video process control using gestures | |
| JP2014137818A (ja) | 手の平開閉動作識別方法と装置、マン・マシン・インタラクション方法と設備 | |
| US20140267011A1 (en) | Mobile device event control with digital images | |
| JP2014523019A (ja) | 動的ジェスチャー認識方法および認証システム | |
| CN103106388B (zh) | 图像识别方法和系统 | |
| CN112286360A (zh) | 用于操作移动设备的方法和装置 | |
| CN109725722B (zh) | 有屏设备的手势控制方法和装置 | |
| CN117980873A (zh) | 一种显示设备及其控制方法 | |
| KR20190132885A (ko) | 영상으로부터 손을 검출하는 장치, 방법 및 컴퓨터 프로그램 | |
| Yadav et al. | Gesture Recognition System for Human-Computer Interaction using Computer Vision | |
| CN108256071B (zh) | 录屏文件的生成方法、装置、终端及存储介质 | |
| CN107346207B (zh) | 一种基于隐马尔科夫模型的动态手势切分识别方法 | |
| EP2781991B1 (en) | Signal processing device and signal processing method | |
| Rahman et al. | Continuous motion numeral recognition using RNN architecture in air-writing environment | |
| WO2024160105A1 (zh) | 交互方法、装置、电子设备和存储介质 | |
| CN116360603A (zh) | 基于时序信号匹配的交互方法、设备、介质及程序产品 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18893075; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2019543878; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/10/2020) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18893075; Country of ref document: EP; Kind code of ref document: A1 |