CN111046804A

CN111046804A - Living body detection method, device, electronic device, and readable storage medium

Info

Publication number: CN111046804A
Application number: CN201911285947.0A
Authority: CN
Inventors: 王鹏; 姚聪
Original assignee: Beijing Kuangshi Technology Co Ltd
Current assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2020-04-21

Abstract

The embodiment of the application provides a living body detection method and device, electronic equipment and a readable storage medium. The method comprises the following steps: acquiring at least one to-be-detected video of a user; respectively carrying out face living body detection and gesture detection on at least one video to be detected to obtain a face living body detection result and a gesture detection result of the at least one video to be detected, wherein the face living body detection result comprises whether a living body face exists in the video to be detected, and the gesture detection result comprises a matching result of a gesture corresponding to a target gesture identification and a gesture existing in the video to be detected; for each video to be detected, determining a detection result of the video to be detected based on a human face living body detection result and a gesture detection result of the video to be detected; and determining the living body detection result of the user based on the detection result of the at least one video to be detected. In the embodiment of the application, the human face living body detection and the gesture detection are carried out on the video to be detected, so that the potential safety hazard is effectively reduced.

Description

Living body detection method, living body detection device, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for detecting a living body, an electronic device, and a readable storage medium.

Background

The face liveness detection method refers to a technique by which it is determined whether a face in a given image or video is from a real person or from a spoofed face (mask, print photograph, photograph displayed on a screen, or video clip played, etc.). The human face living body judgment is an important technical means for preventing attack and fraud, and has wide application in industries and occasions relating to remote identity authentication, such as banks, insurance, internet finance, electronic commerce and the like.

Currently, a video live body detection method is generally adopted when a human face live body is detected, and the video live body method is mainly divided into an action live body and a silence live body. The action living body needs a user to perform a plurality of specified actions according to prompts, and the actions relate to nodding the head, shaking the head, blinking, opening the mouth and the like. But the specified action is simple, so the method is easy to crack and has higher potential safety hazard. The silent living body only needs the user to look at the camera for about 2-3 seconds, and can keep still or have normal actions in the period, but the verification mode is simpler, and higher potential safety hazard also exists.

Disclosure of Invention

The present application aims to solve at least one of the above technical drawbacks.

In a first aspect, there is provided a method of in vivo detection, the method comprising:

acquiring at least one to-be-detected video of a user;

respectively carrying out face living body detection and gesture detection on at least one video to be detected to obtain a face living body detection result and a gesture detection result of the at least one video to be detected, wherein the face living body detection result comprises whether a living body face exists in the video to be detected, and the gesture detection result comprises a matching result of a gesture corresponding to a target gesture identification and a gesture existing in the video to be detected;

for each video to be detected, determining a detection result of the video to be detected based on a human face living body detection result and a gesture detection result of the video to be detected;

and determining the living body detection result of the user based on the detection result of the at least one video to be detected.

In an optional embodiment of the first aspect, determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected includes:

and when the living human face detection result of the video to be detected is that the living human face exists and the gesture detection result is that the gesture existing in the video to be detected is matched with the gesture corresponding to the target gesture identification, determining that the detection result of the video to be detected is that the detection is passed, or determining that the detection result of the video to be detected is that the detection is not passed.

In an optional embodiment of the first aspect, determining a living body detection result of the user based on a detection result of at least one video to be detected includes:

when the detection result of the video to be detected is that the number of the passing detection is not less than the set number, determining that the living body detection result of the user is a living body, wherein the set number is not more than the number of the video to be detected; or the like, or, alternatively,

and when the detection result of the video to be detected is that the video is not detected, determining that the living body detection result of the user is a non-living body.

In an alternative embodiment of the first aspect, the target gesture identification is a candidate gesture identification randomly selected from a preconfigured candidate gesture database.

In an alternative embodiment of the first aspect, the candidate gesture identifications are digital gesture identifications.

In an optional embodiment of the first aspect, performing living human face detection and gesture detection on at least one to-be-detected video respectively to obtain a living human face detection result and a gesture detection result of the at least one to-be-detected video includes:

inputting at least one video to be detected into a gesture living body detection model, and obtaining a human face living body detection result and a gesture detection result of a video frame based on the output of the gesture living body detection model;

the gesture detection result of the video to be detected comprises a target matching probability of a gesture corresponding to the gesture in the video to be detected and the target gesture identification, and when the target matching probability meets a set condition, the gesture detection result of the video to be detected is a matching result of the gesture existing in the video to be detected and the gesture corresponding to the target gesture identification.

In an embodiment of the first aspect, the step of satisfying the target matching probability by the set condition includes:

the target matching probability is greater than a preset threshold value;

or when the gesture detection result of the video to be detected comprises the matching probability of the gesture in the video to be detected and the candidate gesture corresponding to each candidate gesture identification, the target matching probability is the highest probability.

In an optional embodiment of the first aspect, acquiring a video to be detected of a user includes:

acquiring an initial video pre-acquired by a video acquisition device;

when the initial video is determined to comprise the face image, providing gesture prompt information of the target gesture;

and acquiring a target video acquired by the video acquisition device after the gesture prompt information is provided, and taking the target video as a video to be detected.

In an optional embodiment of the first aspect, the method further comprises:

and if the detection result of the video to be detected is determined to be undetected, providing corresponding prompt information for the user according to the human face living body detection result and the gesture detection result of the video to be detected.

In a second aspect, there is provided a living body detection apparatus, the apparatus comprising:

the video acquisition module is used for acquiring a video to be detected of at least one user;

the video detection module is used for respectively carrying out human face living body detection and gesture detection on at least one video to be detected to obtain a human face living body detection result and a gesture detection result of the at least one video to be detected, wherein the human face living body detection result comprises whether a living body human face exists in the video to be detected, and the gesture detection result comprises a matching result of a gesture corresponding to a target gesture identification and a gesture existing in the video to be detected;

the detection result determining module is used for determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected for each video to be detected;

and the living body detection result determining module is used for determining the living body detection result of the user based on the detection result of the at least one video to be detected.

In an optional embodiment of the second aspect, the detection result of the video to be detected includes a detection pass and a detection fail, and the detection result determining module is specifically configured to, when determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected:

In an optional embodiment of the second aspect, the living body detection result determining module is specifically configured to, when determining the living body detection result of the user based on the detection result of the at least one to-be-detected video:

In an alternative embodiment of the second aspect, the target gesture identification is a candidate gesture identification randomly selected from a preconfigured database of candidate gestures.

In an alternative embodiment of the second aspect, the candidate gesture identifications are digital gesture identifications.

In an optional embodiment of the second aspect, the video detection module is specifically configured to, when performing face live detection and gesture detection on at least one video to be detected respectively to obtain a face live detection result and a gesture detection result of the at least one video to be detected:

In an alternative embodiment of the second aspect, the step of satisfying the target matching probability with the set condition includes:

the target matching probability is greater than a preset threshold value;

In an embodiment of the second aspect, when acquiring a to-be-detected video of a user, the video acquiring module is specifically configured to:

acquiring an initial video pre-acquired by a video acquisition device;

In an optional embodiment of the second aspect, the apparatus further includes an information prompting module, specifically configured to:

In a third aspect, an electronic device is provided, which includes:

a processor and a memory configured to store machine readable instructions that, when executed by the processor, cause the processor to perform any of the methods of the first aspect.

In a fourth aspect, there is provided a computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform a method implementing any of the first aspects

The technical scheme provided by the embodiment of the application has the following beneficial effects:

in the embodiment of the application, when the face live detection is performed on the video to be detected, not only the face live detection is performed on the video to be detected, but also the gesture detection is performed on the video to be detected, so that the detection result of the video to be detected is obtained, and when the live detection result of the user is determined, the determination is performed based on the detection result of at least one video to be detected. Therefore, compared with the prior art that only face living body detection or only gesture detection is carried out to determine whether the face living body detection is passed, the potential safety hazard can be effectively reduced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

FIG. 1 is a schematic flow chart of a method for detecting a living organism according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of a living body detecting apparatus according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

An embodiment of the present application provides a method for detecting a living body, which may be performed by a server, as shown in fig. 1, and the method includes:

step S101, at least one video to be detected of a user is obtained.

The number of the videos to be detected may be configured in advance, and the embodiment of the present application is not limited.

In practical application, the video of the user to be detected, which needs to be subjected to face living body detection, in the embodiment of the present application, the obtaining of the video to be detected may include:

acquiring an initial video pre-acquired by a video acquisition device;

when the initial video is determined to comprise the face image, providing gesture prompt information corresponding to the target gesture identification;

Optionally, the target gesture identifier is a candidate gesture identifier randomly selected from a preconfigured candidate gesture database.

The specific form of the target gesture corresponding to the target gesture identifier is not limited to be optional in the embodiment of the present application, and in the embodiment of the present application, the candidate gesture identifier may be a digital gesture identifier, such as an identifier of a gesture with a number 1 to an identifier of a gesture with a number 9. The target gesture identification can be one or more candidate gesture identifications randomly selected from a pre-configured candidate gesture database, the candidate gesture database can be configured in a server, when the target gesture needs to be provided, the server randomly selects one or more candidate gesture identifications from the pre-configured candidate gesture database as the target gesture identifications and sends the information of the determined target gesture identifications to the terminal device, and the terminal device provides the target gesture information corresponding to the target gesture identifications to the user. Certainly, in practical applications, the candidate gesture database may also be configured in the terminal device, and then when the terminal device needs to provide a target gesture for a user, one or more candidate gesture identifications may be directly randomly selected from the preconfigured data candidate gesture database as the target gesture identifications.

The specific implementation manner of providing the target gesture prompt information by the terminal device is also not limited in the embodiment of the present application. For example, when the terminal device is internally provided with the video capture device, text indication information of the target gesture can be displayed on a display screen of the terminal device, and when the terminal device is connected with the video capture device in an external connection mode, information of the target gesture and the like can be broadcasted in a voice mode.

In the embodiment of the application, the initial video can be pre-collected, the target gesture information corresponding to the target gesture identification is provided after the face is determined to exist in the video, and the provided target gesture identification is randomly selected from the candidate gesture library, so that the situation that a user records the video with the target gesture in advance as the video to be detected is prevented, and the situation that the face in the video to be detected is a real face or a deceptive face can be effectively identified.

In an example, it is assumed that the terminal device is a mobile phone, the video capture device is a camera in the mobile phone, and the target gesture identifier is an identifier of a gesture with a number 9. Correspondingly, after the user starts to shoot the video through the camera, the user can be guided to align the face to the camera, for example, prompt information of 'please look ahead at the camera', 'please make the face located in a virtual line area in the picture' can be displayed through a mobile phone screen; further, an initial video can be pre-collected, whether the pre-collected initial video comprises a face image or not is determined, if yes, the number '9' can be displayed on a screen, the video is continuously collected through the video collecting device, and the obtained video serves as the video to be detected.

When it is determined whether a face image exists in the acquired initial video, the method may be executed once or repeatedly until it is determined that a face exists in the acquired initial video, or the execution times reach a set number of times to stop acquiring the initial video, and the like. It can be understood that if no face image exists in the currently acquired initial video, prompt information can be displayed to the user, and correspondingly, if the acquisition times reach the set times, no face image exists in the acquired initial video, the video acquisition failure information can be prompted to make the user know.

Step S102, respectively carrying out face living body detection and gesture detection on at least one video to be detected to obtain a face living body detection result and a gesture detection result of the at least one video to be detected, wherein the face living body detection result comprises whether a living body face exists in the video to be detected, and the gesture detection result comprises a matching result of a gesture existing in the video to be detected and a gesture corresponding to the target gesture identification.

The living human face detection result represents whether a living human face exists in a video to be detected, that is, the living human face detection result can include two situations, one situation is that the living human face exists, and the other situation is that the living human face does not exist. The gesture detection result represents a matching result of a gesture existing in the video to be detected and a gesture corresponding to the target gesture identifier, and the gesture detection result may also include two cases, one case is that the gesture existing in the video to be detected is a gesture corresponding to the target gesture identifier (i.e., the target gesture exists in the video to be detected), and the other case is that the gesture not existing in the video to be detected is a gesture corresponding to the target gesture identifier (i.e., the target gesture does not exist in the video to be detected).

In addition, in practical application, a sequence of target gesture identifications may be generated, and each video to be detected may correspond to one target gesture identification in the sequence. Correspondingly, when the gesture detection is performed on at least one video to be detected, the matching result of the gesture existing in each video to be detected and the target gesture identification corresponding to the sequence of the target gesture identification can be detected in sequence.

In practical application, the number of the human face living body detection and the gesture detection which are specifically required to be performed can be configured in advance, and the embodiment of the application is not limited. For example, when the safety factor requirement of the current application scene is high, the number of videos to be detected, which need to be subjected to face live body detection and gesture detection, may be set to be larger, and when the safety factor requirement of the current application scene is low, the number of videos to be detected, which need to be subjected to face live body detection and gesture detection, may be set to be smaller, such as 1 or 2.

The embodiment of the application is not limited by the specific implementation mode of carrying out the face living body detection and the gesture detection on the video to be detected. For example, the model may be implemented by a neural network, that is, a video to be detected may be input to the neural network model, and a face living body detection result and a gesture detection result are obtained based on an output of the neural network model. It can be understood that when the neural network model for obtaining the face in-vivo detection result and the neural network model for obtaining the gesture detection result are independent neural networks, the video to be detected needs to be input to the neural network model for determining the face in-vivo detection result and the neural network model for determining the gesture detection result respectively.

And S103, for each video to be detected, determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected.

And step S104, determining the living body detection result of the user based on the detection result of the at least one video to be detected.

In practical application, for each video to be detected for face live body detection and gesture detection, the detection result of the video to be detected can be determined based on the obtained face live body detection result and gesture detection result. Further, a final corresponding living body detection result of the user can be obtained based on the detection result of the at least one video to be detected.

The living body detection result of the user represents whether the user included in the video to be detected is a living body, that is, the living body detection result may also include two situations, one situation is that the user included in the video to be detected is a living body, and the other situation is that the user included in the video to be detected is a non-living body.

In an optional embodiment of the present application, the determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected includes:

That is to say, in practical application, only when the detection result of the video to be detected is that a living human face exists and the gesture detection result is that the gesture existing in the video to be detected is the target gesture corresponding to the target gesture identifier, it can be determined that the detection result of the video to be detected is passed, but as long as one of the detection results is not satisfied, the detection result of the video to be detected is failed.

In an optional embodiment of the present application, performing living human face detection and gesture detection on a video to be detected to obtain a living human face detection result and a gesture detection result of the video to be detected, including:

carrying out face living body detection and gesture detection on a video frame in a video to be detected to obtain a detection result of the video frame, wherein the detection result of the video frame comprises a face living body detection result of the video frame and a gesture detection result of the video frame;

when the living human face detection result of the video to be detected is that a living human face exists and the gesture detection result is that a gesture existing in the video to be detected is matched with a gesture corresponding to the target gesture identification, determining that the detection result of the video to be detected is that the detection is passed, and including:

and when the detection result of at least one frame of video frame in the video to be detected is that the living human face exists and the gesture detection result is that the gesture existing in the video to be detected is matched with the gesture corresponding to the target gesture identification, determining that the detection result of the video to be detected is that the detection is passed.

In practical application, the video to be detected is composed of video frames, and further, in the embodiment of the application, when the face living body detection and the gesture detection are performed on the video to be detected, the face living body detection and the gesture detection can be performed on the video frames in the video to be detected frame by frame, so that the detection result corresponding to each video frame is obtained. Correspondingly, when determining whether the human face living body detection passes, determining whether the human face living body detection result of the video frame is the existence of the living body human face, determining whether the gesture detection result of the video frame is the target gesture corresponding to the target gesture identification, and when the detection result of at least one video frame in the video to be detected is the existence of the living body human face and the gesture detection result is the existence of the target gesture (namely, the detection result of at least one video frame image in the video to be detected meets the human face living body detection passing requirement), determining that the detection of the video to be detected passes.

In addition, in practical application, when performing gesture detection on a video frame, an optional way is to detect a hand region in the video frame, and then identify whether a gesture corresponding to a target gesture identifier exists in the hand region, so as to obtain a gesture detection result. However, when the gesture detection is performed in this way, the video frame may include a plurality of hand regions, and at this time, the plurality of hand regions may be filtered to one based on a preset filtering condition, and then the gesture recognition is performed on the remaining hand region. The preset screening condition may be configured in advance, and the embodiment of the present application is not limited, for example, the preset screening condition may be set as a hand region with the largest reserved region area, and the like.

In practical applications, the security levels required by different application scenarios may be different, for example, when an application program related to property is verified, the lower the expected potential safety hazard is, the better the potential safety hazard is, and then multiple times of verification are required to pass, while when an ordinary application program is verified, only one time of verification may be required.

Based on this, in an optional embodiment of the present application, determining a living body detection result of a user based on a detection result of at least one video to be detected includes:

In practical application, a plurality of videos to be detected can be acquired, and the living body detection result of a user (i.e., a user recording a video to be predicted) is determined based on the detection results of the acquired plurality of videos to be detected, and a detection end condition can be configured in advance in specific implementation. Correspondingly, after the detection result of one video to be detected is determined, if the detection end condition is not met currently, the next video to be detected can be continuously acquired, if the next video to be detected is determined, if the detection end condition is not met, the steps of acquiring the video to be detected and determining the detection result of the video to be detected are continuously executed until the detection end condition is met.

The video to be detected may be detected in a number of cases, for example, the number of passed detection is not less than a set number. That is to say, in practical applications, a plurality of videos to be detected can be obtained altogether, and it is determined whether the number of passing detections in the detection results of the videos to be detected is not less than a preset number, if not, it is determined that the face living body detection result of the user is a living body, and if it is less than the preset number, it is determined that the face living body detection result of the user is a non-living body. Wherein the set number is not greater than the number of the videos to be detected.

In one example, it is assumed that the number of videos to be detected is 5, and the set number is 3. Correspondingly, the detection result of the currently acquired video to be detected can be determined every time the video to be detected of the user is acquired, that is, the detection results of 5 videos to be detected exist at the moment; further, the number of detection results in the detection results of the 5 videos to be detected can be determined as the number of detection passes, and if the number is determined to be not less than 3, the face living body detection result of the user can be determined as the number of detection passes.

In addition, the end condition may be satisfied, that is, when the detection result of one video to be detected is not passed, that is, the living body detection result of the user is determined to be a non-living body, that is, only when the detection results of the video to be detected, which are determined for each number of times, are all passed, the living body detection result of the face of the user may be determined to be a living body. At this time, when the detection result of each video to be detected is determined, if the detection result of one video to be detected is determined to be failed in detection, the result of the living human face detection of the user can be directly determined to be a non-living human body.

In an example, assuming that there are 3 videos to be detected, in practical application, a first video to be detected of a user may be obtained first, and a detection result of the first video to be detected is determined, if the detection result is that the detection is passed, further, a second video to be detected of the user may be obtained and a detection result of the video to be detected is determined, if the detection result of the second video to be detected is that the detection is not passed, at this time, it may be directly determined that a face living body detection result of the user is a non-living body, and the obtaining of the video to be detected of the user may be stopped; if the detection result of the second video to be detected is that the detection is passed, a third video to be detected of the user can be obtained and the detection result of the video to be detected is determined, at this time, if the detection result of the third video to be detected is that the detection is not passed, the living human face detection result of the user is determined to be a non-living body, if the detection result of the third video to be detected is that the detection is passed, the living human face detection result of the user is determined to be a living body, namely, at this time, all of the 3 videos to be detected are passed.

In an optional embodiment of the present application, the performing face live body detection and gesture detection on at least one video to be detected respectively to obtain a face live body detection result and a gesture detection result of the at least one video to be detected includes:

inputting at least one video to be detected into a gesture living body detection model, and obtaining a face living body detection result and a gesture detection result of the video to be detected based on the output of the gesture living body detection model;

the gesture detection result of the video to be detected comprises a target matching probability of a gesture in the video to be detected and a gesture corresponding to the target gesture identification, and when the target matching probability meets a set condition, the gesture detection result of the video to be detected is the matching of the gesture existing in the video to be detected and the gesture corresponding to the target gesture identification.

In practical applications, the gesture living body detection model refers to a neural network model for determining a human face living body detection result and a gesture detection result. That is to say, the face detection result and the gesture detection result of the video to be detected can be obtained based on the gesture living body detection model. It can be understood that, for the face live detection of the video to be detected, one video frame of the video to be detected or several video frames with time sequence may be input to the gesture live detection model to obtain a face detection result, or the video to be detected may be directly input to the gesture live detection model to obtain a face detection result, which is not limited in the embodiment of the present application. For the gesture live body detection of the video to be detected, one video to be detected of the video to be detected may be input into the gesture live body detection model to obtain a corresponding gesture detection result, or one video frame with the best quality in the video to be detected may be input into the gesture live body detection model to obtain a corresponding gesture detection result.

The gesture detection result represents a matching result of the gesture in the video to be detected and the gesture corresponding to the target gesture identifier, and may include a target matching probability of the gesture in the video to be detected and the gesture corresponding to the target gesture identifier, and the target matching probability may be understood as the possibility that the target gesture exists in the video to be detected, and it may be understood that when the target matching probability is higher, the possibility that the target gesture exists in the video to be detected is higher, and when the target matching probability is lower, the possibility that the target gesture exists in the video to be detected is lower.

The human face living body detection result represents whether a human face exists in the video to be detected (namely, a human face detection result) and whether the existing human face is a living body (namely, a human face living body detection result). The embodiment of the present application is not limited to this, for example, a letter P may be used to indicate a face detection result, where P is 1, and P is 0, and thus it indicates that a face is present, or when the face detection result is true, it indicates that a face is present, and when the face detection result is false, it indicates that a face is not present, and so on. The embodiments of the present application are not limited to the specific representation forms of the two cases included in the living body detection result of the human face. For example, the letter d may be used to indicate a live body detection result of a face, indicating that a face exists as a live body when d is 1 and that a face exists as a non-live body when d is 0, or indicating that a face exists as a live body when a live body detection result is true and that a face exists as a non-live body when a live body detection result is false, and the like.

In the embodiment of the application, the neural network model for determining the human face living body detection result and the neural network model for determining the gesture detection result can be integrated into one model, so that the detection speed of human face living body detection can be effectively improved.

In an alternative embodiment of the present application, the step of satisfying the target matching probability with the set condition includes:

the target matching probability is greater than a preset threshold value;

For one embodiment, the gesture detection result may include a target matching probability. Optionally, since the target gesture identifier is randomly selected from the candidate gesture data, different gesture recognition submodels may be trained for different candidate gesture identifiers at this time; further, when the gesture result of the video to be detected is detected, the corresponding gesture recognition submodel can be recognized and selected according to the target gesture identification of the video to be detected, and then the target matching probability of the video to be detected is obtained based on the corresponding gesture recognition submodel.

Correspondingly, if it is determined that the target matching probability included in the gesture detection result of the video to be detected meets the set condition, it may be determined that the target gesture exists in the video to be detected, and in practical application, the target matching probability meeting the preset condition may include various conditions, for example, when the target matching probability is greater than the preset threshold, it may be determined that the target matching probability meets the preset condition, and the size of the preset threshold may be preconfigured without limitation in the embodiment of the present application. For example, assuming that the preset threshold is 70%, if the obtained target matching probability is greater than 70%, it may be determined that the target matching probability satisfies the preset condition, and if the obtained target matching probability is not greater than 70%, the target matching probability does not satisfy the set condition.

As another embodiment, the obtained gesture detection result may include probabilities that the gesture in the video to be detected corresponds to the gesture corresponding to each candidate gesture identifier, further, if the gesture detection result of the video to be detected includes the matching probabilities that the gesture in the video to be detected corresponds to the candidate gesture corresponding to each candidate gesture identifier, since the target gesture identifier is randomly selected from the candidate gesture data, and further the matching probabilities that the gesture in the video to be detected corresponds to the candidate gesture corresponding to each candidate gesture identifier include the target matching probability, at this time, it may be determined whether the highest probability in the matching probabilities that the gesture in the video to be detected corresponds to the candidate gesture corresponding to each candidate gesture identifier is the target matching probability (i.e., whether the highest probability is the probability of the gesture corresponding to the target gesture identifier), if the target matching probability is the highest probability, if the target matching probability meets the set condition, the target gesture exists in the gesture detection result, otherwise, the target matching probability does not meet the set condition, and the target gesture does not exist in the gesture detection result.

The expression form of the probability corresponding to each candidate gesture included in the gesture detection result is not limited. For example, assuming that the candidate gesture identifiers include a gesture identifier with a number 1, a gesture identifier with a number 2, and a gesture identifier with a number 3, the target gesture identifier is a gesture identifier with a number 1 (at this time, the probability of the gesture corresponding to the number 1 is the target matching probability), and the gesture detection result is represented by c. When determining the gesture detection result, it may be determined that the probability that the gesture in the video to be detected is the gesture with the number 1 is 50%, the probability that the gesture with the number 2 is 10%, and the probability that the gesture with the number 3 is 40%, where c1 ═ 50% represents the probability that the gesture in the video to be detected is the gesture with the number 1, c2 ═ 10% represents the probability that the gesture in the video to be detected is the gesture with the number 2, and c3 ═ 40% represents the probability that the gesture in the video to be detected is the gesture with the number 3. Further, since the probability (c1 ═ 50%) (i.e., the target matching probability) of the gesture corresponding to the number 1 is the highest probability, it is indicated that the target matching probability satisfies the set condition, and it may be determined that the target gesture exists in the gesture detection result.

In an optional embodiment of the present application, the face live body detection result includes a first probability that a live body face exists in a video to be detected, and determining whether a live body face exists in the video to be detected based on the face live body detection result of the video to be detected includes:

and if the human face living body detection result of the video to be detected indicates that a human face exists and the first probability meets a preset condition, determining that the living body human face exists in the video to be detected.

In practical applications, the living body detection result of the face included in the face living body detection result may include a first probability that the face existing in the video to be detected is a living body, and it can be understood that the higher the first probability is, the higher the possibility that the face existing in the video to be detected is a living body is. Further, if the living body detection result of the included face includes the first probability, the face detection result of the video to be detected is the presence of the face, and the face present in the video to be detected can be determined as the living body when the first probability meets the preset condition.

In an alternative embodiment of the present application, the first probability meeting the preset condition may include the first probability being greater than a set threshold;

or when the face living body detection result further includes a second probability that the living body face does not exist in the video to be detected, the first probability meeting the preset condition includes that the first probability is greater than the second probability.

In practical applications, the first probability satisfying the preset condition may include a plurality of cases, for example, when the first probability is greater than a set threshold, the first probability may be determined to satisfy the preset condition, and a size of the set threshold may be configured in advance in the embodiments of the present application without limitation. For example, assuming that the threshold is set to 80%, if the obtained first probability is greater than 80%, it may be determined that the first probability satisfies the preset condition, and if the obtained first probability is not greater than 80%, the first probability does not satisfy the preset condition.

In addition, in practical application, the live-body detection result of the face included in the live-body detection result of the face may further include a second probability representing that the face existing in the video to be detected is a non-live-body face, and it can be understood that when the second probability is higher, the probability that the face existing in the video to be detected is a live body is smaller, and the first probability included in the video to be detected is also correspondingly reduced. Correspondingly, when the living body detection result of the included face includes the first probability and the second probability, if the first probability is greater than the second probability, it can be determined that the first probability meets the preset condition, that is, the face existing in the video to be detected is a living body. For example, if the first probability included in the living body detection result of the human face is 80% and the second probability is 20%, and the first probability is greater than the second probability, it may be determined that the first probability satisfies the preset condition, and the human face existing in the video to be detected is a living body.

If the living body detection result of the included face includes the first probability and the second probability, the embodiment of the present application is not limited to the specific representation form of the living body detection result of the included face. In an example, a letter d may be used to represent the living body detection result, and d1 may be used to represent the first probability, d2 may represent the second probability, where a value of d1 is the first probability, and a value of d2 is the second probability. For example, when the first probability is 80% and the second probability is 20%, when d1 is 80% and d2 is 20%.

In an optional embodiment of the present application, the method may further include:

In practical application, if it is determined that the detection result of the video to be detected fails, corresponding prompt information can be provided for a user based on the obtained face living body detection result and the gesture detection result, so that the user can record the video to be detected in a correct mode. The method includes the steps of providing a user with a corresponding prompt message according to a human face living body detection result and a gesture detection result of a video to be detected, and configuring specific content of the prompt message in advance.

As an optional implementation manner, when the face detection result included in the face live body detection result is that no face exists, prompt information that a front face needs to be shot may be provided to the user, for example, a text "please see your front face" may be displayed on a display screen; when the face detection result included in the face live body detection result indicates that a face exists, but the first probability in the face live body detection result is smaller than the second probability, prompt information that a real person needs to shoot a video to be detected can be provided for a user, for example, a character 'please use the real person for detection' can be displayed on a screen; if the target matching probability in the gesture detection result is not the highest probability in the probabilities corresponding to the candidate gestures, providing a prompt message for a user to compare the target gesture, for example, displaying a text "please compare the corresponding gesture correctly" on a screen; in addition, if the probability corresponding to each candidate gesture in the gesture detection result is 0, it indicates that no gesture exists in the video to be detected, and at this time, prompt information that a target gesture needs to be drawn may also be provided to the user, for example, a text "please draw a corresponding gesture correctly" may also be displayed on the display screen.

In order to better understand the scheme provided in the embodiments of the present application, the method provided in the embodiments of the present application is described in detail below with reference to specific application scenarios.

In this example, it is assumed that the current application scenario is a scenario in which the property application program verifies user login, and the verification needs to be repeated 3 times, and when all detection results of 3 times are detection passes, the user is allowed to login to the property application program (i.e. the user is a living body); the candidate gestures are a number 1 gesture to a number 9 gesture, and the target gesture is a number 9 gesture; the passing conditions of the human face living body detection of the video to be detected are as follows: when the living human face detection result of one frame of video frame in the video to be detected indicates that a living human face exists and the gesture of the living human face detection result is matched with the gesture corresponding to the target gesture identification, the detection result of the video to be detected is a detection pass.

Correspondingly, when a user logs in through the client of the property application program, prompt information of a detected video needing to be shot is displayed to the user through the client, the user is guided to align the face to a front camera shooting acquisition device of the terminal equipment where the client is located when the user starts to shoot the detected video, the initial video starts to be acquired, if the fact that the face exists in the initial video is determined, target gesture prompt information corresponding to a target gesture identifier randomly selected by a server from candidate gestures is displayed in an operation interface of the client (for example, a numeral 9 is displayed in a screen), the user needs to demarcate corresponding gestures in a specified area (for example, a virtual line area in a display picture) according to the prompt information of the target gesture within a set time length (wherein, the face is ensured to be in the picture capable of being shot by the camera shooting acquisition device simultaneously in the shooting process), and finishing shooting the detection video after the preset time length is reached, and taking the video shot in the preset time length as the video to be detected.

Further, the video to be detected may be input to a neural network model (the neural network model is a model integrated with a neural network model for determining a face detection result and a neural network model for determining a gesture detection result), and video frames in the video to be detected may be detected frame by frame to obtain a face in-vivo detection result and a gesture detection result.

For example, after the video to be detected is input to the neural network model, the face live detection result and the gesture detection result of one frame of video frame can be represented in the following forms:

y＝(p,p1,p2,c,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9)

wherein, p represents a face detection result in the face live body detection result, and represents that a face exists when p is 1, and represents that no face exists when p is 0; p1 represents a first probability in the live detection result of the included face, p2 represents a second probability in the live detection result of the included face, and p, p1 and p2 constitute a face live detection result; c represents whether a gesture exists or not, when c is 1, the gesture represents that a human face exists in the video frame, and when c is 0, the gesture represents that the human face does not exist in the video frame; c 0-c 9 represent the probability that the gesture in the video frame corresponds to the digital 0 gesture-digital 9 gesture, and c 0-c 9 constitute gesture detection results.

Accordingly, when the face live body detection result and the gesture detection result are represented in an upper format, if p is 1, p1> p2, and c is 1, and the value of c9 is the maximum value among c0 to c9, it may be determined that the detection result of the current video frame is a pass detection; correspondingly, the passing condition of the human face living body detection of the video to be detected is as follows: when the living human face detection result of one frame of video frame in the video to be detected indicates that a living human face exists and the gesture of the gesture detection result which exists is matched with the gesture corresponding to the target gesture identification, the detection result of the video to be detected is detected to be passed, and therefore the detection result of the current video to be detected can be determined to be passed.

Furthermore, the user can be prompted to shoot a second video to be detected and a third video to be detected, then the face living body detection and the gesture detection are carried out on the video frames in the second video to be detected and the third video to be detected frame by frame, if the detection results of the three videos to be detected are all detection results, the user is indicated to be logged in and verified to pass, and the user is allowed to log in the property application program. The processes of shooting the second video to be detected and the third video to be detected and performing the face living body detection and the gesture detection on the second video to be detected and the third video to be detected are the same as the processes of shooting the first video to be detected and performing the face living body detection and the gesture detection on the first video to be detected, and are not repeated here. As an optimal embodiment, when performing gesture detection on a to-be-detected video shot by a user, target gesture identifications corresponding to each to-be-detected video may be different, so as to improve the accuracy of living body detection of the user.

It can be understood that if it is detected that any detection result to be detected by the video is failed, that is, it indicates that the user has failed to log in the application program of the property class this time, the user cannot log in the application program of the property class, and if the user still wants to continue to log in the application program of the property class, a next verification process may be performed, where the next verification process is the same as the verification method provided in this example, and is not described again.

In addition, in practical application, if the detection result of the video to be detected is failed, information can be prompted to a user based on the output human face living body detection result and the gesture detection result. For example, when p is 0, a prompt message "please see your front face" may be displayed on the client of the property application; when c is 0, or c is 1 and the candidate gesture corresponding to the highest probability in c 0-c 9 is different from the target gesture, a prompt message of "please correctly match the corresponding gesture" can be displayed on the client of the property application program; when p is 1 and p1< p2, a prompt message "please check with a real person" may be displayed at the client of the property application.

An embodiment of the present application provides a living body detecting apparatus, and as shown in fig. 2, the living body detecting apparatus 60 may include: a video acquisition module 601, a video detection module 602, a detection result determination module 603, and a living body detection result determination module 604, wherein,

the video acquisition module 601 is configured to acquire a to-be-detected video of at least one user;

the video detection module 602 is configured to perform living human face detection and gesture detection on at least one video to be detected respectively to obtain a living human face detection result and a gesture detection result of the at least one video to be detected, where the living human face detection result includes whether a living human face exists in the video to be detected, and the gesture detection result includes a matching result of a gesture existing in the video to be detected and a gesture corresponding to the target gesture identifier;

the detection result determining module 603 is configured to determine, for each video to be detected, a detection result of the video to be detected based on a face living body detection result and a gesture detection result of the video to be detected;

the living body detection result determining module 604 is configured to determine a living body detection result of the user based on a detection result of the at least one video to be detected.

In an optional embodiment of the present application, the detection result of the video to be detected includes a detection pass and a detection fail, and the detection result determination module is specifically configured to, when determining the detection result of the video to be detected based on the human face living body detection result and the gesture detection result of the video to be detected:

In an optional embodiment of the present application, when determining the live detection result of the user based on the detection result of the at least one to-be-detected video, the live detection result determining module is specifically configured to:

In an alternative embodiment of the present application, the target gesture identifier is a candidate gesture identifier randomly selected from a preconfigured candidate gesture database.

In an alternative embodiment of the present application, the candidate gesture identifier is a digital gesture identifier.

In an optional embodiment of the present application, the video detection module is specifically configured to, when performing face live detection and gesture detection on at least one to-be-detected video respectively to obtain a face live detection result and a gesture detection result of the at least one to-be-detected video:

the gesture detection result of the video to be detected comprises a target matching probability of the gesture in the video to be detected and the gesture corresponding to the target gesture identification, and when the target matching probability meets a set condition, the gesture detection result of the video to be detected is the matching of the gesture existing in the video to be detected and the gesture corresponding to the target gesture identification.

In an optional embodiment of the present application, the step of satisfying the target matching probability with the set condition includes:

the target matching probability is greater than a preset threshold value;

In an optional embodiment of the present application, when the video acquisition module acquires a to-be-detected video of a user, the video acquisition module is specifically configured to:

acquiring an initial video pre-acquired by a video acquisition device;

In an optional embodiment of the present application, the apparatus further includes an information prompt module, specifically configured to:

The biopsy method device of this embodiment can execute the biopsy method provided in this embodiment, and the implementation principles thereof are similar, and are not described herein again.

An embodiment of the present application provides an electronic device, as shown in fig. 3, an electronic device 2000 shown in fig. 3 includes: a processor 2001 and a memory 2003. Wherein the processor 2001 is coupled to a memory 2003, such as via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that the transceiver 2004 is not limited to one in practical applications, and the structure of the electronic device 2000 is not limited to the embodiment of the present application.

The processor 2001 is applied in the embodiment of the present application to implement the functions of the modules shown in fig. 2.

The processor 2001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 2001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, DSPs and microprocessors, and the like.

Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI bus or an EISA bus, etc. The bus 2002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.

The memory 2003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 2003 is used to store application program code for performing the aspects of the present application and is controlled in execution by the processor 2001. The processor 2001 is used to execute application program codes stored in the memory 2003 to implement the actions of the living body detecting apparatus provided by the embodiment shown in fig. 2.

An embodiment of the present application provides an electronic device, where the electronic device includes: a processor; and a memory configured to store machine readable instructions that, when executed by the processor, cause the processor to perform a liveness detection method.

Embodiments of the present application provide a computer-readable storage medium for storing computer instructions thereon, which, when executed on a computer, enable the computer to perform a method for performing a biopsy.

The terms and implementation principles used in this application for a computer-readable storage medium may refer to a method for detecting a living body in this application, and are not described herein again.

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least a portion of the steps in the flow chart of the figure may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed alternately or alternately with other steps or at least a portion of the sub-steps or stages of other steps.

The foregoing is only a partial embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. a method for detecting a living body, comprising:

Obtain at least one video to be detected of the user;

Perform face liveness detection and gesture detection on at least one of the videos to be detected, respectively, to obtain at least one face liveness detection result and gesture detection result of the to-be-detected video, and the face liveness detection result includes whether the video to be detected is detected. There is a live human face, and the gesture detection result includes the matching result of the gesture existing in the video to be detected and the gesture corresponding to the target gesture identifier;

For each to-be-detected video, determine the detection result of the to-be-detected video based on the facial liveness detection result and the gesture detection result of the to-be-detected video;

Based on a detection result of at least one of the videos to be detected, a living body detection result of the user is determined.

2. The method according to claim 1, wherein the detection result of the video to be detected comprises that the detection passes and the detection fails, and the detection result of the human face and the gesture detection result based on the video to be detected, Determine the detection result of the video to be detected, including:

When the face living detection result of the video to be detected is that there is a living face, and the gesture detection result is that the gesture existing in the to-be-detected video matches the gesture corresponding to the target gesture identifier, it is determined that the to-be-detected video is The detection result of the detected video is that the detection has passed, otherwise, it is determined that the detection result of the video to be detected is that the detection has not passed.

3 . The method according to claim 2 , wherein the determining of the user's living body detection result based on the detection result of at least one of the videos to be detected comprises: 3 .

When the detection result of the video to be detected is that the number of passing detections is not less than the set number, it is determined that the user's living body detection result is a living body, wherein the set number is not greater than the number of the to-be-detected videos; or,

When there is a video to be detected and the detection result is that the detection fails, it is determined that the user's living body detection result is non-living.

4 . The method according to claim 1 , wherein the target gesture identifier is a candidate gesture identifier randomly selected from a preconfigured candidate gesture database. 5 .

5 . The method according to claim 4 , wherein the candidate gesture identifiers are digital gesture identifiers. 6 .

6. The method according to claim 3, wherein said at least one said video to be detected is subjected to face live detection and gesture detection respectively, to obtain at least one of said to be detected video of human face live detection result and gesture detection result, include:

Inputting at least one of the videos to be detected into a gesture liveness detection model, and based on the output of the gesture liveness detection model, obtain a face liveness detection result and a gesture detection result of the to-be-detected video;

Wherein, the gesture detection result of the video to be detected includes the target matching probability of the gesture in the video to be detected and the gesture corresponding to the target gesture identifier, and when the target matching probability satisfies a set condition, the video to be detected The gesture detection result is that the gesture existing in the to-be-detected video matches the gesture corresponding to the target gesture identifier.

7. The method according to claim 6 or 4, wherein the target matching probability meeting a set condition comprises:

The target matching probability is greater than a preset threshold;

Alternatively, when the gesture detection result of the video to be detected includes the matching probability between the gesture in the video to be detected and the candidate gesture corresponding to each candidate gesture identifier, the target matching probability is the highest probability.

8. The method according to claim 1, wherein the acquiring the video to be detected of the user comprises:

Obtain the initial video pre-captured by the video capture device;

When it is determined that the initial video includes a face image, the gesture prompt information corresponding to the target gesture identifier is provided;

Acquire a target video collected by the video capture device after the gesture prompt information is provided, and use the target video as the video to be detected.

9. A living body detection device, comprising:

a video acquisition module for acquiring at least one video to be detected of the user;

A video detection module, configured to perform face liveness detection and gesture detection on at least one of the videos to be detected, respectively, and obtain at least one face liveness detection result and gesture detection result of the to-be-detected video, and the face liveness detection result Including whether there is a live human face in the video to be detected, and the gesture detection result includes the matching result of the gesture existing in the video to be detected and the gesture corresponding to the target gesture identifier;

a detection result determination module, configured to determine the detection result of the to-be-detected video for each to-be-detected video based on the facial liveness detection result and the gesture detection result of the to-be-detected video;

A living body detection result determination module, configured to determine a living body detection result of the user based on a detection result of at least one of the videos to be detected.

10. An electronic device, comprising a processor and a memory:

The memory is configured to store machine-readable instructions which, when executed by the processor, cause the processor to perform the method of any of claims 1-8.

11. A computer readable storage medium on which a computer program is stored, characterized in that the computer storage medium is used to store computer instructions that, when run on a computer, enable the computer to execute the above claims 1-8 The method of any of the above.