CN115171227A

CN115171227A - Living body detection method, living body detection device, electronic apparatus, and storage medium

Info

Publication number: CN115171227A
Application number: CN202211077263.3A
Authority: CN
Inventors: 黄石磊; 刘轶; 程刚; 廖晨; 蒋志燕
Original assignee: Shenzhen Raisound Technology Co ltd
Current assignee: Shenzhen Raisound Technology Co ltd
Priority date: 2022-09-05
Filing date: 2022-09-05
Publication date: 2022-10-11
Anticipated expiration: 2042-09-05
Also published as: WO2024051380A1; CN115171227B

Abstract

Embodiments of the invention relate to the technical field of biometric identification. To accurately identify whether an object to be detected is a living body, the embodiments provide a living body detection method, a living body detection apparatus, an electronic device, and a storage medium. The method comprises: determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, wherein the image to be detected is an image containing the face of an object to be detected; determining a sound source position corresponding to the sound signal to be detected, and determining the lip position of the object to be detected based on the image to be detected; and comparing the sound source position with the lip position for consistency, and determining the living body detection result of the object to be detected according to the comparison result. Therefore, even if a non-authenticated user impersonates an authenticated user by obtaining a video image of the authenticated user, the object to be detected can be identified as a non-living body, which improves the accuracy and reliability of the living body detection result.

Description

Living body detection method, living body detection device, electronic apparatus, and storage medium

Technical Field

Embodiments of the invention relate to the technical field of biometric identification, and in particular to a living body detection method, a living body detection apparatus, an electronic device, and a storage medium.

Background

Living body detection is a method for determining the real physiological characteristics of an object in certain identity verification scenarios. In face recognition applications, living body detection can verify whether a user is a real living person operating in person, by means of combined actions such as blinking, opening the mouth, shaking the head, and nodding, together with vital sign detection techniques such as facial key point localization and face tracking, so as to resist common attacks such as photos, face swapping, masks, occlusion, and screen re-capture.

However, vital signs of the user, such as blinking, opening the mouth, shaking the head, and nodding, can also be detected when vital sign detection is performed on a pre-recorded or real-time video of the user. Consequently, when an impersonator presents a video of the authenticated user for living body detection, the impersonator is likely to pass the living body detection and thus the identity verification.

Disclosure of Invention

In view of this, in order to accurately identify whether an object to be detected is a living body, embodiments of the present invention provide a living body detection method, an apparatus, an electronic device, and a storage medium.

In a first aspect, an embodiment of the present invention provides a method for detecting a living body, where the method includes:

determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, wherein the image to be detected is an image containing the face of an object to be detected;

determining a sound source position corresponding to the sound signal to be detected, and determining the lip position of the object to be detected based on the image to be detected;

and comparing the sound source position with the lip position for consistency, and determining the living body detection result of the object to be detected according to the comparison result.

In one possible embodiment, the comparing the sound source position with the lip position for consistency comprises:

determining a reference spatial region based on the lip position;

determining whether the sound source position is located within the reference spatial region;

if so, obtaining a comparison result indicating that the sound source position and the lip position are consistent;

if not, obtaining a comparison result indicating that the sound source position and the lip position are inconsistent.

In a possible embodiment, the determining the living body detection result of the object to be detected according to the comparison result includes:

if the comparison result represents that the sound source position is inconsistent with the lip position, determining that the living body detection result of the object to be detected is a non-living body;

or if the comparison result represents that the sound source position is consistent with the lip position, determining that the living body detection result of the object to be detected is a living body.

In a possible embodiment, in a case that the comparison result indicates that the sound source position and the lip position are consistent, the method further comprises:

inputting the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain an output result of the mouth shape recognition model;

and if the output result shows that the to-be-detected sound signal is matched with the mouth shape of the to-be-detected object in the to-be-detected image, executing the step of determining that the living body detection result of the to-be-detected object is a living body.
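The decision logic of the two preceding embodiments can be sketched as follows. This is only an illustrative sketch: the function name `liveness_result` and the string labels are hypothetical, and treating a mouth shape mismatch as a non-living result is an assumption, since the text above only states what happens when the mouth shape matches.

```python
def liveness_result(positions_consistent, mouth_shape_matches=None):
    """Combine the position comparison with the optional mouth shape check.

    positions_consistent: True if the sound source position is consistent
        with the lip position.
    mouth_shape_matches: None when no mouth shape recognition model is used;
        otherwise the (boolean) output of that model.
    """
    if not positions_consistent:
        # Inconsistent positions: the sound did not come from the lips.
        return "non-living"
    if mouth_shape_matches is None:
        # Position check only, as in the basic embodiment.
        return "living"
    # Assumption: a mouth shape mismatch is treated as non-living.
    return "living" if mouth_shape_matches else "non-living"
```

For instance, `liveness_result(True, True)` yields a living result, while `liveness_result(False)` yields a non-living result regardless of the mouth shape.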

In one possible embodiment, before the determining the sound signal to be detected, the method further comprises:

outputting an interactive instruction, wherein the interactive instruction is used for indicating the object to be detected to send out a sound signal corresponding to preset text data;

before the determining the sound source position corresponding to the sound signal to be detected, the method further includes:

performing voice recognition on the sound signal to be detected to obtain text data corresponding to the sound signal to be detected;

carrying out consistency comparison on the text data corresponding to the sound signal to be detected and the preset text data;

and if the comparison result shows that the text data corresponding to the sound signal to be detected is consistent with the preset text data, executing the step of determining the sound source position corresponding to the sound signal.

In one possible embodiment, the method further comprises:

and if the comparison result shows that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data, returning to the step of outputting the interactive instruction.
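The text consistency comparison and the return to the output step can be sketched as below. The sketch makes several assumptions that are not part of the embodiment: the callables `output_instruction` and `capture_and_recognize` are hypothetical stand-ins for the output module and the voice recognition step, the normalization rule (ignore case, spaces, and punctuation) is one possible choice, and the `max_attempts` limit is added only so that the loop terminates.

```python
def normalize(text):
    """Keep only letters and digits, lower-cased, before comparing."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def texts_match(recognized_text, preset_text):
    """Consistency comparison between recognized and preset text data."""
    return normalize(recognized_text) == normalize(preset_text)


def run_prompt_check(output_instruction, capture_and_recognize, max_attempts=3):
    """Repeat the prompt until the spoken text matches the preset text.

    output_instruction: hypothetical callable that outputs the interactive
        instruction and returns the preset text data.
    capture_and_recognize: hypothetical callable that returns the text
        recognized from the sound signal to be detected.
    max_attempts: assumed retry limit (not specified in the embodiment).
    """
    for _ in range(max_attempts):
        preset_text = output_instruction()
        recognized_text = capture_and_recognize()
        if texts_match(recognized_text, preset_text):
            return True  # proceed to sound source localization
    return False
```

Only when `run_prompt_check` returns `True` would the flow continue to the step of determining the sound source position.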

In one possible embodiment, the generating process of the interactive instruction includes:

calling a preset random number generation algorithm to generate a random array;

and generating the preset text data based on the random array, and generating the interactive instruction according to the preset text data.
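One possible way to generate the random array and the interactive instruction is sketched below. The embodiment does not name the random number generation algorithm; the use of Python's `secrets` module, the digit-only array, the default length of 6, and the wording of the instruction are all assumptions made for illustration.

```python
import secrets


def generate_random_array(length=6):
    """Generate a random array of decimal digits (assumed format)."""
    return [secrets.randbelow(10) for _ in range(length)]


def build_instruction(random_array):
    """Derive the preset text data and the interactive instruction."""
    preset_text = "".join(str(digit) for digit in random_array)
    instruction = f"Please read the following digits aloud: {preset_text}"
    return preset_text, instruction
```

Because the array is generated anew for every attempt, a pre-recorded video is unlikely to contain speech matching the preset text data.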

In one possible embodiment, the determining the sound signal to be detected includes:

acquiring sound signals collected by a plurality of microphones;

and synthesizing the sound signals collected by each microphone to obtain the sound signals to be detected.

In a possible embodiment, the determining a sound source position corresponding to the sound signal to be detected includes:

decomposing the sound signal to be detected to obtain a plurality of decomposed signals;

determining a sound source direction for each of the decomposed signals;

and determining the intersection point positions of the plurality of sound source directions as the sound source positions of the sound signals to be detected.

In a second aspect, an embodiment of the present invention provides a living body detection apparatus, including:

a first determining module, configured to determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, wherein the image to be detected is an image containing the face of an object to be detected;

a second determining module, configured to determine the sound source position corresponding to the sound signal to be detected and to determine the lip position of the object to be detected based on the image to be detected;

and a third determining module, configured to compare the sound source position with the lip position for consistency and to determine the living body detection result of the object to be detected according to the comparison result.

In a possible implementation manner, the third determining module is specifically configured to:

determining a reference spatial region based on the lip position;

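determining whether the sound source position is located within the reference spatial region;

if so, obtaining a comparison result indicating that the sound source position and the lip position are consistent;

and if not, obtaining a comparison result indicating that the sound source position and the lip position are inconsistent.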

In one possible embodiment, the apparatus further comprises:

the input module is used for inputting the image to be detected and the sound signal to be detected into a trained mouth shape recognition model under the condition that the comparison result represents that the sound source position is consistent with the lip position, so as to obtain an output result of the mouth shape recognition model;

and the first execution module is used for executing the step of determining that the living body detection result of the object to be detected is a living body if the output result shows that the sound signal to be detected is matched with the mouth shape of the object to be detected in the image to be detected.

In one possible embodiment, the apparatus further comprises:

the output module is used for outputting an interactive instruction before the sound signal to be detected is determined, wherein the interactive instruction is used for indicating the object to be detected to send out a sound signal corresponding to preset text data;

the recognition module is used for carrying out voice recognition on the sound signal to be detected before the sound source position corresponding to the sound signal to be detected is determined, so as to obtain text data corresponding to the sound signal to be detected;

the comparison module is used for carrying out consistency comparison on the text data corresponding to the sound signal to be detected and the preset text data;

and the second execution module is used for executing the step of determining the sound source position corresponding to the sound signal if the comparison result shows that the text data corresponding to the sound signal to be detected is consistent with the preset text data.

In one possible embodiment, the apparatus further comprises:

and the third execution module is used for returning to the step of outputting the interactive instruction if the comparison result shows that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data.

In a possible implementation manner, the output module is specifically configured to:

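calling a preset random number generation algorithm to generate a random array;

and generating the preset text data based on the random array, and generating the interactive instruction according to the preset text data.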

In a possible implementation manner, the first determining module is specifically configured to:

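acquiring sound signals collected by a plurality of microphones;

and synthesizing the sound signals collected by each microphone to obtain the sound signal to be detected.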

In a possible implementation manner, the second determining module is specifically configured to:

decomposing the sound signal to be detected to obtain a plurality of decomposed signals;

determining a sound source direction for each of the decomposed signals;

and determining the intersection point position of the plurality of sound source directions as the sound source position of the sound signal to be detected.

In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory, the processor being configured to execute a liveness detection program stored in the memory to implement the liveness detection method of any one of the first aspects.

In a fourth aspect, an embodiment of the present invention provides a storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the liveness detection method of any one of the first aspects.

According to the technical solution provided by the embodiments of the invention, a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected are determined, the image to be detected being an image containing the face of the object to be detected. The sound source position of the sound signal to be detected is then determined, the lip position of the object to be detected is determined based on the image to be detected, the sound source position and the lip position are compared for consistency, and the living body detection result of the object to be detected is determined according to the comparison result. In this technical solution, the sound source position of the sound signal to be detected and the lip position of the object to be detected can be located directly. When the sound source position and the lip position are determined to be consistent, the sound signal to be detected was emitted by the lips of the object to be detected, and the object to be detected is a living body; otherwise, the object to be detected is a non-living body. As a result, even if a non-authenticated user impersonates an authenticated user by obtaining a video image of the authenticated user, the object to be detected can be identified as a non-living body, which improves the accuracy and reliability of the living body detection result.

Drawings

FIG. 1 is a schematic view of an application scenario according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of another application scenario according to an embodiment of the present invention;

FIG. 3 is a flowchart of an embodiment of a living body detection method according to an embodiment of the present invention;

FIG. 4 is a flowchart of another embodiment of a living body detection method according to an embodiment of the present invention;

FIG. 5 is a layout diagram of microphones on a living body detection device according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a living body detection device determining a sound source position through microphones according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a sound source position according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of another sound source position according to an embodiment of the present invention;

FIG. 9 is a flowchart of another embodiment of a living body detection method according to an embodiment of the present invention;

FIG. 10 is a flowchart of another embodiment of a living body detection method according to an embodiment of the present invention;

FIG. 11 is a block diagram of an embodiment of a living body detection apparatus according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic view of an application scenario according to an embodiment of the present invention.

The application scenario shown in fig. 1 includes: a user 11 and a living body detection device 13. The living body detection device 13 may be any device equipped with a living body detection system, for example one of various electronic devices with a display screen, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, and the like. In the embodiment of the invention, the display screen can be used for displaying the video signal captured by the camera and for prompting the user to be detected about the face position. In fig. 1, the living body detection device 13 is illustrated as a display screen by way of example.

In the application scenario shown in fig. 1, the user 11 may undergo living body detection in the normal manner. The normal manner here means that the user 11 stands in front of the camera of the living body detection device 13, so that the living body detection device 13 can directly capture a video image containing the face of the user 11 through the camera and then perform living body detection on the user 11 based on the video image.

Fig. 2 is a schematic view of another application scenario related to the embodiment of the present invention. The application scenario includes: a user 11, a user 12, a liveness detection device 13, a terminal 14, and a terminal 15. Wherein the terminal 14 and the terminal 15 can perform network communication.

The terminals 14 and 15 may be hardware devices or software that support network connections to provide various network services. When the terminals 14 and 15 are hardware, they may be various electronic devices having display screens, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like; smart phones are used merely as an example in fig. 2. When the terminals 14 and 15 are software, they can be installed in the electronic devices listed above. In the embodiment of the present invention, the terminal 14 and the terminal 15 establish a video call through corresponding applications installed on each of them.

In the application scenario shown in fig. 2, it is assumed that the user 11 is a non-authenticated user and the user 12 is an authenticated user. When the user 11 wants to undergo living body detection through the living body detection device 13, the user 11 can make a video call with the user 12 through the terminal 14 and the terminal 15, or play a pre-recorded video image containing the face of the user 12 through the terminal 14. At this point, a video image of the user 12 is displayed on the display screen of the terminal 14.

Further, the user 11 may point the display screen of the terminal 14 toward the camera of the living body detection device 13, and the living body detection device 13 then captures the video image of the user 12 through the camera. Because the living body detection device 13 can detect vital signs from the video image of the user 12, the living body detection is passed. Therefore, in the prior art, a non-authenticated user can impersonate an authenticated user by acquiring a video image of the authenticated user, and thereby pass the living body detection.

Based on this, the embodiment of the present invention provides a living body detection method to prevent a non-authenticated user from passing living body detection by impersonating an authenticated user with an acquired video image of the authenticated user, thereby improving the accuracy of the living body detection result.

The living body detection method provided by the present invention is further explained below through specific embodiments with reference to the drawings; these embodiments do not limit the embodiments of the present invention.

Referring to fig. 3, a flowchart of an embodiment of a method for detecting a living body according to an embodiment of the present invention is provided. As an example, the process shown in FIG. 3 may be applied to a living body detecting device, such as the living body detecting device 13 shown in FIG. 1. As shown in fig. 3, the process may include the following steps:

Step 301, determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, wherein the image to be detected is an image including a face of an object to be detected.

In an embodiment of the present invention, the sound signal to be detected is a sound signal received by a microphone while the living body detection device performs living body detection. The image to be detected is an image, captured by the camera while the living body detection device performs living body detection, that contains the face of the object to be detected. The number of images to be detected may be one or more. When there are multiple images to be detected, they may be multiple frames of a video segment captured by the living body detection device through the camera.

Because the sound signal to be detected and the image to be detected are both obtained by the living body detection equipment during the living body detection, the sound signal to be detected and the image to be detected correspond to each other.

In an exemplary application scenario, the object to be detected is a real object. For example, in the application scenario shown in fig. 1, the user 11 is an object to be detected, and in this case, the living body detecting device 13 may directly acquire an image including the face of the user 11 through a camera to obtain an image to be detected. At the same time, the user 11 may directly emit a sound signal, which the liveness detection device receives via the microphone, thus enabling the liveness detection device to determine the sound signal to be detected.

In another exemplary application scenario, the object to be detected is a virtual object. For example, in the application scenario shown in fig. 2, the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected, and at this time, the living body detecting device 13 may capture an image including the face of the user 12 through the camera. Meanwhile, the user 12 may emit a sound signal, which is collected by the terminal 15 and transmitted to the terminal 14, and the terminal 14 may play the sound signal through a speaker, so that the living body detecting apparatus 13 may receive the sound signal to be detected through a microphone. Alternatively, the user 11 may emit a sound signal, and the living body detecting device 13 may receive the sound signal to be detected through a microphone.

In one embodiment, the living body detection device may be provided with a plurality of microphones, which are usually disposed at different positions; for example, four microphones may be disposed at the four corners of the living body detection device. When determining the sound signal to be detected, the living body detection device can acquire the sound signals collected by the plurality of microphones, obtaining a plurality of sound signals.

Alternatively, the living body detecting apparatus may perform synthesis processing on the sound signal collected by each microphone, and determine the sound signal after the synthesis processing as the sound signal to be detected. Thus, noise can be eliminated, and a clearer and more accurate sound signal can be obtained.
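The synthesis processing is not specified further in this embodiment. As a minimal sketch under stated assumptions, the example below averages the per-microphone signals sample by sample, which attenuates uncorrelated noise when the channels are already time-aligned. The function name `synthesize` and the assumption of time-aligned, equally sampled channels are illustrative only.

```python
def synthesize(channels):
    """Average several microphone channels into one signal.

    channels: list of equally sampled, time-aligned sample sequences,
        one per microphone (assumption; alignment is not handled here).
    Returns a list with the per-sample mean, truncated to the shortest channel.
    """
    if not channels:
        raise ValueError("at least one microphone channel is required")
    length = min(len(channel) for channel in channels)
    count = len(channels)
    return [sum(channel[i] for channel in channels) / count for i in range(length)]
```

A practical implementation would typically compensate for the different arrival times at each microphone before summing; that step is omitted here.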

Alternatively, the living body detecting apparatus may acquire a sound signal of any one of the microphones, and use the acquired sound signal as a sound signal to be detected.

Step 302, determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected.

Step 303, comparing the sound source position with the lip position for consistency, and determining the living body detection result of the object to be detected according to the comparison result.

Step 302 and step 303 are explained below collectively:

In an embodiment, the living body detection device can locate the sound source position corresponding to the sound signal to be detected. How the sound source position is located is described below with reference to the process shown in fig. 4 and is not detailed here.

In an embodiment, as can be seen from the description of step 301, the image to be detected includes a face of an object to be detected, and based on this, the living body detecting device can determine the lip position of the object to be detected by performing face recognition on the image to be detected.

Furthermore, the sound source position and the lip position can be compared for consistency, and the living body detection result of the object to be detected can be determined according to the comparison result. Specifically, when the comparison result indicates that the sound source position and the lip position are inconsistent, the sound signal to be detected was not emitted by the lips of the object to be detected, which indicates that the living body detection result of the object to be detected is a non-living body.

For example, in an exemplary application scenario, it is assumed that the object to be detected is a virtual object, for example, in the application scenario shown in fig. 2, the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected, and at this time, the living body detecting device 13 may capture an image including the video image of the user 12 through the camera.

Meanwhile, the user 12 may emit a sound signal, which is collected by the terminal 15 and transmitted to the terminal 14, and the terminal 14 may play the sound signal through a speaker. The speaker of the terminal 14 may be located at the top, bottom, left, or right of the housing of the terminal 14, that is, the sound source position of the sound signal may be at the top, bottom, left, or right of the housing of the terminal 14. As shown in fig. 7, the sound source position may be located in any of regions A, B, C, D, or E, which is inconsistent with the lip position of the object to be detected. Alternatively, the user 11 may emit the sound signal; in that case the sound source position is at the position of the user 11, which is also inconsistent with the lip position of the object to be detected. In either case, the living body detection result of the object to be detected is determined to be a non-living body.

On the contrary, when the comparison result indicates that the sound source position and the lip position are consistent, the lip of the object to be detected emits the sound signal to be detected, which indicates that the living body detection result of the object to be detected is a living body.

For example, in another exemplary application scenario, the object to be detected is a real object. For example, in the application scenario shown in fig. 1, the user 11 is an object to be detected, and in this case, the living body detecting apparatus 13 may directly acquire an image including the face of the user 11 through the camera to obtain an image to be detected. At the same time, the user 11 may directly emit a sound signal, which the liveness detection device receives via the microphone, thus enabling the liveness detection device to determine the sound signal to be detected.

Since the sound signal to be detected is directly emitted from the object to be detected through the lips, the sound source position shown in fig. 8 can be obtained, the sound source position of the sound signal to be detected is located in the F region shown in fig. 8, and the F region is known to be the lip region of the object to be detected, which is consistent with the lip position of the object to be detected, and at this time, the living body detection result of the object to be detected can be determined to be a living body.

Referring to fig. 4, a flowchart of another embodiment of a method for detecting a living body according to an embodiment of the present invention is provided. The flowchart shown in fig. 4 is based on the flowchart shown in fig. 3 and describes how the living body detecting apparatus locates the sound source position of the sound signal to be detected, and as shown in fig. 4, the flowchart may include the following steps:

Step 401, decomposing the sound signal to be detected to obtain a plurality of decomposed signals.

The decomposed signal may be a sound signal in any direction among sound signals in different directions included in the sound signal to be detected.

In one embodiment, in order to locate the sound source position of the sound signal to be detected more accurately, the living body detection device may employ a microphone array signal processing technique to determine the sound source position of the sound signal to be detected.

Based on this, the living body detection device may be provided with N microphones. In order to determine the sound source position of the sound signal to be detected in three-dimensional space more accurately, N may be equal to or greater than 3. Fig. 5 shows a microphone layout on a living body detection device according to an embodiment of the present invention. As can be seen from fig. 5, the living body detection device is provided with 4 microphones, one at each corner: microphone 1, microphone 2, microphone 3, and microphone 4.

As can be seen from the above description of step 301, when a plurality of microphones are disposed on the living body detecting apparatus, sound signals collected by the plurality of microphones may be acquired, and the sound signals collected by each of the microphones are synthesized to obtain a sound signal to be detected.

Based on this, when the sound source position of the sound signal to be detected is determined, the sound signal to be detected can be decomposed to obtain a plurality of decomposed signals, wherein each decomposed signal can correspond to one microphone.

Step 402, determining the sound source direction of each decomposed signal.

Step 403, determining the intersection point position of the plurality of sound source directions as the sound source position of the sound signal to be detected.

Step 402 and step 403 are collectively described below:

As can be seen from the above description of step 401, each of the above decomposed signals may correspond to one microphone. Since each microphone, upon receiving a sound signal, can locate the sound source direction of the sound signal, the sound source direction of each decomposed signal can be determined by the microphone to which it corresponds.

Based on this, after the sound signal to be detected is decomposed into a plurality of decomposed signals, the sound source direction of each of the decomposed signals can be determined. Further, the position of the intersection of the plurality of sound source directions may be determined as the sound source position of the sound signal to be detected.

For example, assume that one microphone is provided at each of the four corners of the living body detection device. As shown in fig. 6, each microphone may correspond to one decomposed signal of the sound signal to be detected, and the sound source direction of that decomposed signal may be determined, so that four sound source directions are obtained. The four sound source directions intersect at point a, and point a can then be determined as the sound source position of the sound signal to be detected.

According to the technical scheme provided by the embodiment of the invention, the sound signal to be detected is decomposed to obtain a plurality of decomposed signals, then the sound source direction of each decomposed signal is determined, and the intersection point position of the plurality of sound source directions is determined as the sound source position of the sound signal to be detected, so that the sound source position of the sound signal to be detected is more accurately positioned.
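The intersection described in steps 402 and 403 can be illustrated with a minimal sketch. The embodiment does not state how the intersection point is computed; the sketch below assumes a two-dimensional layout and uses a least-squares intersection, so that directions that do not meet exactly because of measurement error still yield a single point. The function name `intersect_directions` and its parameters are hypothetical.

```python
import math


def intersect_directions(mics, angles_deg):
    """Least-squares intersection of 2D direction lines (an assumed method).

    mics: list of (x, y) microphone positions.
    angles_deg: sound source direction seen from each microphone,
                in degrees measured from the +x axis.
    Returns the (x, y) point minimizing the summed squared distance to all lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), ang in zip(mics, angles_deg):
        dx = math.cos(math.radians(ang))
        dy = math.sin(math.radians(ang))
        # I - d d^T projects onto the normal of the line through (px, py).
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("directions are parallel; no unique intersection")
    # Solve the symmetric 2x2 system by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Four microphones at the corners of a 2 x 2 device, as in the fig. 6 example.
mics = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
source = (0.5, 1.2)
angles = [math.degrees(math.atan2(source[1] - y, source[0] - x)) for x, y in mics]
estimate = intersect_directions(mics, angles)
```

With exact directions the estimate coincides with the true source; with noisy directions it is the point closest to all four lines.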

Referring to fig. 9, a flowchart of another embodiment of a method for detecting a living body according to an embodiment of the present invention is provided. As shown in fig. 9, the process may include the following steps:

Step 901, determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image including a face of an object to be detected.

Step 902, determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected.

For detailed descriptions of step 901 and step 902, reference may be made to the descriptions in steps 301 to 302, which are not described herein again.

Step 903, determining a reference spatial region based on the lip position.

Step 904, determining whether the sound source position is located in the reference space area, if so, executing step 906; if not, go to step 905.

Step 905, determining that the living body detection result of the object to be detected is a non-living body.

The following description is made in a unified manner for steps 903 to 905:

In one embodiment, a reference spatial region may first be determined based on the lip position. As a possible implementation manner, a spherical or cuboid region centered on the lip position may be set as the reference spatial region. It is then determined whether the sound source position is located in the reference spatial region: if so, the sound source position is determined to be consistent with the lip position; if not, the sound source position is determined to be inconsistent with the lip position.

This processing tolerates errors in locating the sound source position, as well as inaccuracy in the located lip position caused by a change in sitting posture while the object to be detected emits the sound signal, so that the consistency comparison result is more accurate.
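The region test of steps 903 and 904 can be sketched as follows. The embodiment names a sphere or a cuboid centered on the lip position but gives no dimensions; the radius and half-extents below, the coordinate units, and the class and function names are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class SphereRegion:
    """Spherical reference region centered on the lip position."""
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return math.dist(self.center, p) <= self.radius


@dataclass(frozen=True)
class BoxRegion:
    """Axis-aligned cuboid reference region centered on the lip position."""
    center: Point
    half_extents: Point

    def contains(self, p: Point) -> bool:
        return all(abs(c - q) <= h
                   for c, q, h in zip(self.center, p, self.half_extents))


def positions_consistent(sound_source: Point, region) -> bool:
    """Step 904: the positions are consistent if the source lies in the region."""
    return region.contains(sound_source)


lip = (0.0, 0.0, 0.5)  # assumed lip position, in meters
sphere = SphereRegion(center=lip, radius=0.1)
box = BoxRegion(center=lip, half_extents=(0.1, 0.05, 0.1))
```

A larger region tolerates more localization error but also makes it easier for a sound source near, but not at, the lips to pass the check, so the size is a trade-off left to the implementer.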

Step 906, inputting the image to be detected and the sound signal to be detected into the trained mouth shape recognition model to obtain an output result of the mouth shape recognition model.

Step 907, judging whether the output result shows that the to-be-detected sound signal is matched with the mouth shape of the to-be-detected object in the to-be-detected image, if so, executing step 908; if not, go to step 905.

Step 908, determining that the living body detection result of the object to be detected is a living body.

Step 906 to step 908 are collectively described below:

In an embodiment, the image to be detected and the sound signal to be detected may be input to the trained mouth shape recognition model, so as to obtain an output result indicating whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.

Optionally, if the output result is 1, it indicates that the sound signal to be detected matches with the mouth shape of the object to be detected in the image to be detected; and if the output result is 0, the sound signal to be detected is not matched with the mouth shape of the object to be detected in the image to be detected.

In the embodiment of the invention, when the output result shows that the voice signal to be detected is matched with the mouth shape of the object to be detected in the image to be detected, the living body detection result of the object to be detected can be determined as the living body.

For example, in an exemplary application scenario, the user 11 and the living body detection device 13 are in the same environment. When the user 11, that is, the object to be detected, emits an "o" sound signal at a certain moment, the living body detection device can capture, through the camera, an image containing the face of the object to be detected at that moment to obtain the image to be detected, and receive the "o" sound signal through the microphone to obtain the sound signal to be detected.

Based on this, the image to be detected and the sound signal to be detected are input to the mouth shape recognition model. The model recognizes that the mouth shape of the object to be detected in the image to be detected is "o"-shaped, which matches the sound signal to be detected, so the object to be detected can be determined to be a living body.

On the contrary, when the output result indicates that the sound signal to be detected does not match the mouth shape of the object to be detected in the image to be detected, the living body detection result of the object to be detected can be determined to be a non-living body.

For example, in the application scenario shown in fig. 2, the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected, and the living body detection device 13 can capture an image containing the video image of the user 12 through the camera. Suppose that at a certain moment the user 11 emits an "o" sound signal while the user 12 emits no sound signal. In this case, the image to be detected collected by the living body detection device is the video image of the user 12, and the collected sound signal to be detected is the sound signal emitted by the user 11.

Then, the sound signal to be detected and the image to be detected are input to the mouth shape recognition model. Since the user 12 does not emit any sound signal at the current moment, recognizing the image to be detected shows that the mouth shape of the object to be detected is closed, while the sound signal to be detected is "o", whose corresponding mouth shape should be open. The output result therefore indicates that the sound signal to be detected does not match the mouth shape of the object to be detected in the image to be detected, and the object to be detected can be determined to be a non-living body.

According to the technical scheme provided by the embodiment of the invention, after the sound source position is determined to be located in the reference spatial region determined based on the lip position, the sound signal to be detected and the image to be detected are input into the trained mouth shape recognition model, and whether the living body detection result of the object to be detected is a living body can be further determined according to the output result. In this technical scheme, whether the object to be detected is a living body is determined by further detecting whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected. As a result, even if a non-authenticated user counterfeits an authenticated user by means of a video image of the authenticated user, the object to be detected can be identified as a non-living body, which improves the accuracy and reliability of the living body detection result.
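The decision flow of steps 904 to 908 can be summarized in a short sketch. The mouth shape recognition model is represented here by a hypothetical callable that returns 1 for a match and 0 otherwise, following the optional output convention described above; the function name, parameter names, and the string results are illustrative assumptions, not part of the embodiment.

```python
from typing import Any, Callable


def detect_liveness(source_in_region: bool,
                    mouth_model: Callable[[Any, Any], int],
                    image: Any,
                    sound: Any) -> str:
    """Two-stage check: position consistency first, then mouth shape match."""
    if not source_in_region:
        # Step 905: the sound source is outside the reference spatial region.
        return "non-living"
    # Step 906: query the (hypothetical) trained mouth shape recognition model.
    output = mouth_model(image, sound)
    # Steps 907 and 908: an output of 1 means the sound matches the mouth shape.
    return "living" if output == 1 else "non-living"


# Stub models standing in for a trained model, for illustration only.
calls = []


def always_match(image, sound):
    calls.append((image, sound))
    return 1


def never_match(image, sound):
    return 0


result_live = detect_liveness(True, always_match, "frame", "o")
result_mismatch = detect_liveness(True, never_match, "frame", "o")
result_off_position = detect_liveness(False, always_match, "frame", "o")
```

Because the position check runs first, the model is not invoked at all when the sound source lies outside the reference spatial region.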

Referring to fig. 10, a flowchart of another embodiment of a method for detecting a living body according to an embodiment of the present invention is provided. As shown in fig. 10, the process may include the following steps:

Step 1001, outputting an interactive instruction, wherein the interactive instruction is used for instructing an object to be detected to emit a sound signal corresponding to preset text data.

In an embodiment, the above-mentioned interaction instruction may be generated by the living body detecting device by: firstly, calling a preset random number generation algorithm to generate a random array; and then, generating the preset text data based on the random array, and generating an interactive instruction according to the preset text data.

For example, the living body detection device invokes a preset random number generation algorithm to generate the following random array: 265910. Then, the living body detection device determines the random array as the preset text data, generates, according to the preset text data, an interactive instruction for instructing the object to be detected to say 265910, and outputs the interactive instruction.
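A minimal sketch of this generation process is given below. The embodiment only refers to a "preset random number generation algorithm"; using Python's `secrets` module, a length of six digits, and the instruction wording are assumptions made for illustration.

```python
import secrets


def generate_preset_text(length: int = 6) -> str:
    """Generate a random digit string to serve as the preset text data."""
    # secrets draws from the operating system's cryptographic random source,
    # which makes the prompt hard to predict (an assumed design choice).
    return "".join(secrets.choice("0123456789") for _ in range(length))


def build_interactive_instruction(preset_text: str) -> str:
    """Wrap the preset text data in a prompt for the object to be detected."""
    return f"Please say: {preset_text}"


preset_text = generate_preset_text()
instruction = build_interactive_instruction(preset_text)
```

An unpredictable prompt is what makes a pre-recorded video unlikely to contain the requested utterance.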

Step 1002, determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, wherein the image to be detected is an image containing the face of an object to be detected.

Under normal conditions, after the object to be detected receives the interactive instruction, preset text data is spoken according to the interactive instruction, and therefore a sound signal is generated. At this time, the living body detecting apparatus may receive the sound signal through the microphone and determine the sound signal as a sound signal to be detected.

Of course, in an exemplary application scenario, as exemplified above, another object in the same environment as the living body detecting device speaks the preset text data according to the interactive instruction, thereby generating the sound signal. At this time, the living body detecting apparatus may receive the sound signal through the microphone and determine the sound signal as a sound signal to be detected.

Step 1003, performing voice recognition on the sound signal to be detected to obtain text data corresponding to the sound signal to be detected.

In an embodiment, the living body detection device may perform speech recognition on the sound signal to be detected through an ASR (Automatic Speech Recognition) technique to obtain text data.

In another embodiment, the living body detection device may perform speech recognition on the sound signal to be detected through a convolutional neural network algorithm to obtain text data.

For example, in step 1001, when the living body detecting device sends an interactive instruction for instructing the object to be detected to speak 265910, the object to be detected sends a corresponding sound signal according to the interactive instruction, and the living body detecting device receives the sound signal through the microphone to obtain the sound signal to be detected. Then, the living body detection device performs voice recognition on the sound signal to be detected to obtain text data 265910.

Step 1004, performing a consistency comparison between the text data corresponding to the sound signal to be detected and the preset text data.

Step 1005, judging whether the comparison result shows that the recognized text data is consistent with preset text data, if so, executing step 1006; if not, go to step 1001.

The following describes steps 1004 to 1005 in a unified manner:

In an embodiment, the text data corresponding to the sound signal to be detected may be compared with the preset text data for consistency. Specifically, if the comparison result indicates that the recognized text data is inconsistent with the preset text data, the object to be detected has not spoken the preset text data according to the interactive instruction output by the living body detection device. In this case, to allow for the object to be detected having misheard the instruction or the living body detection device having made a recognition error, the living body detection device can regenerate the interactive instruction and output it again.

On the contrary, if the comparison result indicates that the recognized text data is consistent with the preset text data, it indicates that the object to be detected speaks the preset text data according to the interactive instruction output by the living body detection device. At this point, step 1006 may be continued to further determine whether the object to be detected is a living body.
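Steps 1001 to 1005 can be sketched as a prompt-and-compare loop. The embodiment returns to step 1001 on a mismatch without stating a limit; the `max_attempts` bound, the normalization applied before comparison, and the two callables (`new_preset` for generating preset text data, `recognize` for capturing and recognizing the spoken response) are hypothetical additions for illustration.

```python
import re
from typing import Callable, Optional


def texts_consistent(recognized: str, preset: str) -> bool:
    """Compare texts after dropping whitespace and punctuation (assumed rule)."""
    def normalize(s: str) -> str:
        return re.sub(r"[\W_]+", "", s).lower()
    return normalize(recognized) == normalize(preset)


def prompt_until_consistent(new_preset: Callable[[], str],
                            recognize: Callable[[str], str],
                            max_attempts: int = 3) -> Optional[str]:
    """Repeat steps 1001 to 1005 until the spoken text matches the prompt.

    Returns the preset text that was matched, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        preset = new_preset()            # step 1001: regenerate the instruction
        recognized = recognize(preset)   # steps 1002-1003: capture and recognize
        if texts_consistent(recognized, preset):  # steps 1004-1005
            return preset                # proceed to step 1006
    return None


# Simulated session: the first response is misrecognized, the second matches.
presets = iter(["111111", "222222"])
heard = iter(["111112", "22 22 22"])
matched = prompt_until_consistent(lambda: next(presets), lambda p: next(heard))
```

Generating fresh preset text data on each retry, rather than repeating the same prompt, keeps a replayed recording of an earlier attempt from passing.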

Step 1006, determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected.

Step 1007, judging whether the sound source position is consistent with the lip position, if so, executing step 1009; if not, go to step 1008.

Step 1008, determining that the living body detection result of the object to be detected is a non-living body.

The detailed description of step 1006 to step 1008 can refer to the description in step 302 and step 303, and is not described herein again.

Step 1009, inputting the image to be detected and the sound signal to be detected into the trained mouth shape recognition model to obtain an output result of the mouth shape recognition model.

Step 1010, judging whether the output result shows that the to-be-detected sound signal is matched with the mouth shape of the to-be-detected object in the to-be-detected image, if so, executing step 1011; if not, go to step 1008.

Step 1011, determining that the living body detection result of the object to be detected is a living body.

The descriptions of step 1009 to step 1011 can be referred to the descriptions of step 906 to step 908, and are not described herein again.

According to the technical scheme provided by the embodiment of the invention, before the sound signal to be detected is determined, an interactive instruction is output to instruct the object to be detected to emit the sound signal corresponding to the preset text data. After the sound signal to be detected is determined, it is recognized, and the recognized text data is compared with the preset text data for consistency; if the two are consistent, the step of determining the sound source position of the sound signal to be detected is carried out. In this technical scheme, instructing the object to be detected through the interactive instruction to emit the sound signal corresponding to the preset text data makes it possible to preliminarily determine whether the object to be detected can interact with the living body detection device, and prevents counterfeiting by means of a video image recorded in advance. Therefore, when a non-authenticated user counterfeits an authenticated user by obtaining a video image recorded in advance by the authenticated user, the object to be detected can be identified as a non-living body more quickly and accurately, which improves the accuracy and reliability of the living body detection result.

Referring to fig. 11, a block diagram of an embodiment of a living body detection apparatus according to an embodiment of the present invention is provided. As shown in fig. 11, the apparatus includes:

the first determining module 111 is configured to determine a to-be-detected sound signal and a to-be-detected image corresponding to the to-be-detected sound signal, where the to-be-detected image is an image including a face of an object to be detected;

a second determining module 112, configured to determine a sound source position corresponding to the to-be-detected sound signal, and determine a lip position of the to-be-detected object based on the to-be-detected image;

a third determining module 113, configured to perform a consistency comparison between the sound source position and the lip position, and determine a living body detection result of the object to be detected according to the comparison result.

In a possible implementation manner, the third determining module 113 is specifically configured to:

determining a reference spatial region based on the lip position;

In a possible embodiment, the device further comprises (not shown in the figures):

In a possible implementation, the output module is specifically configured to:

calling a preset random number generation algorithm to generate a random array;

and generating the preset text data based on the random array, and generating the interaction instruction according to the preset text data.

In a possible implementation manner, the first determining module 111 is specifically configured to:

acquiring sound signals collected by a plurality of microphones;

In a possible implementation, the second determining module 112 is specifically configured to:

determining a sound source direction for each of the decomposed signals;

Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 1200 shown in fig. 12 includes: at least one processor 1201, memory 1202, at least one network interface 1204, and a user interface 1203. The various components in the electronic device 1200 are coupled together by a bus system 1205. It is understood that bus system 1205 is used to enable connected communication between these components. Bus system 1205 includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. But for clarity of illustration the various buses are labeled as bus system 1205 in figure 12.

The user interface 1203 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen, etc.).

It is to be understood that the memory 1202 in embodiments of the present invention may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1202 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some embodiments, the memory 1202 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 12021 and application programs 12022.

The operating system 12021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 12022 contains various applications such as a Media Player (Media Player), a Browser (Browser), and the like, and is used to implement various application services. Programs that implement methods in accordance with embodiments of the present invention may be included in applications 12022.

In the embodiment of the present invention, by calling a program or an instruction stored in the memory 1202, specifically, a program or an instruction stored in the application 12022, the processor 1201 is configured to execute method steps provided by various method embodiments, for example, including:

and carrying out consistency comparison on the sound source position and the lip position, and determining the living body detection result of the object to be detected according to the comparison result.

The method disclosed by the embodiment of the invention can be applied to the processor 1201 or implemented by the processor 1201. The processor 1201 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 1201 or by instructions in the form of software. The processor 1201 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software elements in the decoding processor. The software elements may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 1202, and the processor 1201 reads information in the memory 1202 and completes the steps of the above method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The electronic device provided in this embodiment may be the electronic device shown in fig. 12, and may execute all steps of the living body detection method shown in figs. 3 to 4 and figs. 9 to 10, thereby achieving the technical effects of that method. For details, refer to the descriptions of figs. 3 to 4 and figs. 9 to 10, which are not repeated here for brevity.

The embodiment of the invention also provides a storage medium (computer readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also include a combination of the above kinds of memory.

The one or more programs in the storage medium are executable by one or more processors to implement the above-described living body detection method performed on the electronic device side.

The processor is configured to execute a living body detection program stored in the memory to implement the following steps of the living body detection method executed on the electronic device side:

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement and the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A living body detection method, the method comprising:

2. The method of claim 1, wherein the performing of the consistency comparison on the sound source position and the lip position comprises:

determining a reference spatial region based on the lip position;

3. The method according to claim 1 or 2, wherein the determining the living body detection result of the object to be detected according to the comparison result comprises:

4. The method of claim 3, wherein in the case that the comparison result indicates that the sound source position and the lip position are consistent, the method further comprises:

and if the output result shows that the to-be-detected sound signal is matched with the mouth shape of the to-be-detected object in the to-be-detected image, executing the step of determining the living body detection result of the to-be-detected object as a living body.

5. The method according to claim 1, wherein prior to said determining the sound signal to be detected, the method further comprises:

6. The method of claim 5, further comprising:

7. The method according to claim 5 or 6, wherein the generation process of the interactive instruction comprises:

calling a preset random number generation algorithm to generate a random array;

8. The method according to claim 1, wherein the determining the sound signal to be detected comprises:

acquiring sound signals collected by a plurality of microphones;

9. The method according to claim 1, wherein the determining the sound source position corresponding to the sound signal to be detected comprises:

determining a sound source direction for each of the decomposed signals;

10. A living body detection apparatus, the apparatus comprising:

11. An electronic device, comprising: a processor and a memory, the processor for executing a living body detection program stored in the memory to implement the living body detection method of any one of claims 1 to 9.

12. A storage medium storing one or more programs executable by one or more processors to perform the living body detection method of any one of claims 1 to 9.