CN113869253A - Living body detection method, living body training device, electronic apparatus, and medium - Google Patents


Info

Publication number
CN113869253A
CN113869253A
Authority
CN
China
Prior art keywords
attack
living body
sample
prediction result
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111167884.6A
Other languages
Chinese (zh)
Inventor
张国生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111167884.6A
Publication of CN113869253A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a living body detection method, a training method and apparatus for a living body detection model, an electronic device, a storage medium, and a program product. It belongs to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as face image processing and face recognition. The specific implementation scheme is as follows: determining an attack prediction result for an object based on attack cue information of the object in an image to be recognized, wherein the attack cue information represents the probability that the object is an attack object; determining a living body prediction result for the object based on living body cue information of the object in the image to be recognized, wherein the living body cue information represents the probability that the object is a living body object; and determining a detection result of the object based on the attack prediction result and the living body prediction result.

Description

Living body detection method, living body training device, electronic apparatus, and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of computer vision and deep learning technologies, which can be applied to scenes such as face image processing and face recognition, and in particular, to a living body detection method, a training method and apparatus for a living body detection model, an electronic device, a storage medium, and a program product.
Background
With the rapid development of computer and internet technologies, biometric identification is increasingly applied to identity authentication. As an important branch of biometrics, face recognition has been widely adopted owing to advantages such as speed and contactless authentication. However, an attack face image, as opposed to a real face image, is a forged and fraudulent image. If, during identity authentication with face recognition, an attack face image is misjudged as a real face image, information security, property security, and the like are threatened.
Disclosure of Invention
The present disclosure provides a living body detection method, a training method of a living body detection model, an apparatus, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided a method of living body detection, including: determining an attack prediction result related to an object based on attack clue information of the object in an image to be identified, wherein the attack clue information represents the probability that the object is an attack object; determining a living body prediction result related to the object based on living body clue information of the object in the image to be recognized, wherein the living body clue information represents the probability that the object is a living body object; and determining a detection result of the object based on the attack prediction result and the living body prediction result.
According to another aspect of the present disclosure, there is provided a training method of a living body detection model, including: acquiring a plurality of training samples, wherein each training sample in the plurality of training samples comprises a sample image, an attack tag aiming at the sample image and a living body tag aiming at the sample image, the sample image comprises a living body image or an attack image, an object in the living body image is a living body object, an object in the attack image is an attack object, the attack tag is obtained based on attack clue information of the object in the sample image, and the living body tag is obtained based on living body clue information of the object in the sample image; and training an initial living body detection model by using each training sample in the plurality of training samples to obtain a living body detection model.
According to another aspect of the present disclosure, there is provided a living body detection apparatus including: the attack result prediction module is used for determining an attack prediction result related to an object based on attack clue information of the object in the image to be identified, wherein the attack clue information represents the probability that the object is an attack object; a living body result prediction module, configured to determine a living body prediction result related to the object based on living body cue information of the object in the image to be recognized, where the living body cue information represents a probability that the object is a living body object; and an object result determination module for determining a detection result of the object based on the attack prediction result and the living body prediction result.
According to another aspect of the present disclosure, there is provided a training apparatus for a living body detection model, including: an obtaining module, configured to obtain a plurality of training samples, where each training sample in the plurality of training samples includes a sample image, an attack tag for the sample image, and a living body tag for the sample image, where the sample image includes a living body image or an attack image, an object in the living body image is a living body object, an object in the attack image is an attack object, the attack tag is obtained based on attack cue information of the object in the sample image, and the living body tag is obtained based on living body cue information of the object in the sample image; and the training module is used for training an initial living body detection model by utilizing each training sample in the plurality of training samples to obtain a living body detection model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the liveness detection method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a liveness detection method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a liveness detection method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram of a liveness detection method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a structural schematic of a regression module, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram of a pre-processing operation according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method of training a liveness detection model according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram of a method of training a liveness detection model according to another embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a liveness detection device according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a block diagram of a training apparatus for a liveness detection model according to an embodiment of the present disclosure; and
FIG. 11 schematically shows a block diagram of an electronic device suitable for implementing a liveness detection method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Face recognition is a biometric technology that identifies a person based on facial feature information, and can be applied in many scenarios such as security, attendance, finance, and access control. Living body detection is an auxiliary technology within face recognition: on top of identification based on facial features, it further judges whether those features come from a living object. It therefore helps to detect fraud and ultimately safeguards users' interests.
The present disclosure provides a living body detection method, a training method of a living body detection model, an apparatus, an electronic device, a storage medium, and a program product.
According to an embodiment of the present disclosure, a living body detection method may include: determining an attack prediction result related to an object based on attack clue information of the object in the image to be identified, wherein the attack clue information represents the probability that the object is an attack object; determining a living body prediction result related to the object based on living body clue information of the object in the image to be recognized, wherein the living body clue information represents the probability that the object is the living body object; and determining a detection result of the object based on the attack prediction result and the living body prediction result.
With the living body detection method provided by the embodiments of the disclosure, living body detection can be performed comprehensively based on both the attack cue information and the living body cue information of the object in the image to be recognized to obtain the detection result. This compensates for the shortcomings of performing living body detection with attack cue information or living body cue information alone, thereby improving detection performance and generalization.
In the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
Fig. 1 schematically illustrates an exemplary system architecture to which the liveness detection method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the living body detection method and apparatus may be applied may include a terminal device, but the terminal device may implement the living body detection method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen, a camera, and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) that supports face recognition for face images acquired by users with the terminal devices 101, 102, 103. The background management server can perform living body detection on the received face image and feed back the detection result for the face in the image to the terminal device.
It should be noted that the living body detection method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the living body detecting apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The liveness detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the living body detection apparatus provided by the embodiment of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when a user logs in to an application, the terminal device 101, 102, or 103 may capture the user's face image through a camera and send it to the server 105; the server 105 performs living body detection on the face image and feeds the detection result for the face back to the terminal device. Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 performs the living body detection and finally feeds the detection result back to the terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a liveness detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, an attack prediction result related to an object is determined based on attack cue information of the object in the image to be recognized, wherein the attack cue information characterizes a probability that the object is an attack object.
In operation S220, a living body prediction result related to the object is determined based on living body cue information of the object in the image to be recognized, wherein the living body cue information represents a probability that the object is a living body object.
In operation S230, a detection result of the object is determined based on the attack prediction result and the living body prediction result.
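The three operations above can be sketched as a short Python function. The prediction models that produce the two probabilities are omitted, and the combination rule and threshold below are illustrative assumptions rather than the disclosure's concrete scoring formulas (those are given later as Equations 1 and 2); all names are ours.

```python
def detect(attack_prob: float, live_prob: float, threshold: float = 1.0) -> str:
    """Combine the two complementary predictions into a final decision.

    attack_prob: probability that the object is an attack object (operation S210).
    live_prob:   probability that the object is a living object (operation S220).
    The combination rule and threshold are illustrative only.
    """
    # A low attack probability and a high liveness probability both push
    # the combined score toward a "live" decision (operation S230).
    score = (1.0 - attack_prob) + live_prob  # ranges over [0, 2]
    return "live" if score >= threshold else "attack"
```

For example, `detect(0.1, 0.9)` yields `"live"`, while `detect(0.9, 0.1)` yields `"attack"`.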
According to an embodiment of the present disclosure, the object in the image to be recognized may be, for example, a human face, or a human face and a human body.
According to an embodiment of the present disclosure, a living object may be an object exhibiting vital signs; for example, a living object may be a real human face. If the object in the image to be recognized is a living object, the face in the image can be understood as captured directly from a real person.
According to an embodiment of the present disclosure, the living body cue information may be information from the image to be recognized that characterizes the probability that the object in the image is a living object. The living body cue information may be, for example, depth cue information, or other information that can characterize this probability. If the object in the image to be recognized is a living object, then the larger the share of living body cue information in the total of living body cue information and attack cue information, the higher the predicted probability that the object is a living object.
According to an embodiment of the present disclosure, an attack object may be an attack face obtained by deception, forgery, or similar means. For example, the attack object in the image to be recognized may be a face obtained through forgery means such as re-photographing a picture, extracting a video frame, or wearing a mask imitating the target person.
According to an embodiment of the present disclosure, the attack cue information may be information from the image to be recognized that characterizes the probability that the object in the image is an attack object. The attack cue information may include, for example, reflective cue information and light-and-shadow information. If the object in the image to be recognized is an attack object, then the larger the share of attack cue information in the total of living body cue information and attack cue information, the higher the predicted probability that the object is an attack object.
According to embodiments of the present disclosure, the detection result of the object may be determined based on the attack prediction result and the living body prediction result. For example, if the object in the image to be recognized is a living object and the living body cue information is significant, the living body prediction result derived from the living body cue information contributes a large share of the score, while the attack prediction result derived from the attack cue information contributes only a small share. The detection result therefore combines cue information from multiple aspects and dimensions through the attack prediction result and the living body prediction result.
By using the living body detection method provided by the embodiment of the disclosure, living body detection can be comprehensively carried out based on the attack cue information and the living body cue information of the object in the image to be identified, and the detection result of the object is obtained. The defect caused by the fact that the living body detection is carried out by the single attack clue information or the single living body clue information is compensated, and the living body detection performance and the generalization effect can be improved.
The method shown in fig. 2 is further described below with reference to figs. 3 to 6 and specific embodiments.
Fig. 3 schematically shows a flow diagram of a living body detection method according to another embodiment of the present disclosure.
As shown in fig. 3, attack cue information and live cue information may be included in the image to be recognized 310. Attack cue features 321 can be extracted from the attack cue information in the image 310 to be identified, and the attack cue features 321 are processed to obtain an attack prediction result 331. The live cue features 322 can also be extracted from the live cue information in the image 310 to be recognized, and the live cue features 322 can be processed to obtain the live prediction result 332. Based on the attack prediction result 331 and the living body prediction result 332, the detection result 340 of the object can be determined.
According to an example embodiment of the present disclosure, the attack cue features may include reflective cue features and semantic attack features. Reflective cue features can be extracted from the attack cue information and processed to obtain a reflective cue prediction result. Semantic attack features can likewise be extracted from the attack cue information and processed to obtain a semantic attack prediction result. The attack prediction result is then determined based on the reflective cue prediction result and the semantic attack prediction result.
According to an embodiment of the present disclosure, the reflective cue features may be extracted from reflective cue information in the image to be recognized. If the object in the image to be recognized is an attack object, reflective cue information is likely to appear in the image; for example, an image obtained by re-photographing content displayed on a high-definition screen will contain reflective cues. Analyzing the reflective cue features in the image to be recognized therefore helps determine whether the object is an attack object.
According to the embodiment of the disclosure, the semantic attack features may be extracted from the remaining attack cue information in the image to be identified, and the remaining attack cue information may be other attack cue information than the reflective cue information in the attack cue information.
According to the embodiment of the disclosure, whether the object in the image to be recognized is an attack object is analyzed by utilizing two different angles of the semantic attack characteristic and the reflective cue characteristic, so that the analysis is more comprehensive, the prediction precision is high, and the generalization is good.
According to an embodiment of the present disclosure, in operation S230, a detection result of the object, for example, a probability that the object is a living object may be calculated in a manner as shown in the following equation 1.
S = (1 - S1) + S2 + (1 - S3); (Equation 1)
Wherein S1 is the result of the prediction of the reflective clues, S2 is the result of the prediction of the living clues, S3 is the result of the prediction of the semantic attacks, and S is the probability that the object is the living object.
According to the embodiment of the present disclosure, the larger the S value is, the larger the probability that the object is a living object is represented, and the smaller the S value is, the larger the probability that the object is an attack object is represented.
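Equation 1 can be sketched directly in Python; the function and parameter names below are ours, and each prediction is assumed to lie in [0, 1], so the combined score S lies in [0, 3].

```python
def liveness_score(s1: float, s2: float, s3: float) -> float:
    """Equation 1: S = (1 - S1) + S2 + (1 - S3).

    s1: reflective cue prediction result, s2: living body cue prediction
    result, s3: semantic attack prediction result; each assumed in [0, 1].
    A larger S means the object is more likely a living object.
    """
    return (1.0 - s1) + s2 + (1.0 - s3)
```

For example, a clearly live sample (s1=0.1, s2=0.9, s3=0.1) scores 2.7, while a clearly attacked one (s1=0.9, s2=0.1, s3=0.9) scores only 0.3.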
According to an exemplary embodiment of the present disclosure, in operation S230, a detection result of the object, for example, a probability that the object is a living object may also be calculated in a manner as shown in equation 2.
S' = A × (1 - S1) + B × S2 + C × (1 - S3); (Equation 2)
Wherein S1 is a reflection cue prediction result, S2 is a living body cue prediction result, S3 is a semantic attack prediction result, A, B, C is a weight of the reflection cue prediction result, a weight of the living body cue prediction result, and a weight of the semantic attack prediction result, respectively, and S' is a probability that an object is a living body object.
According to the embodiment of the disclosure, weights can be respectively given to the reflective cue prediction result, the living body cue prediction result and the semantic attack prediction result, so that one or more of reflective cue information, living body cue information and residual attack cue information are highlighted more obviously, and more accurate and flexible living body detection is realized.
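The weighted form of Equation 2 can be sketched the same way; names and default weight values below are ours, and with equal unit weights it reduces to Equation 1.

```python
def weighted_liveness_score(s1: float, s2: float, s3: float,
                            a: float = 1.0, b: float = 1.0,
                            c: float = 1.0) -> float:
    """Equation 2: S' = A*(1 - S1) + B*S2 + C*(1 - S3).

    The weights a, b, c emphasize the reflective cue, living body cue,
    and semantic attack predictions respectively; a = b = c = 1 reduces
    this to Equation 1.
    """
    return a * (1.0 - s1) + b * s2 + c * (1.0 - s3)
```

For instance, raising b above 1 makes the living body cue prediction dominate the combined score.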
Fig. 4 schematically shows a flowchart of a living body detection method according to another embodiment of the present disclosure.
As shown in fig. 4, the image to be recognized 410 may be processed using a live body detection model. The liveness detection model may include a reflex cue extraction module 421, a reflex cue identification module 422, a semantic attack extraction module 431, a semantic attack identification module 432, a depth cue extraction module 441, and a depth cue identification module 442.
As shown in fig. 4, in the case where the object in the image 410 to be recognized is an attack object, an attack cue map 411 may be formed based on the attack cue information in the image 410. The reflective cue extraction module 421 may be used to extract reflective cue features from the attack cue information, and the reflective cue recognition module 422 may be used to process those features to obtain the reflective cue prediction result 423. This is not limiting, however: the attack cue information may alternatively be processed with a reflection prediction model to obtain the reflective cue prediction result.
As shown in fig. 4, the semantic attack extracting module 431 may be used to extract the semantic attack features from the residual attack cue information, and the semantic attack recognizing module 432 may be used to process the semantic attack features to obtain the semantic attack prediction result 433.
As shown in fig. 4, in the case where the object in the image 410 to be recognized is a living object, a living body cue map 412 may be formed based on the living body cue information in the image 410. The living body cue features may be extracted from the living body cue information by the depth cue extraction module 441 and processed by the depth cue recognition module 442 to obtain the living body prediction result 443. This is not limiting, however: the living body cue information may alternatively be processed with a depth prediction model to obtain the living body cue prediction result.
As shown in fig. 4, a detection result 440 of the object may be obtained based on the glistening cue prediction result 423, the living body prediction result 443, and the semantic attack prediction result 433.
According to an embodiment of the present disclosure, the glistening thread extraction module may include X sequentially connected residual modules (ResNet Block), X being an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the semantic attack extraction module may include Z sequentially connected residual modules, Z being an integer greater than or equal to 1.
According to an exemplary embodiment of the present disclosure, the number X of residual modules in the reflex cue extraction module is smaller than the number Z of residual modules in the semantic attack extraction module, i.e. Z is larger than X. As shown in fig. 4, Z may be 4 and X may be 2.
According to an embodiment of the present disclosure, the number of residual modules in the reflective cue extraction module is smaller than the number in the semantic attack extraction module. This matches the nature of the cues: reflective cue information is low-level texture information, while the remaining attack cue information is high-level semantic information, so the differing depths balance the extraction of attack cue features at multiple scales and levels. This in turn enables hierarchical prediction and improves the extraction of the different attack cue features.
According to an embodiment of the present disclosure, the depth cue extraction module includes Y sequentially connected residual modules, Y being an integer greater than or equal to 1.
According to an exemplary embodiment of the present disclosure, the number Y of residual modules in the depth cue extraction module may be greater than the number X of residual modules in the reflective cue extraction module, and less than the number Z of residual modules in the semantic attack extraction module, i.e., Y is greater than X and less than Z. As shown in fig. 4, Z may be 4, Y may be 3, and X may be 2.
According to the embodiment of the disclosure, the number of residual modules in the reflective cue extraction module is smaller than that in the depth cue extraction module, and the number of residual modules in the depth cue extraction module is smaller than that in the semantic attack extraction module. This matches the fact that the reflective cue information is low-level texture cue information while the living body cue information combines low-level texture cues with the high-level semantic cues of the remaining attack cue information, and balances the extraction of multi-scale, different-level attack cue features and living body cue features. Hierarchical prediction is thereby realized, and the extraction of attack cue features and living body cue features is improved.
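The depth ordering X < Y < Z of the three extraction branches can be sketched with toy scalar residual blocks; the `residual_block` below is a stand-in assumption, not an actual ResNet Block:

```python
import numpy as np

def residual_block(x, w=0.1):
    """Toy residual block: identity path plus a small ReLU branch."""
    return x + np.maximum(x, 0.0) * w

def extraction_branch(x, depth):
    """Chain `depth` sequentially connected residual blocks."""
    for _ in range(depth):
        x = residual_block(x)
    return x

X_DEPTH, Y_DEPTH, Z_DEPTH = 2, 3, 4  # reflective < depth < semantic
feat = np.ones((8, 8))
reflective_feat = extraction_branch(feat, X_DEPTH)  # shallow: texture cues
depth_feat = extraction_branch(feat, Y_DEPTH)       # middle: living body cues
semantic_feat = extraction_branch(feat, Z_DEPTH)    # deep: semantic cues
```

Each extra block multiplies the positive input by 1.1 in this toy setup, so the deeper branches produce progressively more transformed features, mirroring the low-level-to-high-level cue hierarchy described above.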
According to the embodiment of the disclosure, the network structures of the reflective cue identification module, the semantic attack identification module and the depth cue identification module may be set to be the same, for example, a regression module is adopted.
According to the embodiment of the disclosure, the living body detection of the image to be recognized is regarded as a pixel-level regression problem: the living body detection model extracts features from cue information of different levels, hierarchical prediction of multi-scale attack cue features and living body cue features is realized, and the accuracy and generalization of living body detection are improved.
Fig. 5 schematically shows a structural schematic diagram of a regression module according to an embodiment of the present disclosure.
As shown in FIG. 5, the regression module may include a convolutional layer 510, a transposed convolutional layer 520, and an active layer 530 connected in sequence.
According to an embodiment of the present disclosure, the convolutional layer may employ a 1 × 1 convolution kernel; the convolutional layer can be used to change the number of channels.
According to an embodiment of the disclosure, the transposed convolution layer may upsample feature maps such as the reflective cue features, living body cue features, and semantic attack features to the same resolution as the image to be recognized, so as to facilitate pixel-level regression.
According to the embodiment of the present disclosure, the activation layer may employ, for example, Sigmoid, but is not limited thereto, and Softmax may also be employed as long as it is an activation function capable of outputting an attack prediction result and a living body prediction result.
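A minimal sketch of such a regression module, assuming a single output channel and a stride-2 transposed convolution with a constant 2 × 2 kernel (which amounts to repeating each pixel 2 × 2); the kernel choice and shapes are illustrative assumptions:

```python
import numpy as np

def regression_head(features, w_1x1, upsample_factor=2):
    """Toy regression module: 1x1 conv -> transposed conv -> sigmoid.

    `features` is (C_in, H, W); `w_1x1` is (C_out, C_in).
    """
    # 1x1 convolution: a per-pixel linear map over the channel axis,
    # which changes the number of channels.
    mixed = np.tensordot(w_1x1, features, axes=([1], [0]))  # (C_out, H, W)
    # Transposed convolution with a constant 2x2 kernel and stride 2:
    # each pixel is expanded into a 2x2 block, doubling H and W.
    up = np.kron(mixed, np.ones((1, upsample_factor, upsample_factor)))
    # Sigmoid activation squashes each pixel into a [0, 1] prediction score.
    return 1.0 / (1.0 + np.exp(-up))

feats = np.zeros((3, 4, 4))       # 3-channel 4x4 feature map
w = np.ones((1, 3))               # reduce to a single output channel
pred = regression_head(feats, w)  # (1, 8, 8) map, 0.5 everywhere for zero input
```

The output has the upsampled spatial resolution, matching the pixel-level regression described above.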
According to an embodiment of the present disclosure, a preprocessing operation may be performed before operation S201, so that the image to be recognized that is input into the living body detection model contains moire information.
FIG. 6 schematically shows a flow diagram of a pre-processing operation according to another embodiment of the present disclosure.
As shown in fig. 6, a Fourier transform is performed on an initial image to be recognized 610 in operation S610 to obtain a spectrum image 620 of the initial image to be recognized; and the initial image to be recognized 610 is fused with the spectrum image 620 in operation S620 to obtain an image to be recognized 630.
According to an embodiment of the present disclosure, the initial image to be recognized may be an image based on RGB channels, that is, an image including the 3 channels red, green, and blue. Fusing the spectrum image with the initial image to be recognized yields an image to be recognized containing 4 channels.
According to the embodiment of the disclosure, performing the Fourier transform on the initial image to be recognized yields the spectrum information in the image, such as moire information. Fusing the spectrum image with the initial image to be recognized therefore facilitates the extraction and analysis of attack cue features, thereby improving the accuracy of living body detection.
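The fusion described above can be sketched as follows. Computing the spectrum from the grayscale mean and normalising its log-magnitude are illustrative assumptions; the disclosure only states that the spectrum image and the initial image are fused into a 4-channel input.

```python
import numpy as np

def fuse_spectrum(rgb):
    """Append a log-magnitude spectrum channel to an (H, W, 3) RGB image."""
    gray = rgb.mean(axis=2)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # centre the low frequencies
    magnitude = np.log1p(np.abs(spectrum))         # compress the dynamic range
    magnitude /= magnitude.max() + 1e-8            # normalise to [0, 1]
    # Concatenate along the channel axis: RGB plus one spectrum channel.
    return np.concatenate([rgb, magnitude[..., None]], axis=2)  # (H, W, 4)

img = np.random.default_rng(0).random((16, 16, 3))
fused = fuse_spectrum(img)  # 4-channel image to be recognized
```

The first three channels are passed through unchanged; only the fourth channel carries the spectral (e.g. moire) information.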
Fig. 7 schematically shows a flowchart of a training method of a liveness detection model according to an embodiment of the present disclosure.
As shown in fig. 7, the method includes operations S710 to S720.
In operation S710, a plurality of training samples are acquired, where each training sample in the plurality of training samples includes a sample image, an attack tag for the sample image, and a live body tag for the sample image, the sample image includes a live body image or an attack image, an object in the live body image is a live body object, an object in the attack image is an attack object, the attack tag is obtained based on attack cue information of the object in the sample image, and the live body tag is obtained based on live body cue information of the object in the sample image.
In operation S720, an initial in-vivo detection model is trained using each of a plurality of training samples, resulting in an in-vivo detection model.
According to embodiments of the present disclosure, the training samples may include positive samples and negative samples. The positive examples may include, for example, live images and the negative examples may include, for example, attack images.
According to an embodiment of the present disclosure, the attack tag may be derived based on the attack cue information of the object in the sample image, and the living body tag may be derived based on the living body cue information of the object in the sample image. In a living body image, the attack cue information is relatively small or zero, so the attack tag obtained from it may be 0, or a value below a predetermined attack threshold. Conversely, in an attack image, the living body cue information is relatively small or zero, so the living body tag obtained from it may be 0, or a value below a predetermined living body threshold.
According to the embodiment of the disclosure, the sample image in a training sample may be either a living body image or an attack image. Training the initial living body detection model with both positive and negative samples avoids the lack of negative-sample supervision (i.e., supervision of attack cue features) that results from training with positive samples only, and thus improves the prediction precision of the trained living body detection model.
According to the embodiment of the disclosure, since the training samples analyze both the attack cue information and the living body cue information in the sample image, the initial living body detection model is supervised with both the attack tag and the living body tag. This avoids the problem that attack images, which are generated in diverse ways and therefore carry diverse attack cue information, would otherwise be forced into a single class, hindering discrimination and training. The generalization and robustness of the living body detection model are thereby improved.
The method provided by the embodiments of the present disclosure, such as that shown in fig. 7, is further described below with reference to fig. 8 in conjunction with specific embodiments.
According to an embodiment of the present disclosure, the initial liveness detection model may include an initial attack prediction module and an initial liveness prediction module.
According to an embodiment of the present disclosure, training an initial in-vivo detection model with each of a plurality of training samples in operation S720, and obtaining the in-vivo detection model may include the following operations.
For example, an initial attack prediction module is used for processing attack clue information of an object in a sample image to obtain a sample attack prediction result; processing living body clue information of an object in a sample image by using an initial living body prediction module to obtain a sample living body prediction result; and training an initial living body detection model by using the sample attack prediction result and the attack label and by using the sample living body prediction result and the living body label to obtain a living body detection model.
According to the embodiment of the disclosure, the sample image may be processed by pre-trained models with known network architectures, such as an attack prediction model and a living body prediction model, to obtain the attack tag and the living body tag.
According to the exemplary embodiment of the disclosure, the sample attack prediction result and the attack tag can be input into an attack loss function to obtain an attack loss value; inputting the sample living body prediction result and the living body label into a living body loss function to obtain a living body loss value; determining a total loss value based on the attack loss value and the live loss value; adjusting parameters of the initial in-vivo detection model based on the total loss value until the total loss value converges; and taking the initial living body detection model of which the total loss value is converged as a living body detection model.
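These steps can be sketched as follows, assuming mean squared error for both loss functions (the loss family named later in the disclosure) and a plain unweighted sum for the total loss:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over a per-pixel prediction map."""
    return float(np.mean((pred - target) ** 2))

def total_loss(attack_pred, attack_label, live_pred, live_label):
    """Total training loss = attack loss + living body loss (unweighted sum)."""
    return mse(attack_pred, attack_label) + mse(live_pred, live_label)

# Toy per-pixel predictions and labels for a living body sample:
attack_pred = np.full((4, 4), 0.2)   # small residual attack score
attack_label = np.zeros((4, 4))      # living sample -> attack label 0
live_pred = np.full((4, 4), 0.8)
live_label = np.ones((4, 4))         # living sample -> living label 1
loss = total_loss(attack_pred, attack_label, live_pred, live_label)
```

In training, the model parameters would be adjusted to reduce this total loss value until it converges.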
Fig. 8 schematically shows a flowchart of a training method of a liveness detection model according to another embodiment of the present disclosure.
As shown in fig. 8, the initial attack prediction module may include an initial reflective cue prediction module and an initial semantic attack prediction module. The initial reflective cue prediction module may include an initial reflective cue extraction module 821 and an initial reflective cue identification module 822; the initial semantic attack prediction module may include an initial semantic attack extraction module 831 and an initial semantic attack identification module 832; and the initial living body prediction module may include an initial depth cue extraction module 841 and an initial depth cue identification module 842.
As shown in fig. 8, the sample attack prediction results may include a sample reflective cue prediction result 823 and a sample semantic attack prediction result 833. The initial reflective cue prediction module processes the attack cue information to obtain the sample reflective cue prediction result 823; the initial semantic attack prediction module processes the attack cue information to obtain the sample semantic attack prediction result 833; and the sample attack prediction result is determined based on the sample reflective cue prediction result 823 and the sample semantic attack prediction result 833.
As shown in fig. 8, a sample living body prediction result 843 can be obtained by processing living body cue information of an object in a sample image 810 by the initial living body prediction module.
As shown in fig. 8, attack tags may include reflective attack tags and semantic attack tags.
As shown in fig. 8, the reflective cue information of the sample image 810 may be processed by a pre-trained model with a known network architecture, such as the attack prediction model 851, to obtain a reflective attack tag 852. Similarly, the living body prediction model 861 processes the living body cue information of the sample image 810 to obtain a living body tag 862.
As shown in fig. 8, the semantic attack tag may be determined based on whether the object in the sample image is a living object, for example, if the object is a living object, the semantic attack tag is determined to be 1, and if the object is an attack object, the semantic attack tag is determined to be 0.
As shown in fig. 8, before the sample semantic attack prediction result is determined, a judgment operation may be performed to judge whether the semantic attack tag is 1. If the semantic attack tag is 1, the initial semantic attack prediction module processes the attack cue information to obtain the sample semantic attack prediction result 833. If the semantic attack tag is 0, the determination of the sample semantic attack prediction result is skipped.
According to the embodiment of the disclosure, by performing the judgment operation, supervised training of the initial semantic attack prediction module can be skipped where appropriate; for example, the sample semantic attack prediction result is not determined for attack images, which accommodates the diversity of attack cue information.
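The judgment operation can be sketched as follows. Building a per-pixel target map from the scalar semantic attack tag is an illustrative assumption:

```python
import numpy as np

def semantic_supervision(is_living, semantic_pred, loss_fn):
    """Conditionally supervise the semantic branch.

    The semantic attack tag is 1 for a living object and 0 for an attack
    object; the semantic prediction is only supervised when the tag is 1.
    """
    tag = 1 if is_living else 0
    if tag == 0:
        return None  # skip supervision for attack images
    target = np.full_like(semantic_pred, float(tag))
    return loss_fn(semantic_pred, target)

mse = lambda p, t: float(np.mean((p - t) ** 2))
live_loss = semantic_supervision(True, np.full((2, 2), 0.9), mse)   # supervised
skipped = semantic_supervision(False, np.full((2, 2), 0.9), mse)    # skipped
```

Returning `None` here stands in for excluding the semantic loss term from the total loss for that sample.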
According to embodiments of the present disclosure, the attack loss function may include a reflection attack loss function and a semantic attack loss function.
According to the exemplary embodiment of the disclosure, the sample reflective clue prediction result and the reflective attack tag are input into a reflective attack loss function to obtain a reflective attack loss value; inputting the sample semantic attack prediction result and the semantic attack label into a semantic attack loss function to obtain a semantic attack loss value; inputting the sample living body prediction result and the living body label into a living body loss function to obtain a living body loss value; determining a total loss value based on the reflection attack loss value, the semantic attack loss value, and the living body loss value; adjusting parameters of the initial in-vivo detection model based on the total loss value until the total loss value converges; and taking the initial living body detection model of which the total loss value is converged as a living body detection model.
According to the embodiments of the present disclosure, a mean square error loss function may be employed for the live body loss function, the reflection attack loss function, and the semantic attack loss function.
According to the embodiment of the disclosure, Fourier transform can be performed on the initial sample image to obtain a sample spectrum image of the initial sample image; and fusing the initial sample image and the sample spectrum image to obtain a sample image.
According to an example embodiment of the present disclosure, the initial reflective cue prediction module may include an initial reflective cue extraction module and an initial reflective cue identification module.
According to the embodiment of the disclosure, the attack clue information of the object in the sample image can be processed by utilizing the initial reflection clue extraction module to obtain the characteristics of the sample reflection clue; and processing the characteristics of the sample reflective clues by using an initial reflective clue identification module to obtain a sample reflective clue prediction result.
According to an exemplary embodiment of the present disclosure, the initial semantic attack prediction module may include an initial semantic attack extraction module and an initial semantic attack recognition module.
According to the embodiment of the disclosure, attack clue information of an object in a sample image can be processed by using an initial semantic attack extraction module to obtain sample semantic attack characteristics; and processing the semantic attack characteristics of the sample by using an initial semantic attack recognition module to obtain a semantic attack prediction result of the sample.
According to an exemplary embodiment of the present disclosure, the initial living body prediction module may include an initial depth cue extraction module and an initial depth cue identification module.
According to the embodiment of the disclosure, the living body cue information of the object in the sample image can be processed by utilizing an initial depth cue extraction module to obtain the characteristics of the living body cue of the sample; and processing the live clue characteristics of the sample by using an initial depth clue identification module to obtain a live sample prediction result.
With the training method of the living body detection model provided by the embodiments of the disclosure, different initial attack prediction modules and initial living body prediction modules can be constructed based on pixel-level cue information of different levels in the sample image. The living body detection model obtained by the training achieves high accuracy and good generalization when applied to living body detection.
According to the embodiment of the disclosure, the network structures of the initial reflective cue identification module, the initial semantic attack identification module and the initial depth cue identification module may be set to be the same, for example, a regression module is adopted. The regression module may include a convolutional layer, a transposed convolutional layer, and an active layer connected in sequence.
For example, the convolutional layer may use a convolutional layer having a convolutional kernel of 1 × 1. The number of channels can be changed by using convolutional layers. The transposed convolutional layer may upsample feature vectors, such as sample reflectron cue features, sample live cue features, and sample semantic attack features, to the same resolution as the sample image, in order to facilitate pixel-level regression. For example, Sigmoid may be used as the activation layer, but the activation layer is not limited to this, and Softmax may be used as long as it is an activation function that can output a sample attack prediction result and a sample living body prediction result.
According to an embodiment of the present disclosure, the initial reflective cue extraction module may include X sequentially connected residual modules, where X is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the initial semantic attack extraction module may include Z sequentially connected residual modules, Z being an integer greater than or equal to 1.
According to an exemplary embodiment of the present disclosure, the number X of residual modules in the initial reflective cue extraction module is smaller than the number Z of residual modules in the initial semantic attack extraction module, i.e., Z is greater than X. As shown in fig. 8, Z may be 4 and X may be 2.
According to the embodiment of the disclosure, the number of residual modules in the initial reflective cue extraction module is smaller than the number of residual modules in the initial semantic attack extraction module. This matches the fact that the reflective cue information is low-level texture cue information while the remaining attack cue information is high-level semantic cue information, and balances the extraction of attack cue features at multiple scales and different levels.
According to an embodiment of the present disclosure, the initial depth cue extraction module includes Y sequentially connected residual modules, Y being an integer greater than or equal to 1.
According to an exemplary embodiment of the present disclosure, the number Y of residual modules in the initial depth cue extraction module may be greater than the number X of residual modules in the initial reflective cue extraction module, and less than the number Z of residual modules in the initial semantic attack extraction module, i.e., Y is greater than X and less than Z. As shown in fig. 8, Z may be 4, Y may be 3, and X may be 2.
According to an embodiment of the disclosure, the number of residual modules in the initial reflective cue extraction module is designed to be smaller than that in the initial depth cue extraction module, and the number of residual modules in the initial depth cue extraction module is designed to be smaller than that in the initial semantic attack extraction module. This matches the fact that the reflective cue information is low-level texture cue information while the living body cue information combines low-level texture cues with the high-level semantic cues of the remaining attack cue information, and balances the extraction of multi-scale, different-level attack cue features and living body cue features.
According to the embodiment of the present disclosure, the total loss value may be determined by summing the reflection attack loss value, the semantic attack loss value, and the living body loss value. The disclosure is not limited thereto: weights may also be configured for the reflection attack loss value, the semantic attack loss value, and the living body loss value respectively, and the total loss value determined from the three loss values and their respective weights.
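The weighted variant can be sketched as follows; the specific weight values are tunable assumptions:

```python
def weighted_total_loss(refl_loss, semantic_loss, live_loss,
                        weights=(1.0, 1.0, 1.0)):
    """Total loss as a weighted sum of the three component loss values.

    With the default unit weights this reduces to the plain sum
    described first.
    """
    w_r, w_s, w_l = weights
    return w_r * refl_loss + w_s * semantic_loss + w_l * live_loss

plain = weighted_total_loss(0.02, 0.04, 0.01)                    # plain sum
tuned = weighted_total_loss(0.02, 0.04, 0.01, (2.0, 1.0, 0.5))   # weighted sum
```

The weights let training emphasize one cue branch over another, e.g. upweighting the reflection attack loss when reflective attacks dominate the training data.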
With the training method of the living body detection model provided by the embodiments of the disclosure, the attack cue information and the living body cue information are explicitly supervised, which improves the learning efficiency of the living body detection model: the model learns feature extraction and processing for different kinds of cue information, such as attack cue information and living body cue information, and its robustness and generalization are improved.
Fig. 9 schematically shows a block diagram of a living body detecting apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the living body detection apparatus 900 may include an attack result prediction module 910, a living body result prediction module 920, and an object result determination module 930.
The attack result prediction module 910 is configured to determine an attack prediction result related to an object based on attack cue information of the object in the image to be identified, where the attack cue information represents a probability that the object is an attack object.
A living body result predicting module 920, configured to determine a living body prediction result related to the object based on living body cue information of the object in the image to be recognized, where the living body cue information represents a probability that the object is a living body object.
An object result determination module 930 configured to determine a detection result of the object based on the attack prediction result and the living body prediction result.
According to an embodiment of the present disclosure, the attack result prediction module may include an attack extraction unit, and an attack processing unit.
And the attack extraction unit is used for extracting the attack clue characteristics from the attack clue information.
And the attack processing unit is used for processing the attack clue characteristics to obtain an attack prediction result.
According to an embodiment of the present disclosure, the attack result prediction module may include a reflection extraction unit, a semantic extraction unit, a reflection processing unit, a semantic processing unit, and an attack result integration unit.
And the reflecting extraction unit is used for extracting reflecting clue characteristics from the attack clue information.
And the semantic extraction unit is used for extracting semantic attack characteristics from the attack clue information.
And the light reflection processing unit is used for processing the light reflection clue characteristics to obtain a light reflection clue prediction result.
And the semantic processing unit is used for processing the semantic attack characteristics to obtain a semantic attack prediction result.
And the attack result comprehensive unit is used for determining an attack prediction result based on the reflection clue prediction result and the semantic attack prediction result.
According to an embodiment of the present disclosure, the living body result prediction module may include a living body extraction unit, and a living body processing unit.
And the living body extraction unit is used for extracting the living body cue characteristics from the living body cue information.
And the living body processing unit is used for processing the living body clue characteristics to obtain a living body prediction result.
According to the embodiment of the disclosure, the reflective cue extraction module is used to extract reflective cue features from the attack cue information, and the semantic attack extraction module is used to extract semantic attack features from the attack cue information. The reflective cue extraction module includes X sequentially connected residual modules, X being an integer greater than or equal to 1; the semantic attack extraction module includes Z sequentially connected residual modules, Z being an integer greater than or equal to 1; and Z is greater than X.
According to an embodiment of the disclosure, the depth cue extraction module is used to extract living body cue features from the living body cue information; the depth cue extraction module includes Y sequentially connected residual modules, Y being an integer greater than or equal to 1; and Y is greater than X and less than Z.
According to an embodiment of the present disclosure, the living body detecting device may further include a transformation module, and a fusion module.
And the transformation module is used for carrying out Fourier transformation on the initial image to be identified to obtain a frequency spectrum image of the initial image to be identified.
And the fusion module is used for fusing the initial image to be identified and the frequency spectrum image to obtain the image to be identified.
Fig. 10 schematically shows a block diagram of a training apparatus of a living body detection model according to an embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 of the in-vivo detection model may include an acquisition module 1010, and a training module 1020.
The acquiring module 1010 is configured to acquire a plurality of training samples, where each training sample in the plurality of training samples includes a sample image, an attack tag for the sample image, and a living body tag for the sample image, the sample image includes a living body image or an attack image, an object in the living body image is a living body object, an object in the attack image is an attack object, the attack tag is obtained based on attack cue information of the object in the sample image, and the living body tag is obtained based on living body cue information of the object in the sample image.
The training module 1020 is configured to train an initial in-vivo detection model by using each of the plurality of training samples to obtain an in-vivo detection model.
According to an embodiment of the present disclosure, an initial liveness detection model includes an initial attack prediction module and an initial liveness prediction module.
According to an embodiment of the present disclosure, the training module may include an attack result obtaining sub-module, a living body result obtaining sub-module, and a training sub-module.
And the attack result obtaining submodule is used for processing the attack clue information of the object in the sample image by using the initial attack prediction module to obtain a sample attack prediction result.
And the living body result obtaining submodule is used for processing living body clue information of the object in the sample image by using the initial living body prediction module to obtain a sample living body prediction result.
And the training submodule is used for training the initial living body detection model by utilizing the sample attack prediction result and the attack label and the sample living body prediction result and the living body label to obtain the living body detection model.
According to an embodiment of the present disclosure, the initial living body prediction module may include an initial depth cue extraction module and an initial depth cue identification module.
According to an embodiment of the present disclosure, the living body result obtaining submodule may include a sample living body extraction unit and a sample living body regression unit.
And the sample living body extraction unit is used for processing living body cue information of the object in the sample image by using the initial depth cue extraction module to obtain the sample living body cue characteristics.
And the sample living body regression unit is used for processing the sample living body cue characteristics by utilizing the initial depth cue identification module to obtain a sample living body prediction result.
According to an embodiment of the present disclosure, the training submodule may include an attack input unit, a living body input unit, a loss value determination unit, an adjustment unit, and a model determination unit.
And the attack input unit is used for inputting the sample attack prediction result and the attack label into the attack loss function to obtain an attack loss value.
And the living body input unit is used for inputting the sample living body prediction result and the living body label into the living body loss function to obtain a living body loss value.
And a loss value determination unit for determining a total loss value based on the attack loss value and the living body loss value.
And the adjusting unit is used for adjusting the parameters of the initial living body detection model based on the total loss value until the total loss value is converged.
A model determination unit for taking the initial living body detection model in which the total loss value converges as the living body detection model.
According to an embodiment of the present disclosure, the initial attack prediction module includes an initial reflection cue prediction module and an initial semantic attack prediction module.
According to the embodiment of the disclosure, the sample attack prediction result comprises a sample reflective cue prediction result and a sample semantic attack prediction result.
According to the embodiment of the disclosure, the attack result obtaining submodule may include a light reflection result obtaining unit, a semantic result obtaining unit, and an attack comprehensive obtaining unit.
And the light reflection result obtaining unit is used for processing the attack clue information by utilizing the initial light reflection clue prediction module to obtain a sample light reflection clue prediction result.
And the semantic result obtaining unit is used for processing the attack clue information by utilizing the initial semantic attack prediction module to obtain a sample semantic attack prediction result.
And the attack comprehensive obtaining unit is used for determining a sample attack prediction result based on the sample reflective cue prediction result and the sample semantic attack prediction result.
According to the embodiment of the disclosure, the initial semantic attack prediction module comprises an initial semantic attack extraction module and an initial semantic attack recognition module.
According to an embodiment of the present disclosure, the semantic result obtaining unit may include a sample semantic extracting subunit and a sample semantic regressing subunit.
And the sample semantic extraction subunit is used for processing the attack clue information of the object in the sample image by using the initial semantic attack extraction module to obtain the sample semantic attack characteristics.
And the sample semantic regression subunit is used for processing the sample semantic attack characteristics by utilizing the initial semantic attack recognition module to obtain a sample semantic attack prediction result.
According to an embodiment of the present disclosure, the initial reflective cue prediction module includes an initial reflective cue extraction module and an initial reflective cue identification module.
According to an embodiment of the present disclosure, the reflection result obtaining unit may include a sample reflection extracting subunit and a sample reflection regression subunit.
And the sample reflective extraction subunit is used for processing the attack clue information of the object in the sample image by using the initial reflective clue extraction module to obtain the characteristics of the sample reflective clue.
And the sample reflection regression subunit is used for processing the characteristics of the sample reflection clues by utilizing the initial reflection clue identification module to obtain a sample reflection clue prediction result.
According to an embodiment of the present disclosure, the attack tags include a reflection attack tag and a semantic attack tag.
According to an embodiment of the present disclosure, the attack loss function includes a reflection attack loss function and a semantic attack loss function.
According to an embodiment of the present disclosure, the attack input unit may include a reflection loss obtaining subunit, a semantic loss obtaining subunit, and an attack loss obtaining subunit.
And the reflection loss obtaining subunit is used for inputting the sample reflection clue prediction result and the reflection attack tag into a reflection attack loss function to obtain a reflection attack loss value.
And the semantic loss obtaining subunit is used for inputting the sample semantic attack prediction result and the semantic attack tag into a semantic attack loss function to obtain a semantic attack loss value.
And the attack loss obtaining subunit is used for determining an attack loss value based on the reflection attack loss value and the semantic attack loss value.
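The disclosure states that the attack loss value is determined based on the reflection attack loss value and the semantic attack loss value, but leaves the combination rule open. A weighted sum is one plausible choice; the helper below and its weights are illustrative assumptions.

```python
def combine_attack_loss(reflection_loss: float, semantic_loss: float,
                        w_reflection: float = 1.0, w_semantic: float = 1.0) -> float:
    """Combine the two branch losses into a single attack loss value.

    A weighted sum is shown here; the disclosure does not fix the rule,
    so both the sum and the weights are assumptions for illustration.
    """
    return w_reflection * reflection_loss + w_semantic * semantic_loss
```

With equal weights this reduces to a plain sum of the two branch losses; down-weighting one branch would let training emphasise, say, semantic attack cues over reflection cues.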
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the device 1100 are connected to the I/O interface 1105, including: an input unit 1106, such as a keyboard, a mouse, and the like; an output unit 1107, such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, an optical disk, and the like; and a communication unit 1109, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1101 can be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the living body detection method or the training method of the living body detection model. For example, in some embodiments, the living body detection method or the training method of the living body detection model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the above-described living body detection method or training method of the living body detection model may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform the living body detection method or the training method of the living body detection model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (27)

1. A living body detection method, comprising:
determining an attack prediction result related to an object based on attack clue information of the object in an image to be identified, wherein the attack clue information represents the probability that the object is an attack object;
determining a living body prediction result related to the object based on living body clue information of the object in the image to be identified, wherein the living body clue information represents the probability that the object is a living body object; and
determining a detection result of the object based on the attack prediction result and the living body prediction result.
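Claim 1 only states that the detection result is based on both the attack prediction result and the living body prediction result, without fixing a decision rule. For illustration, one plausible fusion rule is sketched below; the AND-style logic, the 0.5 threshold, and the probability inputs are assumptions, not claim limitations.

```python
def detection_result(attack_prob: float, live_prob: float,
                     threshold: float = 0.5) -> str:
    """Fuse the two branch outputs into a final detection result.

    Illustrative rule (an assumption): accept the object as a living body
    only when the living body score is high AND the attack score is low;
    otherwise classify it as an attack object.
    """
    if live_prob >= threshold and attack_prob < threshold:
        return "living body"
    return "attack"
```

Requiring both conditions makes the decision conservative: a high living body score alone does not override a simultaneously high attack score.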
2. The method of claim 1, wherein the determining an attack prediction result related to the object based on attack clue information of the object in the image to be identified comprises:
extracting attack clue characteristics from the attack clue information; and
processing the attack clue characteristics to obtain the attack prediction result.
3. The method of claim 1, wherein the determining an attack prediction result related to the object based on attack clue information of the object in the image to be identified comprises:
extracting reflective cue features from the attack clue information;
extracting semantic attack features from the attack clue information;
processing the reflective cue features to obtain a reflective cue prediction result;
processing the semantic attack features to obtain a semantic attack prediction result; and
determining the attack prediction result based on the reflective cue prediction result and the semantic attack prediction result.
4. The method of any of claims 1 to 3, wherein the determining a living body prediction result related to the object based on living body clue information of the object in the image to be identified comprises:
extracting living body cue features from the living body clue information; and
processing the living body cue features to obtain the living body prediction result.
5. The method of claim 4, wherein reflective cue features are extracted from the attack clue information with a reflective cue extraction module, and semantic attack features are extracted from the attack clue information with a semantic attack extraction module;
wherein the reflective cue extraction module comprises X sequentially connected residual modules, X being an integer greater than or equal to 1;
the semantic attack extraction module comprises Z sequentially connected residual modules, Z being an integer greater than or equal to 1;
wherein Z is greater than X.
6. The method of claim 5, wherein living body cue features are extracted from the living body clue information with a depth cue extraction module;
wherein the depth cue extraction module comprises Y sequentially connected residual modules, Y being an integer greater than or equal to 1;
wherein Y is greater than X and Y is less than Z.
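Claims 5 and 6 order the three extraction branches by depth (X reflective-cue blocks < Y depth-cue blocks < Z semantic-attack blocks), reflecting that semantic attack cues require more abstraction than low-level reflection cues. A minimal sketch of such depth-ordered residual stacks follows; the residual block definition, feature dimension, and concrete depths are illustrative assumptions.

```python
import numpy as np

def residual_block(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # minimal residual unit: identity shortcut plus a ReLU-activated linear map
    return x + np.maximum(w @ x, 0.0)

def make_extractor(num_blocks: int, dim: int = 8, seed: int = 0):
    """Stack `num_blocks` sequentially connected residual modules."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_blocks)]
    def extract(x: np.ndarray) -> np.ndarray:
        for w in weights:
            x = residual_block(x, w)
        return x
    return extract

X, Y, Z = 2, 3, 5                         # illustrative depths satisfying X < Y < Z
reflective_cue_extractor = make_extractor(X)
depth_cue_extractor = make_extractor(Y)
semantic_attack_extractor = make_extractor(Z)
```

Each extractor maps a feature vector to another vector of the same dimension, so the three branches can share an input while differing only in depth.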
7. The method of claim 1, further comprising:
carrying out Fourier transform on an initial image to be identified to obtain a frequency spectrum image of the initial image to be identified; and
fusing the initial image to be identified and the frequency spectrum image to obtain the image to be identified.
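The preprocessing of claim 7 — a Fourier transform of the initial image followed by fusion of the image with its spectrum — can be sketched as below. The log scaling, the normalisation, and channel stacking as the fusion operator are illustrative assumptions; the claim does not specify how the fusion is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
initial_image = rng.random((64, 64)).astype(np.float32)   # grayscale stand-in

# Fourier transform of the initial image to be identified
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(initial_image)))
spectrum = np.log1p(spectrum)             # compress dynamic range (assumption)
spectrum /= spectrum.max()                # normalise to [0, 1] (assumption)

# fuse the initial image and the spectrum image, here by channel stacking
image_to_identify = np.stack([initial_image, spectrum.astype(np.float32)], axis=0)
```

The fused tensor then carries both spatial appearance and frequency-domain information, the latter being where artefacts of screen replay or print attacks often stand out.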
8. A method of training a living body detection model, comprising:
acquiring a plurality of training samples, wherein each training sample in the plurality of training samples comprises a sample image, an attack tag aiming at the sample image and a living body tag aiming at the sample image, the sample image comprises a living body image or an attack image, an object in the living body image is a living body object, an object in the attack image is an attack object, the attack tag is obtained based on attack clue information of the object in the sample image, and the living body tag is obtained based on living body clue information of the object in the sample image; and
training an initial living body detection model by using each training sample in the plurality of training samples to obtain a living body detection model.
9. The method of claim 8, wherein the initial liveness detection model comprises an initial attack prediction module and an initial liveness prediction module;
the training of the initial in vivo detection model by using each of the plurality of training samples to obtain the in vivo detection model includes:
processing the attack clue information of the object in the sample image by using the initial attack prediction module to obtain a sample attack prediction result;
processing the living body clue information of the object in the sample image by utilizing the initial living body prediction module to obtain a sample living body prediction result;
training the initial living body detection model by using the sample attack prediction result and the attack label, and by using the sample living body prediction result and the living body label, to obtain the living body detection model.
10. The method of claim 9,
wherein the initial living body prediction module comprises an initial depth cue extraction module and an initial depth cue identification module;
wherein the processing the living body cue information of the object in the sample image by the initial living body prediction module to obtain a sample living body prediction result comprises: processing the living body cue information by utilizing the initial depth cue extraction module to obtain the living body cue characteristics of the sample; and processing the sample living body cue characteristics by utilizing the initial depth cue identification module to obtain the sample living body prediction result.
11. The method of claim 9, wherein the training the initial living body detection model by using the sample attack prediction result and the attack label, and by using the sample living body prediction result and the living body label, to obtain the living body detection model comprises:
inputting the sample attack prediction result and the attack label into an attack loss function to obtain an attack loss value;
inputting the sample living body prediction result and the living body label into a living body loss function to obtain a living body loss value;
determining a total loss value based on the attack loss value and the live loss value;
adjusting parameters of the initial living body detection model based on the total loss value until the total loss value converges; and
taking the initial living body detection model for which the total loss value has converged as the living body detection model.
12. The method of claim 9, wherein the initial attack prediction module comprises an initial reflection cue prediction module and an initial semantic attack prediction module;
the sample attack prediction result comprises a sample reflective cue prediction result and a sample semantic attack prediction result;
the processing, by the initial attack prediction module, of the attack clue information of the object in the sample image to obtain a sample attack prediction result includes:
processing the attack clue information by using the initial reflection clue prediction module to obtain a sample reflection clue prediction result;
processing the attack clue information by using the initial semantic attack prediction module to obtain a sample semantic attack prediction result; and
determining the sample attack prediction result based on the sample reflective cue prediction result and the sample semantic attack prediction result.
13. The method of claim 12,
the initial semantic attack prediction module comprises an initial semantic attack extraction module and an initial semantic attack recognition module;
wherein, the processing the attack thread information by the initial semantic attack prediction module to obtain the sample semantic attack prediction result comprises: processing the attack clue information by using the initial semantic attack extraction module to obtain sample semantic attack characteristics; processing the sample semantic attack characteristics by using the initial semantic attack recognition module to obtain a sample semantic attack prediction result;
the initial light reflection clue prediction module comprises an initial light reflection clue extraction module and an initial light reflection clue identification module;
wherein the processing the attack cue information by using the initial reflection cue prediction module to obtain the sample reflection cue prediction result comprises: processing the attack clue information by using the initial light reflection clue extraction module to obtain the characteristics of the sample light reflection clue; and processing the characteristics of the sample reflective clues by utilizing the initial reflective clue identification module to obtain a sample reflective clue prediction result.
14. The method of claim 11, wherein the attack tags include reflection attack tags and semantic attack tags;
wherein the attack loss function comprises a reflection attack loss function and a semantic attack loss function;
wherein the inputting the sample attack prediction result and the attack tag into an attack loss function to obtain an attack loss value comprises:
inputting the sample reflective clue prediction result and the reflective attack label into the reflective attack loss function to obtain a reflective attack loss value;
inputting the sample semantic attack prediction result and the semantic attack label into the semantic attack loss function to obtain a semantic attack loss value; and
determining the attack loss value based on the reflection attack loss value and the semantic attack loss value.
15. A living body detection apparatus comprising:
the attack result prediction module is used for determining an attack prediction result related to an object based on attack clue information of the object in the image to be identified, wherein the attack clue information represents the probability that the object is an attack object;
a living body result prediction module, configured to determine a living body prediction result related to the object based on living body clue information of the object in the image to be identified, where the living body clue information represents a probability that the object is a living body object; and
an object result determination module for determining a detection result of the object based on the attack prediction result and the living body prediction result.
16. The apparatus of claim 15, wherein the attack outcome prediction module comprises:
the attack extraction unit is used for extracting attack clue characteristics from the attack clue information; and
an attack processing unit, configured to process the attack clue characteristics to obtain the attack prediction result.
17. The apparatus of claim 15, wherein the attack outcome prediction module comprises:
a reflective extraction unit, configured to extract reflective cue features from the attack clue information;
a semantic extraction unit, which is used for extracting semantic attack characteristics from the attack clue information;
the light reflection processing unit is used for processing the light reflection clue characteristics to obtain a light reflection clue prediction result;
the semantic processing unit is used for processing the semantic attack characteristics to obtain a semantic attack prediction result; and
an attack result synthesis unit, configured to determine the attack prediction result based on the reflective cue prediction result and the semantic attack prediction result.
18. The apparatus of any one of claims 15 to 17, wherein the living body result prediction module comprises:
a living body extraction unit configured to extract a living body cue feature from the living body cue information; and
a living body processing unit, configured to process the living body clue characteristics to obtain the living body prediction result.
19. The apparatus of claim 15, further comprising:
the transformation module is used for carrying out Fourier transformation on an initial image to be identified to obtain a frequency spectrum image of the initial image to be identified; and
a fusion module, configured to fuse the initial image to be identified and the frequency spectrum image to obtain the image to be identified.
20. A training apparatus for a living body detection model, comprising:
an obtaining module, configured to obtain a plurality of training samples, where each training sample in the plurality of training samples includes a sample image, an attack tag for the sample image, and a living body tag for the sample image, where the sample image includes a living body image or an attack image, an object in the living body image is a living body object, an object in the attack image is an attack object, the attack tag is obtained based on attack cue information of the object in the sample image, and the living body tag is obtained based on living body cue information of the object in the sample image; and
a training module, configured to train an initial living body detection model by using each training sample in the plurality of training samples to obtain the living body detection model.
21. The apparatus of claim 20, wherein the initial liveness detection model comprises an initial attack prediction module and an initial liveness prediction module;
the training module comprises:
an attack result obtaining submodule, configured to process the attack clue information of the object in the sample image by using the initial attack prediction module, so as to obtain a sample attack prediction result;
a living body result obtaining submodule, configured to process the living body cue information of the object in the sample image by using the initial living body prediction module, so as to obtain a sample living body prediction result; and
a training submodule, configured to train the initial living body detection model by using the sample attack prediction result and the attack label, and the sample living body prediction result and the living body label, to obtain the living body detection model.
22. The apparatus of claim 21, wherein the training submodule comprises:
the attack input unit is used for inputting the sample attack prediction result and the attack label into an attack loss function to obtain an attack loss value;
a living body input unit, configured to input the sample living body prediction result and the living body label into a living body loss function, so as to obtain a living body loss value;
a loss value determination unit for determining a total loss value based on the attack loss value and the living body loss value;
an adjusting unit, configured to adjust parameters of the initial living body detection model based on the total loss value until the total loss value converges; and
a model determination unit, configured to take the initial living body detection model for which the total loss value has converged as the living body detection model.
23. The apparatus of claim 21, wherein the initial attack prediction module comprises an initial reflection cue prediction module and an initial semantic attack prediction module;
the sample attack prediction result comprises a sample reflective cue prediction result and a sample semantic attack prediction result;
the attack result obtaining submodule comprises:
a reflection result obtaining unit, configured to process the attack cue information by using the initial reflection cue prediction module, and obtain a sample reflection cue prediction result;
a semantic result obtaining unit, configured to process the attack thread information by using the initial semantic attack prediction module to obtain a sample semantic attack prediction result; and
an attack comprehensive obtaining unit, configured to determine the sample attack prediction result based on the sample reflective cue prediction result and the sample semantic attack prediction result.
24. The apparatus of claim 22, wherein the attack tags include reflection attack tags and semantic attack tags;
wherein the attack loss function comprises a reflection attack loss function and a semantic attack loss function;
the attack input unit includes:
a reflection loss obtaining subunit, configured to input the sample reflection thread prediction result and the reflection attack tag into the reflection attack loss function to obtain a reflection attack loss value;
a semantic loss obtaining subunit, configured to input the sample semantic attack prediction result and the semantic attack tag into the semantic attack loss function to obtain a semantic attack loss value; and
an attack loss obtaining subunit, configured to determine the attack loss value based on the reflection attack loss value and the semantic attack loss value.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the living body detection method of any one of claims 1-7 or the training method of the living body detection model of any one of claims 8-14.
26. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the living body detection method of any one of claims 1-7 or the training method of the living body detection model of any one of claims 8-14.
27. A computer program product comprising a computer program which, when executed by a processor, implements the living body detection method of any one of claims 1 to 7 or the training method of the living body detection model of any one of claims 8 to 14.
CN202111167884.6A 2021-09-29 2021-09-29 Living body detection method, living body training device, electronic apparatus, and medium Pending CN113869253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111167884.6A CN113869253A (en) 2021-09-29 2021-09-29 Living body detection method, living body training device, electronic apparatus, and medium


Publications (1)

Publication Number Publication Date
CN113869253A true CN113869253A (en) 2021-12-31

Family

ID=79001583


Country Status (1)

Country Link
CN (1) CN113869253A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229325A (en) * 2017-03-16 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Face detection method and system, electronic device, program and medium
WO2021082548A1 (en) * 2019-10-28 2021-05-06 Alipay (Hangzhou) Information Technology Co., Ltd. Living body testing method and apparatus, server and facial recognition device
CN111310575A (en) * 2020-01-17 2020-06-19 Tencent Technology (Shenzhen) Co., Ltd. Face living body detection method, related device, equipment and storage medium
CN111401134A (en) * 2020-02-19 2020-07-10 Beijing Sankuai Online Technology Co., Ltd. Living body detection method, living body detection device, electronic apparatus, and storage medium
CN111178341A (en) * 2020-04-10 2020-05-19 Alipay (Hangzhou) Information Technology Co., Ltd. Living body detection method, device and equipment
CN113343826A (en) * 2021-05-31 2021-09-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method of human face living body detection model, human face living body detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ying Fan et al.: "Research on Liveness Detection Algorithms Based on Deep Learning", 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), 19 March 2020, pages 370-375 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035608A (en) * 2022-05-26 2022-09-09 Alipay (Hangzhou) Information Technology Co., Ltd. Living body detection method, device, equipment and system
CN115147902A (en) * 2022-06-30 2022-10-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and device for human face living body detection model and computer program product
CN115147902B (en) * 2022-06-30 2023-11-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and device for human face living body detection model, and computer program product

Similar Documents

Publication Publication Date Title
EP4075395A2 (en) Method and apparatus of training anti-spoofing model, method and apparatus of performing anti-spoofing, and device
CN112906502A (en) Training method, device and equipment of target detection model and storage medium
CN113221771B (en) Living body face recognition method, device, apparatus, storage medium and program product
US20230008696A1 (en) Method for incrementing sample image
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
CN113065614B (en) Training method of classification model and method for classifying target object
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN113362314B (en) Medical image recognition method, recognition model training method and device
CN113869253A (en) Living body detection method, living body training device, electronic apparatus, and medium
US11823494B2 (en) Human behavior recognition method, device, and storage medium
CN114612743A (en) Deep learning model training method, target object identification method and device
CN113705361A (en) Method and device for detecting model in living body and electronic equipment
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
US20230096921A1 (en) Image recognition method and apparatus, electronic device and readable storage medium
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN113221766A (en) Method for training living body face recognition model and method for recognizing living body face and related device
CN114417029A (en) Model training method and device, electronic equipment and storage medium
CN108229518B (en) Statement-based image detection method, device and system
CN111079704A (en) Face recognition method and device based on quantum computation
CN113221920B (en) Image recognition method, apparatus, device, storage medium, and computer program product
CN117893840B (en) Acne severity grading method and device, electronic equipment and storage medium
US20220222941A1 (en) Method for recognizing action, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination