CN114648814A - Face living body detection method, training method, device, equipment and medium of model - Google Patents


Info

Publication number
CN114648814A
Authority
CN
China
Prior art keywords
face
image
living body
probability
pixel points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210178546.0A
Other languages
Chinese (zh)
Inventor
张国生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210178546.0A
Publication of CN114648814A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a face living body detection method, a training method of a model, a device, equipment and a medium, relating to artificial intelligence fields such as deep learning and computer vision, and applicable to scenes such as face image processing and face recognition. The specific implementation scheme is as follows: acquiring a face image to be detected; acquiring the probability that a plurality of pixel points in the face image belong to a living body face based on a pre-trained face living body detection model and the face image; and detecting whether the face image is a living body face based on the probability that the plurality of pixel points belong to the living body face. The method can effectively improve the accuracy and detection efficiency of face living body detection.

Description

Human face living body detection method, training method, device, equipment and medium of model
Technical Field
The disclosure relates to the field of computer technology, in particular to artificial intelligence fields such as deep learning and computer vision, and can be applied to scenes such as face image processing and face recognition; it relates in particular to a face living body detection method, a training method of a model, a device, equipment and a medium.
Background
Face living body detection distinguishes whether an image is shot of a real person. It is a basic module of a face recognition system and ensures the security of the face recognition system.
Face living body detection methods using deep learning technology are currently the mainstream in this field, and their accuracy is greatly improved compared with traditional methods. Existing deep-learning-based face living body detection methods are mainly two-classification methods, comprising the following two steps: first, extracting features from the face image; second, performing two-classification recognition based on the extracted features, i.e., recognizing whether the face image is a living body face or an attack.
Disclosure of Invention
The disclosure provides a human face living body detection method, a training method, a device, equipment and a medium of a model.
According to an aspect of the present disclosure, there is provided a face live body detection method, including:
acquiring a human face image to be detected;
acquiring the probability that a plurality of pixel points in the face image belong to a living face based on a pre-trained face living body detection model and the face image;
and detecting whether the face image is a living body face or not based on the probability that a plurality of pixel points in the face image belong to the living body face.
According to another aspect of the present disclosure, there is provided a training method of a human face living body detection model, including:
collecting a plurality of groups of training data, wherein each group of training data comprises a living body face image and an attack face image;
generating a corresponding spliced image based on the living body face image and the attack face image in each group of the training data;
labeling the real probability that a plurality of pixel points in the spliced image corresponding to each group of the training data belong to the living human face;
and training the face living body detection model based on the spliced images corresponding to the training data of each group and the real probability that a plurality of pixel points in the spliced images belong to the living body face.
According to still another aspect of the present disclosure, there is provided a face liveness detection apparatus including:
the image acquisition module is used for acquiring a human face image to be detected;
the probability acquisition module is used for acquiring the probability that a plurality of pixel points in the face image belong to a living body face based on a pre-trained face living body detection model and the face image;
and the detection module is used for detecting whether the face image is a living body face or not based on the probability that the plurality of pixel points in the face image belong to the living body face.
According to another aspect of the present disclosure, there is provided a training device for a human face living body detection model, including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plurality of groups of training data, and each group of training data comprises a living body face image and an attack face image;
the generating module is used for generating a corresponding spliced image based on the living body face image and the attack face image in each group of training data;
the labeling module is used for labeling the real probability that a plurality of pixel points in the spliced image corresponding to each group of the training data belong to the living human face;
and the training module is used for training the face living body detection model based on the spliced images corresponding to the training data of all groups and the real probability that a plurality of pixel points in the spliced images belong to the living body face.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of the aspects and any possible implementation described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the above described aspect and any possible implementation.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspect and any possible implementation as described above.
According to the technology of the present disclosure, the accuracy and detection efficiency of face living body detection can be effectively improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic illustration according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is to be understood that the described embodiments are only a few, and not all, of the disclosed embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terminal device involved in the embodiments of the present disclosure may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), and other intelligent devices; the display device may include, but is not limited to, a personal computer, a television, and the like having a display function.
In addition, the term "and/or" herein describes only an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The current deep-learning-based face living body detection method directly extracts features of the whole face image and performs two-classification recognition, so the accuracy of face living body detection is poor and the detection efficiency is low.
FIG. 1 is a schematic illustration according to a first embodiment of the present disclosure; as shown in fig. 1, the present embodiment provides a face live detection method, which can be applied to various types of face live detection apparatuses, and specifically includes the following steps:
s101, acquiring a human face image to be detected;
s102, acquiring the probability that a plurality of pixel points in the face image belong to the living face based on a pre-trained face living body detection model and the face image;
s103, detecting whether the face image is a living body face or not based on the probability that a plurality of pixel points in the face image belong to the living body face.
The face living body detection method of the embodiment can be applied to various safety protection systems and is used for detecting whether a face image is a living body face; if the face image is not a living face, the corresponding face image is considered as an attack, and blocking is needed to ensure safety.
Based on different application scenarios of the safety protection system, the form of the living human face detection apparatus of the present embodiment is also different, and for example, the living human face detection apparatus may be in the form of a mobile terminal, an application installed in a terminal, or another form, which is not limited herein.
The pre-trained face living body detection model of this embodiment differs from the prior-art two-classification model. In use, the acquired face image may be input into the face living body detection model, which predicts and outputs the probability that each of a plurality of pixel points in the face image belongs to a living body face. For example, for each such pixel point, the model may output a probability p ∈ [0, 1]: p = 0 indicates that the pixel point does not belong to a living body face, i.e., it belongs to an attack, and p = 1 indicates that the pixel point belongs to a living body face. In practical application, p is usually a value greater than 0 and less than 1, and the closer p is to 1, the higher the probability that the pixel point belongs to a living body face. The probabilities that a plurality of pixel points in the face image belong to a living body face may also be obtained in other ways based on the face living body detection model and the face image.
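As an illustrative sketch only (the disclosure provides no code), the per-pixel prediction step can be expressed as follows in Python with PyTorch; the checkpoint name, the use of TorchScript, and the tensor layout are all assumptions:

```python
import torch

# Hypothetical pre-trained per-pixel face living body detection model,
# e.g. a U-Net ending in a sigmoid; the file name is an assumption.
model = torch.jit.load("face_liveness_unet.pt")
model.eval()

def pixel_liveness_probs(face_image: torch.Tensor) -> torch.Tensor:
    """face_image: (3, H, W) float tensor scaled to [0, 1].
    Returns an (H, W) map whose value p in [0, 1] at each position is the
    predicted probability that the pixel point belongs to a living body face."""
    with torch.no_grad():
        prob_map = model(face_image.unsqueeze(0))  # (1, 1, H, W)
    return prob_map.squeeze()
```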
Further, based on the probability that a plurality of pixel points in the obtained face image belong to the living body face, comprehensive analysis can be performed to detect whether the face image is the living body face.
According to the face living body detection method of this embodiment, the probability that a plurality of pixel points in a face image belong to a living body face is obtained based on a pre-trained face living body detection model and the face image, and whether the face image is a living body face is detected based on those probabilities. The method focuses on pixel-level information in the face image and recognizes finer-grained information, so the accuracy of face living body detection can be effectively improved, and the detection efficiency is further improved.
In step S102 of the above embodiment, the plurality of pixel points of the face image may refer to every pixel point in the face image. In this case, the face image is input into the face living body detection model, which predicts and outputs the probability that each pixel point in the face image belongs to a living body face.
Alternatively, the plurality of pixel points in the face image may refer to only some of the pixel points. For example, since predicting the probability for every pixel point requires considerable computation, the face image may be virtually divided into a plurality of grids, with one pixel point selected in each grid for prediction. The grid size is set according to actual requirements: to improve accuracy, the grid size can be reduced so that more pixel points are predicted; to reduce computation, the grid size can be enlarged so that fewer pixel points are predicted. In practical application, accuracy and computation can be balanced by choosing an appropriate grid size.
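For illustration, such grid-based sampling of a per-pixel probability map might look like the sketch below; taking the cell center as the representative pixel is an assumption:

```python
import numpy as np

def sample_grid_pixels(prob_map: np.ndarray, grid: int) -> np.ndarray:
    """Keep one representative probability per grid x grid cell (here the
    cell center), trading prediction density for less computation."""
    h, w = prob_map.shape
    rows = np.arange(grid // 2, h, grid)
    cols = np.arange(grid // 2, w, grid)
    return prob_map[np.ix_(rows, cols)]
```

A larger `grid` keeps fewer pixel points and reduces computation; a smaller `grid` keeps more pixel points and improves accuracy.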
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure; as shown in fig. 2, this embodiment describes the technical solution of the present disclosure in more detail based on the technical solution of the embodiment shown in fig. 1. As shown in fig. 2, the method for detecting a living human face in this embodiment may specifically include the following steps:
s201, obtaining a face image to be detected;
s202, cutting a face region image from the face image;
s203, acquiring the probability that a plurality of pixel points in the face area image belong to the living body face based on the pre-trained face living body detection model and the face area image, and acquiring the probability that the plurality of pixel points in the face image belong to the living body face;
In practical applications, the acquired face image to be detected may include not only the face but also part of the background. To improve detection efficiency, in this embodiment a face region image may be cut out from the face image, preventing excessive background information from degrading the accuracy and detection efficiency of face living body detection.
Specifically, a pre-trained face region detection model may be adopted to detect the region where the face is located in the face image; then, based on the detected region, the face region image is cut out from the face image.
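A minimal sketch of this cropping step; `detect_face_region` stands in for any pre-trained face region detection model, and its (left, top, right, bottom) output format is an assumption:

```python
def crop_face_region(image, detect_face_region):
    """Cut the face region out of the full image, removing the background.
    `detect_face_region` is a placeholder for a pre-trained detector that
    is assumed to return pixel coordinates (left, top, right, bottom)."""
    left, top, right, bottom = detect_face_region(image)
    return image[top:bottom, left:right]
```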
And then inputting the face region image into a face living body detection model, wherein the face living body detection model can predict and output the probability that a plurality of pixel points in the face region image belong to a living body face. Similarly, the human face living body detection model can predict the probability that each pixel point in the human face region image belongs to the living body human face. Or the face region image can be virtually divided into a plurality of grids, and the probability that one pixel point in each grid belongs to the living body face is predicted.
In order to improve accuracy, preferably, in this embodiment the face living body detection model predicts the probability that every pixel point in the face region image belongs to a living body face.
Steps S202-S203 are one implementation of step S102 of the embodiment shown in fig. 1 described above.
S204, calculating the probability that the face image is the living body face based on the probability that a plurality of pixel points in the face image belong to the living body face;
specifically, various mathematical calculation methods can be adopted to calculate the probability that the face image is the living body face based on the probability that a plurality of pixel points in the face image belong to the living body face. For example, the simplest method may be adopted to directly take the average value of the probabilities of the plurality of pixel points in the face image belonging to the living body face as the probability that the face image is the living body face. Or may also be calculated by using other mathematical calculation methods, which are not described in detail herein.
The face living body detection model of this embodiment may adopt a U-Net network structure, and the plurality of pixel points may include every pixel point in the face region image. In use, the face region image is input into the face living body detection model, which outputs a predicted image with values in the range 0-1; the value of each pixel point in the predicted image represents the predicted probability that the pixel point belongs to a living body face. The predicted probabilities of all pixel points in the predicted image are then averaged to obtain a prediction score, which measures the probability that the face image to be detected is a living body face.
S205, detecting whether the face image is the living body face or not based on the probability that the face image is the living body face and a preset probability threshold value.
For example, if the probability that the face image is a living face is greater than or equal to a preset probability threshold, determining that the face image is a living face; if the probability that the face image is the living body face is smaller than the preset probability threshold value, the face image is determined to be an attack, and whether the face image is the living body face or the attack can be accurately identified.
The preset probability threshold may be set based on experience, and may be, for example, 0.5, 0.6, 0.7, or other values, which are not limited herein.
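Steps S204-S205 then reduce to a few lines. The sketch below uses the simple averaging strategy described above; the default threshold of 0.5 is an assumption:

```python
import numpy as np

def is_live_face(prob_map: np.ndarray, threshold: float = 0.5) -> bool:
    """Average the per-pixel living-body probabilities into one score and
    compare it with the preset threshold; False means the face image is
    treated as an attack."""
    return float(np.mean(prob_map)) >= threshold
```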
According to the face living body detection method of this embodiment, a face region image is cut out from the face image, and the probability that a plurality of pixel points in the face region image belong to a living body face is obtained based on the pre-trained face living body detection model, serving as the probability that a plurality of pixel points in the face image belong to a living body face; this further improves the accuracy and detection efficiency of face living body detection. Meanwhile, the embodiment focuses on pixel-level information in the face image and recognizes finer-grained information, so the accuracy of face living body detection can be effectively improved and the detection efficiency further improved.
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure; as shown in fig. 3, the present embodiment provides a training method for a face living body detection model, which can be applied to a training device for a face living body detection model, and specifically includes the following steps:
s301, collecting a plurality of groups of training data, wherein each group of training data comprises a living body face image and an attack face image;
In this embodiment, the living body face image and the attack face image in each group of training data may be any two images of the respective types; they are not required to be images of the same person or the same scene.
S302, generating corresponding spliced images based on the living body face images and the attack face images in each group of training data;
that is, the generated spliced image is partially a living face image and partially an attack face image.
S303, marking the real probability that a plurality of pixel points in the spliced image corresponding to each group of training data belong to the living human face;
When generating a spliced image according to step S302, the source of each portion of the spliced image, i.e., whether it comes from the living body face image or the attack face image, may be recorded. For a portion sourced from the living body face image, the real probability that the corresponding pixel points belong to the living body face is 1; for a portion sourced from the attack face image, it is 0.
The splicing in step S302 may be performed in any manner. For example, the two images may be adjusted to the same size, and some image blocks cut from the attack face image may directly replace the image blocks in the corresponding regions of the living body face image to form a spliced image. Alternatively, one or more image blocks may be cut from arbitrary positions of the living body face image and replaced with randomly cut image blocks of the same sizes from the attack face image. Other similar manners may also be adopted to obtain a spliced image based on the living body face image and the attack face image, which is not limited herein.
Whichever manner is used to generate the spliced image, it is clearly known which image blocks come from the living body face image and which come from the attack face image, so the real probability that the pixel points in each image block belong to the living body face can be labeled according to the source of each image block.
It should be noted that, in an embodiment of the present disclosure, the real probability of every pixel point in the spliced image corresponding to each group of training data may be labeled; or only the real probabilities of representative pixel points may be labeled. For example, only the real probabilities that pixel points at a plurality of preset positions in the spliced image belong to the living body face may be labeled. In practical application, a plurality of pixel points may also be selected from the spliced image in other manners and labeled with the real probability of belonging to the living body face.
S304, training the face living body detection model based on the spliced images corresponding to the training data and the real probability that a plurality of pixel points in the spliced images belong to the living body face.
This differs from the prior-art two-classification face living body detection model. The prior-art model focuses on the image as a whole: during training, a plurality of training samples are adopted, each comprising a face image or an attack image together with a living body face or attack label. In this embodiment, the face living body detection model is trained with the spliced image generated from the living body face image and the attack face image of each group of training data, together with the real probabilities that a plurality of pixel points in the spliced image belong to the living body face, so the trained model pays more attention to detailed features in the spliced image, improving its detection accuracy.
By adopting the training method, the trained face living body detection model can pay more attention to the fine-grained characteristics in the spliced image, the detection accuracy of the face living body detection model is improved, and the detection efficiency is further improved.
The face living body detection model in the prior art only focuses on the whole image and lacks recognition of finer-grained feature information, which can cause the model to overfit; its generalization is then poor, its detection performance on unknown attack samples and attack modes degrades, and practical application performance suffers. The training method of the face living body detection model of this embodiment attends to finer-grained feature information, which can avoid overfitting, improve the generalization of the model, achieve good detection performance on unknown attack samples and attack modes, and offer very strong practical applicability.
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure; as shown in fig. 4, the embodiment provides a training method for a face live detection model, which specifically includes the following steps:
s401, collecting a plurality of groups of training data, wherein each group of training data comprises a living body face image and an attack face image;
s402, respectively dividing the living body face image and the attack face image in each group of training data into N × N image blocks;
s403, mixing the living body face image and the N x N image blocks of the attack face image;
s404, extracting N image blocks from the mixed 2X N image blocks, and splicing the N image blocks into a spliced image;
In this embodiment, the living body face image and the attack face image are each divided into N×N image blocks; the value of N may be set according to actual requirements and experience. For images of small size, N may be slightly smaller, e.g., set to 2, 3 or 4; for larger images, N may be set slightly larger, e.g., 10, 20 or another value, which is not limited herein.
In addition, this embodiment assumes by default that the living body face image and the attack face image in the training data have the same size. In practical application, if the sizes are inconsistent, they can be adjusted to be consistent.
In this embodiment, the N × N image blocks of the live face image and the N × N image blocks of the attack face image are mixed to obtain 2 × N image blocks, and then the N × N image blocks may be randomly extracted from the 2 × N image blocks to be spliced into a spliced image. That is to say, the number of the image blocks of the living body face image and the number of the image blocks of the attack face image in the spliced image are random and may be as many, the former may be larger than the latter, and the latter may be larger than the former.
Steps S402-S404 are an implementation of step S302 of the embodiment shown in fig. 3.
In order to improve the detection accuracy of the face living body detection model, the spliced image corresponding to each group of training data may also be generated with the following steps:
(1) for each group of training data, cutting out a first face region image and a second face region image from the living body face image and the attack face image in the training data respectively;
(2) dividing the first face region image and the second face region image into N×N image blocks respectively;
(3) mixing the N×N image blocks of the first face region image with the N×N image blocks of the second face region image;
(4) extracting N×N image blocks from the mixed 2×N×N image blocks and splicing them into a spliced image.
Different from steps S402-S404 above, steps (1)-(4) first cut out the first face region image and the second face region image from the living body face image and the attack face image in the training data respectively. The specific implementation is the same as step S202 of the face living body detection method embodiment: a pre-trained face region detection model may be adopted to detect the region where the face is located in the living body face image and the attack face image respectively, so as to remove the background, increase the proportion of the face region in the first and second face region images, and further improve the accuracy of the trained face living body detection model.
Then, the implementation of steps (2)-(4) is the same as that of steps S402-S404 above.
Either of the two processes above can accurately generate the spliced image, preparing for the subsequent training of the face living body detection model and further improving the accuracy of the trained model.
S405, based on the source of each image block in the spliced image corresponding to each group of training data, marking the real probability that the pixel points in each image block belong to the living body face to obtain the real probability that a plurality of pixel points in the spliced image belong to the living body face;
the source of each image block in the stitched image means that the image block is from a live face image or from an attack image.
That is, in the above-described process of generating the stitched image, the source of each image block is preserved. And then, based on the source of each image block, the real probability that the pixel points in each image block belong to the living human face can be labeled. For example, if an image block is derived from a living human face image, the true probability that each pixel in the image block belongs to a living human face is 1. If a certain image block is derived from an attack face image, the true probability that each pixel point in the image block belongs to a living face is 0.
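Continuing the sketch above, the recorded block sources can be expanded into a per-pixel label image; this is an illustrative assumption about the labeling format, not code from the disclosure:

```python
import numpy as np

def mask_to_label_image(mask: np.ndarray, bh: int, bw: int) -> np.ndarray:
    """Expand an n x n block-source mask into a per-pixel 0/1 label image:
    every pixel point of a block from the living body face image gets real
    probability 1, every pixel point of an attack block gets 0."""
    return np.kron(mask.astype(np.float32), np.ones((bh, bw), np.float32))
```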
In an embodiment of the present disclosure, the true probability that each pixel point in each image block belongs to a living human face may be labeled based on the source of each image block in the stitched image corresponding to each set of training data. The real probability that a preset number of pixel points belong to the living body face at the preset position of each image block can be labeled, and the real probability that each pixel point belongs to the living body face does not need to be labeled.
S406, selecting the spliced image of a group of training data and the real probabilities that a plurality of pixel points in the spliced image belong to the living body face;
in this embodiment, a group of training data is selected for one training. In practical application, multiple sets of training data can be selected for one training, and the principle is the same, which is not described herein again.
S407, predicting the probability that a plurality of pixel points in the spliced image belong to the living body face based on the face living body detection model;
s408, constructing a loss function of the spliced image based on the prediction probability and the labeled real probability that a plurality of pixel points belong to the living body face;
s409, detecting whether the loss function is converged, and if not, executing a step S410; if yes, go to step S411;
s410, adjusting parameters of a human face living body detection model; returning to step S406 to select the relevant data of the next piece of training data to continue training.
S411, detecting whether a training termination condition is met; if so, terminating the training, determining parameters of the face living body detection model, and further determining the face living body detection model; if not, the procedure returns to step S406 to select the related data of the next piece of training data to continue training.
The training termination condition of this embodiment may be that the loss function is always converged during the training for a preset number of consecutive times. Wherein the consecutive preset times can be 80 times, 100 times or other consecutive times. Or the training termination condition may also be that the training number reaches a preset number threshold, for example, the preset number threshold may be one million, eighty thousand, or other numbers set according to actual experience, which is not limited herein.
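One training iteration (steps S406-S410) might be sketched as follows with PyTorch; the model, optimizer, tensor shapes, and the use of the built-in smooth L1 loss are assumptions:

```python
import torch.nn.functional as F

def train_step(model, optimizer, spliced, label_image):
    """Predict the per-pixel living-body probability map for a spliced
    image, measure it against the 0/1 label image with a smooth L1 loss,
    and update the model parameters."""
    optimizer.zero_grad()
    pred = model(spliced)  # sigmoid output, same shape as label_image
    loss = F.smooth_l1_loss(pred, label_image, reduction="sum")
    loss.backward()
    optimizer.step()
    return loss.item()
```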
It should be noted that the face living body detection process of the embodiment shown in FIG. 2 must be consistent with the training process of the face living body detection model of this embodiment. First, if the face region images are cut out with step (1) during training, then the face region image is likewise cut out from the face image to be detected with step S202 in use. Second, the pixel points whose probabilities are predicted in use must follow the same principle as the pixel points whose real probabilities are labeled during training. That is, if the real probability of every pixel point in the image blocks is labeled during training, the model learns to predict the probability of every pixel point, and in use it predicts the probability that every pixel point in the input image belongs to a living body face. If only a preset number of pixel points at preset positions of the image blocks are labeled during training, the model learns to predict probabilities at those preset positions, and in use it predicts the probabilities that the pixel points at the preset positions of the input image belong to a living body face.
Preferably, in order to improve accuracy, in this embodiment the plurality of pixel points includes every pixel point in the image, both during training and in use. The face living body detection model of this embodiment adopts a U-Net network structure, which can fully retain the spatial feature information of the image and outputs a single-channel image of the same size as the original image; a Sigmoid activation function is then applied to the output single-channel image to obtain a map with values in the range 0-1. The value of each pixel point in the output map represents the predicted probability that the pixel point belongs to a living body face.
Alternatively, during training and use, the plurality of pixel points may include only some pixel points at preset positions in the image. A label image may still be used during training, and a predicted image may still carry the predicted probabilities in use. In this case, only the pixel points at the preset positions in the label image and the predicted image carry values; pixel points at other positions may be null, e.g., marked with "-", indicating no value.
In the training process, for convenience of regression supervision, the supervision label signal may be a 0/1 label image built from the real probability that each pixel point of the spliced image belongs to the living body face: in the label image, the positions of pixel points in image blocks from the living body face image are 1, and the positions of pixel points in image blocks from the attack face image are 0.
In this embodiment, the loss function may adopt a smooth L1 loss function, and the specific formula may be as follows:
Loss = Σ_(all pixel points) SmoothL1(f(x) - y)
where x is the input spliced image; f(x) is the prediction result of the face living body detection model, which may take the form of a predicted image containing the predicted probability that each pixel point belongs to the living body face; and y is the label image corresponding to the spliced image, containing the real probability that each pixel point belongs to the living body face.
To accurately calculate the Loss, the smooth L1 function takes the standard piecewise form:

SmoothL1(z) = 0.5·z², if |z| < 1; |z| - 0.5, otherwise.
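The loss can also be written out directly; the sketch below mirrors the summation over all pixel points and the piecewise definition above (function names are illustrative):

```python
import torch

def smooth_l1(z: torch.Tensor) -> torch.Tensor:
    """Element-wise smooth L1 matching the formula above:
    0.5 * z**2 where |z| < 1, and |z| - 0.5 elsewhere."""
    a = z.abs()
    return torch.where(a < 1, 0.5 * z ** 2, a - 0.5)

def liveness_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Loss = sum over all pixel points of SmoothL1(f(x) - y)."""
    return smooth_l1(pred - label).sum()
```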
According to the training method of the face living body detection model of this embodiment, the spliced image corresponding to each group of training data is generated by mixing image blocks, and the real probabilities that a plurality of pixel points in the spliced image belong to the living body face are labeled according to the recorded sources of the image blocks, so that subsequent training can be supervised; the face living body detection model is then trained with the spliced images of each group of training data and these real probabilities. Because the spliced image strongly disrupts the global structure of the original images, overfitting of the face living body detection model can be avoided to a certain extent; and because image blocks of the original images are preserved in the spliced image, the model can pay more attention to local fine-grained features.
According to the training method of the face living body detection model of this embodiment, training based on spliced images effectively disrupts the features of the original images and makes the face living body detection model pay more attention to local fine-grained features, which can effectively improve the generalization of the model and has strong practicability.
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure; as shown in fig. 5, the present embodiment provides a living human face detection apparatus 500, including:
an image obtaining module 501, configured to obtain a face image to be detected;
a probability obtaining module 502, configured to obtain, based on a human face living body detection model trained in advance and a human face image, a probability that a plurality of pixel points in the human face image belong to a living body human face;
the detecting module 503 is configured to detect whether the face image is a live face based on the probability that the plurality of pixel points in the face image belong to the live face.
The face living body detection apparatus 500 of this embodiment uses the above modules to implement face living body detection; the implementation principle and technical effect are the same as those of the related method embodiments above, which may be referred to for details and are not repeated herein.
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure; as shown in fig. 6, the present embodiment provides a living human face detection apparatus 600, which includes an image acquisition module 601, a probability acquisition module 602, and a detection module 603 having the same functions as those of fig. 5.
As shown in fig. 6, in the living human face detection apparatus 600 of the present embodiment, the detection module 603 includes:
a calculating unit 6031 configured to calculate a probability that the face image is a live face based on a probability that a plurality of pixel points in the face image belong to the live face;
a detecting unit 6032, configured to detect whether the face image is a live face based on the probability that the face image is a live face and a preset probability threshold.
Further, in an embodiment of the present disclosure, the detecting unit 6032 is configured to:
if the probability that the face image is the living body face is larger than or equal to a preset probability threshold value, determining that the face image is the living body face;
and if the probability that the face image is the living body face is smaller than a preset probability threshold value, determining that the face image is an attack.
In an embodiment of the present disclosure, the probability obtaining module 602 is configured to:
cutting a face region image from the face image;
based on a human face living body detection model and a human face area image which are trained in advance, the probability that a plurality of pixel points in the human face area image belong to a living body human face is obtained, and the probability that the plurality of pixel points in the human face image belong to the living body human face is obtained.
The face living body detection apparatus 600 of this embodiment likewise uses the above modules to implement face living body detection; the implementation principle and technical effect are the same as those of the related method embodiments, which may be referred to for details and are not repeated herein.
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure; as shown in fig. 7, the present embodiment provides a training apparatus 700 for a face live detection model, which includes:
an acquisition module 701, configured to acquire multiple sets of training data, where each set of training data includes a living body face image and an attack face image;
a generating module 702, configured to generate a corresponding stitched image based on the living body face image and the attack face image in each set of training data;
the labeling module 703 is configured to label a true probability that a plurality of pixel points in the stitched image corresponding to each set of training data belong to a living human face;
and the training module 704 is configured to train a face living body detection model based on the stitched images corresponding to the sets of training data and the true probabilities that a plurality of pixel points in the stitched images belong to a living body face.
The training apparatus 700 of the face living body detection model of this embodiment uses the above modules to implement the training of the face living body detection model; the implementation principle and technical effect are the same as those of the related method embodiments, which may be referred to for details and are not repeated herein.
Further, in an embodiment of the present disclosure, the generating module 702 is configured to:
respectively dividing the living body face image and the attack face image in each group of training data into N×N image blocks;
mixing the N×N image blocks of the living body face image with the N×N image blocks of the attack face image;
and extracting N×N image blocks from the mixed 2×N×N image blocks and splicing them into a spliced image.
Further, in an embodiment of the present disclosure, the labeling module 703 is configured to:
and marking the real probability that the pixel points in each image block belong to the living body face based on the source of each image block in the spliced image corresponding to each group of training data to obtain the real probability that a plurality of pixel points in the spliced image belong to the living body face.
Further, in an embodiment of the present disclosure, the generating module 702 is configured to:
for each group of training data, respectively cutting out a first face region image and a second face region image from a living body face image and an attack face image in the training data;
dividing the first face region image and the second face region image into N×N image blocks respectively;
mixing the N×N image blocks of the first face region image with the N×N image blocks of the second face region image;
and extracting N×N image blocks from the mixed 2×N×N image blocks and splicing them into a spliced image.
Further, in an embodiment of the present disclosure, the training module 704 is configured to:
acquiring the prediction probability that a plurality of pixel points in the spliced image predicted by the face living body detection model belong to the living body face based on the spliced image corresponding to each group of training data;
constructing a loss function of the spliced image based on the prediction probability and the marked real probability that a plurality of pixel points belong to the living body face;
and if the loss function is not converged, adjusting parameters of the human face living body detection model.
The training apparatus 700 of this embodiment uses the above modules to implement the training of the face living body detection model; the implementation principle and technical effect are the same as those of the related method embodiments, which may be referred to for details and are not repeated herein.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the various methods and processes described above, such as the methods of the present disclosure. For example, in some embodiments, the methods of the present disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the methods of the present disclosure described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods of the present disclosure.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited in this respect.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principles of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A face liveness detection method, comprising:
acquiring a face image to be detected;
acquiring, based on a pre-trained face liveness detection model and the face image, probabilities that a plurality of pixel points in the face image belong to a live face;
and detecting, based on the probabilities that the plurality of pixel points in the face image belong to a live face, whether the face image is a live face.
2. The method of claim 1, wherein detecting whether the face image is a live face based on the probabilities that the plurality of pixel points in the face image belong to a live face comprises:
calculating a probability that the face image is a live face based on the probabilities that the plurality of pixel points in the face image belong to a live face;
and detecting whether the face image is a live face based on the probability that the face image is a live face and a preset probability threshold.
3. The method of claim 2, wherein detecting whether the face image is a live face based on the probability that the face image is a live face and the preset probability threshold comprises:
if the probability that the face image is a live face is greater than or equal to the preset probability threshold, determining that the face image is a live face;
and if the probability that the face image is a live face is smaller than the preset probability threshold, determining that the face image is an attack.
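For illustration only, and not part of the claimed subject matter: a minimal Python sketch of the aggregation and thresholding described in claims 2 and 3. The mean-pooling aggregation and the threshold value of 0.5 are assumptions for this sketch; the claims do not fix a particular aggregation function or threshold value here.

```python
import numpy as np

def detect_live_face(pixel_probs: np.ndarray, threshold: float = 0.5) -> bool:
    """pixel_probs: an H x W map of per-pixel live-face probabilities.

    Returns True if the image is judged a live face, False if an attack.
    """
    image_prob = float(pixel_probs.mean())  # image-level probability (assumed: mean pooling)
    return image_prob >= threshold          # compare against the preset probability threshold
```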
4. The method according to any one of claims 1-3, wherein acquiring, based on the pre-trained face liveness detection model and the face image, the probabilities that the plurality of pixel points in the face image belong to a live face comprises:
cropping a face region image from the face image;
and acquiring, based on the pre-trained face liveness detection model and the face region image, probabilities that a plurality of pixel points in the face region image belong to a live face, to obtain the probabilities that the plurality of pixel points in the face image belong to a live face.
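A hedged sketch of the cropping step in claim 4: the face region is cut from the full image before the liveness model is applied. The bounding-box source (any off-the-shelf face detector) and the model's call signature are assumptions for illustration.

```python
import numpy as np

def face_region_probs(image: np.ndarray, bbox: tuple, model) -> np.ndarray:
    """Crop the face region and return its per-pixel live-face probabilities."""
    x1, y1, x2, y2 = bbox              # bounding box from an assumed face detector
    face_region = image[y1:y2, x1:x2]  # the cropped face region image
    return model(face_region)          # per-pixel probabilities from the liveness model
```

Restricting the model input to the face region keeps the per-pixel predictions focused on facial texture cues rather than background content.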
5. A training method for a face liveness detection model, comprising:
collecting a plurality of groups of training data, each group of the training data comprising a live face image and an attack face image;
generating, for each group of the training data, a corresponding stitched image based on the live face image and the attack face image in the group;
labeling true probabilities that a plurality of pixel points in the stitched image corresponding to each group of the training data belong to a live face;
and training the face liveness detection model based on the stitched image corresponding to each group of the training data and the true probabilities that the plurality of pixel points in the stitched image belong to a live face.
6. The method of claim 5, wherein generating the corresponding stitched image based on the live face image and the attack face image in each group of the training data comprises:
dividing the live face image and the attack face image in each group of the training data into N×N image blocks respectively;
mixing the N×N image blocks of the live face image with the N×N image blocks of the attack face image;
and extracting N×N image blocks from the mixed 2×N×N image blocks and stitching them into the stitched image.
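A non-authoritative sketch of the stitched-image generation in claim 6, assuming both images share the same resolution and that the N×N blocks are drawn uniformly at random from the pooled 2×N×N blocks; the claim does not specify the sampling rule.

```python
import random
import numpy as np

def make_stitched_image(live: np.ndarray, attack: np.ndarray, n: int):
    """Build a stitched image from blocks of a live image and an attack image.

    Returns the stitched image and an n x n grid of block-source labels
    (1.0 for blocks cut from the live image, 0.0 for attack blocks).
    """
    h, w = live.shape[0] // n, live.shape[1] // n
    blocks = []
    for img, label in ((live, 1.0), (attack, 0.0)):
        for i in range(n):                          # divide each image into N x N blocks
            for j in range(n):
                blocks.append((img[i * h:(i + 1) * h, j * w:(j + 1) * w], label))
    random.shuffle(blocks)                          # mix the 2 x N x N blocks
    chosen = blocks[:n * n]                         # extract N x N blocks
    rows = [np.concatenate([b for b, _ in chosen[i * n:(i + 1) * n]], axis=1)
            for i in range(n)]
    stitched = np.concatenate(rows, axis=0)         # stitch into a single image
    labels = np.array([lab for _, lab in chosen]).reshape(n, n)
    return stitched, labels
```

Claim 8 below applies the same procedure to face region images cropped from the two source images rather than to the full images.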
7. The method of claim 6, wherein labeling the true probabilities that the plurality of pixel points in the stitched image corresponding to each group of the training data belong to a live face comprises:
labeling, based on the source of each image block in the stitched image corresponding to each group of the training data, the true probability that the pixel points in the image block belong to a live face, to obtain the true probabilities that the plurality of pixel points in the stitched image belong to a live face.
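A minimal sketch of the labeling in claim 7: every pixel inherits the true probability of the block it came from (1 for blocks cut from the live image, 0 for attack blocks). The Kronecker-product expansion is an implementation convenience assumed here, not mandated by the claim.

```python
import numpy as np

def pixel_level_labels(block_labels: np.ndarray, block_h: int, block_w: int) -> np.ndarray:
    """Expand an N x N block-source label grid to a per-pixel label map."""
    return np.kron(block_labels, np.ones((block_h, block_w)))
```

Used with the `labels` grid returned by the stitching sketch above, this yields a dense ground-truth map the same size as the stitched image.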
8. The method of claim 5, wherein generating the corresponding stitched image based on the live face image and the attack face image in each group of the training data comprises:
for each group of the training data, cropping a first face region image and a second face region image from the live face image and the attack face image in the training data respectively;
dividing the first face region image and the second face region image into N×N image blocks respectively;
mixing the N×N image blocks of the first face region image with the N×N image blocks of the second face region image;
and extracting N×N image blocks from the mixed 2×N×N image blocks and stitching them into the stitched image.
9. The method according to any one of claims 5-8, wherein training the face liveness detection model based on the stitched image corresponding to each group of the training data and the true probabilities that the plurality of pixel points in the stitched image belong to a live face comprises:
acquiring, based on the stitched image corresponding to each group of the training data, predicted probabilities, output by the face liveness detection model, that the plurality of pixel points in the stitched image belong to a live face;
constructing a loss function for the stitched image based on the predicted probabilities that the plurality of pixel points belong to a live face and the labeled true probabilities;
and if the loss function has not converged, adjusting parameters of the face liveness detection model.
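A sketch of one training step under claim 9, assuming a PyTorch model that outputs per-pixel probabilities (i.e., after a sigmoid) and using per-pixel binary cross-entropy as the loss; the claim itself does not commit to a specific loss function or optimizer.

```python
import torch
import torch.nn.functional as F

def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               stitched: torch.Tensor,
               pixel_labels: torch.Tensor) -> float:
    """One optimization step on a batch of stitched images."""
    pred = model(stitched)                             # predicted per-pixel live-face probabilities
    loss = F.binary_cross_entropy(pred, pixel_labels)  # compare against labeled true probabilities
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                   # adjust the model parameters
    return loss.item()
```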
10. A face liveness detection device, comprising:
an image acquisition module configured to acquire a face image to be detected;
a probability acquisition module configured to acquire, based on a pre-trained face liveness detection model and the face image, probabilities that a plurality of pixel points in the face image belong to a live face;
and a detection module configured to detect, based on the probabilities that the plurality of pixel points in the face image belong to a live face, whether the face image is a live face.
11. The device of claim 10, wherein the detection module comprises:
a calculation unit configured to calculate a probability that the face image is a live face based on the probabilities that the plurality of pixel points in the face image belong to a live face;
and a detection unit configured to detect whether the face image is a live face based on the probability that the face image is a live face and a preset probability threshold.
12. The device of claim 11, wherein the detection unit is configured to:
determine that the face image is a live face if the probability that the face image is a live face is greater than or equal to the preset probability threshold;
and determine that the face image is an attack if the probability that the face image is a live face is smaller than the preset probability threshold.
13. The device of any one of claims 10-12, wherein the probability acquisition module is configured to:
crop a face region image from the face image;
and acquire, based on the pre-trained face liveness detection model and the face region image, probabilities that a plurality of pixel points in the face region image belong to a live face, to obtain the probabilities that the plurality of pixel points in the face image belong to a live face.
14. A training device for a face liveness detection model, comprising:
a collection module configured to collect a plurality of groups of training data, each group of the training data comprising a live face image and an attack face image;
a generation module configured to generate, for each group of the training data, a corresponding stitched image based on the live face image and the attack face image in the group;
a labeling module configured to label true probabilities that a plurality of pixel points in the stitched image corresponding to each group of the training data belong to a live face;
and a training module configured to train the face liveness detection model based on the stitched image corresponding to each group of the training data and the true probabilities that the plurality of pixel points in the stitched image belong to a live face.
15. The device of claim 14, wherein the generation module is configured to:
divide the live face image and the attack face image in each group of the training data into N×N image blocks respectively;
mix the N×N image blocks of the live face image with the N×N image blocks of the attack face image;
and extract N×N image blocks from the mixed 2×N×N image blocks and stitch them into the stitched image.
16. The device of claim 15, wherein the labeling module is configured to:
label, based on the source of each image block in the stitched image corresponding to each group of the training data, the true probability that the pixel points in the image block belong to a live face, to obtain the true probabilities that the plurality of pixel points in the stitched image belong to a live face.
17. The device of claim 14, wherein the generation module is configured to:
for each group of the training data, crop a first face region image and a second face region image from the live face image and the attack face image in the training data respectively;
divide the first face region image and the second face region image into N×N image blocks respectively;
mix the N×N image blocks of the first face region image with the N×N image blocks of the second face region image;
and extract N×N image blocks from the mixed 2×N×N image blocks and stitch them into the stitched image.
18. The device of any one of claims 14-17, wherein the training module is configured to:
acquire, based on the stitched image corresponding to each group of the training data, predicted probabilities, output by the face liveness detection model, that the plurality of pixel points in the stitched image belong to a live face;
construct a loss function for the stitched image based on the predicted probabilities that the plurality of pixel points belong to a live face and the labeled true probabilities;
and adjust parameters of the face liveness detection model if the loss function has not converged.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4 or 5-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4 or 5-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-4 or 5-9.
CN202210178546.0A 2022-02-25 2022-02-25 Face living body detection method, training method, device, equipment and medium of model Pending CN114648814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210178546.0A CN114648814A (en) 2022-02-25 2022-02-25 Face living body detection method, training method, device, equipment and medium of model

Publications (1)

Publication Number Publication Date
CN114648814A true CN114648814A (en) 2022-06-21

Family

ID=81994016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210178546.0A Pending CN114648814A (en) 2022-02-25 2022-02-25 Face living body detection method, training method, device, equipment and medium of model

Country Status (1)

Country Link
CN (1) CN114648814A (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN111767760A (en) * 2019-04-01 2020-10-13 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, and storage medium
TW202038191A (en) * 2019-04-01 2020-10-16 大陸商北京市商湯科技開發有限公司 Method, device and electronic equipment for living detection and storage medium thereof
CN111881720A (en) * 2020-06-09 2020-11-03 山东大学 Data automatic enhancement expansion method, data automatic enhancement identification method and data automatic enhancement expansion system for deep learning
CN111768336A (en) * 2020-07-09 2020-10-13 腾讯科技(深圳)有限公司 Face image processing method and device, computer equipment and storage medium
WO2022011690A1 (en) * 2020-07-17 2022-01-20 深圳高性能医疗器械国家研究院有限公司 Self-supervised learning method and application
CN111738735A (en) * 2020-07-23 2020-10-02 腾讯科技(深圳)有限公司 Image data processing method and device and related equipment
WO2021151313A1 (en) * 2020-07-30 2021-08-05 平安科技(深圳)有限公司 Method and apparatus for document forgery detection, electronic device, and storage medium
CN111931836A (en) * 2020-07-31 2020-11-13 上海商米科技集团股份有限公司 Method and device for acquiring neural network training image
CN112085745A (en) * 2020-09-07 2020-12-15 福建农林大学 Retinal vessel image segmentation method of multi-channel U-shaped full convolution neural network based on balanced sampling splicing
CN112183296A (en) * 2020-09-23 2021-01-05 北京文思海辉金信软件有限公司 Simulated bill image generation and bill image recognition method and device
CN112651311A (en) * 2020-12-15 2021-04-13 展讯通信(天津)有限公司 Face recognition method and related equipment
CN112836625A (en) * 2021-01-29 2021-05-25 汉王科技股份有限公司 Face living body detection method and device and electronic equipment
CN112949496A (en) * 2021-03-04 2021-06-11 深圳市光鉴科技有限公司 Depth camera
CN113112518A (en) * 2021-04-19 2021-07-13 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113343826A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and device
CN113361588A (en) * 2021-06-03 2021-09-07 北京文安智能技术股份有限公司 Image training set generation method and model training method based on image data enhancement
CN113705361A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Method and device for detecting model in living body and electronic equipment
CN113469929A (en) * 2021-09-03 2021-10-01 北京美摄网络科技有限公司 Training data generation method and device, electronic equipment and computer readable storage medium
CN113688947A (en) * 2021-10-11 2021-11-23 国网智能科技股份有限公司 Infrared image fault identification method and system for power distribution equipment
CN113887524A (en) * 2021-11-04 2022-01-04 华北理工大学 Magnetite microscopic image segmentation method based on semantic segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIN YUN; SUN XIAOGANG; JIANG YAOGANG; KANG XIN; XIE ZHIXUAN; ZHONG YONG: "Liveness detection algorithm based on semantic segmentation", Journal of Jilin University (Engineering and Technology Edition), no. 03, 31 December 2020 (2020-12-31), pages 281-287 *
GE TING; MU NING; LI LI: "Brain tumor segmentation algorithm based on softmax regression and graph cuts", Acta Electronica Sinica, no. 03, 15 March 2017 (2017-03-15), pages 135-140 *

Similar Documents

Publication Publication Date Title
CN112801164A (en) Training method, device and equipment of target detection model and storage medium
CN112949710A (en) Image clustering method and device
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN115063875B (en) Model training method, image processing method and device and electronic equipment
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN113947188A (en) Training method of target detection network and vehicle detection method
CN113537192B (en) Image detection method, device, electronic equipment and storage medium
CN113344862A (en) Defect detection method, defect detection device, electronic equipment and storage medium
CN115631381A (en) Classification model training method, image classification device and electronic equipment
CN113191261B (en) Image category identification method and device and electronic equipment
CN115359308A (en) Model training method, apparatus, device, storage medium, and program for identifying difficult cases
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN115861255A (en) Model training method, device, equipment, medium and product for image processing
CN115937950A (en) Multi-angle face data acquisition method, device, equipment and storage medium
CN115861809A (en) Rod detection and training method and device for model thereof, electronic equipment and medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113936158A (en) Label matching method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN115249281A (en) Image occlusion and model training method, device, equipment and storage medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114648814A (en) Face living body detection method, training method, device, equipment and medium of model
CN114037865B (en) Image processing method, apparatus, device, storage medium, and program product
CN114092739B (en) Image processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination