CN111310724A - In-vivo detection method and device based on deep learning, storage medium and equipment - Google Patents

Info

Publication number
CN111310724A
Authority
CN
China
Prior art keywords
photo; terminal; living body; image; sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010170734.XA
Other languages
Chinese (zh)
Inventor
王诗韵
毛晓蛟
章勇
曹李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202010170734.XA
Publication of CN111310724A
Legal status: Pending

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
              • G06N 3/08 - Learning methods
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
              • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/161 - Detection; localisation; normalisation
                • G06V 40/172 - Classification, e.g. identification
            • G06V 40/40 - Spoof detection, e.g. liveness detection
              • G06V 40/45 - Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a living body detection method, apparatus, storage medium and device based on deep learning, belonging to the field of computer technology. The method comprises: acquiring a target image to be subjected to living body detection; obtaining a living body detection model; and inputting the target image into the living body detection model to obtain a living body detection result. The living body detection model comprises a terminal detection model and a photo detection model cascaded with each other: the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data. The terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion. The method solves the problem that living body detection based on motion-information features is inefficient, and improves the efficiency of living body detection.

Description

In-vivo detection method and device based on deep learning, storage medium and equipment
Technical Field
The present application relates to a living body detection method and apparatus based on deep learning, a storage medium and an access control device, and belongs to the field of computer technology.
Background
With the wide application of face recognition technology, automatically and efficiently distinguishing genuine face images from fake ones, so as to resist spoofing attacks and keep face recognition systems secure, has become a problem that face recognition must solve; the technology that addresses it is called living body (liveness) detection. Common spoofing attacks include photo attacks, video attacks, 3D mask attacks and the like. In living body detection, a real face image is captured directly by the camera, whereas a spoofed face image is captured indirectly, for example from a mobile phone video, a paper photo or an ID photo. The two differ in measurable ways, mainly in the texture, depth, motion and spectral information of the image. Different living body detection methods can be designed around these differences to distinguish real face images from spoofed ones.
A traditional living body detection method in face recognition uses motion-information features: through a human-computer interaction system, the user is required to perform actions according to prompts randomly generated by the system, and a subject who completes the requested actions is judged to be a real face.
Although this method can achieve a high recognition rate, it requires close user cooperation, takes a long time to run, and is therefore inefficient.
Disclosure of Invention
The present application provides a living body detection method, apparatus, storage medium and device based on deep learning, which can solve the problem that living body detection based on motion-information features is inefficient. The application provides the following technical solutions:
In a first aspect, a living body detection method based on deep learning is provided, the method including:
acquiring a target image to be subjected to living body detection;
obtaining a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion;
and inputting the target image into the living body detection model to obtain a living body detection result, wherein the living body detection result is used for indicating whether the face in the target image is a real face.
Optionally, acquiring the living body detection model includes:
acquiring the terminal sample data;
performing data expansion on the terminal sample data to obtain expanded terminal sample data; each group of expanded terminal sample data comprises a terminal image and terminal position information in the terminal image;
inputting the terminal image into the first neural network model to obtain a first training result;
determining a first difference between the first training result and the corresponding terminal's location information based on a first loss function;
and iteratively training the model parameters of the first neural network model based on the first difference, stopping when the first difference falls within a first difference range or the number of iterations reaches a first preset number, to obtain the terminal detection model.
Optionally, the first neural network model is an object detection network with a feature enhancement network.
Optionally, acquiring the living body detection model includes:
acquiring the photo sample data;
carrying out face detection on the photo sample data to obtain a face detection result;
performing multi-scale cropping on the face detection result to obtain a multi-scale real face image and a multi-scale photo face image;
performing data expansion on the multi-scale real face image and the multi-scale photo face image to obtain expanded photo sample data; each group of extended photo sample data comprises a photo face image and position information of a photo face in the photo face image, or comprises a real face image and position information of a real face in the real face image;
inputting the real face image and the photo face image into the second neural network model to obtain a second training result;
determining a second difference between the second training result and corresponding location information based on a second loss function;
and iteratively training the model parameters of the second neural network model based on the second difference until the second difference falls within a second difference range or the number of iterations reaches a second preset number, to obtain the photo detection model.
Optionally, before performing data expansion on the multi-scale real face image and the multi-scale photo face image to obtain expanded photo sample data, the method further includes:
performing face key point detection and face alignment on the multi-scale real face image and the multi-scale photo face image respectively.
Optionally, the second neural network model includes a convolutional neural network corresponding to each scale in the multi-scale real face image and the multi-scale photo face image, and a plurality of convolutional neural networks are cascaded with each other.
Optionally, the data extension mode includes: random flipping, cropping, rotation, and/or color change.
In a second aspect, there is provided a living body detecting apparatus based on deep learning, the apparatus including:
the image acquisition module is used for acquiring a target image to be subjected to living body detection;
the model acquisition module is used for acquiring a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion;
and the living body detection module is used for inputting the target image into the living body detection model to obtain a living body detection result, and the living body detection result is used for indicating whether the face in the target image is a real face.
In a third aspect, a deep learning based liveness detection device is provided, the device comprising a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the deep learning-based liveness detection method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a program is stored, the program being loaded and executed by a processor to implement the deep-learning-based living body detection method of the first aspect.
In a fifth aspect, an access control device is provided, where the access control device includes an image acquisition component, an image processing component in communication connection with the image acquisition component, and a memory and a result display component in communication connection with the image processing component;
the image acquisition assembly is used for acquiring a target image;
the memory stores a program; the image processing component is used for loading and executing the program to realize the living body detection method based on deep learning provided by the first aspect;
and the result display component is used for displaying the living body detection result obtained by the image processing component.
The beneficial effects of the present application lie in the following. By acquiring a target image to be subjected to living body detection, obtaining a living body detection model, and inputting the target image into the model to obtain a living body detection result indicating whether the face in the target image is a real face, the problem that living body detection based on motion-information features is inefficient can be solved. The living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data. Because the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models, and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion, the method can effectively resist the terminal-replay attacks and photo attacks that may occur in actual scenes. It improves the accuracy of living body detection while remaining real-time, and performs living body detection without requiring the user to cooperate by executing specified actions, thereby improving detection efficiency.
The foregoing is only an overview of the technical solutions of the present application. To make these solutions clearer and implementable according to the content of the specification, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a deep learning-based in vivo detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of a deep learning-based liveness detection method according to another embodiment of the present application;
FIG. 3 is a block diagram of a deep-learning-based living body detection apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a deep learning-based liveness detection device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below in conjunction with the accompanying drawings and examples. The following examples illustrate the present application but do not limit its scope.
First, several terms referred to in the present application will be described.
Convolutional Neural Network (CNN): a neural network model obtained by adding a more effective feature-learning part to the traditional multilayer neural network; concretely, partially connected convolutional layers and pooling layers are added in front of the original fully connected layers.
From front to back, the convolutional neural network comprises an input layer, two cascaded convolutional layers, an activation layer, a pooling layer, two fully connected layers and an output layer. The convolutional layers extract features by sliding filters over the original image; the activation layer increases the non-linear separation capability; and the pooling layer compresses the data and the number of parameters, reducing overfitting and lowering the complexity of the network.
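To make the layer ordering concrete, the following is a minimal PyTorch sketch of that layout; the channel widths, the 32x32 input size and the two-class output are illustrative assumptions, not values specified by the patent.

```python
# A minimal sketch of the described CNN: two cascaded convolutional layers,
# an activation layer, a pooling layer, then two fully connected layers.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first convolutional layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second (cascaded) convolutional layer
            nn.ReLU(inplace=True),                        # activation layer
            nn.MaxPool2d(2),                              # pooling layer compresses the feature map
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128),  # first fully connected layer (assumes 32x32 input)
            nn.Linear(128, num_classes),   # second fully connected layer / output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # -> shape (1, 2)
```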
Target detection network: used to find objects of interest in an image or video and to detect their positions and sizes. Target detection networks include, but are not limited to: R-CNN, OverFeat, Fast/Faster R-CNN, SSD, the YOLO series, etc.
In the present application, the target detection network is exemplified by the multi-scale Single Shot MultiBox Detector (SSD) network. The SSD network uses the VGG16 network as its base network and performs classification and localization on a pyramid-structured group of feature layers (multi-scale feature layers). In this pyramid, the shallow feature maps are larger and better suited to detecting small targets, while the deep feature maps are smaller and better suited to detecting large targets; this relieves the burden that a single-layer detection algorithm places on the detection network and gives a better detection effect than single-layer algorithms. The traditional SSD algorithm extracts prior boxes from the pyramid feature layers. However, the feature layers at different scales are independent of one another and lack feature complementarity, which limits the performance of the target detection algorithm. For example, deep features often provide rich context information that aids the detection of small target objects, whereas the shallow features of the conventional SSD algorithm lack sufficient context, so it is less effective at detecting small target objects.
In the present application, a feature enhancement network is added to the target detection network. Illustratively, adding a feature enhancement network to the SSD network yields the feature-enhanced SSD (FE-SSD) algorithm. The FE-SSD network is constructed as follows: in the feature-layer group of the SSD pyramid structure, a convolution operation is applied to the feature layer at each scale, from bottom to top, to extract more abstract semantic features; each convolved feature layer is then fused with the corresponding pre-convolution feature layer to obtain a new group of pyramid feature layers; classification and localization of targets are finally performed on the new pyramid feature layers.
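The following is a hedged sketch of that enhancement step: each pyramid map is convolved and the result is fused with the original map to form a new pyramid. The channel counts and the choice of element-wise addition as the fusion operator are assumptions for illustration; the patent does not fix them.

```python
# Sketch of feature enhancement over a pyramid of SSD-style feature maps.
import torch
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    def __init__(self, channels_per_level):
        super().__init__()
        # One extra convolution per pyramid level to extract more abstract semantics.
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, kernel_size=3, padding=1) for c in channels_per_level
        )

    def forward(self, pyramid):
        # Fuse each convolved map with its pre-convolution counterpart (here: addition).
        return [feat + conv(feat) for feat, conv in zip(pyramid, self.convs)]

# Example: a 3-level pyramid such as an SSD backbone might produce.
pyramid = [torch.randn(1, 256, 38, 38),
           torch.randn(1, 512, 19, 19),
           torch.randn(1, 256, 10, 10)]
new_pyramid = FeatureEnhancement([256, 512, 256])(pyramid)
```

Classification and localization heads would then run on `new_pyramid` exactly as on the original SSD feature layers.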
Optionally, the execution subject of each embodiment is an electronic device such as a terminal or a server, where the terminal may be a device with image processing capability such as a mobile phone, a computer, a tablet computer or a video conference terminal; the embodiments do not limit the type of the terminal.
The application scenario of the in-vivo detection method based on deep learning provided by the application includes but is not limited to at least one of the following:
1. Living body detection on access control equipment: the access control equipment (which includes an image acquisition assembly) captures the target in front of it and then performs living body detection on the captured image.
2. Living body detection for face payment or login: the camera in the mobile terminal captures the target in front of it, and living body detection is then performed on the captured image.
Fig. 1 is a flowchart of a deep learning-based in-vivo detection method according to an embodiment of the present application. The method at least comprises the following steps:
step 101, acquiring a target image to be subjected to living body detection.
The target image may be captured by a shooting component, for example a mobile phone camera or the camera of an attendance device; it may be sent by another device; or it may be read from a storage medium. This embodiment does not limit the source of the target image.
Step 102, acquiring a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion.
Optionally, the terminal detection model may be located before the photo detection model, or after it; this embodiment does not limit the cascade order of the terminal detection model and the photo detection model.
Acquiring the living body detection model includes acquiring the terminal detection model and the photo detection model. Both models may be stored in a storage medium and obtained by reading, or obtained by real-time training.
The training of the terminal detection model and of the photo detection model is introduced below.
Firstly, the training of the terminal detection model at least comprises the following 5 steps:
step 1, obtaining terminal sample data.
The terminal sample data comprises terminal images which are acquired in a natural scene and have different angles, different background information and/or different terminal models. The terminal sample data also includes information on the actual position of the terminal in each terminal image.
Optionally, the position of the terminal is indicated by a rectangular box. In this case, the position information of the terminal may comprise the coordinates of the top-left and bottom-right vertices of the box, or alternatively the coordinates of one vertex of the box together with the box's length and width.
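The two box encodings are interchangeable; a small illustrative helper (not from the patent), assuming (x1, y1) is the top-left vertex and (x2, y2) the bottom-right:

```python
def corners_to_xywh(x1, y1, x2, y2):
    """(top-left, bottom-right) -> (top-left vertex, width, height)."""
    return x1, y1, x2 - x1, y2 - y1

def xywh_to_corners(x, y, w, h):
    """(top-left vertex, width, height) -> (top-left, bottom-right)."""
    return x, y, x + w, y + h
```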
Step 2, performing data expansion on terminal sample data to obtain expanded terminal sample data; and each group of expanded terminal sample data comprises a terminal image and the position information of the terminal in the terminal image.
Optionally, the data extension mode includes: random flipping, cropping, rotation, and/or color change.
In this embodiment, performing data expansion on the terminal sample data increases the amount of data available for training the terminal detection model and thus improves training accuracy.
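A minimal sketch of the listed expansion modes (random flipping, cropping, rotation, color change) using torchvision; the magnitudes below are assumptions, since the patent does not specify them.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                               # random flipping
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),                  # random cropping
    transforms.RandomRotation(degrees=15),                                # random rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # color change
])
```

Note that when the geometric transforms (flip, crop, rotation) are applied to terminal images, the annotated terminal position information must be transformed accordingly.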
And 3, inputting the terminal image into the first neural network model to obtain a first training result.
Optionally, the first neural network model is a target detection network with a feature enhancement network. In this embodiment, adding a feature enhancement network to an existing target detection network improves the terminal detection model's ability to detect terminals.
And 4, determining a first difference between the first training result and the position information of the corresponding terminal based on the first loss function.
Optionally, the first loss function is a classification loss function or a regression loss function.
In one example, the classification loss function is the focal loss, which for the binary case can be written as:

$$\mathrm{FL}(y', y) = -\,y\,(1 - y')^{\gamma}\log(y') \;-\; (1 - y)\,(y')^{\gamma}\log(1 - y')$$

where y' is the probability value predicted by the first neural network model as the first training result; y is the ground-truth result, i.e. whether a terminal is present, determined from the position information; and γ is a constant whose value may be 0.25 (other values are of course possible, and this embodiment does not limit the value of γ).
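For reference, a short PyTorch sketch of this binary focal loss; it is a minimal illustration rather than code from the patent, and the default gamma of 0.25 simply mirrors the example value above (2.0 is the more common choice in the focal-loss literature).

```python
import torch

def focal_loss(y_pred: torch.Tensor, y_true: torch.Tensor, gamma: float = 0.25) -> torch.Tensor:
    """y_pred: predicted probabilities in (0, 1); y_true: 0/1 ground-truth labels."""
    eps = 1e-7
    y_pred = y_pred.clamp(eps, 1 - eps)  # numerical stability near 0 and 1
    loss = -(y_true * (1 - y_pred) ** gamma * torch.log(y_pred)
             + (1 - y_true) * y_pred ** gamma * torch.log(1 - y_pred))
    return loss.mean()
```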
In another example, the regression loss function is the smooth-L1 localization loss used by SSD-style detectors:

$$L_{loc}(x, l, \hat{g}) = \sum_{i \in Pos} \sum_{m \in \{cx,\,cy,\,w,\,h\}} x_{ij}\,\mathrm{smooth}_{L1}\!\left(l_i^{m} - \hat{g}_j^{m}\right)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where x_{ij} is an indicator function representing the label value of the terminal image (i.e. whether the i-th predicted box matches the j-th annotated terminal position), l_i^m is the predicted offset for the terminal image, and ĝ_j^m is the true offset of the terminal image.
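A minimal sketch of the per-element smooth-L1 term, assuming matched (predicted, ground-truth) offset pairs are already available; PyTorch's built-in torch.nn.functional.smooth_l1_loss computes the same per-element quantity.

```python
import torch

def smooth_l1(x: torch.Tensor) -> torch.Tensor:
    absx = x.abs()
    return torch.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

# Regression loss over matched offset pairs (illustrative random data):
pred_offsets = torch.randn(8, 4)   # l in the formula: (cx, cy, w, h) offsets per box
true_offsets = torch.randn(8, 4)   # g-hat in the formula
loss = smooth_l1(pred_offsets - true_offsets).sum(dim=1).mean()
```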
And 5, carrying out iterative training on the model parameters of the first neural network model based on the first difference, and stopping until the first difference reaches a first difference range or the iteration times reaches a first preset time, so as to obtain a terminal detection model.
Secondly, the training of the photo detection model at least comprises the following 7 steps:
step 1, acquiring photo sample data.
The photo sample data comprises real face images and photo face images which are acquired in a natural scene and have different angles, different background information and/or different occlusion degrees.
Here, a real face image is an image obtained by photographing a living human face with an image acquisition device (such as a camera or a mobile phone); a photo face image is an image obtained by photographing a photo that contains a human face (such as a photo of a real face) with the image acquisition device.
And 2, carrying out face detection on the photo sample data to obtain a face detection result.
Face detection is performed on the photo sample data using a face detection algorithm to obtain a face detection result. Face detection algorithms include, but are not limited to: the Deformable Parts Model (DPM), Cascade CNN, and so on; this embodiment does not limit the type of the face detection algorithm.
Optionally, the face detection result includes whether a face is present.
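As a hedged stand-in for the detectors named above (the patent does not prescribe a specific implementation), OpenCV's bundled Haar cascade illustrates this step; the file name "sample.jpg" is a placeholder.

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade as an example detector.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("sample.jpg")                 # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# `faces` is an array of (x, y, w, h) boxes, one per detected face.
```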
And 3, carrying out multi-scale cutting on the face detection result to obtain a multi-scale real face image and a multi-scale photo face image.
Optionally, the face detection result is cropped at 3 scales, obtained by expanding the detected face region by different pixel margins. The scales of the multi-scale real face image correspond one-to-one to the scales of the multi-scale photo face image; in other words, the multi-scale real face image and the multi-scale photo face image use the same set of scales.
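A sketch of this multi-scale cropping: the detected face box is expanded by three margins and each expansion is cropped out, yielding one image per scale. The expansion factors below are illustrative assumptions; the patent does not state the margins.

```python
import numpy as np

def multi_scale_crops(image: np.ndarray, box, scales=(1.0, 1.2, 1.4)):
    """box = (x, y, w, h); returns one crop per expansion scale, clipped to the image."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # box center stays fixed across scales
    crops = []
    for s in scales:
        half_w, half_h = w * s / 2, h * s / 2
        x1 = max(int(cx - half_w), 0)
        y1 = max(int(cy - half_h), 0)
        x2 = min(int(cx + half_w), image.shape[1])
        y2 = min(int(cy + half_h), image.shape[0])
        crops.append(image[y1:y2, x1:x2])
    return crops
```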
Optionally, after obtaining the multi-scale real face image and the multi-scale photo face image, respectively performing face key point detection and face alignment on the multi-scale real face image and the multi-scale photo face image.
Face key point detection comprises: on the basis of face detection, automatically locating the key feature points of the face, such as the eyes, the nose tip, the mouth corners, the eyebrows and the contour points of each facial part, from the input face image. The input is a face image, and the output is a set of facial feature points.
Face alignment refers to the process of locating predefined facial points (together also called the face shape) in a face image. Face alignment mainly detects the eyes, nose, mouth and chin in the face and marks them with feature points.
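A sketch of landmark-based alignment: a similarity transform maps the detected eye, nose and mouth points onto a canonical template. The template coordinates below are illustrative assumptions for a 112x112 crop, not values from the patent.

```python
import cv2
import numpy as np

# Canonical 5-point template (left eye, right eye, nose tip, mouth corners)
# for a 112x112 aligned crop; the coordinates are illustrative.
TEMPLATE = np.float32([[38.3, 51.7], [73.5, 51.5],
                       [56.0, 71.7],
                       [41.5, 92.4], [70.7, 92.2]])

def align_face(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """landmarks: 5x2 float array of detected key points in the source image."""
    matrix, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), TEMPLATE)
    return cv2.warpAffine(image, matrix, (112, 112))
```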
Step 4, performing data expansion on the multi-scale real face image and the multi-scale photo face image to obtain expanded photo sample data; each group of extended photo sample data comprises a photo face image and position information of a photo face in the photo face image, or comprises a real face image and position information of a real face in the real face image.
Optionally, the data extension mode includes: random flipping, cropping, rotation, and/or color change.
And 5, inputting the real face image and the photo face image into a second neural network model to obtain a second training result.
The second neural network model comprises one convolutional neural network for each scale of the multi-scale real face image and the multi-scale photo face image, and the convolutional neural networks are cascaded with one another.
Taking the case where the real face image and the photo face image each comprise 3 scales as an example, the images at each scale correspond to one convolutional neural network, and the 3 convolutional neural networks are cascaded with one another, as sketched below.
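A hedged sketch of this per-scale arrangement: one small CNN per crop scale, with the per-scale scores fused at the end. The patent does not spell out how the three networks are combined, so averaging the scores here is an assumption, as are the branch architecture and the 112x112 inputs.

```python
import torch
import torch.nn as nn

class ScaleBranch(nn.Module):
    """One small CNN handling the crops of a single scale."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2))

    def forward(self, x):
        return self.net(x)

class MultiScaleLiveness(nn.Module):
    def __init__(self, num_scales: int = 3):
        super().__init__()
        self.branches = nn.ModuleList(ScaleBranch() for _ in range(num_scales))

    def forward(self, crops):  # crops: one tensor per scale
        logits = torch.stack([b(c) for b, c in zip(self.branches, crops)])
        return logits.mean(dim=0)  # fuse the per-scale scores (assumed strategy)

crops = [torch.randn(1, 3, 112, 112) for _ in range(3)]
scores = MultiScaleLiveness()(crops)  # -> (1, 2) logits: real face vs. photo face
```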
Step 6, determining a second difference between the second training result and the corresponding location information based on a second loss function.
If the second training result is the training result of a real face image, the corresponding position information is the position information of the real face in the real face image; if the second training result is the training result of a photo face image, the corresponding position information is the position information of the photo face in the photo face image.
Optionally, the second loss function is a classification loss function, which may be the softmax loss:

$$p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad L = -\sum_i y_i \log(p_i)$$

where y_i is an indicator function representing the label value of the photo sample data (i.e. the face position information), z_i are the outputs of the second neural network model, and p_i is the predicted probability that the photo sample data is a non-living body.
And 7, carrying out iterative training on the model parameters of the second neural network model based on the second difference until the second difference reaches a second difference range or the iteration times reaches a second preset time, and obtaining the photo detection model.
Step 103, inputting the target image into the living body detection model to obtain a living body detection result, wherein the living body detection result is used for indicating whether the human face in the target image is a real human face.
Optionally, the living body detection result is a probability value in the range [0, 1]. The smaller the value, the higher the probability that the face in the target image is a real face; the larger the value, the lower that probability.
In summary, the living body detection method based on deep learning provided by this embodiment acquires a target image to be subjected to living body detection, obtains a living body detection model, and inputs the target image into the model to obtain a living body detection result indicating whether the face in the target image is a real face; this solves the problem that living body detection based on motion-information features is inefficient. The living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data. Because the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models, and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion, the method can effectively resist the terminal-replay attacks and photo attacks that may occur in actual scenes, improves the accuracy of living body detection while ensuring real-time performance, and performs living body detection without requiring the user to cooperate by executing specified actions, thereby improving detection efficiency.
In addition, performing face key point detection and face alignment on the multi-scale photo face images removes the influence of face angle and face occlusion on the model's detection result, improving the model's detection accuracy.
To make the deep-learning-based living body detection method provided by the present application clearer, it is exemplified below. In this embodiment, the terminal detection model in the living body detection model is cascaded before the photo detection model, and the method comprises at least the following steps 21 to 25 (a sketch of the cascade follows the steps):
step 21, acquiring a target image;
step 22, the terminal detection model detects whether the target image is a terminal reproduction (i.e. re-shot from a terminal screen); if yes, go to step 23; if not, go to step 24;
step 23, outputting a living body detection result, wherein the living body detection result is used for indicating that the face in the target image is a non-living body face, and the process is ended;
step 24, detecting whether the target image is a photo by the photo detection model; if yes, go to step 23; if not, executing step 25;
and 25, outputting a living body detection result, wherein the living body detection result is used for indicating that the human face in the target image is a living body human face.
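The following is a minimal sketch of this two-stage cascade: the terminal detector runs first and short-circuits to "non-living" if a screen replay is found, and the photo detector only runs otherwise. Here `terminal_model` and `photo_model` are hypothetical stand-ins for the two trained detectors returning non-living probabilities, and the 0.5 thresholds are assumptions.

```python
def liveness_cascade(image, terminal_model, photo_model,
                     terminal_thresh: float = 0.5, photo_thresh: float = 0.5) -> bool:
    """Returns True if the face in `image` is judged to be a living face."""
    if terminal_model(image) > terminal_thresh:   # step 22: screen replay detected
        return False                              # step 23: non-living face
    if photo_model(image) > photo_thresh:         # step 24: printed photo detected
        return False                              # step 23: non-living face
    return True                                   # step 25: living face
```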
In summary, the living body detection method based on deep learning provided by this embodiment can effectively resist the mobile-phone replay attacks and photo attacks that may occur in actual scenes, improves the accuracy of living body detection while ensuring real-time performance, and allows the living body detection model to be deployed on a mobile terminal, which broadens the model's range of application.
Optionally, taking the case where the deep-learning-based living body detection method provided by the present application is used in access control equipment as an example: the access control equipment includes an image acquisition component, an image processing component communicatively connected to the image acquisition component, and a memory and a result display component communicatively connected to the image processing component.
The image acquisition assembly is used for acquiring a target image;
the memory stores programs; the image processing component is used for loading and executing a program to realize the living body detection method based on deep learning provided by the application;
and the result display component is used for displaying the living body detection result obtained by the image processing component.
The image acquisition component can be a camera or equipment with image acquisition capability such as a camera; the image processing component may be a processor; the result display component may be a display screen.
Fig. 3 is a block diagram of a living body detecting device based on deep learning according to an embodiment of the present application. The device at least comprises the following modules: an image acquisition module 310, a model acquisition module 320, and a liveness detection module 330.
An image acquisition module 310, configured to acquire a target image to be subjected to living body detection;
a model acquisition module 320, for acquiring a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion;
a living body detection module 330, configured to input the target image into the living body detection model to obtain a living body detection result, where the living body detection result is used to indicate whether a face in the target image is a real face.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: in the living body detecting device based on deep learning provided in the above embodiments, when performing living body detection based on deep learning, only the division of the above functional modules is illustrated, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the living body detecting device based on deep learning is divided into different functional modules to perform all or part of the functions described above. In addition, the living body detection device based on deep learning provided by the above embodiment and the living body detection method based on deep learning belong to the same concept, and the specific implementation process is described in detail in the method embodiment and is not described herein again.
Fig. 4 is a block diagram of a living body detecting device based on deep learning according to an embodiment of the present application. The apparatus comprises at least a processor 401 and a memory 402.
Processor 401 may include one or more processing cores such as: 4 core processors, 8 core processors, etc. The processor 401 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 401 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 402 is used to store at least one instruction for execution by processor 401 to implement the deep learning based liveness detection method provided by method embodiments herein.
In some embodiments, the in-vivo detection device based on deep learning may further include: a peripheral interface and at least one peripheral. The processor 401, memory 402 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the living body detecting device based on deep learning may further include fewer or more components, which is not limited by the embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the deep learning based liveness detection method of the above method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the deep learning based liveness detection method of the above-mentioned method embodiment.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A living body detection method based on deep learning is characterized by comprising the following steps:
acquiring a target image to be subjected to living body detection;
obtaining a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion;
and inputting the target image into the living body detection model to obtain a living body detection result, wherein the living body detection result is used for indicating whether the face in the target image is a real face.
2. The method of claim 1, wherein acquiring the living body detection model comprises:
acquiring the terminal sample data;
performing data expansion on the terminal sample data to obtain expanded terminal sample data; each group of expanded terminal sample data comprises a terminal image and terminal position information in the terminal image;
inputting the terminal image into the first neural network model to obtain a first training result;
determining a first difference between the first training result and the corresponding terminal's location information based on a first loss function;
and iteratively training the model parameters of the first neural network model based on the first difference, stopping when the first difference falls within a first difference range or the number of iterations reaches a first preset number, to obtain the terminal detection model.
3. The method of claim 2, wherein the first neural network model is an object detection network having a feature enhancement network.
4. The method of claim 1, wherein acquiring the living body detection model comprises:
acquiring the photo sample data;
carrying out face detection on the photo sample data to obtain a face detection result;
performing multi-scale cropping on the face detection result to obtain a multi-scale real face image and a multi-scale photo face image;
performing data expansion on the multi-scale real face image and the multi-scale photo face image to obtain expanded photo sample data; each group of extended photo sample data comprises a photo face image and position information of a photo face in the photo face image, or comprises a real face image and position information of a real face in the real face image;
inputting the real face image and the photo face image into the second neural network model to obtain a second training result;
determining a second difference between the second training result and corresponding location information based on a second loss function;
and iteratively training the model parameters of the second neural network model based on the second difference until the second difference falls within a second difference range or the number of iterations reaches a second preset number, to obtain the photo detection model.
5. The method according to claim 4, wherein before performing data expansion on the multi-scale real face image and the multi-scale photo face image to obtain the expanded photo sample data, the method further comprises:
and respectively carrying out face key point detection and face alignment on the multi-scale real face image and the multi-scale photo face image.
6. The method of claim 4, wherein the second neural network model comprises a convolutional neural network corresponding to each scale in the multi-scale real face image and the multi-scale photo face image, and wherein a plurality of convolutional neural networks are cascaded with each other.
7. The method of claim 2 or 4, wherein the data extension comprises: random flipping, cropping, rotation, and/or color change.
8. A living body detecting apparatus based on deep learning, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a target image to be subjected to living body detection;
the model acquisition module is used for acquiring a living body detection model, wherein the living body detection model comprises a terminal detection model and a photo detection model cascaded with each other; the terminal detection model is obtained by deep learning on a first neural network model with terminal sample data, and the photo detection model is obtained by deep learning on a second neural network model with photo sample data; the terminal sample data comprises terminal images collected in natural scenes at different angles, with different background information and/or of different terminal models; and the photo sample data comprises real face images and photo face images collected in natural scenes at different angles, with different background information and/or with different degrees of occlusion;
and the living body detection module is used for inputting the target image into the living body detection model to obtain a living body detection result, and the living body detection result is used for indicating whether the face in the target image is a real face.
9. A deep learning based liveness detection device, the device comprising a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the deep learning based liveness detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the storage medium has stored therein a program which, when loaded and executed by a processor, implements the deep-learning-based living body detection method of any one of claims 1 to 7.
11. The access control equipment is characterized by comprising an image acquisition component, an image processing component, a memory and a result display component, wherein the image processing component is in communication connection with the image acquisition component;
the image acquisition assembly is used for acquiring a target image;
the memory stores a program; the image processing component is used for loading and executing the program to realize the living body detection method based on the deep learning of any one of claims 1 to 7;
and the result display component is used for displaying the living body detection result obtained by the image processing component.
CN202010170734.XA (filed 2020-03-12): In-vivo detection method and device based on deep learning, storage medium and equipment. Published as CN111310724A; status: Pending.

Priority Applications (1)

CN202010170734.XA (priority date 2020-03-12, filing date 2020-03-12): In-vivo detection method and device based on deep learning, storage medium and equipment

Applications Claiming Priority (1)

CN202010170734.XA (priority date 2020-03-12, filing date 2020-03-12): In-vivo detection method and device based on deep learning, storage medium and equipment

Publications (1)

CN111310724A, published 2020-06-19

Family

ID=71162325

Family Applications (1)

CN202010170734.XA (priority date 2020-03-12, filing date 2020-03-12, pending): In-vivo detection method and device based on deep learning, storage medium and equipment

Country Status (1)

CN: CN111310724A


Patent Citations (3)

* Cited by examiner, † Cited by third party

CN107609494A * (北京飞搜科技有限公司; priority 2017-08-31, published 2018-01-19): A kind of human face in-vivo detection method and system based on silent formula
CN109460733A * (北京智慧眼科技股份有限公司; priority 2018-11-08, published 2019-03-12): Recognition of face in-vivo detection method and system based on single camera, storage medium
CN110399780A * (努比亚技术有限公司; priority 2019-04-26, published 2019-11-01): A kind of method for detecting human face, device and computer readable storage medium

Cited By (10)

* Cited by examiner, † Cited by third party

CN111680675A * (腾讯科技(深圳)有限公司; priority 2020-08-14, published 2020-09-18): Face living body detection method, system, device, computer equipment and storage medium
WO2022033219A1 * (priority 2020-08-14, published 2022-02-17): Face liveness detection method, system and apparatus, computer device, and storage medium
CN112257685A * (成都新希望金融信息有限公司; priority 2020-12-08, published 2021-01-22): Face copying recognition method and device, electronic equipment and storage medium
CN112597885A * (北京华捷艾米科技有限公司; priority 2020-12-22, published 2021-04-02): Face living body detection method and device, electronic equipment and computer storage medium
CN112906571A * (成都新希望金融信息有限公司; priority 2021-02-20, published 2021-06-04): Living body identification method and device and electronic equipment
CN112906571B * (priority 2021-02-20, published 2023-09-05): Living body identification method and device and electronic equipment
CN113657293A * (北京神州新桥科技有限公司; priority 2021-08-19, published 2021-11-16): Living body detection method, living body detection device, electronic apparatus, medium, and program product
CN113657293B * (priority 2021-08-19, published 2023-11-24): Living body detection method, living body detection device, electronic equipment, medium and program product
CN114333078A * (马上消费金融股份有限公司; priority 2021-12-01, published 2022-04-12): Living body detection method, living body detection device, electronic apparatus, and storage medium
WO2023098128A1 * (priority 2021-12-01, published 2023-06-08): Living body detection method and apparatus, and training method and apparatus for living body detection system


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-06-19)