CN114140854A

CN114140854A - Living body detection method and device, electronic equipment and storage medium

Info

Publication number: CN114140854A
Application number: CN202111473666.5A
Authority: CN
Inventors: 黄泽斌
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-03-04

Abstract

The disclosure provides a living body detection method, a living body detection device, electronic equipment and a storage medium, relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and computer vision, and can be applied to scenes such as face recognition, living body detection and the like. The specific implementation scheme is as follows: acquiring a human face RGB image to be detected; respectively carrying out color model conversion and Fourier transform on the face RGB image to obtain a YUV image and a spectrogram; inputting the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts YUV image features of the YUV image, extracts spectrogram features of the spectrogram, fuses the YUV image features and the spectrogram features to obtain fusion features, and performs living body detection based on the fusion features; and obtaining the living body detection result output by the living body detection model. The characteristics learned by the network model are more robust, and the accuracy of the living body detection is improved.

Description

Living body detection method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and computer vision, and can be applied to scenes such as face recognition, living body detection and the like.

Background

With the development of technologies such as electronic commerce, face-based identity verification has become widely used. Such verification is mainly implemented through face recognition technology, which has greatly improved convenience in daily life. At the same time, security problems have gradually emerged: for example, an attacker may attempt identity verification with a printed photo or a photo displayed on a screen that is disguised as a real face.

Therefore, in the face recognition technology, a face living body detection technology is required to judge whether a face image is obtained by shooting a living body face.

Disclosure of Invention

The disclosure provides a living body detection method, a living body detection device, an electronic device and a storage medium.

According to an aspect of the present disclosure, there is provided a method of living body detection, including:

acquiring a human face RGB image to be detected;

respectively carrying out color model conversion and Fourier transform on the face RGB image to obtain a YUV image and a spectrogram;

inputting the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts YUV image features of the YUV image, extracts spectrogram features of the spectrogram, fuses the YUV image features and the spectrogram features to obtain fusion features, and performs living body detection based on the fusion features;

and obtaining a living body detection result output by the living body detection model.

According to another aspect of the present disclosure, there is provided a training method of a living body detection model, including:

acquiring a sample image and a living body detection real label of the sample image;

respectively carrying out color model conversion and Fourier transform on the sample image to obtain a sample YUV image and a sample spectrogram;

inputting the sample YUV image and the sample spectrogram into an initial network to obtain a living body detection prediction label; the initial network comprises a feature extraction layer, a pooling layer, a full connection layer and a softmax layer; the feature extraction layer includes: a YUV extraction layer and a frequency spectrum extraction layer; the YUV extraction layer is used for extracting YUV image features of the YUV image; the spectrum extraction layer is used for extracting spectrogram characteristics of the spectrogram;

calculating a loss value based on the living body detection real label and the living body detection prediction label, and adjusting a learnable parameter in the initial network based on the loss value.

According to another aspect of the present disclosure, there is provided a living body detection apparatus including:

the first acquisition module is used for acquiring a human face RGB image to be detected;

the first conversion module is used for respectively carrying out color model conversion and Fourier transform on the face RGB image to obtain a YUV image and a spectrogram;

the input module is used for inputting the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts YUV image features of the YUV image, extracts spectrogram features of the spectrogram, fuses the YUV image features and the spectrogram features to obtain fusion features, and performs living body detection based on the fusion features;

and the obtaining module is used for obtaining the living body detection result output by the living body detection model.

According to another aspect of the present disclosure, there is provided a training apparatus for a living body detection model, including:

the second acquisition module is used for acquiring a sample image and a living body detection real label of the sample image;

the second conversion module is used for respectively carrying out color model conversion and Fourier transform on the sample image to obtain a sample YUV image and a sample spectrogram;

the prediction module is used for inputting the sample YUV image and the sample spectrogram into an initial network to obtain a living body detection prediction label; the initial network comprises a feature extraction layer, a pooling layer, a full connection layer and a softmax layer; the feature extraction layer includes: a YUV extraction layer and a frequency spectrum extraction layer; the YUV extraction layer is used for extracting YUV image features of the YUV image; the spectrum extraction layer is used for extracting spectrogram characteristics of the spectrogram;

and the adjusting module is used for calculating a loss value based on the living body detection real label and the living body detection prediction label, and adjusting a learnable parameter in the initial network based on the loss value.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a liveness detection method.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a liveness detection method.

According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a liveness detection method.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure.

FIG. 1 is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of a living body detection model provided by an embodiment of the present disclosure;

FIG. 3 is another schematic structural diagram of a living body detection model provided by an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart of a training method for a living body detection model provided by an embodiment of the present disclosure;

FIG. 5 is a block diagram of an apparatus for implementing a liveness detection method of an embodiment of the present disclosure;

FIG. 6 is a block diagram of an apparatus for implementing a training method of a liveness detection model of an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device for implementing a liveness detection method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Face living body detection is one of the basic technologies in the face-related field and can be applied to scenes such as attendance checking and access control. It is widely used in many current services.

In the current face living body detection process, features are usually extracted from an RGB (red-green-blue) image of the face or of a certain region of the face, and binary classification is then performed based on the image features to obtain a detection result.

However, in an RGB image, information such as illumination, brightness and degree of blur is fused together. When only the RGB image is used as input, the features learned by the network model are not robust enough, and the accuracy of living body detection is ultimately low.

In order to solve the technical problem, the present disclosure provides a method and an apparatus for detecting a living body, an electronic device, and a storage medium.

In one embodiment of the present disclosure, a method for detecting a living human face is provided, the method including:

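acquiring a human face RGB image to be detected;

respectively carrying out color model conversion and Fourier transform on the face RGB image to obtain a YUV image and a spectrogram;

inputting the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts YUV image features of the YUV image, extracts spectrogram features of the spectrogram, fuses the YUV image features and the spectrogram features to obtain fusion features, and performs living body detection based on the fusion features;

and obtaining a living body detection result output by the living body detection model.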

In this way, a YUV image, in which information such as image illumination, brightness and chrominance is decoupled, replaces the RGB image for feature learning, so that the features learned by the network model are more robust. A spectrogram is also added for feature learning, so that the network model can learn feature information related to the blurriness of the image (corresponding to low-frequency information in the spectrogram) and its sharpness (corresponding to high-frequency information in the spectrogram), which further improves the robustness of the features. In addition, fusing the YUV image features with the spectrogram features improves feature reuse and lets the two kinds of features reinforce each other during learning. As a result, the accuracy of living body detection is improved.

The living body detection method, the living body detection device, the electronic apparatus, and the storage medium provided by the embodiments of the present disclosure are described in detail below.

Referring to fig. 1, fig. 1 is a schematic flow chart of a living body detection method provided by an embodiment of the present disclosure, and as shown in fig. 1, the method may include the following steps:

S101: acquiring a human face RGB image to be detected.

In the embodiment of the disclosure, when the human face living body detection is required, the RGB image of the human face to be detected is acquired.

The human face living body detection can be understood as follows: whether the face image is obtained by photographing a living body face is detected.

The embodiment of the present disclosure does not limit the way in which the face RGB image is acquired.

S102: respectively carrying out color model conversion and Fourier transform on the face RGB image to obtain a YUV image and a spectrogram.

In the embodiment of the disclosure, color model conversion can be performed on the face RGB image to obtain a YUV image. YUV is a color space model in which "Y" represents luminance (luma), that is, the gray-scale value, and "U" and "V" represent chrominance (chroma), which describes the hue and saturation of the image and specifies the color of a pixel. The YUV image therefore decouples information such as image illumination, brightness and chrominance.
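The color model conversion can be sketched as follows. This is a minimal illustration only: the disclosure does not state which YUV variant is used, so the BT.601-style analog coefficients below, and the function names `rgb_to_yuv` and `convert_image`, are assumptions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (assumed BT.601-style coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # chroma: blue difference
    v = 0.877 * (r - y)                    # chroma: red difference
    return y, u, v


def convert_image(rgb_image):
    """Convert an image given as rows of (r, g, b) tuples to rows of (y, u, v)."""
    return [[rgb_to_yuv(*pixel) for pixel in row] for row in rgb_image]


# A gray or white pixel carries no chroma, so U and V are (close to) zero
# and all of its information sits in the Y channel.
white_y, white_u, white_v = rgb_to_yuv(255, 255, 255)
```

Changing only the brightness of a neutral pixel changes Y while U and V stay near zero, which is the decoupling the paragraph above refers to.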

In addition, a Fast Fourier Transform (FFT) is performed on the face RGB image to obtain a discrete Fourier spectrum, which is referred to as a spectrogram for short.
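As a rough illustration of what the discrete Fourier spectrum contains, the sketch below computes the magnitude of a two-dimensional discrete Fourier transform directly from its definition. It is deliberately a naive DFT rather than an FFT (a practical implementation would call an FFT library), and it assumes a single-channel gray-scale input, since the disclosure does not say how the three RGB channels are handled. The name `dft2_magnitude` is hypothetical.

```python
import cmath
import math


def dft2_magnitude(image):
    """Naive 2-D DFT magnitude of a gray-scale image given as a list of rows."""
    height, width = len(image), len(image[0])
    spectrum = []
    for u in range(height):
        row = []
        for v in range(width):
            acc = 0j
            for y in range(height):
                for x in range(width):
                    angle = -2.0 * math.pi * (u * y / height + v * x / width)
                    acc += image[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc))  # keep only the magnitude
        spectrum.append(row)
    return spectrum


# A perfectly flat image has no detail, so all of its energy lands in the
# zero-frequency (DC) coefficient at index [0][0].
flat = [[1.0] * 4 for _ in range(4)]
flat_spectrum = dft2_magnitude(flat)
```

Blurred images concentrate energy in the low-frequency coefficients in a similar way, while sharp edges and fine texture add energy at higher frequencies.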

S103: inputting the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts YUV image features of the YUV image, extracts spectrogram features of the spectrogram, fuses the YUV image features and the spectrogram features to obtain fusion features, and performs living body detection based on the fusion features.

In the embodiment of the present disclosure, the living body detection model is trained in advance. The living body detection model is provided with a feature extraction layer for extracting YUV image features and a feature extraction layer for extracting spectrogram features, and the extracted YUV image features and the spectrogram features can be fused to obtain fusion features.

Living body detection is essentially a classification problem, so an output layer is further provided in the living body detection model; it performs binary classification on the fusion features to obtain the living body detection result.

S104: obtaining the living body detection result output by the living body detection model.

In one embodiment of the present disclosure, the living body detection model includes: a feature extraction layer, a pooling layer, a fully connected layer and a softmax layer.

Wherein the feature extraction layer comprises: a YUV extraction layer and a frequency spectrum extraction layer; the YUV extraction layer is used for extracting YUV image features of the YUV image; the spectrum extraction layer is used for extracting spectrogram features of the spectrogram.

The concatenation (splicing) of the YUV image features and the spectrogram features is used as the input of the pooling layer. After pooling, the concatenated features pass through the fully connected layer, and classification is finally performed by the softmax layer to obtain the living body detection result.
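The path from the fused features to the detection result can be sketched as below. This is an illustration under stated assumptions, not the disclosed implementation: the pooling is assumed to be global average pooling, features are plain nested lists of shape channels x height x width, and the weights, biases and function names are hypothetical.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each channel's H x W map to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def fully_connected(inputs, weights, biases):
    """One linear layer: one output per (weight row, bias) pair."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(yuv_features, spectrum_features, weights, biases):
    """Concatenate along channels, then pool, fully connect and apply softmax."""
    fused = yuv_features + spectrum_features  # channel-wise concatenation
    pooled = global_average_pool(fused)
    return softmax(fully_connected(pooled, weights, biases))


# Toy input: one YUV channel (mean 2.0) and one spectrum channel (mean 0.0).
probs = classify(
    [[[1.0, 3.0], [1.0, 3.0]]],
    [[[0.0, 0.0], [0.0, 0.0]]],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0],
)
```

The two output values play the role of the living and non-living probabilities; channel-wise concatenation only works when both feature sets share the same height and width.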

In one embodiment of the present disclosure, the YUV extraction layer includes a first YUV sub-extraction layer and a second YUV sub-extraction layer; the spectrum extraction layer includes a first spectrum sub-extraction layer and a second spectrum sub-extraction layer; the input feature of the second YUV sub-extraction layer is the concatenation of the output feature of the first YUV sub-extraction layer and the output feature of the first spectrum sub-extraction layer; and the input feature of the pooling layer is the concatenation of the output feature of the second YUV sub-extraction layer and the output feature of the second spectrum sub-extraction layer.

Specifically, in order to better fuse the YUV image features and the spectrogram features, a plurality of YUV feature sub-extraction layers (abbreviated as YUV sub-extraction layers) and a plurality of spectral feature sub-extraction layers (abbreviated as spectral sub-extraction layers) may be provided.

As an example, referring to fig. 2, fig. 2 is a schematic structural diagram of a living body detection model provided by an embodiment of the present disclosure. As shown in fig. 2, the face RGB image is subjected to color model conversion and fast Fourier transform, respectively, to obtain a YUV image and a spectrogram. The first YUV sub-extraction layer extracts image features of the YUV image, and the first spectrum sub-extraction layer extracts image features of the spectrogram. The features extracted by the first YUV sub-extraction layer and the first spectrum sub-extraction layer are then concatenated and used as the input of the second YUV sub-extraction layer, which further extracts YUV image features. The input of the second spectrum sub-extraction layer is the output of the first spectrum sub-extraction layer, and the second spectrum sub-extraction layer further extracts spectrogram features. The output features of the second YUV sub-extraction layer and of the second spectrum sub-extraction layer are then concatenated as the input features of the pooling layer, and the result passes through the fully connected layer and the softmax layer in turn to obtain the living body detection result.

Therefore, with this connection relation between the YUV extraction layer and the spectrum extraction layer, the YUV image features and the spectrogram features can be fused well and can reinforce each other during learning, which improves the robustness of the features learned by the model and ultimately the accuracy of living body detection.
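The wiring between the two branches described for fig. 2 can be summarized in a few lines. The sketch below only shows the data flow: each stage is passed in as a placeholder callable, features are flat lists, and `forward`, `concat` and the stage names are hypothetical rather than taken from the disclosure.

```python
def concat(first, second):
    """Concatenate two feature lists (stand-in for channel-wise concatenation)."""
    return first + second


def forward(yuv_image, spectrogram, yuv_stage1, yuv_stage2,
            spectrum_stage1, spectrum_stage2, head):
    """Two-branch forward pass with one mid-level and one late fusion point."""
    y1 = yuv_stage1(yuv_image)         # first YUV sub-extraction layer
    s1 = spectrum_stage1(spectrogram)  # first spectrum sub-extraction layer
    y2 = yuv_stage2(concat(y1, s1))    # second YUV stage sees both branches
    s2 = spectrum_stage2(s1)           # second spectrum stage sees spectrum only
    return head(concat(y2, s2))        # pooling, fully connected, softmax


# Toy stages that make the data flow easy to trace by hand.
result = forward(
    [1.0], [2.0],
    yuv_stage1=lambda f: [2 * x for x in f],
    yuv_stage2=lambda f: f,
    spectrum_stage1=lambda f: [x + 1 for x in f],
    spectrum_stage2=lambda f: [-x for x in f],
    head=lambda f: f,
)
```

Note the asymmetry: spectrum features flow into the YUV branch at the middle of the network, while the spectrum branch itself never receives YUV features.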

In one embodiment of the present disclosure, the network structure of Resnet18 may be employed as a backbone network for feature extraction.

Specifically, the network structure of Resnet18 includes, in order, conv1_x (first convolutional layer), conv2_x (second convolutional layer), conv3_x (third convolutional layer), conv4_x (fourth convolutional layer), conv5_x (fifth convolutional layer), avg pooling (average pooling layer), fc (fully connected layer), and a softmax layer.

In one embodiment of the present disclosure, the first YUV sub-extraction layer may include: a first convolutional layer, a second convolutional layer, and a third convolutional layer in a Resnet18 network; the second YUV sub-extraction layer may include: the fourth convolutional layer and the fifth convolutional layer in the Resnet18 network.

In addition, the network structure of Inception v3 can be adopted for the spectrum sub-extraction layers. Specifically, the first spectrum sub-extraction layer comprises an Inception v3 network and a first dimension reduction convolution layer; the second spectrum sub-extraction layer comprises a two-layer Inception v3 network and a second dimension reduction convolution layer.

The first dimension reduction convolution layer reduces the dimension of the features output by the Inception v3 network so that the reduced features have the same dimension as the features output by the first YUV sub-extraction layer, which facilitates feature concatenation.

The second dimension reduction convolution layer reduces the dimension of the features output by the two-layer Inception v3 network so that the reduced features have the same dimension as the features output by the second YUV sub-extraction layer, which facilitates feature concatenation.

Specifically, referring to fig. 3, fig. 3 is another schematic structural diagram of the living body detection model provided in the embodiment of the present disclosure.

As shown in fig. 3, the face RGB image is subjected to color model conversion and fast Fourier transform, respectively, to obtain a YUV image and a spectrogram. conv1_x, conv2_x and conv3_x in the Resnet18 network serve as the first YUV sub-extraction layer and extract image features of the YUV image.

An Inception v3 network extracts image features of the spectrogram and is followed by a convolution with kernel size 5 and stride 4 for dimension reduction; the dimension-reduced features are concatenated with the output features of conv3_x to serve as the input of the second YUV sub-extraction layer.

conv4_x and conv5_x in the Resnet18 network serve as the second YUV sub-extraction layer and further extract YUV image features. A two-layer Inception v3 network followed by a convolution with kernel size 5 and stride 4 serves as the second spectrum sub-extraction layer; its input is the output of the first spectrum sub-extraction layer, and it further extracts spectrogram features.

The output features of the second YUV sub-extraction layer and of the second spectrum sub-extraction layer are then concatenated as the input features of the pooling layer, and the result passes through the fully connected layer and the softmax layer in turn to obtain the living body detection result.

With this connection relation between the YUV extraction layer and the spectrum extraction layer, the YUV image features and the spectrogram features can be fused well and can reinforce each other during learning, which improves the robustness of the features learned by the model and ultimately the accuracy of living body detection.
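To see how the kernel size 5, stride 4 convolution in the fig. 3 walk-through shrinks a feature map so that it can be concatenated with the other branch, the standard convolution output-size formula can be applied. The padding values and the input sizes below are assumptions chosen for illustration; the disclosure does not state them.

```python
def conv_output_size(size, kernel=5, stride=4, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical input sizes: the spatial size drops by roughly a factor of four.
no_padding = conv_output_size(35)               # (35 - 5) // 4 + 1
with_padding = conv_output_size(28, padding=2)  # (28 + 4 - 5) // 4 + 1
```

In a real model the padding and input resolution would be chosen so that this output size equals the spatial size of the YUV branch at the fusion point.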

In addition, the network connection structure shown in fig. 3 is merely an example, and other network connection structures may be adopted. For example, conv1_x and conv2_x in the Resnet18 network may serve as the first YUV sub-extraction layer, and conv3_x, conv4_x and conv5_x may serve as the second YUV sub-extraction layer; the embodiment of the present disclosure does not limit this.

In an embodiment of the present disclosure, before inputting the YUV image and the spectrogram into the pre-trained living body detection model, the method further includes: performing spectrum alignment on the spectrogram.

Specifically, the spectral alignment is performed on the spectrogram to change the distribution of high-frequency and low-frequency features of the spectrum, so that the high-frequency features are focused on the center of the image.

After spectral alignment, the transformed spectrogram is used as an input of a living body detection model.

Referring to fig. 4, fig. 4 is a schematic flowchart of a training method of a living body detection model according to an embodiment of the present disclosure. As shown in fig. 4, the method includes the following steps:

S401: acquiring a sample image and a living body detection real label of the sample image;

S402: respectively carrying out color model conversion and Fourier transform on the sample image to obtain a sample YUV image and a sample spectrogram;

S403: inputting the sample YUV image and the sample spectrogram into an initial network to obtain a living body detection prediction label; the initial network comprises a feature extraction layer, a pooling layer, a fully connected layer and a softmax layer; the feature extraction layer includes: a YUV extraction layer and a spectrum extraction layer; the YUV extraction layer is used for extracting YUV image features of the YUV image; the spectrum extraction layer is used for extracting spectrogram features of the spectrogram;

S404: calculating a loss value based on the living body detection real label and the living body detection prediction label, and adjusting a learnable parameter in the initial network based on the loss value.

Specifically, the sample images include positive sample images, labeled as living body, and negative sample images, labeled as non-living body. Color model conversion and Fourier transform are performed on each sample image to obtain a sample YUV image and a sample spectrogram. The sample YUV image and the sample spectrogram are input into the initial network to obtain a living body detection prediction label, a loss value is calculated from the prediction label and the living body detection real label, and the learnable parameters in the initial network are adjusted based on the loss value. Training ends when the loss value reaches a preset threshold or the number of iterations reaches a preset number. The trained network can then be used as the living body detection model.
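The training procedure, including its two stopping conditions, can be sketched with a deliberately tiny stand-in model. Instead of the two-branch network, the code below trains a two-class softmax classifier on a single scalar feature with plain gradient descent; the learning rate, threshold, iteration limit, toy data and all names are assumptions for illustration only.

```python
import math


def softmax2(z0, z1):
    """Softmax over two logits."""
    peak = max(z0, z1)
    e0, e1 = math.exp(z0 - peak), math.exp(z1 - peak)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def train(samples, lr=0.5, loss_threshold=0.1, max_iters=1000):
    """samples: (feature, label) pairs; label 0 = living, 1 = non-living."""
    w, b = [0.0, 0.0], [0.0, 0.0]  # learnable parameters, one pair per class
    loss = float("inf")
    for iteration in range(1, max_iters + 1):
        grad_w, grad_b, loss = [0.0, 0.0], [0.0, 0.0], 0.0
        for x, label in samples:
            probs = softmax2(w[0] * x + b[0], w[1] * x + b[1])
            loss -= math.log(probs[label])  # cross entropy with a one-hot label
            for c in range(2):
                diff = probs[c] - (1.0 if c == label else 0.0)
                grad_w[c] += diff * x
                grad_b[c] += diff
        loss /= len(samples)
        if loss <= loss_threshold:  # stop: loss reached the preset threshold
            return w, b, loss, iteration
        for c in range(2):          # adjust parameters based on the loss
            w[c] -= lr * grad_w[c] / len(samples)
            b[c] -= lr * grad_b[c] / len(samples)
    return w, b, loss, max_iters    # stop: preset iteration count reached


toy_samples = [(2.0, 0), (1.5, 0), (-1.0, 1), (-2.0, 1)]
weights, biases, final_loss, iterations = train(toy_samples)
```

On this separable toy data the loop exits through the loss threshold well before the iteration limit; with harder data the iteration limit is what guarantees termination.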

In one embodiment of the present disclosure, the step of calculating the loss value based on the living body detection real label and the living body detection prediction label includes:

acquiring a living body detection prediction label of a sample image output by a softmax layer;

performing cross entropy operation based on the living body detection real label of the sample image and the living body detection prediction label of the sample image to obtain a loss value.

Specifically, in the training process, the softmax layer outputs a classification prediction result after normalization processing, namely a living body detection prediction label, and then performs cross entropy operation by combining with a living body detection real label to obtain a loss value.

The cross entropy operation is used to measure how close the actual output is to the real label.

As an example, in the training process, the classification prediction result output by the softmax layer is [0.7, 0.3], where 0.7 represents the probability that the sample image was captured from a living body and 0.3 represents the probability that it was captured from a non-living body. If the sample image is a positive sample, the corresponding living body detection real label is [1, 0]. A cross entropy operation can then be performed on the prediction label [0.7, 0.3] and the real label [1, 0]; because the result measures how close the prediction is to the real result, it can be used as the loss value.
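The numbers in this example can be worked through directly. The sketch below assumes the usual definition of cross entropy with the natural logarithm, which the disclosure does not spell out; the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(real_label, predicted):
    """Cross entropy: -sum(t * ln(p)) over classes; zero-weight terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(real_label, predicted) if t > 0)


# Positive sample: real label [1, 0], prediction [0.7, 0.3].
loss = cross_entropy([1, 0], [0.7, 0.3])  # equals -ln(0.7), about 0.357
```

Only the probability assigned to the true class contributes, so a more confident correct prediction such as [0.9, 0.1] would give a smaller loss, and a prediction of [1, 0] would give a loss of zero.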

Referring to fig. 5, fig. 5 is a block diagram of an apparatus for implementing a living body detecting method according to an embodiment of the present disclosure, and as shown in fig. 5, the apparatus may include:

a first obtaining module 501, configured to obtain a human face RGB image to be detected;

a first conversion module 502, configured to perform color model conversion and fourier transform on the face RGB image respectively to obtain a YUV image and a spectrogram;

an input module 503, configured to input the YUV image and the spectrogram into a pre-trained living body detection model, so that the living body detection model extracts a YUV image feature of the YUV image, extracts a spectrogram feature of the spectrogram, and fuses the YUV image feature and the spectrogram feature to obtain a fusion feature, and performs living body detection based on the fusion feature;

an obtaining module 504, configured to obtain a living body detection result output by the living body detection model.

In one embodiment of the present disclosure, the living body detection model includes:

a feature extraction layer, a pooling layer, a fully connected layer and a softmax layer;

the feature extraction layer includes: a YUV extraction layer and a frequency spectrum extraction layer;

the YUV extraction layer is used for extracting YUV image features of the YUV image;

the spectrum extraction layer is used for extracting spectrogram characteristics of the spectrogram.

In one embodiment of the present disclosure, the YUV extraction layer includes a first YUV sub-extraction layer and a second YUV sub-extraction layer;

the spectrum extraction layer comprises a first spectrum sub-extraction layer and a second spectrum sub-extraction layer;

the input feature of the second YUV sub-extraction layer is a splicing feature of the output feature of the first YUV sub-extraction layer and the output feature of the first spectrum sub-extraction layer;

and the input characteristic of the pooling layer is a splicing characteristic of the output characteristic of the second YUV sub-extraction layer and the output characteristic of the second spectrum sub-extraction layer.

In one embodiment of the present disclosure, the first YUV sub-extraction layer includes a first convolutional layer, a second convolutional layer, and a third convolutional layer in a Resnet18 network;

the second YUV sub-extraction layer comprises a fourth convolution layer and a fifth convolution layer in a Resnet18 network;

the first spectrum sub-extraction layer comprises an Inception v3 network and a first dimension reduction convolution layer;

the second spectrum sub-extraction layer comprises a two-layer Inception v3 network and a second dimension reduction convolution layer.

In one embodiment of the present disclosure, the apparatus further includes:

an alignment module, configured to perform spectrum alignment on the spectrogram before the YUV image and the spectrogram are input into the pre-trained living body detection model.

Referring to fig. 6, fig. 6 is a block diagram of an apparatus for implementing a training method of a living body detection model according to an embodiment of the present disclosure, and as shown in fig. 6, the apparatus may include:

a second obtaining module 601, configured to obtain a sample image and a live body detection real tag of the sample image;

a second conversion module 602, configured to perform color model conversion and fourier transform on the sample image, respectively, to obtain a sample YUV image and a sample spectrogram;

the predicting module 603 is configured to input the sample YUV image and the sample spectrogram into an initial network to obtain a living body detection prediction tag; the initial network comprises a feature extraction layer, a pooling layer, a full connection layer and a softmax layer; the feature extraction layer includes: a YUV extraction layer and a frequency spectrum extraction layer; the YUV extraction layer is used for extracting YUV image features of the YUV image; the spectrum extraction layer is used for extracting spectrogram characteristics of the spectrogram;

an adjusting module 604, configured to calculate a loss value based on the living body detection real label and the living body detection prediction label, and adjust a learnable parameter in the initial network based on the loss value.

In one embodiment of the present disclosure, the adjusting module 604 may include a loss calculating module for:

acquiring a living body detection prediction label of the sample image output by the softmax layer;

and performing cross entropy operation based on the living body detection real label of the sample image and the living body detection prediction label of the sample image to obtain a loss value.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

It should be noted that the two-dimensional face image in the present embodiment is from a public data set.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

The present disclosure provides an electronic device, including:

at least one processor; and

The present disclosure provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to execute a liveness detection method.

The present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements a liveness detection method.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the living body detection method. For example, in some embodiments, the living body detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the living body detection method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the living body detection method in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A living body detection method, comprising:

acquiring a human face RGB image to be detected;

2. The method of claim 1, wherein the liveness detection model comprises:

3. The method of claim 2, wherein,

the YUV extraction layer comprises a first YUV sub-extraction layer and a second YUV sub-extraction layer;

4. The method of claim 3, wherein,

the first YUV sub-extraction layer comprises a first convolution layer, a second convolution layer and a third convolution layer in a Resnet18 network;

5. The method of claim 1, further comprising, before inputting the YUV images and the spectrogram into a pre-trained living body detection model:

and carrying out spectrum alignment on the spectrogram.

6. A method of training a living body detection model, comprising:

7. The method of claim 6, wherein the step of calculating a loss value based on the liveness detection real tag and the liveness detection predictive tag comprises:

8. A living body detection apparatus comprising:

9. The apparatus of claim 8, wherein the liveness detection model comprises:

10. The apparatus of claim 9, wherein,

11. The apparatus of claim 10, wherein,

12. The apparatus of claim 8, further comprising:

13. A training apparatus for a living body detection model, comprising:

14. The apparatus of claim 13, wherein the adjustment module comprises a loss calculation module configured to:

15. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.