CN114613017A - Living body detection method and related equipment


Info

Publication number: CN114613017A
Application number: CN202210283028.5A
Authority: CN (China)
Prior art keywords: detection, NIR, hidden layer, feature, image
Inventors: 王柏润, 刘建博, 张帅, 伊帅
Assignee (current and original): Beijing Sensetime Technology Development Co Ltd
Legal status: Withdrawn


Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/253: Pattern recognition; fusion techniques of extracted features
    • G06N 3/045: Neural networks; architectures; combinations of networks
    • G06N 3/08: Neural networks; learning methods


Abstract

The application provides a living body detection method and related equipment. The method includes: obtaining RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces from a color RGB image and a near-infrared NIR image of a detection object captured at the same time; obtaining a parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature; and obtaining a living body detection result for the detection object from the RGB features, the NIR features, the position code, and the parallax feature. Embodiments of the application help improve the accuracy of living body detection.

Description

Living body detection method and related equipment
Technical Field
The application relates to the technical field of computer vision, in particular to a living body detection method and related equipment.
Background
With the development of deep learning, face recognition technology has improved greatly in recognition rate and is widely applied in security systems, banking systems, and daily life. Face recognition generally comprises living body (liveness) detection at the front end and feature comparison in the background. For living body detection, the front end can adopt interactive living body detection or silent living body detection. Traditional silent living body detection obtains a detection result by classifying image features, but this approach defends weakly against 3D attacks, and its detection accuracy still needs to be improved.
Disclosure of Invention
The embodiments of the application provide a living body detection method and related equipment, which help improve the accuracy of living body detection and thereby strengthen defense against 3D attacks.
In a first aspect, an embodiment of the present application provides a method for detecting a living body, including:
obtaining RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces from a color RGB image and a near-infrared NIR image of a detection object captured at the same time;
obtaining parallax features of the color RGB image and the near infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
and obtaining a living body detection result for the detection object according to the RGB features, the NIR features, the position code, and the parallax feature.
In the embodiments of the application, RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces are obtained from a color RGB image and a near-infrared NIR image of the detection object captured at the same time; a parallax feature of the color RGB image and the near-infrared NIR image is obtained based on the first hidden layer feature and the second hidden layer feature; and a living body detection result for the detection object is obtained according to the RGB features, the NIR features, the position code, and the parallax feature. On the basis of the RGB and NIR features, parallax information derived from the hidden layer features and the position code of the faces in the two images are added for living body prediction. The position code reflects, to some extent, the relative positions of the faces in the two images, and using the parallax information and the position code to assist classification improves the accuracy of living body detection.
With reference to the first aspect, in a possible implementation manner, obtaining the parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature includes:
splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to each position in the second hidden layer feature to obtain a spliced feature;
performing at least one linear transformation and activation on the spliced feature, and performing a linear transformation on the result to obtain an attention matrix;
converting the attention matrix into a target feature, and determining the target feature as the parallax feature; the size of the target feature is the same as that of the color RGB image and the near-infrared NIR image.
It can be seen that, in this embodiment, an attention matrix between the first hidden layer feature and the second hidden layer feature is calculated. The attention matrix implicitly contains the similarity between features at corresponding positions in the two hidden layer features, and this similarity implicitly expresses the parallax between the color RGB image and the near-infrared NIR image. Converting the attention matrix into the target feature therefore yields a representation of the parallax, achieving the purpose of using parallax information to assist living body detection.
With reference to the first aspect, in a possible implementation manner, obtaining a position code of a human face according to a color RGB image and a near infrared NIR image of a detection object at the same time includes:
acquiring first detection frame position information of a face of a detection object in a color RGB image and second detection frame position information of the face of the detection object in a near-infrared NIR image;
and carrying out sine position coding on the position information of the first detection frame and the position information of the second detection frame to obtain position codes.
It can be seen that in this embodiment, sinusoidal position coding is adopted to obtain the position code of the faces in the two images. The code implicitly expresses the relative position of the two faces, which can serve as a coarse estimate of the parallax, so living body detection is assisted by an additional source of parallax information.
With reference to the first aspect, in a possible implementation manner, before obtaining RGB features, NIR features, first hidden layer features, second hidden layer features, and position codes of human faces according to a color RGB image and a near-infrared NIR image of a detection object at the same time, the method further includes:
carrying out face detection on the color RGB image to obtain position information of a first detection frame;
acquiring first depth information of a color RGB image;
obtaining a fourth detection frame based on the first depth information and the first detection frame position information;
for a plurality of near-infrared NIR images to be matched, carrying out face detection on each near-infrared NIR image in the plurality of near-infrared NIR images to obtain third detection frame position information of a face in each near-infrared NIR image;
acquiring second depth information of each near-infrared NIR image;
obtaining a fifth detection frame based on the second depth information and the third detection frame position information;
and matching the face to be matched in the fifth detection frame with the face of the detection object contained in the fourth detection frame, and determining the near infrared NIR image from the plurality of near infrared NIR images.
It can be seen that, in this embodiment, windows for face matching (i.e., the fourth detection frame and the fifth detection frame) are obtained based on the depth information and the detection frame position information, the faces in the two matching windows are matched, and the near-infrared image containing the matched face is determined to be the near-infrared image acquired synchronously with the RGB image, which helps improve matching efficiency.
With reference to the first aspect, in one possible implementation, the color RGB image is acquired by an RGB camera of a binocular camera, the first detection frame position information includes a width and a height of the first detection frame, and obtaining the fourth detection frame based on the first depth information and the first detection frame position information includes:
calculating to obtain an updated width by adopting the first depth information, the width of the first detection frame and the focal length of the RGB camera;
calculating to obtain an updated height by adopting the first depth information, the height of the first detection frame and the focal length of the RGB camera;
and determining a rectangular frame obtained based on the updated width and the updated height as a fourth detection frame.
It can be seen that, in this embodiment, the face detection frame is adjusted using the camera's calibration parameters and the depth information so that fewer features are used during matching, that is, only part of the face features participate in matching, which reduces computation and improves matching efficiency.
With reference to the first aspect, in one possible implementation manner, obtaining a living body detection result of a detection object according to RGB features, NIR features, position coding, and parallax features includes:
interpolating the position codes to obtain position characteristics with the same size as the RGB characteristics, the NIR characteristics and the parallax characteristics;
and splicing the RGB characteristics, the NIR characteristics, the parallax characteristics and the position characteristics, and classifying the spliced characteristics to obtain a living body detection result.
It can be seen that, in this embodiment, the features containing rich semantic information (namely, the RGB and NIR features), the parallax feature, and the position feature are spliced, and the spliced feature, fused with parallax and position information, assists living body detection, which helps improve detection accuracy.
With reference to the first aspect, in one possible implementation, the method is performed by a pre-trained living body detection model. The living body detection model includes a first neural network branch, a second neural network branch, an attention network branch, a multilayer perceptron, a position coding branch, and a classifier; the first neural network branch and the second neural network branch are each connected to the attention network branch, the output of the attention network branch is used as the input of the multilayer perceptron, and the outputs of the first neural network branch, the second neural network branch, the multilayer perceptron, and the position coding branch are spliced and used as the input of the classifier.
It can be seen that, in this embodiment, the living body detection method is executed by a pre-trained living body detection model, which splices the features containing rich semantic information with the parallax feature and the position feature and then performs living body detection on the spliced feature, so parallax information assists living body detection and detection accuracy is improved.
A second aspect of the embodiments of the present application provides a living body detecting apparatus including a first processing unit, a second processing unit, and a living body detecting unit;
the first processing unit is used for obtaining RGB characteristics, NIR characteristics, first hidden layer characteristics, second hidden layer characteristics and position codes of human faces according to the color RGB images and the near-infrared NIR images of the detection objects at the same time;
the second processing unit is used for obtaining parallax features of the color RGB image and the near infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
and the living body detection unit is used for obtaining a living body detection result of the detection object according to the RGB characteristic, the NIR characteristic, the position code and the parallax characteristic.
A third aspect of the embodiments of the present application provides an electronic device, which includes an input device, an output device, and a processor adapted to implement one or more instructions; and a memory storing one or more computer programs adapted to be loaded by the processor to perform the steps of the method according to the first aspect.
A fourth aspect of embodiments of the present application provides a computer storage medium storing one or more instructions adapted to be loaded by a processor and execute the steps of the method according to the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform a method according to the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for detecting a living body according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a living body detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of another in-vivo detection model provided in an embodiment of the present application;
FIG. 5 is a schematic flow chart of another in-vivo detection method provided in the embodiments of the present application;
FIG. 6 is a schematic structural diagram of a living body detecting apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by a person skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic view of an application environment provided in an embodiment of the present application. The application environment includes a binocular camera, an electronic device, and a database, and these components are communicatively connected through a network. Specifically, the binocular camera includes a color RGB (Red, Green, Blue) camera module and a near-infrared NIR (near-infrared radiation) camera module. The two modules each capture an image of the same detection object at the same time to obtain a color RGB image and a near-infrared NIR image of the detection object, and transmit the two images to the electronic device through the network. After receiving the two images, the electronic device executes the living body detection method provided by the embodiments of the application: it performs feature extraction on the two images, obtaining RGB features and a first hidden layer feature from the color RGB image and NIR features and a second hidden layer feature from the near-infrared NIR image; it encodes the position information of the face detection frames in the two images to obtain the position code of the detection object's faces; it then obtains a parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature, splices the RGB features, the NIR features, the position code, and the parallax feature, and performs living body detection on the spliced feature to obtain a living body detection result for the detection object. For example, the binocular camera may also store all acquired color RGB images and near-infrared NIR images in the database; when the electronic device needs to perform living body detection on a certain detection object, it first acquires the color RGB image of the object from the database, matches, from a plurality of near-infrared NIR images, the NIR image captured of the object at the same time as the color RGB image, and then executes the living body detection method provided by the embodiments of the application.
For example, the electronic device may be an independent physical server, a server cluster, a cloud server providing basic cloud computing services such as cloud services, a cloud database, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a big data and artificial intelligence platform, a control host connected to a binocular camera, a self-service settlement terminal, a face gate, and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart of a living body detection method provided in an embodiment of the present application, which can be implemented based on the application environment shown in fig. 1 and applied to an electronic device. As shown in fig. 2, the method includes steps 201-203:
201: and obtaining RGB characteristics, NIR characteristics, first hidden layer characteristics, second hidden layer characteristics and position codes of the human face according to the color RGB image and the near infrared NIR image of the detection object at the same time.
In an embodiment of the present application, a living body detection model is provided. As shown in fig. 3, the living body detection model includes a first neural network branch, a second neural network branch, an attention network branch, a multilayer perceptron (MLP), a position coding branch, and a classifier; the first neural network branch and the second neural network branch are each connected to the attention network branch, the output of the attention network branch is used as the input of the multilayer perceptron, and the outputs of the first neural network branch, the second neural network branch, the multilayer perceptron, and the position coding branch are spliced and used as the input of the classifier.
The first neural network branch and the second neural network branch each perform feature extraction through several inception structures connected in series. As shown in fig. 3, the color RGB image is input into the first neural network branch, processed by the inception structures, and the RGB features are output by the output layer of the first neural network branch; the near-infrared NIR image is input into the second neural network branch, processed by its inception structures, and the NIR features are output by the output layer of the second neural network branch. An inception structure applies convolution kernels of different sizes to obtain receptive fields of different sizes, fusing features at different scales, so that the RGB and NIR features carry rich semantic information such as material, gloss, and/or texture. Specifically, the feature output by any hidden layer in the first neural network branch is used as the first hidden layer feature, and the feature output by the corresponding hidden layer in the second neural network branch is used as the second hidden layer feature; that is, the first hidden layer feature and the second hidden layer feature have the same size. A sketch of such a branch is given below.
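For illustration only, the following is a minimal PyTorch sketch of one such branch; it is not the patented implementation, and the channel sizes, number of inception blocks, 224 × 224 input resolution, and the layer tapped for the hidden feature are all assumptions:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, concatenated,
    so receptive fields of different sizes are fused in one feature."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat(
            [self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))

class Branch(nn.Module):
    """One image branch: inception blocks in series. Returns both the
    hidden-layer feature tapped mid-network (used for the parallax
    computation) and the final semantic feature."""
    def __init__(self, in_ch: int = 3):  # an NIR input may be 1-channel
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.block1 = InceptionBlock(32, 64)
        self.block2 = InceptionBlock(64, 128)
        self.block3 = InceptionBlock(128, 256)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):                    # x: (B, in_ch, 224, 224)
        x = self.stem(x)                     # (B, 32, 112, 112)
        x = self.pool(self.block1(x))        # (B, 64, 56, 56)
        hidden = self.pool(self.block2(x))   # (B, 128, 28, 28)
        return hidden, self.block3(hidden)   # hidden + semantic feature
```

Two such branches, one for the RGB image and one for the NIR image, would carry separate weights.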
Illustratively, the obtaining of the position code of the human face according to the color RGB image and the near infrared NIR image of the detection object at the same time includes:
acquiring first detection frame position information of a face of a detection object in a color RGB image and second detection frame position information of the face of the detection object in a near-infrared NIR image;
and carrying out sine position coding on the position information of the first detection frame and the position information of the second detection frame to obtain position codes.
Specifically, the first detection frame position information includes the coordinates (x1, y1), width w1, and height h1 of the first detection frame around the detection object's face in the color RGB image, and the second detection frame position information includes the coordinates (x2, y2), width w2, and height h2 of the second detection frame around the detection object's face in the near-infrared NIR image. The coordinates of a detection frame may be the coordinates of its center point, of its upper-left or lower-right corner, or a combination thereof, which is not limited here. The electronic device takes the first detection frame position information and the second detection frame position information as an input sequence (x1, y1, w1, h1, x2, y2, w2, h2), inputs the sequence into the position coding branch in fig. 3, and the position coding branch encodes the sequence alternately with sine and cosine functions to obtain the position code (i.e., sinusoidal position code) of the faces in the two images; a sketch follows below. In this embodiment, sinusoidal position coding is adopted to obtain the position code of the faces in the two images; the code implicitly expresses the relative position of the two faces, which can serve as a coarse estimate of the parallax, so living body detection is assisted by an additional source of parallax information.
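As an illustrative sketch, sinusoidal coding of the 8-value box sequence could look like the transformer-style encoding below; the embedding dimension d and the frequency base 10000 are assumptions, since the text only states that sine and cosine functions are applied alternately:

```python
import math
import torch

def sine_position_code(box_seq: torch.Tensor, d: int = 32) -> torch.Tensor:
    """box_seq: shape (8,), holding (x1, y1, w1, h1, x2, y2, w2, h2).
    Returns an (8, d) code: sine on even dims, cosine on odd dims."""
    i = torch.arange(d // 2, dtype=torch.float32)
    freq = torch.exp(-math.log(10000.0) * 2.0 * i / d)   # (d/2,)
    angles = box_seq.float().unsqueeze(1) * freq         # (8, d/2)
    code = torch.zeros(box_seq.shape[0], d)
    code[:, 0::2] = torch.sin(angles)
    code[:, 1::2] = torch.cos(angles)
    return code
```

Because both boxes enter one sequence, any shift between them changes the code, which is how the relative position information is carried.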
202: and obtaining parallax features of the color RGB image and the near infrared NIR image based on the first hidden layer feature and the second hidden layer feature.
In the embodiment of the application, obtaining the parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature includes:
splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to each position in the second hidden layer feature to obtain a spliced feature;
performing at least one linear transformation and activation on the spliced feature, and performing a linear transformation on the result to obtain an attention matrix;
converting the attention matrix into a target feature, and determining the target feature as the parallax feature; the size of the target feature is the same as that of the color RGB image and the near-infrared NIR image.
Specifically, the first hidden layer feature and the second hidden layer feature have the same size, for example C × 28 × 28, where C denotes the number of channels and 28 × 28 the width and height. The electronic device may input the two hidden layer features into the attention network branch in fig. 3 and splice the feature at each position in the first hidden layer feature with the feature at each position in the second hidden layer feature; that is, the feature at any position in the first hidden layer feature is spliced with each of the (28 × 28) features in the second hidden layer feature, where the splicing may be multiplication, addition, stacking, or the like. The attention network branch processes the spliced feature with a linear transformation, an activation, and another linear transformation to convert it into an attention matrix, whose size may be (28 × 28) × (28 × 28). Because such a matrix cannot directly participate in convolution operations, the electronic device inputs the attention matrix into the multilayer perceptron, which converts it into the target feature, whose size is the same as that of the color RGB image and the near-infrared NIR image. Since the attention matrix can be understood as the attention of the first hidden layer feature relative to the second hidden layer feature (or vice versa), and this relative relationship implicitly expresses the similarity or difference of the features at corresponding positions in the two hidden layer features, the target feature can be used as the parallax feature of the color RGB image and the near-infrared NIR image. A sketch of this computation is given below.
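A minimal sketch of this parallax computation follows, assuming 28 × 28 hidden features, splicing by stacking (concatenation), and a 224 × 224 target size; the widths of the linear layers are illustrative only, and a production model would use a much lighter mapping than the fully connected MLP shown here:

```python
import torch
import torch.nn as nn

class ParallaxAttention(nn.Module):
    def __init__(self, c: int, hw: int = 28 * 28, out_side: int = 224):
        super().__init__()
        # linear transformation -> activation -> linear transformation,
        # scoring every spliced position pair with one attention value
        self.score = nn.Sequential(
            nn.Linear(2 * c, c), nn.ReLU(inplace=True), nn.Linear(c, 1))
        # the multilayer perceptron that maps the (hw x hw) attention
        # matrix to a target feature the size of the input images
        self.mlp = nn.Sequential(
            nn.Linear(hw * hw, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, out_side * out_side))
        self.out_side = out_side

    def forward(self, h_rgb, h_nir):
        # h_rgb, h_nir: (B, C, 28, 28) hidden-layer features
        b, c, h, w = h_rgb.shape
        p = h * w
        f1 = h_rgb.flatten(2).transpose(1, 2)    # (B, P, C)
        f2 = h_nir.flatten(2).transpose(1, 2)    # (B, P, C)
        # splice the feature at every position of one hidden feature
        # with the feature at every position of the other: (B, P, P, 2C)
        pairs = torch.cat(
            [f1.unsqueeze(2).expand(-1, -1, p, -1),
             f2.unsqueeze(1).expand(-1, p, -1, -1)], dim=-1)
        attn = self.score(pairs).squeeze(-1)     # attention matrix (B, P, P)
        target = self.mlp(attn.flatten(1))       # (B, out_side * out_side)
        return target.view(b, 1, self.out_side, self.out_side)  # parallax feature
```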
In this embodiment, an attention matrix between the first hidden layer feature and the second hidden layer feature is calculated, where the attention matrix implicitly includes a similarity between features at corresponding positions in the first hidden layer feature and the second hidden layer feature, and the similarity implicitly expresses a parallax between a color RGB image and a near-infrared NIR image, and the attention matrix is converted into a target feature, which can represent the parallax feature, so as to achieve the purpose of using parallax information to assist in living body detection.
203: and obtaining a living body detection result of the detection object according to the RGB characteristic, the NIR characteristic, the position coding and the parallax characteristic.
In the embodiment of the application, because the size of the position code differs from that of the RGB features, the NIR features, and the parallax feature, the electronic device may interpolate the position code to obtain a position feature with the same size as those features. As shown in fig. 3, the electronic device splices the RGB features, the NIR features, the parallax feature, and the position feature, and inputs the spliced feature into the classifier for classification, obtaining the living body detection result for the detection object (see the sketch below). In this embodiment, the features containing rich semantic information (namely, the RGB and NIR features), the parallax feature, and the position feature are spliced, and the spliced feature, fused with parallax and position information, assists living body detection, which helps improve detection accuracy.
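A sketch of this fusion step, assuming PyTorch tensors and a hypothetical classifier head; the common spatial size everything is resized to is an assumption:

```python
import torch
import torch.nn.functional as F

def fuse_and_classify(feats, pos_code, classifier, size=(28, 28)):
    """feats: [rgb_feat, nir_feat, parallax_feat], each (B, C_i, H_i, W_i);
    pos_code: the (8, d) sinusoidal position code."""
    b = feats[0].shape[0]
    # bring all feature maps to one spatial size before splicing
    resized = [F.interpolate(f, size=size, mode="bilinear",
                             align_corners=False) for f in feats]
    # interpolate the position code into a single-channel position feature
    pos = pos_code.view(1, 1, *pos_code.shape)               # (1, 1, 8, d)
    pos_feat = F.interpolate(pos, size=size, mode="bilinear",
                             align_corners=False).expand(b, -1, -1, -1)
    fused = torch.cat(resized + [pos_feat], dim=1)           # splice on channels
    return classifier(fused)                                 # live-vs-attack logits
```

Here classifier could be, e.g., a small convolutional head ending in a two-way linear layer.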
Illustratively, before the RGB features, the NIR features, the first hidden layer features, the second hidden layer features, and the position codes of the human faces are obtained according to the color RGB image and the near-infrared NIR image of the detection object at the same time, the method further includes:
a: and carrying out face detection on the color RGB image to obtain the position information of the first detection frame.
Specifically, as shown in fig. 4, the living body detection model further includes a face detection branch, which performs face detection on the color RGB image through a YOLO (You Only Look Once) network to obtain the first detection frame position information.
B: first depth information of a color RGB image is acquired.
For example, the parallax feature can be used as input to a pre-trained residual network to predict parallax information a of the color RGB image relative to the near-infrared NIR image, and the first depth information of the color RGB image can then be computed from parallax information a and the camera's calibration parameters. The electronic device may also calculate the first depth information of the color RGB image using the following formula:
depth information = data amount of the image / image size;
For example, the electronic device may calculate depth information a of the color RGB image from parallax information a, calculate depth information b of the color RGB image using the above formula, and use the average, a weighted sum, or a weighted average of depth information a and depth information b as the first depth information. A sketch follows below.
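A sketch of combining the two estimates; the stereo relation depth = focal length × baseline / disparity used for depth information a is the standard formula and an assumption here, since the text only says "calibration parameters":

```python
def first_depth_info(disparity_a: float, focal_len: float, baseline: float,
                     image_bytes: int, width: int, height: int,
                     w_a: float = 0.5) -> float:
    depth_a = focal_len * baseline / disparity_a   # from parallax information a
    depth_b = image_bytes / (width * height)       # the formula above, literally
    return w_a * depth_a + (1.0 - w_a) * depth_b   # weighted average
```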
C: and obtaining a fourth detection frame based on the first depth information and the first detection frame position information.
For example, in step C, obtaining a fourth detection frame based on the first depth information and the first detection frame position information includes:
calculating to obtain an updated width by adopting the first depth information, the width of the first detection frame and the focal length of the RGB camera;
calculating to obtain an updated height by adopting the first depth information, the height of the first detection frame and the focal length of the RGB camera;
and determining a rectangular frame obtained based on the updated width and the updated height as a fourth detection frame.
Specifically, if the width of the first detection frame is w1 and its height is h1, the width W1 and height H1 of the fourth detection frame can be calculated using the following formulas:
[Equation images not reproduced: W1 and H1 are computed from w1, h1, the first depth information z1, and the focal length f.]
where z1 denotes the first depth information and f denotes the focal length of the binocular camera. The electronic device keeps the center of the first detection frame fixed and scales the frame to width W1 and height H1 to obtain the fourth detection frame; that is, the fourth detection frame is the first detection frame after being shrunk. In this embodiment, the face detection frame is adjusted using the camera's calibration parameters and the depth information so that fewer features are used during matching, i.e., only part of the face features participate in matching, which reduces computation and improves matching efficiency. A sketch of the shrinking step is given below.
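A sketch of the shrinking step; since the patent's equation images are not reproduced, the scaling form new_size = size × f / (z × k) below is a placeholder assumption, and only the center-preserving shrink itself is taken from the text:

```python
from dataclasses import dataclass

@dataclass
class Box:
    cx: float  # center x
    cy: float  # center y
    w: float
    h: float

def shrink_box(box: Box, depth_z: float, focal_f: float, k: float = 2.0) -> Box:
    # assumed form of the unreproduced formulas for W1 and H1
    new_w = box.w * focal_f / (depth_z * k)
    new_h = box.h * focal_f / (depth_z * k)
    # keep the center fixed; the frame only shrinks, never grows
    return Box(box.cx, box.cy, min(new_w, box.w), min(new_h, box.h))
```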
D: and for a plurality of near-infrared NIR images to be matched, carrying out face detection on each near-infrared NIR image in the plurality of near-infrared NIR images to obtain third detection frame position information of a face in each near-infrared NIR image.
Specifically, the plurality of near-infrared NIR images include a near-infrared NIR image of the detection object, and the electronic device performs face detection on the plurality of near-infrared NIR images through the face detection branch to obtain third detection frame position information of the face in each near-infrared NIR image.
E: second depth information is acquired for each Near Infrared (NIR) image.
Specifically, the second depth information of each near-infrared NIR image can also be calculated by using the following formula:
depth information = data amount of the image / image size;
f: and obtaining a fifth detection frame based on the second depth information and the third detection frame position information.
Specifically, if the width of the third detection frame is w3 and its height is h3, the width W2 and height H2 of the fifth detection frame can be calculated using the following formulas:
[Equation images not reproduced: W2 and H2 are computed from w3, h3, the second depth information z2, and the focal length f.]
where z2 denotes the second depth information. It should be understood that, in the same manner as for the fourth detection frame, the center of the third detection frame is kept fixed and the frame is scaled to width W2 and height H2 to obtain the fifth detection frame; that is, the fifth detection frame is the third detection frame after being shrunk.
G: and matching the face to be matched in the fifth detection frame with the face of the detection object contained in the fourth detection frame, and determining the near-infrared NIR image from the plurality of near-infrared NIR images.
Specifically, because the fourth detection frame contains only part of the face of the detection object, and the fifth detection frame contains only part of a face to be matched from the database, feature extraction is performed on the face to be matched in the fifth detection frame and on the face in the fourth detection frame, and their similarity is then calculated. The near-infrared NIR image whose face has the highest similarity to the face in the fourth detection frame is taken as the near-infrared NIR image acquired of the detection object at the same time as the color RGB image.
In this embodiment, windows for face matching (i.e., the fourth detection frame and the fifth detection frame) are obtained based on the depth information and the detection frame position information, the faces in the two matching windows are matched, and the near-infrared image containing the matched face is determined to be the near-infrared image acquired synchronously with the RGB image, which helps improve matching efficiency; a sketch of the matching step follows.
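A sketch of the matching step; extract_features is a stand-in for any face-embedding network (a hypothetical helper, not named in the text), and cosine similarity is an assumed choice of similarity measure:

```python
import torch
import torch.nn.functional as F

def match_nir_image(rgb_face_crop, nir_face_crops, extract_features):
    """Return the index of the NIR image whose cropped face is most
    similar to the face cropped from the color RGB image."""
    anchor = extract_features(rgb_face_crop)        # (D,) face embedding
    sims = torch.stack(
        [F.cosine_similarity(anchor, extract_features(c), dim=0)
         for c in nir_face_crops])
    return int(sims.argmax())
```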
In the embodiments of the application, RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces are obtained from the color RGB image and the near-infrared NIR image of the detection object captured at the same time; a parallax feature of the two images is obtained based on the first and second hidden layer features; and a living body detection result for the detection object is obtained according to the RGB features, the NIR features, the position code, and the parallax feature. On the basis of the RGB and NIR features, parallax information derived from the hidden layer features and the position code of the faces in the two images are added for living body prediction. The position code reflects, to some extent, the relative positions of the faces in the two images, and using the parallax information and the position code to assist classification improves the accuracy of living body detection.
Referring to fig. 5, fig. 5 is a schematic flow chart of another in-vivo detection method according to an embodiment of the present application, as shown in fig. 5, the method includes steps 501-505:
501: obtaining RGB characteristics, NIR characteristics, first hidden layer characteristics, second hidden layer characteristics and position codes of human faces according to the color RGB images and the near infrared NIR images of the detection object at the same time;
502: splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to each position in the second hidden layer feature to obtain a spliced feature;
503: performing at least one linear transformation and activation on the spliced feature, and performing a linear transformation on the result to obtain an attention matrix;
504: converting the attention matrix into a target feature, and determining the target feature as the parallax feature;
505: and obtaining a living body detection result of the detection object according to the RGB characteristic, the NIR characteristic, the position coding and the parallax characteristic.
The specific implementation of steps 501-505 has been described in the embodiment shown in fig. 2, and can achieve the same or similar beneficial effects, and will not be described herein again.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a living body detecting apparatus according to an embodiment of the present disclosure, as shown in fig. 6, the apparatus includes a first processing unit 601, a second processing unit 602, and a living body detecting unit 603;
the first processing unit 601 is configured to obtain RGB features, NIR features, first hidden layer features, second hidden layer features, and position codes of faces according to color RGB images and near-infrared NIR images of the detection object at the same time;
a second processing unit 602, configured to obtain a parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
and a living body detection unit 603 for obtaining a living body detection result of the detection object according to the RGB feature, the NIR feature, the position code, and the parallax feature.
It can be seen that the apparatus shown in fig. 6 obtains RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces from a color RGB image and a near-infrared NIR image of the detection object captured at the same time; obtains a parallax feature of the two images based on the first and second hidden layer features; and obtains a living body detection result for the detection object according to the RGB features, the NIR features, the position code, and the parallax feature. On the basis of the RGB and NIR features, parallax information derived from the hidden layer features and the position code of the faces in the two images are added for living body prediction. The position code reflects, to some extent, the relative positions of the faces in the two images, and using the parallax information and the position code to assist classification improves the accuracy of living body detection.
In a possible embodiment, in terms of obtaining the parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature, the second processing unit 602 is specifically configured to:
splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to each position in the second hidden layer feature to obtain a spliced feature;
performing at least one linear transformation and activation on the spliced feature, and performing a linear transformation on the result to obtain an attention matrix;
converting the attention matrix into a target feature, and determining the target feature as the parallax feature; the size of the target feature is the same as that of the color RGB image and the near-infrared NIR image.
In a possible embodiment, in terms of obtaining the position code of the human face according to the color RGB image and the near-infrared NIR image of the detection object at the same time, the first processing unit 601 is specifically configured to:
acquiring first detection frame position information of a face of a detection object in a color RGB image and second detection frame position information of the face of the detection object in a near-infrared NIR image;
and carrying out sine position coding on the position information of the first detection frame and the position information of the second detection frame to obtain position codes.
In a possible implementation, the first processing unit 601 is further configured to:
carrying out face detection on the color RGB image to obtain position information of a first detection frame;
acquiring first depth information of a color RGB image;
obtaining a fourth detection frame based on the first depth information and the first detection frame position information;
for a plurality of near-infrared NIR images to be matched, carrying out face detection on each near-infrared NIR image in the plurality of near-infrared NIR images to obtain third detection frame position information of a face in each near-infrared NIR image;
acquiring second depth information of each near-infrared NIR image;
obtaining a fifth detection frame based on the second depth information and the third detection frame position information;
and matching the face to be matched in the fifth detection frame with the face of the detection object contained in the fourth detection frame, and determining the near infrared NIR image from the plurality of near infrared NIR images.
In one possible embodiment, the color RGB image is captured by an RGB camera of a binocular camera, and the first detection frame position information includes a width and a height of the first detection frame; in terms of obtaining the fourth detection frame based on the first depth information and the first detection frame position information, the first processing unit 601 is specifically configured to:
calculating to obtain an updated width by adopting the first depth information, the width of the first detection frame and the focal length of the RGB camera;
calculating to obtain an updated height by adopting the first depth information, the height of the first detection frame and the focal length of the RGB camera;
and determining a rectangular frame obtained based on the updated width and the updated height as a fourth detection frame.
In one possible embodiment, in terms of obtaining the living body detection result of the detection object according to the RGB feature, the NIR feature, the position code, and the parallax feature, the living body detection unit 603 is specifically configured to:
interpolating the position codes to obtain position characteristics with the same size as the RGB characteristics, the NIR characteristics and the parallax characteristics;
and splicing the RGB characteristics, the NIR characteristics, the parallax characteristics and the position characteristics, and classifying the spliced characteristics to obtain a living body detection result.
In one possible embodiment, the first processing unit 601, the second processing unit 602, and the living body detection unit 603 may implement their functions through a pre-trained living body detection model. The living body detection model includes a first neural network branch, a second neural network branch, an attention network branch, a multilayer perceptron, a position coding branch, and a classifier; the first neural network branch and the second neural network branch are each connected to the attention network branch, the output of the attention network branch is used as the input of the multilayer perceptron, and the outputs of the first neural network branch, the second neural network branch, the multilayer perceptron, and the position coding branch are spliced and used as the input of the classifier.
According to an embodiment of the present application, the units in the living body detection apparatus shown in fig. 6 may be combined, individually or entirely, into one or several other units, or some of the units may be further split into multiple functionally smaller units; either way the same operations can be achieved without affecting the technical effects of the embodiments of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the living body detection apparatus may include other units, and in practical applications these functions may be realized with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the living body detection apparatus shown in fig. 6 may be constructed, and the living body detection method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 5 on a general-purpose computing device, such as a computer comprising processing elements such as a central processing unit (CPU), a random access memory (RAM), a read-only memory (ROM), and storage elements. The computer program may, for example, be recorded on a computer-readable recording medium, and loaded into and executed on the above computing device via that medium.
Based on the description of the method embodiment and the apparatus embodiment, an electronic device is provided in the embodiments of the present application, and please refer to fig. 7, the electronic device at least includes a processor 701, an input device 702, an output device 703, and a computer storage medium 704. The processor 701, the input device 702, the output device 703, and the computer storage medium 704 within the electronic device may be connected by a bus or other means.
A computer storage medium 704 may be stored in the memory of the electronic device, the computer storage medium 704 being used for storing a computer program comprising program instructions, the processor 701 being used for executing the program instructions stored by the computer storage medium 704. The processor 701 (or central processing unit) is a computing core and a control core of an electronic device, and is adapted to implement one or more instructions, and in particular, to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 701 of the electronic device provided by the embodiment of the present application may be used to perform a series of processes of living body detection:
obtaining RGB characteristics, NIR characteristics, first hidden layer characteristics, second hidden layer characteristics and position codes of human faces according to the color RGB images and the near infrared NIR images of the detection object at the same time;
obtaining parallax features of the color RGB image and the near infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
and obtaining a living body detection result of the detection object according to the RGB characteristic, the NIR characteristic, the position coding and the parallax characteristic.
It can be seen that the electronic device shown in fig. 7 obtains RGB features, NIR features, a first hidden layer feature, a second hidden layer feature, and a position code of the faces from a color RGB image and a near-infrared NIR image of the detection object captured at the same time; obtains a parallax feature of the two images based on the first and second hidden layer features; and obtains a living body detection result for the detection object according to the RGB features, the NIR features, the position code, and the parallax feature. On the basis of the RGB and NIR features, parallax information derived from the hidden layer features and the position code of the faces in the two images are added for living body prediction. The position code reflects, to some extent, the relative positions of the faces in the two images, and using the parallax information and the position code to assist classification improves the accuracy of living body detection.
In another embodiment, the processor 701 performs obtaining the parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature, including:
splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to each position in the second hidden layer feature to obtain a spliced feature;
performing at least one linear transformation and activation on the spliced feature, and performing a linear transformation on the result to obtain an attention matrix;
converting the attention matrix into a target feature, and determining the target feature as the parallax feature; the size of the target feature is the same as that of the color RGB image and the near-infrared NIR image.
In another embodiment, the processor 701 performs the obtaining of the position code of the human face according to the color RGB image and the near infrared NIR image of the detection object at the same time, including:
acquiring first detection frame position information of a face of a detection object in a color RGB image and second detection frame position information of the face of the detection object in a near-infrared NIR image;
and carrying out sine position coding on the position information of the first detection frame and the position information of the second detection frame to obtain position codes.
In another embodiment, before obtaining RGB features, NIR features, first hidden layer features, second hidden layer features, and position codes of a human face according to a color RGB image and a near-infrared NIR image of a detection object at the same time, the processor 701 is further configured to:
carrying out face detection on the color RGB image to obtain position information of a first detection frame;
acquiring first depth information of a color RGB image;
obtaining a fourth detection frame based on the first depth information and the first detection frame position information;
for a plurality of near-infrared NIR images to be matched, carrying out face detection on each near-infrared NIR image in the plurality of near-infrared NIR images to obtain third detection frame position information of a face in each near-infrared NIR image;
acquiring second depth information of each near infrared NIR image;
obtaining a fifth detection frame based on the second depth information and the third detection frame position information;
and matching the face to be matched in the fifth detection frame with the face of the detection object contained in the fourth detection frame, and determining the near infrared NIR image from the plurality of near infrared NIR images.
In another embodiment, the color RGB image is captured by an RGB camera of a binocular camera, and the first detection frame position information includes a width and a height of the first detection frame; the processor 701 performs obtaining the fourth detection frame based on the first depth information and the first detection frame position information, including:
calculating to obtain an updated width by adopting the first depth information, the width of the first detection frame and the focal length of the RGB camera;
calculating to obtain an updated height by adopting the first depth information, the height of the first detection frame and the focal length of the RGB camera;
and determining a rectangular frame obtained based on the updated width and the updated height as a fourth detection frame.
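The exact formula is not stated; under a standard pinhole-camera (similar triangles) reading, one plausible interpretation is to rescale the pixel dimensions by depth over focal length, as sketched here:

```python
def updated_box_size(width_px, height_px, depth, focal_length):
    """Sketch under a pinhole-camera assumption: a face of pixel width w
    at depth d subtends a metric width of roughly w * d / f, which makes
    detection frames from different cameras and distances comparable."""
    updated_width = width_px * depth / focal_length
    updated_height = height_px * depth / focal_length
    return updated_width, updated_height
```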
In another embodiment, when the processor 701 obtains the living body detection result of the detection object according to the RGB feature, the NIR feature, the position code and the parallax feature, the processing includes:
interpolating the position code to obtain a position feature with the same size as the RGB feature, the NIR feature and the parallax feature;
and splicing the RGB feature, the NIR feature, the parallax feature and the position feature, and classifying the spliced feature to obtain the living body detection result, as sketched below.
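A minimal sketch of this fusion-and-classification step, assuming PyTorch tensors of shape (B, C, H, W), channel-wise splicing, and that `pos_code` has already been reshaped into a small 2-D map; the `classifier` module is left abstract:

```python
import torch
import torch.nn.functional as F

def fuse_and_classify(rgb_feat, nir_feat, parallax_feat, pos_code, classifier):
    """Sketch: interpolate the position code to the shared feature size,
    splice all four features along the channel axis, then classify."""
    b, _, h, w = rgb_feat.shape
    # position code -> position feature of the same spatial size (B, C_pos, H, W)
    pos_feat = F.interpolate(pos_code, size=(h, w),
                             mode="bilinear", align_corners=False)
    spliced = torch.cat([rgb_feat, nir_feat, parallax_feat, pos_feat], dim=1)
    return classifier(spliced)  # living body detection result (e.g. live/spoof logits)
```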
In still another embodiment, the processor 701 may execute some or all of the steps of the living body detection method through a pre-trained living body detection model. The living body detection model includes a first neural network branch, a second neural network branch, an attention network branch, a multi-layer perceptron, a position coding branch and a classifier; the first neural network branch and the second neural network branch are respectively connected to the attention network branch, the output of the attention network branch serves as the input of the multi-layer perceptron, and the outputs of the first neural network branch, the second neural network branch, the multi-layer perceptron and the position coding branch are spliced and serve as the input of the classifier.
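Only the connectivity between branches is specified; the following sketch wires hypothetical sub-modules accordingly, with every backbone choice (and the assumption that each branch returns both its output feature and a hidden layer feature) left to the implementer:

```python
import torch
import torch.nn as nn

class LivenessDetectionModel(nn.Module):
    """Sketch of the branch wiring described above; the sub-modules are
    placeholders and their internals are assumptions."""

    def __init__(self, first_branch, second_branch, attention_branch,
                 perceptron, position_branch, classifier):
        super().__init__()
        self.first_branch = first_branch        # processes the color RGB image
        self.second_branch = second_branch      # processes the near-infrared NIR image
        self.attention_branch = attention_branch
        self.perceptron = perceptron            # multi-layer perceptron
        self.position_branch = position_branch  # sinusoidal position coding
        self.classifier = classifier

    def forward(self, rgb_image, nir_image, detection_frames):
        # each branch is assumed to return (output feature, hidden layer feature)
        rgb_feat, first_hidden = self.first_branch(rgb_image)
        nir_feat, second_hidden = self.second_branch(nir_image)
        attention = self.attention_branch(first_hidden, second_hidden)
        parallax_feat = self.perceptron(attention)        # attention output feeds the MLP
        pos_feat = self.position_branch(detection_frames)
        # splice all branch outputs and classify; spatial sizes assumed aligned
        spliced = torch.cat([rgb_feat, nir_feat, parallax_feat, pos_feat], dim=1)
        return self.classifier(spliced)
```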
Illustratively, the electronic device may include, but is not limited to, the processor 701, an input device 702, an output device 703 and a computer storage medium 704. The input device 702 may be a keyboard, a touch screen or the like, and the output device 703 may be a speaker, a display, a radio frequency transmitter or the like. Those skilled in the art will appreciate that the schematic diagram is merely an example of an electronic device and does not limit the electronic device, which may include more or fewer components than those shown, combine some components, or include different components.
It should be noted that, since the steps of the living body detection method in the embodiments of the present application are implemented when the processor 701 of the electronic device executes the computer program, all the embodiments of the living body detection method are applicable to the electronic device, and the same or similar beneficial effects can be achieved.
An embodiment of the present application provides a computer storage medium (memory), which is a storage device in an electronic device and is used for storing programs and data. It is understood that the computer storage medium here may include a storage medium built into the electronic device, and may also include an extended storage medium supported by the electronic device. The computer storage medium provides a storage space that stores the operating system of the electronic device. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are suitable for being loaded and executed by the processor 701. The computer storage medium may be a high-speed RAM memory, or a non-volatile memory such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor 701. In one embodiment, the one or more instructions stored in the computer storage medium may be loaded and executed by the processor 701 to perform the corresponding steps of the living body detection method described above.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer memory, ROM, RAM, electrical carrier wave signals, telecommunications signals, software distribution media, and the like.
An embodiment of the present application further provides a computer program product, where the computer program product includes a computer program operable to cause a computer to execute the steps in the living body detecting method. The computer program product may be a software installation package.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present application. The above description of the embodiments is only intended to help understand the method and core concept of the present application; meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific implementation and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A living body detection method, characterized in that the method comprises:
obtaining an RGB feature, an NIR feature, a first hidden layer feature, a second hidden layer feature and a position code of a face according to a color RGB image and a near-infrared NIR image of a detection object captured at the same time;
obtaining a parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
and obtaining a living body detection result of the detection object according to the RGB feature, the NIR feature, the position code and the parallax feature.
2. The method of claim 1, wherein the obtaining of the parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature comprises:
splicing the feature corresponding to each position in the first hidden layer feature with the feature corresponding to the same position in the second hidden layer feature to obtain a spliced feature;
performing at least one linear transformation with activation on the spliced feature, and performing a further linear transformation on the result to obtain an attention matrix;
converting the attention matrix into a target feature, and determining the target feature as the parallax feature, wherein the size of the target feature is the same as that of the color RGB image and the near-infrared NIR image.
3. The method according to claim 1 or 2, wherein the obtaining of the position code of the face according to the color RGB image and the near-infrared NIR image of the detection object captured at the same time comprises:
acquiring first detection frame position information of the face of the detection object in the color RGB image and second detection frame position information of the face of the detection object in the near-infrared NIR image;
and performing sinusoidal position coding on the first detection frame position information and the second detection frame position information to obtain the position code.
4. The method of claim 3, wherein before the RGB feature, the NIR feature, the first hidden layer feature, the second hidden layer feature and the position code of the face are obtained according to the color RGB image and the near-infrared NIR image of the detection object captured at the same time, the method further comprises:
carrying out face detection on the color RGB image to obtain position information of the first detection frame;
acquiring first depth information of the color RGB image;
obtaining a fourth detection frame based on the first depth information and the first detection frame position information;
for a plurality of near-infrared NIR images to be matched, performing face detection on each near-infrared NIR image of the plurality of near-infrared NIR images to obtain third detection frame position information of a face in each near-infrared NIR image;
acquiring second depth information of each near-infrared NIR image;
obtaining a fifth detection frame based on the second depth information and the third detection frame position information;
and matching the face to be matched in the fifth detection frame with the face of the detection object contained in the fourth detection frame, and determining the near-infrared NIR image from the plurality of near-infrared NIR images.
5. The method of claim 4, wherein the color RGB image is captured by an RGB camera of a binocular camera, the first detection frame position information includes a width and a height of the first detection frame, and the obtaining of the fourth detection frame based on the first depth information and the first detection frame position information comprises:
calculating an updated width using the first depth information, the width of the first detection frame and the focal length of the RGB camera;
calculating an updated height using the first depth information, the height of the first detection frame and the focal length of the RGB camera;
and determining a rectangular frame obtained based on the updated width and the updated height as the fourth detection frame.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the living body detection result of the detection object according to the RGB feature, the NIR feature, the position code and the parallax feature comprises:
interpolating the position code to obtain a position feature with the same size as the RGB feature, the NIR feature and the parallax feature;
and splicing the RGB feature, the NIR feature, the parallax feature and the position feature, and classifying the spliced feature to obtain the living body detection result.
7. The method according to any one of claims 1 to 6, wherein the method is performed by a pre-trained living body detection model, the living body detection model comprises a first neural network branch, a second neural network branch, an attention network branch, a multi-layer perceptron, a position coding branch and a classifier, the first neural network branch and the second neural network branch are respectively connected to the attention network branch, an output of the attention network branch serves as an input of the multi-layer perceptron, and outputs of the first neural network branch, the second neural network branch, the multi-layer perceptron and the position coding branch are spliced and serve as an input of the classifier.
8. A living body detection apparatus, characterized in that the apparatus comprises a first processing unit, a second processing unit and a living body detection unit;
the first processing unit is configured to obtain an RGB feature, an NIR feature, a first hidden layer feature, a second hidden layer feature and a position code of a face according to a color RGB image and a near-infrared NIR image of a detection object captured at the same time;
the second processing unit is configured to obtain a parallax feature of the color RGB image and the near-infrared NIR image based on the first hidden layer feature and the second hidden layer feature;
the living body detection unit is configured to obtain a living body detection result of the detection object according to the RGB feature, the NIR feature, the position code and the parallax feature.
9. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and
a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-7.
10. A computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-7.
CN202210283028.5A 2022-03-22 2022-03-22 Living body detection method and related equipment Withdrawn CN114613017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210283028.5A CN114613017A (en) 2022-03-22 2022-03-22 Living body detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210283028.5A CN114613017A (en) 2022-03-22 2022-03-22 Living body detection method and related equipment

Publications (1)

Publication Number Publication Date
CN114613017A 2022-06-10

Family

ID=81864941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210283028.5A Withdrawn CN114613017A (en) 2022-03-22 2022-03-22 Living body detection method and related equipment

Country Status (1)

Country Link
CN (1) CN114613017A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7450668B2 (en) 2022-06-30 2024-03-15 維沃移動通信有限公司 Facial recognition methods, devices, systems, electronic devices and readable storage media
CN115578797A (en) * 2022-09-30 2023-01-06 北京百度网讯科技有限公司 Model training method, image recognition device and electronic equipment
CN115578797B (en) * 2022-09-30 2023-08-29 北京百度网讯科技有限公司 Model training method, image recognition device and electronic equipment

Similar Documents

Publication Publication Date Title
CN107992842B (en) Living body detection method, computer device, and computer-readable storage medium
CN112543347B (en) Video super-resolution method, device, system and medium based on machine vision coding and decoding
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN114613017A (en) Living body detection method and related equipment
WO2019169884A1 (en) Image saliency detection method and device based on depth information
CN111444744A (en) Living body detection method, living body detection device, and storage medium
CN113096185B (en) Visual positioning method, visual positioning device, storage medium and electronic equipment
CN114333078A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN112598597A (en) Training method of noise reduction model and related device
CN111831844A (en) Image retrieval method, image retrieval device, image retrieval apparatus, and medium
CN113591566A (en) Training method and device of image recognition model, electronic equipment and storage medium
WO2018120082A1 (en) Apparatus, method and computer program product for deep learning
CN116611491A (en) Training method and device of target detection model, electronic equipment and storage medium
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
CN111612075A (en) Interest point and descriptor extraction method based on joint feature recombination and feature mixing
Lu et al. Pyramid frequency network with spatial attention residual refinement module for monocular depth estimation
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN117237761A (en) Training method of object re-recognition model, object re-recognition method and device
CN114283152A (en) Image processing method, image processing model training method, image processing device, image processing equipment and image processing medium
CN117011137A (en) Image stitching method, device and equipment based on RGB similarity feature matching
CN112465796A (en) Light field feature extraction method fusing focus stack and full-focus image
CN116934591A (en) Image stitching method, device and equipment for multi-scale feature extraction and storage medium
CN116342776A (en) Three-dimensional scene decoupling method, electronic equipment and storage medium
CN113902995B (en) Multi-mode human behavior recognition method and related equipment
CN113052043B (en) Hand detection method and device for reducing false detection rate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20220610