CN117011918B - Method for constructing human face living body detection model based on linear attention mechanism - Google Patents

Method for constructing human face living body detection model based on linear attention mechanism

Info

Publication number: CN117011918B (other versions: CN117011918A)
Authority: CN (China)
Application number: CN202310992389.1A
Inventors: 田坤, 朱益良, 王健伟, 张忠宇, 王宇达, 张威, 刘叶轩
Original and current assignee: Nanjing Institute of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation as to the accuracy of the list)
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion)
Application filed by Nanjing Institute of Technology; priority to CN202310992389.1A


Classifications

    • G06V40/172 - Human faces: classification, e.g. identification
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F7/523 - Multiplying only
    • G06N3/045 - Combinations of networks
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/08 - Learning methods
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 - Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806 - Fusion of extracted features
    • G06V10/82 - Recognition using neural networks
    • G06V40/168 - Feature extraction; face representation
    • G06V40/45 - Detection of the body part being alive
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for constructing a human face living body detection model based on a linear attention mechanism. The method comprises: extracting face images containing a face from a data set and preprocessing the data; constructing a basic model for extracting features of the face images based on a convolutional neural network to obtain a feature map; and constructing a channel attention layer and a position attention layer to form a complete feature extraction network, which performs feature fusion on the feature map to obtain an advanced feature map. The invention linearly optimizes the soft maximization (softmax) function on the basis of the classical dot-product attention mechanism and changes the multiplication order of the matrix factors based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N); the resulting linear-attention face living body detection model can therefore effectively reduce computational complexity while preserving recognition performance.

Description

Method for constructing human face living body detection model based on linear attention mechanism
Technical Field
The invention relates to the technical field of living body detection, in particular to a method for constructing a human face living body detection model based on a linear attention mechanism.
Background
With the progress of artificial intelligence and face recognition technology, the importance of face living body detection in face recognition systems is increasingly prominent. However, the existing face living body detection methods have problems such as poor user experience, high complexity and strong dependence, so a new face living body detection method is needed. Currently, mainstream face living body detection methods can be divided into methods that require auxiliary information and methods that do not. The former requires the user to give specific action feedback; its results are reliable, but the user experience is poor and the efficiency is low. The latter accords with the future development trend, since it detects using only a face image under visible light.
However, when the parameter count is huge, existing deep learning methods are slow in detection speed and low in precision. To solve these problems, a dual-attention mechanism network is generally introduced to construct the model, so that complex and varied scenes can be handled efficiently: the spatial and channel dependencies in the feature map are captured through a self-attention mechanism, further enhancing the feature representation. But introducing a dual-attention network brings high computational complexity and reduced computational precision, which need improvement.
Therefore, it is necessary to provide a method for constructing a human face living body detection model based on a linear attention mechanism to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a method for constructing a human face living body detection model based on a linear attention mechanism, which linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism and changes the multiplication order of the matrix factors based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N); the resulting linear-attention face living body detection model can therefore effectively reduce computational complexity while preserving recognition performance.
In order to achieve the above object, the present invention provides the following technical solutions: the method for constructing the human face living body detection model based on the linear attention mechanism comprises the following steps:
Step 1, extracting face images containing a face from a data set and preprocessing the data;
Step 2, constructing a basic model for extracting features of face images based on a convolutional neural network to obtain a feature map;
Step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
Step 4, feeding the advanced feature map into a fully-connected network for classification and recognition, completing the ability to distinguish real from fake faces, converting the advanced feature map into a two-dimensional vector, and completing the construction of the face living body detection model;
Step 5, based on the two-dimensional vector output by the fully-connected layer, obtaining a classification result using the binary cross-entropy loss, and performing back propagation to update the network parameters of the face living body detection model;
Step 6, verifying on an unknown test set using the model parameters obtained through training, and comprehensively evaluating the performance of the face living body detection model using the recall rate and the accuracy rate.
In the foregoing method for constructing a face living body detection model based on a linear attention mechanism, in step 1, face images containing a face are extracted from a data set and the data is preprocessed, with the following specific steps:
1.1, creating a 4-dimensional channel representation of the face images and calculating the overall pixel average of the red, green and blue channels of the face images, with the specific formulas as follows:
μ_R = (1/N)·Σᵢ Rᵢ,  μ_G = (1/N)·Σᵢ Gᵢ,  μ_B = (1/N)·Σᵢ Bᵢ
wherein R is Red and represents the red channel of a face image; G is Green and represents the green channel; B is Blue and represents the blue channel;
N is the total number of training set pictures;
μ_R is the average of the R channels of all face images;
μ_G is the average of the G channels of all face images;
μ_B is the average of the B channels of all face images;
i is the picture ordinal;
1.2, subtracting the corresponding average from each pixel value, with the specific formula as follows:
x' = (x − μ)/σ
wherein σ is an added scale factor representing the standard deviation on the training set; specifically, σ = 1;
1.3, randomly shifting, flipping, rotating and scaling each illumination-processed picture to increase the amount of data.
In the foregoing method for constructing a human face living body detection model based on a linear attention mechanism, a basic model for extracting features of face images is constructed based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolutional blocks and a maximum pooling layer, wherein each basic convolutional block consists of a convolutional layer and a batch normalization layer;
the convolution kernel of the convolution layer has the size of 3 multiplied by 3, the number of the convolution kernels is 128, the step length is 1, the activation function is a linear rectification function, and the filling mode is same;
the convolution kernel of the pooling layer is 2×2, with a step size of 2.
In the foregoing method for constructing a human face living body detection model based on a linear attention mechanism, in step 3, a channel attention layer and a position attention layer are constructed to form a complete feature extraction network, and feature fusion is performed on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the position attention layer are as follows:
3.1.1, according to the dot-product attention mechanism, recording the feature map obtained by the convolutional neural network as A, with A ∈ R^(H×W×C); passing A through three convolution layers to obtain the query vector Q ∈ R^(H×W×C), the key vector K ∈ R^(H×W×C) and the value vector V ∈ R^(H×W×C);
wherein H is the feature height,
W is the feature width,
C is the number of channels;
wherein the dot-product attention mechanism is formulated as follows:
s((Q,K),V) = (Q·K^T)·V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transforming (reshaping) the dimensions of A, Q, K and V to R^(N×C), where N = H×W; using the dot product as the attention scoring function together with the soft maximization function, normalizing by row, the attention distribution s ∈ R^(N×N) is calculated as follows:
s = softmax(QK^T)
3.1.3, performing a dot-product operation on the attention distribution s and V to obtain the output vector H ∈ R^(N×C), with the specific calculation formula as follows:
H = sV = softmax(QK^T)V
3.1.4, multiplying the output sequence H by a learnable scale parameter α and summing it element by element with the feature map A, the output being reshaped back to dimension R^(H×W×C), with the specific calculation formula as follows:
M_PA = αH + A
where α is initialized to 0 and progressively learns to assign more weight,
and M_PA is the original position attention mechanism,
the method for constructing the channel attention layer comprises the following specific steps of:
3.2.1 direct use of A withIts transposed matrix A T And soft maximization function to calculate channel attention distribution mapAnd the specific calculation formula is as follows:
x=softmax(A T A)
3.2.2 mapping x onto A, multiplying by a learnable parameter beta, adding A to obtain result, and dimension-transforming the result intoAnd the specific transformation formula is as follows:
E CA =β(Ax)+A
where beta is a parameter learned from 0,
E CA is the sum of the characteristics of all channels weighted and the original characteristics,
the method comprises the specific steps of constructing a channel attention layer and a position attention layer, and forming a complete characteristic extraction network, wherein the specific steps are as follows:
3.3.1, M PA Removing a soft maximization function, performing soft maximization operation on the rows and the columns of Q, calculating the last two terms according to the characteristics of a matrix multiplication combination law to obtain a C×C matrix, and multiplying the Q by the left to obtain a final result, wherein a specific formula is as follows;
E PA =αsoftmax(Q)·(softmax(K T )·V)+A
3.3.2, point E CA And E is PA The dimension transformation is changed from NxC to HxW xC, and the feature fusion is carried out, wherein the specific formula is as follows:
F A =F CA +F PA
wherein F is A Is the result of the fusion of the two attention mechanisms.
In the aforementioned method for constructing a human face living body detection model based on a linear attention mechanism, in step 4, a fully-connected layer is used to map the advanced feature map onto the target space, converting it into a two-dimensional vector; the specific process is as follows:
after a nonlinear transformation, the correlations among the features are extracted from the advanced feature map produced by the preceding network layers and finally mapped to the target feature space, so that the advanced feature map is converted into a two-dimensional vector, thereby completing the construction of the face living body detection model.
In the above-mentioned construction method of the face living body detection model based on a linear attention mechanism, in step 5, based on the two-dimensional vector output by the fully-connected layer, the classification result is obtained using the binary cross-entropy loss, and back propagation is performed to update the network parameters of the face living body detection model;
the binary cross-entropy loss is calculated as follows:
Loss = −(1/N)·Σᵢ [yᵢ·log P(yᵢ) + (1 − yᵢ)·log(1 − P(yᵢ))]
wherein N is the batch size,
yᵢ is the label corresponding to the data,
and P(yᵢ) is the network's prediction for the data, a probability value.
In the above-mentioned construction method of the face living body detection model based on a linear attention mechanism, in step 6, the model parameters obtained by training are verified on an unknown test set, and the recall rate and the accuracy rate are used to comprehensively evaluate the performance of the face living body detection model;
the recall rate is calculated as follows:
Recall = TP / (TP + FN)
where TP is the number of samples predicted to be positive that are actually positive,
and FN is the number of samples predicted to be negative that are actually positive.
Compared with the prior art, the invention has the following beneficial effects:
the invention linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism, namely, the soft maximization function is removed from M_PA and the two original factors are each normalized along their own dimension; the multiplication order of the matrix factors is then changed based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N). This optimizes the computational complexity and reduces the heavy computation introduced by the dual-attention mechanism network, thereby constructing a brand-new face living body detection model based on a linear attention mechanism.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the overall network architecture of the present invention;
Figure 3 shows accuracy-versus-time curves of the present invention,
wherein (a) is the accuracy-time diagram on the CASIA-SURF data set,
and (b) is the accuracy-time diagram on the self-made data set;
Figure 4 shows accuracy-versus-batch curves of the present invention,
wherein (a) is the accuracy-batch diagram on the CASIA-SURF data set,
and (b) is the accuracy-batch diagram on the self-made data set.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.
The invention provides a method for constructing a human face living body detection model based on a linear attention mechanism as shown in fig. 1-4, which comprises the following steps:
step 1, extracting a face image containing a face from a data set, and preprocessing the data, wherein the specific steps are as follows:
1.1, creating a 4-dimensional channel of a face image, and calculating the integral average value of pixels of three channels of red, green and blue of the face image, wherein the specific formula is as follows:
wherein R is Red, which represents the Red of the face image; g is Green, which represents the Green of the face image; b is Blue, which represents Blue of the face image;
n is the total number of training set pictures;
μ R calculating an average value of R channels of all face images;
μ G calculating the average value of the G channels of all face images;
μ B calculating the average value of the B channels of all face images;
i is a picture ordinal number;
1.2, subtracting the average value from each pixel value, wherein the specific formula is as follows:
wherein σ is the added scale factor, representing the standard deviation on the training set, specifically, σ=1;
1.3, carrying out random drifting, overturning, rotating and scaling on each picture in the original data set so as to increase the quantity of data.
In the step, the influence of different illumination pictures on the final classification or neural network under the same scene can be eliminated by creating a 4-dimensional channel of an image, the invariance characteristic of data can be enhanced by carrying out random drifting, overturning, rotating and scaling on each picture in the original data set, so that the number of data is increased, the generalization capability of a training model is improved, and the number of data is increased, namely, the model can identify pictures with different angles and different sizes;
and the feature map is a term in the art, and represents the output of hidden layers of a model, wherein the output of each hidden layer can be called as a feature map, one model has a plurality of hidden layers, namely a plurality of feature maps, in the subsequent model evaluation, only the final performance index and convergence characteristic are generally focused, and the output of the hidden layers in the middle (namely the feature map) is not used as an evaluation index.
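Steps 1.1 and 1.2 above can be sketched in a few lines of numpy (a sketch of ours; the array shapes, function names and the synthetic batch are assumptions, not part of the patent):

```python
import numpy as np

def channel_means(images):
    """images: float array of shape (N, H, W, 3); per-channel pixel means."""
    return images.mean(axis=(0, 1, 2))          # (mu_R, mu_G, mu_B)

def normalize(images, sigma=1.0):
    """Step 1.2: subtract the training-set channel means, divide by sigma."""
    return (images - channel_means(images)) / sigma

rng = np.random.default_rng(0)
batch = rng.uniform(0, 255, size=(8, 32, 32, 3))  # synthetic stand-in data
out = normalize(batch)
# Each channel of the normalized batch now has (near-)zero mean.
```

With σ = 1, as the patent specifies, this reduces to plain mean subtraction; step 1.3's random shifts, flips, rotations and scalings would typically be done with a data-augmentation library on top of this.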
Step 2, constructing a basic model for extracting features of face images based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolution blocks (Block) and a maximum pooling layer, wherein each basic convolution block consists of a convolution layer (Conv2D) and a batch normalization layer (BN);
the convolution kernels of the convolution layers are of size 3×3, the number of convolution kernels is 128, the step length is 1, the activation function is the linear rectification function (ReLU), and the padding mode is "same";
the convolution kernel of the pooling layer is 2×2, and the step length is 2.
In this step, high-level feature information is effectively extracted from the face image through the feature extraction process of the convolutional neural network, providing the basis for the subsequent classification task.
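A shape-only sketch of this backbone (placing the single pooling layer after the four blocks is our assumption; the patent does not state the layer order explicitly): stride-1 "same" convolutions preserve the spatial size and set the channel count to 128, and the 2×2/stride-2 pooling halves the height and width.

```python
def backbone_output_shape(h, w, n_blocks=4, filters=128):
    """Trace (H, W, C) through four 3x3/stride-1/'same' conv blocks
    followed by one 2x2/stride-2 max pooling layer."""
    c = None
    for _ in range(n_blocks):
        c = filters            # 'same' padding, stride 1: H and W unchanged
    return h // 2, w // 2, c   # max pooling halves the spatial dimensions

print(backbone_output_shape(112, 112))  # -> (56, 56, 128)
```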
Step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the position attention layer are as follows:
3.1.1, according to the dot-product attention mechanism, recording the feature map obtained by the convolutional neural network as A, with A ∈ R^(H×W×C); passing A through three convolution layers to obtain the query vector Q ∈ R^(H×W×C), the key vector K ∈ R^(H×W×C) and the value vector V ∈ R^(H×W×C);
wherein H is the feature height,
W is the feature width,
C is the number of channels;
wherein the dot-product attention mechanism is defined as follows:
the dot-product attention mechanism is a mechanism that selectively focuses on specific information during information processing. It concentrates attention on the information relevant to the task and ignores other, irrelevant information, thereby improving the effectiveness of task execution;
the formula of this dot-product attention mechanism is as follows:
s((Q,K),V) = (Q·K^T)·V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transforming (reshaping) the dimensions of A, Q, K and V to R^(N×C), where N = H×W; using the dot product as the attention scoring function together with the soft maximization function, normalizing by row, the attention distribution s ∈ R^(N×N) is calculated as follows:
s = softmax(QK^T)
3.1.3, performing a dot-product operation on the attention distribution s and V to obtain the output vector H ∈ R^(N×C), with the specific calculation formula as follows:
H = sV = softmax(QK^T)V
3.1.4, multiplying the output sequence H by a learnable scale parameter α and summing it element by element with the feature map A, the output being reshaped back to dimension R^(H×W×C), with the specific calculation formula as follows:
M_PA = αH + A
where α is initialized to 0 and progressively learns to assign more weight,
and M_PA is the original position attention mechanism;
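Steps 3.1.2 to 3.1.4 can be sketched in numpy as follows (a sketch of ours; the shapes, seed and values are illustrative assumptions; with α initialized to 0, the layer initially passes A through unchanged, as the text implies):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(A, Q, K, V, alpha=0.0):
    """M_PA = alpha * softmax(Q K^T) V + A, with A, Q, K, V of shape (N, C)."""
    s = softmax(Q @ K.T, axis=-1)   # (N, N) attention distribution, rows sum to 1
    H = s @ V                       # (N, C) output sequence
    return alpha * H + A            # residual connection back to the feature map

rng = np.random.default_rng(1)
N, C = 16, 8
A, Q, K, V = (rng.normal(size=(N, C)) for _ in range(4))
M0 = position_attention(A, Q, K, V, alpha=0.0)   # equals A at initialization
```

Note the (N, N) intermediate matrix s: this is the quadratic cost that step 3.3.1 removes.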
the method for constructing the channel attention layer comprises the following specific steps of:
3.2.1 direct use of A with its transpose matrix A T And soft maximization function to calculate channel attention distribution mapAnd the specific calculation formula is as follows:
x=softmax(A T A)
3.2.2 mapping x onto A, multiplying by a learnable parameter beta, adding A to obtain result, and dimension-transforming the result intoAnd the specific transformation formula is as follows:
E CA =β(Ax)+A
where beta is a parameter learned from 0,
E CA is the sum of the characteristics of all channels weighted and the original characteristics,
E CA the long-distance dependency relationship among the channels of the feature map is established, the feature resolvability is improved, and the semantic relativity among the channels is fully utilized;
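A matching numpy sketch of steps 3.2.1 and 3.2.2 (again with our illustrative shapes; with β learned from 0, the branch initially returns A itself):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(A, beta=0.0):
    """E_CA = beta * (A x) + A with x = softmax(A^T A), A of shape (N, C)."""
    x = softmax(A.T @ A, axis=-1)   # (C, C) channel attention distribution map
    return beta * (A @ x) + A       # weighted channel mixture plus residual

rng = np.random.default_rng(2)
A = rng.normal(size=(16, 8))
E0 = channel_attention(A, beta=0.0)   # equals A at initialization
```

Because the intermediate matrix here is only C×C, this branch is already linear in the number of positions N; only the position branch needs the reordering of step 3.3.1.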
the method comprises the specific steps of constructing a channel attention layer and a position attention layer, and forming a complete characteristic extraction network, wherein the specific steps are as follows:
3.3.1, M PA Soft maximization function extraction and column of Q and K are doneSoft maximization operation, calculating the last two terms according to the characteristic of a matrix multiplication combination law to obtain a matrix of C multiplied by C, and multiplying Q by the left to obtain a final result, wherein a specific formula is as follows;
E PA =αsoftmax(Q)(softmax(K T )·V)+A
The associativity of matrix multiplication means that matrix multiplication satisfies the associative law; specifically, for three matrices A, B and C the following relationship holds:
(A*B)*C = A*(B*C)
that is, whether A is first multiplied by B and the result by C, or B is first multiplied by C and the result left-multiplied by A, the result obtained is the same;
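This identity is easy to verify numerically (an illustrative check of ours with arbitrary small matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 5))   # shapes chosen so both groupings are defined
B = rng.normal(size=(5, 6))
C = rng.normal(size=(6, 3))

left = (A @ B) @ C            # multiply A by B first
right = A @ (B @ C)           # multiply B by C first
assert np.allclose(left, right)  # same (4, 3) result either way
```

The two groupings agree up to floating-point rounding, while their costs differ whenever the inner dimensions differ, which is what the invention exploits.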
3.3.2, reshaping E_CA and E_PA from dimension N×C back to H×W×C and performing feature fusion, with the formula as follows:
F_A = F_CA + F_PA
wherein F_CA and F_PA are the reshaped E_CA and E_PA, and F_A is the result of fusing the two attention mechanisms;
in the present embodiment, E_PA selectively aggregates the features of each position with those of the other positions, realizing mutual reinforcement between positions and improving semantic consistency, while E_CA establishes long-range dependencies between the channels of the feature map, improves feature discriminability, and makes full use of the semantic correlation between channels.
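Steps 3.3.1 and 3.3.2 can be sketched as follows (our sketch; the shapes and the stand-in channel branch are illustrative assumptions). Note that softmax(K^T)·V is a C×C matrix computed first, so the cost is linear in N:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def linear_position_attention(A, Q, K, V, alpha=1.0):
    """E_PA = alpha * softmax(Q) (softmax(K^T) V) + A, all inputs (N, C)."""
    Qn = softmax(Q, axis=1)    # soft maximization over each row of Q
    Kn = softmax(K, axis=0)    # soft maximization over each column of K
    return alpha * (Qn @ (Kn.T @ V)) + A  # (C, C) factor computed first

rng = np.random.default_rng(4)
N, C = 16, 8
A, Q, K, V = (rng.normal(size=(N, C)) for _ in range(4))
E_PA = linear_position_attention(A, Q, K, V)
E_CA = A                       # stand-in for the channel branch output
F_A = E_CA + E_PA              # step 3.3.2: element-wise feature fusion
```

No N×N matrix is ever materialized here, which is the whole point of the linearized formulation.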
Step 4, feed the advanced feature map into a fully connected network for classification and recognition, so as to distinguish real faces from fake faces; the advanced feature map is converted into a two-dimensional vector, completing the construction of the face living body detection model. The specific steps are as follows:
the advanced feature map extracted by the preceding network layers undergoes a nonlinear change, the association features among its features are extracted, and it is finally mapped onto the target feature space so as to be converted into a two-dimensional vector, thereby completing the construction of the face living body detection model;
wherein the fully connected layer acts as the classifier of the whole convolutional neural network;
the preceding network layers refer to the convolutional network followed by the attention network;
the feature map is mapped onto the target feature space and then converted into a two-dimensional vector.
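The classifier head described above can be sketched as follows. The layer sizes, weights and the ReLU hidden layer are illustrative assumptions, not the patent's exact architecture; the sketch only shows the flatten, nonlinear change, and mapping to a two-dimensional output.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_head(feature_map, W1, b1, W2, b2):
    """Sketch of the fully connected classifier head: the fused H x W x C
    feature map is flattened, passed through one hidden layer with a ReLU
    nonlinearity, and mapped to a 2-dimensional vector (real vs. fake).
    """
    x = feature_map.reshape(-1)          # flatten to a vector
    h = np.maximum(0, W1 @ x + b1)       # nonlinear change (ReLU)
    return W2 @ h + b2                   # 2-dimensional output

H, W, C, hidden = 8, 8, 16, 32
fmap = rng.standard_normal((H, W, C))
W1, b1 = rng.standard_normal((hidden, H * W * C)), np.zeros(hidden)
W2, b2 = rng.standard_normal((2, hidden)), np.zeros(2)
logits = fc_head(fmap, W1, b1, W2, b2)
print(logits.shape)  # (2,)
```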
Step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and carrying out back propagation to finish network parameter updating of the human face living body detection model;
the binary cross entropy loss calculation formula is as follows:
L = −(1/N) · Σ_{i=1}^{N} [ y_i · log P(y_i) + (1 − y_i) · log(1 − P(y_i)) ]
wherein N is the size of the batch, y_i is the label corresponding to the data, and P(y_i) is the result of the network's prediction for the data; since the activation function of the last layer is a soft maximization function, P(y_i) is a probability value;
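The loss above can be sketched in a minimal NumPy version; the clipping constant is an implementation-side assumption to avoid log(0), not part of the patent's formula.

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Sketch of the binary cross entropy loss
    L = -(1/N) * sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
    where N is the batch size, y_i the label and p_i = P(y_i) the
    network's predicted probability for sample i.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0, 0.0])
p = np.array([0.9, 0.1, 0.8, 0.2])
print(round(binary_cross_entropy(p, y), 4))  # 0.1643
```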
in this embodiment, the parameter updates drive the face living body detection model based on the linear attention mechanism to convergence, as shown in figs. 2-3, that is, the convergence curve tends to become smooth.
Step 6, verify on the unknown test set using the model parameters obtained through training, and comprehensively evaluate the performance of the human face living body detection model using the recall rate and the accuracy rate;
the calculation formula of the recall rate is as follows:
Recall = TP / (TP + FN)
wherein TP is the number of samples predicted to be positive that are actually positive, and FN is the number of samples predicted to be negative that are actually positive;
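A minimal sketch of the two evaluation metrics, on toy labels rather than the experiment's data: TP counts positives predicted positive, and the predicted-negative-actually-positive count defined alongside it enters the denominator of the recall.

```python
import numpy as np

def recall_and_accuracy(pred, actual):
    """Sketch of the evaluation metrics: recall = TP / (TP + FN), where
    TP counts samples predicted positive that are actually positive and
    FN counts samples predicted negative that are actually positive;
    accuracy is the fraction of all predictions that are correct.
    """
    pred, actual = np.asarray(pred), np.asarray(actual)
    tp = np.sum((pred == 1) & (actual == 1))
    fn = np.sum((pred == 0) & (actual == 1))
    recall = tp / (tp + fn)
    accuracy = np.mean(pred == actual)
    return recall, accuracy

r, a = recall_and_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(r, a)  # recall = 2/3, accuracy = 3/5
```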
in this embodiment, after the training of the face living body detection model based on the linear attention mechanism is finished, the performance of the model is evaluated to verify whether the model is valid.
In summary, step 1 describes the input of the human face living body detection model based on the linear attention mechanism, hereinafter referred to as the model; the convolutional network in step 2 is the first stage of the model and mainly completes image feature extraction; the attention mechanism algorithm in step 3 is the second stage of the model and enhances the features from step 2, which amounts to screening the feature information closely related to the task; the fully connected network in step 4 is a classifier that classifies and recognizes the features from step 3 and judges whether the input is a real or a fake face; step 5 provides the training method of the model, that is, the model described in steps 1-4 is trained so that the network can update its parameters and converge; and step 6 evaluates the model trained in step 5 to confirm the validity of the model described in the invention.
Verification test
In order to verify the effectiveness of the method, the RGB images in the CASIA-SURF database are extracted by the method and preprocessed to generate an enhanced data set, of which 75% is randomly divided into a training set and 25% into a test set;
the training set contains 22046 real faces and 50393 fraudulent faces (72439 face pictures in total), and the test set contains 7348 real faces and 16797 fraudulent faces (24145 face pictures in total); Table 1 is prepared from the experimental results;
TABLE 1
Method              Fake face recall rate  Real face recall rate  Accuracy rate
Conventional model  99.7458%               99.8860%               99.8426%
The invention       99.8528%               99.8560%               99.8550%
In order to avoid experimental contingency, a video-replay-attack living body detection data set containing 14500 fake faces and 9340 real faces is prepared and processed in the same way, so that the linear attention mechanism model of the application and the conventional attention mechanism model are evaluated; Table 2 is prepared from the experimental results;
TABLE 2
Method              Fake face recall rate  Real face recall rate  Accuracy rate
Conventional model  99.9153%               99.8025%               99.9315%
The invention       99.9717%               99.9564%               99.9828%
As can be seen from fig. 4, on the CASIA-SURF dataset all three networks reach convergence after about 75 training batches, and on the homemade dataset they all reach convergence after about 125 batches. On both datasets, the accuracy curves of the two networks essentially coincide: after the computational complexity is optimized, the performance of the network remains close to that of the original network. Tables 1 and 2 above show the best results achieved by training the three networks on the CASIA-SURF dataset and on the homemade dataset, respectively. On the CASIA-SURF dataset, the accuracy of the linear attention network herein is about 99.86%, versus about 99.84% using the conventional attention mechanism; on the homemade dataset, the model herein achieves an accuracy of about 99.98%, versus about 99.93% using the conventional attention mechanism. It can be seen that the modification of the conventional attention mechanism has no significant impact on model performance;
the comparison results are shown in Tables 1 and 2; the performances of the noted networks are about the same. As shown in fig. 3 (a) and (b), the linear attention network greatly improves the training speed at the same performance, and the larger the pixel count of the test-set pictures, the more obvious the efficiency gain brought by the linear attention network. The results show that the application's modification of the attention network is feasible and successful and presents certain innovations.
In conclusion, the invention linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism: the soft maximization function is removed from M_PA, the two original factors are each normalized along their respective dimensions, and the multiplication order of the matrix factors is changed on the basis of the associativity of matrix multiplication, so that the original complexity O(N^2) is reduced to O(N); the face living body detection model with the linear attention mechanism can thus effectively reduce the computational complexity while maintaining the recognition performance;
experiments on the published human face living detection data set CASIA-SURF and the homemade data set show that under the condition of the same training steps, the training time can be shortened by about 1/8, and the proportion of the shortened training time is further increased along with the increase of the size of input pictures, so that the accuracy is higher than that of a conventional attention mechanism, and the accuracy is respectively up to 99.8550% and 99.9828%, and the recall rates of a real human face and a fake human face are effectively balanced.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the invention, which is defined by the appended claims.

Claims (6)

1. A method for constructing a human face living body detection model based on a linear attention mechanism, characterized by comprising the following steps:
step 1, extracting a face image containing a face from a data set, and preprocessing the data;
step 2, constructing a basic model of feature extraction face images based on a convolutional neural network to obtain a feature map;
step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
step 4, accessing the advanced feature map into a fully-connected network for classification and identification, completing the identification capability of the true face and the false face, converting the advanced feature map into a two-dimensional vector, and completing the construction of the human face living body detection model;
step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and carrying out back propagation to finish network parameter updating of the human face living body detection model;
step 6, verifying the unknown test set by using the model parameters obtained through training, and comprehensively evaluating the performance of the human face living body detection model by using the recall rate and the accuracy rate;
wherein, in step 3:
constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the dot-product attention mechanism are as follows:
3.1.1, following the dot-product attention mechanism, record the feature map obtained by the convolutional neural network as A; pass A through three convolution layers to obtain the query vector Q, the key vector K and the value vector V, each with the same dimension H×W×C as A,
wherein H is the height of the feature,
W is the width of the feature,
C is the number of channels;
wherein the dot-product attention mechanism is formulated as follows:
s(Q, K, V) = (Q · K^T) · V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transform the dimensions of A, Q, K and V into N×C, where N = H×W; using the dot-product operation as the scoring function of the attention and the soft maximization function for row-wise normalization, calculate the attention distribution s, the specific calculation formula being as follows:
s = softmax(Q · K^T)
3.1.3, perform a dot-product operation between the attention distribution s and V to obtain the output vector H, the specific calculation formula being as follows:
H = s · V = softmax(Q · K^T) · V
3.1.4, multiply the output sequence H by a learnable scale parameter α and sum it element by element with the feature map A; the output is transformed back to the dimension H×W×C, the specific calculation formula being as follows:
M_PA = α · H + A
wherein α is initialized to 0 and gradually learns to assign more weight,
and M_PA is the output of the original position attention mechanism;
the specific steps of constructing the channel attention layer are as follows:
3.2.1, directly use A and its transposed matrix A^T together with the soft maximization function to calculate the channel attention distribution map x, the specific calculation formula being as follows:
x = softmax(A^T · A)
3.2.2, map x onto A, multiply by a learnable parameter β, and add A to obtain the result; the result is transformed to the dimension H×W×C, the specific transformation formula being as follows:
E_CA = β · (A · x) + A
wherein β is a parameter learned from 0,
and E_CA is the weighted sum of the features of all channels added to the original features;
the specific steps of constructing the position attention layer and forming the complete feature extraction network are as follows:
3.3.1, remove the soft maximization function from M_PA, perform a soft maximization operation on the rows of Q and the columns of K, compute the last two factors first according to the associativity of matrix multiplication to obtain a C×C matrix, and left-multiply by softmax(Q) to obtain the final result, the specific formula being as follows:
E_PA = α · softmax(Q) · (softmax(K^T) · V) + A
3.3.2, reshape E_CA and E_PA from the dimension N×C to H×W×C (denoted F_CA and F_PA) and perform feature fusion, the specific formula being as follows:
F_A = F_CA + F_PA
wherein F_A is the result of the fusion of the two attention mechanisms.
2. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 1, a face image including a face is extracted from a data set, and data preprocessing is performed, specifically the steps are as follows:
1.1, create a 4-dimensional channel of the face image and calculate the overall average pixel value of each of the red, green and blue channels of the face images, the specific formulas being as follows:
μ_R = (1/N) · Σ_{i=1}^{N} R_i,  μ_G = (1/N) · Σ_{i=1}^{N} G_i,  μ_B = (1/N) · Σ_{i=1}^{N} B_i
wherein R is Red and represents the red channel of the face image; G is Green and represents the green channel of the face image; B is Blue and represents the blue channel of the face image;
N is the total number of training-set pictures;
μ_R, μ_G and μ_B are the calculated averages of the R, G and B channels of all face images;
and i is the picture ordinal number;
1.2, subtract the average value from each pixel value, the specific formula being as follows:
x' = (x − μ) / σ
wherein x is a pixel value, μ is the corresponding channel average, and σ is an added scale factor representing the standard deviation on the training set; specifically, σ = 1;
1.3, perform random shifting, flipping, rotation and scaling on each picture after the illumination processing, to increase the amount of data.
3. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 2, a basic model for extracting features of face images is constructed based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolution blocks and a maximum pooling layer, wherein each basic convolution block consists of a convolution layer and a batch normalization layer;
the convolution kernels of the convolution layers have a size of 3×3, the number of convolution kernels is 128, the stride is 1, the activation function is the linear rectification function, and the padding mode is 'same';
the pooling window of the pooling layer is 2×2, with a stride of 2.
4. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 4, the advanced feature map is fed into a fully connected network for classification and recognition so as to distinguish real faces from fake faces, and the advanced feature map is converted into a two-dimensional vector, completing the construction of the human face living body detection model; the specific process is as follows:
the advanced feature map extracted by the preceding network layers undergoes a nonlinear change, the correlation features among the features are extracted, and the map is finally projected onto the target feature space so as to convert the advanced feature map into a two-dimensional vector, thereby completing the construction of the human face living body detection model.
5. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and performing back propagation to finish network parameter updating of the human face living body detection model;
the binary cross entropy loss calculation formula is as follows:
L = −(1/N) · Σ_{i=1}^{N} [ y_i · log P(y_i) + (1 − y_i) · log(1 − P(y_i)) ]
wherein N is the size of the batch, y_i is the label corresponding to the data, and P(y_i), the result of the network's prediction for the data, is a probability value.
6. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 6, verifying on an unknown test set by using the model parameters obtained through training, and comprehensively evaluating the performance of the human face living body detection model by using recall rate and accuracy;
the calculation formula of the recall rate is as follows:
Recall = TP / (TP + FN)
wherein TP is the number of samples predicted to be positive that are actually positive, and FN is the number of samples predicted to be negative that are actually positive.
CN202310992389.1A 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism Active CN117011918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310992389.1A CN117011918B (en) 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism

Publications (2)

Publication Number Publication Date
CN117011918A CN117011918A (en) 2023-11-07
CN117011918B true CN117011918B (en) 2024-03-26

Family

ID=88575810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310992389.1A Active CN117011918B (en) 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism

Country Status (1)

Country Link
CN (1) CN117011918B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961062A (en) * 2019-04-16 2019-07-02 Beijing Megvii Technology Co., Ltd. Image recognition method, device, terminal and readable storage medium
CN110084113A (en) * 2019-03-20 2019-08-02 Alibaba Group Holding Ltd. Living body detection method, device, system, server and readable storage medium
CN110991432A (en) * 2020-03-03 2020-04-10 Alipay (Hangzhou) Information Technology Co., Ltd. Living body detection method, living body detection device, electronic equipment and living body detection system
CN111401436A (en) * 2020-03-13 2020-07-10 Beijing Technology and Business University Streetscape image segmentation method fusing network and two-channel attention mechanism
CN111460931A (en) * 2020-03-17 2020-07-28 South China University of Technology Face spoofing detection method and system based on color channel difference image characteristics
CN111767954A (en) * 2020-06-30 2020-10-13 Suzhou Keda Technology Co., Ltd. Vehicle fine-grained identification model generation method, system, equipment and storage medium
CN112580782A (en) * 2020-12-14 2021-03-30 East China University of Science and Technology Channel-enhancement-based dual-attention generative adversarial network and image generation method
CN113435353A (en) * 2021-06-30 2021-09-24 Ping An Technology (Shenzhen) Co., Ltd. Multi-mode-based living body detection method and device, electronic equipment and storage medium
WO2021208687A1 (en) * 2020-11-03 2021-10-21 Ping An Technology (Shenzhen) Co., Ltd. Human-face detection model training method, device, medium, and human-face detection method
CN113658165A (en) * 2021-08-25 2021-11-16 Ping An Technology (Shenzhen) Co., Ltd. Cup-to-disc ratio determining method, device, equipment and storage medium
CN113780209A (en) * 2021-09-16 2021-12-10 Zhejiang University of Technology Human face attribute editing method based on attention mechanism
CN113989906A (en) * 2021-11-26 2022-01-28 Jiangsu University of Science and Technology Face recognition method
CN115082994A (en) * 2022-06-27 2022-09-20 Ping An Bank Co., Ltd. Face living body detection method, and training method and device of living body detection network model
CN116152523A (en) * 2022-12-06 2023-05-23 Mashang Consumer Finance Co., Ltd. Image detection method, device, electronic equipment and readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Attention-Guided Network for Iris Presentation Attack Detection; Cunjian Chen; arXiv; 1-10 *
Dual Attention Network for Scene Segmentation; Jun Fu et al.; CVPR 2019; 3146-3154 *
Multiple-Attention Mechanism Network for Semantic Segmentation; Dongli Wang et al.; Sensors; 2022-06-13; 1-16 *
Visual Attention Methods in Deep Learning: An In-Depth Survey; Mohammed Hassanin et al.; arXiv; 2022-04-21; 1-20 *
Vehicle re-identification algorithm based on a three-dimensional attention mechanism; Fang Yance et al.; Computer Measurement & Control; 2022-07-25; vol. 30, no. 7; 194-200 *
Face living body detection based on a multi-scale dual-channel network; Ren Tuo et al.; Journal of North University of China (Natural Science Edition); vol. 44, no. 3; 325-332 *
Research progress on image semantic segmentation technology based on deep learning; Liang Xinyu et al.; Computer Engineering and Applications; 2019-11-13; vol. 56, no. 2; 18-28 *


Similar Documents

Publication Publication Date Title
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN110245665B (en) Image semantic segmentation method based on attention mechanism
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN115035418A (en) Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN114066871B (en) Method for training new coronal pneumonia focus area segmentation model
CN114694039A (en) Remote sensing hyperspectral and laser radar image fusion classification method and device
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN116740121A (en) Straw image segmentation method based on special neural network and image preprocessing
CN112329771A (en) Building material sample identification method based on deep learning
CN116704188A (en) Wheat grain image segmentation algorithm with different volume weights based on improved U-Net network
CN116452862A (en) Image classification method based on domain generalization learning
CN115171074A (en) Vehicle target identification method based on multi-scale yolo algorithm
CN114581789A (en) Hyperspectral image classification method and system
CN112528077B (en) Video face retrieval method and system based on video embedding
CN114049503A (en) Saliency region detection method based on non-end-to-end deep learning network
CN117611925A (en) Multi-source remote sensing image classification method based on graph neural network and convolution network
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN113505856A (en) Hyperspectral image unsupervised self-adaptive classification method
CN115641445B (en) Remote sensing image shadow detection method integrating asymmetric inner convolution and Transformer
CN117011918B (en) Method for constructing human face living body detection model based on linear attention mechanism
Yu et al. MagConv: Mask-guided convolution for image inpainting
Zhao et al. MSRF-Net: multiscale receptive field network for building detection from remote sensing images
CN110992320A (en) Medical image segmentation network based on double interleaving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant