CN117011918B - Method for constructing human face living body detection model based on linear attention mechanism - Google Patents

Method for constructing human face living body detection model based on linear attention mechanism

Info

Publication number: CN117011918B (other versions: CN117011918A)
Authority: CN (China)
Application number: CN202310992389.1A
Inventors: 田坤, 朱益良, 王健伟, 张忠宇, 王宇达, 张威, 刘叶轩
Original and current assignee: Nanjing Institute of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation as to the accuracy of the list)
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion)
Application filed by Nanjing Institute of Technology; priority to CN202310992389.1A


Classifications

    • G06V40/172 - Human faces: classification, e.g. identification
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F7/523 - Multiplying only
    • G06N3/045 - Combinations of networks
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/08 - Learning methods
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 - Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806 - Fusion of extracted features
    • G06V10/82 - Recognition using neural networks
    • G06V40/168 - Feature extraction; face representation
    • G06V40/45 - Detection of the body part being alive
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for constructing a human face living body detection model based on a linear attention mechanism. The method comprises: extracting face images containing a face from a data set and preprocessing the data; constructing a basic model for extracting features of the face images based on a convolutional neural network to obtain a feature map; and constructing a channel attention layer and a position attention layer to form a complete feature extraction network, which performs feature fusion on the feature map to obtain an advanced feature map. The invention linearly optimizes the soft maximization (softmax) function on the basis of the classical dot-product attention mechanism and changes the multiplication order of the matrix factors based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N); the resulting linear-attention face living body detection model can therefore effectively reduce computational complexity while preserving recognition performance.

Description

Method for constructing human face living body detection model based on linear attention mechanism
Technical Field
The invention relates to the technical field of living body detection, in particular to a method for constructing a human face living body detection model based on a linear attention mechanism.
Background
With the progress of artificial intelligence and face recognition technology, the importance of face living body detection in face recognition systems is increasingly prominent. However, the existing face living body detection methods have problems such as poor user experience, high complexity and strong dependence, so a new face living body detection method is needed. Currently, mainstream face living body detection methods can be divided into methods that require auxiliary information and methods that do not. The former requires the user to give specific action feedback; its results are reliable, but the user experience is poor and the efficiency is low. The latter accords with the future development trend, since it detects using only a face image under visible light.
However, when the parameter count is huge, existing deep learning methods are slow in detection speed and low in precision. To solve these problems, a dual-attention mechanism network is generally introduced to construct the model, so that complex and varied scenes can be handled efficiently: the spatial and channel dependencies in the feature map are captured through a self-attention mechanism, further enhancing the feature representation. But introducing a dual-attention network brings high computational complexity and reduced computational precision, which need improvement.
Therefore, it is necessary to provide a method for constructing a human face living body detection model based on a linear attention mechanism to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a method for constructing a human face living body detection model based on a linear attention mechanism, which linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism and changes the multiplication order of the matrix factors based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N); the resulting linear-attention face living body detection model can therefore effectively reduce computational complexity while preserving recognition performance.
In order to achieve the above object, the present invention provides the following technical solutions: the method for constructing the human face living body detection model based on the linear attention mechanism comprises the following steps:
Step 1, extracting face images containing a face from a data set and preprocessing the data;
Step 2, constructing a basic model for extracting features of face images based on a convolutional neural network to obtain a feature map;
Step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
Step 4, feeding the advanced feature map into a fully-connected network for classification and recognition, completing the ability to distinguish real from fake faces, converting the advanced feature map into a two-dimensional vector, and completing the construction of the face living body detection model;
Step 5, based on the two-dimensional vector output by the fully-connected layer, obtaining a classification result using the binary cross-entropy loss, and performing back propagation to update the network parameters of the face living body detection model;
Step 6, verifying on an unknown test set using the model parameters obtained through training, and comprehensively evaluating the performance of the face living body detection model using the recall rate and the accuracy rate.
In the foregoing method for constructing a face living body detection model based on a linear attention mechanism, in step 1, face images containing a face are extracted from a data set and the data is preprocessed, with the following specific steps:
1.1, creating a 4-dimensional channel representation of the face images and calculating the overall pixel average of the red, green and blue channels of the face images, with the specific formulas as follows:
μ_R = (1/N)·Σᵢ Rᵢ,  μ_G = (1/N)·Σᵢ Gᵢ,  μ_B = (1/N)·Σᵢ Bᵢ
wherein R is Red and represents the red channel of a face image; G is Green and represents the green channel; B is Blue and represents the blue channel;
N is the total number of training set pictures;
μ_R is the average of the R channels of all face images;
μ_G is the average of the G channels of all face images;
μ_B is the average of the B channels of all face images;
i is the picture ordinal;
1.2, subtracting the corresponding average from each pixel value, with the specific formula as follows:
x' = (x − μ)/σ
wherein σ is an added scale factor representing the standard deviation on the training set; specifically, σ = 1;
1.3, randomly shifting, flipping, rotating and scaling each illumination-processed picture to increase the amount of data.
In the foregoing method for constructing a human face living body detection model based on a linear attention mechanism, a basic model for extracting features of face images is constructed based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolutional blocks and a maximum pooling layer, wherein each basic convolutional block consists of a convolutional layer and a batch normalization layer;
the convolution kernel of the convolution layer has the size of 3 multiplied by 3, the number of the convolution kernels is 128, the step length is 1, the activation function is a linear rectification function, and the filling mode is same;
the convolution kernel of the pooling layer is 2×2, with a step size of 2.
In the foregoing method for constructing a human face living body detection model based on a linear attention mechanism, in step 3, a channel attention layer and a position attention layer are constructed to form a complete feature extraction network, and feature fusion is performed on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the position attention layer are as follows:
3.1.1, according to the dot-product attention mechanism, recording the feature map obtained by the convolutional neural network as A, with A ∈ R^(H×W×C); passing A through three convolution layers to obtain the query vector Q ∈ R^(H×W×C), the key vector K ∈ R^(H×W×C) and the value vector V ∈ R^(H×W×C);
wherein H is the feature height,
W is the feature width,
C is the number of channels;
wherein the dot-product attention mechanism is formulated as follows:
s((Q,K),V) = (Q·K^T)·V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transforming (reshaping) the dimensions of A, Q, K and V to R^(N×C), where N = H×W; using the dot product as the attention scoring function together with the soft maximization function, normalizing by row, the attention distribution s ∈ R^(N×N) is calculated as follows:
s = softmax(QK^T)
3.1.3, performing a dot-product operation on the attention distribution s and V to obtain the output vector H ∈ R^(N×C), with the specific calculation formula as follows:
H = sV = softmax(QK^T)V
3.1.4, multiplying the output sequence H by a learnable scale parameter α and summing it element by element with the feature map A, the output being reshaped back to dimension R^(H×W×C), with the specific calculation formula as follows:
M_PA = αH + A
where α is initialized to 0 and progressively learns to assign more weight,
and M_PA is the original position attention mechanism,
the method for constructing the channel attention layer comprises the following specific steps of:
3.2.1 direct use of A withIts transposed matrix A T And soft maximization function to calculate channel attention distribution mapAnd the specific calculation formula is as follows:
x=softmax(A T A)
3.2.2 mapping x onto A, multiplying by a learnable parameter beta, adding A to obtain result, and dimension-transforming the result intoAnd the specific transformation formula is as follows:
E CA =β(Ax)+A
where beta is a parameter learned from 0,
E CA is the sum of the characteristics of all channels weighted and the original characteristics,
the method comprises the specific steps of constructing a channel attention layer and a position attention layer, and forming a complete characteristic extraction network, wherein the specific steps are as follows:
3.3.1, M PA Removing a soft maximization function, performing soft maximization operation on the rows and the columns of Q, calculating the last two terms according to the characteristics of a matrix multiplication combination law to obtain a C×C matrix, and multiplying the Q by the left to obtain a final result, wherein a specific formula is as follows;
E PA =αsoftmax(Q)·(softmax(K T )·V)+A
3.3.2, point E CA And E is PA The dimension transformation is changed from NxC to HxW xC, and the feature fusion is carried out, wherein the specific formula is as follows:
F A =F CA +F PA
wherein F is A Is the result of the fusion of the two attention mechanisms.
In the aforementioned method for constructing a human face living body detection model based on a linear attention mechanism, in step 4, a fully-connected layer is used to map the advanced feature map onto the target space, converting it into a two-dimensional vector; the specific process is as follows:
after a nonlinear transformation, the correlations among the features are extracted from the advanced feature map produced by the preceding network layers and finally mapped to the target feature space, so that the advanced feature map is converted into a two-dimensional vector, thereby completing the construction of the face living body detection model.
In the above-mentioned construction method of the face living body detection model based on a linear attention mechanism, in step 5, based on the two-dimensional vector output by the fully-connected layer, the classification result is obtained using the binary cross-entropy loss, and back propagation is performed to update the network parameters of the face living body detection model;
the binary cross-entropy loss is calculated as follows:
Loss = −(1/N)·Σᵢ [yᵢ·log P(yᵢ) + (1 − yᵢ)·log(1 − P(yᵢ))]
wherein N is the batch size,
yᵢ is the label corresponding to the data,
and P(yᵢ) is the network's prediction for the data, a probability value.
In the above-mentioned construction method of the face living body detection model based on a linear attention mechanism, in step 6, the model parameters obtained by training are verified on an unknown test set, and the recall rate and the accuracy rate are used to comprehensively evaluate the performance of the face living body detection model;
the recall rate is calculated as follows:
Recall = TP / (TP + FN)
where TP is the number of samples predicted to be positive that are actually positive,
and FN is the number of samples predicted to be negative that are actually positive.
Compared with the prior art, the invention has the following beneficial effects:
the invention linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism, namely, the soft maximization function is removed from M_PA and the two original factors are each normalized along their own dimension; the multiplication order of the matrix factors is then changed based on the associativity of matrix multiplication, so that the original complexity O(N²) is reduced to O(N). This optimizes the computational complexity and reduces the heavy computation introduced by the dual-attention mechanism network, thereby constructing a brand-new face living body detection model based on a linear attention mechanism.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the overall network architecture of the present invention;
Figure 3 shows accuracy-versus-time curves of the present invention,
wherein (a) is the accuracy-time diagram on the CASIA-SURF data set,
and (b) is the accuracy-time diagram on the self-made data set;
Figure 4 shows accuracy-versus-batch curves of the present invention,
wherein (a) is the accuracy-batch diagram on the CASIA-SURF data set,
and (b) is the accuracy-batch diagram on the self-made data set.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings.
The invention provides a method for constructing a human face living body detection model based on a linear attention mechanism as shown in fig. 1-4, which comprises the following steps:
step 1, extracting a face image containing a face from a data set, and preprocessing the data, wherein the specific steps are as follows:
1.1, creating a 4-dimensional channel of a face image, and calculating the integral average value of pixels of three channels of red, green and blue of the face image, wherein the specific formula is as follows:
wherein R is Red, which represents the Red of the face image; g is Green, which represents the Green of the face image; b is Blue, which represents Blue of the face image;
n is the total number of training set pictures;
μ R calculating an average value of R channels of all face images;
μ G calculating the average value of the G channels of all face images;
μ B calculating the average value of the B channels of all face images;
i is a picture ordinal number;
1.2, subtracting the average value from each pixel value, wherein the specific formula is as follows:
wherein σ is the added scale factor, representing the standard deviation on the training set, specifically, σ=1;
1.3, carrying out random drifting, overturning, rotating and scaling on each picture in the original data set so as to increase the quantity of data.
In the step, the influence of different illumination pictures on the final classification or neural network under the same scene can be eliminated by creating a 4-dimensional channel of an image, the invariance characteristic of data can be enhanced by carrying out random drifting, overturning, rotating and scaling on each picture in the original data set, so that the number of data is increased, the generalization capability of a training model is improved, and the number of data is increased, namely, the model can identify pictures with different angles and different sizes;
and the feature map is a term in the art, and represents the output of hidden layers of a model, wherein the output of each hidden layer can be called as a feature map, one model has a plurality of hidden layers, namely a plurality of feature maps, in the subsequent model evaluation, only the final performance index and convergence characteristic are generally focused, and the output of the hidden layers in the middle (namely the feature map) is not used as an evaluation index.
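Steps 1.1 and 1.2 above can be sketched in a few lines of numpy (a sketch of ours; the array shapes, function names and the synthetic batch are assumptions, not part of the patent):

```python
import numpy as np

def channel_means(images):
    """images: float array of shape (N, H, W, 3); per-channel pixel means."""
    return images.mean(axis=(0, 1, 2))          # (mu_R, mu_G, mu_B)

def normalize(images, sigma=1.0):
    """Step 1.2: subtract the training-set channel means, divide by sigma."""
    return (images - channel_means(images)) / sigma

rng = np.random.default_rng(0)
batch = rng.uniform(0, 255, size=(8, 32, 32, 3))  # synthetic stand-in data
out = normalize(batch)
# Each channel of the normalized batch now has (near-)zero mean.
```

With σ = 1, as the patent specifies, this reduces to plain mean subtraction; step 1.3's random shifts, flips, rotations and scalings would typically be done with a data-augmentation library on top of this.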
Step 2, constructing a basic model for extracting features of face images based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolution blocks (Block) and a maximum pooling layer, wherein each basic convolution block consists of a convolution layer (Conv2D) and a batch normalization layer (BN);
the convolution kernels of the convolution layers are of size 3×3, the number of convolution kernels is 128, the step length is 1, the activation function is the linear rectification function (ReLU), and the padding mode is "same";
the convolution kernel of the pooling layer is 2×2, and the step length is 2.
In this step, high-level feature information is effectively extracted from the face image through the feature extraction process of the convolutional neural network, providing the basis for the subsequent classification task.
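A shape-only sketch of this backbone (placing the single pooling layer after the four blocks is our assumption; the patent does not state the layer order explicitly): stride-1 "same" convolutions preserve the spatial size and set the channel count to 128, and the 2×2/stride-2 pooling halves the height and width.

```python
def backbone_output_shape(h, w, n_blocks=4, filters=128):
    """Trace (H, W, C) through four 3x3/stride-1/'same' conv blocks
    followed by one 2x2/stride-2 max pooling layer."""
    c = None
    for _ in range(n_blocks):
        c = filters            # 'same' padding, stride 1: H and W unchanged
    return h // 2, w // 2, c   # max pooling halves the spatial dimensions

print(backbone_output_shape(112, 112))  # -> (56, 56, 128)
```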
Step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the position attention layer are as follows:
3.1.1, according to the dot-product attention mechanism, recording the feature map obtained by the convolutional neural network as A, with A ∈ R^(H×W×C); passing A through three convolution layers to obtain the query vector Q ∈ R^(H×W×C), the key vector K ∈ R^(H×W×C) and the value vector V ∈ R^(H×W×C);
wherein H is the feature height,
W is the feature width,
C is the number of channels;
wherein the dot-product attention mechanism is defined as follows:
the dot-product attention mechanism is a mechanism that selectively focuses on specific information during information processing. It concentrates attention on the information relevant to the task and ignores other, irrelevant information, thereby improving the effectiveness of task execution;
the formula of this dot-product attention mechanism is as follows:
s((Q,K),V) = (Q·K^T)·V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transforming (reshaping) the dimensions of A, Q, K and V to R^(N×C), where N = H×W; using the dot product as the attention scoring function together with the soft maximization function, normalizing by row, the attention distribution s ∈ R^(N×N) is calculated as follows:
s = softmax(QK^T)
3.1.3, performing a dot-product operation on the attention distribution s and V to obtain the output vector H ∈ R^(N×C), with the specific calculation formula as follows:
H = sV = softmax(QK^T)V
3.1.4, multiplying the output sequence H by a learnable scale parameter α and summing it element by element with the feature map A, the output being reshaped back to dimension R^(H×W×C), with the specific calculation formula as follows:
M_PA = αH + A
where α is initialized to 0 and progressively learns to assign more weight,
and M_PA is the original position attention mechanism;
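Steps 3.1.2 to 3.1.4 can be sketched in numpy as follows (a sketch of ours; the shapes, seed and values are illustrative assumptions; with α initialized to 0, the layer initially passes A through unchanged, as the text implies):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(A, Q, K, V, alpha=0.0):
    """M_PA = alpha * softmax(Q K^T) V + A, with A, Q, K, V of shape (N, C)."""
    s = softmax(Q @ K.T, axis=-1)   # (N, N) attention distribution, rows sum to 1
    H = s @ V                       # (N, C) output sequence
    return alpha * H + A            # residual connection back to the feature map

rng = np.random.default_rng(1)
N, C = 16, 8
A, Q, K, V = (rng.normal(size=(N, C)) for _ in range(4))
M0 = position_attention(A, Q, K, V, alpha=0.0)   # equals A at initialization
```

Note the (N, N) intermediate matrix s: this is the quadratic cost that step 3.3.1 removes.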
the method for constructing the channel attention layer comprises the following specific steps of:
3.2.1 direct use of A with its transpose matrix A T And soft maximization function to calculate channel attention distribution mapAnd the specific calculation formula is as follows:
x=softmax(A T A)
3.2.2 mapping x onto A, multiplying by a learnable parameter beta, adding A to obtain result, and dimension-transforming the result intoAnd the specific transformation formula is as follows:
E CA =β(Ax)+A
where beta is a parameter learned from 0,
E CA is the sum of the characteristics of all channels weighted and the original characteristics,
E CA the long-distance dependency relationship among the channels of the feature map is established, the feature resolvability is improved, and the semantic relativity among the channels is fully utilized;
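A matching numpy sketch of steps 3.2.1 and 3.2.2 (again with our illustrative shapes; with β learned from 0, the branch initially returns A itself):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(A, beta=0.0):
    """E_CA = beta * (A x) + A with x = softmax(A^T A), A of shape (N, C)."""
    x = softmax(A.T @ A, axis=-1)   # (C, C) channel attention distribution map
    return beta * (A @ x) + A       # weighted channel mixture plus residual

rng = np.random.default_rng(2)
A = rng.normal(size=(16, 8))
E0 = channel_attention(A, beta=0.0)   # equals A at initialization
```

Because the intermediate matrix here is only C×C, this branch is already linear in the number of positions N; only the position branch needs the reordering of step 3.3.1.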
the method comprises the specific steps of constructing a channel attention layer and a position attention layer, and forming a complete characteristic extraction network, wherein the specific steps are as follows:
3.3.1, M PA Soft maximization function extraction and column of Q and K are doneSoft maximization operation, calculating the last two terms according to the characteristic of a matrix multiplication combination law to obtain a matrix of C multiplied by C, and multiplying Q by the left to obtain a final result, wherein a specific formula is as follows;
E PA =αsoftmax(Q)(softmax(K T )·V)+A
The associativity of matrix multiplication means that matrix multiplication satisfies the associative law; specifically, for three matrices A, B and C the following relationship holds:
(A*B)*C = A*(B*C)
that is, whether A is first multiplied by B and the result by C, or B is first multiplied by C and the result left-multiplied by A, the result obtained is the same;
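This identity is easy to verify numerically (an illustrative check of ours with arbitrary small matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 5))   # shapes chosen so both groupings are defined
B = rng.normal(size=(5, 6))
C = rng.normal(size=(6, 3))

left = (A @ B) @ C            # multiply A by B first
right = A @ (B @ C)           # multiply B by C first
assert np.allclose(left, right)  # same (4, 3) result either way
```

The two groupings agree up to floating-point rounding, while their costs differ whenever the inner dimensions differ, which is what the invention exploits.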
3.3.2, reshaping E_CA and E_PA from dimension N×C back to H×W×C and performing feature fusion, with the formula as follows:
F_A = F_CA + F_PA
wherein F_CA and F_PA are the reshaped E_CA and E_PA, and F_A is the result of fusing the two attention mechanisms;
in the present embodiment, E_PA selectively aggregates the features of each position with those of the other positions, realizing mutual reinforcement between positions and improving semantic consistency, while E_CA establishes long-range dependencies between the channels of the feature map, improves feature discriminability, and makes full use of the semantic correlation between channels.
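Steps 3.3.1 and 3.3.2 can be sketched as follows (our sketch; the shapes and the stand-in channel branch are illustrative assumptions). Note that softmax(K^T)·V is a C×C matrix computed first, so the cost is linear in N:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft maximization along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def linear_position_attention(A, Q, K, V, alpha=1.0):
    """E_PA = alpha * softmax(Q) (softmax(K^T) V) + A, all inputs (N, C)."""
    Qn = softmax(Q, axis=1)    # soft maximization over each row of Q
    Kn = softmax(K, axis=0)    # soft maximization over each column of K
    return alpha * (Qn @ (Kn.T @ V)) + A  # (C, C) factor computed first

rng = np.random.default_rng(4)
N, C = 16, 8
A, Q, K, V = (rng.normal(size=(N, C)) for _ in range(4))
E_PA = linear_position_attention(A, Q, K, V)
E_CA = A                       # stand-in for the channel branch output
F_A = E_CA + E_PA              # step 3.3.2: element-wise feature fusion
```

No N×N matrix is ever materialized here, which is the whole point of the linearized formulation.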
Step 4, feed the advanced feature map into a fully connected network for classification and recognition, so as to distinguish real faces from fake faces; the advanced feature map is converted into a two-dimensional vector, completing the construction of the face living body detection model. The specific steps are as follows:
the advanced feature map extracted by the preceding network layers undergoes a nonlinear change, the association features among its features are extracted, and it is finally mapped onto the target feature space so as to be converted into a two-dimensional vector, thereby completing the construction of the face living body detection model;
wherein the fully connected layer acts as the classifier of the whole convolutional neural network;
the preceding network layers refer to the convolutional network followed by the attention network;
the feature map is mapped onto the target feature space and then converted into a two-dimensional vector.
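The classifier head described above can be sketched as follows. The layer sizes, weights and the ReLU hidden layer are illustrative assumptions, not the patent's exact architecture; the sketch only shows the flatten, nonlinear change, and mapping to a two-dimensional output.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_head(feature_map, W1, b1, W2, b2):
    """Sketch of the fully connected classifier head: the fused H x W x C
    feature map is flattened, passed through one hidden layer with a ReLU
    nonlinearity, and mapped to a 2-dimensional vector (real vs. fake).
    """
    x = feature_map.reshape(-1)          # flatten to a vector
    h = np.maximum(0, W1 @ x + b1)       # nonlinear change (ReLU)
    return W2 @ h + b2                   # 2-dimensional output

H, W, C, hidden = 8, 8, 16, 32
fmap = rng.standard_normal((H, W, C))
W1, b1 = rng.standard_normal((hidden, H * W * C)), np.zeros(hidden)
W2, b2 = rng.standard_normal((2, hidden)), np.zeros(2)
logits = fc_head(fmap, W1, b1, W2, b2)
print(logits.shape)  # (2,)
```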
Step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and carrying out back propagation to finish network parameter updating of the human face living body detection model;
the binary cross entropy loss calculation formula is as follows:
L = −(1/N) · Σ_{i=1}^{N} [ y_i · log P(y_i) + (1 − y_i) · log(1 − P(y_i)) ]
wherein N is the size of the batch, y_i is the label corresponding to the data, and P(y_i) is the result of the network's prediction for the data; since the activation function of the last layer is a soft maximization function, P(y_i) is a probability value;
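The loss above can be sketched in a minimal NumPy version; the clipping constant is an implementation-side assumption to avoid log(0), not part of the patent's formula.

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Sketch of the binary cross entropy loss
    L = -(1/N) * sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
    where N is the batch size, y_i the label and p_i = P(y_i) the
    network's predicted probability for sample i.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0, 0.0])
p = np.array([0.9, 0.1, 0.8, 0.2])
print(round(binary_cross_entropy(p, y), 4))  # 0.1643
```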
in this embodiment, the parameter updates drive the face living body detection model based on the linear attention mechanism to convergence, as shown in figs. 2-3, that is, the convergence curve tends to become smooth.
Step 6, verify on the unknown test set using the model parameters obtained through training, and comprehensively evaluate the performance of the human face living body detection model using the recall rate and the accuracy rate;
the calculation formula of the recall rate is as follows:
Recall = TP / (TP + FN)
wherein TP is the number of samples predicted to be positive that are actually positive, and FN is the number of samples predicted to be negative that are actually positive;
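A minimal sketch of the two evaluation metrics, on toy labels rather than the experiment's data: TP counts positives predicted positive, and the predicted-negative-actually-positive count defined alongside it enters the denominator of the recall.

```python
import numpy as np

def recall_and_accuracy(pred, actual):
    """Sketch of the evaluation metrics: recall = TP / (TP + FN), where
    TP counts samples predicted positive that are actually positive and
    FN counts samples predicted negative that are actually positive;
    accuracy is the fraction of all predictions that are correct.
    """
    pred, actual = np.asarray(pred), np.asarray(actual)
    tp = np.sum((pred == 1) & (actual == 1))
    fn = np.sum((pred == 0) & (actual == 1))
    recall = tp / (tp + fn)
    accuracy = np.mean(pred == actual)
    return recall, accuracy

r, a = recall_and_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(r, a)  # recall = 2/3, accuracy = 3/5
```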
in this embodiment, after the training of the face living body detection model based on the linear attention mechanism is finished, the performance of the model is evaluated to verify whether the model is valid.
In summary, step 1 describes the input of the human face living body detection model based on the linear attention mechanism, hereinafter referred to as the model; the convolutional network in step 2 is the first stage of the model and mainly completes image feature extraction; the attention mechanism algorithm in step 3 is the second stage of the model and enhances the features from step 2, which amounts to screening the feature information closely related to the task; the fully connected network in step 4 is a classifier that classifies and recognizes the features from step 3 and judges whether the input is a real or a fake face; step 5 provides the training method of the model, that is, the model described in steps 1-4 is trained so that the network can update its parameters and converge; and step 6 evaluates the model trained in step 5 to confirm the validity of the model described in the invention.
Verification test
In order to verify the effectiveness of the method, the RGB images in the CASIA-SURF database are extracted by the method and preprocessed to generate an enhanced data set, of which 75% is randomly divided into a training set and 25% into a test set;
the training set contains 22046 real faces and 50393 fraudulent faces (72439 face pictures in total), and the test set contains 7348 real faces and 16797 fraudulent faces (24145 face pictures in total); Table 1 is prepared from the experimental results;
TABLE 1
Method              Fake face recall rate  Real face recall rate  Accuracy rate
Conventional model  99.7458%               99.8860%               99.8426%
The invention       99.8528%               99.8560%               99.8550%
In order to avoid experimental contingency, a video-replay-attack living body detection data set containing 14500 fake faces and 9340 real faces is prepared and processed in the same way, so that the linear attention mechanism model of the application and the conventional attention mechanism model are evaluated; Table 2 is prepared from the experimental results;
TABLE 2
Method              Fake face recall rate  Real face recall rate  Accuracy rate
Conventional model  99.9153%               99.8025%               99.9315%
The invention       99.9717%               99.9564%               99.9828%
As can be seen from fig. 4, on the CASIA-SURF dataset all three networks reach convergence after about 75 training batches, and on the homemade dataset they all reach convergence after about 125 batches. On both datasets, the accuracy curves of the two networks essentially coincide: after the computational complexity is optimized, the performance of the network remains close to that of the original network. Tables 1 and 2 above show the best results achieved by training the three networks on the CASIA-SURF dataset and on the homemade dataset, respectively. On the CASIA-SURF dataset, the accuracy of the linear attention network herein is about 99.86%, versus about 99.84% using the conventional attention mechanism; on the homemade dataset, the model herein achieves an accuracy of about 99.98%, versus about 99.93% using the conventional attention mechanism. It can be seen that the modification of the conventional attention mechanism has no significant impact on model performance;
the comparison results are shown in Tables 1 and 2; the performances of the noted networks are about the same. As shown in fig. 3 (a) and (b), the linear attention network greatly improves the training speed at the same performance, and the larger the pixel count of the test-set pictures, the more obvious the efficiency gain brought by the linear attention network. The results show that the application's modification of the attention network is feasible and successful and presents certain innovations.
In conclusion, the invention linearly optimizes the soft maximization function on the basis of the classical dot-product attention mechanism: the soft maximization function is removed from M_PA, the two original factors are each normalized along their respective dimensions, and the multiplication order of the matrix factors is changed on the basis of the associativity of matrix multiplication, so that the original complexity O(N^2) is reduced to O(N); the face living body detection model with the linear attention mechanism can thus effectively reduce the computational complexity while maintaining the recognition performance;
experiments on the published human face living detection data set CASIA-SURF and the homemade data set show that under the condition of the same training steps, the training time can be shortened by about 1/8, and the proportion of the shortened training time is further increased along with the increase of the size of input pictures, so that the accuracy is higher than that of a conventional attention mechanism, and the accuracy is respectively up to 99.8550% and 99.9828%, and the recall rates of a real human face and a fake human face are effectively balanced.
While certain exemplary embodiments of the present invention have been described above by way of illustration only, it will be apparent to those of ordinary skill in the art that modifications may be made to the described embodiments in various different ways without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive of the scope of the invention, which is defined by the appended claims.

Claims (6)

1. A method for constructing a human face living body detection model based on a linear attention mechanism, characterized by comprising the following steps:
step 1, extracting a face image containing a face from a data set, and preprocessing the data;
step 2, constructing a basic model of feature extraction face images based on a convolutional neural network to obtain a feature map;
step 3, constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
step 4, accessing the advanced feature map into a fully-connected network for classification and identification, completing the identification capability of the true face and the false face, converting the advanced feature map into a two-dimensional vector, and completing the construction of the human face living body detection model;
step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and carrying out back propagation to finish network parameter updating of the human face living body detection model;
step 6, verifying the unknown test set by using the model parameters obtained through training, and comprehensively evaluating the performance of the human face living body detection model by using the recall rate and the accuracy rate;
wherein, in step 3:
constructing a channel attention layer and a position attention layer to form a complete feature extraction network, and performing feature fusion on the feature map through the feature extraction network to obtain an advanced feature map;
the specific steps of constructing the dot-product attention mechanism are as follows:
3.1.1, following the dot-product attention mechanism, record the feature map obtained by the convolutional neural network as A; pass A through three convolution layers to obtain the query vector Q, the key vector K and the value vector V, each with the same dimension H×W×C as A,
wherein H is the height of the feature,
W is the width of the feature,
C is the number of channels;
wherein the dot-product attention mechanism is formulated as follows:
s(Q, K, V) = (Q · K^T) · V
wherein Q is the query vector, K is the key vector, and V is the value vector;
3.1.2, transform the dimensions of A, Q, K and V into N×C, where N = H×W; using the dot-product operation as the scoring function of the attention and the soft maximization function for row-wise normalization, calculate the attention distribution s, the specific calculation formula being as follows:
s = softmax(Q · K^T)
3.1.3, perform a dot-product operation between the attention distribution s and V to obtain the output vector H, the specific calculation formula being as follows:
H = s · V = softmax(Q · K^T) · V
3.1.4, multiply the output sequence H by a learnable scale parameter α and sum it element by element with the feature map A; the output is transformed back to the dimension H×W×C, the specific calculation formula being as follows:
M_PA = α · H + A
wherein α is initialized to 0 and gradually learns to assign more weight,
and M_PA is the output of the original position attention mechanism;
the specific steps of constructing the channel attention layer are as follows:
3.2.1, directly use A and its transposed matrix A^T together with the soft maximization function to calculate the channel attention distribution map x, the specific calculation formula being as follows:
x = softmax(A^T · A)
3.2.2, map x onto A, multiply by a learnable parameter β, and add A to obtain the result; the result is transformed to the dimension H×W×C, the specific transformation formula being as follows:
E_CA = β · (A · x) + A
wherein β is a parameter learned from 0,
and E_CA is the weighted sum of the features of all channels added to the original features;
the specific steps of constructing the position attention layer and forming the complete feature extraction network are as follows:
3.3.1, remove the soft maximization function from M_PA, perform a soft maximization operation on the rows of Q and the columns of K, compute the last two factors first according to the associativity of matrix multiplication to obtain a C×C matrix, and left-multiply by softmax(Q) to obtain the final result, the specific formula being as follows:
E_PA = α · softmax(Q) · (softmax(K^T) · V) + A
3.3.2, reshape E_CA and E_PA from the dimension N×C to H×W×C (denoted F_CA and F_PA) and perform feature fusion, the specific formula being as follows:
F_A = F_CA + F_PA
wherein F_A is the result of the fusion of the two attention mechanisms.
2. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 1, a face image including a face is extracted from a data set, and data preprocessing is performed, specifically the steps are as follows:
1.1, create a 4-dimensional channel of the face image and calculate the overall average pixel value of each of the red, green and blue channels of the face images, the specific formulas being as follows:
μ_R = (1/N) · Σ_{i=1}^{N} R_i,  μ_G = (1/N) · Σ_{i=1}^{N} G_i,  μ_B = (1/N) · Σ_{i=1}^{N} B_i
wherein R is Red and represents the red channel of the face image; G is Green and represents the green channel of the face image; B is Blue and represents the blue channel of the face image;
N is the total number of training-set pictures;
μ_R, μ_G and μ_B are the calculated averages of the R, G and B channels of all face images;
and i is the picture ordinal number;
1.2, subtract the average value from each pixel value, the specific formula being as follows:
x' = (x − μ) / σ
wherein x is a pixel value, μ is the corresponding channel average, and σ is an added scale factor representing the standard deviation on the training set; specifically, σ = 1;
1.3, perform random shifting, flipping, rotation and scaling on each picture after the illumination processing, to increase the amount of data.
3. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 2, a basic model for extracting features of face images is constructed based on a convolutional neural network to obtain a feature map, wherein the convolutional neural network has the following characteristics:
the convolutional neural network consists of four basic convolution blocks and a maximum pooling layer, wherein each basic convolution block consists of a convolution layer and a batch normalization layer;
the convolution kernels of the convolution layers have a size of 3×3, the number of convolution kernels is 128, the stride is 1, the activation function is the linear rectification function, and the padding mode is 'same';
the pooling window of the pooling layer is 2×2, with a stride of 2.
4. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 4, the advanced feature map is fed into a fully connected network for classification and recognition so as to distinguish real faces from fake faces, and the advanced feature map is converted into a two-dimensional vector, completing the construction of the human face living body detection model; the specific process is as follows:
the advanced feature map extracted by the preceding network layers undergoes a nonlinear change, the correlation features among the features are extracted, and the map is finally projected onto the target feature space so as to convert the advanced feature map into a two-dimensional vector, thereby completing the construction of the human face living body detection model.
5. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 5, based on the two-dimensional vector result output by the full connection layer, obtaining a classification result by utilizing binary cross entropy loss, and performing back propagation to finish network parameter updating of the human face living body detection model;
the binary cross entropy loss calculation formula is as follows:
L = −(1/N) · Σ_{i=1}^{N} [ y_i · log P(y_i) + (1 − y_i) · log(1 − P(y_i)) ]
wherein N is the size of the batch, y_i is the label corresponding to the data, and P(y_i), the result of the network's prediction for the data, is a probability value.
6. The method for constructing a human face living body detection model based on a linear attention mechanism according to claim 1, wherein the method comprises the following steps: in step 6, verifying on an unknown test set by using the model parameters obtained through training, and comprehensively evaluating the performance of the human face living body detection model by using recall rate and accuracy;
the calculation formula of the recall rate is as follows:
Recall = TP / (TP + FN)
wherein TP is the number of samples predicted to be positive that are actually positive, and FN is the number of samples predicted to be negative that are actually positive.
CN202310992389.1A 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism Active CN117011918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310992389.1A CN117011918B (en) 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism

Publications (2)

Publication Number Publication Date
CN117011918A CN117011918A (en) 2023-11-07
CN117011918B true CN117011918B (en) 2024-03-26

Family

ID=88575810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310992389.1A Active CN117011918B (en) 2023-08-08 2023-08-08 Method for constructing human face living body detection model based on linear attention mechanism

Country Status (1)

Country Link
CN (1) CN117011918B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961062A (en) * 2019-04-16 2019-07-02 Beijing Megvii Technology Co., Ltd. Image recognition method, device, terminal and readable storage medium
CN110084113A (en) * 2019-03-20 2019-08-02 Alibaba Group Holding Ltd. Living body detection method, device, system, server and readable storage medium
CN110991432A (en) * 2020-03-03 2020-04-10 Alipay (Hangzhou) Information Technology Co., Ltd. Living body detection method, living body detection device, electronic equipment and living body detection system
CN111401436A (en) * 2020-03-13 2020-07-10 Beijing Technology and Business University Streetscape image segmentation method fusing network and two-channel attention mechanism
CN111460931A (en) * 2020-03-17 2020-07-28 South China University of Technology Face spoofing detection method and system based on color channel difference image characteristics
CN111767954A (en) * 2020-06-30 2020-10-13 Suzhou Keda Technology Co., Ltd. Vehicle fine-grained identification model generation method, system, equipment and storage medium
CN112580782A (en) * 2020-12-14 2021-03-30 East China University of Science and Technology Channel-enhancement-based dual-attention generative adversarial network and image generation method
CN113435353A (en) * 2021-06-30 2021-09-24 Ping An Technology (Shenzhen) Co., Ltd. Multi-mode-based living body detection method and device, electronic equipment and storage medium
WO2021208687A1 (en) * 2020-11-03 2021-10-21 Ping An Technology (Shenzhen) Co., Ltd. Human-face detection model training method, device, medium, and human-face detection method
CN113658165A (en) * 2021-08-25 2021-11-16 Ping An Technology (Shenzhen) Co., Ltd. Cup-to-disc ratio determining method, device, equipment and storage medium
CN113780209A (en) * 2021-09-16 2021-12-10 Zhejiang University of Technology Human face attribute editing method based on attention mechanism
CN113989906A (en) * 2021-11-26 2022-01-28 Jiangsu University of Science and Technology Face recognition method
CN115082994A (en) * 2022-06-27 2022-09-20 Ping An Bank Co., Ltd. Face living body detection method, and training method and device of living body detection network model
CN116152523A (en) * 2022-12-06 2023-05-23 Mashang Consumer Finance Co., Ltd. Image detection method, device, electronic equipment and readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Attention-Guided Network for Iris Presentation Attack Detection; Cunjian Chen; arXiv; 1-10 *
Dual Attention Network for Scene Segmentation; Jun Fu et al.; CVPR 2019; 3146-3154 *
Multiple-Attention Mechanism Network for Semantic Segmentation; Dongli Wang et al.; Sensors; 2022-06-13; 1-16 *
Visual Attention Methods in Deep Learning: An In-Depth Survey; Mohammed Hassanin et al.; arXiv; 2022-04-21; 1-20 *
Vehicle re-identification algorithm based on a three-dimensional attention mechanism; Fang Yance et al.; Computer Measurement & Control; 2022-07-25; vol. 30, no. 7; 194-200 *
Face living body detection based on a multi-scale dual-channel network; Ren Tuo et al.; Journal of North University of China (Natural Science Edition); vol. 44, no. 3; 325-332 *
Research progress on image semantic segmentation technology based on deep learning; Liang Xinyu et al.; Computer Engineering and Applications; 2019-11-13; vol. 56, no. 2; 18-28 *


Similar Documents

Publication Publication Date Title
CN110263912B (en) Image question-answering method based on multi-target association depth reasoning
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN110245665B (en) Image semantic segmentation method based on attention mechanism
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN115035418A (en) Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN114066871B (en) Method for training new coronal pneumonia focus area segmentation model
CN114694039A (en) Remote sensing hyperspectral and laser radar image fusion classification method and device
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN116740121A (en) Straw image segmentation method based on special neural network and image preprocessing
CN112329771A (en) Building material sample identification method based on deep learning
CN116704188A (en) Wheat grain image segmentation algorithm with different volume weights based on improved U-Net network
CN116452862A (en) Image classification method based on domain generalization learning
CN115171074A (en) Vehicle target identification method based on multi-scale yolo algorithm
CN114581789A (en) Hyperspectral image classification method and system
CN112528077B (en) Video face retrieval method and system based on video embedding
CN114049503A (en) Saliency region detection method based on non-end-to-end deep learning network
CN117611925A (en) Multi-source remote sensing image classification method based on graph neural network and convolution network
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN113505856A (en) Hyperspectral image unsupervised self-adaptive classification method
CN115641445B (en) Remote sensing image shadow detection method integrating asymmetric inner convolution and Transformer
CN117011918B (en) Method for constructing human face living body detection model based on linear attention mechanism
Yu et al. MagConv: Mask-guided convolution for image inpainting
Zhao et al. MSRF-Net: multiscale receptive field network for building detection from remote sensing images
CN110992320A (en) Medical image segmentation network based on double interleaving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant