CN113255472B - Face quality evaluation method and system based on random embedding stability - Google Patents


Info

Publication number
CN113255472B
CN113255472B (application CN202110494692.XA, published as CN113255472A)
Authority
CN
China
Prior art keywords: face, neural network, layer, quality evaluation, face quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110494692.XA
Other languages
Chinese (zh)
Other versions
CN113255472A (en)
Inventor
李阳
罗鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Flux Technology Co ltd
Original Assignee
Beijing Zhongke Flux Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Flux Technology Co ltd filed Critical Beijing Zhongke Flux Technology Co ltd
Priority to CN202110494692.XA priority Critical patent/CN113255472B/en
Publication of CN113255472A publication Critical patent/CN113255472A/en
Application granted granted Critical
Publication of CN113255472B publication Critical patent/CN113255472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V 40/168 - Recognition of human faces: feature extraction, face representation
    • G06F 18/214 - Pattern recognition: generating training patterns, bootstrap methods (e.g. bagging or boosting)
    • G06N 3/045 - Neural networks: combinations of networks
    • G06N 3/08 - Neural networks: learning methods
    • G06T 3/147 - Transformations for image registration using affine transformations
    • G06V 20/46 - Extracting features or characteristics from video content (e.g. video fingerprints, representative shots or key frames)
    • Y02P 90/30 - Computing systems specially adapted for manufacturing


Abstract

The invention discloses a face quality evaluation method and system based on random embedding stability. The method is embedded into a face recognition neural network so that a face quality evaluation score is obtained while the face recognition process is carried out, and specifically comprises the following steps: step 1: extracting frames from the image frames of a video stream according to a preset rule; step 2: performing face detection on any extracted image frame, and cropping and aligning the detected faces to obtain face images; step 3: inputting the face image into a first neural network to obtain corresponding face features; step 4: inputting the face features into a second neural network to obtain n random features; step 5: combining the n random features pairwise to obtain different random feature pairs, and calculating a face quality evaluation score from these feature pairs; step 6: inputting the face features of step 3 into a third neural network to obtain face recognition features for the subsequent face recognition or comparison process.

Description

Face quality evaluation method and system based on random embedding stability
Technical Field
The invention relates to the field of face recognition and evaluation, in particular to a face quality evaluation method and system based on random embedding stability.
Background
Face recognition is one of the core technologies in the security monitoring field. Fig. 1 is a structural diagram of an existing face recognition system; as shown in fig. 1, a face recognition system generally consists of four parts: face detection, face alignment, face quality evaluation and face recognition. Face quality evaluation is an important component of such a system: it evaluates the imaging quality of a face image and whether the image is usable for face recognition. Its significance is as follows: 1) low-quality face images reduce the recognition rate and reliability of a face recognition system; the system is disturbed by factors such as illumination, resolution, blur, pose and occlusion, and neither the algorithm nor the user can reliably identify a person from a low-quality image; 2) the computational cost of face recognition is often high, and a large number of low-quality face images noticeably degrades its performance; a face quality evaluation module can screen the best face images out of the video stream and send only those to the recognition module, improving the overall running speed of the system; 3) the system needs to retain high-quality images of strangers for its users, and whether the quality evaluation module can obtain clear, recognizable face images determines the usability of the system; 4) the face quality evaluation score can serve as a measure of the reliability of the face recognition result.
At present, some face quality evaluation approaches are based on supervised learning: an artificially labeled data set is constructed and a neural network is trained to evaluate face quality. First, a large number of different face images must be collected to build a training set for face quality evaluation; next, several annotators subjectively score the quality of each face image, and the processed scores serve as the labels of the training set; then, convolutional neural networks or other feature extraction methods are used to extract features from the face images in the training set; finally, continuous scores are obtained with the fully connected layer of a convolutional neural network, or with machine learning methods such as linear regression or logistic regression. However, evaluating face pictures by subjective human judgment suffers from large deviations, and it is difficult to find a unified, clear and definite evaluation standard under which different people would give the same score to the same face picture. Labels obtained this way are therefore neither reliable nor universal, and algorithms trained on such a data set are not stable and reliable enough.
In addition, a training set for face quality evaluation must contain face images under many different conditions, and collecting and labeling such images is time-consuming and labor-intensive. Meanwhile, the image quality score obtained by this technique tends to match human subjective judgment but does not necessarily match the criteria of a face recognition model: the algorithm's standard for face image quality and a person's subjective standard may be inconsistent, and a face image that a person subjectively considers high quality may not yield a stable, reliable recognition result. Furthermore, to keep the data distributions of the recognition model and the quality evaluation model consistent, a different face quality evaluation model must be trained for each face recognition model, which is likewise time-consuming and labor-intensive.
Other face quality evaluation approaches are based on face image characteristics: several different attributes of the face (such as Euler angles, blur, degree of occlusion and the like) are considered together, separately designed algorithms score each attribute, and the final face quality score is obtained by weighted summation. The face Euler angles can be obtained from detected facial key points, blur can be measured with the Laplacian operator, and occlusion can be judged by a neural network; different weights are assigned to these features according to the application, the final face quality score is obtained by weighted summation, and the higher a weight, the more that factor influences the evaluation result. However, computing the face Euler angles requires the facial key points produced by the face detection module, and without key points the Euler angles cannot be computed at all. The scores for illumination, blur, pose and occlusion each require their own image algorithms, which brings considerable computational cost, and choosing suitable algorithms is itself a difficulty. The features produced by different algorithms also have different scales, so normalizing and weighting the results is another difficulty. In addition, a face quality score computed this way is not directly related to the output of the face recognition model, so a high score does not necessarily mean that the recognition result is reliable.
Disclosure of Invention
In order to solve the above problems, the invention provides a face quality evaluation method and system based on random embedding stability, which cover interference factors such as occlusion, pose, illumination, resolution and blur to obtain a stable and reliable face quality evaluation score, without constructing a manually labeled training set, thereby avoiding the problems of inaccurate manual labels and ambiguous labeling standards. In addition, the algorithm design of the invention removes the need to train a separate face quality evaluation model, avoids relying on prior knowledge when calculating face quality, and produces a final result that better explains the correlation between the face quality evaluation and the face recognition result.
In order to achieve the above purpose, the invention provides a face quality evaluation method based on random embedding stability, which is embedded into a face recognition neural network, and obtains face quality evaluation scores while carrying out a face recognition process, and specifically comprises the following steps:
Step 1: acquiring a video stream acquired by a monitoring camera, and extracting frames from each image frame in the video stream according to a preset rule;
Step 2: performing face detection on any extracted image frame, and cutting and aligning the detected face in an affine transformation mode to obtain a face image I with a fixed size after alignment;
step 3: inputting the obtained face image I into a first neural network to obtain corresponding face features X(I);
Step 4: inputting the obtained face features X(I) into a second neural network, and obtaining n random features x_i (i = 1, 2, …, n) through n rounds of random feature calculation;
step 5: combining the obtained n random features x_i (i = 1, 2, …, n) pairwise to obtain n(n-1)/2 different random feature pairs (x_i, x_j), where 1 ≤ i < j ≤ n, and calculating a face quality evaluation score from these feature pairs;
Step 6: and (3) inputting the face characteristics X (I) obtained in the step (3) into a third neural network to obtain face identification characteristics for the subsequent face identification or comparison process.
In an embodiment of the present invention, the frame extraction rule adopted in step 1 is specifically to extract one frame out of every 10 frames.
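The fixed-interval rule can be sketched as a small generator (an illustrative sketch; `frames` stands for any decoded frame sequence, and the default interval of 10 matches the embodiment):

```python
def sample_frames(frames, interval=10):
    """Yield every `interval`-th frame of a decoded video stream,
    starting with the first frame (one frame kept out of every 10 by default)."""
    for idx, frame in enumerate(frames):
        if idx % interval == 0:
            yield frame
```

For a 25-frame stream this keeps frames 0, 10 and 20; other embodiments would simply pass a different `interval`.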
In an embodiment of the present invention, the algorithm adopted for face detection in step 2 is the RetinaFace face detection algorithm.
In one embodiment of the present invention, the face quality evaluation score in step 5 is calculated from the feature pairs (x_i, x_j) according to the following formula:

q = 2σ( -(2 / (n(n-1))) · Σ_{i<j} d(x_i, x_j) )

where q is the face quality evaluation score, σ denotes the Sigmoid function, and d(·) denotes the Euclidean distance, so that the smaller the mean pairwise distance between the random features, the closer q is to 1.
In an embodiment of the present invention, the first neural network, the second neural network, and the third neural network are sub-networks of the face recognition neural network, the face recognition neural network is a residual network, and the face recognition neural network includes a first layer, a second layer, a third layer, a fourth layer, and a normalization and full connection layer, and each of the first layer, the second layer, the third layer, and the fourth layer includes a plurality of neural network residual blocks, each of the neural network residual blocks includes a convolution kernel, a residual block normalization layer, and an activation function, where:
The first neural network is composed of the first layer, the second layer, the third layer and the fourth layer of the face recognition neural network, and the weight parameters of the first neural network are the same as those of the face recognition neural network;
the second neural network comprises a first normalization layer, a Dropout layer and a first full-connection layer, wherein the weight parameters of the first normalization layer and the first full-connection layer are inherited from the face recognition neural network, and the Dropout layer comprises a hyper-parameter used to specify the probability of randomly discarding neurons;
the third neural network comprises a second normalization layer and a second full-connection layer, wherein weight parameters of the second normalization layer and the second full-connection layer are inherited from the face recognition neural network.
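The weight-sharing scheme described in the three clauses above can be illustrated schematically. The dictionaries below are purely structural placeholders, not real weights; the Dropout probability of 0.5 is an assumed value, since the text leaves the hyper-parameter unspecified.

```python
# Placeholder weights for the trained face recognition network (ResNet-style).
recognition_net = {
    "layer1": "w1", "layer2": "w2", "layer3": "w3", "layer4": "w4",
    "batchnorm": "bn", "fully_connected": "fc",
}

# First network: the four backbone layers, weights identical to the recognizer's.
net1 = {k: recognition_net[k] for k in ("layer1", "layer2", "layer3", "layer4")}

# Second network: inherited BatchNorm + full-connection weights, plus a Dropout
# layer that has no weights of its own, only the hyper-parameter p.
net2 = {"batchnorm": recognition_net["batchnorm"],
        "dropout_p": 0.5,  # assumed value; not specified in the text
        "fully_connected": recognition_net["fully_connected"]}

# Third network: the recognizer's own BatchNorm + full-connection head, unchanged.
net3 = {"batchnorm": recognition_net["batchnorm"],
        "fully_connected": recognition_net["fully_connected"]}
```

The point of the layout is that no sub-network introduces trainable parameters of its own, which is why no separate quality evaluation training is needed.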
In order to achieve the above object, the present invention provides a face quality evaluation system based on random embedding stability, which is embedded into a face recognition neural network to obtain a face quality evaluation score while performing a face recognition process, and includes:
The video image preprocessing module is used for preprocessing an input video stream, wherein the preprocessing comprises the steps of extracting an image frame from the video stream, and carrying out face detection and clipping alignment on the image frame;
The first neural network module is connected with the video image preprocessing module and is used for extracting face features;
the second neural network module is connected with the first neural network module and is used for calculating the face quality evaluation score;
And the third neural network module is connected with the first neural network module and is used for acquiring face recognition characteristics.
In an embodiment of the present invention, the first neural network module includes a plurality of neural network residual blocks, and each neural network residual block includes a convolution kernel, a residual block normalization layer, a pooling layer, and an activation function.
In an embodiment of the present invention, the second neural network module includes a first normalization layer, a Dropout layer, and a first fully-connected layer, where the Dropout layer includes a hyper-parameter for specifying the probability of randomly discarding neurons.
In an embodiment of the present invention, the third neural network module includes a second normalization layer and a second full connection layer.
Compared with the prior art, the invention has the following advantages:
1) The face quality evaluation method and the face quality evaluation system adopted by the invention can be embedded into most of the existing face recognition models, can obtain face characteristics and face quality evaluation scores while carrying out face recognition, save the calculation time of the whole face recognition and avoid retraining a neural network aiming at the face quality evaluation task;
2) The invention does not need to construct a human face quality evaluation training set or manually and subjectively mark the human face image, and is a non-reference human face quality evaluation method;
3) The invention covers common interference factors in face recognition such as illumination, blur and pose, without designing a separate anti-interference algorithm for each factor;
4) The calculated face quality evaluation score can be regarded as a measure of the reliability of the face recognition result: the higher the face quality evaluation score, the more reliable the face recognition result, the two having a definite correlation.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a prior art face recognition system architecture;
FIG. 2 is a flowchart of a face quality evaluation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network according to an embodiment of the present invention;
Fig. 4 is a block diagram of a face quality evaluation system according to an embodiment of the present invention.
Reference numerals: Net1 - first neural network; Net2 - second neural network; Net3 - third neural network; ResNet50 - face recognition neural network; Layer1 - first layer; Layer2 - second layer; Layer3 - third layer; Layer4 - fourth layer; BatchNorm - normalization layer; FullyConnected - full-connection layer; 401 - video image preprocessing module; 402 - first neural network module; 403 - second neural network module; 404 - third neural network module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
In terms of methodology, face image quality assessment (Image Quality Assessment, IQA) can be divided into subjective assessment and objective assessment. Subjective assessment judges image quality from human perception: given an original reference image and a distorted image, annotators score the distorted image. Objective assessment gives a quantized value using a mathematical model, and a batch of distorted images can be generated for it with image processing techniques. Depending on whether the original reference image is required, objective assessment is further classified into three types: Full Reference (FR), Reduced Reference (RR) and No Reference (NR). A full-reference method compares against every pixel of the original image to obtain the image quality; a no-reference method must assess quality from the distorted image alone, which is the most difficult; a reduced-reference method lies between the two, requiring partial information about the original reference image or partial features extracted from it. The evaluation method and system adopted by the embodiments of the invention belong to no-reference face image quality assessment (No-Reference Image Quality Assessment, NR-IQA); they can be widely applied in video monitoring scenarios to evaluate and score the quality of face images captured by a monitoring camera, so that high-quality face images are screened out for the subsequent face recognition or comparison process.
Example 1
Fig. 2 is a flowchart of a face quality evaluation method according to an embodiment of the present invention, as shown in fig. 2, and the embodiment provides a face quality evaluation method based on random embedding stability, which is embedded into a face recognition neural network, and obtains a face quality evaluation score while performing a face recognition process, and specifically includes the following steps:
Step 1: acquiring a video stream acquired by a monitoring camera, and extracting frames from each image frame in the video stream according to a preset rule;
In this embodiment, the frame extraction rule adopted in step 1 is to extract one frame out of every 10 frames; in other embodiments, other frame extraction intervals may be adopted, and the invention does not limit the interval.
Step 2: performing face detection on any extracted image frame, and cropping and aligning the detected faces by affine transformation to obtain aligned face images I of a fixed size; the fixed size in this embodiment is 112 × 112 pixels, and in other embodiments the images may be cropped and aligned to other sizes as required, which the invention does not limit.
In this embodiment, the algorithm adopted for face detection in step 2 is the RetinaFace face detection algorithm. RetinaFace is an open-source one-stage face detection algorithm released by the developers of the InsightFace project in May 2019; compared with two-stage detection algorithms, its biggest advantage is that the detection speed is not limited by the number of faces.
Step 3: inputting the obtained face image I into a first neural network to obtain corresponding face characteristics X (I);
Step 4: inputting the obtained face features X(I) into a second neural network and computing a stochastic embedding; the calculation is repeated n times to obtain n random features x_i (i = 1, 2, …, n);
step 5: combining the obtained n random features x_i (i = 1, 2, …, n) pairwise to obtain n(n-1)/2 different random feature pairs (x_i, x_j), where 1 ≤ i < j ≤ n, and calculating a face quality evaluation score from these feature pairs;
In this embodiment, the face quality evaluation score in step 5 is calculated from the feature pairs (x_i, x_j) according to the following formula:

q = 2σ( -(2 / (n(n-1))) · Σ_{i<j} d(x_i, x_j) )

where q is the face quality evaluation score, σ denotes the Sigmoid function, and d(·) denotes the Euclidean distance, so that the smaller the mean pairwise distance between the random features, the closer q is to 1.
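A minimal sketch of this scoring step, assuming the score is 2σ(-d̄) where d̄ is the mean pairwise Euclidean distance over all n(n-1)/2 feature pairs (an assumed reconstruction, not the patented implementation):

```python
import math
from itertools import combinations

def quality_score(random_features):
    """Assumed scoring rule: q = 2 * sigmoid(-mean pairwise Euclidean distance).
    Tightly clustered random features (a stable embedding) give q near 1;
    widely scattered ones give q near 0."""
    dists = [math.dist(a, b) for a, b in combinations(random_features, 2)]
    mean_d = sum(dists) / len(dists)
    return 2 / (1 + math.exp(mean_d))  # equals 2 * sigmoid(-mean_d)
```

Identical random features yield q = 1.0 exactly; spreading the features out lowers q monotonically, which is what makes the score usable as a reliability threshold.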
Step 6: inputting the face features X(I) obtained in step 3 into a third neural network to obtain face recognition features for the subsequent face recognition or comparison process.
Fig. 3 is a schematic diagram of a neural network according to an embodiment of the present invention. As shown in fig. 3, in this embodiment the first neural network (Net1), the second neural network (Net2) and the third neural network (Net3) are sub-networks of the face recognition neural network (ResNet50). The face recognition neural network of this embodiment is a residual network (Residual Network) comprising five parts: a first layer (Layer1), a second layer (Layer2), a third layer (Layer3), a fourth layer (Layer4), and a normalization and full-connection layer (BatchNorm + FullyConnected). Each of the first to fourth layers comprises a plurality of neural network residual blocks (Residual Block), and each residual block comprises convolution kernels (Convs), a residual block normalization layer (BatchNorm) and an activation function (ReLU), wherein:
The first neural network (Net1) is composed of the first four layers of the face recognition neural network (ResNet50), namely the first layer (Layer1), the second layer (Layer2), the third layer (Layer3) and the fourth layer (Layer4), and the weight parameters of the first neural network (Net1) are the same as those of the face recognition neural network (ResNet50);
The second neural network (Net2) is a three-layer network comprising a normalization layer (BatchNorm), a Dropout layer (Dropout) and a full-connection layer (FullyConnected), wherein the weight parameters of the normalization layer and the full-connection layer are inherited from the face recognition neural network (ResNet50). The Dropout layer has no weight parameters of its own and only a hyper-parameter p; its function is to randomly discard a portion of the neurons in the neural network, thereby producing randomness. The purpose of introducing the Dropout layer in the second neural network (Net2) is therefore to test the robustness of the network's output features: the higher the robustness, the better the quality of the corresponding input image.
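The randomness injected by the Dropout layer can be illustrated with plain inverted dropout, a generic sketch rather than the embodiment's implementation (`p` is the drop probability, the only hyper-parameter involved):

```python
import random

def dropout(vector, p, rng):
    """Inverted dropout: zero each element with probability p and scale the
    survivors by 1/(1-p), so each element's expected value is unchanged.
    Repeated passes over the same input yield different random outputs."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vector]

rng = random.Random(0)
feature = [1.0, 2.0, 3.0, 4.0]
passes = [dropout(feature, p=0.5, rng=rng) for _ in range(3)]
```

With p = 0 the input passes through unchanged; for p > 0, how tightly the repeated outputs cluster downstream is exactly the stability that the quality score measures.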
The third neural network (Net3) has the same normalization and full-connection layer (BatchNorm + FullyConnected) as the face recognition neural network (ResNet50): it comprises a normalization layer (BatchNorm) and a full-connection layer (FullyConnected), whose weight parameters are inherited from the face recognition neural network (ResNet50).
Compared with a scheme that uses an independent face quality evaluation neural network for scoring, the method of this embodiment saves a large amount of computational cost and reduces system running time. The method is a no-reference image quality evaluation scheme: when scoring face quality, it does not need to separately model the influence of interference factors such as resolution, blur, occlusion, noise, pose and illumination. In addition, as shown in fig. 1, the evaluation method adopted in this embodiment correlates with the face recognition model, and the reliability of the face recognition result can be determined by setting a threshold: the higher the face quality score, the higher the image quality, and the easier it is for the face recognition model to determine the person's identity, that is, the more reliable the face recognition result.
Example two
Fig. 4 is a block diagram of a face quality evaluation system according to an embodiment of the present invention, as shown in fig. 4, the embodiment provides a face quality evaluation system based on random embedding stability, which is embedded in a face recognition neural network, and is configured to obtain a face quality evaluation score while performing a face recognition process, and includes:
a video image preprocessing module (401) for preprocessing an input video stream, wherein the preprocessing comprises extracting image frames from the video stream, performing face detection on the extracted image frames, and cropping and aligning the detected faces;
The first neural network module (402) is connected with the video image preprocessing module (401) and is used for extracting face features;
A second neural network module (403) connected to the first neural network module (402) for calculating a face quality evaluation score;
and the third neural network module (404) is connected with the first neural network module (402) and is used for acquiring the face recognition characteristics.
In this embodiment, the first neural network module (402) is a sub-network of the face recognition neural network and includes a plurality of neural network residual blocks, where each residual block includes a convolution kernel (Conv), a batch normalization layer (BatchNorm), a pooling layer (Pooling), an activation function (ReLU), and the like.
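The residual blocks mentioned above can be shown schematically. The sketch below illustrates only the residual connection y = ReLU(x + F(x)) on plain Python lists, with the Conv/BatchNorm/Pooling internals abstracted into an arbitrary transform; all names are illustrative, not from the patent:

```python
from typing import Callable, List

def relu(v: List[float]) -> List[float]:
    # Element-wise ReLU activation.
    return [max(0.0, a) for a in v]

def residual_block(x: List[float],
                   transform: Callable[[List[float]], List[float]]) -> List[float]:
    """Schematic residual block: add the block's transform F(x)
    (standing in for Conv + BatchNorm + Pooling) back onto the
    input x, then apply the ReLU activation."""
    fx = transform(x)
    return relu([a + b for a, b in zip(x, fx)])

# With a zero transform the block reduces to ReLU(x):
print(residual_block([1.0, -2.0], lambda v: [0.0] * len(v)))  # [1.0, 0.0]
```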
In this embodiment, the second neural network module (403) is a further sub-network of the face recognition neural network, which includes a batch normalization layer (BatchNorm), a Dropout layer, a fully connected layer (FullyConnected), and the like, where the Dropout layer has a hyperparameter specifying the probability of randomly discarding neurons.
In this embodiment, the third neural network module (404) is another sub-network of the face recognition neural network, which includes a batch normalization layer (BatchNorm), a fully connected layer (FullyConnected), and the like.
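The division of labor between the two heads (402's shared features feeding both 403 and 404) can be sketched as follows. This is a simplified stand-in, not the patent's implementation: dropout is modeled as random zeroing of features, the BatchNorm/FullyConnected layers are abstracted away, and all names and values are illustrative:

```python
import random
from typing import List

def quality_head(features: List[float], n: int = 4,
                 drop_p: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Second module (403), schematically: run a dropout pass n times
    over the shared features, yielding n stochastic embeddings whose
    spread later drives the quality score."""
    rng = random.Random(seed)
    return [[f if rng.random() >= drop_p else 0.0 for f in features]
            for _ in range(n)]

def recognition_head(features: List[float]) -> List[float]:
    """Third module (404), schematically: a deterministic pass
    (standing in for BatchNorm + FullyConnected) that yields the
    face recognition feature."""
    return list(features)

shared = [0.3, -1.2, 0.7]            # output of the first module (402)
stochastic = quality_head(shared)     # n embeddings for quality scoring
identity = recognition_head(shared)   # deterministic recognition feature
```

The key design point is that both heads consume the same backbone features, so the quality score comes almost for free alongside recognition.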
Compared with the prior art, the face quality evaluation method and system based on random embedding stability adopted by the embodiments of the invention have the following advantages:
1) The face quality evaluation method and system adopted by the invention can be embedded into most existing face recognition models, obtaining face features and face quality evaluation scores while performing face recognition, which saves overall computation time and avoids retraining a neural network for the face quality evaluation task;
2) The invention needs neither a dedicated face quality evaluation training set nor manual, subjective annotation of face images; it is a no-reference face quality evaluation method;
3) The invention covers common face recognition interference factors such as illumination, blur and pose, without designing a separate anti-interference algorithm for each factor;
4) The calculated face quality evaluation score can be regarded as a measure of the reliability of the face recognition result: the higher the face quality evaluation score, the more reliable the face recognition result, the two being correlated.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment, and the modules or flows in the drawing are not necessarily required to practice the invention.
Those of ordinary skill in the art will appreciate that: the modules in the apparatus of an embodiment may be distributed in that apparatus as described, or may, with corresponding changes, be located in one or more apparatuses different from that embodiment. The modules of the above embodiments may be combined into one module, or further split into a plurality of sub-modules.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A face quality evaluation method based on random embedding stability, characterized in that the method is embedded in a face recognition neural network and obtains a face quality evaluation score while the face recognition process is performed, wherein the face recognition neural network is a residual network comprising a first layer, a second layer, a third layer, a fourth layer, a normalization layer and a fully connected layer, wherein:
the first layer, the second layer, the third layer and the fourth layer form a first neural network, wherein the weight parameters of the first neural network are the same as those of the face recognition neural network;
the normalization layer, a Dropout layer and the fully connected layer form a second neural network, wherein the weight parameters of the normalization layer and the fully connected layer are inherited from the face recognition neural network, and the Dropout layer has a hyperparameter specifying the probability of randomly discarding neurons;
the normalization layer and the fully connected layer form a third neural network, wherein the weight parameters of the normalization layer and the fully connected layer are inherited from the face recognition neural network;
the first neural network, the second neural network and the third neural network are sub-networks of the face recognition neural network, and the face quality evaluation method specifically comprises the following steps:
Step 1: acquiring a video stream collected by a monitoring camera, and extracting image frames from the video stream according to a preset rule;
Step 2: performing face detection on any extracted image frame, and cropping and aligning the detected face by affine transformation to obtain an aligned face image I of fixed size;
Step 3: inputting the obtained face image I into the first neural network to obtain the corresponding face features X(I);
Step 4: inputting the obtained face features X(I) into the second neural network, and obtaining n random features x_i (i = 1, 2, …, n) through n random feature calculations;
Step 5: combining the obtained n random features x_i pairwise to obtain n(n−1)/2 different random feature pairs (x_i, x_j), and calculating the face quality evaluation score from the feature pairs (x_i, x_j), where i = 1, 2, …, n and j = 1, 2, …, n;
Step 6: inputting the face features X(I) obtained in step 3 into the third neural network to obtain face recognition features for the subsequent face recognition or comparison process.
2. The face quality evaluation method according to claim 1, wherein the frame extraction rule adopted in step 1 is specifically to extract one frame every 10 frames of images.
3. The face quality evaluation method according to claim 1, wherein the algorithm used for face detection in step 2 is the RetinaFace face detection algorithm.
4. The face quality evaluation method according to claim 1, wherein the face quality evaluation score in step 5 is calculated from the feature pairs (x_i, x_j) according to the following formula:
q = 2σ( −(2 / (n(n−1))) · Σ_{i<j} d(x_i, x_j) )
where q is the face quality evaluation score, σ denotes the Sigmoid function, and d(·) denotes the Euclidean distance.
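Using the definitions given in this claim (σ the Sigmoid, d(·) the Euclidean distance over the feature pairs), and assuming the stability score takes the common form q = 2σ(−mean pairwise distance), since the original formula image is not reproduced in this text, the step-5 computation can be sketched as:

```python
import math
from itertools import combinations
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quality_score(embeddings: List[List[float]]) -> float:
    """Face quality score from n stochastic embeddings: average the
    Euclidean distance over all n(n-1)/2 pairs, then map it through
    the Sigmoid. Identical (perfectly stable) embeddings give the
    maximal score q = 1.0; a larger spread gives a lower score."""
    pairs = list(combinations(embeddings, 2))
    mean_d = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    return 2.0 * sigmoid(-mean_d)

# Perfectly stable embeddings -> maximal quality:
print(quality_score([[0.1, 0.2]] * 3))  # 1.0
```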
5. The face quality evaluation method according to claim 1, wherein the first layer, the second layer, the third layer and the fourth layer each comprise a plurality of neural network residual blocks, each residual block including a convolution kernel, a batch normalization layer and an activation function.
6. A face quality evaluation system based on random embedding stability for executing the face quality evaluation method according to any one of claims 1 to 5, comprising:
a video image preprocessing module for preprocessing an input video stream, wherein the preprocessing comprises extracting image frames from the video stream and performing face detection, cropping and alignment on the image frames;
a first neural network module, connected to the video image preprocessing module, for extracting face features;
a second neural network module, connected to the first neural network module, for calculating the face quality evaluation score;
and a third neural network module, connected to the first neural network module, for obtaining face recognition features.
7. The face quality evaluation system of claim 6, wherein the first neural network module comprises a plurality of neural network residual blocks, each residual block comprising a convolution kernel, a batch normalization layer, a pooling layer, and an activation function.
CN202110494692.XA 2021-05-07 2021-05-07 Face quality evaluation method and system based on random embedding stability Active CN113255472B (en)

Publications (2)

Publication Number Publication Date
CN113255472A CN113255472A (en) 2021-08-13
CN113255472B true CN113255472B (en) 2024-05-24

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960087A (en) * 2018-06-20 2018-12-07 中国科学院重庆绿色智能技术研究院 A kind of quality of human face image appraisal procedure and system based on various dimensions evaluation criteria
CN110634116A (en) * 2018-05-30 2019-12-31 杭州海康威视数字技术股份有限公司 Facial image scoring method and camera




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 711c, 7 / F, block a, building 1, yard 19, Ronghua Middle Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing 102600

Applicant after: Beijing Zhongke Flux Technology Co.,Ltd.

Address before: Room 711c, 7 / F, block a, building 1, yard 19, Ronghua Middle Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing 102600

Applicant before: Beijing Ruixin high throughput technology Co.,Ltd.

GR01 Patent grant