CN113936312A - Face recognition base screening method based on deep learning graph convolution network - Google Patents

Face recognition base screening method based on deep learning graph convolution network

Info

Publication number
CN113936312A
CN113936312A
Authority
CN
China
Prior art keywords
image
network
node
model
face recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111185859.0A
Other languages
Chinese (zh)
Other versions
CN113936312B (en)
Inventor
王乾宇
周金明
张世坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co Ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co Ltd filed Critical Nanjing Inspector Intelligent Technology Co Ltd
Priority to CN202111185859.0A priority Critical patent/CN113936312B/en
Publication of CN113936312A publication Critical patent/CN113936312A/en
Application granted granted Critical
Publication of CN113936312B publication Critical patent/CN113936312B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face recognition base-library screening method based on a deep-learning graph convolution network, comprising the following steps: first, acquiring video from the usage scene; second, selecting a face recognition network as the backbone according to the requirements of the usage scene, combining a graph convolution network as a branch to construct a face image quality evaluation model, and then training the model; third, detecting face objects in the captured video; fourth, obtaining candidate images with a mask-detection algorithm; fifth, adding some interference images to the candidate images obtained in the previous step, inputting them into the trained base-image screening model, which outputs the image with the highest confidence score among the images under test; and sixth, storing the high-quality face image output by the model in the base library. This screening method ensures screening efficiency while also guaranteeing the sharpness and feature integrity of the screened face images.

Description

Face recognition base screening method based on deep learning graph convolution network
Technical Field
The invention relates to the fields of computer vision, deep learning face recognition and intelligent monitoring, in particular to a face recognition base screening method based on a deep learning graph convolution network.
Background
With the rapid development of artificial intelligence in China, face recognition technology has been widely applied in many fields; it not only brings great convenience to people's lives but also substantially improves the safety of public places. Face recognition presupposes that face base-library information has been enrolled; if the enrolled base image is of poor quality (blurred, facially occluded, and so on), recognition errors are likely when captured faces are compared against it. Selecting a suitable method to screen out high-definition, high-quality face images as base-library information therefore plays a crucial role in face recognition, and helps improve its accuracy and stability.
At present, most face recognition base images are still constructed by manual screening: high-quality, clear faces are picked out of a large number of face images to serve as recognition base images. Manual screening, however, has significant drawbacks: it depends on the experience of the workers, whose subjectivity during screening is excessive, so the results follow no unified standard; and detection and screening take a long time, making the work inefficient.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a face recognition base screening method based on a deep-learning graph convolution network, which screens the face base library with a model, thereby ensuring both screening efficiency and the sharpness and feature integrity of the screened face images. The highest-quality face image of each detected subject is obtained through neural-network learning and used as the base-library entry; no screening threshold needs to be set manually, reducing the influence of insufficient manual experience. The technical scheme is as follows:
a face recognition base screening method based on a deep learning graph convolution network comprises the following steps:
firstly, a video shot by a monitoring camera in a using scene is obtained.
And secondly, selecting a face recognition network as a main network according to the use scene, combining a graph convolution network as a branch, constructing a face image quality evaluation model, and then training the model.
And thirdly, detecting face objects in the captured video, using the lightweight face detection algorithm MTCNN. A detected face object is selected as the target and marked so that it is not selected again in subsequent detections; the selected target is then tracked in every following frame with the KCF target tracking algorithm, and frame images of the target face are cropped and saved.
And fourthly, detecting the images saved in the third step with the PaddleHub mask-detection model and, according to the detection result, keeping the images in which no mask is worn as the candidate images.
And fifthly, adding some interference images to the candidate images obtained in the previous step, inputting them into the trained face recognition base-image screening model for screening; the model outputs the image with the highest confidence score among the images under test.
And sixthly, storing the high-quality face image output by the model as a base library.
Preferably, the face recognition network in the second step is trained using the Resnet network.
Further, when the face recognition network is trained by using the Resnet network, the model is trained in two stages:
the first stage is as follows: and training a backbone network Resnet by taking a data set with a label as input for extracting the characteristics of the image to be detected.
And a second stage: training the GCN-V network:
(1) constructing a data set:
firstly, given a data set with classification labels, the features of each image in it are extracted with the trained backbone network Resnet, forming a feature set F = {f_1, f_2, …, f_N}, where f_i ∈ R^D, D is the feature dimension, i ∈ {1, 2, …, N}, and N is the number of images in the data set. Each image feature f_i is defined as node i, and the similarity between node i and node j is denoted a_{i,j}, the cosine similarity between f_i and f_j, j ∈ {1, 2, …, N}.
The k nearest neighbors of every image node are obtained from the inter-node similarities, and a similarity graph G(V, E) is constructed from them, specifically: each image feature is a node in V; the similarity values of each node are sorted in descending order, and the top k nodes are taken as that node's neighbors, forming its neighborhood; connecting the vertex to its neighbors yields k edges in E. The similarity graph can then be represented by a vertex feature matrix F' of size N×D and an adjacency matrix A of size N×N; in A, if v_i and v_j are not connected, the similarity a_{i,j} between image nodes i and j is set to 0.
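The graph construction above can be illustrated with a short plain-Python sketch. This is illustrative only, not the patent's implementation; `cosine` and `build_similarity_graph` are hypothetical names:

```python
from math import sqrt

def cosine(u, v):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def build_similarity_graph(features, k):
    """Return (F_prime, A): the vertex feature matrix and the N x N
    adjacency matrix that keeps only each node's top-k similarities."""
    n = len(features)
    sims = [[cosine(features[i], features[j]) for j in range(n)]
            for i in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # sort the other nodes by similarity, descending, keep the top k
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sims[i][j], reverse=True)
        for j in order[:k]:
            A[i][j] = sims[i][j]   # edge kept; all other entries stay 0
    return features, A

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
F_prime, A = build_similarity_graph(feats, k=1)
```

With k = 1, node 0 keeps only its most similar node (node 1), so A[0][2] stays 0.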
Solving the ground-truth confidence labels: since data sets typically vary greatly within a class, images can have different confidence values even when they belong to the same class. An image with high confidence has more distinctive in-class features and a high probability of belonging to its class; an image with low confidence is marginal, with weaker in-class features. Based on this, the confidence c_i of each node is defined from its neighborhood: for every neighbor of the current node, if the neighbor has the same class label, its similarity is added; if it belongs to a different class, the similarity between the current node and that neighbor is subtracted; the total is finally divided by the number of neighbors of the current node, giving the node confidence: c_i = (1/|N_i|) · Σ_{v_j ∈ N_i} s_j · a_{i,j}, where N_i is the neighborhood of node i and s_j = +1 if node j shares the class label of node i, s_j = -1 otherwise.
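The ground-truth confidence rule just described can be sketched as follows (plain Python, illustrative; `node_confidence` is a hypothetical name):

```python
def node_confidence(i, labels, A):
    """Ground-truth confidence of node i: add a_ij for neighbors with
    the same class label, subtract a_ij for different-class neighbors,
    then divide by the number of neighbors."""
    neighbors = [j for j, a in enumerate(A[i]) if a > 0]
    if not neighbors:
        return 0.0
    total = sum(A[i][j] if labels[j] == labels[i] else -A[i][j]
                for j in neighbors)
    return total / len(neighbors)

# node 0 has neighbors 1 (same class, sim 0.8) and 2 (other class, sim 0.4)
A = [[0.0, 0.8, 0.4],
     [0.8, 0.0, 0.0],
     [0.4, 0.0, 0.0]]
labels = [0, 0, 1]
c0 = node_confidence(0, labels, A)   # (0.8 - 0.4) / 2
```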
(2) Training of the model:
the inputs to the model are the vertex feature matrix F' and the adjacency matrix A in the dataset constructed in the previous step.
Secondly, performing polymerization operation on the feature matrix F 'and the adjacent matrix A, obtaining new features through L layers of convolution layers, performing regression on the new features in the last layer of linear layer of the network to obtain a predicted confidence value C',
C'=FLW+b
wherein W is a trainable regression variable, b is a trainable deviation, and L is the number of layers of graph convolution and can be adjusted according to the requirement; node viMay be selected from
Figure BDA0003299176750000031
Extracting corresponding elements from the Chinese medicinal materials with
Figure BDA0003299176750000032
And (4) showing.
Thirdly, the model is trained by minimizing the mean square error (MSE) between the ground-truth confidences and the confidence scores predicted by the model.
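A toy numerical sketch of the forward pass and loss (plain Python, no framework; self-loops are assumed in A, and the normalization and non-linearity of a real GCN layer are omitted for brevity, so this only illustrates the C' = F_L W + b regression and the MSE objective):

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def gcn_layer(A, F, W):
    # one aggregation step: neighborhood-aggregate (A @ F), then project (@ W)
    return matmul(matmul(A, F), W)

def predict_confidence(A, F, layer_weights, w_reg, b):
    H = F
    for W in layer_weights:          # L graph-convolution layers
        H = gcn_layer(A, H, W)
    # final linear layer: c'_i = h_i . w_reg + b
    return [sum(h * w for h, w in zip(row, w_reg)) + b for row in H]

def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

A = [[1.0, 0.5], [0.5, 1.0]]         # tiny graph, self-loops included
F = [[1.0], [0.0]]                   # two nodes, 1-D features
pred = predict_confidence(A, F, [[[1.0]]], [1.0], 0.0)   # one layer
loss = mse(pred, [1.0, 1.0])         # against ground-truth confidences
```

In training, the loss would be minimized over W and b by gradient descent; the sketch stops at the loss computation.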
Preferably, the backbone network in the second step is not fixed, and can be replaced with any face recognition network required by the application.
Preferably, the fifth step specifically includes:
(1) First, the image feature set F = {f_1, f_2, …, f_N} is obtained through the backbone network.
(2) Processing the F output by the backbone network: the k nearest neighbors of every image node are obtained from the inter-node similarities, a similarity graph G(V, E) is constructed, and it is represented as a vertex feature matrix F' and an adjacency matrix A.
(3) The vertex feature matrix F' and the adjacency matrix A are taken as the input of the GCN-V network, and the confidence score of each image under test is computed through GCN-V: C' = F_L W + b.
(4) And screening the image with the highest confidence score for output.
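Step (4) reduces to an argmax over the predicted scores; a minimal sketch with illustrative names:

```python
def screen_base_image(candidates, confidences):
    """Return the candidate whose predicted confidence is highest; this
    single image becomes the subject's base-library entry."""
    best = max(range(len(candidates)), key=lambda i: confidences[i])
    return candidates[best]

images = ["frame_012.jpg", "frame_047.jpg", "frame_093.jpg"]
scores = [0.41, 0.87, 0.55]           # confidences predicted by GCN-V
best = screen_base_image(images, scores)
```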
Compared with the prior art, the technical scheme has the following beneficial effects: a face recognition base-image screening model is built from a deep-learning face recognition network and a graph convolution network, and the face base library is screened with this model, which ensures screening efficiency as well as the sharpness and feature integrity of the screened face images; only one high-quality image per subject is kept in the base library. The highest-quality face image of each detected subject is obtained through neural-network learning and used as the base-library entry, so no screening threshold needs to be set manually, reducing the influence of insufficient manual experience. The method also runs in real time with modest hardware requirements, lowering hardware cost, and can effectively detect and screen out high-quality faces as base-library images.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail below. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "first step," "second step," "third step," or the like, in the description and in the claims of this application, or the like, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
The embodiment of the disclosure provides a face recognition base screening method based on a deep learning graph convolution network, which comprises the following steps:
the method comprises the steps of firstly, acquiring a video shot by a monitoring camera in a using scene; the method can be suitable for the construction of the face recognition base in any scene, such as airports, railway stations, banks and the like.
And secondly, selecting a face recognition network as a main network according to the use scene, combining a graph convolution network as a branch, constructing a face image quality evaluation model, and then training the model.
Preferably, the backbone network in the second step is not fixed, and can be replaced with any face recognition network required by the application; this gives greater flexibility, and performance is not affected by changes of the usage scene. Any face recognition network can be used; different networks affect the screening accuracy and detection speed of the model differently.
preferably, the face recognition network is trained using the Resnet network.
Further, when the face recognition network is trained by using the Resnet network, the model is trained in two stages.
The first stage is as follows: and training a backbone network Resnet by taking a data set with a label as input for extracting the characteristics of the image to be detected.
And a second stage: training the GCN-V network:
(1) constructing a data set:
firstly, given a data set with classification labels, the features of each image in it are extracted with the trained backbone network Resnet, forming a feature set F = {f_1, f_2, …, f_N}, where f_i ∈ R^D, D is the feature dimension, i ∈ {1, 2, …, N}, and N is the number of images in the data set. Each image feature f_i is defined as node i, and the similarity between node i and node j is denoted a_{i,j}, the cosine similarity between f_i and f_j, j ∈ {1, 2, …, N}.
The k nearest neighbors of every image node are obtained from the inter-node similarities, and a similarity graph G(V, E) is constructed from them, specifically: each image feature is a node in V; the similarity values of each node are sorted in descending order, and the top k nodes are taken as that node's neighbors, forming its neighborhood; connecting the vertex to its neighbors yields k edges in E. The similarity graph can then be represented by a vertex feature matrix F' of size N×D and an adjacency matrix A of size N×N; in A, if v_i and v_j are not connected, the similarity a_{i,j} between image nodes i and j is set to 0.
Solving the ground-truth confidence labels: since data sets typically vary greatly within a class, images can have different confidence values even when they belong to the same class. An image with high confidence has more distinctive in-class features and a high probability of belonging to its class; an image with low confidence is marginal, with weaker in-class features. Based on this, the confidence c_i of each node is defined from its neighborhood: for every neighbor of the current node, if the neighbor has the same class label, its similarity is added; if it belongs to a different class, the similarity between the current node and that neighbor is subtracted; the total is finally divided by the number of neighbors of the current node, giving the node confidence.
(2) Training of the model:
inputting a vertex characteristic matrix F' and an adjacent matrix A in the data set constructed in the previous step into the model;
Secondly, an aggregation operation is performed on the feature matrix F' and the adjacency matrix A; new features are obtained through L graph-convolution layers, and in the final linear layer of the network these new features are regressed to obtain the predicted confidence values C':

C' = F_L W + b

where F_L is the feature matrix output by the L-th graph-convolution layer, W is a trainable regression weight, b is a trainable bias, and L, the number of graph-convolution layers, can be adjusted as required. The predicted confidence of node v_i is the corresponding element of C' = {c'_1, c'_2, …, c'_N}, denoted c'_i.
Thirdly, the model is trained by minimizing the mean square error (MSE) between the ground-truth confidences and the confidence scores predicted by the model.
And thirdly, detecting face objects in the captured video, using the lightweight face detection algorithm MTCNN. A detected face object is selected as the target and marked so that it is not selected again in subsequent detections; the selected target is then tracked in every following frame with the KCF target tracking algorithm, and frame images of the target face are cropped and saved.
The face recognition base-image screening model designed by the invention places no requirement on the input images and accepts input of any size, so the images obtained with the tracking algorithm need no size constraint, which also improves the applicability of the model. If the deploying organization requires a fixed base-image size, the images can be further preprocessed after all frame images of the target object have been acquired at this step: resize them uniformly, then carry out the subsequent operations.
And fourthly, detecting the images saved in the third step with the PaddleHub mask-detection model and, according to the detection result, keeping the images in which no mask is worn as the candidate images. Preprocessing the images with mask detection ensures that the facial features of the finally screened high-quality base image are unoccluded, and reduces the influence of incomplete face images on the subsequent similarity-graph construction.
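The mask-based filtering amounts to discarding every frame in which a mask is detected. A minimal sketch; `detect_mask` stands in for the PaddleHub model's prediction call and is purely hypothetical:

```python
def filter_unmasked(images, detect_mask):
    """Keep only the images for which the detector reports no mask;
    these become the candidate images for screening."""
    return [img for img in images if not detect_mask(img)]

# stand-in detection results (True = mask detected)
fake_results = {"a.jpg": False, "b.jpg": True, "c.jpg": False}
candidates = filter_unmasked(list(fake_results), fake_results.get)
```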
And fifthly, adding some interference images to the candidate images obtained in the previous step, inputting them into the trained face recognition base-image screening model for screening; the model outputs the image with the highest confidence score among the images under test.
Preferably, the fifth step specifically includes:
(1) First, the image feature set F = {f_1, f_2, …, f_N} is obtained through the backbone network.
(2) Processing the F output by the backbone network: the k nearest neighbors of every image node are obtained from the inter-node similarities, a similarity graph G(V, E) is constructed, and it is represented as a vertex feature matrix F' and an adjacency matrix A.
(3) The vertex feature matrix F' and the adjacency matrix A are taken as the input of the GCN-V network, and the confidence score of each image under test is computed through GCN-V: C' = F_L W + b.
(4) And screening the image with the highest confidence score for output.
The confidence represents the quality score of an image: the higher the score, the higher the image quality and the better it meets the requirements of a base image. An image with high confidence contains more comprehensive in-class features and represents its class more completely; an image with low confidence usually contains less distinctive, incomplete in-class features and cannot represent the class. Therefore, among all images of the same class, the image with the highest confidence score is the highest-quality one, representing the in-class features most comprehensively. On this basis, the face image with the highest confidence score within each class is screened out as the base-library entry.
And sixthly, storing the high-quality face image output by the model as a base library.
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above; various insubstantial modifications made using the method concepts and technical solutions of the invention, or direct application of the technical solutions to other occasions without improvement and with equivalent replacement, all fall within the protection scope of the invention.

Claims (5)

1. A face recognition base screening method based on a deep learning graph convolution network is characterized by comprising the following steps:
the method comprises the steps of firstly, acquiring a video shot by a monitoring camera in a using scene;
secondly, selecting a face recognition network as a main network according to the requirements of a use scene, combining a graph convolution network as a branch, constructing a face image quality evaluation model, and then training the model;
thirdly, detecting a face object from the captured video, and adopting a lightweight face detection algorithm MTCNN for face detection; selecting a detected face object as a target, marking the target to prevent repeated selection of next detection, tracking the selected target in each frame later by using a KCF target tracking algorithm, and intercepting and storing a frame image of the face of the target;
fourthly, detecting the image stored in the third step by using a paddlehub mask detection algorithm, and storing the image without wearing the mask according to a detection result to be used as an image to be selected;
fifthly, adding part of interference images and the images to be selected obtained in the previous step, inputting the interference images and the images to be selected into a trained face recognition base image screening model for screening, and outputting the images with the highest confidence scores in the images to be detected by the model;
and sixthly, storing the high-quality face image output by the model as a base library.
2. The method as claimed in claim 1, characterized in that the face recognition network in the second step is trained using the Resnet network.
3. The method for screening the face recognition base based on the deep learning graph convolution network as claimed in claim 2, wherein when the face recognition network is trained by using the Resnet network, the model is trained in two stages:
the first stage is as follows: training a backbone network Resnet by taking a data set with a label as input, and extracting the characteristics of an image to be detected;
and a second stage: training the GCN-V network:
(1) constructing a data set:
firstly, given a data set with classification labels, the features of each image in it are extracted with the trained backbone network Resnet, forming a feature set F = {f_1, f_2, …, f_N}, wherein f_i ∈ R^D, D denotes the feature dimension, i ∈ {1, 2, …, N}, and N is the number of images in the given data set; each image feature f_i is defined as node i, and the similarity between node i and node j is denoted a_{i,j}, the cosine similarity between f_i and f_j, j ∈ {1, 2, …, N};
the k nearest neighbors of every image node are obtained from the inter-node similarities, and a similarity graph G(V, E) is constructed from them, specifically: each image feature is a node in V; the similarity values of each node are sorted in descending order, and the top k nodes are taken as that node's neighbors, forming its neighborhood; connecting the vertex to its neighbors yields k edges in E; the similarity graph can be represented by a vertex feature matrix F' of size N×D and an adjacency matrix A of size N×N, and in A, if v_i and v_j are not connected, the similarity a_{i,j} between image nodes i and j is updated to 0;
solving the ground-truth confidence labels: since data sets typically vary greatly within a class, images can have different confidence values even when they belong to the same class; an image with high confidence has more distinctive in-class features and a high probability of belonging to its class, while an image with low confidence is marginal, with weaker in-class features; based on this, the confidence c_i of each node is defined from its neighborhood: for every neighbor of the current node, if the neighbor has the same class label, its similarity is added, and if it belongs to a different class, the similarity between the current node and that neighbor is subtracted; the total is finally divided by the number of neighbors of the current node to obtain the node confidence;
(2) training of the model:
inputting a vertex characteristic matrix F' and an adjacent matrix A in the data set constructed in the previous step into the model;
secondly, an aggregation operation is performed on the feature matrix F' and the adjacency matrix A; new features are obtained through L graph-convolution layers, and in the final linear layer of the network these new features are regressed to obtain the predicted confidence values C':

C' = F_L W + b

wherein F_L is the feature matrix output by the L-th graph-convolution layer, W is a trainable regression weight, b is a trainable bias, and L, the number of graph-convolution layers, can be adjusted as required; the predicted confidence of node v_i is the corresponding element of C', denoted c'_i;
thirdly, training the model by minimizing the mean square error (MSE) between the ground-truth confidences and the confidence scores predicted by the model.
4. The method for screening the face recognition base based on the deep learning graph convolution network as claimed in claim 1, characterized in that the backbone network in the second step is not fixed, and can be replaced with any face recognition network required by the application.
5. The face recognition base screening method based on a deep learning graph convolution network according to any one of claims 1 to 4, wherein step five specifically comprises:
(1) firstly, the image feature set F is obtained through the backbone network;
(2) the output F of the backbone network is processed: the k nearest neighbors of every image node are obtained according to the similarity between image nodes, and a similarity graph G(V, E) is constructed and expressed as a vertex feature matrix F′ and an adjacency matrix A;
(3) the vertex feature matrix F′ and the adjacency matrix A are taken as the input of the GCN-V network, and the confidence score of each image to be screened is computed through GCN-V: C′ = F_L W + b;
(4) the image with the highest confidence score is selected for output.
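The four screening steps above can be sketched end to end; cosine similarity for the k-nearest-neighbor graph and all helper names are assumptions, not the patent's implementation.

```python
import numpy as np

def build_knn_similarity_graph(F, k):
    """Step (2): connect each image node to its k most similar
    neighbors (cosine similarity assumed), returning the weighted
    adjacency matrix A of the similarity graph G(V, E)."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    S = Fn @ Fn.T
    np.fill_diagonal(S, -np.inf)      # exclude self-edges
    A = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]  # indices of the k most similar nodes
        A[i, nbrs] = S[i, nbrs]
    return A

def pick_base_image(confidence_scores):
    """Step (4): screen the image with the highest confidence score."""
    return int(np.argmax(confidence_scores))
```

Steps (1) and (3) would plug in the backbone network's features and the trained GCN-V confidence predictor between these two helpers.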
CN202111185859.0A 2021-10-12 2021-10-12 Face recognition base screening method based on deep learning graph convolution network Active CN113936312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111185859.0A CN113936312B (en) 2021-10-12 2021-10-12 Face recognition base screening method based on deep learning graph convolution network


Publications (2)

Publication Number Publication Date
CN113936312A true CN113936312A (en) 2022-01-14
CN113936312B CN113936312B (en) 2024-06-07

Family

ID=79278328


Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711384A (en) * 2019-01-09 2019-05-03 江苏星云网格信息技术有限公司 Face recognition method based on a deep convolutional neural network
CN110070010A (en) * 2019-04-10 2019-07-30 武汉大学 Face attribute association method based on pedestrian re-identification
CN110472495A (en) * 2019-07-08 2019-11-19 南京邮电大学盐城大数据研究院有限公司 Deep learning face recognition method based on graph-inference global features
CN111103891A (en) * 2019-12-30 2020-05-05 西安交通大学 Unmanned aerial vehicle rapid posture control system and method based on skeleton point detection
CN111339983A (en) * 2020-03-05 2020-06-26 四川长虹电器股份有限公司 Method for fine-tuning face recognition model
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN111753884A (en) * 2020-06-08 2020-10-09 浙江工业大学 Depth map convolution model defense method and device based on network feature reinforcement
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
CN112215822A (en) * 2020-10-13 2021-01-12 北京中电兴发科技有限公司 Face image quality evaluation method based on lightweight regression network
CN112381987A (en) * 2020-11-10 2021-02-19 中国人民解放军国防科技大学 Intelligent entrance guard epidemic prevention system based on face recognition
CN112488034A (en) * 2020-12-14 2021-03-12 上海交通大学 Video processing method based on lightweight face mask detection model
CN112613385A (en) * 2020-12-18 2021-04-06 成都三零凯天通信实业有限公司 Face recognition method based on monitoring video
WO2021134871A1 (en) * 2019-12-30 2021-07-08 深圳市爱协生科技有限公司 Forensics method for synthesized face image based on local binary pattern and deep learning




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant