CN108090406B - Face recognition method and system - Google Patents

Face recognition method and system

Info

Publication number
CN108090406B
CN108090406B (application CN201611048348.3A; published as CN108090406A)
Authority
CN
China
Prior art keywords
face
camera
features
feature
similarity
Prior art date
Legal status
Active
Application number
CN201611048348.3A
Other languages
Chinese (zh)
Other versions
CN108090406A (en)
Inventor
葛主贝
Current Assignee
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN201611048348.3A priority Critical patent/CN108090406B/en
Publication of CN108090406A publication Critical patent/CN108090406A/en
Application granted granted Critical
Publication of CN108090406B publication Critical patent/CN108090406B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12Fingerprints or palmprints
    • G06V40/1347Preprocessing; Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/192Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194References adjustable by an adaptive method, e.g. learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification


Abstract

The application provides a face recognition method and system. The method includes: extracting, through a CNN network, the face features of face images captured by cameras in different scenes; calculating the similarity between each face feature and each preset face feature to be deployed and controlled, and sorting the similarities; if the maximum similarity is greater than a preset alarm threshold, recording the ternary features corresponding to each camera; when the total number of ternary features recorded for a camera reaches a preset number, inputting the ternary features corresponding to that camera into the secondary fine-tuning network corresponding to that camera for self-training to obtain a fine-tuning model corresponding to the camera; and, the next time face features are extracted from a face image captured by that camera, inputting the face image sequentially into the CNN network and the camera's current fine-tuning model to obtain the face features of the image. The method and system adapt to each camera's environment, and the recognition rate can be continuously improved.

Description

Face recognition method and system
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a face recognition method and system.
Background
Face recognition systems are widely used in fields such as the internet, monitoring, finance, public security, schools and prisons. They mainly use technologies such as face detection, face correction, feature extraction and feature comparison to recognize faces, so as to implement functions such as watch-list control and personnel verification.
At present, feature extraction in a face recognition system is generally based on offline model training with large samples. Taking a neural network as an example, a large number of face samples must first be prepared, a network model is then designed, the face samples are fed into the network model as input, and a feature extraction model is trained.
Because the feature model must be applied to many scenes, and each scene has a different image acquisition environment (illumination, camera imaging quality, camera position and angle, and so on), face samples must be collected from a wide variety of scenes, and the sample size is usually on the order of hundreds of thousands to millions, which makes sample collection very difficult. Moreover, to fit such a huge sample size, the network model must be widened and deepened and its parameters increase, so feature extraction becomes very time-consuming. Even then, the trained feature extraction model cannot be applied efficiently to so many real scenes: averaged over all scenes, the face recognition rate of the whole system is only mediocre.
One robust face recognition method based on Gabor wavelets and model adaptation retrains a mapping matrix on face images collected in the real environment, combines it with the mapping matrix computed from the original data, and updates the face feature model, thereby improving the robustness of the face system to the environment and improving the recognition rate. This method uses traditional Gabor wavelet features as the face description, whose descriptive power is limited; it can only recognize faces it has been trained on (multiple images per face are required), and it cannot support enrolling a single base picture and comparing it with snapshot images by similarity. When the model is adjusted, all newly collected samples are trained to obtain an additive mapping matrix, so the model adaptation only improves the recognition rate for those classes and has no transfer benefit for other classes.
Another face recognition method and device with automatic base-image updating replaces the worst-quality base image in the recognition library with a newly acquired face image, so as to continuously improve base-image quality and thereby improve the subsequent recognition rate. This approach requires multiple face images to be enrolled, whereas current face systems generally provide only one identification photo per person, so its applicability is limited. By continuously replacing the base images of recognized users during long-term operation, it can only improve the recognition rate of frequently recognized users and does not improve the recognition rate for other users.
Disclosure of Invention
In view of this, the present application provides a face recognition method and system, to solve the problem that face recognition systems in the prior art cannot adapt face recognition to the environment.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, a face recognition method is provided, which is applied to a face recognition system, where the face recognition system includes a CNN network and a secondary fine-tuning network respectively constructed for each camera, and the method includes:
inputting face images shot by cameras positioned in different scenes into a CNN network for face feature extraction;
calculating the similarity between each face feature and each preset face feature to be distributed and controlled, and sequencing the similarity corresponding to each face feature respectively;
judging whether the maximum similarity of the face features is greater than a preset alarm threshold value or not;
if so, recording ternary features corresponding to each camera, wherein the ternary features comprise the face feature captured by the camera whose maximum similarity is greater than the preset alarm threshold, the face feature to be deployed and controlled having the maximum similarity with that face feature, and the face feature to be deployed and controlled having the second-highest similarity with that face feature;
when the total number of the ternary features recorded aiming at a certain camera reaches a preset number, inputting the ternary features corresponding to the camera into a secondary fine tuning network corresponding to the camera for self-training to obtain a fine tuning model corresponding to the camera;
and when the face features of the face image shot by the camera are extracted next time, the face image shot by the camera is sequentially input into the CNN network and the current fine tuning model of the camera to obtain the face features of the face image.
Optionally, the self-training of the secondary fine-tuning network includes:
constructing a full connection layer corresponding to each camera, and inputting the face features in the ternary features corresponding to each camera into the full connection layer corresponding to the camera;
inputting the features output by the fully connected layer corresponding to each camera, together with the face feature to be deployed and controlled with the maximum similarity and the face feature to be deployed and controlled with the second-highest similarity in that camera's ternary features, into the Triplet Loss layer for learning, and taking the obtained parameters of the fully connected layer corresponding to each camera as the fine-tuning model parameters corresponding to that camera.
Optionally, the training process of the CNN network includes:
designing a CNN network structure layer to carry out primary extraction on the characteristics of a plurality of face samples with labels;
and inputting the features extracted from the CNN network structure layer into a Softmax Loss layer for classification training to obtain parameters of the CNN network.
Optionally, before the face image shot by each camera is input into a CNN network for face feature extraction, face detection, face correction and image preprocessing are performed on the face image shot by each camera, so as to obtain a face region of the face image shot by each camera.
Optionally, the processor performs face feature extraction, similarity comparison and ternary feature recording during the deployment period, and performs the self-training of the secondary fine-tuning network corresponding to each camera during the non-deployment period.
According to a second aspect of the present application, there is provided a face recognition system, where the face recognition system includes a CNN network and a secondary fine adjustment network respectively constructed for each camera, and the face recognition system further includes:
the sample acquisition module is used for inputting face images shot by cameras positioned in different scenes into a CNN network to extract face features;
the calculation sequencing module is used for calculating the similarity between each face feature and each preset face feature to be distributed and controlled and sequencing the similarity corresponding to each face feature;
the judging module is used for judging whether the maximum similarity of the face features is greater than a preset alarm threshold value or not;
the recording module is used for recording the ternary features corresponding to each camera when the maximum similarity of a face feature is greater than the preset alarm threshold, wherein the ternary features comprise the face feature whose maximum similarity is greater than the preset alarm threshold, the face feature to be deployed and controlled having the maximum similarity with that face feature, and the face feature to be deployed and controlled having the second-highest similarity with that face feature;
the secondary fine-tuning module is used for inputting the ternary features corresponding to a certain camera into a secondary fine-tuning network corresponding to the camera for self-training when the total number of the ternary features recorded by the camera reaches a preset number, so as to obtain a fine-tuning model corresponding to the camera;
and the face recognition module is used for sequentially inputting the face images shot by the camera into the CNN network and the current fine adjustment model of the camera when face features of the face images shot by the camera are extracted next time, so that the face features of the face images are obtained.
Optionally, the secondary fine tuning network includes:
the full connection layer corresponding to each camera receives the face features in the corresponding camera ternary features;
and the Triplet Loss layer receives and learns the features output by the fully connected layer corresponding to each camera, together with the face feature to be deployed and controlled with the maximum similarity and the face feature to be deployed and controlled with the second-highest similarity in the corresponding camera's ternary features, and obtains the parameters of the fully connected layer corresponding to each camera as the fine-tuning model parameters corresponding to that camera.
Optionally, the CNN network offline training process includes:
performing primary extraction of features on a plurality of face samples with labels by a CNN network structure layer;
and the Softmax Loss layer receives the features extracted by the CNN network structure layer and carries out classification training on the features extracted by the CNN network structure layer to obtain the parameters of the CNN network.
Optionally, the system further comprises:
and the face area acquisition module is used for carrying out face detection, face correction and image preprocessing on the face images shot by the cameras before inputting the face images shot by the cameras into a CNN network for face feature extraction, and acquiring the face areas of the face images shot by the cameras.
Optionally, the sample obtaining module, the calculation sorting module, the judging module, the recording module and the face recognition module operate during the deployment period, and the secondary fine-tuning module operates during the non-deployment period.
Beneficial effects of the present application: face features are extracted with an offline-trained CNN network, that is, face images captured by the cameras are recognized at a pre-trained basic recognition rate; the useful features obtained during recognition are retained, and the secondary fine-tuning network corresponding to each camera is self-trained with them to obtain a fine-tuning model for that camera; during subsequent face feature extraction, the CNN network together with the camera's currently self-trained fine-tuning model is used, so the recognition rate improves continuously. Each camera corresponds to its own self-trained fine-tuning model, which concentrates on feature expression under that camera's data, so the system adapts to each camera's environment and the recognition rate is continuously improved. Moreover, the self-training of each camera's fine-tuning model not only improves the similarity between captured users and the base library, but also transfers to, and improves recognition of, other users who have not been captured in the respective scenes.
Drawings
Fig. 1 is a flow chart illustrating offline training of a CNN network according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating a face recognition method according to an exemplary embodiment of the present application;
fig. 3 is a flowchart illustrating a warehousing process of a face image to be scheduled and controlled according to an exemplary embodiment of the present application;
FIG. 4 is a flowchart illustrating obtaining the ternary features of a face image according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a feature comparison structure shown in an exemplary embodiment of the present application;
FIG. 6 is a flow diagram illustrating self-training of the secondary fine-tuning network according to an exemplary embodiment of the present application;
FIG. 7 is a diagram illustrating the result of the secondary fine-tuning network self-training process according to an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of a network structure of a face recognition feature extraction module according to an exemplary embodiment of the present application;
FIG. 9 is a processor work flow diagram, shown in an exemplary embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a face recognition system according to an exemplary embodiment of the present application;
fig. 11 is a block diagram of a specific face recognition system according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
A face monitoring system usually includes a plurality of cameras erected in different environmental scenes, which may cause differences in illumination, angle, scale, image quality and the like among the face images acquired by the cameras. Typically, these cameras transmit face image data to a server, which performs the face comparison. The server usually adopts a fixed feature extraction model, so the face recognition rate under each camera environment differs and is limited by the acquisition environment and the number of training samples; in camera scenes whose acquisition environment differs greatly from the training data, the face recognition rate is low.
To solve these problems, the present application provides a face recognition method and system: on the basis of the system's initial basic recognition rate, the corresponding feature extraction models are continuously and iteratively updated according to the environments of the different cameras so as to adapt to each camera's environment, thereby improving the overall face recognition rate.
Referring to fig. 1, before performing face recognition (i.e., face feature extraction), offline training is required to obtain a CNN (Convolutional Neural Network) Network, and the step of obtaining the CNN Network may include: firstly, designing a CNN network structure layer to carry out primary extraction of features on a plurality of face samples with labels; then, the features extracted from the CNN network structure layer are input into a Softmax Loss (i.e., a classification Loss function) layer for classification training, so as to obtain parameters of the CNN network.
The CNN network structure layer receives a large number of images of people to be controlled with labels as a training sample vector set, the images are processed and then characteristics are output, and the Softmax Loss layer performs classification training on the characteristics output by the CNN network structure layer to obtain parameters of the CNN network.
In this embodiment, the CNN network structure layer may be set as needed, and the design requirement of the CNN network structure layer satisfies: the convergence can be normal, and the face features with the characterization function can be extracted.
The Softmax Loss layer performs classification training by using a Softmax regression algorithm, and the cost function J (θ) of the Softmax regression algorithm in this embodiment is:
J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^{T}x^{(i)}}}\right] \qquad (1)
in formula (1), θ is a parameter to be trained, such as the weight w and the offset b of each convolutional layer and each fully-connected layer and other layers in the CNN network structure layer;
m is the input sample amount, and the input sample is the labeled face sample input to the CNN network structure layer;
k is the total number of classified predictions;
x is an input sample, namely a face image to be deployed and controlled in step S101;
y is the class label of the input sample;
j and l are class label indices taking values in [1, k];
i is the index of the input sample, taking values in [1, m];
where θ denotes all the parameters of the model, \theta_1, \theta_2, \ldots, \theta_k are the per-class parameter vectors (which may be written as the stacked matrix \theta = [\theta_1, \theta_2, \ldots, \theta_k]^{T}), and T denotes the matrix transpose operation.
In order to facilitate the use of subsequent secondary fine-tuning self-training, an L2 normalization layer is arranged between the CNN network structure layer and the Softmax Loss layer, and the L2 normalization layer performs L2 normalization on the features output by the CNN network structure layer and inputs the features into the Softmax Loss layer for classification training.
It should be noted that the cost function J (θ) of the Softmax regression algorithm can be set to other formulas according to the experience of one of ordinary skill in the art.
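For concreteness, the offline training stage described above can be sketched as follows in PyTorch. This is a minimal illustration, not the patent's actual network: the layer sizes, the 128-dimensional feature, the assumed 1000 identities and the SGD optimizer are placeholder assumptions; only the overall structure (a CNN structure layer, L2 normalization of its output features, and classification training with a softmax/cross-entropy loss) follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureLayer(nn.Module):
    """Stand-in for the CNN network structure layer that outputs face features."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        feats = self.fc(self.conv(x).flatten(1))
        return F.normalize(feats, p=2, dim=1)   # L2 normalization layer described above

def train_step(backbone, classifier, optimizer, images, labels):
    """One classification training step: features -> softmax loss -> update theta."""
    logits = classifier(backbone(images))        # theta_j^T x in the cost function
    loss = F.cross_entropy(logits, labels)       # softmax regression cost J(theta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

backbone = StructureLayer()
classifier = nn.Linear(128, 1000)                # 1000 labeled identities (assumed)
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=0.01)
```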
In this embodiment, let the feature vector formed by the features output by the CNN network structure layer be x = (x_1, x_2, \ldots, x_n). The formula for L2 normalization (norm normalization) is then:

x_i' = \frac{x_i}{\sqrt{\sum_{j=1}^{n} x_j^2}} \qquad (2)

In formula (2), x_i' is the value of the i-th feature of the feature vector x after normalization, and x_i is the value of the i-th feature (1 ≤ i ≤ n) of the feature vector x.

In this embodiment, the process of obtaining the labeled face samples may include the following steps:
carrying out face detection on the plurality of face images to obtain a face region for each face image;
and performing face correction (e.g., face alignment), image preprocessing (e.g., rotation, similarity transformation and the like) and labeling on the face region of each face image to obtain labeled face samples.
In this embodiment, the face detection algorithm may be any conventional face detection algorithm selected as needed, for example LBP (Local Binary Patterns), Haar features, HOG (Histogram of Oriented Gradients), or SURF (Speeded-Up Robust Features) combined with AdaBoost (an iterative algorithm), an SVM (Support Vector Machine), a neural network algorithm, and the like.
The face correction algorithm may also select a conventional face correction algorithm, such as a neural network algorithm, as desired.
Labeling each face image means assigning a unique identifier to it: different face images of the same person are given the same identifier, while face images of different persons are given different identifiers. The labeled face images are then input into the CNN network structure layer for primary feature extraction.
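A minimal sketch of preparing one labeled face sample is given below. It uses OpenCV's Haar cascade detector purely as an example of the conventional detectors listed above; the crop-and-resize step is a crude stand-in for face correction and preprocessing, and the file paths, image size and per-person identifiers are assumptions.

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def prepare_sample(image_path: str, person_id: int, size=(112, 112)):
    """Return (face_crop, label) or None if no face is detected."""
    img = cv2.imread(image_path)
    if img is None:
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                       # detection failed, sample rejected
    x, y, w, h = faces[0]
    face = cv2.resize(img[y:y + h, x:x + w], size)   # crude correction/preprocessing
    return face, person_id                # same person -> same identifier
```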
As shown in fig. 2, which is a flowchart of a face recognition method provided in this embodiment, the face recognition method may be applied to a face recognition system, where the face recognition system includes a CNN network and a secondary fine-tuning network respectively constructed for each camera.
Before face recognition is performed, the face recognition system needs to extract the face features of each face image to be deployed and controlled, obtaining the face features to be deployed and controlled. Specifically, this includes the following step:
and inputting the face images to be distributed into the CNN network, and extracting the face features of the face images to be distributed.
Before the face features of each face image to be deployed and controlled are extracted, the images of the persons to be deployed and controlled need to be put in storage: the face recognition system performs face detection, face correction and image preprocessing on them to obtain the face images to be deployed and controlled.
Specifically, referring to fig. 3, the images of the persons to be deployed and controlled are imported by the user. The face recognition system checks whether each image has been read completely; if not, it continues reading. Once an image has been read successfully, the system performs face detection on it to obtain a face region, performs face correction on that region to obtain a more accurate face region, and applies image preprocessing to the corrected region to obtain the face image to be deployed and controlled.
In the process, if the face recognition system judges that the image reading of the personnel to be controlled is unsuccessful or the face detection does not detect the face area, the warehousing is failed, and the warehousing operation of the images of the personnel to be controlled needs to be carried out again.
And when the face recognition system judges that the image of the person to be controlled is completely read, the image of the person to be controlled is put in storage.
In this embodiment, a face detection algorithm is used to perform face region detection, and a face rectification algorithm is used to perform face spotting.
The face recognition system stores the face features of the face images to be deployed and controlled and the corresponding face images to be deployed and controlled into a database for subsequent calling.
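A minimal sketch of this enrollment ("warehousing") flow is shown below. It reuses the hypothetical prepare_sample and backbone helpers from the earlier sketches, and an in-memory dictionary stands in for the database; all names are illustrative assumptions.

```python
import torch

watchlist_db = {}   # person_id -> {"image": face_crop, "feature": 1-D tensor}

@torch.no_grad()
def enroll(image_path: str, person_id: int) -> bool:
    sample = prepare_sample(image_path, person_id)
    if sample is None:
        return False                       # warehousing failed, must be redone
    face, _ = sample
    x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    feature = backbone(x).squeeze(0)       # L2-normalized face feature to be deployed
    watchlist_db[person_id] = {"image": face, "feature": feature}
    return True
```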
Referring to fig. 2, the face recognition method provided in this embodiment may include:
s101: and inputting the face images shot by the cameras positioned in different scenes into a CNN network for face feature extraction.
Referring to fig. 4, cameras are arranged in different scenes such as a mall entrance, a subway entrance, a station and the like, and each camera continuously captures a face image and sends the captured face image to a face recognition system.
The face recognition system performs face detection on each input image to judge whether a face is present; if so, it records the number of the camera (i.e., the camera ID) that captured the face image. The system then performs face correction and image preprocessing on the detected face region and inputs it into the CNN network, obtaining the face feature corresponding to each face image.
The human face detection, the human face correction and the image preprocessing are the same as the human face sample processing process in the CNN network offline training process, and are not described herein again.
S102: and calculating the similarity between each face feature and each preset face feature to be distributed and controlled, and sequencing the similarity corresponding to each face feature respectively.
Referring to fig. 4, a similarity comparison algorithm is used to calculate the similarity between each face feature obtained in step S101 and each face feature to be deployed and controlled, so that each face feature has one similarity value per face feature to be deployed and controlled; the face recognition system then sorts the similarities corresponding to each face feature to obtain a ranking of the similarities for that face feature.
In one example, the number of faces to be controlled is N, after similarity calculation, the number of similarities corresponding to each face feature is N, and the N similarities corresponding to each face feature are sorted.
The similarity can be calculated by selecting Euclidean distance, cosine distance and the like between the human face features and the human face features to be distributed and controlled.
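As a sketch of step S102, the comparison and sorting could look as follows. Because the features in the earlier sketches are L2-normalized, cosine similarity reduces to a dot product; Euclidean distance could be substituted. The helper name and data layout are assumptions.

```python
import torch

def rank_similarities(face_feature: torch.Tensor, watchlist_db: dict):
    """Return a list of (person_id, similarity) sorted from highest to lowest."""
    scores = []
    for person_id, entry in watchlist_db.items():
        sim = float(torch.dot(face_feature, entry["feature"]))   # cosine similarity
        scores.append((person_id, sim))
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores
```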
S103: and judging whether the maximum similarity of the face features is greater than a preset alarm threshold value T or not.
The maximum similarity of each face feature is the similarity between the face feature to be distributed and controlled with the highest similarity with each face feature and the corresponding face feature.
In an example, the preset alarm threshold T is 80%, and when the maximum similarity of each face feature is greater than 80%, the method proceeds to step S104, and outputs alarm information, where the alarm information indicates that the face feature matches one of the face features to be controlled, that is, the face feature is determined to be a reliable sample, and may be used as a self-training sample of the secondary fine-tuning network.
The form of the alarm can be selected as needed; for example, a dialog box can pop up to remind the operator that the captured face feature matches a face feature to be deployed and controlled.
S104: recording the ternary features corresponding to each camera. The ternary features include the face feature a (Anchor) captured by the camera whose maximum similarity is greater than the preset alarm threshold, the face feature to be deployed and controlled p (Positive, i.e., the same-class sample of a) that has the maximum similarity with a, and the face feature to be deployed and controlled n (Negative, i.e., the closest sample among the non-same-class samples of a) that has the second-highest similarity with a.
In this embodiment, the features a, p and n together form one ternary feature.
In one embodiment, in order to adapt to the environment of each camera, when the ternary features are recorded, the camera numbers to which the ternary features belong are also recorded, so that the cameras are subjected to secondary fine tuning self-training of the corresponding network respectively according to the ternary features recorded by the cameras.
In another embodiment, the face recognition system sets a corresponding storage module for each camera, each storage module holding the ternary features of its camera; when a ternary feature is recorded, the number of the storage device in which it is stored is also recorded, so that the secondary fine-tuning training of the corresponding network can be performed per camera. Referring to fig. 5, in one example, face detection is performed on the surveillance image of a certain camera to obtain the captured face image; face correction and image preprocessing are then performed, and the result is input into the CNN network for feature extraction to obtain the face feature corresponding to that image. Using the feature comparison algorithm, the similarity between this face feature and the face feature to be deployed and controlled of each enrolled certificate photo (i.e., each image of a person to be deployed and controlled) is calculated: for example, 0.3 for the first enrolled certificate photo, 0.9 for the second, and 0.6 for the third. Sorting the similarities corresponding to this face feature gives the second enrolled certificate photo as the maximum, with similarity 0.9.
Since the face recognition system judges that the maximum similarity of this face feature is greater than 80%, it pops up an alarm dialog box, indicating that the face image matches the second enrolled certificate photo; it records the face feature, which together with the face feature to be deployed and controlled of the second enrolled certificate photo and the face feature to be deployed and controlled of the third certificate photo forms a ternary feature corresponding to this camera.
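Steps S103 and S104 can be sketched as below: when the best match exceeds the alarm threshold, an alarm is raised and the ternary feature (anchor, positive, negative) is stored under the camera's number. The 0.8 threshold and the per-camera dictionary are assumptions.

```python
ALARM_THRESHOLD = 0.8
triplet_store = {}   # camera_id -> list of (anchor, positive, negative)

def maybe_record_triplet(camera_id: int, face_feature, ranked, store=triplet_store):
    """`ranked` is the sorted (person_id, similarity) list from rank_similarities."""
    if len(ranked) < 2 or ranked[0][1] <= ALARM_THRESHOLD:
        return False                                     # no alarm, nothing recorded
    best_id, second_id = ranked[0][0], ranked[1][0]
    anchor = face_feature                                # a: captured face feature
    positive = watchlist_db[best_id]["feature"]          # p: most similar watch-list feature
    negative = watchlist_db[second_id]["feature"]        # n: second most similar
    store.setdefault(camera_id, []).append((anchor, positive, negative))
    return True                                          # also trigger the alarm dialog
```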
S105: and when the total number of the ternary features recorded aiming at a certain camera reaches a preset number, inputting the ternary features corresponding to the camera into a secondary fine tuning network corresponding to the camera for self-training to obtain a fine tuning model corresponding to the camera.
The face recognition system judges whether the total number of reliable face ternary features captured by each camera is greater than a preset number N, and if yes, self-training of a secondary fine-tuning network of the batch of samples is started for the camera; otherwise, the self-training of the secondary fine-tuning network of the camera is abandoned.
Optionally, the secondary fine tuning network includes:
constructing a full connection layer corresponding to each camera, and inputting the face features in the ternary features corresponding to each camera into the full connection layer corresponding to the camera;
inputting the features output by the fully connected layer corresponding to each camera, together with the face feature to be deployed and controlled with the maximum similarity and the face feature to be deployed and controlled with the second-highest similarity in that camera's ternary features, into the Triplet Loss layer for learning, and taking the obtained parameters of the fully connected layer corresponding to each camera as the fine-tuning model parameters corresponding to that camera.
In this embodiment, the Triplet Loss layer is used for training the secondary fine-tuning network. The input of the Triplet Loss layer is the recorded ternary features corresponding to each camera.
The Loss function L of the Triplet Loss layer is calculated as:

L = \sum_{i=1}^{N} \left[ \left\| f(x_i^a) - x_i^p \right\|_2^2 - \left\| f(x_i^a) - x_i^n \right\|_2^2 + \alpha \right]_+ \qquad (3)

In formula (3), x_i^a is the i-th Anchor feature vector in the input set, i.e., the captured face feature fed into the camera's fully connected layer, whose output f(x_i^a) is the i-th feature output by that layer; x_i^p is the i-th Positive feature vector in the input set, i.e., the face feature to be deployed and controlled with the maximum similarity in the corresponding camera's ternary features; x_i^n is the i-th Negative feature vector in the input set, i.e., the face feature to be deployed and controlled with the second-highest similarity in the corresponding camera's ternary features; f is the fully connected fc-layer function, α is a training setting parameter (the margin), N is a natural number (the number of recorded ternary features), and [\cdot]_+ denotes taking the maximum of the bracketed value and zero.
When L no longer decreases within an iteration period, or training has iterated a preset number of times, the parameters of the fully connected layer function corresponding to each camera are taken as the fine-tuning model parameters corresponding to that camera. That is, at that point the parameters of the fine-tuning model fc(i) output by the Triplet Loss training are the result of the self-training of the secondary fine-tuning network. The fine-tuning model fc(i) is used only by the camera whose camera number, or whose storage device number, is i.
It should be noted that the formula for calculating the Loss function L of the Triplet Loss layer may be set to other formulas by a person skilled in the art based on experience.
Referring to fig. 7, after passing through the Triplet Loss layer, the distance between the Anchor feature and the Positive feature is shortened while the distance between the Anchor feature and the Negative feature is lengthened; through the training and learning of the Triplet Loss layer, face matching can therefore be performed more accurately.
In order to learn the parameters specific to each camera's scene environment, a separate fully connected fc layer is added for each camera when the secondary fine-tuning network performs self-training. Referring to fig. 6, the face features recorded under each camera number are input into the fully connected fc layer corresponding to that camera; the input and output feature dimensions of the fully connected fc layer are identical.
The face recognition system inputs the feature a output by each camera's fully connected fc layer, the face feature to be deployed and controlled p with the maximum similarity, and the face feature to be deployed and controlled n with the second-highest similarity into the Triplet Loss layer for learning, and obtains the fine-tuning model fc(i) corresponding to each camera.
In order to simplify the calculation, an L2 normalization layer is arranged between the fully connected fc layer and the Triplet Loss layer, and the characteristics output by the fully connected fc layer are input into the Triplet Loss layer after being subjected to L2 normalization. The calculation method of the L2 normalization layer can be seen in formula (2).
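A minimal sketch of the self-training in step S105 is given below. Following the description above, only the anchor feature passes through the camera's fully connected layer fc(i), whose output is L2-normalized before the triplet loss; the margin, learning rate, batching and stopping criterion are simplified assumptions rather than the patent's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_train_camera(triplets, feat_dim: int = 128, margin: float = 0.2,
                      max_iters: int = 1000) -> nn.Linear:
    fc = nn.Linear(feat_dim, feat_dim)        # input and output dimensions identical
    optimizer = torch.optim.SGD(fc.parameters(), lr=0.01)
    anchors = torch.stack([t[0] for t in triplets])
    positives = torch.stack([t[1] for t in triplets])
    negatives = torch.stack([t[2] for t in triplets])
    best = float("inf")
    for _ in range(max_iters):
        a = F.normalize(fc(anchors), p=2, dim=1)          # L2 norm between fc and loss
        d_pos = (a - positives).pow(2).sum(dim=1)         # ||f(x^a) - x^p||^2
        d_neg = (a - negatives).pow(2).sum(dim=1)         # ||f(x^a) - x^n||^2
        loss = F.relu(d_pos - d_neg + margin).sum()       # triplet loss with margin alpha
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() >= best:                           # L no longer decreasing
            break
        best = loss.item()
    return fc                                             # fine-tuning model fc(i)
```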
S106: when the face features of a face image captured by the camera are extracted next time, the face image captured by the camera is sequentially input into the CNN network and the camera's current fine-tuning model to obtain the face features of the image. Referring to fig. 8, the face recognition feature extraction model of each camera is thus updated to the CNN network plus the current fine-tuning model learned for that camera in step S105.
Optionally, after L2 normalization is performed on the fine tuning model fc (i) corresponding to each camera, the current fine tuning model of the camera is obtained, so as to simplify the calculation.
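The updated extraction path of step S106 can then be sketched as follows, reusing the hypothetical backbone and the fine-tuning models produced above: the shared CNN feature is passed through the camera's fc(i), if one exists, followed by L2 normalization.

```python
import torch
import torch.nn.functional as F

finetune_models = {}   # camera_id -> fc(i), filled in by self_train_camera

@torch.no_grad()
def extract_feature(camera_id: int, image_tensor: torch.Tensor) -> torch.Tensor:
    feat = backbone(image_tensor)                       # shared offline-trained CNN
    fc = finetune_models.get(camera_id)
    if fc is not None:                                  # camera already has fc(i)
        feat = F.normalize(fc(feat), p=2, dim=1)
    return feat.squeeze(0)
```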
In this embodiment, after the facial features of the facial images captured by the cameras are obtained in step S106, the process returns to step S102 to continue to be executed downwards, and the fine-tuning models of the cameras are continuously iterated, so that the facial recognition rate of each camera is continuously improved.
In addition, it should be noted that the parameters of the CNN network itself do not need to be changed. Because the computing performance of the processor in a face recognition system is often limited, the present application performs recognition and comparison processing during the deployment period and performs model fine-tuning during the non-deployment period, so that the computing resources of the face recognition system are fully used.
The deployment and control time period is selected as a working time period of the camera scene environment according to needs, such as 8:00 in the morning to 10:00 in the evening. The non-deployment time period is a non-working time period of the camera scene environment.
Referring to fig. 9, in the deployment and control time period, the face recognition system performs face detection, face correction, image preprocessing, face feature extraction, similarity calculation sorting, alarm processing, and ternary feature records corresponding to each camera. And in the non-deployment time period, the face recognition system carries out secondary fine-tuning network self-training to obtain a fine-tuning model corresponding to each camera, and the obtained fine-tuning model corresponding to each camera and the CNN network are used as a face extraction model of the corresponding camera in the next deployment time.
The face recognition system of the embodiment processes different tasks in a time-sharing manner, and in a deployment and control time period, the face recognition system performs the face recognition deployment and control task by using a fine tuning model corresponding to each camera after self-training and updating of a CNN network and a subsequent secondary fine tuning network; in the non-distribution control time period, the face recognition system performs self-training of the secondary fine-tuning network corresponding to each camera to obtain a new fine-tuning model corresponding to the camera, and computing resources of the face recognition system are fully utilized.
As shown in fig. 10, a block diagram of a face recognition system provided in the present application, corresponding to the face recognition method, may refer to an embodiment of the face recognition method to understand or explain the contents of the face recognition system.
Referring to fig. 10, the face recognition system provided in this embodiment includes a CNN network and a secondary fine-tuning network respectively constructed for each camera, and further includes a sample obtaining module 101, a calculation sorting module 102, a determining module 103, a recording module 104, a secondary fine-tuning module 105, and a face recognition module 106.
The sample acquisition module 101 is configured to input face images shot by cameras located in different scenes into a CNN network to perform face feature extraction.
The calculating and sorting module 102 is configured to calculate similarity between each face feature and each preset face feature to be deployed and controlled, and sort the similarity corresponding to each face feature.
The judging module 103 is configured to judge whether the maximum similarity of the face features is greater than a preset alarm threshold.
When the maximum similarity of each face feature is greater than a preset alarm threshold, the recording module 104 records the ternary features corresponding to each camera, where the ternary features include a face feature whose maximum similarity of the camera is greater than the preset alarm threshold, a face feature to be deployed and controlled whose similarity with the face feature is the maximum, and a face feature to be deployed and controlled whose similarity with the face feature is the second.
When the total number of the ternary features recorded for a certain camera reaches a preset number, the secondary fine tuning module 105 inputs the ternary features corresponding to the camera into a secondary fine tuning network of the camera for self-training, and obtains a fine tuning model corresponding to the camera.
In this embodiment, the secondary fine-tuning network includes:
the full connection layer corresponding to each camera receives the face features in the corresponding camera ternary features;
and the Triplet Loss layer receives and learns the features output by the fully connected layer corresponding to each camera, together with the face feature to be deployed and controlled with the maximum similarity and the face feature to be deployed and controlled with the second-highest similarity in the corresponding camera's ternary features, and obtains the parameters of the fully connected layer corresponding to each camera as the fine-tuning model parameters corresponding to that camera.
Optionally, each feature output by the fully-connected network is normalized by L2 and then input to the Triplet Loss layer.
The Loss function L of the Triplet Loss layer is calculated as:

L = \sum_{i=1}^{N} \left[ \left\| f(x_i^a) - x_i^p \right\|_2^2 - \left\| f(x_i^a) - x_i^n \right\|_2^2 + \alpha \right]_+

where x_i^a is the i-th Anchor feature fed into the fully connected layer corresponding to each camera, f(x_i^a) being that layer's i-th output feature; x_i^p is the face feature to be deployed and controlled with the maximum similarity in the corresponding camera's ternary features; x_i^n is the face feature to be deployed and controlled with the second-highest similarity in the corresponding camera's ternary features; f is the fully connected layer function, and α is a training setting parameter;
and when L is not reduced in the iteration period any more or training iteration is carried out to preset times, acquiring parameters of the full connection layer function corresponding to each camera as fine tuning model parameters corresponding to the camera.
It should be noted that the calculation formula of the Loss function L of the Triplet Loss layer may be set to other formulas according to the experience of one of ordinary skill in the art.
The face recognition module 106, when extracting the face features of the face image taken by the camera next time, sequentially inputs the face image taken by the camera into the CNN network and the current fine tuning model of the camera, and obtains the face features of the face image.
The face recognition system of this embodiment further includes a parameter extraction module (not shown in the figure), configured to input the face image to be deployed and controlled into the CNN network to perform face feature extraction to be deployed and controlled.
In this embodiment, the process of obtaining the CNN network through offline training may include:
performing primary extraction of features on a plurality of face samples with labels by a CNN network structure layer;
and then, the Softmax Loss layer receives the features extracted by the CNN network structure layer and carries out classification training on the features extracted by the CNN network structure layer to obtain the parameters of the CNN network.
And the classification features extracted by the CNN network structure layer are input into a Softmax Loss layer after being subjected to normalization processing by L2.
The cost function J(θ) of the Softmax Loss layer is calculated as:

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^{T}x^{(i)}}}\right]

where θ is the parameter to be trained, m is the number of input samples, k is the total number of predicted classes, x is an input sample, and y is the class label of the input sample.
It should be noted that the cost function J (θ) of the Softmax regression algorithm can be set to other formulas according to the experience of one of ordinary skill in the art.
In this embodiment, the system further includes:
and a face region acquisition module (not shown in the figure), which is used for carrying out face detection, face correction and image preprocessing on the face image to be controlled or the face image shot by each camera before inputting the face image to be controlled or the face image shot by each camera into a CNN network for face feature extraction, and acquiring the face region of the face image to be controlled or the face image shot by each camera.
In one example, the sample acquiring module 101, the calculation sorting module 102, the judging module 103, the recording module 104 and the face recognition module 106 operate at deployment time, and the secondary fine tuning module 105 operates at non-deployment time to fully utilize the calculation resources of the face recognition system.
In another embodiment, referring to fig. 11, the face recognition system includes a processor, a storage device, input devices (e.g., a mouse, keyboard or microphone), output devices (e.g., a screen, a speaker, a warning light, etc.), and a set of face cameras. The storage device, the input devices, the output devices and the face cameras are all communicatively connected to the processor.
The processor is a central processing unit (CPU) or a graphics processing unit (GPU). The face cameras are used in different camera environments; during the deployment period they capture images and return them to the processor, and the processor performs face recognition, similarity comparison and ternary feature extraction. The storage device is used to store the base library of face images to be deployed and controlled and their corresponding face features to be deployed and controlled, and to store the ternary features corresponding to each camera together with the camera number or storage number.
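One possible (purely illustrative) way to organize the storage device's contents, following this description, is sketched below; all field and type names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class WatchlistEntry:
    image: bytes          # face image to be deployed and controlled (base library)
    feature: List[float]  # its face feature extracted by the CNN network

@dataclass
class Storage:
    watchlist: Dict[int, WatchlistEntry] = field(default_factory=dict)
    # camera number -> recorded (anchor, positive, negative) feature triplets
    triplets: Dict[int, List[Tuple[List[float], List[float], List[float]]]] = field(
        default_factory=dict)
```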
The output device may be a warning light or the like for warning.
To sum up, the face recognition method and system of the present application extract face features with an offline-trained CNN network, that is, face images captured by each camera are recognized at a pre-trained basic recognition rate; the ternary features useful for each camera are retained during recognition, and secondary fine-tuning network self-training is performed for each camera to obtain the fine-tuning model corresponding to that camera; during subsequent face feature extraction, the CNN network together with the fine-tuning model currently obtained by self-training for the camera is used, so that the recognition rate improves continuously. Each camera corresponds to its own self-trained fine-tuning model, which concentrates on feature expression under that camera's data, so the system adapts to each camera's environment and the recognition rate is continuously improved. Moreover, the self-training of each camera's fine-tuning model not only improves the similarity between captured users and the base library, but also transfers to, and improves recognition of, other users who have not been captured in the respective scenes.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (8)

1. A face recognition method is applied to a face recognition system, and is characterized in that the face recognition system comprises a CNN network and a secondary fine tuning network respectively constructed for each camera, and the method comprises the following steps:
inputting face images shot by cameras positioned in different scenes into a CNN network for face feature extraction;
calculating the similarity between each face feature and each preset face feature to be distributed and controlled, and sequencing the similarity corresponding to each face feature respectively;
judging whether the maximum similarity of the face features is greater than a preset alarm threshold value or not;
if so, recording ternary features corresponding to each camera, wherein the ternary features comprise the face feature captured by the camera whose maximum similarity is greater than the preset alarm threshold, the face feature to be deployed and controlled having the maximum similarity with that face feature, and the face feature to be deployed and controlled having the second-highest similarity with that face feature;
when the total number of the ternary features recorded aiming at a certain camera reaches a preset number, inputting the ternary features corresponding to the camera into a secondary fine tuning network corresponding to the camera for self-training to obtain a fine tuning model corresponding to the camera;
when the facial features of the facial image shot by the camera are extracted next time, the facial image shot by the camera is sequentially input into a CNN network and a current fine tuning model of the camera to obtain the facial features of the facial image;
wherein the self-training of the secondary fine-tuning network comprises:
constructing a full connection layer corresponding to each camera, and inputting the face features in the ternary features corresponding to each camera into the full connection layer corresponding to the camera;
inputting the features output by the fully connected layer corresponding to each camera, together with the face feature to be deployed and controlled with the maximum similarity and the face feature to be deployed and controlled with the second-highest similarity in that camera's ternary features, into the Triplet Loss layer for learning, and taking the obtained parameters of the fully connected layer corresponding to each camera as the fine-tuning model parameters corresponding to that camera.
2. The face recognition method of claim 1, wherein the training process of the CNN network comprises:
designing a CNN network structure layer to carry out preliminary extraction of features from a plurality of labeled face samples;
and inputting the features extracted from the CNN network structure layer into a Softmax Loss layer for classification training to obtain parameters of the CNN network.
3. The face recognition method of claim 1, wherein before the face image shot by each camera is input into a CNN network for face feature extraction, face detection, face correction and image preprocessing are performed on the face image shot by each camera to obtain the face region of the face image shot by each camera.
4. The face recognition method of claim 1, wherein a processor performs the face feature extraction, similarity comparison and ternary feature recording during deployment periods, and performs the self-training of the secondary fine-tuning network corresponding to each camera during non-deployment periods.
5. A face recognition system, characterized in that the face recognition system comprises a CNN network and a secondary fine-tuning network respectively constructed for each camera, and the face recognition system further comprises:
the sample acquisition module is used for inputting face images shot by cameras positioned in different scenes into a CNN network to extract face features;
the calculation and sorting module is used for calculating the similarity between each face feature and each preset face feature to be deployed and controlled, and for sorting the similarities corresponding to each face feature respectively;
the judging module is used for judging whether the maximum similarity of the face features is greater than a preset alarm threshold value or not;
the recording module is used for recording the ternary features corresponding to each camera when the maximum similarity of a face feature is greater than the preset alarm threshold, wherein the ternary features comprise the face feature, captured by the camera, whose maximum similarity is greater than the preset alarm threshold, the face feature to be deployed and controlled having the largest similarity with that face feature, and the face feature to be deployed and controlled having the second largest similarity with that face feature;
the secondary fine-tuning module is used for inputting the ternary features corresponding to a certain camera into a secondary fine-tuning network corresponding to the camera for self-training when the total number of the ternary features recorded by the camera reaches a preset number, so as to obtain a fine-tuning model corresponding to the camera;
the face recognition module is used for sequentially inputting the face image shot by the camera into the CNN network and the current fine-tuning model of the camera to obtain the face features of the face image when face feature extraction is performed on the face image shot by the camera next time;
wherein the secondary fine-tuning network comprises:
the full connection layer corresponding to each camera, which receives the face features in the ternary features of the corresponding camera;
and the Triplet Loss layer, which receives and learns from the features output by the full connection layer corresponding to each camera together with the face feature to be deployed and controlled having the largest similarity and the face feature to be deployed and controlled having the second largest similarity in the ternary features of the corresponding camera, and obtains the parameters of the full connection layer corresponding to each camera as the fine-tuning model parameters corresponding to the camera.
6. The face recognition system of claim 5, wherein the CNN network offline training process comprises:
performing preliminary extraction of features on a plurality of labeled face samples by the CNN network structure layer;
and the Softmax Loss layer receives the features extracted by the CNN network structure layer and carries out classification training on the features extracted by the CNN network structure layer to obtain the parameters of the CNN network.
7. The face recognition system of claim 5, wherein the system further comprises:
and the face area acquisition module is used for carrying out face detection, face correction and image preprocessing on the face images shot by the cameras before inputting the face images shot by the cameras into a CNN network for face feature extraction, and acquiring the face areas of the face images shot by the cameras.
8. The face recognition system of claim 5, wherein the sample acquisition module, the calculation and sorting module, the judging module, the recording module and the face recognition module operate during deployment periods, and the secondary fine-tuning module operates during non-deployment periods.
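As a concrete reading of the secondary fine-tuning network recited in claims 1 and 5, the following PyTorch sketch fine-tunes one camera's full connection layer with a triplet loss over that camera's recorded ternary features. The feature dimension, margin, learning rate and number of epochs are assumptions chosen for illustration, not values from the application; only the captured face features pass through the full connection layer, while the two features to be deployed and controlled enter the loss unchanged.

```python
import torch
import torch.nn as nn

def train_finetune_model(triples, feat_dim=256, epochs=10, lr=0.01, margin=0.2):
    """triples: list of (anchor, positive, negative) vectors recorded for one camera,
    where 'positive' is the most similar feature to be deployed and controlled and
    'negative' is the second most similar one."""
    fc = nn.Linear(feat_dim, feat_dim)               # per-camera full connection layer
    criterion = nn.TripletMarginLoss(margin=margin)  # Triplet Loss layer
    optimizer = torch.optim.SGD(fc.parameters(), lr=lr)

    anchors   = torch.stack([torch.as_tensor(a, dtype=torch.float32) for a, _, _ in triples])
    positives = torch.stack([torch.as_tensor(p, dtype=torch.float32) for _, p, _ in triples])
    negatives = torch.stack([torch.as_tensor(n, dtype=torch.float32) for _, _, n in triples])

    for _ in range(epochs):
        optimizer.zero_grad()
        # Only the captured features are mapped by the camera's full connection layer;
        # the watchlist features are compared against its output directly.
        loss = criterion(fc(anchors), positives, negatives)
        loss.backward()
        optimizer.step()

    return fc  # its learned parameters play the role of the camera's fine-tuning model

# Hypothetical usage: once a camera has accumulated enough ternary features,
#   finetune_models[camera_id] = train_finetune_model(recorded_triples[camera_id])
# after which new features from that camera pass through the base CNN and then
# through this layer, as in the recognition sketch given earlier.
```

Full-batch updates keep the sketch short; a real system would presumably mini-batch the triples and keep the base CNN frozen so that only the per-camera layer adapts to its scene.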
CN201611048348.3A 2016-11-23 2016-11-23 Face recognition method and system Active CN108090406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611048348.3A CN108090406B (en) 2016-11-23 2016-11-23 Face recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611048348.3A CN108090406B (en) 2016-11-23 2016-11-23 Face recognition method and system

Publications (2)

Publication Number Publication Date
CN108090406A CN108090406A (en) 2018-05-29
CN108090406B CN108090406B (en) 2022-03-11

Family

ID=62171101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611048348.3A Active CN108090406B (en) 2016-11-23 2016-11-23 Face recognition method and system

Country Status (1)

Country Link
CN (1) CN108090406B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271891A (en) * 2018-08-30 2019-01-25 成都考拉悠然科技有限公司 A kind of dynamic face supervision method and system
CN109067767B (en) * 2018-08-31 2021-02-19 上海艾融软件股份有限公司 Face recognition authentication method and system
CN113243015B (en) * 2018-12-19 2024-03-26 浙江大华技术股份有限公司 Video monitoring system
CN109939439B (en) * 2019-03-01 2022-04-05 腾讯科技(深圳)有限公司 Virtual character blocking detection method, model training method, device and equipment
CN110245613B (en) * 2019-06-17 2023-01-20 珠海华园信息技术有限公司 Ship board identification method based on deep learning feature comparison
CN110619315B (en) * 2019-09-24 2020-10-30 重庆紫光华山智安科技有限公司 Training method and device of face recognition model and electronic equipment
CN111339983A (en) * 2020-03-05 2020-06-26 四川长虹电器股份有限公司 Method for fine-tuning face recognition model
CN112001280B (en) * 2020-08-13 2024-07-09 浩鲸云计算科技股份有限公司 Real-time and online optimized face recognition system and method
CN111931153B (en) * 2020-10-16 2021-02-19 腾讯科技(深圳)有限公司 Identity verification method and device based on artificial intelligence and computer equipment
CN113177533B (en) * 2021-05-28 2022-09-06 济南博观智能科技有限公司 Face recognition method and device and electronic equipment
CN113593583B (en) * 2021-08-17 2024-02-13 深圳云基智能科技有限公司 Smart phone for realizing cooperative work of household appliances


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063607A (en) * 2009-11-16 2011-05-18 日电(中国)有限公司 Method and system for acquiring human face image
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN105574498A (en) * 2015-12-15 2016-05-11 重庆凯泽科技有限公司 Face recognition system and recognition method based on customs security check
CN105631430A (en) * 2015-12-30 2016-06-01 浙江宇视科技有限公司 Matching method and apparatus for face image
CN105678232A (en) * 2015-12-30 2016-06-15 中通服公众信息产业股份有限公司 Face image feature extraction and comparison method based on deep learning
CN105975959A (en) * 2016-06-14 2016-09-28 广州视源电子科技股份有限公司 Face feature extraction modeling and face recognition method and device based on neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FaceNet: A unified embedding for face recognition and clustering; Florian Schroff et al.; 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015-10-15; pp. 815-823 *

Also Published As

Publication number Publication date
CN108090406A (en) 2018-05-29

Similar Documents

Publication Publication Date Title
CN108090406B (en) Face recognition method and system
CN111126360B (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN108460356B (en) Face image automatic processing system based on monitoring system
Garcia et al. Convolutional face finder: A neural architecture for fast and robust face detection
Salimi et al. Visual-based trash detection and classification system for smart trash bin robot
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN108960080B (en) Face recognition method based on active defense image anti-attack
CN107153817B (en) Pedestrian re-identification data labeling method and device
Markou et al. A neural network-based novelty detector for image sequence analysis
CN109711366B (en) Pedestrian re-identification method based on group information loss function
CN109145717B (en) Face recognition method for online learning
CN104616316B (en) Personage's Activity recognition method based on threshold matrix and Fusion Features vision word
Barnouti et al. Face recognition: A literature review
Mady et al. Face recognition and detection using Random forest and combination of LBP and HOG features
US20100111375A1 (en) Method for Determining Atributes of Faces in Images
WO2020164278A1 (en) Image processing method and device, electronic equipment and readable storage medium
Chandran et al. Missing child identification system using deep learning and multiclass SVM
CN113761259A (en) Image processing method and device and computer equipment
Voronov et al. Faces 2D-recognition and identification using the HOG descriptors method
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN109635647B (en) Multi-picture multi-face clustering method based on constraint condition
Moallem et al. Fuzzy inference system optimized by genetic algorithm for robust face and pose detection
Anggraini Face recognition using principal component analysis and self organizing maps
CN111666976A (en) Feature fusion method and device based on attribute information and storage medium
Zahid et al. A Multi Stage Approach for Object and Face Detection using CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant