CN115661890A - Model training method, face recognition device, face recognition equipment and medium - Google Patents


Info

Publication number
CN115661890A
CN115661890A
Authority
CN
China
Prior art keywords
face
face recognition
feature information
layer
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211203545.3A
Other languages
Chinese (zh)
Inventor
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202211203545.3A priority Critical patent/CN115661890A/en
Publication of CN115661890A publication Critical patent/CN115661890A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a model training method, a face recognition method, an apparatus, a device and a medium, belonging to the field of artificial intelligence. The method comprises the following steps: acquiring N consecutive frames of face images of a target object; inputting the face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers, the first face feature information is obtained after sequential processing by all X feature extraction layers, the second face feature information is obtained after processing by the ith of the X feature extraction layers, and i ∈ {2, 3, …, X−1}; calculating a target loss parameter based on the first face feature information and the second face feature information; and training the face recognition model based on the target loss parameter to obtain a trained face recognition model.

Description

Model training method, face recognition device, face recognition equipment and medium
Technical Field
The application belongs to the field of artificial intelligence, and particularly relates to a model training method, a face recognition method, an apparatus, a device and a medium.
Background
With the development of electronic technology, when a user lends an electronic device to someone else and cannot control how that person operates it, there is a considerable risk of privacy disclosure. Currently, an electronic device can enable real-time face recognition for a specific application, that is, continuously recognize the face of the current user; once the application is used by someone whose face does not match the preset face, the application in the electronic device is suspended.
In the prior art, training for real-time face recognition is usually based on a face database. During training, a face recognition model extracts features from face images in the database to obtain the corresponding face feature information; the database face image associated with that feature information is compared with an input face image, a comparison parameter between the two is calculated, and the face recognition model is finally fine-tuned with that parameter.
Because the comparison parameter used to adjust the face recognition model in the existing training method is too single and fixed, a more accurate comparison parameter cannot be obtained, and the pass rate and stability of the face recognition model remain low even after it is updated.
Disclosure of Invention
The embodiments of the present application aim to provide a model training method, a face recognition method, an apparatus, a device and a medium, which can solve the problem that the pass rate and stability of face recognition remain low after the face recognition model is updated.
In a first aspect, an embodiment of the present application provides a training method for a face recognition model, the training method comprising: acquiring N consecutive frames of face images of a target object; inputting the face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers, the first face feature information is obtained after sequential processing by all X feature extraction layers, the second face feature information is obtained after processing by the ith of the X feature extraction layers, and i ∈ {2, 3, …, X−1}; calculating a target loss parameter based on the first face feature information and the second face feature information; and training the face recognition model based on the target loss parameter to obtain a new face recognition model.
In a second aspect, an embodiment of the present application provides a training apparatus for a face recognition model, the training apparatus comprising an acquisition module and a processing module. The acquisition module is used for acquiring N consecutive frames of face images of a target object. The processing module is used for inputting the N consecutive frames of face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers, the first face feature information is obtained after sequential processing by all X feature extraction layers, the second face feature information is obtained after processing by the ith of the X feature extraction layers, and i ∈ {2, 3, …, X−1}. The processing module is further configured to calculate a target loss parameter based on the first face feature information and the second face feature information, and to train the face recognition model based on the target loss parameter to obtain a new face recognition model. N and X are positive integers greater than 1.
In a third aspect, an embodiment of the present application provides a face recognition method based on a face recognition model, the method comprising: acquiring a current frame face image; detecting key point information of the current frame face image; processing the current frame face image based on the key point information, inputting the processed image into a face recognition model, and processing it sequentially through the X feature extraction layers of the model to obtain third face feature information; and matching the third face feature information with preset face feature information and outputting a first recognition result. The face recognition model is trained by the training method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a face recognition apparatus based on a face recognition model, the apparatus comprising an acquisition module, a detection module and a processing module. The acquisition module is used for acquiring a current frame face image. The detection module is used for detecting key point information of the current frame face image. The processing module is configured to process the current frame face image based on the key point information, input the processed image into a face recognition model, and process it sequentially through the X feature extraction layers of the model to obtain third face feature information; the processing module is further configured to match the third face feature information with preset face feature information and output a first recognition result. The face recognition model is trained by the training method of the first aspect.
In a fifth aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the method according to the first aspect, or the steps of the method according to the third aspect.
In a sixth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the method according to the first aspect, or the steps of the method according to the third aspect.
In a seventh aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect or the method according to the third aspect.
In an eighth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect, or the method according to the third aspect.
In the embodiments of the present application, N consecutive frames of face images of a target object are acquired; the face images are input into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers, the first face feature information is obtained after sequential processing by all X feature extraction layers, the second face feature information is obtained after processing by the ith of the X feature extraction layers, and i ∈ {2, 3, …, X−1}; a target loss parameter is calculated based on the first face feature information and the second face feature information; and the face recognition model is trained based on the target loss parameter to obtain a trained face recognition model. In this way, by inputting multiple face images, first and second face feature information representing different levels of the network are obtained; the target loss parameter is derived from both, and the model is trained with it. Because losses are computed from feature information at different levels, the trained model extracts more accurate feature information for different faces, thereby improving the pass rate and stability of face recognition.
Drawings
Fig. 1 is a schematic flowchart of a training method for a face recognition model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a face recognition model according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a face recognition method based on a face recognition model according to an embodiment of the present application;
fig. 4 is a schematic diagram of an example of a face recognition method based on a face recognition model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a training apparatus for a face recognition model according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a face recognition apparatus based on a face recognition model according to an embodiment of the present application;
fig. 7 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present disclosure;
fig. 8 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
The model training method, the face recognition method, the corresponding apparatuses, the device, and the medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
In the prior art, training for real-time face recognition is usually based on a face database. During training, a face recognition model extracts features from face images in the database to obtain the corresponding face feature information; the database face image associated with that feature information is compared with an input face image, a comparison parameter between the two is calculated, and the face recognition model is finally fine-tuned with that parameter.
In the embodiments of the present application, N consecutive frames of face images of a target object, i.e., multiple face images, are acquired; the N frames of face images are input into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers and the second face feature information is obtained after processing by the 13th feature extraction layer; a categorical cross-entropy loss and a first L2 loss are calculated based on the first face feature information, and a second L2 loss is calculated based on the second face feature information; a target loss parameter is calculated based on these three losses; and the face recognition model is trained based on the target loss parameter to obtain a new face recognition model. In this way, by inputting multiple face images, first and second face feature information representing different levels of the network are obtained; the target loss parameter is derived from both, and the model is trained with it. Because losses are computed from feature information at different levels, the trained model extracts more accurate feature information for different faces, thereby improving the pass rate and stability of face recognition.
The execution subject of the face recognition model training method and the face recognition method based on the face recognition model provided in the embodiments of the present application may be a model training apparatus, which may be an electronic device or a functional module within an electronic device. The technical solutions provided in the embodiments of the present application are described below taking an electronic device as an example.
An embodiment of the present application provides a training method for a face recognition model, and fig. 1 shows a flowchart of the training method for the face recognition model provided in the embodiment of the present application. As shown in fig. 1, the training method of the face recognition model provided in the embodiment of the present application may include the following steps 201 to 204.
Step 201, acquiring N consecutive frames of face images of a target object.
Wherein N is a positive integer greater than 1.
In this embodiment of the application, the N consecutive frames of face images of the target object may be real-time face images acquired by an electronic device, or may be obtained from a training data set.
In one example, the electronic device acquires time-sequential face images of the user while the user is using the electronic device, for example, the face images are acquired every 200 ms.
In another example, all the acquired face images of different users are constructed as a training data set.
Illustratively, the training data set may include facial images of different users, or may include only facial images of a single user (i.e., the target object).
Illustratively, in the process of training the face recognition model, different face image groups need to be constructed, so some face images in a training sample can be randomly replaced with face images of other users. It should be noted that the newly added face images belong to a different user from the original images in the training sample.
For example, in the process of training the face recognition model, face images of the same user over one continuous time period can be taken as a sequence, from which N consecutive frames of face images are sampled as one training sample. It should be noted that training samples may share overlapping frames.
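As a concrete illustration of this sampling scheme, the sketch below (Python, with frame indices standing in for face images; the function name and stride parameter are illustrative, not from the patent) draws every window of N consecutive frames from one user's sequence, so that neighbouring training samples may share frames:

```python
# Hypothetical sketch: sample overlapping windows of N consecutive frames
# from one user's time-ordered face-image sequence (frame ids stand in for images).

def sample_windows(sequence, n, stride=1):
    """Return every run of n consecutive frames; stride < n yields overlapping samples."""
    return [sequence[s:s + n] for s in range(0, len(sequence) - n + 1, stride)]

frames = list(range(7))                     # 7 consecutive frames of the same user
samples = sample_windows(frames, n=4, stride=2)
# samples == [[0, 1, 2, 3], [2, 3, 4, 5]]  -- frames 2 and 3 appear in both windows
```

With stride 1 every pair of neighbouring samples overlaps in N−1 frames, which matches the note that overlapping frames are allowed between samples.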
Step 202, inputting the obtained face images of the continuous N frames into a face recognition model for feature extraction, so as to obtain first face feature information and second face feature information.
In this embodiment of the present application, the face recognition model includes an X-layer feature extraction layer. Wherein X is a positive integer greater than 1.
In the embodiment of the present application, as shown in fig. 2, the face recognition model may include X feature extraction layers, several convolution layers, an L2-norm layer, and a softmax layer.
In one example, the face recognition model includes a feature extraction module, and the feature extraction module includes the X-layer feature extraction layer.
In another example, the X feature extraction layers are X bottleneck structures.
In an embodiment of the present application, the first face feature information is obtained by processing the face image sequentially through all X feature extraction layers.
In this embodiment, the second face feature information is obtained by processing at the ith of the X feature extraction layers, where i ∈ {2, 3, …, X−1}.
Illustratively, the ith layer of feature extraction layer is one of the X layers of feature extraction layers.
The ith layer feature extraction layer may be referred to as an intermediate layer of the feature extraction module, and the second face feature information may be referred to as an intermediate layer feature extracted from the ith layer.
Consecutive face images of the same user have semantic consistency, and detail information such as face position and contour is also highly consistent across them. The high-level features of a neural network contain more semantic information, while lower-level features contain more detail, such as position-dependent contour and texture. Extracting intermediate feature information from the feature extraction layers therefore captures the detail features of the face image better and enables more accurate face recognition.
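The two feature levels described above can be pictured with a toy forward pass. In the sketch below (an assumed stand-in architecture: plain matrix multiplies plus ReLU replace the conv/batch-norm/relu blocks, and all names and sizes are illustrative), the output of the final layer plays the role of the first face feature information while the output tapped after layer i plays the role of the second:

```python
import numpy as np

# Toy stack of X feature-extraction layers; layer i's output is tapped as the
# intermediate (second) features, the last layer's output as the final (first) features.
rng = np.random.default_rng(0)
X_LAYERS = 5
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(X_LAYERS)]

def extract(features, tap_i):
    """Run all X layers; return (final features, features tapped after layer tap_i)."""
    tapped = None
    for idx, w in enumerate(weights, start=1):
        features = np.maximum(features @ w, 0.0)   # stand-in for conv + batch norm + relu
        if idx == tap_i:
            tapped = features.copy()
    return features, tapped

final, mid = extract(rng.standard_normal(8), tap_i=3)   # i ∈ {2, ..., X-1}
```

Note that the intermediate tap costs nothing extra: the forward pass runs once, and layer i's activation is simply copied out on the way through.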
Optionally, in this embodiment of the application, in the process of inputting the facial image into the face recognition model to perform feature extraction in the step 202 "to obtain the first facial feature information and the second facial feature information", the steps 202a to 202c are included:
step 202a, inputting the face image into the X-layer feature extraction layer for sequential processing, so as to obtain a first face feature image.
Step 202b, performing convolution transformation on the first face feature image to obtain the first face feature information.
Illustratively, each feature extraction layer consists of a convolution layer (conv), a normalization layer (batch norm), and a nonlinear excitation layer (relu).
Specifically, the electronic device first inputs the face image into the X feature extraction layers and extracts a first face feature image (for example, of size 128×7×7) through the conv, batch-norm and relu operations in those layers. The first face feature image is then passed sequentially through convolution layers of different kinds (e.g., 1×1 conv, depth-wise conv, 1×1 conv), which extract the face feature information of each channel and combine it across channels; finally, the combined face feature information is L2-normalised via an L2-norm transformation to obtain the first face feature information.
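The final L2-norm transformation in this step can be sketched as follows (a minimal illustration; the real model applies it to the merged convolution output, whereas here a two-dimensional toy vector is used):

```python
import numpy as np

# L2 normalisation: scale the combined feature vector to unit length so that
# embeddings of all images lie on the unit sphere before loss computation.
def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

f = np.array([3.0, 4.0])      # toy combined feature vector, |f| = 5
unit = l2_normalize(f)        # [0.6, 0.8], now |unit| = 1
```

Normalising both embeddings first makes the later L2 distances between frames depend only on direction, not on feature magnitude.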
Step 202c, processing the second face feature image obtained from the ith feature extraction layer through a convolution layer to obtain the second face feature information.
Exemplarily, the electronic device first extracts a second face feature image (for example, of size 128×7×7) from the ith of the X feature extraction layers, then inputs it into a convolution layer (for example, a 7×7 depth-wise conv) that pools the face feature information in the image, removing redundant content and reducing the number of parameters to obtain more compact face feature information (for example, of size 128×1×1). The channels are then combined through another convolution layer (for example, a 1×1 conv), and the combined face feature information is finally L2-normalised to obtain the second face feature information.
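The pooling effect of the 7×7 depth-wise convolution can be sketched as below: each of the 128 channels' 7×7 grids collapses to a single value, giving a 128×1×1 map. The uniform kernel used here is an illustrative assumption (it reduces to average pooling); in the trained model the kernel weights are learned:

```python
import numpy as np

# A 7x7 depth-wise conv over a 7x7 map is a learned per-channel weighted
# global pooling: one output value per channel.
def depthwise_pool(feat, kernels):
    """feat: (C, H, W); kernels: (C, H, W) -> (C,), one pooled value per channel."""
    return np.einsum('chw,chw->c', feat, kernels)

C, H, W = 128, 7, 7
feat = np.ones((C, H, W))                      # toy 128x7x7 intermediate feature map
kernels = np.full((C, H, W), 1.0 / (H * W))    # uniform kernel == average pooling
pooled = depthwise_pool(feat, kernels)         # shape (128,), i.e. 128x1x1
```

Because the kernel covers the whole 7×7 grid, no spatial positions survive; only one scalar per channel remains, which is what shrinks 128×7×7 down to 128×1×1.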
Step 203, calculating a target loss parameter based on the first face feature information and the second face feature information.
Step 204, training the face recognition model based on the target loss parameter to obtain the trained face recognition model.
Optionally, in this embodiment of the application, step 203 "calculating a target loss parameter based on the first face feature information and the second face feature information" includes steps 203a to 203c:
and step 203a, obtaining a classification cross quotient loss and a first loss based on the first face image characteristic information.
Illustratively, the first loss may be a first L2 loss.
Illustratively, the first face feature information is converted into per-class probabilities through a 128×N fully connected layer (N here being the number of identity classes in the training set) and a softmax layer, and a margin-based categorical cross-entropy loss Lce is obtained by combining the real labels of the images.
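A minimal sketch of such a margin-based cross-entropy is shown below, assuming an additive margin subtracted from the true class's cosine logit; the patent only says "margin-based", so the exact margin form and the scale s and margin m values are illustrative:

```python
import numpy as np

# Additive-margin softmax cross-entropy (assumed form): subtract margin m from
# the true class's cosine logit, scale by s, then take softmax cross-entropy.
def margin_ce(cos_logits, label, s=30.0, m=0.35):
    z = s * cos_logits.copy()
    z[label] = s * (cos_logits[label] - m)   # penalise the true class by the margin
    z -= z.max()                             # numerical stability before exp
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

loss = margin_ce(np.array([0.8, 0.1, -0.2]), label=0)
```

The margin forces the true class to win by at least m in cosine terms, which tightens each identity's cluster of embeddings.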
Illustratively, the L2 loss is calculated based on a first formula:

L2(f_i, f_j) = ‖f_i − f_j‖₂²

where f_i is the first face feature information corresponding to the ith frame among the N frames of face images of the target object.
Further exemplarily, the first L2 loss is the sum, calculated based on a second formula, of the weighted L2 losses between every two of the first face feature information vectors corresponding to the N frames of face images input into the face recognition model.

Exemplarily, taking N = 4, i.e., four consecutive frames of face images, the second formula is:

L2_emb = a_01·L2(f0, f1) + a_02·L2(f0, f2) + a_03·L2(f0, f3) + a_12·L2(f1, f2) + a_13·L2(f1, f3) + a_23·L2(f2, f3)
where a_ij is a weighting function that takes positive values between frames of the same ID and negative values between frames of different IDs:

a_ij = 1.5 (same ID, |i − j| = 1); 1.2 (same ID, |i − j| = 2); 1.0 (same ID, |i − j| = 3); −1 (different IDs)

Note that when frames i and j show the same person, the L2-loss weight depends on the frame interval: adjacent frames (|i − j| = 1) are the most similar, so their weight is 1.5; frames one apart have weight 1.2, and frames two apart have weight 1. These graded weights constrain the model to assign smaller distances to more closely spaced frames. When frames i and j show different people, they are not taken from one continuous usage scene and the frame interval carries no meaning, so the weight is −1.
And step 203b, obtaining a second loss based on the second face feature information.
Illustratively, the second loss may be a second L2 loss.
Illustratively, the second L2 loss is the sum, calculated based on a third formula, of the weighted L2 losses between every two of the second face feature information vectors corresponding to the N frames of face images input into the face recognition model.
Exemplarily, taking N = 4, i.e., four consecutive frames of face images, the third formula is:

L2_mid = a_01·L2(f0′, f1′) + a_02·L2(f0′, f2′) + a_03·L2(f0′, f3′) + a_12·L2(f1′, f2′) + a_13·L2(f1′, f3′) + a_23·L2(f2′, f3′)
In this way, computing the L2 loss on the intermediate-layer second face feature information explicitly adds a constraint on the face-pose consistency of consecutive frames during model training.
Step 203c, performing a weighted calculation on the categorical cross-entropy loss, the first L2 loss and the second L2 loss to obtain the target loss parameter.
Illustratively, the first L2 penalty is: sum of L2 loss between the first face image feature information.
Illustratively, the second L2 loss is: and the sum of L2 losses among the second face image feature information.
Illustratively, the categorical cross-entropy loss, the first L2 loss and the second L2 loss are weighted based on a fourth formula to obtain the final target loss parameter:

L_all = L_ce + w1·L2_emb + w2·L2_mid

where w1 and w2 are weight coefficients, 0.5 and 0.2 respectively.
Illustratively, the final target loss parameter is a weighted sum of the three partial losses described above. Back-propagating the target loss parameter updates the parameters of the face recognition model, yielding a new face recognition model.
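The fourth formula is a plain weighted sum, sketched below with the stated coefficients w1 = 0.5 and w2 = 0.2 (the loss values fed in are illustrative):

```python
# Final target loss: L_all = L_ce + w1*L2_emb + w2*L2_mid with w1=0.5, w2=0.2.
def total_loss(l_ce, l2_emb, l2_mid, w1=0.5, w2=0.2):
    return l_ce + w1 * l2_emb + w2 * l2_mid

l_all = total_loss(l_ce=1.0, l2_emb=0.4, l2_mid=0.5)   # 1.0 + 0.2 + 0.1 = 1.3
```

Keeping w2 smaller than w1 lets the intermediate-layer consistency term regularise training without dominating the embedding-level losses.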
In this way, by jointly learning the categorical cross-entropy loss, the contrastive L2 loss between the feature information of the N frames of face images, and the L2 loss between their intermediate-layer feature information, the model's aggregation of different faces of the same user ID and its discrimination between faces of different user IDs are improved, as is its ability to aggregate consecutive face images.
In the training method of the face recognition model provided by the embodiment of the application, N consecutive frames of face images of a target object are acquired; the face images are input into a face recognition model for feature extraction to obtain first face feature information and second face feature information, wherein the face recognition model comprises X feature extraction layers, the first face feature information is obtained after sequential processing by all X feature extraction layers, the second face feature information is obtained after processing by the ith of the X feature extraction layers, and i ∈ {2, 3, …, X−1}; a target loss parameter is calculated based on the first face feature information and the second face feature information; and the face recognition model is trained based on the target loss parameter to obtain a trained face recognition model. In this way, by inputting multiple face images, first and second face feature information representing different levels of the network are obtained; the target loss parameter is derived from both, and the model is trained with it. Because losses are computed from feature information at different levels, the trained model extracts more accurate feature information for different faces, thereby improving the pass rate and stability of face recognition.
An embodiment of the present application provides a face recognition method based on a face recognition model, and fig. 3 shows a flowchart of this method. As shown in fig. 3, the face recognition method provided in the embodiment of the present application may include steps 301 to 304 described below.
Step 301, obtaining a current frame face image.
Illustratively, the current frame face image may be a face image acquired by the electronic device in real time.
For example, the user may select to turn on a "real-time face recognition" function in the electronic device. After the function is started, the front-facing camera of the electronic device collects the face image of the current user in real time according to a preset time interval (for example, 200 ms).
Step 302, detecting the key point information of the current frame face image.
Illustratively, the key points may include: left eye, right eye, nose, left corner of mouth, right corner of mouth, etc.
For example, the above-mentioned key point information of the current frame face image may be the coordinate information of the key points in the current frame face image.
Illustratively, face detection is performed on a current frame face image to obtain coordinate information of a face position frame and key points in a target face image.
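As an illustrative sketch (not the patent's implementation), the detection result of step 302 can be thought of as a face position frame plus the five named key points with pixel coordinates; the dictionary layout and all coordinate values below are hypothetical:

```python
def detect_keypoints(frame):
    """Hypothetical detector stub illustrating the shape of the step-302
    output: a face position frame and five key-point coordinates.
    A real detector would compute these values from the frame."""
    return {
        "face_box": (80, 60, 240, 260),  # (x1, y1, x2, y2) face position frame
        "keypoints": {
            "left_eye": (120, 130),
            "right_eye": (190, 128),
            "nose": (155, 170),
            "mouth_left": (130, 215),
            "mouth_right": (185, 213),
        },
    }

result = detect_keypoints(None)  # frame unused in this stub
```

Whether a detector returns exactly this structure is an assumption; the patent only states that face detection yields the face position frame and the key-point coordinates.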
Step 303, processing the current frame face image based on the key point information of the current frame face image, inputting the processed face image into a face recognition model, and sequentially processing it through the X feature extraction layers in the face recognition model to obtain third face image feature information.
Illustratively, the face recognition model is a face recognition model trained by applying the training method of the steps 201 to 204.
Step 304, matching the third face image feature information with preset face image feature information, and outputting a first recognition result.
Illustratively, the preset face image feature information is face image feature information corresponding to a preset face of a user in the electronic device.
Illustratively, the first recognition result is used to indicate whether the current frame face image is the same person as the preset face image, that is, whether the current user is the preset user is determined.
Illustratively, after the third facial image feature information is matched with preset facial image feature information, a first distance threshold is obtained, and the first distance threshold is compared with a preset distance threshold.
Illustratively, the first distance threshold is used to characterize: and the difference between the third face image characteristic information and the preset face image characteristic information.
In particular, the first distance threshold may be denoted as d(f_i, f_enroll), where f_i is the third face image feature information and f_enroll is the preset face image feature information.
Illustratively, the preset distance threshold is custom-set by the electronic device. Further, the preset distance threshold may be denoted as D_th1.
In a possible embodiment, the first distance threshold is compared with the preset distance threshold; if the first distance threshold is smaller than the preset distance threshold, the electronic device outputs a result that the current frame face image corresponding to the third face image feature information passes the recognition, that is, the recognition result indicates that the face verification succeeds.
In another possible embodiment, the first distance threshold is compared with the preset distance threshold; if the first distance threshold is greater than the preset distance threshold, the recognition result indicates that the face verification fails, and step 401 is then performed.
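The threshold comparison of step 304 and the branch into step 401 can be sketched as follows; the Euclidean metric and the function name are assumptions, since the patent does not fix the distance behind d(f_i, f_enroll):

```python
import numpy as np

def verify_against_enrolled(f_i, f_enroll, d_th1):
    """Compare the third face image feature f_i with the enrolled
    (preset) feature f_enroll, returning (passed, distance).
    Verification succeeds when d(f_i, f_enroll) < D_th1."""
    d = float(np.linalg.norm(np.asarray(f_i, dtype=float) -
                             np.asarray(f_enroll, dtype=float)))
    return d < d_th1, d
```

For example, with the fig. 4 numbers, a distance of 0.46 against a preset threshold D_th1 = 0.47 passes, while a larger distance would fail and fall through to step 401.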
In the face recognition method based on the face recognition model provided by the embodiment of the application, a current frame face image is obtained; detecting key point information of the current frame face image; processing the current frame face image based on the key point information, inputting the processed current frame face image into a face recognition model, and sequentially processing the processed current frame face image through an X-layer feature extraction layer of the face recognition model to obtain third face image feature information; matching the third facial image feature information with preset facial image feature information, and outputting a first recognition result; the face recognition model is a face recognition model trained by applying the training method of the face recognition model of the first aspect. Therefore, the face recognition model trained by the training method of the face recognition model in the embodiment of the application is used for extracting the features of the face image, and the success rate and the stability of face recognition are improved.
Optionally, in this embodiment of the application, after "matching the third facial image feature information with preset facial image feature information and outputting a first recognition result" in step 304, the method for face recognition based on a face recognition model according to this embodiment of the application further includes:
step 401, under the condition that the first recognition result indicates that the face verification fails, if it is determined that the key point information is missing, matching the third face image feature information with face image feature information corresponding to any one frame of face images in the previous Y frames of the current frame, and outputting a second recognition result.
Any one of the face images in the previous Y frames may be acquired by the electronic device in real time.
For example, the face image of any one of the previous Y frames may be the same user as the face image of the current frame, or may not be the same user.
Optionally, in this embodiment of the application, the process of step 401 includes the following steps 401a to 401c:
Step 401a, detecting the number of key points in the current frame face image under the condition that the first recognition result indicates that the face verification fails; and judging the display state of the current frame face image based on the number of key points.
Illustratively, the display state of the current frame face image may include: abnormal display and normal display.
Further exemplarily, the abnormal display state may include at least one of: large-area occlusion of the face, large-angle rotation of the face, only half of the face being visible, and the like.
In a possible embodiment, if the number of detected key points is greater than the predetermined number, the display state of the current frame face image is determined to be normal display; since the first distance threshold is greater than the preset distance threshold, the electronic device outputs a result that the current frame face image corresponding to the third face image feature information fails the recognition, that is, the face verification fails.
In another possible embodiment, if the number of detected key points is less than the predetermined number and the display state of the current frame face image is determined to be abnormal, the electronic device outputs a result that the current frame face image corresponding to the third face image feature information fails the recognition, that is, the face verification fails; if the display state is determined to be normal display, too much key point information is missing from the face image, and step 401b is performed.
Step 401b, under the condition that too much key point information is missing from the current frame face image, comparing the third face image feature information with the fourth face image feature information of the previous m frames of face images to obtain a second distance threshold, and comparing the second distance threshold with the preset distance threshold.
Illustratively, the previous m frames of face images are among the previous Y frames of face images.
Illustratively, the previous m frames of face images are face images before the current frame that have previously passed face recognition. It should be noted that, when searching for frames that have previously passed (i.e., the m frames of face images before the current frame), the search range should be as small as possible; the search range may be set by the user or default in the electronic device, for example, 1 s, that is, the previous 5 frames.
Exemplarily, the second distance threshold is used for characterizing: and the difference between the third face image characteristic information and the fourth face image characteristic information.
In particular, the second distance threshold may be denoted as d(f_i, f_pre), where f_i is the third face image feature information and f_pre is the fourth face image feature information of a previously passed frame.
In a possible embodiment, the second distance threshold is compared with the preset distance threshold; if the second distance threshold is smaller than the preset distance threshold, the electronic device outputs a result that the current frame face image corresponding to the third face image feature information passes the recognition, that is, the face verification succeeds.
In another possible embodiment, the second distance threshold is compared with a preset distance threshold, and the second distance threshold is greater than the preset distance threshold, then step 401c is continued.
Step 401c, judging whether the previous frame of face image of the current frame of face image passes face recognition, and outputting the recognition result of the current frame of face image based on the recognition result of the previous frame of face image.
In a possible embodiment, if the previous frame of the current frame face image failed face recognition, the electronic device outputs a result that the current frame face image corresponding to the third face image feature information fails the recognition, that is, the face verification fails.
In another possible embodiment, if the previous frame of the current frame face image passed face recognition, the change in the positions of the key points between the current frame face image and the previous frame face image is further compared. If the key point change is small, the face position has not changed much, and the electronic device outputs a result that face recognition passes, that is, the face verification succeeds; otherwise, it outputs a result that face recognition fails, that is, the face verification fails.
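The decision flow of steps 401a to 401c can be sketched as a single function; the argument names and the scalar key-point-shift summary are illustrative assumptions, not the patent's identifiers:

```python
def fallback_verify(num_keypoints, required_keypoints, display_abnormal,
                    prev_frame_distances, distance_threshold,
                    prev_frame_passed, keypoint_shift, shift_tolerance):
    """Illustrative sketch of steps 401a-401c, entered after the first
    recognition result indicates that face verification failed."""
    # Step 401a: enough key points means the face is displayed normally,
    # so the original over-threshold result stands and verification fails.
    if num_keypoints >= required_keypoints:
        return False
    # Too few key points with an abnormal display state (large-area
    # occlusion, large-angle rotation, half face) also fails.
    if display_abnormal:
        return False
    # Step 401b: too much key point information is missing; compare the
    # second distance threshold (distances to previously passed frames)
    # with the preset distance threshold.
    if any(d < distance_threshold for d in prev_frame_distances):
        return True
    # Step 401c: fall back on the previous frame's recognition result
    # combined with the key-point position change between the two frames.
    return prev_frame_passed and keypoint_shift < shift_tolerance
```

With the fig. 4 numbers, a frame whose distance to a previously passed frame is 0.458 against a preset threshold of 0.47 passes through the step-401b branch.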
For example, as shown in fig. 4, the test charts t1, t2 and t3 are consecutive test images. The feature distance between the test chart t1 (i.e., the current frame face image) and the registration chart (i.e., the preset face image) is 0.46 (i.e., the first distance threshold), which is smaller than the recognition distance threshold 0.47 (i.e., the preset distance threshold), so t1 can be determined to be the user image. When the test charts t2 and t3 are directly compared with the registration chart, the distance is greater than the threshold, and recognition fails (at this time, t2 and t3 are each the current frame face image). Since consecutive frames are strongly correlated, the test chart t2 can be compared with the previously passed test chart t1, and the distance between the two is 0.46, which passes. Similarly, the test chart t3 can be compared with the previously passed t2 image, and the distance is 0.458, which also passes (at this time, t1 and t2 serve as the previous m frames of face images).
Therefore, the face feature information is extracted by the face recognition model and then compared with both the preset face feature information and the face feature information of previously passed frames, and the final verification result is obtained in combination with the change of the face position state, thereby reducing the rejection rate of face recognition for the user and improving the user experience.
It should be noted that, in the training method for a face recognition model provided in the embodiment of the present application, the execution subject may be a training device for a face recognition model, or an electronic device, or may be a functional module or an entity in an electronic device. In the embodiment of the present application, a method for executing a training of a face recognition model by using a training device of the face recognition model is taken as an example, and the training device of the face recognition model provided in the embodiment of the present application is described.
Fig. 5 shows a schematic diagram of a possible structure of a training apparatus for a face recognition model according to an embodiment of the present application. As shown in fig. 5, the training apparatus 600 for the face recognition model may include: an acquisition module 601 and a processing module 602. The acquiring module 601 is configured to acquire face images of a target object in N consecutive frames; the processing module 602 is configured to input the N consecutive frames of target object face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information; the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers, where i belongs to {2, 3, …, X-1}; the processing module 602 is further configured to calculate a target loss parameter based on the first face feature information and the second face feature information; the processing module 602 is further configured to train the face recognition model based on the target loss parameter to obtain the trained face recognition model; wherein N and X are each a positive integer greater than 1.
Optionally, in this embodiment of the application, the processing module 602 is specifically configured to: obtain a classification cross entropy loss and a first loss based on the first face image feature information; obtain a second loss based on the second face feature information; and perform a weighted calculation on the classification cross entropy loss, the first loss and the second loss to obtain the target loss parameter; wherein the first loss is the sum of losses between the first face image feature information, and the second loss is the sum of losses between the second face image feature information.
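A minimal sketch of the weighted target-loss calculation described above; the weight parameters and their default values are assumptions, since the patent only states that a weighted calculation is performed:

```python
import numpy as np

def target_loss(ce_loss, first_pair_losses, second_pair_losses,
                w_ce=1.0, w_first=1.0, w_second=1.0):
    """Combine the classification cross entropy loss with the first and
    second losses. first_pair_losses / second_pair_losses hold the
    pairwise losses between first / second face feature information,
    whose sums give the first and second loss respectively."""
    first_loss = float(np.sum(first_pair_losses))
    second_loss = float(np.sum(second_pair_losses))
    return w_ce * ce_loss + w_first * first_loss + w_second * second_loss
```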
Optionally, in this embodiment of the application, the processing module 602 is specifically configured to: inputting the face image into the X-layer feature extraction layer for sequential processing to obtain a first face feature image; performing convolution transformation processing on the first face feature image to obtain the feature information of the first face image; and processing the second face feature image obtained after the processing of the ith layer of feature extraction layer by a convolution layer to obtain the second face feature information.
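How the second face feature information could be read off the i-th feature extraction layer can be sketched with a 1x1 convolution followed by global average pooling; both choices are assumptions, since the patent only states that the intermediate output is processed by a convolution layer:

```python
import numpy as np

def second_feature_from_layer_i(feature_map, conv_kernel):
    """Project the (C, H, W) activation of the i-th feature extraction
    layer through a 1x1 convolution with weights (C_out, C), then pool
    it into a feature vector. A 1x1 convolution is a per-pixel linear
    map over channels, so it reduces to one einsum."""
    out = np.einsum('oc,chw->ohw', conv_kernel, feature_map)
    # Global average pooling flattens the map into a (C_out,) vector
    # (an illustrative assumption as well).
    return out.mean(axis=(1, 2))
```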
Optionally, in this embodiment of the application, the processing module 602 is specifically configured to: mapping the first face image characteristic information to M preset face recognition labels through a full connection layer to obtain face recognition prediction parameters, wherein the face recognition prediction parameters are used for indicating the predicted face recognition labels; calculating the classified cross entropy loss between the predicted face recognition label and the M preset face recognition labels; the first loss is obtained by calculation processing based on the first face image feature information.
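The fully connected mapping onto M preset face recognition labels and the classification cross entropy can be sketched as follows; the shapes and variable names are assumptions:

```python
import numpy as np

def fc_cross_entropy(first_feature, fc_weights, true_label):
    """Map a first face feature vector (D,) onto M preset labels via a
    fully connected layer (D, M), then compute the classification cross
    entropy against the ground-truth label index."""
    logits = np.asarray(first_feature) @ np.asarray(fc_weights)  # (M,) prediction scores
    logits = logits - logits.max()                 # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the M labels
    return float(-np.log(probs[true_label]))
```

A confident, correct prediction yields a loss near zero, while a confident, wrong prediction yields a large loss, which is what drives the training in step 304 of the training method.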
In the training device of the face recognition model provided by the embodiment of the application, face images of a target object in N consecutive frames are obtained; the face images are input into a face recognition model for feature extraction to obtain first face feature information and second face feature information; the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers, where i belongs to {2, 3, …, X-1}; a target loss parameter is calculated based on the first face feature information and the second face feature information; and the face recognition model is trained based on the target loss parameter to obtain the trained face recognition model. In this way, by inputting a plurality of face images, first face feature information and second face feature information representing face features at different levels can be obtained; the target loss parameter of the face recognition model is obtained based on these two types of feature information, and the model is trained with it. Because the loss is calculated from face feature information at different levels and applied to training, the face feature information the trained model extracts for different faces is more accurate, thereby improving the passing rate and stability of face recognition.
Fig. 6 shows a schematic diagram of a possible structure of a face recognition apparatus based on a face recognition model according to an embodiment of the present application. As shown in fig. 6, the face recognition apparatus 700 based on the face recognition model may include: an acquisition module 701, a detection module 702 and a processing module 703. The obtaining module 701 is configured to obtain a current frame face image; the detection module 702 is configured to detect key point information of the current frame face image; the processing module 703 is configured to process the current frame face image based on the key point information, input the processed image into the face recognition model, and sequentially process it through the X feature extraction layers to obtain third face image feature information, and is further configured to match the third face image feature information with preset face image feature information and output a first recognition result; the face recognition model is a face recognition model trained by applying the above training method of the face recognition model.
In the face recognition device based on the face recognition model provided by the embodiment of the application, a face image of a current frame is obtained; detecting key point information of the current frame face image; processing the current frame face image based on the key point information, inputting the processed current frame face image into a face recognition model, and sequentially processing the processed current frame face image through an X-layer feature extraction layer of the face recognition model to obtain third face image feature information; matching the third facial image feature information with preset facial image feature information, and outputting a first recognition result; the face recognition model is a face recognition model trained by applying the training method of the face recognition model of the first aspect. Therefore, the face recognition model trained by the training method of the face recognition model in the embodiment of the application is used for extracting the features of the face image, and the success rate and the stability of face recognition are improved.
The training device of the face recognition model and the face recognition device based on the face recognition model in the embodiment of the application can be electronic equipment, and can also be a component in the electronic equipment, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and may also be a server, a Network Attached Storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, and the embodiments of the present application are not particularly limited.
The training device of the face recognition model and the face recognition device based on the face recognition model in the embodiment of the application may be devices with operating systems. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which is not specifically limited in the embodiment of the present application.
The training device for the face recognition model and the face recognition device based on the face recognition model provided by the embodiment of the application can realize each process realized by the method embodiments of fig. 1 to 4, and are not repeated here for avoiding repetition.
Optionally, as shown in fig. 7, an electronic device 800 is further provided in an embodiment of the present application, and includes a processor 801 and a memory 802, where the memory 802 stores a program or an instruction that can be executed on the processor 801, and when the program or the instruction is executed by the processor 801, the steps of the embodiment of the face recognition model training method and the embodiment of the face recognition method based on the face recognition model are implemented, and the same technical effects can be achieved, and are not described herein again to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further comprise a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The processor 110 is configured to acquire face images of a target object in N consecutive frames; the processor 110 is further configured to input the N consecutive frames of target object face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information; the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers, where i belongs to {2, 3, …, X-1}; the processor 110 is further configured to calculate a target loss parameter based on the first face feature information and the second face feature information; the processor 110 is further configured to train the face recognition model based on the target loss parameter to obtain the trained face recognition model; wherein N and X are each a positive integer greater than 1.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to: obtain a classification cross entropy loss and a first loss based on the first face image feature information; obtain a second loss based on the second face feature information; and perform a weighted calculation on the classification cross entropy loss, the first loss and the second loss to obtain the target loss parameter; wherein the first loss is the sum of losses between the first face image feature information, and the second loss is the sum of losses between the second face image feature information.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to: inputting the face image into the X-layer feature extraction layer for sequential processing to obtain a first face feature image; performing convolution transformation processing on the first face characteristic image to obtain the characteristic information of the first face image; and processing the second face feature image obtained after the processing of the ith layer of feature extraction layer by a convolution layer to obtain the second face feature information.
Optionally, in this embodiment of the application, the processor 110 is specifically configured to: map the first face image feature information to M preset face recognition labels through a fully connected layer to obtain face recognition prediction parameters, where the face recognition prediction parameters are used to indicate the predicted face recognition label; calculate the classification cross entropy loss between the predicted face recognition label and the M preset face recognition labels; and calculate the first loss based on the first face image feature information.
In the electronic equipment provided by the embodiment of the application, face images of a target object in N consecutive frames are obtained; the face images are input into a face recognition model for feature extraction to obtain first face feature information and second face feature information; the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers, where i belongs to {2, 3, …, X-1}; a target loss parameter is calculated based on the first face feature information and the second face feature information; and the face recognition model is trained based on the target loss parameter to obtain the trained face recognition model. In this way, by inputting a plurality of face images, first face feature information and second face feature information representing face features at different levels can be obtained; the target loss parameter of the face recognition model is obtained based on these two types of feature information, and the model is trained with it. Because the loss is calculated from face feature information at different levels and applied to training, the face feature information the trained model extracts for different faces is more accurate, thereby improving the passing rate and stability of face recognition.
Optionally, in this embodiment of the application, the processor 110 is further configured to obtain a current frame face image; the processor 110 is configured to detect key point information of the current frame face image; the processor 110 is further configured to process the current frame face image based on the key point information, input the processed image into the face recognition model, and sequentially process it through the X feature extraction layers to obtain third face image feature information; the processor 110 is further configured to match the third face image feature information with preset face image feature information and output a first recognition result; the face recognition model is a face recognition model trained by applying the above training method of the face recognition model.
In the electronic equipment provided by the embodiment of the application, a current frame face image is obtained; detecting key point information of the current frame face image; processing the current frame face image based on the key point information, inputting the current frame face image into a face recognition model, and sequentially processing the current frame face image through an X-layer feature extraction layer of the face recognition model to obtain third face image feature information; matching the third facial image feature information with preset facial image feature information, and outputting a first recognition result; the face recognition model is a face recognition model trained by applying the training method of the face recognition model of the first aspect. Therefore, the face recognition model trained by the training method of the face recognition model in the embodiment of the application is used for extracting the features of the face image, and the success rate and the stability of face recognition are improved.
It should be understood that, in the embodiment of the present application, the input Unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the Graphics Processing Unit 1041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a first storage area storing a program or an instruction and a second storage area storing data, wherein the first storage area may store an operating system, an application program or an instruction (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. Further, memory 109 may include volatile memory or non-volatile memory, or memory 109 may include both volatile and non-volatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. The volatile Memory may be a Random Access Memory (RAM), a Static Random Access Memory (Static RAM, SRAM), a Dynamic Random Access Memory (Dynamic RAM, DRAM), a Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), a Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate SDRAM, DDR SDRAM), an Enhanced Synchronous SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), and a Direct Rambus RAM (DRRAM). Memory 109 in the embodiments of the subject application includes, but is not limited to, these and any other suitable types of memory.
Processor 110 may include one or more processing units; optionally, the processor 110 integrates an application processor, which mainly handles operations related to the operating system, user interface, application programs, etc., and a modem processor, which mainly handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored; when the program or the instruction is executed by a processor, each process of the above embodiment of the training method for a face recognition model is implemented, and the same technical effect can be achieved; to avoid repetition, the detailed description is omitted here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the embodiment of the training method for a face recognition model, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a chip system, a system-on-a-chip, etc.
The present application further provides a computer program product, the program product being stored in a storage medium and executed by at least one processor to implement the processes of the above embodiment of the training method for a face recognition model, with the same technical effect; to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions recited, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (which may be an electronic device, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A training method of a face recognition model, characterized by comprising the following steps:
acquiring face images of a target object in N consecutive frames;
inputting the face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information; wherein the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers; i ∈ {2, 3, …, X-1};
calculating a target loss parameter based on the first face feature information and the second face feature information;
training the face recognition model based on the target loss parameter to obtain a trained face recognition model;
wherein N and X are positive integers greater than 1.
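The two-branch feature extraction in claim 1 (deep features from the final layer, intermediate features tapped from layer i) can be sketched as follows. This is an illustrative NumPy stand-in only, not the patented implementation: the "feature extraction layers" are toy linear+ReLU transforms, and the dimensions `X`, `i`, `N`, `D` and the random weights are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

X = 4          # number of feature extraction layers (X > 1)
i = 2          # intermediate tap layer, i in {2, ..., X-1}
N = 3          # consecutive frames of the target object
D = 8          # feature dimension (arbitrary stand-in)

# Stand-in weights for the X feature extraction layers.
weights = [rng.standard_normal((D, D)) * 0.1 for _ in range(X)]

def extract(face_image):
    """Run the X layers in sequence; return (first, second) feature info.

    first  : output after all X layers (deep features)
    second : output tapped after the i-th layer (intermediate features)
    """
    h = face_image
    second = None
    for layer, w in enumerate(weights, start=1):
        h = np.maximum(h @ w, 0.0)      # toy layer: linear + ReLU
        if layer == i:
            second = h                  # tap the intermediate output
    return h, second

# N consecutive face "images", flattened to D-dim vectors for the sketch.
frames = rng.standard_normal((N, D))
firsts, seconds = zip(*(extract(f) for f in frames))
print(len(firsts), firsts[0].shape, seconds[0].shape)
```

In a real network the intermediate tap would typically be captured with a forward hook rather than an explicit branch in the loop, but the data flow is the same.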
2. The method according to claim 1, wherein the calculating a target loss parameter based on the first face feature information and the second face feature information comprises:
obtaining a classification cross-entropy loss and a first loss based on the first face feature information;
obtaining a second loss based on the second face feature information;
performing a weighted calculation on the classification cross-entropy loss, the first loss and the second loss to obtain the target loss parameter;
wherein the first loss is the sum of losses between the first face feature information of the N frames, and the second loss is the sum of losses between the second face feature information of the N frames.
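The weighted target loss of claim 2 can be sketched as below. The patent does not specify the per-pair loss or the weights, so this sketch assumes a mean-squared distance between frame features (a common consistency choice) and hypothetical weights `w`; both are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def pairwise_consistency_loss(feats):
    """Sum of losses between the feature vectors of the N frames.

    Assumed pairwise loss: mean squared distance (not specified
    in the source); any symmetric distance would fit the claim.
    """
    return sum(np.mean((a - b) ** 2) for a, b in combinations(feats, 2))

def cross_entropy(probs, label):
    """Classification cross-entropy for one sample."""
    return -np.log(probs[label] + 1e-12)

def target_loss(ce, first_feats, second_feats, w=(1.0, 0.5, 0.5)):
    """Weighted combination of the three loss terms (weights assumed)."""
    l1 = pairwise_consistency_loss(first_feats)   # first loss
    l2 = pairwise_consistency_loss(second_feats)  # second loss
    return w[0] * ce + w[1] * l1 + w[2] * l2

rng = np.random.default_rng(1)
first_feats = [rng.standard_normal(8) for _ in range(3)]   # N = 3 frames
second_feats = [rng.standard_normal(8) for _ in range(3)]
probs = np.array([0.1, 0.7, 0.2])   # toy softmax output over M = 3 labels
loss = target_loss(cross_entropy(probs, label=1), first_feats, second_feats)
print(loss)
```

Identical features across frames drive the first and second losses to zero, which is the intuition behind using N consecutive frames of the same target object.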
3. The method according to claim 1, wherein the inputting the face images into a face recognition model for feature extraction to obtain first face feature information and second face feature information comprises:
inputting the face images into the X feature extraction layers for sequential processing to obtain a first face feature image;
performing convolution transformation processing on the first face feature image to obtain the first face feature information;
processing, through a convolution layer, the second face feature image output by the i-th feature extraction layer, to obtain the second face feature information.
4. The method according to claim 2, wherein the obtaining a classification cross-entropy loss and a first loss based on the first face feature information comprises:
mapping the first face feature information to M preset face recognition labels through a fully connected layer to obtain a face recognition prediction parameter, wherein the face recognition prediction parameter is used for indicating a predicted face recognition label;
calculating the classification cross-entropy loss between the predicted face recognition label and the M preset face recognition labels;
calculating the first loss based on the first face feature information.
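The classification head of claim 4 (fully connected layer mapping features to M preset labels, followed by cross-entropy) can be sketched as follows. The weight matrix, dimensions, and the softmax normalization are all assumed stand-ins; the patent only specifies the fully connected mapping and the cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(2)
D, M = 8, 5                               # feature dim, M preset labels

W = rng.standard_normal((D, M)) * 0.1     # fully connected layer (stand-in)
b = np.zeros(M)

def classify(first_face_feature):
    """Map first face feature information to the M preset labels.

    Returns the face recognition prediction parameter, modeled here
    as a softmax distribution over the M labels.
    """
    logits = first_face_feature @ W + b
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

feat = rng.standard_normal(D)             # first face feature information
probs = classify(feat)
predicted_label = int(np.argmax(probs))   # predicted face recognition label
true_label = 3                            # hypothetical ground-truth label
ce_loss = -np.log(probs[true_label] + 1e-12)
print(predicted_label, ce_loss)
```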
5. A face recognition method based on a face recognition model, characterized by comprising the following steps:
acquiring a current-frame face image;
detecting key point information of the current-frame face image;
processing the current-frame face image based on the key point information, inputting the processed image into the face recognition model, and sequentially processing it through the X feature extraction layers to obtain third face feature information;
matching the third face feature information with preset face feature information, and outputting a first recognition result;
wherein the face recognition model is trained by applying the training method according to any one of claims 1 to 4.
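The matching step of claim 5 can be sketched as a nearest-neighbor search over stored feature entries. The similarity metric (cosine), the acceptance threshold, and the gallery layout are assumptions for illustration; the patent only requires matching the extracted features against preset face feature information.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recognize(third_feat, gallery, threshold=0.5):
    """Match extracted features against preset feature entries.

    gallery   : dict mapping identity -> stored feature vector
    threshold : assumed similarity cut-off for a positive match;
                returns None (no match) below it
    """
    best_id, best_sim = None, -1.0
    for identity, feat in gallery.items():
        sim = cosine_similarity(third_feat, feat)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None

rng = np.random.default_rng(3)
alice = rng.standard_normal(8)
gallery = {"alice": alice, "bob": rng.standard_normal(8)}  # preset features
query = alice + rng.standard_normal(8) * 0.05  # near-duplicate of alice
result = recognize(query, gallery)
print(result)
```

In practice the "first recognition result" would also carry the similarity score, so a caller can distinguish a confident match from a borderline one.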
6. A training device for a face recognition model, characterized by comprising: an acquisition module and a processing module;
the acquisition module is used for acquiring face images of a target object in N consecutive frames;
the processing module is used for inputting the face images acquired by the acquisition module into a face recognition model for feature extraction to obtain first face feature information and second face feature information; wherein the face recognition model comprises X feature extraction layers; the first face feature information is obtained after sequential processing by the X feature extraction layers, and the second face feature information is obtained after processing by the i-th of the X feature extraction layers; i ∈ {2, 3, …, X-1};
the processing module is further used for calculating a target loss parameter based on the first face feature information and the second face feature information;
the processing module is further used for training the face recognition model based on the target loss parameter to obtain a trained face recognition model;
wherein N and X are positive integers greater than 1.
7. The apparatus according to claim 6, wherein the processing module is specifically configured to:
obtain a classification cross-entropy loss and a first loss based on the first face feature information;
obtain a second loss based on the second face feature information;
perform a weighted calculation on the classification cross-entropy loss, the first loss and the second loss to obtain the target loss parameter;
wherein the first loss is the sum of losses between the first face feature information of the N frames, and the second loss is the sum of losses between the second face feature information of the N frames.
8. The apparatus according to claim 6, wherein the processing module is specifically configured to:
input the face images into the X feature extraction layers for sequential processing to obtain a first face feature image;
perform convolution transformation processing on the first face feature image to obtain the first face feature information;
process, through a convolution layer, the second face feature image output by the i-th feature extraction layer, to obtain the second face feature information.
9. The apparatus according to claim 7, wherein the processing module is specifically configured to:
map the first face feature information to M preset face recognition labels through a fully connected layer to obtain a face recognition prediction parameter, wherein the face recognition prediction parameter is used for indicating a predicted face recognition label;
calculate the classification cross-entropy loss between the predicted face recognition label and the M preset face recognition labels;
calculate the first loss based on the first face feature information.
10. A face recognition apparatus based on a face recognition model, characterized by comprising:
an acquisition module, a detection module and a processing module;
the acquisition module is used for acquiring a current-frame face image;
the detection module is used for detecting key point information of the current-frame face image acquired by the acquisition module;
the processing module is used for processing the current-frame face image based on the key point information detected by the detection module, inputting the processed image into the face recognition model, and sequentially processing it through the X feature extraction layers to obtain third face feature information;
the processing module is further used for matching the third face feature information with preset face feature information and outputting a first recognition result;
wherein the face recognition model is trained by applying the training method according to any one of claims 1 to 4.
11. An electronic device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the training method of a face recognition model according to any one of claims 1 to 4 or the face recognition method based on a face recognition model according to claim 5.
12. A readable storage medium, on which a program or instructions are stored, which, when executed by a processor, implement the steps of the method for training a face recognition model according to any one of claims 1 to 4, or the method for face recognition based on a face recognition model according to claim 5.
CN202211203545.3A 2022-09-29 2022-09-29 Model training method, face recognition device, face recognition equipment and medium Pending CN115661890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211203545.3A CN115661890A (en) 2022-09-29 2022-09-29 Model training method, face recognition device, face recognition equipment and medium


Publications (1)

Publication Number Publication Date
CN115661890A true CN115661890A (en) 2023-01-31

Family

ID=84985460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211203545.3A Pending CN115661890A (en) 2022-09-29 2022-09-29 Model training method, face recognition device, face recognition equipment and medium

Country Status (1)

Country Link
CN (1) CN115661890A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination