CN111860101A - Training method and device for face key point detection model

Training method and device for face key point detection model

Info

Publication number
CN111860101A
Authority
CN
China
Prior art keywords
face
face image
sample
image sample
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010333407.1A
Other languages
Chinese (zh)
Inventor
张修宝
黄泄合
沈海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010333407.1A
Publication of CN111860101A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of face recognition, and in particular to a training method and device for a face key point detection model. According to the method and device, the calculation weight of each face image sample is determined according to the degree of specificity of the at least one sample class in which the sample falls; each face image sample is input into a basic face key point detection model to determine the predicted coordinates of the face key points in the sample; and the basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when it is used for training, the model can learn diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.

Description

Training method and device for face key point detection model
Technical Field
The application relates to the technical field of face recognition, in particular to a training method and a training device for a face key point detection model.
Background
Face recognition, a classic topic in the field of computer vision, has great research and application value and is in high demand. Face recognition is a technology that extracts facial features from an image containing a face, recognizes the face image according to those features, and outputs a recognition result.
Face key point detection is a key step in the field of face recognition and analysis, and is a precondition and foundation for other face-related problems such as automatic face recognition, expression analysis, three-dimensional face reconstruction and three-dimensional animation. Face key point detection is a technology for detecting and locating face key points in a face image. At present, face key points can be detected accurately in normal face images, but face images are rich and diverse: for special face images with non-frontal faces, exaggerated expressions, makeup, occlusion, uneven illumination or blur, the accuracy of face key point identification is not high.
Disclosure of Invention
In view of this, embodiments of the present application provide at least a training method and device for a face key point detection model, which make the model focus on learning special face image samples and thereby improve the accuracy of face key point detection.
The application mainly comprises the following aspects:
in a first aspect, an embodiment of the present application provides a training method for a face keypoint detection model, where the training method includes:
determining, for each face image sample in the obtained plurality of face image samples, the calculation weight of the sample according to the degree of specificity of the at least one sample class in which it falls;
inputting each face image sample into a basic face key point detection model, and determining the prediction coordinates of the face key points in each face image sample;
and training the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample.
In one possible embodiment, the sample category includes at least one of the following categories:
face frontality degree, expression exaggeration degree, illumination degree, makeup degree, occlusion degree and image sharpness degree.
In a possible embodiment, for each sample class of the at least one sample class, the degree of specificity of each face image sample in that sample class is determined according to the following steps:
counting the target number of face image samples having the target image feature among the obtained plurality of face image samples, and the total number of face image samples in the plurality of face image samples, where the target image feature is the feature of each face image sample under that sample class;
and determining the degree of specificity of each face image sample in that sample class according to the total number and the target number.
In a possible implementation manner, the determining the calculation weight of each face image sample according to the degree of specificity of the at least one sample class in which the sample falls includes:
determining the calculation weight of each face image sample according to the degree of specificity corresponding to each sample class of the at least one sample class in which the sample falls.
In a possible implementation, the training the basic face keypoint detection model according to the predicted coordinates and the real coordinates of the face keypoints in each face image sample and the calculated weight of each face image sample includes:
determining a first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the key points of the face in each face image sample;
adjusting the first cross entropy according to the calculation weight of each face image sample to obtain a second cross entropy of each face image sample;
and adjusting the model parameters of the basic human face key point detection model according to the second cross entropy of each human face image sample.
In a possible implementation manner, after the second cross entropy of each face image sample is obtained, the training method further includes:
determining the average cross entropy of the plurality of face image samples according to the obtained second cross entropy of each face image sample in the plurality of face image samples;
and when the average cross entropy meets a convergence condition, stopping training the basic face key point detection model.
In a possible implementation manner, the determining a first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the face key points in each face image sample includes:
calculating an absolute difference value between a predicted coordinate and a real coordinate of each face key point according to each face key point in each face image sample, and calculating the entropy loss of each face key point according to a calculation formula matched with the absolute difference value;
and determining a first cross entropy of each face image sample according to the entropy loss of each face key point in each face image sample.
In a possible implementation manner, if the absolute difference is smaller than a preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = \omega \ln\left(1 + \left|\frac{y - \hat{y}}{\epsilon}\right|^{\alpha - \hat{y}}\right)$$

wherein ω, α and ε are model parameters of the basic face key point detection model, y is the predicted coordinate of each face key point, and $\hat{y}$ is the real coordinate of each face key point.
In a possible implementation manner, if the absolute difference is greater than or equal to the preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = A\left|y - \hat{y}\right| - C$$

wherein

$$A = \omega\left(\frac{1}{1 + (\theta/\epsilon)^{\alpha - \hat{y}}}\right)(\alpha - \hat{y})\left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y} - 1}\frac{1}{\epsilon}, \qquad C = \theta A - \omega\ln\left(1 + \left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y}}\right),$$

y is the predicted coordinate of each face key point, $\hat{y}$ is the real coordinate of each face key point, ω, α and ε are model parameters of the basic face key point detection model, and θ is the preset threshold.
In a second aspect, an embodiment of the present application further provides a method for detecting key points of a human face, where the method includes:
acquiring a human face image to be detected;
inputting the face image to be detected into the face key point detection model obtained by training according to the training method for a face key point detection model in the first aspect or any possible implementation manner of the first aspect, so as to obtain the coordinates of the face key points in the face image to be detected.
In a third aspect, an embodiment of the present application further provides a training device for a face keypoint detection model, where the training device includes:
the first determining module is used for determining, for each face image sample in the obtained plurality of face image samples, the calculation weight of the sample according to the degree of specificity of the at least one sample class in which it falls;
the second determination module is used for inputting each face image sample into the basic face key point detection model and determining the prediction coordinates of the face key points in each face image sample;
and the training module is used for training the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample.
In one possible embodiment, the sample category includes at least one of the following categories:
face frontality degree, expression exaggeration degree, illumination degree, makeup degree, occlusion degree and image sharpness degree.
In a possible implementation manner, for each sample class of the at least one sample class, the first determining module is further configured to determine the degree of specificity of each face image sample in that sample class according to the following steps:
counting the target number of face image samples having the target image feature among the obtained plurality of face image samples, and the total number of face image samples in the plurality of face image samples, where the target image feature is the feature of each face image sample under that sample class;
and determining the degree of specificity of each face image sample in that sample class according to the total number and the target number.
In a possible implementation, the first determining module is specifically configured to determine the calculation weight of each face image sample according to the following step:
determining the calculation weight of each face image sample according to the degree of specificity corresponding to each sample class of the at least one sample class in which the sample falls.
In one possible embodiment, the training module comprises:
the first determining unit is used for determining a first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the key points of the face in each face image sample;
the second determining unit is used for adjusting the first cross entropy according to the calculation weight of each face image sample so as to obtain a second cross entropy of each face image sample;
And the adjusting unit is used for adjusting the model parameters of the basic human face key point detection model according to the second cross entropy of each human face image sample.
In a possible embodiment, the training device further comprises a stopping module; the stopping module is used for stopping the training of the basic face key point detection model according to the following steps:
determining the average cross entropy of the plurality of face image samples according to the obtained second cross entropy of each face image sample in the plurality of face image samples;
and when the average cross entropy meets a convergence condition, stopping training the basic face key point detection model.
In one possible implementation, the first determining unit includes:
the calculation subunit is used for calculating an absolute difference value between the predicted coordinate and the real coordinate of each face key point according to each face key point in each face image sample, and calculating the entropy loss of each face key point according to a calculation formula matched with the absolute difference value;
and the determining subunit is used for determining the first cross entropy of each face image sample according to the entropy loss of each face key point in each face image sample.
In a possible implementation manner, if the absolute difference is smaller than a preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = \omega \ln\left(1 + \left|\frac{y - \hat{y}}{\epsilon}\right|^{\alpha - \hat{y}}\right)$$

wherein ω, α and ε are model parameters of the basic face key point detection model, y is the predicted coordinate of each face key point, and $\hat{y}$ is the real coordinate of each face key point.
In a possible implementation manner, if the absolute difference is greater than or equal to the preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = A\left|y - \hat{y}\right| - C$$

wherein

$$A = \omega\left(\frac{1}{1 + (\theta/\epsilon)^{\alpha - \hat{y}}}\right)(\alpha - \hat{y})\left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y} - 1}\frac{1}{\epsilon}, \qquad C = \theta A - \omega\ln\left(1 + \left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y}}\right),$$

y is the predicted coordinate of each face key point, $\hat{y}$ is the real coordinate of each face key point, ω, α and ε are model parameters of the basic face key point detection model, and θ is the preset threshold.
In a fourth aspect, an apparatus for detecting key points of a human face, the apparatus comprising:
the acquisition module is used for acquiring a face image to be detected;
and the determining module is configured to input the face image to be detected into the face key point detection model obtained by training with the training device for a face key point detection model in the third aspect or any possible implementation manner of the third aspect, so as to obtain the coordinates of the face key points in the face image to be detected.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate with each other through the bus, and when executed by the processor, the machine-readable instructions perform the steps of the training method for a face key point detection model according to the first aspect or any possible implementation manner of the first aspect, and/or the steps of the method for detecting face key points according to the second aspect.
In a sixth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the training method for a face key point detection model described in the first aspect or any possible implementation manner of the first aspect, and/or the steps of the method for detecting face key points described in the second aspect.
According to the training method and device for a face key point detection model provided by the embodiments of the application, the calculation weight of each face image sample can be determined according to the degree of specificity of the at least one sample class in which the sample falls; each face image sample is input into the basic face key point detection model to determine the predicted coordinates of the face key points in it; and the basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when it is used for training, the model can learn diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart illustrating a training method of a face key point detection model according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a method for detecting key points of a face according to an embodiment of the present application;
FIG. 3 is a functional block diagram of an apparatus for training a face keypoint detection model according to an embodiment of the present application;
FIG. 4 illustrates a functional block diagram of the training module of FIG. 3;
FIG. 5 is a second functional block diagram of an apparatus for training a face keypoint detection model according to an embodiment of the present application;
FIG. 6 shows a functional block diagram of the first determination unit in FIG. 4;
fig. 7 is a functional block diagram of an apparatus for detecting key points of a human face according to an embodiment of the present application;
fig. 8 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and that steps without logical context may be performed in reverse order or concurrently. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are given in conjunction with the specific application scenario "training of a face keypoint detection model", and it will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and application scenarios without departing from the spirit and scope of the present disclosure.
The method, the apparatus, the electronic device, or the computer-readable storage medium described in the embodiments of the present application may be applied to any scenario in which a face keypoint detection model needs to be trained, and the embodiments of the present application do not limit a specific application scenario, and any scheme using the training method and apparatus for a face keypoint detection model provided in the embodiments of the present application is within the scope of protection of the present application.
It is worth noting that, before the present application, face key points could be detected accurately in normal face images. However, face images are rich and diverse, and special face images with non-frontal faces, exaggerated expressions, makeup, occlusion, uneven illumination or blur may account for a lower proportion of the training data than normal face images. The model therefore cannot learn these special images well, and when face key points are detected in such images, the accuracy of face key point identification is not high.
In view of the above problems, in the embodiments of the present application, the calculation weight of each face image sample can be determined according to the degree of specificity of the at least one sample class in which the sample falls; each face image sample is input into the basic face key point detection model to determine the predicted coordinates of the face key points in it; and the basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when it is used for training, the model can learn diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
For the convenience of understanding of the present application, the technical solutions provided in the present application will be described in detail below with reference to specific embodiments.
Fig. 1 is a flowchart of a training method for a face keypoint detection model according to an embodiment of the present disclosure. As shown in fig. 1, the training method for a face key point detection model provided in the embodiment of the present application includes the following steps:
S101: for each face image sample in the plurality of acquired face image samples, determine the calculation weight of the sample according to the degree of specificity of the at least one sample class in which it falls.
In specific implementation, when the basic face key point detection model is trained, a plurality of face image samples for training the model need to be acquired. For each preset sample class of the at least one sample class, the degree of specificity of each acquired face image sample in that class is determined, and the calculation weight of each face image sample is then determined from the degrees of specificity of the at least one sample class in which the sample falls. Here, for each sample class, the degree of specificity of a face image sample in that class can be understood as how special the sample is, with respect to that class, among the obtained plurality of face image samples; the degrees of specificity of the sample classes in which each face image sample falls are used to compute the calculation weight the sample carries when it participates in training the basic face key point detection model.
It should be noted that, for a given sample class, the fewer the face image samples that share the same image feature as a given face image sample in that class, the greater that sample's degree of specificity; conversely, the more face image samples that share the same image feature in that class, the smaller the degree of specificity. For each sample class, a face image sample with a higher degree of specificity among the obtained plurality of face image samples receives a higher calculation weight, and a face image sample with a lower degree of specificity receives a lower calculation weight. By assigning larger calculation weights to the small number of special face image samples and smaller calculation weights to the large number of common face image samples, the calculation weights balance the data, so that the face key point detection model can learn the diversity of face structures in a balanced manner.
Here, at least one sample class may be set in advance; the sample classes include, but are not limited to, a face frontality degree class, an expression exaggeration degree class, an illumination degree class, a makeup degree class, an occlusion degree class and an image sharpness degree class.
In one example, let the sample class be the expression exaggeration degree class, and suppose the image features of most face images among the obtained face image samples are normal facial expressions. If an image feature of face image sample A is an exaggerated facial expression, then sample A has a high degree of specificity in the expression exaggeration degree class; conversely, if an image feature of face image sample A is a normal facial expression, then sample A has a low degree of specificity in the expression exaggeration degree class.
Further, for each sample class of the at least one preset sample class, the process of determining the degree of specificity of each face image sample in that class is described below; that is, for each sample class of the at least one preset sample class, the degree of specificity of each face image sample in that class is determined according to the following steps:
Step 1011: counting the target number of face image samples having the target image feature among the plurality of acquired face image samples, and the total number of face image samples in the plurality of face image samples, where the target image feature is the feature of each face image sample under each sample class.
In a specific implementation, when calculating the degree of specificity of any face image sample in any sample class, face image samples having the same image feature in that class as the sample being calculated are first searched for among the obtained plurality of face image samples. The total number of the found face image samples, together with the sample itself, is then counted; that is, the target number of face image samples having the target image feature is counted, where the target image feature is the feature of the sample being calculated under that sample class. The total number of face image samples among the obtained plurality of face image samples is also counted.
Step 1012: determining the degree of specificity of each face image sample in each sample class according to the total number and the target number.
In specific implementation, the degree of specificity of each face image sample in each sample class is determined jointly from the total number of face image samples among the acquired plurality of face image samples and the target number of face image samples having the target image feature (the feature of the face image sample) under that sample class. Since face image samples with a higher degree of specificity occupy a smaller proportion of all the acquired face image samples, and so that the face key point detection model can learn the diversity of face structures in a balanced manner, a larger calculation weight may be assigned to the smaller number of special face image samples. The calculation weight is determined by the degree of specificity, and the ratio of the total number to the target number may be used as the degree of specificity of each face image sample in each sample class.
Here, the calculation formula for the degree of specificity of face image sample n in sample class c among the acquired N face image samples is:

$$\sigma_n^c = \frac{N}{M}$$

wherein σ denotes the degree of specificity, a is the image feature of face image sample n in sample class c, and M is the number of face image samples among the N face image samples whose image feature is a.
In one example, the image feature of face image sample n in the makeup degree class is the "deep makeup" feature, and among the 50 acquired face image samples, a total of 5 face image samples have the "deep makeup" feature in the makeup degree class; the degree of specificity of face image sample n in the makeup degree class is then

$$\sigma_n^c = \frac{50}{5} = 10.$$
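Purely as an illustration of steps 1011 and 1012, this statistic can be sketched in a few lines of Python; the data layout, feature labels and function name below are hypothetical and not part of the application:

```python
from collections import Counter

def specificity(samples, sample_class):
    # Degree of specificity sigma = N / M for every sample in one sample class,
    # where N is the total number of samples and M is the number of samples
    # sharing the sample's image feature under that class.
    n_total = len(samples)
    feature_counts = Counter(s[sample_class] for s in samples)
    return [n_total / feature_counts[s[sample_class]] for s in samples]

# Example from the text: 50 samples, 5 of which carry the "deep makeup" feature.
samples = [{"makeup": "deep"}] * 5 + [{"makeup": "light"}] * 45
print(specificity(samples, "makeup")[0])  # 50 / 5 = 10.0
```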
Further, the following describes the process of determining the calculation weight of each face image sample; that is, the determining in step S101 of the calculation weight of each face image sample according to the degree of specificity of the at least one sample class in which the sample falls includes the following step:
determining the calculation weight of each face image sample according to the degree of specificity corresponding to each sample class of the at least one sample class in which the sample falls.
In a specific implementation, at least one sample class may be preset, and the calculation weight of each face image sample is computed jointly from the sample's degree of specificity in each sample class. Two ways of determining the calculation weight are given here:
the first method is as follows: the method comprises the steps of firstly determining the special degree of each face image sample in each sample class, adding the special degrees of each face image sample in each sample class in at least one preset sample class, and determining the sum value obtained by adding as the calculation weight of each face image sample.
In one example: the total preset 3 sample classes comprise a sample class A, a sample class B and a sample class C, the special degree of the human face image sample n in the sample class A is a1, the special degree of the human face image sample n in the sample class B is a2, and the special degree of the human face image sample n in the sample class C is a3, and then the calculation weight of the human face image sample n is a1+ a2+ a 3.
The second way: first determine the degree of specificity of each face image sample in each sample class, multiply the sample's degree of specificity in each class by a preset weight corresponding to that class, sum the resulting products, and take the resulting value as the calculation weight of the sample.
In one example: the method comprises the steps that 3 sample classes are preset, wherein the sample classes comprise a sample class A, a sample class B and a sample class C, the special degree of a human face image sample n in the sample class A is a1, the special degree of the human face image sample n in the sample class B is a2, the special degree of the human face image sample n in the sample class C is a3, the preset weight corresponding to the sample class A is B1, the preset weight corresponding to the sample class B is B2, and the preset weight corresponding to the sample class C is B3, so that the calculation weight of the human face image sample n is a 1B 1+ a 2B 2+ a 3B 3.
S102: and inputting each face image sample into the basic face key point detection model, and determining the prediction coordinates of the face key points in each face image sample.
In specific implementation, when the basic face key point detection model is trained, each face image sample among the plurality of acquired face image samples is input into the model, which predicts the coordinates of the face key points in the sample. Here, the predicted coordinates of the face key points may differ from the real coordinates of the corresponding face key points, and this difference can be used to train the basic face key point detection model.
S103: and training the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample.
In specific implementation, when the plurality of face image samples for training the basic face key point detection model are obtained, the real coordinates of the face key points in each sample also need to be obtained; these real coordinates can be labeled manually in advance. The basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when used for training, the model can learn diverse face structures in a balanced manner, which improves the accuracy of face key point detection. Specifically, a larger calculation weight can be assigned to the small number of special face image samples and a smaller calculation weight to the large number of common face image samples, so that the calculation weights balance the data; the face key point detection model can thus learn the diversity of face structures evenly, achieving the goal of improving the accuracy of face key point detection.
Further, for each face image sample of the obtained plurality of face image samples, the basic face keypoint detection model may be trained according to the following steps:
step 1031: and determining the first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the key points of the face in each face image sample.
In specific implementation, the first cross entropy of each face image sample is determined by the difference between the predicted coordinates and the real coordinates of each face key point in each face image sample.
Here, each face image sample contains a plurality of face key points, including but not limited to the corners of the mouth, nose, eyes (inner and outer corners of the eyes, etc.), and eyebrows.
Further, the determining in step 1031 of the first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the face key points in each face image sample includes the following steps:
calculating an absolute difference value between a predicted coordinate and a real coordinate of each face key point according to each face key point in each face image sample, and calculating the entropy loss of each face key point according to a calculation formula matched with the absolute difference value; and determining a first cross entropy of each face image sample according to the entropy loss of each face key point in each face image sample.
In specific implementation, for any face key point in the plurality of face image samples, the absolute difference between the predicted coordinate and the real coordinate of that key point is calculated, and the entropy loss of that key point is calculated according to the calculation formula matched with the absolute difference.
Here, if the absolute difference is smaller than a preset threshold, the entropy loss of the face key point is calculated according to the following formula:

$$\mathrm{loss}(y, \hat{y}) = \omega \ln\left(1 + \left|\frac{y - \hat{y}}{\epsilon}\right|^{\alpha - \hat{y}}\right)$$

wherein ω, α and ε are model parameters of the basic face key point detection model, y is the predicted coordinate of each face key point, and $\hat{y}$ is the real coordinate of each face key point. When the difference between the predicted and real coordinates of a face key point is small, this nonlinear loss function is used, which amplifies the influence of small errors. The model parameters are configuration variables inside the model: each time the basic face key point detection model is trained on a face image sample, the model parameters ω, α and ε are adjusted, i.e., the model parameters change at each training step, so as to reduce the difference between the predicted and real coordinates of the face key points. The preset threshold can be set according to the precision required of the actual model training.
Here, if the absolute difference is greater than or equal to the preset threshold, the entropy loss of the face key point is calculated according to the following formula:

$$\mathrm{loss}(y, \hat{y}) = A\left|y - \hat{y}\right| - C$$

wherein

$$A = \omega\left(\frac{1}{1 + (\theta/\epsilon)^{\alpha - \hat{y}}}\right)(\alpha - \hat{y})\left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y} - 1}\frac{1}{\epsilon}, \qquad C = \theta A - \omega\ln\left(1 + \left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y}}\right),$$

y is the predicted coordinate of each face key point, $\hat{y}$ is the real coordinate of each face key point, ω, α and ε are model parameters of the basic face key point detection model, and θ is the preset threshold. For the case where the difference between the predicted and real coordinates of a face key point is large, this linear loss function is used; the linear loss is sufficient for training to converge toward the predicted position. As above, the model parameters ω, α and ε are adjusted each time the basic face key point detection model is trained on a face image sample, so as to reduce the difference between the predicted and real coordinates of the face key points.
It should be noted that each time the basic face key point detection model is trained on a face image sample, its model parameters are adjusted; here, ω, α and ε may be adjusted. The loss function above is an adaptive wing loss function.
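For illustration, the two-branch loss can be sketched in PyTorch as below. The sketch assumes coordinates normalized to [0, 1]; the default hyperparameter values (ω = 14, θ = 0.5, ε = 1, α = 2.1) come from the published adaptive wing loss formulation and are not specified by this application:

```python
import torch

def adaptive_wing_loss(pred, target, omega=14.0, theta=0.5, epsilon=1.0, alpha=2.1):
    # Entropy loss over predicted vs. real key point coordinates; the exponent
    # uses alpha - target, following the formulas above (target = real coords).
    diff = (pred - target).abs()
    p = alpha - target
    # Constants A and C of the linear branch, chosen so both branches meet at theta.
    a = omega * (1.0 / (1.0 + (theta / epsilon) ** p)) * p * ((theta / epsilon) ** (p - 1)) / epsilon
    c = theta * a - omega * torch.log(1.0 + (theta / epsilon) ** p)
    nonlinear = omega * torch.log(1.0 + (diff / epsilon) ** p)  # small-error branch
    linear = a * diff - c                                       # large-error branch
    return torch.where(diff < theta, nonlinear, linear).sum()
```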
Step 1032: adjusting the first cross entropy according to the calculation weight of each face image sample to obtain a second cross entropy of each face image sample.
In specific implementation, after the calculation weight and the first cross entropy of each face image sample are calculated, the first cross entropy is adjusted by the calculation weight to obtain the second cross entropy of each face image sample. In this way the calculation weight balances the data, so that the face key point detection model can learn the diversity of face structures in a balanced manner.
Step 1033: adjusting the model parameters of the basic face key point detection model according to the second cross entropy of each face image sample.
In a specific implementation, in the training process of the basic face keypoint detection model through each face image sample, the model parameters of the basic face keypoint detection model may be adjusted according to the second cross entropy of each face image sample.
Here, a process of training a basic face key point detection model by using a plurality of acquired face image samples is described:
taking any face image sample that has not yet participated in the current round of model training as the current image sample, and determining the first cross entropy of the current image sample according to the predicted coordinates and the real coordinates of each face key point in it; adjusting the first cross entropy according to the calculation weight of the current image sample to determine its second cross entropy; adjusting the model parameters of the basic face key point detection model in the current round according to the second cross entropy of the current image sample; and marking the current image sample as having participated in the current round of model training, then returning to the step of determining a current image sample until all image samples among the obtained plurality of face image samples have completed the current round of training.
Here, a multi-round training process may be set: each round acquires a plurality of face image samples and uses them to train the basic face key point detection model. At the end of each round, the average cross entropy of the face image samples that participated in the training is determined, and whether to terminate the training of the basic face key point detection model is decided according to whether the average cross entropy has converged; a target face key point detection model capable of performing face key point detection is thereby obtained. Accordingly, after the second cross entropy of each face image sample is obtained, the method further includes the following steps:
determining the average cross entropy of the plurality of face image samples according to the obtained second cross entropy of each face image sample in the plurality of face image samples; and when the average cross entropy meets a convergence condition, stopping training the basic face key point detection model.
Here, the calculation formula of the average cross entropy is as follows:

$$\mathrm{Loss} = \frac{1}{N}\sum_{n=1}^{N}\left(\sum_{c=1}^{C}\sigma_n^c\right)\sum_{k=1}^{K}\mathrm{loss}\left(y_n^k, \hat{y}_n^k\right)$$

wherein Loss is the average cross entropy of the N face image samples, c is a sample class, C is the number of sample classes, $\sigma_n^c$ is the degree of specificity of face image sample n in sample class c, k is a face key point, $\mathrm{loss}(y_n^k, \hat{y}_n^k)$ represents the entropy loss of face key point k, K is the number of face key points in face image sample n, $\hat{y}_n^k$ is the real coordinate of face key point k in face image sample n, and $y_n^k$ is the predicted coordinate of face key point k in face image sample n.
Here, a convergence threshold may be set in advance, and when the average cross entropy is less than or equal to the convergence threshold, it is determined that the average cross entropy satisfies the convergence condition, and the training of the base face keypoint detection model is stopped.
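A minimal sketch of one training round follows, reusing the adaptive_wing_loss sketch above as the per-sample entropy loss; the data layout, optimizer and convergence threshold are illustrative assumptions:

```python
import torch

def train_round(model, optimizer, samples, weights, conv_threshold=1e-3):
    # samples: list of (image, true_coords) tensors; weights: per-sample calculation weights.
    second_entropies = []
    for (image, true_coords), w in zip(samples, weights):
        pred_coords = model(image.unsqueeze(0)).squeeze(0)
        first = adaptive_wing_loss(pred_coords, true_coords)  # first cross entropy
        second = w * first                                    # second cross entropy
        optimizer.zero_grad()
        second.backward()
        optimizer.step()
        second_entropies.append(second.item())
    # Average cross entropy of the round; training stops once it converges.
    avg = sum(second_entropies) / len(second_entropies)
    return avg, avg <= conv_threshold
```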
In the embodiment of the application, the calculation weight of each face image sample can be determined according to the degree of specificity of the at least one sample class in which the sample falls; each face image sample is input into the basic face key point detection model to determine the predicted coordinates of the face key points in it; and the basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when it is used for training, the model can learn diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
Fig. 2 is a flowchart of a method for detecting face key points according to an embodiment of the present application. As shown in fig. 2, the method for detecting key points of a human face provided in the embodiment of the present application includes the following steps:
S201: acquire a face image to be detected;
S202: input the face image to be detected into the trained face key point detection model to obtain the coordinates of the face key points in the face image to be detected.
In the embodiment of the application, when the face image samples are used to train the model, each sample is assigned a calculation weight according to its degree of specificity, so that the model can learn diverse face structures in a balanced manner. Detecting key points of a face image with the target face key point detection model obtained by the training method shown in fig. 1 can therefore improve the accuracy of face key point detection.
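A minimal detection sketch follows; the checkpoint path, input size and preprocessing are hypothetical placeholders:

```python
import torch

model = torch.load("face_keypoint_model.pt")  # hypothetical trained checkpoint
model.eval()
with torch.no_grad():
    face = torch.rand(1, 3, 224, 224)  # stand-in for the face image to be detected
    coords = model(face)               # coordinates of the face key points
print(coords.shape)
```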
Based on the same application concept, an embodiment of the present application further provides a training device for a face key point detection model corresponding to the training method provided in the above embodiments. Since the principle by which the device solves the problem is similar to that of the training method provided above, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.
As shown in fig. 3 to 6, fig. 3 is a first functional block diagram of a training apparatus 300 for a face key point detection model according to an embodiment of the present application, fig. 4 is a functional block diagram of the training module 330 in fig. 3, fig. 5 is a second functional block diagram of the training apparatus 300, and fig. 6 is a functional block diagram of the first determination unit 331 in fig. 4.
As shown in fig. 3, the training apparatus 300 for a face keypoint detection model includes:
a first determining module 310, configured to determine, for each face image sample of the obtained plurality of face image samples, the calculation weight of the sample according to the degree of specificity of the at least one sample class in which it falls;
the second determining module 320 is configured to input each face image sample into the basic face keypoint detection model, and determine a prediction coordinate of a face keypoint in each face image sample;
the training module 330 is configured to train the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample.
In one possible embodiment, the sample category includes at least one of the following categories:
face frontality degree, expression exaggeration degree, illumination degree, makeup degree, occlusion degree and image sharpness degree.
In a possible implementation manner, as shown in fig. 3, for each sample class of the at least one sample class, the first determining module 310 is further configured to determine the degree of specificity of each face image sample in that sample class according to the following steps:
counting the target number of face image samples having the target image feature among the obtained plurality of face image samples, and the total number of face image samples in the plurality of face image samples, where the target image feature is the feature of each face image sample under that sample class;
and determining the degree of specificity of each face image sample in that sample class according to the total number and the target number.
In a possible implementation manner, as shown in fig. 3, the first determining module 310 is specifically configured to determine the calculation weight of each face image sample according to the following step:
determining the calculation weight of each face image sample according to the degree of specificity corresponding to each sample class of the at least one sample class in which the sample falls.
In one possible embodiment, as shown in fig. 4, the training module 330 includes:
the first determining unit 331 is configured to determine a first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the face key points in each face image sample;
a second determining unit 332, configured to adjust the first cross entropy according to the calculation weight of each facial image sample, so as to obtain a second cross entropy of each facial image sample;
and an adjusting unit 333, configured to adjust a model parameter of the basic face keypoint detection model according to the second cross entropy of each face image sample.
In one possible embodiment, as shown in fig. 5, the training apparatus 300 for the face keypoint detection model further includes a stopping module 340; the stopping module 340 is configured to stop training the basic face keypoint detection model according to the following steps:
determining the average cross entropy of the plurality of face image samples according to the obtained second cross entropy of each face image sample in the plurality of face image samples;
and when the average cross entropy meets a convergence condition, stopping training the basic face key point detection model.
In one possible implementation, as shown in fig. 6, the first determining unit 331 includes:
a calculating subunit 3311, configured to calculate, for each face key point in each face image sample, an absolute difference between a predicted coordinate and a real coordinate of each face key point, and calculate an entropy loss of each face key point according to a calculation formula matched with the absolute difference;
the determining subunit 3312 is configured to determine a first cross entropy of each face image sample according to the entropy loss of each face keypoint in each face image sample.
In a possible implementation manner, if the absolute difference is smaller than a preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = \omega \ln\left(1 + \left|\frac{y - \hat{y}}{\epsilon}\right|^{\alpha - \hat{y}}\right)$$

wherein ω, α and ε are model parameters of the basic face key point detection model, y is the predicted coordinate of each face key point, and $\hat{y}$ is the real coordinate of each face key point.
In a possible implementation manner, if the absolute difference is greater than or equal to the preset threshold, the calculation formula matched with the absolute difference is:

$$\mathrm{loss}(y, \hat{y}) = A\left|y - \hat{y}\right| - C$$

wherein

$$A = \omega\left(\frac{1}{1 + (\theta/\epsilon)^{\alpha - \hat{y}}}\right)(\alpha - \hat{y})\left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y} - 1}\frac{1}{\epsilon}, \qquad C = \theta A - \omega\ln\left(1 + \left(\frac{\theta}{\epsilon}\right)^{\alpha - \hat{y}}\right),$$

y is the predicted coordinate of each face key point, $\hat{y}$ is the real coordinate of each face key point, ω, α and ε are model parameters of the basic face key point detection model, and θ is the preset threshold.
In the embodiment of the application, the calculation weight of each face image sample can be determined according to the degree of specificity of the at least one sample class in which the sample falls; each face image sample is input into the basic face key point detection model to determine the predicted coordinates of the face key points in it; and the basic face key point detection model is then trained according to the predicted and real coordinates of the face key points in each face image sample and the sample's calculation weight. Because each face image sample is assigned a calculation weight according to its degree of specificity when it is used for training, the model can learn diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
Fig. 7 is a functional block diagram of an apparatus 700 for detecting a face keypoint according to an embodiment of the present application, where the apparatus 700 for detecting a face keypoint includes:
an obtaining module 710, configured to obtain a face image to be detected;
the determining module 720 is configured to input the face image to be detected into the face key point detection model trained by the training apparatus for the face key point detection model in any of the above implementations, so as to obtain the coordinates of the face key points in the face image to be detected.
In the embodiments of the present application, when the model is trained with the face image samples, each sample is assigned a calculation weight according to its special degree, so that the model learns diverse face structures in a balanced manner; detecting key points of a face image with the trained target face key point detection model therefore improves the accuracy of face key point detection.
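A short usage sketch of the detection flow, assuming a PyTorch-style model (the framework is not specified in the publication, and the function below is hypothetical):

import torch

def detect_keypoints(model, face_image):
    # face_image: tensor of shape (3, H, W); returns (num_keypoints, 2)
    # predicted coordinates for one face image.
    model.eval()
    with torch.no_grad():
        coords = model(face_image.unsqueeze(0))   # add a batch dimension
    return coords.squeeze(0)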
Based on the same inventive concept, fig. 8 shows a schematic structural diagram of an electronic device 800 provided in an embodiment of the present application. The electronic device 800 includes a processor 810, a memory 820 and a bus 830. The memory 820 stores machine-readable instructions executable by the processor 810; when the electronic device 800 runs, the processor 810 and the memory 820 communicate through the bus 830, and, when executed by the processor 810, the machine-readable instructions perform the steps of the training method for the face key point detection model according to any of the above embodiments and/or the steps of the detection method for face key points according to any of the above embodiments.
Specifically, when executed by the processor 810, the machine-readable instructions may perform the following steps:
determining, for each face image sample among the obtained plurality of face image samples, the calculation weight of the face image sample according to the special degree of the at least one sample category in which the face image sample is located;
inputting each face image sample into a basic face key point detection model, and determining the predicted coordinates of the face key points in each face image sample;
and training the basic face key point detection model according to the predicted coordinates, the real coordinates of the face key points in each face image sample, and the calculation weight of each face image sample; a combined sketch of these three steps is given below.
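Taken together, one gradient-update step could look like the following sketch. It is PyTorch-style and hypothetical: the squared-error term stands in for the per-sample "first cross entropy" (the patent's piecewise formula would replace it), and multiplying by the calculation weight yields the "second cross entropy":

import torch

def train_step(model, optimizer, images, true_coords, weights):
    preds = model(images)                             # predicted coordinates, (B, K, 2)
    # Placeholder per-sample "first cross entropy", shape (B,); the patent's
    # piecewise per-keypoint formula would replace this squared error.
    per_sample = ((preds - true_coords) ** 2).mean(dim=(1, 2))
    # Weighting each sample's loss gives the "second cross entropy",
    # averaged over the batch.
    weighted = (weights * per_sample).mean()
    optimizer.zero_grad()
    weighted.backward()
    optimizer.step()
    return weighted.item()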
In the embodiments of the present application, the calculation weight of each face image sample can be determined according to the special degree of the at least one sample category in which the sample is located, and each face image sample can be input into the basic face key point detection model to determine the predicted coordinates of the face key points in that sample; the basic face key point detection model is then trained according to the predicted coordinates, the real coordinates of the face key points in each face image sample, and the calculation weight of that sample. In this way, when the model is trained with the face image samples, each sample is assigned a calculation weight according to its special degree, so that the model learns diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the training method for the face key point detection model provided in the foregoing embodiments and/or the steps of the detection method for face key points described in any of the foregoing embodiments.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above training method for the face key point detection model can be executed, in which each face image sample is assigned a calculation weight according to its special degree, so that the model learns diverse face structures in a balanced manner, which in turn improves the accuracy of face key point detection.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only one kind of logical division, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A training method for a face key point detection model is characterized by comprising the following steps:
determining, for each face image sample among the obtained plurality of face image samples, the calculation weight of the face image sample according to the special degree of at least one sample category in which the face image sample is located;
inputting each face image sample into a basic face key point detection model, and determining the prediction coordinates of the face key points in each face image sample;
and training the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample.
2. The training method according to claim 1, wherein the sample categories comprise at least one of the following categories:
face frontality degree, expression exaggeration degree, illumination degree, makeup degree, occlusion degree, and image sharpness degree.
3. The training method according to claim 1, wherein, for each sample category of the at least one sample category, the special degree of each sample category in which each face image sample is located is determined according to the following steps:
counting the target number of face image samples having a target image feature among the obtained plurality of face image samples, and the total number of face image samples in the plurality of face image samples, the target image feature being the feature of each face image sample under each sample category;
and determining the special degree of each sample category in which each face image sample is located according to the total number and the target number.
4. The training method according to claim 1, wherein the determining the calculation weight of each face image sample according to the special degree of at least one sample category in which each face image sample is located comprises:
determining the calculation weight of each face image sample according to the special degree corresponding to each sample category of the at least one sample category in which each face image sample is located.
5. The training method according to claim 1, wherein the training the basic face key point detection model according to the predicted coordinates and the real coordinates of the face key points in each face image sample and the calculation weight of each face image sample comprises:
determining a first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the key points of the face in each face image sample;
adjusting the first cross entropy according to the calculation weight of each face image sample to obtain a second cross entropy of each face image sample;
and adjusting the model parameters of the basic human face key point detection model according to the second cross entropy of each human face image sample.
6. The training method according to claim 5, wherein, after obtaining the second cross entropy of each face image sample, the training method further comprises:
determining the average cross entropy of the plurality of face image samples according to the obtained second cross entropy of each face image sample in the plurality of face image samples;
and when the average cross entropy meets a convergence condition, stopping training the basic face key point detection model.
7. The training method according to claim 5, wherein the determining the first cross entropy of each face image sample according to the predicted coordinates and the real coordinates of the face key points in each face image sample comprises:
for each face key point in each face image sample, calculating the absolute difference between the predicted coordinate and the real coordinate of the face key point, and calculating the entropy loss of the face key point according to a calculation formula matched with the absolute difference;
and determining a first cross entropy of each face image sample according to the entropy loss of each face key point in each face image sample.
8. The training method according to claim 7, wherein, if the absolute difference is smaller than a preset threshold, the calculation formula matched with the absolute difference is:

$\mathrm{loss} = \omega \ln\left(1 + \left|\frac{y-\hat{y}}{\epsilon}\right|^{\alpha-\hat{y}}\right)$

wherein ω, α and ε are model parameters of the basic face key point detection model, y is the predicted coordinate of each face key point, and $\hat{y}$ is the real coordinate of each face key point.
9. The training method according to claim 7, wherein, if the absolute difference is greater than or equal to the preset threshold, the calculation formula matched with the absolute difference is:

$\mathrm{loss} = A\,|y-\hat{y}| - C$

wherein

$A = \omega \left( \frac{1}{1+(\theta/\epsilon)^{\alpha-\hat{y}}} \right) (\alpha-\hat{y}) \left( \frac{\theta}{\epsilon} \right)^{\alpha-\hat{y}-1} \frac{1}{\epsilon}, \qquad C = \theta A - \omega \ln\left(1+(\theta/\epsilon)^{\alpha-\hat{y}}\right),$

y is the predicted coordinate of each face key point, $\hat{y}$ is the real coordinate of each face key point, ω, α and ε are model parameters of the basic face key point detection model, and θ is the preset threshold.
10. A method for detecting face key points, characterized by comprising the following steps:
acquiring a face image to be detected;
inputting the face image to be detected into the face key point detection model obtained by training with the training method for the face key point detection model according to any one of claims 1 to 9, to obtain the coordinates of the face key points in the face image to be detected.
11. A training device for a face key point detection model, the training device comprising:
a first determining module, used for determining, for each face image sample among the obtained plurality of face image samples, the calculation weight of the face image sample according to the special degree of the at least one sample category in which the face image sample is located;
a second determining module, used for inputting each face image sample into a basic face key point detection model and determining the predicted coordinates of the face key points in each face image sample;
and a training module, used for training the basic face key point detection model according to the predicted coordinates, the real coordinates of the face key points in each face image sample, and the calculation weight of each face image sample.
12. A detection device for face key points, characterized in that the detection device comprises:
the acquisition module is used for acquiring a face image to be detected;
a determining module, configured to input the face image to be detected into the face key point detection model obtained through training by the training device for the face key point detection model according to claim 11, so as to obtain the coordinates of the face key points in the face image to be detected.
13. An electronic device, comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate via the bus, and when executed by the processor the machine-readable instructions perform the steps of the training method for the face key point detection model according to any one of claims 1 to 9 and/or the steps of the detection method for face key points according to claim 10.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the training method for the face key point detection model according to any one of claims 1 to 9 and/or the steps of the detection method for face key points according to claim 10.
CN202010333407.1A 2020-04-24 2020-04-24 Training method and device for face key point detection model Pending CN111860101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010333407.1A CN111860101A (en) 2020-04-24 2020-04-24 Training method and device for face key point detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010333407.1A CN111860101A (en) 2020-04-24 2020-04-24 Training method and device for face key point detection model

Publications (1)

Publication Number Publication Date
CN111860101A true CN111860101A (en) 2020-10-30

Family

ID=72985121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010333407.1A Pending CN111860101A (en) 2020-04-24 2020-04-24 Training method and device for face key point detection model

Country Status (1)

Country Link
CN (1) CN111860101A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020037676A1 (en) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 Three-dimensional face image generation method and apparatus, and electronic device
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN110889446A (en) * 2019-11-22 2020-03-17 高创安邦(北京)技术有限公司 Face image recognition model training and face image recognition method and device
CN110879993A (en) * 2019-11-29 2020-03-13 北京市商汤科技开发有限公司 Neural network training method, and execution method and device of face recognition task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王一丁; 于洋: "Design and Implementation of Multi-task Face Detection Based on Deep Learning", Information & Computer (Theory Edition), no. 02 *
王德勋; 虞慧群; 范贵生: "Facial Action Unit Recognition Algorithm Based on Deep Learning", Journal of East China University of Science and Technology (Natural Science Edition), no. 02 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101306A (en) * 2020-11-10 2020-12-18 成都市谛视科技有限公司 Fine facial expression capturing method and device based on RGB image
CN112329736A (en) * 2020-11-30 2021-02-05 姜召英 Face recognition method and financial system
CN112801138A (en) * 2021-01-05 2021-05-14 北京交通大学 Multi-person attitude estimation method based on human body topological structure alignment
CN112801138B (en) * 2021-01-05 2024-04-09 北京交通大学 Multi-person gesture estimation method based on human body topological structure alignment
CN114399803A (en) * 2021-11-30 2022-04-26 际络科技(上海)有限公司 Face key point detection method and device
CN114937300A (en) * 2022-05-20 2022-08-23 北京数美时代科技有限公司 Method and system for identifying shielded face

Similar Documents

Publication Publication Date Title
CN111860101A (en) Training method and device for face key point detection model
WO2021169637A1 (en) Image recognition method and apparatus, computer device and storage medium
CN109657793B (en) Model training method and device, storage medium and electronic equipment
CN105005777A (en) Face-based audio and video recommendation method and face-based audio and video recommendation system
WO2012132418A1 (en) Characteristic estimation device
CN110363077A (en) Sign Language Recognition Method, device, computer installation and storage medium
CN111414540B (en) Online learning recommendation method and device, online learning system and server
CN107633242A (en) Training method, device, equipment and the storage medium of network model
CN111401219B (en) Palm key point detection method and device
CN110852257A (en) Method and device for detecting key points of human face and storage medium
CN109740567A (en) Key point location model training method, localization method, device and equipment
CN111401388A (en) Data mining method, device, server and readable storage medium
CN111737439B (en) Question generation method and device
CN112132118B (en) Character relation recognition method and device, electronic equipment and computer storage medium
CN113705792A (en) Personalized recommendation method, device, equipment and medium based on deep learning model
CN113361381B (en) Human body key point detection model training method, detection method and device
CN111611917A (en) Model training method, feature point detection device, feature point detection equipment and storage medium
CN115759679A (en) Grouping method and device
CN114917590A (en) Virtual reality's game system
CN115311001A (en) Method and system for predicting user change tendency based on multiple voting algorithm
CN109658172A (en) A kind of commercial circle recommended method calculates unit and storage medium
CN114973374A (en) Expression-based risk evaluation method, device, equipment and storage medium
CN113869218A (en) Face living body detection method and device, electronic equipment and readable storage medium
CN113971595A (en) Commodity recommendation method, system and equipment
CN113407829A (en) Online learning resource recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination