CN112270308A

CN112270308A - Face feature point positioning method based on double-layer cascade regression model

Info

Publication number: CN112270308A
Application number: CN202011305067.8A
Authority: CN
Inventors: 狄岚; 张佳慧; 顾雨迪
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-01-26
Anticipated expiration: 2040-11-20
Also published as: CN112270308B

Abstract

The invention discloses a face feature point positioning method based on a double-layer cascade regression model, which improves the accuracy of face feature point positioning. The first layer of the model locates a simplified face shape containing a subset of key feature points; to enhance robustness, a fusion subspace is used to divide the samples, and a feature extraction model and a regressor are trained independently for each subset. The second layer locates the complete face shape: 3D fitting is applied to the regression result of the first layer to obtain a roughly aligned complete face shape, and finer regression is performed starting from that shape. Experiments show that, compared with a single-layer cascade regression model, the proposed model improves performance by 23.55% on the 300-W challenging subset, which contains large pose variations, and exhibits a degree of pose robustness.

Description

Face feature point positioning method based on double-layer cascade regression model

Technical Field

The invention relates to the technical field of computer image processing, in particular to a human face feature point positioning method based on a double-layer cascade regression model.

Background

The aim of face feature point positioning is to locate, once face detection is complete, the more specific parts of the face shape such as the eyebrows, eyes, nose, mouth and contour. It is an important step in face image processing and plays an important role in subsequent tasks such as face recognition, facial expression analysis and 3D face reconstruction. Most current face feature point positioning algorithms achieve satisfactory results on frontal face images, but high-precision positioning on unconstrained face images with large expression and pose changes remains challenging.

Algorithms for face feature point positioning can be divided into two main categories. (1) Methods based on generative models. A representative algorithm is the Active Appearance Model (AAM) proposed by Cootes et al., which first constructs a parametric model in the shape and texture feature space using Principal Component Analysis (PCA) and then matches the model to the face image by optimizing its parameters. Although methods of this type have been improved in various ways, the expressive power of parametric models is ultimately limited and they cannot handle subtle shape changes. (2) Methods based on discriminative models. These methods aim to map the extracted image features directly to the face feature point coordinates. The cascade regression model is currently the most widely applied model in this field, and the Cascaded Pose Regression (CPR) method proposed by Dollár et al. is the earliest. Cascade-regression algorithms treat face feature point positioning as a nonlinear optimization problem between image texture features and the face shape, and gradually update an initial shape to the final shape by learning a series of feature-to-shape mappings.

In addition to cascade regression models, methods based on deep networks have gained wide attention in recent years. The earliest application of deep networks to face feature point positioning is the algorithm based on a Deep Convolutional Neural Network (DCNN) proposed by Sun et al., which also realizes coarse-to-fine positioning in a cascaded manner. Algorithms that use a 3D face model for face feature point positioning have also appeared in recent years. These methods fit the model to the face image by adjusting its parameters and project the fitting result onto the 2D plane to obtain the positioning result. Because a 3D face model has a large number of parameters, these algorithms mostly rely on a deep network. Although deep networks have greatly improved the performance of face feature point positioning, many excellent cascade-regression algorithms continue to emerge; with relatively low training cost, they outperform deep-learning-based algorithms under certain conditions.

Shape initialization is the first step of a cascade-regression face feature point positioning algorithm. Many conventional cascade regression algorithms use the mean of the real shapes of all training samples, or a neutral frontal face shape, as the initial shape, and then reach the final prediction through multiple iterative updates. With such an initial shape it is often difficult to accurately locate a face with large pose and expression changes: when the initial shape differs greatly from the real shape, the cascade regression model may be constrained by its limited number of iterations or fall into a local optimum, giving an unsatisfactory final result. Xiong et al. proposed dividing the samples according to domains of homogeneous descent (DHD) and then building a separate regression model for each domain to keep the regression from falling into a local optimum, but their division method requires the real face shape of the sample, which is not available at test time. Coarse-to-fine shape searching (CFSS) explores the real shapes of all samples as the solution space, which removes the limitation of the initial shape but also reduces computational speed.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention has been made in view of the above-mentioned conventional problems.

Therefore, the technical problem solved by the invention is that, in face feature point positioning, unconstrained face images exhibit large pose changes and the regression result is limited by the initial shape.

In order to solve the technical problems, the invention provides the following technical scheme: augmenting the samples by randomly selecting face shapes from the training sample set as initial shapes;

selecting, from the complete 68-point face shape, 14 points that play a key role in face feature point positioning to form a simplified face shape;

dividing the samples into a plurality of subsets by a subspace division method and training a regressor for each subset, all the regressors together forming a first-layer cascade regression model for predicting the simplified face shape;

using 3D fitting to project the simplified face shape predicted by the first layer to the complete face shape, generating a rough initial complete face shape;

and taking the rough complete face shape as the initial shape, training a regressor with all training samples to form a second-layer cascade regression model, the two layers together forming a double-layer cascade regression model for predicting the complete face shape.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of constructing the two-layer cascading regression model includes,

first, PHOG-based shape-indexed features are extracted and projected into the fusion subspace using the projection matrix P obtained in the training process, and the samples are divided into K subsets according to the projection result;

then, a feature mapping function and a regressor are trained separately for each subset, and the simplified face shape is obtained through T iterations of regression;

affine transformation parameters are obtained by the 3D fitting method, and a rough complete face shape s⁰ is obtained by projection;

and finally, a feature extraction function Φ and a regressor W are trained from all training samples and the rough complete face shape obtained in the previous step, and T iterations are performed to realize fine regression.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of selecting the optimal simplified face shape includes,

based on the 5 most representative feature points, further feature points are added at the eyes, nose, mouth and contour respectively, giving 6 candidate simplified face shapes;

the 5 most representative feature points are the centers of the two eyes, the tip of the nose, and the two corners of the mouth.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of selecting the optimal simplified face shape may further comprise,

selecting the most suitable simplified human face shape, including calculating a positioning error and a fitting error;

and comparing the positioning error with the fitting error, and selecting a 14-point shape with the smallest positioning error and the smallest fitting error as a simplified face shape.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the positioning error is calculated by training a single-layer cascade regression model for each simplified face shape and calculating the normalized mean error of each model on the test set samples;

the fitting error is calculated by generating a complete face shape with the 3D fitting method from the positioning result obtained when computing the positioning error, and calculating the average normalized error between the generated complete face shape and the real face shape of the sample.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the fusion subspace is defined as

$$Z = (z_{i,j}) \in \mathbb{R}^{N \times r}$$

where N is the number of samples, r is the column dimension of the fusion subspace, and $z_{i,j}$ is the projection of the shape-indexed feature of the i-th sample onto the j-th dimension of the subspace.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of dividing the samples into K sample subsets using said fusion subspace comprises,

defining the set of face images and corresponding real shapes in the training set as

$$\{(I_i, \hat{s}_i)\}_{i=1}^{N}$$

where N is the number of samples, and $I_i$ and $\hat{s}_i$ are the i-th face image and its corresponding real shape, respectively;

matching the average simplified face shape to the bounding boxes of all training set samples and recording the matching result for each sample;

calculating the shape residual between the real shape and the matching result, and extracting PHOG-based shape-indexed features;

analyzing the shape residuals and the shape-indexed features by canonical correlation analysis (CCA) to obtain the corresponding projection matrices P and Q;

and dividing the subsets according to the following rule.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: if the signs of the i-th and g-th samples agree in every dimension of the fusion subspace, that is, $\operatorname{sign}(z_{i,j}) = \operatorname{sign}(z_{g,j})$ for all $j = 1, \dots, r$, the two samples belong to the same subset $U_k$.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of generating an initial shape for the second layer model using the 3D fitting method comprises,

extracting a set of 14 coordinate points and a set of 68 coordinate points from the dense 3D coordinate point set of a 3DMM model, and recording them as the 3D simplified face shape and the 3D full face shape, respectively;

projecting the 3D simplified face shape onto the 2D plane using weak perspective projection, so as to fit a 3D shape $s_{3d}$ to the face image:

$$s_{2d} = f \, P_o \, R(\alpha, \beta, \gamma) \, (s_{3d} + t_{3d})$$

where f is the scaling factor, $P_o = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ is the orthographic projection matrix, $R(\alpha, \beta, \gamma)$ is the 3 × 3 rotation matrix formed by the pitch angle α, yaw angle β and roll angle γ, $t_{3d}$ is a displacement vector, and $s_{2d}$ is the projection of $s_{3d}$ on the 2D plane;

determining the parameters f, R and $t_{3d}$ by minimizing the Euclidean distance between the 2D simplified face shape output by the first-layer cascade regression model and the 2D projection of the 3D simplified face shape, so as to fit the 3D face shape to the face image;

and, after the affine transformation parameters have been calculated by the least squares method, replacing the 3D simplified face shape in the weak perspective projection formula with the 3D full face shape, which gives the projection on the 2D plane of the 3D full face shape fitted to the face image.

When the face pose changes, the contour feature points marked on the 3D face shape become occluded by the cheek, whereas the real contour feature points should lie on the boundary of the cheek.

The yaw angle β obtained in the fitting process is used to judge whether the face is turned to the left or to the right.

As a preferred scheme of the method for positioning human face feature points based on the double-layer cascade regression model, the method comprises the following steps: the step of generating an initial shape for the second layer model using the 3D fitting method further comprises,

when the face is turned to the left, for each of the 8 contour feature points on the left side, the point with the minimum x coordinate in its corresponding coordinate point subset is found and taken as the new contour feature point;

when the face is turned to the right, for each of the 8 contour feature points on the right side, the point with the maximum x coordinate in its corresponding coordinate point subset is found and taken as the new contour feature point;

the 8 contour feature points are 3 added contour feature points on the basis of the 5 contour feature points.

The invention has the following beneficial effects: it provides a face feature point positioning method based on a double-layer cascade regression model, which combines a subspace division method and a 3D fitting method with a double-layer cascade regression structure to improve the pose robustness of face feature point positioning and thereby the overall positioning accuracy.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:

FIG. 1 is a schematic flow chart of a face feature point positioning method based on a double-layer cascade regression model;

FIG. 2 is a schematic diagram of the positioning errors and fitting errors of different simplified face shapes in the face feature point positioning method based on a double-layer cascade regression model;

FIG. 3 is a distribution diagram of face images in the two-dimensional fusion subspace of the face feature point positioning method based on a double-layer cascade regression model;

FIG. 4 is a schematic diagram of the influence of the value of K on the positioning of the simplified face shape on different data sets in the face feature point positioning method based on a double-layer cascade regression model;

FIG. 5 is a schematic diagram of the influence of two subspace division methods on the positioning of the simplified face shape on the 300-W data set in the face feature point positioning method based on a double-layer cascade regression model;

FIG. 6 is a schematic diagram of the influence of the value of K on the positioning of the complete face shape on different data sets in the face feature point positioning method based on a double-layer cascade regression model;

FIG. 7 is a schematic diagram of the CED curves of each algorithm on the 300-W full set for the face feature point positioning method based on a double-layer cascade regression model;

FIG. 8 is a schematic diagram of the positioning results of different algorithms in the prior art of a human face feature point positioning method based on a double-layer cascade regression model on a 300-W test set;

FIG. 9 is a partial positioning result of the algorithm of the face feature point positioning method based on the double-layer cascade regression model on a 300-W test set.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1

Referring to FIG. 1, which is a schematic flow chart of a face feature point positioning method based on a double-layer cascade regression model according to a first embodiment of the present invention, the method includes the following. The samples are augmented by randomly selecting face shapes from the training sample set as initial shapes. From the complete 68-point face shape, 14 points that play a key role in face feature point positioning are selected to form a simplified face shape. A double-layer cascade regression model is constructed, in which the first layer predicts the simplified face shape and the second layer predicts the complete face shape. To improve pose robustness, the first-layer cascade regression model uses a subspace division method to divide the samples into a plurality of subsets, and a regressor is trained for each subset. Between the two layers, 3D fitting projects the simplified face shape to the complete face shape and generates a rough initial complete face shape. The first-layer model uses a neutral simplified face shape as its initial shape, and the second-layer model uses the result of the 3D fitting as its initial shape.

Specifically, the method includes dividing the samples into subsets using a fusion subspace. Define the set of face images and corresponding real shapes in the training set as $\{(I_i, \hat{s}_i)\}_{i=1}^{N}$, where N is the number of samples, and $I_i$ and $\hat{s}_i$ are the i-th face image and its corresponding real shape, respectively. First, the average simplified face shape is matched to the bounding boxes of all training set samples. The shape residual between each real shape and the matching result is then calculated, and PHOG-based shape-indexed features are extracted. The shape residuals and the shape-indexed features are analyzed by canonical correlation analysis (CCA) to obtain the corresponding projection matrices P and Q. Since the shape residual is not known at test time, only the projection of the shape-indexed features is used, and the subspace of this projection is named the fusion subspace $Z = (z_{i,j}) \in \mathbb{R}^{N \times r}$, where r is the column dimension of the fusion subspace and $z_{i,j}$ is the projection of the shape-indexed feature of the i-th sample onto the j-th dimension of the subspace. If the signs of the i-th and g-th samples agree in every dimension of the fusion subspace, the two samples belong to the same subset $U_k$.
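The sign rule for subset division can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `partition_by_sign`, the random stand-in features and the treatment of an exact zero as positive are all assumptions, and the shape-indexed features are assumed to be preprocessed in the same way as during CCA training.

```python
import numpy as np

def partition_by_sign(features, P):
    """Assign each sample a subset label from the sign pattern of its
    projection into the fusion subspace (hypothetical helper)."""
    Z = features @ P                       # (N, r) projections z_{i,j}
    bits = (Z >= 0).astype(np.int64)       # assumption: zero counts as positive
    weights = 1 << np.arange(Z.shape[1])   # encode the r signs as one integer
    labels = bits @ weights                # equal sign pattern -> equal label
    return labels, Z

# Toy data standing in for PHOG features and a learned projection matrix.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))         # N = 6 samples, 4-dim features
P = rng.normal(size=(4, 2))                # r = 2 subspace dimensions
labels, Z = partition_by_sign(features, P)
```

With r dimensions there are at most 2^r distinct sign patterns, so a two-dimensional fusion subspace yields at most four subsets.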

The face feature point positioning method based on the double-layer cascade regression model further includes generating an initial shape for the second-layer model by 3D fitting. This embodiment uses the neutral face shape of the 3DMM model. According to the coordinate points required for face feature point positioning, a set of 14 coordinate points and a set of 68 coordinate points are extracted from the dense 3D coordinate point set and recorded as the 3D simplified face shape and the 3D full face shape, respectively. To fit a 3D shape $s_{3d}$ to the face image, this embodiment first projects the 3D simplified face shape onto the 2D plane using weak perspective projection. The affine transformation parameters of the weak perspective projection are then determined by minimizing the Euclidean distance between this projection and the 2D simplified face shape output by the first-layer cascade regression model, which fits the 3D face shape to the face image. This embodiment calculates the affine transformation parameters by the least squares method. Applying the same affine transformation to the 3D full face shape yields the projection on the 2D plane of the 3D full face shape fitted to the face image.
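The least-squares fitting step can be sketched as below. The sketch assumes the weak perspective model $s_{2d} = f P_o R (s_{3d} + t_{3d})$, which is an affine map from 3D to 2D, and therefore solves directly for a general 2 × 4 affine camera matrix rather than for f, R and $t_{3d}$ separately. The function names, the random stand-in for the 3D face shape and the choice of the 14 indices are hypothetical.

```python
import numpy as np

def fit_affine_camera(shape_3d, shape_2d):
    """Least-squares fit of a 2x4 matrix M such that [X, 1] @ M.T ~= shape_2d."""
    homog = np.hstack([shape_3d, np.ones((shape_3d.shape[0], 1))])  # (n, 4)
    solution, *_ = np.linalg.lstsq(homog, shape_2d, rcond=None)     # (4, 2)
    return solution.T                                               # (2, 4)

def project(camera, shape_3d):
    """Apply the fitted affine camera to any set of 3D points."""
    homog = np.hstack([shape_3d, np.ones((shape_3d.shape[0], 1))])
    return homog @ camera.T                                         # (n, 2)

# Synthetic check: a random "3D full face shape" and a 14-point subset of it.
rng = np.random.default_rng(0)
full_3d = rng.normal(size=(68, 3))
simplified_ids = np.arange(0, 68, 5)       # 14 illustrative indices
simplified_3d = full_3d[simplified_ids]

# Ground-truth weak perspective projection with a 30-degree yaw.
f, beta = 1.8, np.deg2rad(30.0)
R = np.array([[np.cos(beta), 0.0, np.sin(beta)],
              [0.0, 1.0, 0.0],
              [-np.sin(beta), 0.0, np.cos(beta)]])
P_o = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t_3d = np.array([0.3, -0.2, 0.5])
true_full_2d = (f * P_o @ R @ (full_3d + t_3d).T).T

# Fit on the 14 points only (standing in for the first-layer output),
# then reuse the same transformation for all 68 points.
camera = fit_affine_camera(simplified_3d, true_full_2d[simplified_ids])
rough_full_2d = project(camera, full_3d)
```

In this noise-free toy case the 68 projected points coincide with the ground truth; with a real first-layer prediction the result is only a rough initial shape, which is what the second layer refines.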

When the face pose changes, the contour feature points marked on the 3D face shape become occluded by the cheek, whereas the real contour feature points should lie on the boundary of the cheek. This embodiment therefore establishes, for each of the 16 contour feature points (excluding the bottommost contour feature point), a subset of coordinate points parallel to it. The yaw angle β obtained in the fitting process is used to judge whether the face is turned to the left or to the right. When the face is turned to the left, for each of the 8 contour feature points on the left side, the point with the minimum x coordinate in its coordinate point subset is found and taken as the new contour feature point; when the face is turned to the right, for each of the 8 contour feature points on the right side, the point with the maximum x coordinate in its coordinate point subset is found and taken as the new contour feature point.
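A minimal sketch of this contour update is given below. It assumes that the candidate subsets have already been projected to 2D with the fitted transformation, that contour points are indexed 0 to 16 with index 8 as the bottommost point, and that a negative yaw means the face is turned to the left; the function name and all three conventions are assumptions, not details stated in the embodiment.

```python
import numpy as np

LEFT_CONTOUR = range(0, 8)     # assumed indices of the 8 left contour points
RIGHT_CONTOUR = range(9, 17)   # assumed indices of the 8 right contour points

def update_contour(shape_2d, candidates_2d, yaw):
    """Move occluded contour points to the visible cheek boundary.

    shape_2d      : (68, 2) projected full face shape
    candidates_2d : dict mapping a contour index to an (m, 2) array holding
                    the 2D projections of that point's candidate subset
    yaw           : fitted yaw angle; sign convention is an assumption
    """
    updated = shape_2d.copy()
    if yaw < 0:        # assumed: face turned to the left
        for k in LEFT_CONTOUR:
            cand = candidates_2d[k]
            updated[k] = cand[np.argmin(cand[:, 0])]   # minimum x coordinate
    elif yaw > 0:      # assumed: face turned to the right
        for k in RIGHT_CONTOUR:
            cand = candidates_2d[k]
            updated[k] = cand[np.argmax(cand[:, 0])]   # maximum x coordinate
    return updated

# Toy example: three candidates per contour point.
shape = np.zeros((68, 2))
candidates = {k: np.array([[5.0, k], [-3.0, k], [1.0, k]]) for k in range(17)}
turned_left = update_contour(shape, candidates, yaw=-0.4)
turned_right = update_contour(shape, candidates, yaw=0.4)
```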

Preferably, this embodiment combines the two methods above and constructs a double-layer cascade regression model on the basis of the conventional cascade regression model. First, PHOG-based shape-indexed features are extracted and projected into the fusion subspace using the projection matrix P obtained in the training process, and the samples are divided into K subsets according to the projection result. Then a feature mapping function and a regressor are trained separately for each subset, and the simplified face shape is obtained through T iterations of regression. Affine transformation parameters are obtained by the 3D fitting method, and a rough complete face shape s⁰ is obtained by projection. Finally, a feature extraction function Φ and a regressor W are trained from all training samples and the rough complete face shape obtained in the previous step, and T iterations are performed to realize fine regression.
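The iterative regression performed inside each layer can be sketched as a generic cascade loop. The additive update (shape plus regressor times features) is the usual form in cascade regression and is assumed here; the source does not spell it out. The function name, the stage representation and the toy feature function are hypothetical.

```python
import numpy as np

def cascade_regress(image, init_shape, stages):
    """Run T cascade stages; each stage is a pair (extract, W).

    extract(image, shape) returns a feature vector indexed by the current
    shape, and W maps that vector to a shape increment.
    """
    shape = init_shape.copy()
    for extract, W in stages:
        feat = extract(image, shape)
        shape = shape + (W @ feat).reshape(shape.shape)
    return shape

# Toy stages: the "feature" is the residual to a known target, and each
# regressor removes half of it, so the error halves at every iteration.
target = np.array([[10.0, 20.0], [30.0, 40.0]])
init = np.zeros_like(target)
toy_extract = lambda image, shape: (target - shape).ravel()
T = 7
stages = [(toy_extract, 0.5 * np.eye(target.size)) for _ in range(T)]
final = cascade_regress(None, init, stages)
```

In the double-layer model, one such cascade would be run with the regressors of the selected subset on the simplified shape, followed by 3D fitting and a second cascade on the complete shape.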

Example 2

Referring to FIGS. 2 to 9, a second embodiment of the present invention is described, which differs from the first embodiment as follows. The experimental samples come from two widely used face feature point positioning data sets: the HELEN data set and the 300-W data set. These data sets contain unconstrained face images with pose deflection, occlusion and illumination variation, and are therefore somewhat challenging. In training, all samples are first flipped horizontally, and then for each sample 10 face shapes are randomly sampled from the other training samples to serve as initial shapes, so that the total number of samples is expanded 20-fold. In testing, the first-layer cascade regression model still uses the average face shape as the initial shape, and the second-layer cascade regression model uses the result of the 3D fitting as the initial shape.
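The 20-fold expansion follows from doubling the samples by flipping and then pairing each with 10 initial shapes. A minimal sketch of the pairing step is shown below; the function name is hypothetical, sampling without replacement is an assumption, and the horizontal flip itself (which also requires swapping left and right feature point indices) is not shown.

```python
import numpy as np

def sample_initial_shapes(num_samples, per_sample=10, seed=0):
    """Return (sample index, donor index) pairs; the donor's real shape is
    used as the initial shape for that training instance."""
    rng = np.random.default_rng(seed)
    pairs = []
    for i in range(num_samples):
        others = np.delete(np.arange(num_samples), i)   # exclude the sample itself
        donors = rng.choice(others, size=per_sample, replace=False)
        pairs.extend((i, int(d)) for d in donors)
    return pairs

num_original = 50                      # illustrative training set size
num_after_flip = 2 * num_original      # originals plus mirrored copies
pairs = sample_initial_shapes(num_after_flip)
```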

Parameter settings: the first-layer and second-layer cascade regression models use essentially the same parameters. The number G of decision trees in each random forest is 10, the depth D of each decision tree is 5, and the number of iterations T is 7. Because the shape variation space of the second-layer cascade regression model is smaller than that of the first layer, the two layers use different ranges when randomly selecting pixel-difference features: the radius of the first layer decreases from 0.4 to 0.08 over the iterations, and the radius of the second layer decreases from 0.3 to 0.06 (both are distances normalized by the face bounding box).
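These settings can be collected in a small configuration object, sketched below. The class and field names are hypothetical, and the linear radius schedule is an assumption: the text gives only the start and end values, not the intermediate ones.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class LayerConfig:
    num_trees: int = 10          # G, decision trees per random forest
    tree_depth: int = 5          # D
    num_iterations: int = 7      # T
    radius_start: float = 0.4    # normalized by the face bounding box
    radius_end: float = 0.08

    def radii(self):
        """Sampling radius per iteration; linear decay is an assumption."""
        return np.linspace(self.radius_start, self.radius_end,
                           self.num_iterations)

first_layer = LayerConfig()
second_layer = LayerConfig(radius_start=0.3, radius_end=0.06)
first_radii = first_layer.radii()
second_radii = second_layer.radii()
```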

Evaluation criteria: as with most current face feature point positioning algorithms, the Normalized Mean Error (NME) is used to measure the accuracy of the algorithm. The normalized mean error is calculated as follows:

$$\mathrm{NME} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L \, d_{ipd}} \sum_{l=1}^{L} \lVert s_{i,l} - \hat{s}_{i,l} \rVert_2$$

where N is the total number of samples, L is the number of feature points of the face shape, $s_i$ and $\hat{s}_i$ are the predicted face shape and the real face shape of the i-th sample (with $s_{i,l}$ and $\hat{s}_{i,l}$ their l-th feature points), and $d_{ipd}$ is the Euclidean distance between the centers of the two eyes of the i-th sample.
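A minimal sketch of this metric is given below, assuming shapes are stored as arrays of (x, y) coordinates and that each eye center is the mean of that eye's feature points in the real shape. The function name and the 0-based eye index ranges 36 to 41 and 42 to 47 (as in the common 68-point annotation) are assumptions.

```python
import numpy as np

def normalized_mean_error(pred, true, left_eye_idx, right_eye_idx):
    """NME over a batch of shapes.

    pred, true : (N, L, 2) predicted and real face shapes
    *_eye_idx  : indices of the feature points belonging to each eye
    """
    per_point = np.linalg.norm(pred - true, axis=2)        # (N, L) distances
    left_center = true[:, left_eye_idx].mean(axis=1)       # (N, 2)
    right_center = true[:, right_eye_idx].mean(axis=1)     # (N, 2)
    d_ipd = np.linalg.norm(left_center - right_center, axis=1)
    return float(np.mean(per_point.mean(axis=1) / d_ipd))

# Toy check: eye centers 10 px apart, every point off by a (3, 4) shift.
left_eye = list(range(36, 42))
right_eye = list(range(42, 48))
true_shapes = np.zeros((3, 68, 2))
true_shapes[:, right_eye, 0] = 10.0
pred_shapes = true_shapes + np.array([3.0, 4.0])
nme = normalized_mean_error(pred_shapes, true_shapes, left_eye, right_eye)
```

Here each point is displaced by a distance of 5 and the inter-pupil distance is 10, so the result is 0.5, which would be reported as 50%.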

Based on the above, the fitting effect of the simplified face shape is evaluated first. Before the simplified face shape can be located, it must be decided which feature points make up the simplified face shape. In this embodiment, starting from the 5-point shape consisting of the centers of the two eyes, the nose tip and the two mouth corners, further feature points are added at the eyes, nose, mouth and contour respectively, giving a total of 6 candidate simplified face shapes. To select the most suitable simplified face shape, this embodiment evaluates the candidates in two respects: (1) a single-layer cascade regression model is trained for each simplified face shape, and the normalized mean error of each model on the test set samples is calculated and recorded as the positioning error; (2) using the positioning result obtained in (1), a complete face shape is generated by the 3D fitting method of this embodiment, and the average normalized error between the generated complete face shape and the real face shape of the sample is calculated and recorded as the fitting error. FIG. 2 shows the positioning errors and fitting errors of the 6 simplified face shapes on the HELEN data set calculated according to these criteria. To demonstrate the advantage of the simplified face shape, the figure also marks the error of positioning the complete face shape directly and the error of the complete face shape regenerated by 3D fitting from that positioning result.

As can be seen from the positioning error curve and the fitting error curve in FIG. 2, the 8-point and 16-point shapes add 3 and 2 contour points to the 5-point and 14-point shapes respectively, and their positioning errors increase accordingly, which indicates that contour points are poorly characterized by texture features. However, the fitting error of the 8-point shape is much lower than that of the 5-point shape, which shows that adding a small number of contour feature points constrains the overall shape and reduces the fitting error. Relative to its predecessor, each of the 8-, 10-, 12- and 14-point shapes is obtained by, respectively, adding 3 contour feature points, replacing the 2 eye-center feature points with 4 eye-corner feature points, adding 2 nose-wing feature points, and adding 2 feature points on the upper and lower lips. The fitting error decreases step by step, which shows that adding feature points on the facial features gives a better fitting effect. In consideration of computational cost, however, this embodiment does not try adding further feature points on the facial features.

It can also be seen from Fig. 2 that the positioning error of the complete face shape containing 68 points is higher than that of every simplified face shape, which indicates that a large number of non-critical feature points may make positioning harder. Among the 6 simplified face shapes, the 14-point shape has both the smallest positioning error and the smallest fitting error. Its fitting error is 9.22%, whereas positioning the complete face shape directly and then fitting gives a fitting error of 9.63%. The simplified face shape therefore improves positioning accuracy and also prevents the 3D model from under-fitting because of the mutual constraints among too many feature points. In the following experiments, the 14-point shape is used as the simplified face shape.
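The 3D fitting step relies on weak perspective projection with a scaling factor f and an orthographic projection matrix P_o. A minimal sketch of such a projection is given below. The rotation about the vertical axis, the 2D translation and all function names are illustrative assumptions, not the exact formulation of this embodiment.

```python
import math

def rotation_yaw(angle_rad):
    """3x3 rotation about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def weak_perspective_project(points_3d, f, rotation, translation):
    """Project 3D points with s_2d = f * P_o * R * p + t.

    P_o = [[1, 0, 0], [0, 1, 0]] keeps x and y and drops depth,
    so only the first two rows of R are needed.
    """
    tx, ty = translation
    projected = []
    for p in points_3d:
        x = sum(rotation[0][k] * p[k] for k in range(3))
        y = sum(rotation[1][k] * p[k] for k in range(3))
        projected.append((f * x + tx, f * y + ty))
    return projected

# A point on the x-axis rotated by 90 degrees of yaw moves onto the depth axis,
# so its projected x collapses to the translation offset.
pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
frontal = weak_perspective_project(pts, f=2.0, rotation=rotation_yaw(0.0),
                                   translation=(5.0, 5.0))
profile = weak_perspective_project(pts, f=2.0, rotation=rotation_yaw(math.pi / 2),
                                   translation=(5.0, 5.0))
```

Fitting then amounts to choosing the scale, rotation and translation (and the 3D shape parameters) so that the projected simplified points land as close as possible to the located 2D points.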

Next, the effect of subspace partitioning is evaluated. To demonstrate the effectiveness of the fusion subspace partitioning method proposed in this embodiment, it is evaluated in two respects: (1) the correlation between the fusion subspace and face pose; (2) the influence of the subspace partitioning result on the positioning effect of the first-layer cascade regression model.

Because face images lie on a manifold, an effective feature subspace should preserve this property. Following the subspace partitioning method proposed in this embodiment, features are extracted from samples in the 300-W data set and projected into the fusion subspace; the two-dimensional distribution of some of the face images in the fusion subspace is shown in Fig. 3. Along the horizontal direction of the figure the yaw angle of the samples changes, gradually transitioning from facing left to facing right; along the vertical direction the roll angle changes, gradually transitioning from tilting left to tilting right. Samples near the center of the figure have small pose angles and are mainly frontal faces, while the pose angles of the surrounding samples grow gradually following the horizontal and vertical trends. This shows that the fusion subspace is consistent with the manifold distribution of faces and can be used to divide the samples into subsets of different poses.

To evaluate the influence of the number K of sample subsets on the positioning effect of the first-layer cascade regression model, this embodiment calculates the normalized average error of simplified face shape positioning on different data sets for different values of K; the results are shown in Fig. 4. Here K = 1 means that no subspace partitioning is performed and a single regression model is trained on all samples. On the HELEN data set, the error decreases from 4.46% at K = 1 to 4.32% at K = 4, which shows that subspace partitioning can reduce the positioning error. At K = 8 and K = 16, however, the error rises again, and at K = 16 it even exceeds the error without subspace partitioning. There are two reasons: first, as the number of subsets increases, the number of training samples in each subset decreases, which reduces the robustness of the model; second, pose variation in the HELEN data set is small, so too many subsets easily leads to a disordered partition. On the 300-W challenge set, which has rich pose variation, subspace partitioning reduces the positioning error substantially from K = 1 to K = 8, from 13.83% to 11.48%. The error rises slightly at K = 16, for reasons similar to those on the HELEN data set. Based on these results, K is set to 4 on the HELEN data set and 8 on the 300-W data set when comparing with other algorithms.
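The claims state that two samples belong to the same subset when their coordinates in the fusion subspace have the same sign in every dimension. A minimal sketch of that rule follows, assuming the fusion-subspace coordinates have already been computed. The function name, the handling of zero, and the reading that d dimensions yield up to K = 2^d subsets are illustrative assumptions.

```python
from collections import defaultdict

def partition_by_sign(coords):
    """Group sample indices by the sign pattern of their subspace coordinates.

    coords: list of equal-length coordinate tuples, one per sample.
    With d coordinates there are at most 2**d distinct sign patterns.
    """
    subsets = defaultdict(list)
    for index, point in enumerate(coords):
        # Zero is treated as non-negative (an arbitrary tie-breaking assumption).
        key = tuple(value >= 0 for value in point)
        subsets[key].append(index)
    return dict(subsets)

# Four hypothetical samples in a 2-D subspace, so at most 4 subsets.
coords = [(0.5, 1.2), (-0.3, 0.8), (0.9, 0.1), (-1.0, -2.0)]
subsets = partition_by_sign(coords)
```

Under this reading, each subset would then train its own feature extraction model and regressor, which is why a large K leaves fewer training samples per model.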

To show that the fusion subspace partitioning method has an advantage over the PCA subspace partitioning method, this embodiment compares the effect of the two methods on the positioning error of the simplified face shape on the 300-W challenge set for K = 2, 4 and 8, as shown in Fig. 5. The dotted line in the figure marks the positioning error without subspace partitioning. Both partitioning methods reduce the positioning error, but partitioning based on the fusion subspace is clearly better than partitioning based on the PCA subspace. Since the partitioning method based on the PCA subspace cannot be applied directly to static face images, it is implemented in the experiment using only the shape index features. That method ignores the correlation between the shape index features and the shape residuals, so its partitions are relatively disordered; the method of this embodiment combines both kinds of information, so the pose partition is more accurate and a lower alignment error is achieved for every value of K.

Further, the double-layer cascade regression model is evaluated by comparing the positioning results on the complete face shape, on the HELEN and 300-W data sets, of the unmodified LBF algorithm, the double-layer cascade regression model without subspace partitioning (K = 1), and the double-layer cascade regression model with different values of K.

Because some of the parameters used in this embodiment differ from those in the original LBF paper, the experimental data for the LBF algorithm in Table 1 differ from the data in that paper. Table 1 shows that the double-layer cascade regression model without subspace partitioning (K = 1) performs better on all data sets than the unmodified LBF algorithm, which shows that the double-layer model is better than the single-layer model. Once subspace partitioning is added, the error on the HELEN data set and the 300-W challenge set decreases gradually as K increases and reaches its minimum at K = 4 and K = 8 respectively, 4% and 23.55% lower than the original algorithm. This matches the positioning results for the simplified face shape, so K is only taken up to 4 and 8 respectively. On the 300-W common set, the errors of the double-layer cascade regression model for different values of K differ very little. A possible reason is that common-set samples, whose pose variation is small, are densely distributed near the center of the subspace and therefore close to the partition boundaries, so that after partitioning some face samples may be assigned to unsuitable subsets and the improvement contributes little to the positioning of these images. Nevertheless, once the challenge-set results are averaged in, the positioning error on the 300-W full set decreases as K increases and is 11.24% lower than that of the original algorithm.
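The percentage reductions quoted in this section are read here as relative reductions in normalized error, which is an assumption about how the figures were computed. The sketch below applies that formula to the first-layer errors reported for the simplified face shape (4.46% to 4.32% on HELEN, 13.83% to 11.48% on the 300-W challenge set); the function name is illustrative.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline - improved) / baseline

# First-layer errors quoted in the text for the simplified face shape.
helen = relative_reduction(4.46, 4.32)        # K = 1 -> K = 4 on HELEN
challenge = relative_reduction(13.83, 11.48)  # K = 1 -> K = 8 on the 300-W challenge set
```

On these inputs the relative reduction is roughly 3.1% on HELEN and roughly 17.0% on the challenge set, which illustrates how much more the partitioning helps when pose variation is large.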

As shown in Fig. 6, the results of the 7th iteration correspond to the data in Table 1. An accurate initial shape gives the second-layer cascade regression model an advantage from the first iteration onward, and this advantage is maintained through the last iteration, so that a better positioning result is achieved.
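The effect of the initial shape can be illustrated with the generic cascade regression update, in which each iteration adds a regressed increment to the current shape. The sketch below is a toy illustration only: the feature extractor and regressor are hypothetical stand-ins (the "image" is simply the target shape, and each stage removes half of the remaining residual), not the local binary features and regressors of this embodiment.

```python
def run_cascade(initial_shape, stages, image):
    """Apply S_t = S_{t-1} + R_t(phi_t(image, S_{t-1})) for each stage.

    stages: list of (extract_features, regress) pairs, one per iteration.
    Returns the shape after every iteration, with the initial shape first.
    """
    shape = list(initial_shape)
    history = [list(shape)]
    for extract_features, regress in stages:
        features = extract_features(image, shape)
        increment = regress(features)
        shape = [s + d for s, d in zip(shape, increment)]
        history.append(list(shape))
    return history

def toy_features(image, shape):
    """Hypothetical feature: the residual between the target and current shape."""
    return [t - s for t, s in zip(image, shape)]

def toy_regressor(features):
    """Hypothetical regressor: recover half of the residual per stage."""
    return [0.5 * f for f in features]

target = [10.0, 20.0]
stages = [(toy_features, toy_regressor)] * 3
coarse = run_cascade([0.0, 0.0], stages, target)     # poor initial shape
aligned = run_cascade([8.0, 18.0], stages, target)   # roughly aligned initial shape
```

With the same number of stages, the run that starts from the roughly aligned shape is closer to the target after every iteration, which mirrors the behavior described for the second-layer model.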

Table 1 Complete face shape localization error on different data sets (%)

Based on the above, for comparison with existing methods, this embodiment uses the normalized mean error of each algorithm on different data sets, the CED curves, and the specific localization results as evaluation criteria.
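A CED (cumulative error distribution) curve reports, for each error threshold, the fraction of test images whose normalized error does not exceed that threshold. A minimal sketch follows; the function name is illustrative and the error values are hypothetical, not data from the experiments.

```python
def ced_curve(errors, thresholds):
    """Fraction of samples whose error is <= each threshold."""
    if not errors:
        raise ValueError("errors must be non-empty")
    n = len(errors)
    return [sum(1 for e in errors if e <= t) / n for t in thresholds]

# Hypothetical per-image normalized errors (in %), for illustration only.
errors = [3.1, 4.5, 5.0, 7.2, 12.8]
thresholds = [4.0, 5.0, 8.0, 10.0, 15.0]
curve = ced_curve(errors, thresholds)
```

A curve that rises faster and reaches 1.0 at a smaller threshold indicates a more accurate algorithm, so CED curves complement the single-number normalized mean error.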

Table 2 shows the normalized mean errors of different algorithms on the HELEN and 300-W data sets, all taken from the original papers of the algorithms or from related articles. ESR, SDM, LBF, CFSS and MCO are algorithms based on cascade regression, and CFAN, 3DDFA and PIFA-S are algorithms based on deep learning.

According to the data in the table, the algorithm of this embodiment has the smallest error of all compared algorithms on the HELEN data set and the 300-W common set, where pose variation is small, and even outperforms the three algorithms based on deep learning. A likely reason is that CFAN only replaces the feature mapping function with a deep autoencoder network and in essence is still single-layer cascade regression, while 3DDFA and PIFA-S are based on the 3DMM model, and their 3D face alignment results are not superior to those of 2D face alignment. The result also shows that the coarse-to-fine regression process of the double-layer cascade regression model improves the accuracy of the algorithm.

On the 300-W challenge set, which has rich pose variation, the algorithm of this embodiment ranks second among the algorithms based on cascade regression, behind only CFSS, and is competitive with the two algorithms based on deep learning. CFSS selects candidate shapes by computing a probability matrix and merges the updated results of several candidate shapes to generate a new shape, so it improves positioning accuracy at the cost of a very large amount of computation; 3DDFA and PIFA-S optimize the 3DMM parameters with a convolutional neural network to achieve 3D face alignment and are highly robust to changes in pose and occlusion.

After the results on the 300-W common set and challenge set are averaged, the algorithm of this embodiment is again second only to CFSS and surpasses all other algorithms based on cascade regression as well as the two algorithms based on deep learning, which shows that it is competitive with current advanced algorithms.

Table 2 Normalized mean error (%) on the 300-W data set

As shown in Fig. 7, the normalized average error of the SDM algorithm reproduced in this embodiment is 7.04%, better than the 7.50% of the original paper. The normalized average error of the reproduced LBF algorithm is 6.59%, slightly worse than the 6.32% of the original paper, because the number of decision trees was reduced from the 1200 of the original paper to 680 in the reproduction, to prevent an overly high feature dimension from consuming too much memory. The normalized average error of the reproduced CFSS algorithm is 6.00%, obtained with the MATLAB code provided by the original authors on GitHub. It can be seen that the algorithm of this embodiment has certain advantages over the other algorithms.

As shown in Fig. 8, the first row is the true shape of each sample, and the second to fifth rows are the positioning results of the algorithm of this embodiment, CFSS, LBF and SDM respectively. In the first to fourth columns, the algorithm of this embodiment obtains results very close to the true shape, while errors are visible for all three other algorithms. In the fifth to eighth columns, the result of the algorithm of this embodiment shows some error and SDM and LBF essentially fail; CFSS does better than the algorithm of this embodiment in the fifth and sixth columns, but fails in the seventh and eighth columns, where the pitch angle is larger, and is worse than the algorithm of this embodiment there. These results show that the algorithm of this embodiment has a certain robustness to pose.

As shown in Fig. 9, the samples marked with feature points represent simple cases, difficult but successfully predicted cases, and failed cases respectively. The main reason for the failures is an exaggerated facial expression in the sample. These results again indicate that the proposed method is robust to pose.

It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.

Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.

Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, so that when the storage medium or device is read by a programmable computer, it configures and operates the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein, transforming the input data to generate output data that is stored in non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. A face feature point positioning method based on a double-layer cascade regression model, characterized by comprising:

augmenting the samples by randomly selecting face shapes from the training sample set as initial shapes;

2. The method for locating the facial feature points based on the double-layer cascade regression model according to claim 1, wherein: the step of constructing the double-layer cascade regression model includes,

and regressor

the simplified face shape obtained by T iterations of regression

3. The method for locating the facial feature points based on the double-layer cascade regression model according to claim 1 or 2, wherein: the step of selecting the optimal simplified face shape includes,

4. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 3, wherein: the step of selecting the optimal simplified face shape may further comprise,

5. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 4, wherein: the positioning error is calculated by training a single-layer cascade regression model based on each simplified face shape and calculating the normalized standard error of each model on the test set samples;

6. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 2, wherein: the fusion subspace comprises,

7. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 6, wherein: the step of dividing the samples into K sample subsets using said fusion subspace comprises,

defining a set of face images and corresponding true shapes in a training set,

where N is the number of samples, and I_i and

are respectively the i-th face image and its corresponding real shape;

The shape residual is calculated and,

and PHOG-based shape indexing features

the subsets are divided according to the following formula,

8. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 7, wherein: if the signs of every dimension of the i-th and g-th samples in the fusion subspace are the same, the two samples belong to the same subset U_k.

9. The method for locating the facial feature points based on the double-layer cascade regression model as claimed in claim 8, wherein: the step of generating an initial shape for the second layer model using the 3D fitting method comprises,

And 3D full face shape

The simplified 3D face shape is projected onto a 2D plane using weak perspective projection, and s_3d is fitted to the face image, comprising:

where f is the scaling factor, P_o is the orthographic projection matrix

Is composed of

Projection onto a 2D plane;

By minimizing

And

Substitution into 3D full face shape

10. The method for locating facial feature points based on a double-layer cascade regression model as claimed in claim 9, wherein: the step of generating an initial shape for the second layer model using the 3D fitting method further comprises,