CN109766866B - Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction - Google Patents

Info

Publication number: CN109766866B; other versions: CN109766866A
Application number: CN201910057766.6A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 汪令野, 沈江洋
Current and original assignee: Hangzhou Meidai Technology Co., Ltd.
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)

Classifications

  • Image Analysis
  • Image Processing

Abstract

The invention discloses a real-time face feature point detection method and detection system based on three-dimensional reconstruction. The detection method comprises the following steps: (1) acquiring face image frames and reconstructing a geometric face model in real time with a geometric reconstruction algorithm; (2) preprocessing the geometric face model and extracting face point cloud data; (3) feeding the face point cloud data into FacePointNet, composed of a coarse PlainFPN network and a block-cascade network CascadeFPN, to detect the three-dimensional face feature points. The coarse PlainFPN network detects the cheek-edge feature points of the face and the coarse-precision internal feature points; CascadeFPN partitions the face point cloud into several blocks according to the coarse-precision internal feature points and detects the fine-precision internal feature points of each block separately. The method improves both the accuracy and the real-time performance of three-dimensional face feature point detection.

Description

Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction
Technical Field
The invention belongs to the technical field of computer vision and particularly relates to a real-time face feature point detection method and detection system based on three-dimensional reconstruction.
Background
With the spread of RGB-D cameras on mobile terminals, consumer-grade RGB-D cameras mounted on mobile devices now reach end users directly, which is bound to advance three-dimensional vision. Three-dimensional modelling of the face underpins fields such as mobile security, mobile AR and mobile VR, and opens up richer mobile applications. Among the three-dimensional information of a face, the geometric mesh, the appearance texture and the three-dimensional feature points carry the greatest expressive power and practical value. However, the computing and storage capabilities of mobile devices are limited, so a friendly interactive system and efficient algorithms must be designed on the device to acquire three-dimensional face information.
In research on three-dimensional geometric mesh reconstruction, a considerable number of algorithms balance performance and quality, and with the steady improvement of device performance, RGB-D frames can now be reconstructed in real time with good results. Appearance texture also carries rich information about a three-dimensional object, but recovering the true appearance texture is very challenging owing to camera pose errors, geometric mesh errors, complex illumination and other factors. Most existing mainstream methods target general scenes: they set up a complex global nonlinear optimization model and solve it iteratively, which is time-consuming, ignores prior information about the face, and therefore does not adapt well to faces.
The three-dimensional feature points of a face likewise carry key information and are a prerequisite for higher-level applications such as face recognition, facial animation, and 3D printing and manufacturing. Existing research on facial feature point localisation focuses mainly on two-dimensional images, though some scholars have proposed effective algorithms for locating feature points on three-dimensional face models. However, some of these algorithms use models that are too complex and run inefficiently, while the accuracy of others falls short of requirements. Three-dimensional feature point localisation differs from its two-dimensional counterpart in the format of the input data. Two-dimensional input has benefited from the development of image convolutional neural networks and achieved great research progress, so many scholars have begun to explore deep-learning methods suited to three-dimensional data structures, a popular research field known as three-dimensional deep learning. With its continued development in the general domain, detecting the three-dimensional feature points of a face has become a sub-problem to which these methods extend naturally and which they can solve well.
Disclosure of Invention
The invention aims to provide a human face characteristic point real-time detection method and a human face characteristic point real-time detection system based on three-dimensional reconstruction, which can improve the accuracy and the real-time performance of the human face three-dimensional characteristic point detection method.
In order to achieve the purpose, the invention provides the following technical scheme:
a human face characteristic point real-time detection method based on three-dimensional reconstruction comprises the following steps:
(1) acquiring a human face image frame, and reconstructing a human face geometric model in real time by using a geometric reconstruction algorithm;
(2) preprocessing a face geometric model, and extracting face point cloud data;
(3) inputting the human face point cloud data into a facePointNet consisting of a rough PlainFPN network and a block cascade network CascadeFPN to detect the human face three-dimensional feature points to obtain the human face three-dimensional feature points;
the rough PlainFPN network is used for detecting cheek edge feature points of a human face and internal feature points with rough precision, the CascadeFPN divides point cloud data of the human face into a plurality of blocks according to the internal feature points with the rough precision, and the internal feature points with fine precision of each block are respectively detected.
The invention proposes FacePointNet for detecting the three-dimensional feature points of a face. The block-cascade network CascadeFPN detects the internal feature points of the face from coarse to fine, following the block-cascade idea. Compared with mainstream methods for detecting three-dimensional face feature points, the method is more accurate and has higher real-time performance.
Preferably, in step (1), the acquired face image frames are continuous image frames with the face pose aligned. While the face image frames are being captured, a two-dimensional face alignment algorithm runs on the input picture to detect the real-time face pose and the two-dimensional feature points. Whether the face is aligned is judged from the current pose; when the face is not yet aligned, the system gives feedback prompting the user to adjust the pose.
In the step (1), the geometric reconstruction algorithm adopts a KinectFusion algorithm, and the specific reconstruction process is as follows:
(1-1) performing space region division operation on the face image, detecting two-dimensional characteristic points of the face image, and dividing a bounding box cube of the face by combining a depth frame;
and (1-2) realizing a Kinectfusion algorithm in the obtained bounding box cube, and performing real-time human face three-dimensional geometric reconstruction.
In the step (2), the pretreatment comprises alignment and cutting treatment, and the specific process comprises the following steps:
Firstly, the geometric model of the face to be detected is coarsely aligned to the coordinate space of a standard face model; then, taking the nose tip as the centre, the face is cropped with a sphere of radius 80–100 mm, yielding the face point cloud data.
The rough alignment is to avoid the influence of the rotation of the input point cloud model on the detection result, the cutting is to extract the effective face part, and the adverse influence of invalid geometric interference data on the detection result is avoided. Through the preprocessing operation, the robustness of the whole detection method can be improved.
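As an illustration, the sphere-cropping step can be sketched in a few lines of NumPy; the function name and the 90 mm default radius are illustrative choices within the stated 80–100 mm range, not taken from the patent.

```python
import numpy as np

def crop_face(points, nose_tip, radius=0.09):
    # Keep only points within `radius` metres of the nose tip
    # (the text crops with a sphere of radius 80-100 mm).
    d = np.linalg.norm(points - nose_tip, axis=1)
    return points[d <= radius]

# Toy cloud: two points inside the 90 mm sphere, one outside.
cloud = np.array([[0.00, 0.0, 0.0],
                  [0.05, 0.0, 0.0],
                  [0.20, 0.0, 0.0]])
kept = crop_face(cloud, nose_tip=np.zeros(3))
```

The same mask-based filtering applies unchanged to meshes stored as vertex arrays.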
In step (3), the coarse PlainFPN network comprises two modules: one detects the cheek-edge feature points of the face, the other detects the coarse-precision internal feature points of the face.
Each module comprises 5 x-conv layers; the average numbers of sampling points per layer are 1024, 384, 128 and 128. An average pooling layer follows each x-conv layer to extract the average feature of the sampling points; this feature is fed into a multilayer perceptron of three fully connected layers, which finally outputs the three-dimensional coordinates of the face feature points.
The x-conv layer consists of a multilayer perceptron and two ordinary convolution layers: the input point cloud data passes through the multilayer perceptron, which lifts it to a higher-dimensional feature space; the lifted data is fed into one convolution layer to obtain a transformation matrix; the transformation matrix is multiplied with the lifted point cloud data; and the product is fed into the other convolution layer, which outputs the result.
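A toy NumPy sketch of an x-conv-style layer in the spirit of this description — MLP lifting, a learned transformation matrix, and a final linear mixing stage. All weight shapes, names, and the pooling step are illustrative placeholders, not the actual PointCNN implementation.

```python
import numpy as np

def mlp(x, w1, w2):
    # Two-layer perceptron with ReLU: lifts 3-D points to feature space.
    return np.maximum(x @ w1, 0.0) @ w2

def x_conv(points, w1, w2, wt, wo):
    # 1) lift the K neighbour points with the MLP,
    # 2) predict a K x K transformation matrix from the raw coordinates,
    # 3) apply it to the lifted features, 4) final linear mixing + pooling.
    K = points.shape[0]
    lifted = mlp(points, w1, w2)                  # (K, C) lifted features
    T = (points.reshape(-1) @ wt).reshape(K, K)   # (K, K) learned transform
    return (T @ lifted @ wo).mean(axis=0)         # one output feature vector

# Random placeholder weights; a real layer learns these during training.
rng = np.random.default_rng(0)
K, C, C_out = 8, 16, 32
pts = rng.normal(size=(K, 3))
w1 = rng.normal(size=(3, C)); w2 = rng.normal(size=(C, C))
wt = rng.normal(size=(3 * K, K * K)); wo = rng.normal(size=(C, C_out))
feat = x_conv(pts, w1, w2, wt, wo)
```

The learned transformation matrix is what lets the layer weight and reorder an unordered neighbourhood of points before the final convolution.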
The block cascade network CascadeFPN comprises: the system comprises a point cloud segmentation module for segmenting the characteristic points in the human face and a plurality of fine PlainFPN networks respectively corresponding to each segmented block after segmentation.
CascadeFPN takes the coarse-precision human face internal feature points as input, divides the human face point cloud into four blocks of nose, eyes, eyebrows and mouth by using the feature points, and then inputs the point cloud data of each block into a corresponding fine PlainFPN network. Each fine PlainFPN network comprises 4 x-conv layers, the number of sampling points of each layer is different, and other parts of the network have the same structure as the coarse PlainFPN network. And finally, outputting the three-dimensional coordinates of the feature points of the corresponding parts by each thin PlainFPN network.
According to the size of each facial part and the scale of its point cloud, four sampling strategies are designed; the sampling strategy of the fine PlainFPN network for each block is as follows:
the average sampling point counts of the x-conv layers are 738, 384, 128 and 128 for the nose block; 400, 256, 128 and 128 for both the eye and eyebrow blocks; and 512, 256, 128 and 128 for the mouth block.
In the step (3), the specific process of obtaining the three-dimensional feature points of the human face comprises the following steps:
(3-1) inputting the point cloud data into a rough PlainFPN network to obtain rough aligned feature points including cheek edge feature points and internal feature points;
(3-2) inputting the roughly aligned internal feature points into CascadeFPN, and performing segmentation operation on point cloud data to obtain four blocks of eyebrows, eyes, a nose and a mouth;
and (3-3) respectively inputting the point cloud data of each block and the corresponding internal feature points into a fine PlainFPN network corresponding to each block in the CascadeFPN network to obtain the precisely aligned internal feature points.
The coarse PlainFPN network designed by the invention already outputs fairly accurate three-dimensional feature points, including the cheek-edge feature points and the internal feature points. Because the local features of the cheek-edge points are weak and noisy, their accuracy is difficult to improve further. For the internal feature points of the face, however, the block-cascade network CascadeFPN of the invention partitions the input point cloud so that local geometric detail is extracted better, the feature point precision is optimised from coarse to fine, and a more accurate result is finally obtained.
the invention also discloses a system for detecting the human face characteristic points in real time based on three-dimensional reconstruction, which comprises a mobile terminal with a camera, wherein the mobile terminal also comprises:
the geometric reconstruction module is used for constructing a human face three-dimensional geometric model in real time by using human face image frames acquired by the camera;
the preprocessing module is used for aligning and cutting the three-dimensional geometric model of the human face and outputting point cloud data of the human face;
the three-dimensional characteristic point detection module is used for detecting three-dimensional characteristic points of the face point cloud data to obtain three-dimensional characteristic points of the face;
the three-dimensional characteristic point detection module comprises a facePointNet consisting of a coarse PlainFPN network and a block cascade network CascadeFPN; the coarse PlainFPN network is used for detecting cheek edge characteristic points of a human face and coarse-precision internal characteristic points; the CascadeFPN comprises: the system comprises a point cloud segmentation module for segmenting the characteristic points in the human face and a plurality of fine PlainFPN networks respectively corresponding to each segmented block after segmentation.
The invention provides facePointNet aiming at the detection of three-dimensional feature points of a human face, wherein the facePointNet consists of a coarse PlainFPN and a block cascade network CascadeFPN and is respectively used for detecting the feature points of the edge of the cheek of the human face and the internal feature points of the human face. The CascadeFPN is a cascade network structure designed according to the thought of block cascade, and detects three-dimensional feature points in the human face from coarse to fine. Compared with the mainstream human face three-dimensional feature point detection method at present, the method has higher accuracy and higher real-time performance.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting facial feature points in real time based on three-dimensional reconstruction according to the present invention;
FIG. 2 is a schematic flow chart of real-time reconstruction of a geometric model of a human face by using a geometric reconstruction algorithm;
FIG. 3 is a schematic diagram of dividing a face space region, wherein (a) is a two-dimensional feature point, (b) is a three-dimensional feature point, and (c) is a bounding box cube;
FIG. 4 is a schematic diagram of the coarse PlainFPN network structure in the FacePointNet of the present invention;
FIG. 5 is a schematic diagram of CascadeFPN network structure in FacePointNet of the present invention;
FIG. 6 is a schematic diagram of the corresponding fine PlainFPN network structure for four segments in CascadeFPN, wherein (a) the fine PlainFPN network is a nose segment, (b) the fine PlainFPN network is an eye segment, (c) the fine PlainFPN network is an eyebrow segment, and (d) the fine PlainFPN network is a mouth segment;
FIG. 7 is a graph comparing rough alignment and fine alignment of feature points inside a human face, wherein (a) is a rough PlainFPN rough alignment result and (b) is a CascadeFPN fine alignment result;
fig. 8 is a schematic diagram of the labeling relationship of feature points in the face model.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, a schematic flow chart of a method for detecting a feature point of a human face in real time based on three-dimensional reconstruction according to the present invention includes:
step (1), collecting human face image frames, and reconstructing a human face geometric model in real time by using a geometric reconstruction algorithm.
As shown in fig. 2, when reconstructing a geometric face model, a two-dimensional face alignment algorithm needs to be performed on a picture first to detect a real-time face pose and two-dimensional feature points. And judging whether the human face is aligned according to the current human face posture, and when the human face is not initially aligned, the system can feed back and prompt the user to adjust the posture. After the human face postures are aligned, a space region division operation is executed, and the bounding box cube of the human face is divided by detecting two-dimensional feature points and combining with the depth frame so as to initialize real-time geometric reconstruction. Then, real-time geometric reconstruction is executed, and the face geometric model data is output. The specific process is as follows:
(1-1) face initial alignment
In this embodiment, the initial alignment of the face uses the existing Displaced Dynamic Expression (DDE) two-dimensional face alignment algorithm. Given a continuous RGB video, the DDE algorithm computes the position, size, pose, feature points and other information of the face in each image in real time, as shown in Fig. 3(a). From the obtained position, size and pose, the system prompts the user in real time to adjust the pose accordingly, completing the initial face alignment. On the mobile side, the DDE algorithm is available as a commercial SDK from FaceUnity and is invoked directly in the implementation.
(1-2) spatial region partitioning
After the initial face alignment is finished, the first incoming RGB-D frame is taken as the reference frame for dividing the bounding-box region to be reconstructed, initialising the three-dimensional reconstruction. Denote the RGB frame by I_0 and the depth frame by D_0, and obtain from the DDE algorithm the two-dimensional face feature point set P_2d ⊂ R^2 of this frame, where every two-dimensional feature point p_2d is an image coordinate of I_0, as shown in Fig. 3(a). Since P_2d gives only the coordinates of the feature points in the two-dimensional image I_0, the depth frame D_0 aligned with I_0 and the camera intrinsic matrix are needed to compute the corresponding three-dimensional feature point set P_3d ⊂ R^3, which is used to determine the spatial region of the face, as shown in Fig. 3(b). For a two-dimensional feature point p_2d ∈ P_2d, the corresponding p_3d ∈ P_3d is computed as
p_3d = D_0(p_2d) · K^{-1} [p_2d^T, 1]^T    (1)
where K is the 3 × 3 camera intrinsic matrix.
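Equation (1) is the standard pinhole back-projection; a minimal NumPy sketch, with made-up intrinsics K, looks like this:

```python
import numpy as np

# Illustrative intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def back_project(p2d, depth, K):
    # Eq. (1): p3d = D0(p2d) * K^{-1} [p2d^T, 1]^T
    uv1 = np.array([p2d[0], p2d[1], 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

# A pixel at the principal point with depth 0.5 m lies on the optical axis.
p3d = back_project((320.0, 240.0), 0.5, K)
```

In practice the depth lookup D0(p2d) is a (possibly interpolated) read from the aligned depth image.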
However, in most cases the depth values of consumer-grade depth frames are missing along edge contours; as shown in Fig. 3(b), the depth values of the grey areas in the depth frame are missing. The whole point set P_2d therefore cannot be used, and only the internal face feature points can determine the spatial region of the face. We adopt the convention that the x-axis points horizontally to the right, the y-axis vertically up, and the z-axis out towards the viewer. Via formula (1), the internal three-dimensional feature point set P_3d is obtained. Next, the bounding cuboid (x_min, x_max, y_min, y_max, z_min, z_max) of P_3d is computed, as shown in Fig. 3(b). Finally, using the prior constraints on facial geometric proportions, the bounding-box cube (c_x, c_y, c_z, c_w) containing the face is derived, as shown in Fig. 3(c), where (c_x, c_y, c_z) denotes the centre of the cube and c_w its side length. The computation is as follows:
c_w = 2.2 (y_max − y_min)    (2)
c_x = 0.5 x_min + 0.5 x_max    (3)
c_y = 0.2 y_min + 0.8 y_max    (4)
c_z = z_max − 0.5 c_w + Δ    (5)
where Δ is a small fixed offset that prevents the nose tip from being cut off by mistake; we take Δ = 15 mm. Dividing the spatial region by this computation adaptively adjusts the reconstruction region, and hence the precision of the geometric model, to users with different face shapes.
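The cube computation can be sketched directly from the formulas. This sketch reads the offset in the last formula from the face's front (maximum-z) plane, which is what the Δ safeguard for the nose tip suggests; function and variable names are illustrative.

```python
import numpy as np

def face_cube(p3d, delta=0.015):
    # Bounding cube (cx, cy, cz, cw) from the internal 3-D feature points,
    # per the formulas in the text; delta = 15 mm keeps the nose tip inside.
    xmin, ymin, zmin = p3d.min(axis=0)
    xmax, ymax, zmax = p3d.max(axis=0)
    cw = 2.2 * (ymax - ymin)
    cx = 0.5 * xmin + 0.5 * xmax
    cy = 0.2 * ymin + 0.8 * ymax
    cz = zmax - 0.5 * cw + delta   # offset from the front (max-z) face
    return cx, cy, cz, cw

pts = np.array([[-0.05, 0.00, 0.00],
                [ 0.05, 0.10, 0.05]])   # toy feature points, in metres
cx, cy, cz, cw = face_cube(pts)
```

Note how c_y sits well above the vertical midpoint (0.2/0.8 weighting), matching the fact that the labelled feature points cluster in the upper part of the face.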
(1-3) real-time geometric reconstruction
After the bounding-box cube (c_x, c_y, c_z, c_w) containing the face has been obtained from formulas (2)–(5), real-time geometric reconstruction is performed with the KinectFusion algorithm. First, the bounding-box cube is divided into a uniform voxel grid to initialise the TSDF voxel grid of KinectFusion; then the TSDF values are continuously updated from the incoming depth frames, yielding a continuously updated three-dimensional geometric model of the face.
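The per-voxel TSDF update can be sketched as a running weighted average; this is a generic KinectFusion-style integration step, not the patent's exact implementation, and the truncation distance and normalisation are illustrative.

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.01):
    # Fold one observed signed distance per voxel into the running
    # weighted average (generic KinectFusion-style integration).
    d = np.clip(sdf_obs, -trunc, trunc) / trunc   # truncated, normalised SDF
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)
    return new_tsdf, weight + 1.0

grid = np.zeros(4)   # four voxels, initially unobserved
w = np.zeros(4)
grid, w = tsdf_update(grid, w, np.array([0.005, -0.02, 0.0, 0.1]))
```

Each new depth frame contributes one such observation per voxel; the zero-crossing of the accumulated TSDF is the reconstructed surface.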
And (2) preprocessing the geometric model of the human face and extracting point cloud data of the human face.
Firstly, a face model to be detected needs to be roughly aligned to a coordinate space of a standard face model. Two-dimensional internal feature points of the face can be roughly estimated by using a two-dimensional face alignment algorithm and mapped into rough three-dimensional feature points. And taking the first frame in the geometric reconstruction as a reference frame, and obtaining the two-dimensional human face characteristic point coordinates of the first frame. And rendering the depth map by using the reconstructed face model and the camera pose of the first frame, and then obtaining rough three-dimensional feature points in the face by using a transformation method in the space region division in the step (1-2). Then, a standard face model is taken, corresponding feature points are marked, and the face model to be detected can be roughly aligned to the coordinate space of the standard face model by calculating three-dimensional space transformation between two groups of feature points. Then, the nose tip is taken as the center of the sphere, and the spherical surface with the radius of 80-100 mm is used for cutting the face. The cut data is used as the input of the facePointNet network provided by the invention, and the three-dimensional characteristic points of the human face are detected from coarse to fine.
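The "three-dimensional space transformation between two groups of feature points" described above is a classic rigid fit; a minimal NumPy sketch under the assumption of a Kabsch/Procrustes least-squares solution, with illustrative names throughout:

```python
import numpy as np

def rigid_align(src, dst):
    # Least-squares rotation R and translation t with dst ~= R @ src + t
    # (Kabsch / orthogonal Procrustes over corresponding point pairs).
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:       # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dc - R @ sc

# Recover a known 90-degree rotation about z plus a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 3))            # rough 3-D feature points
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
dst = src @ R_true.T + t_true            # corresponding standard-model points
R_hat, t_hat = rigid_align(src, dst)
```

Applying the recovered (R, t) to the whole scanned model coarsely aligns it to the standard model's coordinate space.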
And (3) inputting the point cloud data of the face into a facePointNet network to detect three-dimensional feature points of the face, so as to obtain the three-dimensional feature points of the face.
An intuitive way to obtain three-dimensional feature points is to map two-dimensional face feature points into three dimensions. Methods based on two-dimensional images, however, have clear drawbacks: they cannot fully exploit the three-dimensional geometric information and are sensitive to illumination. To make full use of the three-dimensional geometry of the face, the point cloud data of the face model is therefore taken as input, FacePointNet is designed on the basis of PointCNN, and the three-dimensional feature points are detected from coarse to fine. FacePointNet consists of two networks: the coarse PlainFPN and the block-cascade network CascadeFPN, which detect the cheek-edge feature points and the internal feature points of the face respectively.
First, the invention designs the coarse PlainFPN network using X-Conv convolutions; its structure is shown in Fig. 4. Its input is the preprocessed face point cloud V ⊂ R^3; if the input contains n vertices, the network input is an n × 3 array. Let the set of feature points to be detected be V_m ⊂ R^3; if it contains m feature points, the network output is the 3m-dimensional vector Y = {y_1, y_2, …, y_3m}. Let y_i denote the fitted value and y_i* the ground-truth value; the loss function of the network is the squared error
L = Σ_{i=1}^{3m} (y_i − y_i*)^2.
In fact, the coarse PlainFPN network can detect not only the cheek-edge feature points but also the internal feature points of the face, and the internal feature points it detects are already better than those obtained with some mainstream methods. For the internal feature points, however, a more accurate result is desired. Following the idea of solving the problem by cascade regression, the output of the coarse PlainFPN network can serve as coarse alignment points: the finely aligned feature points must lie in the neighbourhood of the coarse points and can be found by a neighbourhood search in the original input point set. Moreover, partitioning the input point cloud extracts local geometric detail better. Based on these observations, the CascadeFPN network is designed to optimise the precision of the feature points from coarse to fine. Because the local features of the cheek-edge points are weak and noisy, experiments show that their accuracy is difficult to improve further; the CascadeFPN network therefore refines only the internal feature points of the face.
As shown in Fig. 5, the dashed box contains the CascadeFPN network designed by the invention. First, the face point cloud is fed into the coarse PlainFPN network to obtain coarsely aligned feature points. These points are then used to segment the input point cloud into 4 blocks: eyebrows, eyes, nose and mouth; segmenting the point cloud serves to extract the local geometric detail of the face better. The point cloud of each block is combined with the corresponding coarse feature points to form the input of the next stage, and fine PlainFPN networks with different parameters are designed for the 4 blocks to perform the fine alignment.
Fig. 6 shows the fine PlainFPN network designed for each block: (a) the nose block, (b) the eye block, (c) the eyebrow block, and (d) the mouth block. Different networks are needed because the geometric features of the blocks differ. The logical structures of the 4 networks in Fig. 6 are identical, corresponding to the nose, eye, eyebrow and mouth blocks respectively, but the parameters of the layers differ.
The results of the rough alignment of the crude PlainFPN compared to the fine alignment of CascadeFPN are shown in FIG. 7, where (a) is the result of the rough alignment of the crude PlainFPN and (b) is the result of the fine alignment of CascadeFPN. As can be seen from comparison, CascadeFPN fine alignment has obvious improvement on the characteristic point result of each block. For those areas of the eyebrows and eyes where geometric details of the surface are not very prominent, the topological connections formed by the precisely aligned feature points are more natural than the coarse alignment. For locations where the geometric details of the surface, such as the mouth and nose, are quite prominent, the precisely aligned feature points tend to converge toward the edges of the geometric discontinuities, resulting in more accurate results. Among them, the improvement of the lower edge of the nose and mouth is particularly significant.
In order to verify the effectiveness of the method, the facePointNet provided by the invention is compared with two methods in the current mainstream on an open human face three-dimensional model data set BU3DFE, and the execution efficiency of the algorithm is analyzed.
The BU3DFE data set contains 2,500 scan models of 100 subjects in total. For each subject, 1 neutral expression and 6 prescribed expressions were collected, each prescribed expression at 4 intensity levels, giving 25 scan models per subject, each containing roughly 35,000 vertices. The data set also provides 85 manually labelled feature points as ground truth, so the feature point detection algorithm of the invention can be tested on it and compared with existing methods. In the experiments, 2 of the 85 labelled feature points of BU3DFE were found to be mislabelled, so only the remaining 83 correct feature points are used. The 2,000 models of 80 subjects serve as the training set and the 500 models of the remaining 20 subjects as the test set.
FacePointNet was tested on a PC configured with a Xeon 3.40 GHz quad-core CPU, 16 GB of memory and an Nvidia GTX 1080 GPU.
First, data augmentation is applied to the training set: each sample is rotated by a random angle (the rotation distribution appears only as an image in the source record), scaled by a factor drawn from the Gaussian distribution N(1, 0.1²), and perturbed with coordinate jitter. In the PlainFPN structures shown in Figs. 4 and 6, each X-Conv layer resamples the input data of the previous layer; farthest point sampling is used because the face feature point detection task places higher demands on a uniform distribution of points. The model is then trained with the Adam algorithm, using the training parameters shown in Table 1.
TABLE 1 (training parameters; reproduced as an image in the original publication and not recoverable as text)
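The farthest point sampling mentioned above, used when each X-Conv layer resamples its input, can be sketched as follows. This is a minimal illustrative implementation, not the code of the invention; the function name is a placeholder, and the first point is chosen arbitrarily rather than at random.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Select n_samples points from an (N, 3) cloud so that each newly
    chosen point is the one farthest from all points chosen so far,
    which tends to produce a spatially uniform subset."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # Distance from every point to its nearest already-selected point.
    dist = np.full(n, np.inf)
    selected[0] = 0  # start from an arbitrary point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = int(np.argmax(dist))
    return points[selected]
```

Because already-selected points end up with distance 0, they are never chosen again, and the result spreads across the cloud more evenly than random sampling.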
The labeling of each feature point on the face model is shown in fig. 8, and table 2 below compares the results of the existing methods with the FacePointNet method of the present invention. Method 1 is the method proposed in the paper "Shape-based automatic detection of a large number of 3D facial landmarks"; method 2 is the method proposed in the paper "Fully automated and highly accurate dense correspondence for facial surfaces". In the table, Mean is the average error and SD is the standard deviation, both in mm; the smaller the value, the higher the detection accuracy.
TABLE 2 (per-feature-point error comparison; reproduced as an image in the original publication and not recoverable as text)
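The Mean and SD metrics reported in table 2 can be computed from predicted and ground-truth landmark coordinates as per-point Euclidean errors. A minimal sketch, assuming coordinates in mm and reading SD as the standard deviation of the per-point error; the function name is illustrative:

```python
import numpy as np

def landmark_error_stats(pred, gt):
    """Euclidean error per landmark between predicted and ground-truth
    3D feature points; returns (mean, standard deviation) in the same
    units as the input coordinates (mm for BU3DFE).

    pred, gt: arrays of shape (..., L, 3) with L landmarks each.
    """
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-landmark distances
    return float(errors.mean()), float(errors.std())
```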
The comparison shows that the coarse PlainFPN network designed by the invention obtains coarse alignment results that are more accurate than those of method 1 and close to the effect of method 2. With the CascadeFPN network provided by the invention, the average error of the three-dimensional feature points after fine alignment is only 1.77 mm, a large improvement over both method 2 and the coarse PlainFPN network: 38.54% and 50.56% on average, respectively.
Moreover, method 2 also uses texture data when solving for the feature points, whereas the FacePointNet of the invention uses only three-dimensional geometric data yet obtains better results. The experiments prove that the CascadeFPN, based on the block-cascading idea, has a remarkable effect on the fine alignment of the feature points, and that the FacePointNet provided by the invention is a state-of-the-art three-dimensional face feature point detection algorithm.
Under this experimental setting, a single feature point detection pass of the FacePointNet on the PC takes about 3.61 seconds, of which the coarse alignment by the coarse PlainFPN takes about 1.28 seconds and the fine alignment by the CascadeFPN takes about 2.33 seconds, indicating high efficiency.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (7)

1. A human face characteristic point real-time detection method based on three-dimensional reconstruction is characterized by comprising the following steps:
(1) acquiring a human face image frame, and reconstructing a human face geometric model in real time by using a geometric reconstruction algorithm;
(2) preprocessing a face geometric model, and extracting face point cloud data;
(3) inputting the point cloud data of the human face into a facePointNet consisting of a rough PlainFPN network and a block cascade network CascadeFPN, and detecting the three-dimensional feature points of the human face to obtain the three-dimensional feature points of the human face;
the rough PlainFPN network is used for detecting cheek edge feature points of a human face and coarse-precision internal feature points, and is divided into two modules; each module comprises 5 x-conv layers with average sampling point numbers of 1024, 384, 128 and 128, the x-conv layers are followed by an average pooling layer for extracting the average feature of each sampling point, the feature is then input into a multi-layer perceptron consisting of three fully-connected layers, and finally the three-dimensional coordinates of the human face feature points are output;
the CascadeFPN divides the point cloud data of the human face into a plurality of blocks according to the coarse-precision internal feature points and detects the fine-precision internal feature points of each block respectively, and specifically comprises: a point cloud segmentation module for segmenting the internal feature points of the human face, and a plurality of fine PlainFPN networks respectively corresponding to the segmented blocks; each fine PlainFPN network comprises 4 x-conv layers whose sampling strategies are respectively designed according to the size of each block and the point cloud scale, the x-conv layers are followed in turn by an average pooling layer and a multi-layer perceptron consisting of three fully-connected layers, and finally each fine PlainFPN network outputs the three-dimensional coordinates of the feature points of the corresponding block.
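The pooling-plus-perceptron head of claim 1 (average pooling over per-point features, followed by three fully-connected layers regressing 3D coordinates) can be sketched as below. The weight shapes, ReLU activations between layers, and function name are illustrative assumptions, not the trained network of the invention.

```python
import numpy as np

def regression_head(point_features, weights):
    """Average-pool per-point features from the x-conv stack, then pass
    the pooled vector through a three-layer fully-connected perceptron
    that outputs flattened (x, y, z) landmark coordinates.

    point_features: (num_points, feature_dim) array.
    weights: list of three (W, b) pairs; the last layer's output size
             must be 3 * number_of_landmarks.
    """
    x = point_features.mean(axis=0)  # average pooling over sample points
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)   # ReLU between fully-connected layers
    return x.reshape(-1, 3)          # one (x, y, z) per feature point
```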
2. The method for detecting the human face feature points based on the three-dimensional reconstruction as claimed in claim 1, wherein in the step (1), the collected human face image frames are continuous image frames aligned with the human face pose.
3. The method for detecting the human face feature points based on the three-dimensional reconstruction as claimed in claim 1, wherein in the step (1), the geometric reconstruction algorithm adopts a KinectFusion algorithm, and the specific reconstruction process is as follows:
(1-1) performing space region division operation on the face image, detecting two-dimensional characteristic points of the face image, and dividing a bounding box cube of the face by combining a depth frame;
and (1-2) running the KinectFusion algorithm in the obtained bounding box cube to perform real-time three-dimensional geometric reconstruction of the human face.
4. The method for detecting the human face feature points based on the three-dimensional reconstruction as claimed in claim 1, wherein in the step (2), the specific process of the preprocessing is as follows:
firstly, roughly aligning a geometric model of a human face to be detected to a coordinate space of a standard human face model; then, the nose tip is taken as the center of a sphere, the spherical surface with the radius of 80-100 mm is used for cutting the face, and the point cloud data of the face is obtained after cutting.
5. The method for detecting the human face characteristic points based on the three-dimensional reconstruction as claimed in claim 1, wherein the point cloud segmentation module divides the human face internal characteristic points into four blocks of nose, eyes, eyebrows and mouth; the sampling strategy of each block corresponding to the fine PlainFPN network is as follows:
the nose is divided into blocks, and the average sampling point number of each x-conv layer is respectively 738, 384, 128 and 128; in the eye and eyebrow blocks, the average sampling points of each x-conv layer are the same and are 400, 256, 128 and 128; in the mouth block, the average sampling points of each x-conv layer are 512, 256, 128 and 128.
6. The method for detecting the human face feature points based on the three-dimensional reconstruction as claimed in claim 1, wherein the specific process of obtaining the human face three-dimensional feature points in the step (3) is as follows:
(3-1) inputting the point cloud data into a rough PlainFPN network to obtain rough aligned feature points including cheek edge feature points and internal feature points;
(3-2) inputting the roughly aligned internal feature points into CascadeFPN, and performing segmentation operation on point cloud data to obtain four blocks of eyebrows, eyes, a nose and a mouth;
and (3-3) respectively inputting the point cloud data of each block and the corresponding internal feature points into a fine PlainFPN network corresponding to each block in the CascadeFPN network to obtain the precisely aligned internal feature points.
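Steps (3-1) to (3-3) of claim 6 can be sketched as a single driver function. Here `coarse_net`, `segmenter` and `fine_nets` are hypothetical callables standing in for the coarse PlainFPN, the point cloud segmentation module and the per-block fine PlainFPN networks; their signatures are assumptions for illustration only.

```python
def detect_feature_points(cloud, coarse_net, segmenter, fine_nets):
    """Coarse-to-fine feature point detection.

    (3-1) coarse_net returns cheek edge points and coarse inner points;
    (3-2) segmenter splits the cloud into named blocks (eyebrows, eyes,
          nose, mouth) using the coarse inner points;
    (3-3) each block's fine network refines its own feature points.
    """
    cheek_pts, coarse_inner = coarse_net(cloud)          # (3-1)
    blocks = segmenter(cloud, coarse_inner)              # (3-2): name -> sub-cloud
    fine_inner = {name: fine_nets[name](sub_cloud)       # (3-3)
                  for name, sub_cloud in blocks.items()}
    return cheek_pts, fine_inner
```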
7. A human face characteristic point real-time detection system based on three-dimensional reconstruction comprises a mobile terminal with a camera, and is characterized in that the mobile terminal further comprises:
the geometric reconstruction module is used for constructing a human face three-dimensional geometric model in real time by using human face image frames acquired by the camera;
the preprocessing module is used for aligning and cutting the three-dimensional geometric model of the human face and outputting point cloud data of the human face;
the three-dimensional characteristic point detection module is used for detecting three-dimensional characteristic points of the face point cloud data to obtain three-dimensional characteristic points of the face;
the three-dimensional characteristic point detection module comprises a facePointNet consisting of a coarse PlainFPN network and a block cascade network CascadeFPN; the coarse PlainFPN network is used for detecting cheek edge characteristic points of a human face and coarse-precision internal characteristic points; the system is specifically divided into two modules, each module comprises 5 x-conv layers, the average sampling point number of each layer is 1024, 384, 128 and 128, an average pooling layer is connected behind the x-conv layers and used for extracting the average characteristic of each sampling point, then the characteristic is input into a multilayer perceptron consisting of three fully-connected layers, and finally the three-dimensional coordinates of human face characteristic points are output;
the CascadeFPN comprises: the system comprises a point cloud segmentation module for segmenting feature points in the human face and a plurality of fine PlainFPN networks respectively corresponding to each segmented block after segmentation; each fine PlainFPN network comprises 4 layers of x-conv layers, sampling strategies are respectively designed according to the size of each block and the point cloud scale, an average pooling layer and a multi-layer sensing machine consisting of three fully-connected layers are sequentially connected behind the x-conv layers, and finally, each fine PlainFPN network outputs three-dimensional coordinates of corresponding block feature points.
CN201910057766.6A 2019-01-22 2019-01-22 Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction Active CN109766866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910057766.6A CN109766866B (en) 2019-01-22 2019-01-22 Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction


Publications (2)

Publication Number Publication Date
CN109766866A CN109766866A (en) 2019-05-17
CN109766866B true CN109766866B (en) 2020-09-18

Family

ID=66455081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910057766.6A Active CN109766866B (en) 2019-01-22 2019-01-22 Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction

Country Status (1)

Country Link
CN (1) CN109766866B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008927B (en) * 2019-08-07 2023-10-31 深圳华侨城文化旅游科技集团有限公司 Face replacement method, storage medium and terminal equipment
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet
CN110533105B (en) * 2019-08-30 2022-04-05 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN112861579B (en) * 2019-11-27 2022-10-18 四川大学 Automatic detection method for three-dimensional facial markers
CN112069923A (en) * 2020-08-18 2020-12-11 东莞正扬电子机械有限公司 3D face point cloud reconstruction method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103456010A (en) * 2013-09-02 2013-12-18 电子科技大学 Human face cartoon generation method based on feature point localization
CN103577815A (en) * 2013-11-29 2014-02-12 中国科学院计算技术研究所 Face alignment method and system
CN104978550A (en) * 2014-04-08 2015-10-14 上海骏聿数码科技有限公司 Face recognition method and system based on large-scale face database
CN106951840A (en) * 2017-03-09 2017-07-14 北京工业大学 A kind of facial feature points detection method
CN108564619A (en) * 2018-04-25 2018-09-21 厦门大学 A kind of sense of reality three-dimensional facial reconstruction method based on two photos

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20160148411A1 (en) * 2014-08-25 2016-05-26 Right Foot Llc Method of making a personalized animatable mesh
US10515259B2 (en) * 2015-02-26 2019-12-24 Mitsubishi Electric Research Laboratories, Inc. Method and system for determining 3D object poses and landmark points using surface patches


Also Published As

Publication number Publication date
CN109766866A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN109766866B (en) Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
Wang et al. Hf-neus: Improved surface reconstruction using high-frequency details
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
CN111243093A (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN111951384B (en) Three-dimensional face reconstruction method and system based on single face picture
CN112950775A (en) Three-dimensional face model reconstruction method and system based on self-supervision learning
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN111951381B (en) Three-dimensional face reconstruction system based on single face picture
Shang et al. Real-time object recognition in sparse range images using error surface embedding
Liu et al. Facial expression recognition using pose-guided face alignment and discriminative features based on deep learning
CN109034131A (en) A kind of semi-automatic face key point mask method and storage medium
Chen et al. Autosweep: Recovering 3d editable objects from a single photograph
CN114913552B (en) Three-dimensional human body density corresponding estimation method based on single-view-point cloud sequence
Liu et al. A novel rock-mass point cloud registration method based on feature line extraction and feature point matching
CN113593001A (en) Target object three-dimensional reconstruction method and device, computer equipment and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
Hu et al. Geometric feature enhanced line segment extraction from large-scale point clouds with hierarchical topological optimization
Liu et al. Coarse registration of point clouds with low overlap rate on feature regions
Li et al. Advances in 3d generation: A survey
Guo et al. Line-based 3d building abstraction and polygonal surface reconstruction from images
Peng et al. RGB-D human matting: A real-world benchmark dataset and a baseline method
CN101510317A (en) Method and apparatus for generating three-dimensional cartoon human face
Labatut et al. Hierarchical shape-based surface reconstruction for dense multi-view stereo
Lee et al. ELF-Nets: deep learning on point clouds using extended laplacian filter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant