CN111523345B - Real-time human face tracking system and method - Google Patents

Real-time human face tracking system and method

Info

Publication number
CN111523345B
CN111523345B (application CN201910103409.9A)
Authority
CN
China
Prior art keywords
face
feature point
order
tracking
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910103409.9A
Other languages
Chinese (zh)
Other versions
CN111523345A (en)
Inventor
陈英时
耿敢超
左建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kankan Intelligent Technology Co ltd
Original Assignee
Shanghai Kankan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kankan Intelligent Technology Co ltd filed Critical Shanghai Kankan Intelligent Technology Co ltd
Priority to CN201910103409.9A priority Critical patent/CN111523345B/en
Publication of CN111523345A publication Critical patent/CN111523345A/en
Application granted granted Critical
Publication of CN111523345B publication Critical patent/CN111523345B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time face tracking system and method. The face detection module calls different face detection models on each frame of a video, searches for faces, records the position of each face found, and sends the positions to the face feature point positioning module; the face feature point positioning module locates the coordinates of the second-order feature points of the face and corrects the coordinates of each feature point; the face tracking module tracks the face through the video to obtain its continuous spatial pose. The system and method improve the accuracy, processing speed, and stability of real-time face tracking.

Description

Real-time human face tracking system and method
Technical Field
The invention belongs to the technical field of face tracking and relates to a real-time face tracking system; in particular, it relates to a real-time face tracking system and method based on second-order functional gradients.
Background
Face tracking refers to determining the motion trajectory of a face in a continuous video sequence. It is widely studied in fields such as computer vision and artificial intelligence, and is widely applied in video surveillance, robotics, human-computer interaction, and elsewhere. With the explosive growth of mobile devices, more and more applications require face tracking, such as mobile payment, virtual makeup, and beauty selfies. Traditional face tracking is difficult to apply in these situations, and extensive research combining the latest algorithms is required.
When applied to mobile devices, traditional face tracking algorithms have two main problems: 1) high computational cost, which makes them difficult to port directly to mobile devices. Mobile devices such as phones have weak computing power and little memory, while traditional high-precision models are computation- and memory-intensive; once the model is simplified, precision inevitably drops and accurate tracking becomes difficult. 2) Poor robustness: for large side-angle faces, partial occlusion, and similar situations, localization is biased and tracking fails.
Early face tracking required modeling of the face. Shape modeling methods include the deformable template (Deformable Template), the point distribution model (Active Shape Model), graph models, and so on. Appearance modeling methods comprise global and local appearance modeling: global methods include the Active Appearance Model (a generative model) and the Boosted Appearance Model (a discriminative model), while local appearance modeling models the appearance of local regions and includes color models, projection models, side-profile models, and the like. Modeling-based methods are often limited by the model itself and have low accuracy; the mostly simple models struggle to express the difficult factors met in practice, including illumination, occlusion, and variable poses.
Recently, cascaded shape regression (cascade shape regression) has made major breakthroughs in both accuracy and speed. The method uses a regression model to directly learn the mapping from face appearance to face shape (or to the parameters of a face shape model), thereby establishing the correspondence from appearance to shape. It needs no complex shape or appearance modeling and is easy to apply. Many comparative tests have shown that it is particularly suitable for uncontrolled, uncooperative scenarios, which are the main application scenarios of mobile devices. In addition, face alignment methods based on deep learning have also achieved remarkable results; combining deep learning with the shape regression framework can further improve the accuracy of the localization model, and this has become one of the mainstream approaches to feature localization. However, because deep learning models are huge (often containing tens of millions of variables), they are not suitable for mobile devices and are not discussed further below.
In view of this, there is an urgent need to design a face tracking method so as to overcome the above-mentioned drawbacks of the existing face tracking method.
Disclosure of Invention
The invention provides a real-time face tracking system and method, which can improve the accuracy, processing speed, and stability of real-time face tracking.
In order to solve the technical problems, according to one aspect of the present invention, the following technical scheme is adopted:
A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module, which calls different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module, which locates the coordinates (x_i, y_i) of the second-order feature points of the face. The module generates a sequence of decision trees; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls, |I_j| is the number of feature points on that leaf node, and g_k^x, g_k^y are the first derivatives of the loss function with respect to the x and y coordinates. The coordinates of each feature point are then corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
the face tracking module, which tracks the face through the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
As an embodiment of the present invention, the face detection module considers that a face actually appears at a position only if the reliability there is > 0.95.
As one embodiment of the present invention, the face detection module invokes at least one face detection model for each frame of the video; the models traverse each location of the image and give a confidence for each possible face position. The combined confidence of these per-model confidences is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
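As a minimal sketch of this fusion rule (the detector interface and the weight values are illustrative assumptions; only the weighted sum and the 0.95 threshold come from the description above):

```python
# Hedged sketch: fuses per-model confidences for one candidate position.
# Only the weighted sum R = w1*R1 + w2*R2 + ... and the 0.95 threshold
# are from the patent text; everything else is illustrative.

def combined_confidence(confidences, weights):
    """R = w1*R1 + w2*R2 + ... for one candidate face position."""
    return sum(w * r for w, r in zip(weights, confidences))

def face_present(confidences, weights, threshold=0.95):
    """A face is accepted only when the combined confidence exceeds 0.95."""
    return combined_confidence(confidences, weights) > threshold

# Example: two detectors, the second trusted slightly more.
print(face_present([0.97, 0.99], [0.45, 0.55]))  # True  (R = 0.981)
```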
As an implementation of the invention, the face feature point positioning module calls a second-order feature point positioning algorithm on the region where each face is located;
B1, initializing the coordinates (x_i, y_i) of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., a number of faces are annotated as samples and, for each feature point, the sample mean is taken;
B2, defining the second-order functional and solving it by optimization:
a decision tree with T leaf nodes is constructed; the input is the current coordinates of the feature points, and the goal is to decrease the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which sample i falls and ω_j stands for the correction (dx_j or dy_j) that is the value of the optimal solution at that leaf node; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point. With the optimal solution (dx_j, dy_j) at each leaf, the extremum problem of the second-order functional obj is converted into the computation of the optimal values of the T leaf nodes;
B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from the minimum of the second-order functional. For leaf node T_j containing |I_j| samples it is defined as follows:

$$dx_j = -\frac{\sum_{i \in I_j} g_i^x}{\sum_{i \in I_j} h_i^x}, \qquad dy_j = -\frac{\sum_{i \in I_j} g_i^y}{\sum_{i \in I_j} h_i^y}$$
B4, correcting the coordinates of each feature point: if the leaf node on which sample i falls is T_j, then x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, where η is a set constant (0.01 in one embodiment);
B5, decision trees continue to be generated; if the total corrections between two consecutive decision trees (Σ|dx| and Σ|dy|) are smaller than a set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is repeated.
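A compact sketch of steps B1-B5 follows; it is a hedged outline, not the patent's code. The tree construction is abstracted behind a fit_decision_tree callable (assumed to return per-point corrections whose leaf values minimize the second-order functional), and only η = 0.01 and the Σ|dx|, Σ|dy| convergence test come from the text:

```python
import numpy as np

def locate_feature_points(mean_shape, fit_decision_tree, grad_fn,
                          eta=0.01, tol=1e-3, max_trees=500):
    """Steps B1-B5: start from the average face and repeatedly apply
    decision-tree corrections along the optimized gradient.

    mean_shape:        (N, 2) initial coordinates from the average face (B1)
    fit_decision_tree: builds one tree; given the shape and the loss
                       derivatives it returns per-point (dx, dy) whose
                       leaf values minimize the second-order functional (B2-B3)
    grad_fn:           returns first/second derivatives (g, h) of the loss
                       at the current shape
    """
    shape = mean_shape.copy()                    # B1: initialize (x_i, y_i)
    for _ in range(max_trees):                   # B5: keep growing trees
        g, h = grad_fn(shape)
        dx, dy = fit_decision_tree(shape, g, h)  # B2-B3: optimal leaf values
        shape[:, 0] += eta * dx                  # B4: x_i += eta * dx_i
        shape[:, 1] += eta * dy                  # B4: y_i += eta * dy_i
        if np.abs(dx).sum() < tol and np.abs(dy).sum() < tol:
            break                                # B5: corrections small enough
    return shape
```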
As an implementation of the invention, the face tracking module computes the spatial pose of the face from its second-order feature points; the second-order feature points determine the key positions of the face. The face tracking module judges whether the eyes blink from the second-order feature points at the eyes, and whether the mouth opens from the second-order feature points at the mouth; it judges the left-right and up-down pose of the face from the second-order feature points of the face;
and the face tracking module analyzes the feature point sequence to obtain the motion trajectory of the face.
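The patent does not state how blinking is decided from the eye feature points; one plausible realization (our assumption, not the patent's stated method) is an eye-aspect-ratio test over the located landmarks:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks around one eye, ordered as in the common
    68-point annotation; a small ratio means the eyelid is closed."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def is_blinking(eye, threshold=0.2):
    # threshold is an illustrative value, to be tuned on real data
    return eye_aspect_ratio(eye) < threshold
```

A mouth-open test can be built the same way from the mouth landmarks.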
A real-time face tracking system, comprising:
the video frame image acquisition module, which acquires each frame image of the video;
the face detection module, which calls different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module, which locates the coordinates of the second-order feature points of the face;
the face tracking module, which tracks the face through the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
The real-time face tracking method comprises the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step: calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step: locating the coordinates (x_i, y_i) of the second-order feature points of the face. A sequence of decision trees is generated; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are improved along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node. The coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;
a face tracking step: tracking the face in the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
In the face detection step, different face detection models are called on each frame of the video and traverse each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In the face feature point positioning step, a second-order feature point positioning flow is called on the region where each face is located:
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
The real-time face tracking method comprises the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step: calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step: locating the coordinates of the second-order feature points of the face;
a face tracking step: tracking the face in the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
As one embodiment of the invention, a new second-order functional gradient boosting algorithm (gradient boosting on second functional target) is employed to enable real-time face tracking on mobile devices. The specific scheme is as follows:
second order functional gradient
For a data set of N samples and M features, the gradient boosting (Gradient Boosting) algorithm is trained to obtain a series of additive functions {f_1, f_2, f_3, …} that predict the output:

$$\hat{y}_i = \sum_{t} f_t(x_i)$$
during the training process, the target value y of each sample is known i Predicted value
Figure BDA0001966164310000062
And a target value y i The difference between them is represented by a loss function->
Figure BDA0001966164310000063
To characterize. The training process of gradient lifting is thatThe loss is reduced along the current gradient direction. On the basis of the previous step (t-1), the functional definition of the step (t) is as follows:
Figure BDA0001966164310000064
let decision tree q handle each sample x i Mapping to leaf node T j I.e. q (x i ) The value of the leaf node is ω =j j Then (1) can be simplified as follows:
Figure BDA0001966164310000065
Omitting the higher-order remainder, the functional (2) expands by a second-order Taylor series as:

$$\mathrm{obj}^{(t)} \approx \sum_{i=1}^{N} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i\,\omega_{q(x_i)} + \frac{1}{2} h_i\,\omega_{q(x_i)}^2 \Big] \tag{3}$$

where g_i and h_i are respectively the first and second derivatives of the loss function at ŷ_i^{(t-1)}, i.e.

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big)$$

Since l(y_i, ŷ_i^{(t-1)}) is a constant, the second-order functional (3) further simplifies to:

$$\mathrm{obj}^{(t)} = \sum_{i=1}^{N} \Big[ g_i\,\omega_{q(x_i)} + \frac{1}{2} h_i\,\omega_{q(x_i)}^2 \Big] \tag{4}$$
Setting the mapping corresponding to the decision tree as I_j = {i | q(x_i) = j}, the functional expands over leaf nodes as:

$$\mathrm{obj}^{(t)} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big] \tag{5}$$

Taking the extremum of the above at each leaf node:

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} \tag{6}$$

The corresponding extremal value is:

$$\mathrm{obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i} \tag{7}$$
If the loss function is defined in second-order (squared) form,

$$l(y_i, \hat{y}_i) = \frac{1}{2}\big(y_i - \hat{y}_i\big)^2$$

then h_i = 1, so the value ω_j of each leaf node is in fact the mean of the g_i, i.e.

$$\omega_j = -\frac{1}{|I_j|} \sum_{i \in I_j} g_i \tag{8}$$
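A small numeric sketch of equations (6) and (8) under the squared loss (the variable names are ours):

```python
import numpy as np

def leaf_values(g, h, leaf_index, n_leaves):
    """Optimal leaf value w_j = -sum(g_i) / sum(h_i) over I_j (eq. 6)."""
    w = np.zeros(n_leaves)
    for j in range(n_leaves):
        mask = leaf_index == j            # I_j = {i | q(x_i) = j}
        w[j] = -g[mask].sum() / h[mask].sum()
    return w

# Squared loss l = 0.5*(y - y_hat)^2 gives g_i = y_hat_i - y_i and h_i = 1,
# so each w_j is just the negative mean residual of its leaf (eq. 8).
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.5, 2.0, 3.0])
leaf = np.array([0, 0, 1, 1])             # q(x_i): leaf index of each sample
print(leaf_values(y_hat - y, np.ones(4), leaf, 2))   # [-0.5  1. ]
```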
Gradient boosting algorithm based on a sparse random forest.
Combining multiple decision trees q and introducing random selection improves accuracy and generalization, forming a random forest. Multi-stage regression based on random forests is currently the dominant algorithm. It can rapidly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. Feature point localization is accurate and fast: hundreds of frames per second can be processed on a PC. Mobile devices have weaker computing power, but can still handle some twenty frames per second, enough for real-time tracking.
But the standard model requires hundreds of megabytes of storage, and simple simplification (e.g., reducing the number of trees or the number of regression stages) only reduces accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector of each node, often only a few components have large values, while the other components are unimportant. Therefore a sparse representation is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the shrinking of the model also improves speed.
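A sketch of this sparsification (thresholded index-value storage is our assumed realization; the patent states only that a sparse representation of each node's feature vector yields the compression):

```python
import numpy as np

def sparsify(vec, keep=5):
    """Keep the few largest-magnitude components of a node's feature vector,
    stored as (index, value) pairs instead of a dense array."""
    idx = np.argsort(np.abs(vec))[-keep:]            # dominant components
    return idx.astype(np.int32), vec[idx].astype(np.float32)

def densify(idx, vals, dim):
    """Rebuild the dense vector when the node is evaluated."""
    out = np.zeros(dim, dtype=np.float32)
    out[idx] = vals
    return out

vec = np.random.randn(512) * (np.random.rand(512) < 0.02)   # mostly zeros
idx, vals = sparsify(vec)
# per-node storage drops from 512 floats to ~5 (index, value) pairs
```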
The beneficial effects of the invention: the real-time face tracking system and method can improve the accuracy, processing speed, and stability of real-time face tracking.
On top of the regression model, the invention adopts an innovative second-order functional gradient boosting algorithm (gradient boosting on second functional target) to realize real-time face tracking on mobile devices. The invention adopts a sparse random-forest regression algorithm to achieve a processing speed of 20 frames per second on an ordinary mobile phone, reaching real-time tracking. The invention uses a continuous manifold model that treats the moving face sequence as a continuous manifold transformation; it is highly robust and can accurately locate the face under difficult conditions such as large side angles and partial occlusion.
The invention has the following advantages: 1) more accurate: the second-order functional gradient rests on a rigorous mathematical foundation, which guarantees the accuracy of the whole algorithm; 2) faster: the sparse random-forest regression algorithm achieves a processing speed of 20 frames per second on an ordinary mobile phone, reaching real-time tracking; 3) more stable: the continuous manifold model treats the moving face sequence as a continuous manifold transformation, is highly robust, and can accurately locate the face under difficult conditions such as large side angles and partial occlusion.
Drawings
Fig. 1 is a schematic diagram of a real-time face tracking system according to an embodiment of the invention.
Fig. 2 is a flowchart of a face real-time tracking method according to an embodiment of the invention.
Fig. 3 is a schematic diagram of face detection performed by a conventional multi-level regression algorithm based on random forests.
Fig. 4 is a schematic diagram of a conventional feature vector detection based on a random forest model.
Fig. 5 is a schematic diagram of the conventional face detection effect (including a front face and a side face).
FIG. 6 is a schematic diagram of face detection using the continuous manifold model of the present invention.
Fig. 7 is a schematic diagram of face detection performed by the conventional 68-point labeling model.
Fig. 8 is a schematic diagram of the secondary adaptive sampling of feature points in the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the present invention, preferred embodiments of the invention are described below in conjunction with the examples, but it should be understood that these descriptions are merely intended to illustrate further features and advantages of the invention, and are not limiting of the claims of the invention.
The description of this section is intended to be illustrative of only a few exemplary embodiments and the invention is not to be limited in scope by the description of the embodiments. It is also within the scope of the description and claims of the invention to interchange some of the technical features of the embodiments with other technical features of the same or similar prior art.
The invention discloses a real-time human face tracking system, and FIG. 1 is a schematic diagram of the components of the real-time human face tracking system in an embodiment of the invention; referring to fig. 1, in an embodiment of the present invention, the face real-time tracking system includes: the system comprises a video frame image acquisition module 1, a face detection module 2, a face feature point positioning module 3 and a face tracking module 4.
The video frame image acquisition module 1 is used for acquiring each frame image of a video.
The face detection module 2 calls different face detection models on each frame of image to find whether a face appears; the models traverse each position of the image, judge whether a face appears there, and return a reliability. The module combines several models, which further improves the reliability of face recognition. For a given position, a face is considered to actually appear only if the reliability is higher than a set value; the position of the face is then recorded and sent to the face feature point positioning module.
The face feature point positioning module 3 locates the coordinates (x_i, y_i) of the second-order feature points of the face. In an embodiment of the present invention, the module generates a sequence of decision trees based on a gradient boosting algorithm, with the goal of decreasing the value of the second-order functional; the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node. The coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant.
The face tracking module 4 tracks faces through the video to obtain their continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
In an embodiment of the present invention, the face detection module considers that a face actually appears at a position only if the reliability there is > 0.95.
In an embodiment of the present invention, the face detection module invokes different face detection models (such as MTCNN, YOLOv3, etc.) for each frame of the video and traverses each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In an embodiment of the present invention, the face feature point positioning module invokes a second-order feature point positioning procedure for an area where each face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant; in one embodiment of the present invention, the constant η is 0.01;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold (such as one thousandth of the coordinate value), the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is repeated.
In an embodiment of the invention, the face tracking module computes the spatial pose of the face from its second-order feature points; the second-order feature points determine the key positions of the face. The face tracking module judges whether the eyes blink from the second-order feature points at the eyes, and whether the mouth opens from the second-order feature points at the mouth; it judges the left-right and up-down pose of the face from the second-order feature points of the face;
and the face tracking module analyzes the feature point sequence to obtain the motion trajectory of the face.
The invention also discloses a face real-time tracking method, and FIG. 2 is a flow chart of the face real-time tracking method in an embodiment of the invention; referring to fig. 2, in an embodiment of the present invention, the face real-time tracking method includes:
step S1, acquiring each frame image of a video;
step S2, a face detection step, namely calling different face detection models for each frame of image to find out whether a face appears; the face detection models can traverse each position of the image, judge whether the face appears at the position or not, and return a reliability. The module synthesizes a plurality of models, so that the reliability of face recognition is further improved; for a certain position, only if the reliability is higher than a set value, the face is considered to appear indeed; recording the position of the face, and sending the corresponding position to a face feature point positioning module;
step S3, a step of locating feature points of the face, wherein coordinates (x i ,y i ). In one embodiment of the invention, a plurality of decision trees are generated step by step based on a gradient lifting algorithm, and the current coordinates of the input second-order feature points of each decision tree are aimed at reducing the value of the second-order functional; modifying the coordinates (x) along the direction of the optimized gradient i ,y i ) Is a value of (2); each feature point (x i ,y i ) Gradient of (2)
Figure BDA0001966164310000102
Wherein I is j Is the leaf node where the feature point is located, |I j I is the number of all feature points on the leaf node; the coordinates of each feature point are corrected as follows: x is x i =x i +ηdx i ,y i =y i +ηdx i
Step S4, tracking the human face in the video to obtain continuous spatial gestures; for each frame of video, the second-order characteristic points of the face can be accurately positioned, and the sequence formed by the characteristic points represents the motion trail of the face; based on the characteristic points, judging the aerial gesture of the face; the change of the characteristic points among different frames reflects various changes and movements of the human face.
In an embodiment of the present invention, in the face detection step, different face detection models are called on each frame of the video and traverse each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In an embodiment of the present invention, in the step of locating the feature points of the faces, a second-order feature point locating procedure is invoked for each area where the face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$

step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold (e.g., one thousandth of the coordinate value), the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
In one embodiment of the invention, the real-time face tracking system and method adopt a sparse random-forest regression algorithm. Multi-stage regression based on random forests is the current mainstream algorithm, shown in fig. 3. The algorithm can rapidly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. Feature point localization is accurate and fast: hundreds of frames per second can be processed on a PC. Mobile devices have weaker computing power, but can still handle some twenty frames per second, enough for real-time tracking. This, however, rests on a trained big-data model that requires hundreds of megabytes of storage, and simple simplification (e.g., reducing the number of trees or the number of regression stages) only reduces accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector of each node, often only a few components have large values, while the other components are unimportant. Therefore a sparse representation is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the shrinking of the model also improves speed.
In one embodiment of the invention, the real-time face tracking system and method use a continuous manifold model. Frontal face feature points are located well, but as the deflection angle of the face grows, until only half the face is visible, many feature points disappear or their corresponding positions become unknown (as shown in fig. 5). These vanished feature points cannot be located and instead act as interference. In one embodiment of the invention, a continuous manifold transformation model addresses this problem by treating the moving face sequence as lying in a continuous manifold space. Fig. 6 is a schematic diagram of the recognition of real-time face tracking in an embodiment of the present invention; as shown in fig. 6, the reference pose of the face is determined by estimating these spatial transformations. From the pose it is then determined which feature points are visible, and the vanished feature points are no longer used for localization. This scheme is highly robust and is expected to locate the face accurately under difficult conditions such as large side angles and partial occlusion.
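The manifold scheme is described only at this level of detail; a toy sketch of the visibility step (the yaw-based rule and its cutoff are our assumptions) might look like:

```python
def visible_landmarks(landmarks, yaw_deg, left_ids, right_ids, cutoff=40.0):
    """Drop feature points on the far side of a strongly turned face so the
    vanished points no longer act as interference during localization."""
    if yaw_deg > cutoff:          # face turned left: right side hidden
        hidden = set(right_ids)
    elif yaw_deg < -cutoff:       # face turned right: left side hidden
        hidden = set(left_ids)
    else:
        hidden = set()
    return [p for i, p in enumerate(landmarks) if i not in hidden]
```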
In one embodiment of the invention, the real-time face tracking system and method use secondary adaptive sampling of feature points. Conventional methods are based on fixed face feature points, i.e., the number and positions of the feature points are fixed; fig. 7 shows the 68-point standard model. Here the feature points are automatically densified on top of the standard feature points, as shown in fig. 8 (see the sketch after this paragraph). The various constraint relationships between the densified points and the reference points further improve positioning accuracy; moreover, because the points are densified adaptively around the reference points, the extra computation is small and real-time tracking is still achieved.
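A minimal sketch of the densification idea (midpoint insertion between chosen reference points is our assumed rule; the patent states only that extra points are added adaptively around the reference points):

```python
import numpy as np

def densify_landmarks(points, pairs):
    """Insert the midpoint of each chosen reference-point pair as an extra,
    constrained feature point."""
    extra = [(points[a] + points[b]) / 2.0 for a, b in pairs]
    return np.vstack([points, np.asarray(extra)])

# e.g. add points halfway along a few segments of a 68-point shape;
# the index pairs below are purely illustrative.
pairs = [(36, 39), (42, 45), (48, 54)]
# dense_shape = densify_landmarks(shape_68x2, pairs)
```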
In summary, the real-time face tracking system and method provided by the invention can improve the accuracy, processing speed, and stability of real-time face tracking.
The description and applications of the present invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the embodiments disclosed herein are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other assemblies, materials, and components, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (8)

1. A real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module is used for locating the coordinates (x_i, y_i) of the second-order feature points of the face; the face feature point positioning module gradually generates a plurality of decision trees, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional; the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient, the gradient of each feature point (x_i, y_i) being

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

wherein I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node; the coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, wherein η is a set constant;
the face tracking module is used for tracking the face in the video to obtain its continuous spatial pose; for each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged; the change of the feature points between frames reflects the various changes and movements of the face.
2. The face real-time tracking system of claim 1, wherein:
the face detection module considers that a face actually appears at a certain position only if the reliability there is greater than 0.95.
3. The face real-time tracking system of claim 1, wherein:
the face detection module invokes at least one face detection model for each frame of the video, the models traversing each position of the image and giving a confidence for each possible face position; the combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

wherein R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients; if the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
4. The face real-time tracking system of claim 1, wherein:
the face feature point positioning module calls a second-order feature point positioning algorithm for the region where each face is located;
B1, initializing the coordinates (x_i, y_i) of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., a number of faces are annotated as samples and, for each feature point, the sample mean is taken;
b2, defining a second-order functional and optimizing and solving:
constructing a decision tree with T leaf nodes; the input is the current coordinates of the feature points, and the goal is to decrease the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

wherein I_j is the leaf node on which sample i falls and ω_j stands for the correction (dx_j, dy_j) that is the value of the optimal solution at that leaf node; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point; with the optimal solution (dx_j, dy_j) at each leaf, the extremum problem of the second-order functional obj is converted into the computation of the optimal values of the T leaf nodes;
B3, the functional takes its minimum, and the coordinate correction of each feature point follows; namely, the correction of the second-order feature points comes from the minimum of the second-order functional; for leaf node T_j containing |I_j| samples it is defined as follows:

$$dx_j = -\frac{\sum_{i \in I_j} g_i^x}{\sum_{i \in I_j} h_i^x}, \qquad dy_j = -\frac{\sum_{i \in I_j} g_i^y}{\sum_{i \in I_j} h_i^y}$$
and B4, correcting the coordinates of each feature point:
if the leaf node on which sample i falls is T_j, then x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, wherein η is a set constant;
and B5, continuing to generate decision trees; if the total corrections between two consecutive decision trees (Σ|dx| and Σ|dy|) are smaller than a set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is continued.
5. The face real-time tracking system of claim 1, wherein:
the face tracking module is used for calculating the spatial pose of the face according to the second-order feature points of the face; the second-order feature points are used for determining the key positions of the face; the face tracking module judges whether the eyes blink according to the second-order feature points at the eyes, and whether the mouth opens according to the second-order feature points at the mouth; the face tracking module judges the left-right and up-down pose of the face according to the second-order feature points of the face;
and the face tracking module is used for analyzing the characteristic point sequence to obtain a motion track of the face.
6. A real-time face tracking method, characterized by comprising the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step of calling different face detection models for each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step of locating the coordinates (x_i, y_i) of the second-order feature points of the face; a plurality of decision trees are gradually generated, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are improved along the direction of the optimized gradient, the gradient of each feature point (x_i, y_i) being

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

wherein I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node; the coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;
a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged; the change of the feature points between frames reflects the various changes and movements of the face.
7. The face real-time tracking method according to claim 6, wherein:
in the face detection step, for each frame of the video, different face detection models are called and traverse each position of the image, giving a confidence for each possible face position; the combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

wherein R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients; if the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
8. The face real-time tracking method according to claim 6, wherein:
in the step of locating the face feature points, a second-order feature point locating flow is called for the area where each face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

wherein I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
and B4, correcting the coordinates of each feature point:
x_i = x_i + η·dx_i, y_i = y_i + η·dy_i; wherein η is a set constant;
step B5, continuing to generate decision trees; if the corrections (dx_i, dy_i) are smaller than a set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
CN201910103409.9A 2019-02-01 2019-02-01 Real-time human face tracking system and method Active CN111523345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Publications (2)

Publication Number Publication Date
CN111523345A CN111523345A (en) 2020-08-11
CN111523345B true CN111523345B (en) 2023-06-23

Family

ID=71899996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910103409.9A Active CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Country Status (1)

Country Link
CN (1) CN111523345B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140864B (en) * 2022-01-29 2022-07-05 深圳市中讯网联科技有限公司 Trajectory tracking method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344922A (en) * 2008-08-27 2009-01-14 华为技术有限公司 Human face detection method and device
CN103310204A (en) * 2013-06-28 2013-09-18 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
CN104182718A (en) * 2013-05-21 2014-12-03 腾讯科技(深圳)有限公司 Human face feature point positioning method and device thereof
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344922A (en) * 2008-08-27 2009-01-14 华为技术有限公司 Human face detection method and device
CN104182718A (en) * 2013-05-21 2014-12-03 腾讯科技(深圳)有限公司 Human face feature point positioning method and device thereof
CN103310204A (en) * 2013-06-28 2013-09-18 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhan Jiangtao; Liu Qiang; Chai Chunlei. Face feature point tracking method based on a three-dimensional model and Gabor wavelets. Journal of Zhejiang University (Engineering Science), 2011, (01). *

Also Published As

Publication number Publication date
CN111523345A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
Wang et al. Hidden‐Markov‐models‐based dynamic hand gesture recognition
US9299161B2 (en) Method and device for head tracking and computer-readable recording medium
CN107146237B (en) Target tracking method based on online state learning and estimation
CN111127519B (en) Dual-model fusion target tracking control system and method thereof
Feng et al. Kalman filter for spatial-temporal regularized correlation filters
CN111402303A (en) Target tracking architecture based on KFSTRCF
Ma et al. Correlation filters based on multi-expert and game theory for visual object tracking
Baik et al. Learning to forget for meta-learning via task-and-layer-wise attenuation
CN112258557A (en) Visual tracking method based on space attention feature aggregation
CN110378932B (en) Correlation filtering visual tracking method based on spatial regularization correction
CN111523345B (en) Real-time human face tracking system and method
Xing et al. NoisyOTNet: A robust real-time vehicle tracking model for traffic surveillance
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
CN108469729B (en) Human body target identification and following method based on RGB-D information
Gu et al. Vtst: Efficient visual tracking with a stereoscopic transformer
Ikram et al. Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture
Leng et al. Stable hand pose estimation under tremor via graph neural network
CN113129332A (en) Method and apparatus for performing target object tracking
CN107194947B (en) Target tracking method with self-adaptive self-correction function
Ke An efficient and accurate DDPG-based recurrent attention model for object localization
Deng et al. Supervised learning based online filters for targets tracking using radar measurements
Huang et al. Methods on visual positioning based on basketball shooting direction standardisation
JP2013152595A (en) Information processing apparatus and method, and program
Zhang et al. A fast and robust maneuvering target tracking method without Markov assumption
CN104751448A (en) Online video tracking method based on PCA (Principal Component Analysis) and noise separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant