CN111523345B - Real-time human face tracking system and method - Google Patents

Real-time human face tracking system and method

Info

Publication number
CN111523345B
CN111523345B (application CN201910103409.9A)
Authority
CN
China
Prior art keywords
face
feature point
order
tracking
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910103409.9A
Other languages
Chinese (zh)
Other versions
CN111523345A (en)
Inventor
陈英时
耿敢超
左建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kankan Intelligent Technology Co ltd
Original Assignee
Shanghai Kankan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kankan Intelligent Technology Co ltd filed Critical Shanghai Kankan Intelligent Technology Co ltd
Priority to CN201910103409.9A priority Critical patent/CN111523345B/en
Publication of CN111523345A publication Critical patent/CN111523345A/en
Application granted granted Critical
Publication of CN111523345B publication Critical patent/CN111523345B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time face tracking system and method. The face detection module calls different face detection models on each frame of a video, searches for faces, records the position of each face found, and sends the positions to the face feature point positioning module; the face feature point positioning module locates the coordinates of the second-order feature points of the face and corrects the coordinates of each feature point; the face tracking module tracks the face through the video to obtain its continuous spatial pose. The system and method improve the accuracy, processing speed, and stability of real-time face tracking.

Description

Real-time human face tracking system and method
Technical Field
The invention belongs to the technical field of face tracking and relates to a real-time face tracking system; in particular, it relates to a real-time face tracking system and method based on second-order functional gradients.
Background
Face tracking refers to determining the motion trajectory of a face in a continuous video sequence. It is widely studied in fields such as computer vision and artificial intelligence, and is widely applied in video surveillance, robotics, human-computer interaction, and elsewhere. With the explosive growth of mobile devices, more and more applications require face tracking, such as mobile payment, virtual makeup, and beauty selfies. Traditional face tracking is difficult to apply in these situations, and extensive research combining the latest algorithms is required.
When applied to mobile devices, traditional face tracking algorithms have two main problems: 1) high computational cost, which makes them difficult to port directly to mobile devices. Mobile devices such as phones have weak computing power and little memory, while traditional high-precision models are computation- and memory-intensive; once the model is simplified, precision inevitably drops and accurate tracking becomes difficult. 2) Poor robustness: for large side-angle faces, partial occlusion, and similar situations, localization is biased and tracking fails.
Early face tracking required modeling of the face. Shape modeling methods include the deformable template (Deformable Template), the point distribution model (Active Shape Model), graph models, and so on. Appearance modeling methods comprise global and local appearance modeling: global methods include the Active Appearance Model (a generative model) and the Boosted Appearance Model (a discriminative model), while local appearance modeling models the appearance of local regions and includes color models, projection models, side-profile models, and the like. Modeling-based methods are often limited by the model itself and have low accuracy; the mostly simple models struggle to express the difficult factors met in practice, including illumination, occlusion, and variable poses.
Recently, cascaded shape regression (cascade shape regression) has made major breakthroughs in both accuracy and speed. The method uses a regression model to directly learn the mapping from face appearance to face shape (or to the parameters of a face shape model), thereby establishing the correspondence from appearance to shape. It needs no complex shape or appearance modeling and is easy to apply. Many comparative tests have shown that it is particularly suitable for uncontrolled, uncooperative scenarios, which are the main application scenarios of mobile devices. In addition, face alignment methods based on deep learning have also achieved remarkable results; combining deep learning with the shape regression framework can further improve the accuracy of the localization model, and this has become one of the mainstream approaches to feature localization. However, because deep learning models are huge (often containing tens of millions of variables), they are not suitable for mobile devices and are not discussed further below.
In view of this, there is an urgent need to design a face tracking method so as to overcome the above-mentioned drawbacks of the existing face tracking method.
Disclosure of Invention
The invention provides a real-time face tracking system and method, which can improve the accuracy, processing speed, and stability of real-time face tracking.
In order to solve the technical problems, according to one aspect of the present invention, the following technical scheme is adopted:
A real-time face tracking system, comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module, which calls different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module, which locates the coordinates (x_i, y_i) of the second-order feature points of the face. The module generates a sequence of decision trees; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls, |I_j| is the number of feature points on that leaf node, and g_k^x, g_k^y are the first derivatives of the loss function with respect to the x and y coordinates. The coordinates of each feature point are then corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
the face tracking module, which tracks the face through the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
As an embodiment of the present invention, the face detection module considers that a face actually appears at a position only if the reliability there is > 0.95.
As one embodiment of the present invention, the face detection module invokes at least one face detection model for each frame of the video; the models traverse each location of the image and give a confidence for each possible face position. The combined confidence of these per-model confidences is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
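As a minimal sketch of this fusion rule (the detector interface and the weight values are illustrative assumptions; only the weighted sum and the 0.95 threshold come from the description above):

```python
# Hedged sketch: fuses per-model confidences for one candidate position.
# Only the weighted sum R = w1*R1 + w2*R2 + ... and the 0.95 threshold
# are from the patent text; everything else is illustrative.

def combined_confidence(confidences, weights):
    """R = w1*R1 + w2*R2 + ... for one candidate face position."""
    return sum(w * r for w, r in zip(weights, confidences))

def face_present(confidences, weights, threshold=0.95):
    """A face is accepted only when the combined confidence exceeds 0.95."""
    return combined_confidence(confidences, weights) > threshold

# Example: two detectors, the second trusted slightly more.
print(face_present([0.97, 0.99], [0.45, 0.55]))  # True  (R = 0.981)
```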
As an implementation of the invention, the face feature point positioning module calls a second-order feature point positioning algorithm on the region where each face is located;
B1, initializing the coordinates (x_i, y_i) of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., a number of faces are annotated as samples and, for each feature point, the sample mean is taken;
B2, defining the second-order functional and solving it by optimization:
a decision tree with T leaf nodes is constructed; the input is the current coordinates of the feature points, and the goal is to decrease the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which sample i falls and ω_j stands for the correction (dx_j or dy_j) that is the value of the optimal solution at that leaf node; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point. With the optimal solution (dx_j, dy_j) at each leaf, the extremum problem of the second-order functional obj is converted into the computation of the optimal values of the T leaf nodes;
B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from the minimum of the second-order functional. For leaf node T_j containing |I_j| samples it is defined as follows:

$$dx_j = -\frac{\sum_{i \in I_j} g_i^x}{\sum_{i \in I_j} h_i^x}, \qquad dy_j = -\frac{\sum_{i \in I_j} g_i^y}{\sum_{i \in I_j} h_i^y}$$
B4, correcting the coordinates of each feature point: if the leaf node on which sample i falls is T_j, then x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, where η is a set constant (0.01 in one embodiment);
B5, decision trees continue to be generated; if the total corrections between two consecutive decision trees (Σ|dx| and Σ|dy|) are smaller than a set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is repeated.
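A compact sketch of steps B1-B5 follows; it is a hedged outline, not the patent's code. The tree construction is abstracted behind a fit_decision_tree callable (assumed to return per-point corrections whose leaf values minimize the second-order functional), and only η = 0.01 and the Σ|dx|, Σ|dy| convergence test come from the text:

```python
import numpy as np

def locate_feature_points(mean_shape, fit_decision_tree, grad_fn,
                          eta=0.01, tol=1e-3, max_trees=500):
    """Steps B1-B5: start from the average face and repeatedly apply
    decision-tree corrections along the optimized gradient.

    mean_shape:        (N, 2) initial coordinates from the average face (B1)
    fit_decision_tree: builds one tree; given the shape and the loss
                       derivatives it returns per-point (dx, dy) whose
                       leaf values minimize the second-order functional (B2-B3)
    grad_fn:           returns first/second derivatives (g, h) of the loss
                       at the current shape
    """
    shape = mean_shape.copy()                    # B1: initialize (x_i, y_i)
    for _ in range(max_trees):                   # B5: keep growing trees
        g, h = grad_fn(shape)
        dx, dy = fit_decision_tree(shape, g, h)  # B2-B3: optimal leaf values
        shape[:, 0] += eta * dx                  # B4: x_i += eta * dx_i
        shape[:, 1] += eta * dy                  # B4: y_i += eta * dy_i
        if np.abs(dx).sum() < tol and np.abs(dy).sum() < tol:
            break                                # B5: corrections small enough
    return shape
```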
As an implementation of the invention, the face tracking module computes the spatial pose of the face from its second-order feature points; the second-order feature points determine the key positions of the face. The face tracking module judges whether the eyes blink from the second-order feature points at the eyes, and whether the mouth opens from the second-order feature points at the mouth; it judges the left-right and up-down pose of the face from the second-order feature points of the face;
and the face tracking module analyzes the feature point sequence to obtain the motion trajectory of the face.
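The patent does not state how blinking is decided from the eye feature points; one plausible realization (our assumption, not the patent's stated method) is an eye-aspect-ratio test over the located landmarks:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks around one eye, ordered as in the common
    68-point annotation; a small ratio means the eyelid is closed."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def is_blinking(eye, threshold=0.2):
    # threshold is an illustrative value, to be tuned on real data
    return eye_aspect_ratio(eye) < threshold
```

A mouth-open test can be built the same way from the mouth landmarks.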
A real-time face tracking system, comprising:
the video frame image acquisition module, which acquires each frame image of the video;
the face detection module, which calls different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module, which locates the coordinates of the second-order feature points of the face;
the face tracking module, which tracks the face through the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
The real-time face tracking method comprises the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step: calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step: locating the coordinates (x_i, y_i) of the second-order feature points of the face. A sequence of decision trees is generated; each decision tree takes the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are improved along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node. The coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;
a face tracking step: tracking the face in the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
In the face detection step, different face detection models are called on each frame of the video and traverse each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In the face feature point positioning step, a second-order feature point positioning flow is called on the region where each face is located:
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
The real-time face tracking method comprises the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step: calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step: locating the coordinates of the second-order feature points of the face;
a face tracking step: tracking the face in the video to obtain its continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
As one embodiment of the invention, a new second-order functional gradient boosting algorithm (gradient boosting on second functional target) is employed to enable real-time face tracking on mobile devices. The specific scheme is as follows:
second order functional gradient
For a data set of N samples and M features, the gradient boosting (Gradient Boosting) algorithm is trained to obtain a series of additive functions {f_1, f_2, f_3, …} that predict the output:

$$\hat{y}_i = \sum_{t} f_t(x_i)$$
during the training process, the target value y of each sample is known i Predicted value
Figure BDA0001966164310000062
And a target value y i The difference between them is represented by a loss function->
Figure BDA0001966164310000063
To characterize. The training process of gradient lifting is thatThe loss is reduced along the current gradient direction. On the basis of the previous step (t-1), the functional definition of the step (t) is as follows:
Figure BDA0001966164310000064
let decision tree q handle each sample x i Mapping to leaf node T j I.e. q (x i ) The value of the leaf node is ω =j j Then (1) can be simplified as follows:
Figure BDA0001966164310000065
Omitting the higher-order remainder, the functional (2) expands by a second-order Taylor series as:

$$\mathrm{obj}^{(t)} \approx \sum_{i=1}^{N} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i\,\omega_{q(x_i)} + \frac{1}{2} h_i\,\omega_{q(x_i)}^2 \Big] \tag{3}$$

where g_i and h_i are respectively the first and second derivatives of the loss function at ŷ_i^{(t-1)}, i.e.

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\big(y_i, \hat{y}^{(t-1)}\big)$$

Since l(y_i, ŷ_i^{(t-1)}) is a constant, the second-order functional (3) further simplifies to:

$$\mathrm{obj}^{(t)} = \sum_{i=1}^{N} \Big[ g_i\,\omega_{q(x_i)} + \frac{1}{2} h_i\,\omega_{q(x_i)}^2 \Big] \tag{4}$$
Setting the mapping corresponding to the decision tree as I_j = {i | q(x_i) = j}, the functional expands over leaf nodes as:

$$\mathrm{obj}^{(t)} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big] \tag{5}$$

Taking the extremum of the above at each leaf node:

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} \tag{6}$$

The corresponding extremal value is:

$$\mathrm{obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i} \tag{7}$$
If the loss function is defined in second-order (squared) form,

$$l(y_i, \hat{y}_i) = \frac{1}{2}\big(y_i - \hat{y}_i\big)^2$$

then h_i = 1, so the value ω_j of each leaf node is in fact the mean of the g_i, i.e.

$$\omega_j = -\frac{1}{|I_j|} \sum_{i \in I_j} g_i \tag{8}$$
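A small numeric sketch of equations (6) and (8) under the squared loss (the variable names are ours):

```python
import numpy as np

def leaf_values(g, h, leaf_index, n_leaves):
    """Optimal leaf value w_j = -sum(g_i) / sum(h_i) over I_j (eq. 6)."""
    w = np.zeros(n_leaves)
    for j in range(n_leaves):
        mask = leaf_index == j            # I_j = {i | q(x_i) = j}
        w[j] = -g[mask].sum() / h[mask].sum()
    return w

# Squared loss l = 0.5*(y - y_hat)^2 gives g_i = y_hat_i - y_i and h_i = 1,
# so each w_j is just the negative mean residual of its leaf (eq. 8).
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.5, 2.0, 3.0])
leaf = np.array([0, 0, 1, 1])             # q(x_i): leaf index of each sample
print(leaf_values(y_hat - y, np.ones(4), leaf, 2))   # [-0.5  1. ]
```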
Gradient boosting algorithm based on a sparse random forest.
Combining multiple decision trees q and introducing random selection improves accuracy and generalization, forming a random forest. Multi-stage regression based on random forests is currently the dominant algorithm. It can rapidly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. Feature point localization is accurate and fast: hundreds of frames per second can be processed on a PC. Mobile devices have weaker computing power, but can still handle some twenty frames per second, enough for real-time tracking.
But the standard model requires hundreds of megabytes of storage, and simple simplification (e.g., reducing the number of trees or the number of regression stages) only reduces accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector of each node, often only a few components have large values, while the other components are unimportant. Therefore a sparse representation is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the shrinking of the model also improves speed.
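A sketch of this sparsification (thresholded index-value storage is our assumed realization; the patent states only that a sparse representation of each node's feature vector yields the compression):

```python
import numpy as np

def sparsify(vec, keep=5):
    """Keep the few largest-magnitude components of a node's feature vector,
    stored as (index, value) pairs instead of a dense array."""
    idx = np.argsort(np.abs(vec))[-keep:]            # dominant components
    return idx.astype(np.int32), vec[idx].astype(np.float32)

def densify(idx, vals, dim):
    """Rebuild the dense vector when the node is evaluated."""
    out = np.zeros(dim, dtype=np.float32)
    out[idx] = vals
    return out

vec = np.random.randn(512) * (np.random.rand(512) < 0.02)   # mostly zeros
idx, vals = sparsify(vec)
# per-node storage drops from 512 floats to ~5 (index, value) pairs
```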
The beneficial effects of the invention: the real-time face tracking system and method can improve the accuracy, processing speed, and stability of real-time face tracking.
On top of the regression model, the invention adopts an innovative second-order functional gradient boosting algorithm (gradient boosting on second functional target) to realize real-time face tracking on mobile devices. The invention adopts a sparse random-forest regression algorithm to achieve a processing speed of 20 frames per second on an ordinary mobile phone, reaching real-time tracking. The invention uses a continuous manifold model that treats the moving face sequence as a continuous manifold transformation; it is highly robust and can accurately locate the face under difficult conditions such as large side angles and partial occlusion.
The invention has the following advantages: 1) more accurate: the second-order functional gradient rests on a rigorous mathematical foundation, which guarantees the accuracy of the whole algorithm; 2) faster: the sparse random-forest regression algorithm achieves a processing speed of 20 frames per second on an ordinary mobile phone, reaching real-time tracking; 3) more stable: the continuous manifold model treats the moving face sequence as a continuous manifold transformation, is highly robust, and can accurately locate the face under difficult conditions such as large side angles and partial occlusion.
Drawings
Fig. 1 is a schematic diagram of a real-time face tracking system according to an embodiment of the invention.
Fig. 2 is a flowchart of a face real-time tracking method according to an embodiment of the invention.
Fig. 3 is a schematic diagram of face detection performed by a conventional multi-level regression algorithm based on random forests.
Fig. 4 is a schematic diagram of a conventional feature vector detection based on a random forest model.
Fig. 5 is a schematic diagram of the conventional face detection effect (including a front face and a side face).
FIG. 6 is a schematic diagram of face detection using the continuous manifold model of the present invention.
Fig. 7 is a schematic diagram of face detection performed by the conventional 68-point labeling model.
Fig. 8 is a schematic diagram of the secondary adaptive sampling of feature points in the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the present invention, preferred embodiments of the invention are described below in conjunction with the examples, but it should be understood that these descriptions are merely intended to illustrate further features and advantages of the invention, and are not limiting of the claims of the invention.
The description of this section is intended to be illustrative of only a few exemplary embodiments and the invention is not to be limited in scope by the description of the embodiments. It is also within the scope of the description and claims of the invention to interchange some of the technical features of the embodiments with other technical features of the same or similar prior art.
The invention discloses a real-time human face tracking system, and FIG. 1 is a schematic diagram of the components of the real-time human face tracking system in an embodiment of the invention; referring to fig. 1, in an embodiment of the present invention, the face real-time tracking system includes: the system comprises a video frame image acquisition module 1, a face detection module 2, a face feature point positioning module 3 and a face tracking module 4.
The video frame image acquisition module 1 is used for acquiring each frame image of a video.
The face detection module 2 calls different face detection models on each frame of image to find whether a face appears; the models traverse each position of the image, judge whether a face appears there, and return a reliability. The module combines several models, which further improves the reliability of face recognition. For a given position, a face is considered to actually appear only if the reliability is higher than a set value; the position of the face is then recorded and sent to the face feature point positioning module.
The face feature point positioning module 3 locates the coordinates (x_i, y_i) of the second-order feature points of the face. In an embodiment of the present invention, the module generates a sequence of decision trees based on a gradient boosting algorithm, with the goal of decreasing the value of the second-order functional; the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient. The gradient of each feature point (x_i, y_i) is

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

where I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node. The coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant.
The face tracking module 4 tracks faces through the video to obtain their continuous spatial pose. For each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged, and the change of the feature points between frames reflects the various changes and movements of the face.
In an embodiment of the present invention, the face detection module considers that a face actually appears at a position only if the reliability there is > 0.95.
In an embodiment of the present invention, the face detection module invokes different face detection models (such as MTCNN, YOLOv3, etc.) for each frame of the video and traverses each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In an embodiment of the present invention, the face feature point positioning module invokes a second-order feature point positioning procedure for an area where each face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant; in one embodiment of the present invention, the constant η is 0.01;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold (such as one thousandth of the coordinate value), the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is repeated.
In an embodiment of the invention, the face tracking module computes the spatial pose of the face from its second-order feature points; the second-order feature points determine the key positions of the face. The face tracking module judges whether the eyes blink from the second-order feature points at the eyes, and whether the mouth opens from the second-order feature points at the mouth; it judges the left-right and up-down pose of the face from the second-order feature points of the face;
and the face tracking module analyzes the feature point sequence to obtain the motion trajectory of the face.
The invention also discloses a face real-time tracking method, and FIG. 2 is a flow chart of the face real-time tracking method in an embodiment of the invention; referring to fig. 2, in an embodiment of the present invention, the face real-time tracking method includes:
step S1, acquiring each frame image of a video;
step S2, a face detection step, namely calling different face detection models for each frame of image to find out whether a face appears; the face detection models can traverse each position of the image, judge whether the face appears at the position or not, and return a reliability. The module synthesizes a plurality of models, so that the reliability of face recognition is further improved; for a certain position, only if the reliability is higher than a set value, the face is considered to appear indeed; recording the position of the face, and sending the corresponding position to a face feature point positioning module;
step S3, a step of locating feature points of the face, wherein coordinates (x i ,y i ). In one embodiment of the invention, a plurality of decision trees are generated step by step based on a gradient lifting algorithm, and the current coordinates of the input second-order feature points of each decision tree are aimed at reducing the value of the second-order functional; modifying the coordinates (x) along the direction of the optimized gradient i ,y i ) Is a value of (2); each feature point (x i ,y i ) Gradient of (2)
Figure BDA0001966164310000102
Wherein I is j Is the leaf node where the feature point is located, |I j I is the number of all feature points on the leaf node; the coordinates of each feature point are corrected as follows: x is x i =x i +ηdx i ,y i =y i +ηdx i
Step S4, tracking the human face in the video to obtain continuous spatial gestures; for each frame of video, the second-order characteristic points of the face can be accurately positioned, and the sequence formed by the characteristic points represents the motion trail of the face; based on the characteristic points, judging the aerial gesture of the face; the change of the characteristic points among different frames reflects various changes and movements of the human face.
In an embodiment of the present invention, in the face detection step, different face detection models are called on each frame of the video and traverse each position of the image, giving a confidence for each possible face position. The combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

where R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients. If the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module. If no face is detected, detection continues with the next frame.
In an embodiment of the present invention, in the step of locating the feature points of the faces, a second-order feature point locating procedure is invoked for each area where the face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

where I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$

step B4, correcting the coordinates of each feature point: x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, where η is a set constant;
step B5, decision trees continue to be generated; if the corrections (dx_i, dy_i) are smaller than a set threshold (e.g., one thousandth of the coordinate value), the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
In one embodiment of the invention, the real-time face tracking system and method adopt a sparse random-forest regression algorithm. Multi-stage regression based on random forests is the current mainstream algorithm, shown in fig. 3. The algorithm can rapidly locate the feature points of each face and then judge the pose and motion trajectory of the face from those points. Feature point localization is accurate and fast: hundreds of frames per second can be processed on a PC. Mobile devices have weaker computing power, but can still handle some twenty frames per second, enough for real-time tracking. This, however, rests on a trained big-data model that requires hundreds of megabytes of storage, and simple simplification (e.g., reducing the number of trees or the number of regression stages) only reduces accuracy. As shown in fig. 4, observing and analyzing the characteristics of the random forest model shows that its feature vectors are sparse: in the feature vector of each node, often only a few components have large values, while the other components are unimportant. Therefore a sparse representation is adopted for each node, which is expected to achieve a compression rate of more than 10x, so that the model can be stored on an ordinary mobile phone without affecting precision. At the same time, the shrinking of the model also improves speed.
In one embodiment of the invention, the real-time face tracking system and method use a continuous manifold model. Frontal face feature points are located well, but as the deflection angle of the face grows, until only half the face is visible, many feature points disappear or their corresponding positions become unknown (as shown in fig. 5). These vanished feature points cannot be located and instead act as interference. In one embodiment of the invention, a continuous manifold transformation model addresses this problem by treating the moving face sequence as lying in a continuous manifold space. Fig. 6 is a schematic diagram of the recognition of real-time face tracking in an embodiment of the present invention; as shown in fig. 6, the reference pose of the face is determined by estimating these spatial transformations. From the pose it is then determined which feature points are visible, and the vanished feature points are no longer used for localization. This scheme is highly robust and is expected to locate the face accurately under difficult conditions such as large side angles and partial occlusion.
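The manifold scheme is described only at this level of detail; a toy sketch of the visibility step (the yaw-based rule and its cutoff are our assumptions) might look like:

```python
def visible_landmarks(landmarks, yaw_deg, left_ids, right_ids, cutoff=40.0):
    """Drop feature points on the far side of a strongly turned face so the
    vanished points no longer act as interference during localization."""
    if yaw_deg > cutoff:          # face turned left: right side hidden
        hidden = set(right_ids)
    elif yaw_deg < -cutoff:       # face turned right: left side hidden
        hidden = set(left_ids)
    else:
        hidden = set()
    return [p for i, p in enumerate(landmarks) if i not in hidden]
```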
In one embodiment of the invention, the real-time face tracking system and method use secondary adaptive sampling of feature points. Conventional methods are based on fixed face feature points, i.e., the number and positions of the feature points are fixed; fig. 7 shows the 68-point standard model. Here the feature points are automatically densified on top of the standard feature points, as shown in fig. 8 (see the sketch after this paragraph). The various constraint relationships between the densified points and the reference points further improve positioning accuracy; moreover, because the points are densified adaptively around the reference points, the extra computation is small and real-time tracking is still achieved.
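A minimal sketch of the densification idea (midpoint insertion between chosen reference points is our assumed rule; the patent states only that extra points are added adaptively around the reference points):

```python
import numpy as np

def densify_landmarks(points, pairs):
    """Insert the midpoint of each chosen reference-point pair as an extra,
    constrained feature point."""
    extra = [(points[a] + points[b]) / 2.0 for a, b in pairs]
    return np.vstack([points, np.asarray(extra)])

# e.g. add points halfway along a few segments of a 68-point shape;
# the index pairs below are purely illustrative.
pairs = [(36, 39), (42, 45), (48, 54)]
# dense_shape = densify_landmarks(shape_68x2, pairs)
```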
In summary, the real-time face tracking system and method provided by the invention can improve the accuracy, processing speed, and stability of real-time face tracking.
The description and applications of the present invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Variations and modifications of the embodiments disclosed herein are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other assemblies, materials, and components, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (8)

1. A real-time face tracking system, the real-time face tracking system comprising:
the video frame image acquisition module is used for acquiring each frame image of the video;
the face detection module is used for calling different face detection models on each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
the face feature point positioning module is used for locating the coordinates (x_i, y_i) of the second-order feature points of the face; the face feature point positioning module gradually generates a plurality of decision trees, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional; the coordinates (x_i, y_i) are corrected along the direction of the optimized gradient, the gradient of each feature point (x_i, y_i) being

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

wherein I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node; the coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i, wherein η is a set constant;
the face tracking module is used for tracking the face in the video to obtain its continuous spatial pose; for each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged; the change of the feature points between frames reflects the various changes and movements of the face.
2. The face real-time tracking system of claim 1, wherein:
the face detection module considers that a face actually appears at a certain position only if the reliability there is greater than 0.95.
3. The face real-time tracking system of claim 1, wherein:
the face detection module invokes at least one face detection model for each frame of the video, the models traversing each position of the image and giving a confidence for each possible face position; the combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

wherein R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients; if the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
4. The face real-time tracking system of claim 1, wherein:
the face feature point positioning module calls a second-order feature point positioning algorithm for the region where each face is located;
B1, initializing the coordinates (x_i, y_i) of the N face feature points according to a standard template; the initial coordinates of each point come from an average face, i.e., a number of faces are annotated as samples and, for each feature point, the sample mean is taken;
b2, defining a second-order functional and optimizing and solving:
constructing a decision tree with T leaf nodes; the input is the current coordinates of the feature points, and the goal is to decrease the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

wherein I_j is the leaf node on which sample i falls and ω_j stands for the correction (dx_j, dy_j) that is the value of the optimal solution at that leaf node; g_i is the first derivative of the loss function at each feature point and h_i is the second derivative of the loss function at each feature point; with the optimal solution (dx_j, dy_j) at each leaf, the extremum problem of the second-order functional obj is converted into the computation of the optimal values of the T leaf nodes;
B3, the functional takes its minimum, and the coordinate correction of each feature point follows; namely, the correction of the second-order feature points comes from the minimum of the second-order functional; for leaf node T_j containing |I_j| samples it is defined as follows:

$$dx_j = -\frac{\sum_{i \in I_j} g_i^x}{\sum_{i \in I_j} h_i^x}, \qquad dy_j = -\frac{\sum_{i \in I_j} g_i^y}{\sum_{i \in I_j} h_i^y}$$
and B4, correcting the coordinates of each feature point:
if the leaf node on which sample i falls is T_j, then x_i = x_i + η·dx_j, y_i = y_i + η·dy_j, wherein η is a set constant;
and B5, continuing to generate decision trees; if the total corrections between two consecutive decision trees (Σ|dx| and Σ|dy|) are smaller than a set threshold, the module has converged, and the feature points are recorded and sent to the face tracking module; otherwise step B2 is continued.
5. The face real-time tracking system of claim 1, wherein:
the face tracking module is used for calculating the spatial pose of the face according to the second-order feature points of the face; the second-order feature points are used for determining the key positions of the face; the face tracking module judges whether the eyes blink according to the second-order feature points at the eyes, and whether the mouth opens according to the second-order feature points at the mouth; the face tracking module judges the left-right and up-down pose of the face according to the second-order feature points of the face;
and the face tracking module is used for analyzing the characteristic point sequence to obtain a motion track of the face.
6. A real-time face tracking method, characterized by comprising the following steps:
a video frame image acquisition step of acquiring each frame image of a video;
a face detection step of calling different face detection models for each frame of image to find whether a face appears; the face detection models traverse each position of the image, judge whether a face appears at each position, and return a reliability; for a given position, a face is considered present only if the reliability is higher than a set value, in which case the position of the face is recorded and sent to the face feature point positioning module;
a face feature point positioning step of locating the coordinates (x_i, y_i) of the second-order feature points of the face; a plurality of decision trees are gradually generated, each decision tree taking the current coordinates of the second-order feature points as input, with the goal of decreasing the value of the second-order functional, i.e., the coordinates (x_i, y_i) are improved along the direction of the optimized gradient, the gradient of each feature point (x_i, y_i) being

$$dx_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^x, \qquad dy_i = -\frac{1}{|I_j|} \sum_{k \in I_j} g_k^y$$

wherein I_j is the leaf node on which the feature point falls and |I_j| is the number of feature points on that leaf node; the coordinates of each feature point are corrected as x_i = x_i + η·dx_i, y_i = y_i + η·dy_i;
a face tracking step of tracking the face in the video to obtain its continuous spatial pose; for each video frame, the second-order feature points of the face can be accurately located, and the sequence formed by these feature points represents the motion trajectory of the face; from the feature points, the spatial pose of the face is judged; the change of the feature points between frames reflects the various changes and movements of the face.
7. The face real-time tracking method according to claim 6, wherein:
in the face detection step, for each frame of the video, different face detection models are called and traverse each position of the image, giving a confidence for each possible face position; the combined confidence is computed as a weighted sum:

$$R = \omega_1 R_1 + \omega_2 R_2 + \cdots$$

wherein R_1, R_2, … are the confidences returned by the different face detection models, and ω_1, ω_2, … are set coefficients; if the combined confidence R > 0.95, a face is present at that position; the position is recorded and sent to the face feature point positioning module; if no face is detected, detection continues with the next frame of image.
8. The face real-time tracking method according to claim 6, wherein:
in the step of locating the face feature points, a second-order feature point locating flow is called for the area where each face is located;
step B1, initializing the N feature points (x_i, y_i) according to a standard template; the coordinates of each point come from the average face;
step B2, constructing a decision tree whose input is the current coordinates of the feature points, with the goal of decreasing the value of the following second-order functional:

$$\mathrm{obj} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \frac{1}{2} \Big(\sum_{i \in I_j} h_i\Big)\,\omega_j^2 \Big]$$

wherein I_j is the leaf node on which the feature point falls and ω_j is the value of that leaf node; g_i is the first derivative of the loss function at each feature point, h_i is the second derivative of the loss function at each feature point, and ω_j* is the optimal solution;
step B3, the functional takes its minimum, and the coordinate correction of each feature point follows; that is, the correction of the second-order feature points comes from taking the minimum of the second-order functional, which is attained at

$$\omega_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i}$$
and B4, correcting the coordinates of each feature point:
x_i = x_i + η·dx_i, y_i = y_i + η·dy_i; wherein η is a set constant;
step B5, continuing to generate decision trees; if the corrections (dx_i, dy_i) are smaller than a set threshold, the module has converged, the feature points are recorded, and the method proceeds to the face tracking step; otherwise it returns to step B2.
CN201910103409.9A 2019-02-01 2019-02-01 Real-time human face tracking system and method Active CN111523345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910103409.9A CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Publications (2)

Publication Number Publication Date
CN111523345A CN111523345A (en) 2020-08-11
CN111523345B true CN111523345B (en) 2023-06-23

Family

ID=71899996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910103409.9A Active CN111523345B (en) 2019-02-01 2019-02-01 Real-time human face tracking system and method

Country Status (1)

Country Link
CN (1) CN111523345B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140864B (en) * 2022-01-29 2022-07-05 深圳市中讯网联科技有限公司 Trajectory tracking method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344922A (en) * 2008-08-27 2009-01-14 华为技术有限公司 Human face detection method and device
CN103310204A (en) * 2013-06-28 2013-09-18 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
CN104182718A (en) * 2013-05-21 2014-12-03 腾讯科技(深圳)有限公司 Human face feature point positioning method and device thereof
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344922A (en) * 2008-08-27 2009-01-14 华为技术有限公司 Human face detection method and device
CN104182718A (en) * 2013-05-21 2014-12-03 腾讯科技(深圳)有限公司 Human face feature point positioning method and device thereof
CN103310204A (en) * 2013-06-28 2013-09-18 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis
WO2014205768A1 (en) * 2013-06-28 2014-12-31 中国科学院自动化研究所 Feature and model mutual matching face tracking method based on increment principal component analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhan Jiangtao; Liu Qiang; Chai Chunlei. Face feature point tracking method based on a three-dimensional model and Gabor wavelets. Journal of Zhejiang University (Engineering Science), 2011, (01). *

Also Published As

Publication number Publication date
CN111523345A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
Wang et al. Hidden‐Markov‐models‐based dynamic hand gesture recognition
US9299161B2 (en) Method and device for head tracking and computer-readable recording medium
CN107146237B (en) Target tracking method based on online state learning and estimation
CN111127519B (en) Dual-model fusion target tracking control system and method thereof
Feng et al. Kalman filter for spatial-temporal regularized correlation filters
CN111402303A (en) Target tracking architecture based on KFSTRCF
Ma et al. Correlation filters based on multi-expert and game theory for visual object tracking
Baik et al. Learning to forget for meta-learning via task-and-layer-wise attenuation
CN112258557A (en) Visual tracking method based on space attention feature aggregation
CN110378932B (en) Correlation filtering visual tracking method based on spatial regularization correction
CN111523345B (en) Real-time human face tracking system and method
Xing et al. NoisyOTNet: A robust real-time vehicle tracking model for traffic surveillance
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
CN108469729B (en) Human body target identification and following method based on RGB-D information
Gu et al. Vtst: Efficient visual tracking with a stereoscopic transformer
Ikram et al. Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture
Leng et al. Stable hand pose estimation under tremor via graph neural network
CN113129332A (en) Method and apparatus for performing target object tracking
CN107194947B (en) Target tracking method with self-adaptive self-correction function
Ke An efficient and accurate DDPG-based recurrent attention model for object localization
Deng et al. Supervised learning based online filters for targets tracking using radar measurements
Huang et al. Methods on visual positioning based on basketball shooting direction standardisation
JP2013152595A (en) Information processing apparatus and method, and program
Zhang et al. A fast and robust maneuvering target tracking method without Markov assumption
CN104751448A (en) Online video tracking method based on PCA (Principal Component Analysis) and noise separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant