CN118038560A - Method and device for predicting face pose of driver - Google Patents

Method and device for predicting face pose of driver

Info

Publication number
CN118038560A
Authority
CN
China
Prior art keywords
face
key point
angle
detected
face image
Prior art date
Legal status
Pending
Application number
CN202410438255.XA
Other languages
Chinese (zh)
Inventor
胡启昶
陈宇
张如高
虞正华
Current Assignee
Magic Vision Intelligent Technology Wuhan Co ltd
Original Assignee
Magic Vision Intelligent Technology Wuhan Co ltd
Priority date
Filing date
Publication date
Application filed by Magic Vision Intelligent Technology Wuhan Co ltd filed Critical Magic Vision Intelligent Technology Wuhan Co ltd
Priority claimed from CN202410438255.XA
Publication of CN118038560A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face pose estimation, and discloses a face pose prediction method and device for a driver. The method comprises the following steps: acquiring a face image to be detected; identifying a plurality of key points in the face image to be detected; performing alignment processing on the face image to be detected; determining a roll angle corresponding to the face image to be detected; calculating a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected; calculating the yaw angle of the face in the face image to be detected according to the yaw angle projection coefficient and a first mapping relation, and calculating the pitch angle of the face in the face image to be detected according to the pitch angle projection coefficient and a second mapping relation; and determining the roll angle, the yaw angle and the pitch angle as the face pose prediction result of the face image to be detected. By implementing the technical scheme of the invention, the calculation dimension is greatly reduced, and the accuracy and reliability of face pose estimation are improved provided that the feature points are detected accurately.

Description

Method and device for predicting face pose of driver
Technical Field
The invention relates to the technical field of face pose estimation, in particular to a face pose prediction method and device for a driver.
Background
With the continuous development of artificial intelligence technology, face pose estimation has become one of key technologies in the intelligent driving field, aiming at monitoring the state of a driver to assist driving or monitoring dangerous driving behaviors and ensuring traffic safety.
Existing face pose prediction methods mainly either regress the pose of the face from an input face image with a trained neural network, or project the face image into each principal component analysis pose space and select the face pose from the poses of those spaces. However, in practical applications, both neural network feature extraction and matching in the principal component analysis pose spaces require high-dimensional computation, so the computational complexity is high and the efficiency of face pose recognition suffers. Moreover, the quality of the face image itself, for example image blur, occlusion of the face, or poor lighting, affects the final face pose prediction.
Disclosure of Invention
In view of the above, the present invention provides a method and apparatus for predicting a face pose of a driver, so as to solve the problem of how to efficiently and accurately predict the face pose during driving.
In a first aspect, the present invention provides a face pose prediction method for a driver, including: acquiring a face image to be detected; identifying a plurality of key points in the face image to be detected to obtain a key point set; aligning the face image to be detected by using a pre-configured reference key point set and the key point set to obtain a processed standard key point set; determining a roll angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set; calculating a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by using the features of the key points in the standard key point set; calculating the yaw angle of the face in the face image to be detected according to the yaw angle projection coefficient and a first mapping relation between the yaw angle projection coefficient and the face pose angle, and calculating the pitch angle of the face in the face image to be detected according to the pitch angle projection coefficient and a second mapping relation between the pitch angle projection coefficient and the face pose angle; and determining the roll angle, the yaw angle and the pitch angle as the face pose prediction result of the face image to be detected.
According to the face pose prediction method for the driver, after the face image to be detected is obtained, it is analyzed and a plurality of key points in it are identified; these key points reflect different features and positions of the face. Alignment processing is performed using a pre-configured reference key point set and the key point set of the face to be detected to obtain a standard key point set, so as to eliminate deviation caused by the pose of the face and make the subsequent calculation more accurate. The roll angle of the face image to be detected, i.e., the angle at which the head is tilted left or right, is determined by comparing the relation between the reference key point set and the key point set of the face image to be detected. The yaw angle and pitch angle projection coefficients of the face image to be detected are calculated using the features of the key points in the standard key point set. The yaw angle and pitch angle of the face in the face image to be detected are then calculated from these projection coefficients and their mapping relations to the face pose angle. Finally, the calculated roll angle, yaw angle and pitch angle serve as the face pose prediction result of the face image to be detected. In this way the calculation dimension is greatly reduced, interference from image blur, illumination changes and partial occlusion of the pixel information is avoided, and the accuracy and reliability of face pose prediction are improved provided that the feature points are detected accurately; the method is simple, efficient and widely applicable.
In an alternative embodiment, before acquiring the face image to be measured, the method further includes: acquiring a face pose sample data set, wherein the face pose sample data set comprises a plurality of face sample images marked with face key points and face poses, the face key points correspond to different positions in a face, and the different positions are described through different numbers of face key points; extracting a sample key point labeling set and a sample gesture angle labeling set of a plurality of face sample images from a face gesture sample data set; the sample key point labeling set is used for representing coordinate sets of a plurality of face key points of a plurality of face sample images, and the sample attitude angle labeling set is used for representing sample pitch angles and sample yaw angle sets corresponding to a plurality of face attitudes of the plurality of face sample images; aiming at specific key points in the face key points, taking a first coordinate average value from a plurality of sample coordinates corresponding to the specific key points included in the sample key point labeling set, and determining the first coordinate average value as reference key point coordinates of the specific key points; wherein the specific key point is any point in the key points of the human face; taking an angle average value of a plurality of face gesture angles corresponding to a plurality of face sample images included in the sample gesture angle annotation set, and determining the angle average value as a reference gesture angle of the face gesture sample data set; the reference attitude angle comprises a yaw angle reference attitude angle and a pitch angle reference attitude angle; a plurality of reference keypoint coordinates and reference pose angles are determined as a reference set of keypoints.
The face pose prediction method for the driver provided by the embodiment of the invention collects a plurality of face sample images containing labeled face key points and face poses before acquiring the face image to be detected. And extracting sample key point labeling sets and sample gesture angle labeling sets of a plurality of face sample images from the face gesture sample data set. And taking an average value of a plurality of sample coordinates corresponding to the specific key point from the sample key point labeling set aiming at the specific key point in the face key points, and determining the average value as the reference key point coordinate of the specific key point. Taking an average value of the attitude angles of a plurality of face sample images from the sample attitude angle annotation set, and determining the average value as a reference attitude angle of the face attitude sample data set, wherein the reference attitude angle comprises a yaw angle and a pitch angle. And determining a plurality of reference key point coordinates and reference gesture angles as a reference key point set for subsequent face gesture prediction, so that the accurate prediction of the gesture of the face image to be detected can be improved by establishing the reference key point set and the reference gesture angles, and errors caused by sample differences are reduced. By processing the sample data set and extracting the key information, the influence of noise in the data on the prediction result can be effectively reduced, and the stability of the algorithm is improved. After the reference key point set and the attitude angle are determined, the complexity of subsequent calculation can be simplified, and the efficiency and practicality of the algorithm are improved.
In an alternative embodiment, the aligning process is performed on the face image to be detected by using a pre-configured reference key point set and a key point set, so as to obtain a processed standard key point set, which includes: taking a second coordinate average value for a plurality of reference key point coordinates in the reference key point set, and determining the second coordinate average value as a reference centroid of the reference face image corresponding to the reference key point set; identifying a plurality of actual coordinate positions corresponding to a plurality of key points in the key point set; taking a third coordinate average value from the actual coordinate positions, and determining the third coordinate average value as an actual centroid of the face image to be detected; and carrying out alignment processing on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid to obtain a processed standard key point set.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the second coordinates of the plurality of reference key point coordinates in the reference key point set are averaged to obtain the reference centroid of the reference face image corresponding to the reference key point set. And identifying a plurality of key points in the key point set of the face image to be detected, determining the coordinate positions of the key points in the actual image, and averaging the third coordinates of the actual coordinate positions to obtain the actual centroid of the face image to be detected. And carrying out alignment processing on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid. And (3) aligning the face image to be detected with the reference face image by adjusting the position of the face image to be detected, so as to obtain a processed standard key point set. Therefore, the alignment processing is carried out on the face image to be detected, so that the face image to be detected is more consistent with the reference face image in position, and the accuracy and reliability of subsequent processing tasks are improved. By calculating the reference centroid and the actual centroid, the spatial offset and the rotation difference in the face image to be detected can be eliminated, so that the subsequent operation is more stable and reliable. Through alignment processing, the face image to be detected can be converted into a standard key point set, so that the complexity of a subsequent processing task is simplified, and the efficiency and accuracy of an algorithm are improved.
In an optional implementation manner, alignment processing is performed on the face image to be detected by using relevant position information of the reference centroid and the actual centroid, and a processed standard key point set is obtained, which includes: subtracting the reference centroid coordinates corresponding to the reference centroids from the reference key point coordinates in the reference key point set respectively to obtain the standardized reference key point coordinates of the reference face image; subtracting the actual centroid coordinates corresponding to the actual centroids from any one of the actual coordinate positions in the key point set to obtain standardized actual coordinate positions of the face image to be detected; calculating a rotation matrix according to the standardized reference key point coordinates and the standardized actual coordinate positions so as to rotate a plurality of actual coordinate positions to the positions of a plurality of reference key point coordinates; and obtaining a standard key point set according to the plurality of actual coordinate positions of the rotated rotation matrix.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the reference centroid coordinates corresponding to the reference centroid are subtracted from the reference key point coordinates in the reference key point set respectively, so that the standardized reference key point coordinates of the reference face image are obtained. Subtracting the actual centroid coordinates corresponding to the actual centroids from any one of the actual coordinate positions in the key point set to obtain the standardized actual coordinate positions of the face image to be detected. According to the standardized reference key point coordinates and the standardized actual coordinate positions, a rotation matrix capable of rotating a plurality of actual coordinate positions to a plurality of reference key point coordinate positions is calculated. And transforming the actual coordinate position according to the rotation matrix to obtain the rotated actual coordinate position. And obtaining the processed standard key point set through the rotated actual coordinate position. Therefore, by calculating the position information of the reference centroid and the actual centroid and the transformation of the rotation matrix, more accurate alignment processing can be realized, so that the positions of the face image to be detected and the reference face image are more consistent. By standardizing the coordinates of the reference key points, the positions of the actual coordinates and the transformation of the rotation matrix, the spatial displacement and rotation difference in the face image to be detected can be eliminated, so that the subsequent processing is more accurate and reliable. By obtaining the standard key point set, the complexity of subsequent processing tasks can be simplified, and the stability and accuracy of the algorithm can be improved.
In an alternative embodiment, determining a roll angle corresponding to a face image to be detected based on a relationship between a reference key point set and corresponding key points in the key point set includes: performing inverse trigonometric function operation on the rotation matrix to obtain a rotation angle between the reference key point set and the key point set; and determining the rotation angle as a rolling angle corresponding to the face image to be detected.
According to the face gesture prediction method for the driver, which is provided by the embodiment of the invention, the rotation matrix is subjected to inverse trigonometric function operation so as to calculate the rotation angle between the reference key point set and the key point set. The calculated rotation angle is determined to be the corresponding rolling angle of the face image to be detected, so that the rotation condition of the face image to be detected relative to the reference face image can be described more accurately by determining the rolling angle based on the relation between the reference key point set and the key point set of the face image to be detected, and the accuracy of subsequent processing is improved. The rolling angle is determined, so that the processing flow of the face image to be detected can be simplified, the adjustment and the alignment are more visual and convenient, and the follow-up tasks such as face recognition and feature extraction are facilitated. Determining the roll angle can help stabilize rotational transformations that may exist during processing, thereby improving the stability and robustness of the algorithm, ensuring that the processing results are more reliable and accurate.
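To make this concrete, the following minimal sketch shows the inverse trigonometric extraction of a roll angle from a 2D rotation matrix. It assumes the alignment step produced a 2x2 matrix R of the form [[cos t, -sin t], [sin t, cos t]]; it is an illustration, not code from the patent.

```python
import numpy as np

def roll_from_rotation_matrix(R: np.ndarray) -> float:
    """Recover the in-plane rotation angle (roll, in degrees) from a 2x2 rotation matrix."""
    # atan2 handles all four quadrants and avoids division by zero.
    return float(np.degrees(np.arctan2(R[1, 0], R[0, 0])))

# Example: a face rotated 15 degrees counter-clockwise in the image plane.
t = np.radians(15.0)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(roll_from_rotation_matrix(R))  # ~15.0
```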
In an alternative embodiment, after determining the plurality of reference key point coordinates and the reference pose angle as the reference key point set, the method further comprises: preprocessing a plurality of face sample images in the face pose sample data set according to the reference key point set to eliminate scale and position differences among the face sample images, so as to obtain a standardized set of face key points; and performing principal component analysis on the standardized set of face key points to obtain the correspondence between the change direction of the principal component space corresponding to each eigenvector and the change of the face pose.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, a plurality of reference key point coordinates and reference gesture angles are determined as a reference key point set and used as standard reference points for subsequent processing. And preprocessing a plurality of face sample images in the face gesture sample data set according to the reference key point set. And obtaining a standardized set of face key points through the preprocessed face sample image. And carrying out principal component analysis on the face key point standardization set so as to explore the main change direction in the data set. Therefore, the preprocessing process can effectively eliminate the scale and position difference between different face samples, so that the subsequent analysis is more accurate and reliable. By obtaining the standardized set of the key points of the human face, the key point positions among different human face samples can be in the same scale and position, and the comparison and analysis of subsequent data are facilitated. Principal component analysis can help reveal the primary direction of change in the dataset, thereby better understanding the features and correlation of face pose changes, providing beneficial information for subsequent pattern recognition and classification tasks.
In an optional implementation manner, principal component analysis is performed on the standardized set of face key points to obtain the correspondence between the change direction of the principal component space corresponding to each eigenvector and the change of the face pose, including: constructing a two-dimensional key point matrix from the standardized set of face key points; calculating the mean vector of the two-dimensional key point matrix; subtracting the mean vector from the two-dimensional key point matrix to obtain a standardized key point matrix; transposing the standardized key point matrix to obtain its transpose; calculating the covariance matrix corresponding to the two-dimensional key point matrix based on the transpose, the standardized key point matrix and the number of face sample images in the face pose sample data set; performing eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues, where the eigenvectors represent the change direction of each sample key point of the standardized set in each principal component space and the eigenvalues represent the weight coefficients of the eigenvectors; sorting the eigenvalues in descending order to obtain a sorting result; extracting a plurality of target eigenvalues from the sorting result and identifying the target eigenvectors corresponding to them; performing visual analysis on each target eigenvector to obtain a visual analysis result, which represents the change direction of the principal component space corresponding to the target eigenvector; and selecting, from the target eigenvectors according to the visual analysis result, a yaw angle feature vector representing the change direction of the face yaw angle and a pitch angle feature vector representing the change direction of the face pitch angle.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the key point coordinates in the face key point standardization set are constructed into the two-dimensional key point matrix according to the specific rule, and the mean value vector of the two-dimensional key point matrix is calculated and used for subsequent standardization processing. Subtracting the mean vector from the two-dimensional key point matrix to obtain a standardized key point matrix so as to eliminate deviation in the data. Based on the standardized key point matrix, calculating to obtain a covariance matrix corresponding to the two-dimensional key point matrix for subsequent eigenvalue decomposition. And carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues, sorting the eigenvalues according to the size, and selecting eigenvectors corresponding to the first several eigenvalues as target eigenvectors. And carrying out visual analysis on the target feature vector, and displaying the change directions of the principal component space, including the change directions of the yaw angle and the pitch angle. Therefore, principal component analysis can help reduce the data dimension, extract the main characteristic which can represent the data change most, and help understand and analyze the change of the face gesture. Through eigenvalue decomposition and eigenvector selection, redundant information in the data can be reduced, the most important information is reserved, and the effectiveness and the interpretability of the data are improved. The main direction of the face posture change can be intuitively displayed through the visual analysis result, and the method is convenient for further research and application.
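A minimal numpy sketch of the decomposition steps above (mean removal, covariance, eigen-decomposition, sorting); the matrix shape, variable names and the number of retained components are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def pca_on_keypoints(X: np.ndarray, n_components: int = 5):
    """X: (n_samples, 136) matrix of flattened, aligned 68-point landmarks.

    Returns the mean vector, the leading eigenvectors (one per column)
    and their eigenvalues, sorted in descending order of eigenvalue.
    """
    mean_vec = X.mean(axis=0)
    X_centered = X - mean_vec                      # standardized key point matrix
    cov = X_centered.T @ X_centered / X.shape[0]   # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]              # largest eigenvalues first
    return mean_vec, eigvecs[:, order[:n_components]], eigvals[order[:n_components]]

# Each retained eigenvector would then be inspected (e.g. by perturbing the mean
# shape along it and plotting) to see which one tracks yaw and which tracks pitch.
```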
In an optional implementation manner, the yaw angle projection coefficient and the pitch angle projection coefficient of the face image to be detected are calculated by using the features of the key points in the standard key point set, including: calculating the difference between the standard key point set and the mean vector; multiplying the difference by the yaw angle feature vector to obtain the yaw angle projection coefficient of the face image to be detected; and multiplying the difference by the pitch angle feature vector to obtain the pitch angle projection coefficient of the face image to be detected.
According to the face posture prediction method for the driver, provided by the embodiment of the invention, the key point data and the mean value vector of the face image to be detected are subtracted to obtain the difference value, and the difference value is multiplied by the yaw angle feature vector extracted in advance to obtain the yaw angle projection coefficient of the face image to be detected, so that the deviation degree of the face image to be detected in the yaw angle direction is reflected. And multiplying the difference value by the pitch angle characteristic vector to obtain a pitch angle projection coefficient of the face image to be detected, and reflecting the offset degree of the face image to be detected in the pitch angle direction. Therefore, the human face can be more accurately positioned and identified by calculating the yaw angle and pitch angle projection coefficients of the human face image to be detected, and personalized human face feature extraction is realized. By using the calculation method of the feature vector and the projection coefficient, the human face gesture can be simply and efficiently described and analyzed, and the calculation efficiency and accuracy are improved. The feature vector obtained by the principal component analysis can better capture the main change information in the face data, and is helpful for accurately and effectively representing the change of the yaw angle and the pitch angle of the face.
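A minimal sketch of this projection step, assuming mean_vec, yaw_vec and pitch_vec were obtained from a principal component analysis like the one sketched above (all names are illustrative assumptions):

```python
import numpy as np

def projection_coefficients(keypoints: np.ndarray,
                            mean_vec: np.ndarray,
                            yaw_vec: np.ndarray,
                            pitch_vec: np.ndarray) -> tuple[float, float]:
    """keypoints: flattened standard key point set of one face, shape (136,)."""
    diff = keypoints - mean_vec              # difference from the mean shape
    yaw_coeff = float(diff @ yaw_vec)        # projection onto the yaw eigenvector
    pitch_coeff = float(diff @ pitch_vec)    # projection onto the pitch eigenvector
    return yaw_coeff, pitch_coeff
```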
In an alternative embodiment, before calculating the yaw angle of the face in the face image to be measured, the method further includes: determining a first calculation result obtained by multiplying the standardized key point matrix and the yaw angle feature vector as a sample yaw angle projection coefficient; wherein the sample yaw angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a yaw angle feature space; according to a preset yaw angle polynomial model, a yaw angle reference attitude angle and a sample yaw angle projection coefficient, calculating to obtain a yaw angle fitting parameter; and calculating to obtain a first mapping relation according to the yaw angle fitting parameter and the yaw angle polynomial model.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the standardized key point matrix and the yaw angle feature vector are multiplied to obtain a first calculation result, namely a sample yaw angle projection coefficient. And calculating fitting parameters of the yaw angle based on a preset yaw angle polynomial model, a yaw angle reference attitude angle and a sample yaw angle projection coefficient. These parameters can be used to fit the change in the face image in the yaw angle direction. And calculating a first mapping relation by using the yaw angle fitting parameters and the yaw angle polynomial model, wherein the first mapping relation is used for mapping the sample yaw angle projection coefficient to the actual yaw angle. Therefore, the standard key point matrix is multiplied by the yaw angle feature vector to obtain the sample yaw angle projection coefficient, and the projection coefficient can effectively represent the change of the face image in the yaw angle direction, so that a foundation is provided for the subsequent fitting parameter calculation. Through a preset yaw angle polynomial model and a yaw angle reference attitude angle, flexible parameter setting and adjustment can be performed according to actual requirements so as to meet the accuracy requirements of yaw angles in different scenes. By calculating the yaw angle fitting parameter and the first mapping relation, the sample yaw angle projection coefficient can be mapped to the actual yaw angle measurement, so that the accuracy and the precision of yaw angle measurement are improved.
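A sketch of how such a fit could be computed with an ordinary least-squares polynomial fit. The cubic degree is stated later in the text; the way the reference attitude angle enters (modeled here as a simple offset) and all names are assumptions of this illustration.

```python
import numpy as np

def fit_angle_mapping(proj_coeffs: np.ndarray,
                      angles: np.ndarray,
                      reference_angle: float,
                      degree: int = 3) -> np.poly1d:
    """Fit a polynomial mapping from projection coefficients to pose angles.

    proj_coeffs: per-sample projection coefficients on one eigenvector.
    angles: the annotated sample angles, taken relative to the reference angle.
    """
    coeffs = np.polyfit(proj_coeffs, angles - reference_angle, deg=degree)
    return np.poly1d(coeffs)

# yaw_mapping would then play the role of the "first mapping relation":
# yaw_mapping = fit_angle_mapping(sample_yaw_proj, sample_yaw_angles, yaw_ref_angle)
```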
In an alternative embodiment, before calculating the pitch angle of the face in the face image to be measured, determining the second mapping relationship between the pitch angle projection coefficient and the face attitude angle includes: determining a second calculation result obtained by multiplying the standardized key point matrix and the pitch angle characteristic vector as a sample pitch angle projection coefficient; the sample pitch angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a pitch angle characteristic space; calculating to obtain a pitch angle fitting parameter according to a preset pitch angle polynomial model, a pitch angle reference attitude angle and a sample pitch angle projection coefficient; and calculating to obtain a second mapping relation according to the pitch angle fitting parameter and the pitch angle polynomial model.
According to the face pose prediction method for the driver, provided by the embodiment of the invention, the standardized key point matrix and the pitch angle characteristic vector are multiplied to obtain a second calculation result, namely a sample pitch angle projection coefficient. And calculating to obtain fitting parameters of the pitch angle based on a preset pitch angle polynomial model, a pitch angle reference attitude angle and a sample pitch angle projection coefficient. And calculating to obtain a second mapping relation by using the pitch angle fitting parameters and the pitch angle polynomial model, wherein the second mapping relation is used for mapping the sample pitch angle projection coefficient to the actual pitch angle measurement. Therefore, the standard key point matrix is multiplied by the pitch angle characteristic vector to obtain the sample pitch angle projection coefficient, and the projection coefficient can effectively represent the change of the face image in the pitch angle direction, so that a foundation is provided for the subsequent fitting parameter calculation. Through a preset pitch angle polynomial model and a pitch angle reference attitude angle, flexible parameter setting and adjustment can be performed according to actual requirements so as to meet the accuracy requirements of the pitch angle in different scenes. The sample pitch angle projection coefficient can be mapped to the actual pitch angle measurement by calculating the pitch angle fitting parameter and the second mapping relation, so that the precision and accuracy of pitch angle measurement are improved.
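Under the same assumptions, the pitch-angle mapping would be fitted symmetrically, for example:

```python
# Sketch only: reuses fit_angle_mapping from the yaw example above.
# pitch_mapping = fit_angle_mapping(sample_pitch_proj, sample_pitch_angles, pitch_ref_angle)
# pitch = pitch_mapping(pitch_coeff) + pitch_ref_angle   # the "second mapping relation"
```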
In an alternative embodiment, the yaw angle polynomial model and the pitch angle polynomial model are both cubic polynomial models.
According to the face posture prediction method for the driver, which is provided by the embodiment of the invention, the three-time polynomial function is adopted to describe the posture change of the face in the two directions when the yaw angle and the pitch angle are modeled, so that the three-time polynomial model has higher flexibility, and complex posture change can be fitted well. For describing the change of the human face in yaw angle and pitch angle, the three-degree polynomial model can be used for more accurately capturing the fine change of the human face posture. Meanwhile, as the curvature and inflection points of the curve are considered in the cubic polynomial model, the gesture change of the face under different angles can be better described, so that the model is more in line with the actual situation, and the measurement accuracy of the gesture angle is improved.
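For reference, a cubic model of the kind described here has the generic form (a sketch; $c_0,\ldots,c_3$ are the fitting parameters determined as described above and $p$ stands for the projection coefficient):

$\theta(p) = c_3 p^3 + c_2 p^2 + c_1 p + c_0$

with one set of coefficients fitted for the yaw mapping and another for the pitch mapping.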
In a second aspect, the present invention provides a face pose prediction apparatus for a driver, including: a first acquisition module, configured to acquire a face image to be detected; an identification module, configured to identify a plurality of key points in the face image to be detected to obtain a key point set; an alignment module, configured to perform alignment processing on the face image to be detected by using the pre-configured reference key point set and the key point set to obtain a processed standard key point set; a first determination module, configured to determine a roll angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set; a first calculation module, configured to calculate a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by using the features of the key points in the standard key point set; a second calculation module, configured to calculate the yaw angle of the face in the face image to be detected according to the yaw angle projection coefficient and the first mapping relation between the yaw angle projection coefficient and the face pose angle, and to calculate the pitch angle of the face in the face image to be detected according to the pitch angle projection coefficient and the second mapping relation between the pitch angle projection coefficient and the face pose angle; and a second determination module, configured to determine the roll angle, the yaw angle and the pitch angle as the face pose prediction result of the face image to be detected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a face pose prediction method for a driver according to an embodiment of the present invention;
Fig. 2 is a flowchart of another face pose prediction method for a driver according to an embodiment of the present invention;
Fig. 3 is a schematic illustration of the 68 face keypoints according to an embodiment of the invention;
Fig. 4 is a flowchart of yet another face pose prediction method for a driver according to an embodiment of the present invention;
Fig. 5 is a block diagram of a face pose prediction apparatus for a driver according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With the continuous development of the artificial intelligence field, related technologies such as automatic driving and auxiliary driving are also advancing continuously. The driver state monitoring function has been applied to some vehicle types, and when the driver state monitoring function is implemented, it is often necessary to estimate the facial pose of the driver, so as to analyze the current driving state of the driver.
Existing face pose estimation methods fall mainly into three types. The first calculates the face pose from the mapping between the three-dimensional coordinates of a face model and the two-dimensional coordinates of the face key points. The second directly regresses the face pose angles from the input face image with a neural network. The third projects the face image into each principal component analysis pose space and takes the pose of the space with the closest projection coefficients as the face pose of the image. All three methods have limitations: the first relies on a standard face model and on the face key points, and positioning errors can cause large errors in the estimated pose; the second builds a neural network model and trains it on a large number of face images with known poses to learn the mapping and determine the face pose, which requires a large number of training images and computing resources and places high demands on the deployment platform; the third relies on the whole pixel information of the image, so the computation dimension is high, the pose space is discontinuous, and a large number of face image samples with different poses are needed.
In view of the above, the technical scheme of the invention can work directly with a state-of-the-art open-source face key point detection model and estimate the face pose from the key points output by that model, saving complex intermediate matching steps and post-processing. Principal component analysis is applied to the key point sets to obtain projection coefficients for the different principal component spaces; the projection coefficients of the relevant spaces are then fitted against the face pose angles to obtain the corresponding mapping relations, which are used to estimate the face pose, avoiding the cost of additional data collection, labeling and model training. Because face key points replace the image pixels used in traditional methods, the calculation dimension is greatly reduced, and the method is simple, efficient, easy to deploy and widely applicable. It reduces the amount of computation, avoids interference from image blur, illumination changes and partial occlusion of the pixel information, and improves the accuracy and reliability of face pose estimation provided that the feature points are detected accurately.
According to an embodiment of the present invention, there is provided an embodiment of a face pose prediction method for a driver, it should be noted that the steps shown in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order different from that herein.
In this embodiment, a face pose prediction method of a driver is provided, which may be used for a vehicle, and fig. 1 is a flowchart of a face pose prediction method of a driver according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
step S101, obtaining a face image to be detected.
In a driver monitoring system or a vehicle safety system, a face image is acquired from a camera or other device for identifying, monitoring or evaluating the status of the driver. Specifically, the face image of the driver may be captured by an in-vehicle camera or a built-in camera.
Step S102, a plurality of key points in the face image to be detected are identified, and a key point set is obtained.
The key point set is a key point set which is formed by arranging the obtained coordinates of a plurality of key points and is used for representing the structural characteristics and the morphological information of the human face. Specifically, a face detection algorithm is used to determine the position of a face in an image, and a plurality of keypoints, such as the position of eyes, the position of nose, the position of mouth, etc., are identified and located on the detected face, which typically requires locating the keypoints by means of a face keypoint detection algorithm, such as a neural network-based keypoint detection model. And arranging the obtained key point coordinates into a key point set.
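As one possible realization of this step (the patent does not name a specific detector, so the choice of dlib's 68-point landmark model and the model file path below are assumptions), the key point set can be produced as follows:

```python
import dlib
import numpy as np

# Model path is an assumption; any 68-point landmark detector would do.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def keypoint_set(gray_image: np.ndarray) -> np.ndarray:
    """Return a (68, 2) array of (x, y) key points for the first detected face."""
    faces = detector(gray_image, 1)
    if not faces:
        raise ValueError("no face detected")
    shape = predictor(gray_image, faces[0])
    return np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)], dtype=float)
```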
Step S103, aligning the face image to be detected by using the pre-configured reference key point set and the key point set, to obtain a processed standard key point set.
The reference key point set is a set of key point coordinates which are defined in advance or marked in advance, and is used as a standard key point position in tasks such as face recognition, face alignment and the like. These reference keypoint sets may be predetermined for a particular task or a particular data set to be used as a benchmark for comparison or alignment.
The standard key point set is a set obtained after alignment processing is carried out on the face image to be detected. Specifically, according to a pre-configured reference key point set and a key point set in the face image to be detected, a transformation matrix may be calculated to describe how to transform the key points in the face image to be detected to align to the reference key point positions. And applying the calculated transformation matrix to a key point set on the face image to be detected, and performing corresponding transformation operation to align the key points to specified standard positions. After alignment treatment, a treated standard key point set can be obtained, and the set can be used for subsequent tasks such as face recognition, feature extraction and the like.
Step S104, determining the rolling angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set.
The roll angle (Roll) is the rotation angle of the face with respect to the horizontal direction, i.e., the angle at which the head is tilted left or right. Specifically, the roll angle corresponding to the face image to be detected is calculated from the relation between the reference key point set and the corresponding key points in the key point set. For example, the rotation angle of the face relative to the horizontal line, i.e., the roll angle, can be obtained from the transformation matrix calculated from the pre-configured reference key point set and the key point set of the face image to be detected. Alternatively, given the position of a key point A in the reference key point set, the position of the corresponding key point A can be found in the key point set of the face image to be detected; comparing the two positions gives the horizontal offset between them, and the roll angle can be inferred from the horizontal offset of the eyes.
Step S105, calculating to obtain a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by utilizing the characteristics of the key points in the standard key point set.
The yaw angle projection coefficient is used for calculating the yaw angle of the face, and the pitch angle projection coefficient is used for calculating the pitch angle of the face. Specifically, through the key point related features in the standard key point set and the key point related features in the reference key point set, the yaw angle projection coefficient and the pitch angle projection coefficient of the face image to be detected can be calculated.
Step S106, according to the yaw angle projection coefficient and the first mapping relation between the yaw angle projection coefficient and the face posture angle, the yaw angle of the face in the face image to be detected is calculated, and according to the pitch angle projection coefficient and the second mapping relation between the pitch angle projection coefficient and the face posture angle, the pitch angle of the face in the face image to be detected is calculated.
The yaw angle (Yaw) and the pitch angle (Pitch) describe the left-right rotation of the face in the horizontal plane and the up-down rotation in the vertical plane, respectively.
The first mapping relation is a corresponding relation between a yaw angle projection coefficient obtained through training data and a face posture angle. The second mapping relation is the corresponding relation between the pitch angle projection coefficient obtained through training data and the face attitude angle. Specifically, after the yaw angle projection coefficient is obtained, the yaw angle of the face in the face image to be detected can be calculated by combining the first mapping relation. And after the pitch angle projection coefficient is obtained, the pitch angle of the face in the face image to be detected can be calculated by combining the second mapping relation.
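Putting steps S105 and S106 together, prediction could look like the following sketch, reusing the PCA vectors and the fitted polynomial mappings from the earlier training sketches (all names, and the additive use of the reference angles, are illustrative assumptions):

```python
import numpy as np

def predict_yaw_pitch(std_keypoints: np.ndarray,
                      mean_vec: np.ndarray,
                      yaw_vec: np.ndarray, pitch_vec: np.ndarray,
                      yaw_mapping, pitch_mapping,
                      yaw_ref: float, pitch_ref: float) -> tuple[float, float]:
    """std_keypoints: flattened standard key point set of the face to be detected."""
    diff = std_keypoints - mean_vec
    yaw_coeff = float(diff @ yaw_vec)                       # step S105
    pitch_coeff = float(diff @ pitch_vec)
    yaw = float(yaw_mapping(yaw_coeff)) + yaw_ref           # step S106, first mapping
    pitch = float(pitch_mapping(pitch_coeff)) + pitch_ref   # step S106, second mapping
    return yaw, pitch
```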
Step S107, determining the roll angle, the yaw angle and the pitch angle as the face pose prediction result of the face image to be detected.
Roll angle, yaw angle and pitch angle are used to describe the rotation of a face in three dimensions. Specifically, after the roll angle, yaw angle and pitch angle are determined, the roll angle, yaw angle and pitch angle are determined as the face pose prediction result of the face image to be detected.
According to the face pose prediction method for the driver, after the face image to be detected is obtained, it is analyzed and a plurality of key points in it are identified; these key points reflect different features and positions of the face. Alignment processing is performed using a pre-configured reference key point set and the key point set of the face to be detected to obtain a standard key point set, so as to eliminate deviation caused by the pose of the face and make the subsequent calculation more accurate. The roll angle of the face image to be detected, i.e., the angle at which the head is tilted left or right, is determined by comparing the relation between the reference key point set and the key point set of the face image to be detected. The yaw angle and pitch angle projection coefficients of the face image to be detected are calculated using the features of the key points in the standard key point set. The yaw angle and pitch angle of the face in the face image to be detected are then calculated from these projection coefficients and their mapping relations to the face pose angle. Finally, the calculated roll angle, yaw angle and pitch angle serve as the face pose prediction result of the face image to be detected. In this way the calculation dimension is greatly reduced, interference from image blur, illumination changes and partial occlusion of the pixel information is avoided, and the accuracy and reliability of face pose prediction are improved provided that the feature points are detected accurately; the method is simple, efficient and widely applicable.
In this embodiment, a face pose prediction method of a driver is provided, which may be used for a vehicle, and fig. 2 is a flowchart of a face pose prediction method of a driver according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
Step S201, a face gesture sample data set is obtained, wherein the face gesture sample data set comprises a plurality of face sample images marked with face key points and face gestures, the face key points correspond to different parts in a face, and the different parts are described through different numbers of face key points.
The face pose sample data set is a data set containing marked face pose information, and comprises a large number of face sample images, wherein each image marks the key point positions of the faces and the corresponding face pose information. Specifically, the face pose sample data set may be acquired by an academic research institution or a data sharing platform.
For example, the face pose sample data set is downloaded through the data sharing platform, and the face sample data set contains more than 600 face images with poses and expression diversity of multiple people collected from different angles under multiple illumination conditions, and 68 face key points and corresponding poses are marked, as shown in fig. 3.
Step S202, extracting a sample key point labeling set and a sample gesture angle labeling set of a plurality of face sample images from a face gesture sample data set; the sample key point labeling set is used for representing coordinate sets of a plurality of face key points of a plurality of face sample images, and the sample attitude angle labeling set is used for representing sample pitch angles and sample yaw angle sets corresponding to a plurality of face attitudes of the plurality of face sample images.
The sample keypoint labeling set represents the positions of keypoints in the face image, such as the coordinate positions of eyes, nose, mouth and the like. The sample attitude angle annotation set represents the pitch angle and yaw angle of the face, namely the orientation angle of the face in the three-dimensional space. Specifically, reading an image file and a corresponding labeling file in a face gesture sample data set, analyzing a key point labeling and a gesture angle labeling of each image, and sorting the key point labeling and the gesture angle labeling into a set form so that the key point labeling and the gesture angle labeling correspond to each face image.
For example, the face key points are typically defined using the 68 keypoints shown in fig. 3, which can describe the various parts and poses of the face. Each face picture in the dataset corresponds to a set of face key point annotations $s = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{68}, y_{68})\}$ representing the 68 face keypoints, where $(x_i, y_i)$ are the horizontal-axis and vertical-axis coordinates of the $i$-th keypoint in the 2D face image. Assuming the dataset contains $N$ face images, the keypoint annotation of the dataset is $S = \{s_1, s_2, \ldots, s_N\}$, where $s_i$ is the set of keypoints corresponding to the $i$-th face picture.
For another example, the face pose is represented mainly by three pose angles: the pitch angle $\theta_p$ (Pitch), the yaw angle $\theta_y$ (Yaw) and the roll angle $\theta_r$ (Roll), which describe the orientation of the face in the three-dimensional coordinate system. Each face picture in the dataset corresponds to a set of pose angle annotations $a_i = (\theta_{p,i}, \theta_{y,i})$, where $\theta_{p,i}$ is the pitch angle and $\theta_{y,i}$ the yaw angle of the $i$-th face image. Assuming the dataset contains $N$ face images, the face pose angle annotation of the training set is $A = \{\Theta_p, \Theta_y\}$, where $\Theta_p = \{\theta_{p,1}, \ldots, \theta_{p,N}\}$ is the set of pitch angle annotations and $\Theta_y = \{\theta_{y,1}, \ldots, \theta_{y,N}\}$ the set of yaw angle annotations of the face images in the dataset.
Step S203, aiming at specific key points in the key points of the human face, taking a first coordinate average value of a plurality of sample coordinates corresponding to the specific key points included in the sample key point labeling set, and determining the first coordinate average value as the reference key point coordinates of the specific key points; the specific key point is any point in the key points of the human face.
Any one of the face keypoints is selected as a specific keypoint, for example, a B-keypoint of 68 keypoints, and all sample coordinates containing the selected B-keypoint are found from the sample keypoint labeling set, and correspond to the B-keypoint positions in different face images. And calculating a first coordinate average value for a plurality of coordinates of the B key point in each sample, and determining the coordinate corresponding to the first coordinate average value as the reference key point coordinate of the B key point.
Step S204, taking an angle average value of a plurality of face gesture angles corresponding to a plurality of face sample images included in the sample gesture angle annotation set, and determining the angle average value as a reference gesture angle of the face gesture sample data set; wherein the reference attitude angles include a yaw angle reference attitude angle and a pitch angle reference attitude angle.
And acquiring yaw angle and pitch angle data corresponding to the face sample images from the sample attitude angle annotation set, wherein the yaw angle and pitch angle data represent the orientation angles of different face samples. And respectively calculating the corresponding yaw angle and pitch angle of each face sample image, and respectively averaging the yaw angles and the pitch angles of all the samples to obtain a yaw angle average value and a pitch angle average value. And determining the yaw angle average value and the pitch angle average value as reference attitude angles of the face attitude sample data set.
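Steps S203 and S204 reduce to coordinate-wise and angle-wise averaging; a minimal numpy sketch, with array shapes assumed for illustration:

```python
import numpy as np

def build_reference(keypoint_labels: np.ndarray, pose_labels: np.ndarray):
    """keypoint_labels: (N, 68, 2) annotated key points of N face sample images.
    pose_labels: (N, 2) annotated (pitch, yaw) angles of the same images.
    """
    reference_keypoints = keypoint_labels.mean(axis=0)   # (68, 2) first coordinate averages
    pitch_ref, yaw_ref = pose_labels.mean(axis=0)        # reference pose angles
    return reference_keypoints, pitch_ref, yaw_ref
```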
In step S205, a plurality of reference keypoint coordinates and reference attitude angles are determined as a reference keypoint set.
The reference key point set is a set formed by arranging a plurality of reference key point coordinates and reference attitude angles. Specifically, the calculated coordinates of the plurality of reference key points and the calculated reference attitude angles are arranged into a set, so that each key point corresponds to a specific coordinate value, and each angle corresponds to the associated attitude angle one by one. And taking the tidied reference key point coordinates and the reference attitude angles as a part of a data set to form a set containing all the reference key points and the reference attitude angles.
And determining the face image corresponding to the reference key point set as a reference face image. For example, the reference face image may include 68 keypoints with a first coordinate average and a reference pose angle.
The face pose prediction method for the driver provided by the embodiment of the invention collects a plurality of face sample images containing labeled face key points and face poses before acquiring the face image to be detected. And extracting sample key point labeling sets and sample gesture angle labeling sets of a plurality of face sample images from the face gesture sample data set. And taking an average value of a plurality of sample coordinates corresponding to the specific key point from the sample key point labeling set aiming at the specific key point in the face key points, and determining the average value as the reference key point coordinate of the specific key point. Taking an average value of the attitude angles of a plurality of face sample images from the sample attitude angle annotation set, and determining the average value as a reference attitude angle of the face attitude sample data set, wherein the reference attitude angle comprises a yaw angle and a pitch angle. And determining a plurality of reference key point coordinates and reference gesture angles as a reference key point set for subsequent face gesture prediction, so that the accurate prediction of the gesture of the face image to be detected can be improved by establishing the reference key point set and the reference gesture angles, and errors caused by sample differences are reduced. By processing the sample data set and extracting the key information, the influence of noise in the data on the prediction result can be effectively reduced, and the stability of the algorithm is improved. After the reference key point set and the attitude angle are determined, the complexity of subsequent calculation can be simplified, and the efficiency and practicality of the algorithm are improved.
Step S206, preprocessing a plurality of face sample images in the face gesture sample data set according to the reference key point set to eliminate scale and position differences among the face sample images, and obtaining a face key point standardized set.
The face key point standardization set is a set obtained by adjusting the scale and the position of each face sample image and then sorting the processed key point coordinates. Specifically, because the face pose sample data set contains face images with various scales and positions, the face sample images need to be preprocessed to eliminate scale and position differences between the face sample images, so that the model can concentrate on learning the characteristics of the face pose.
For example, pose changes between different faces may be normalized to a uniform reference pose using Procrustes analysis (Procrustes Analysis), comprising the following steps (a minimal code sketch follows step a5 below):
And a1, carrying out normalization processing on all the coordinates of the key points of the face sample images in the sample key point labeling set, namely subtracting the corresponding average value of the first coordinates from the coordinates of the key points of each face sample image.
And a2, calculating the average value of all key points in the key point coordinate set of each face sample image, namely calculating the barycenter coordinate. And comparing the mass center of each face key point set with the mass center of the reference face image, and calculating the offset, namely the difference value between the mass center of the reference face and the mass center of the current face. And using the calculated offset to align and adjust the coordinates of each key point of the current face to the mass center of the reference face so as to enable the mass centers to coincide.
And a3, calculating the rotation angle between each face key point set and the reference face image, and correspondingly rotating all key points to enable the key points to be consistent with the direction of the reference face. The rotation angle from the face key point set of each image to the reference face key point set is obtained by using the least square method, i.e. by minimizing

E(a, b) = Σ_i [ (a·x_i − b·y_i − x̄_i)² + (b·x_i + a·y_i − ȳ_i)² ]    (1)

where a and b represent the parameters of the rotation variation, with a = k·cos θ and b = k·sin θ; x̄_i and ȳ_i represent the x coordinate mean value and the y coordinate mean value of key point i in the data set; and k is a scale coefficient that does not influence the rotation angle. Setting the partial derivatives of (1) with respect to a and b to zero yields the target rotation variation parameters, and performing an inverse trigonometric function operation on them gives the rotation angle θ = arctan(b / a).
And a4, calculating the scale difference between each face key point set and the reference face image, and correspondingly scaling all key points to enable the scale of the key points to be matched with the reference face.
In step a5, the alignment process is optimized in an iterative manner, and common optimization methods include a least squares method or an iterative closest point algorithm.
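As a sketch of steps a1 to a4 (the iterative refinement of step a5 is omitted), assuming each face is stored as a (68, 2) NumPy array; the function name and the closed-form least-squares rotation and scale are illustrative assumptions, not necessarily the exact form used in the embodiment.

```python
import numpy as np

def procrustes_align(points, ref_points):
    """Align one (68, 2) key point set to the reference set (steps a1-a4)."""
    # a1/a2: subtract each set's centroid so both are centered at the origin
    p = points - points.mean(axis=0)
    q = ref_points - ref_points.mean(axis=0)
    # a3: least-squares rotation angle between the two centered sets
    #     (closed-form solution of the objective in equation (1))
    num = np.sum(p[:, 0] * q[:, 1] - p[:, 1] * q[:, 0])
    den = np.sum(p[:, 0] * q[:, 0] + p[:, 1] * q[:, 1])
    theta = np.arctan2(num, den)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    p_rot = p @ rot.T
    # a4: least-squares scale factor that matches the reference scale
    scale = np.hypot(num, den) / np.sum(p ** 2)
    return p_rot * scale, theta
```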
Step S207, principal component analysis is carried out on the face key point standardization set, and the corresponding relation between the change direction of the principal component space corresponding to each feature vector and the face posture change is obtained.
Principal component analysis (Principal Components Analysis, PCA) is performed on the standardized set of face keypoints, which can extract the main directions of change of the keypoints, thereby reducing the dimensions of the keypoints while preserving the key information.
The principal component analysis is performed on the key point information of the human face to find the corresponding relation between the change direction of the principal component space corresponding to each feature vector and the change of the human face posture.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, a plurality of reference key point coordinates and reference gesture angles are determined as a reference key point set and used as standard reference points for subsequent processing. And preprocessing a plurality of face sample images in the face gesture sample data set according to the reference key point set. And obtaining a standardized set of face key points through the preprocessed face sample image. And carrying out principal component analysis on the face key point standardization set so as to explore the main change direction in the data set. Therefore, the preprocessing process can effectively eliminate the scale and position difference between different face samples, so that the subsequent analysis is more accurate and reliable. By obtaining the standardized set of the key points of the human face, the key point positions among different human face samples can be in the same scale and position, and the comparison and analysis of subsequent data are facilitated. Principal component analysis can help reveal the primary direction of change in the dataset, thereby better understanding the features and correlation of face pose changes, providing beneficial information for subsequent pattern recognition and classification tasks.
Specifically, the step S207 includes:
Step S2071, constructing a two-dimensional key point matrix according to the face key point standardized set.
And constructing a two-dimensional key point matrix according to the face key point standardized set, and arranging all face key point coordinates into a two-dimensional matrix according to a certain rule.
Specifically, the face key points in the data set after the Procrustes analysis are stored in a two-dimensional matrix X ∈ R^{n×m}, where n is the number of face sample images in the face gesture sample data set and m is the number of face key points multiplied by the coordinate dimension of the key points, e.g. m = 68 × 2 = 136. Each row vector of the matrix X represents a face sample, and each column vector represents the abscissa or ordinate of a key point.
Step S2072, calculating the mean vector of the two-dimensional key point matrix; subtracting the mean vector from the two-dimensional key point matrix to obtain a standardized key point matrix.
And calculating the average value vector of the two-dimensional key point matrix, and subtracting the average value vector from the two-dimensional key point matrix to obtain a standardized key point matrix so as to eliminate the offset of the data.
Specifically, the key point coordinates are normalized so that all key points are in the same scale range. The mean vector μ of the matrix X is computed, where each component of μ represents the average value of the key points in the corresponding dimension:

μ = (1/n) Σ_{j=1}^{n} X_j

where X_j is the j-th row of X. Subtracting the mean vector μ from every row of the matrix X gives the standardized key point matrix X_c = X − 1μᵀ.
Step S2073, transpose the standardized key point matrix to obtain a transposed matrix of the standardized key point matrix.
And transposing the standardized key point matrix to obtain a transposed matrix of the standardized key point matrix for subsequent calculation. Specifically, the standardized key point matrix X_c is transposed to obtain X_cᵀ.
Step S2074, calculating to obtain a covariance matrix corresponding to the two-dimensional key point matrix based on the transpose matrix, the standardized key point matrix and the number of the plurality of face sample images in the face gesture sample data set.
Based on the transpose matrix, the standardized key point matrix and the number of a plurality of face sample images in the face gesture sample data set, a covariance matrix corresponding to the two-dimensional key point matrix is calculated and obtained and used for analyzing the correlation among data.
Specifically, the covariance matrix of the data matrix is calculated as C = X_cᵀ X_c / (n − 1), which represents the covariance relationships between the key points.
S2075, performing eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues; the feature vector is used for representing the change direction of each sample key point in the face key point standardization set in each principal component space; the eigenvalues are used to characterize the weight coefficients of the individual eigenvectors.
And carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues, wherein the eigenvectors and eigenvalues are used for representing the main change direction and weight of the data.
Specifically, eigenvalue decomposition is performed on the covariance matrix C to obtain the eigenvectors v_1, v_2, …, v_m and the corresponding eigenvalues λ_1, λ_2, …, λ_m, satisfying C v_i = λ_i v_i. The eigenvector v_i represents the direction of change of the key point data in the i-th principal component space, and the eigenvalue λ_i represents the importance of the i-th principal component, i.e. the weight coefficient of the eigenvector v_i.
Step S2076, sorting the characteristic values in the order from big to small to obtain a sorting result.
The eigenvalues are ordered by size for subsequent selection of the principal eigenvectors. Specifically, the eigenvalues λ_i are sorted from largest to smallest, and the first d eigenvalues are selected such that their sum accounts for more than 99% of the sum of all eigenvalues, i.e. (Σ_{i=1}^{d} λ_i) / (Σ_{i=1}^{m} λ_i) > 0.99.
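Steps S2071 to S2076 can be sketched as follows, assuming the aligned faces are stacked into an (n, 136) array; the variable names are hypothetical and the 99% energy threshold follows the text above.

```python
import numpy as np

def pca_on_keypoints(aligned_faces, energy=0.99):
    """aligned_faces: (n, 136) rows of flattened, Procrustes-aligned key points."""
    X = aligned_faces                                  # S2071: 2-D key point matrix
    mu = X.mean(axis=0)                                # S2072: mean vector
    Xc = X - mu                                        # standardized key point matrix
    n = X.shape[0]
    cov = Xc.T @ Xc / (n - 1)                          # S2073/S2074: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # S2075: eigenvalue decomposition
    order = np.argsort(eigvals)[::-1]                  # S2076: sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    d = int(np.searchsorted(cum_ratio, energy) + 1)    # keep >= 99% of the variance
    return mu, Xc, eigvals[:d], eigvecs[:, :d]
```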
Step S2077, extracting a plurality of target feature values from the sorting result, and confirming target feature vectors corresponding to the plurality of target feature values.
A plurality of target feature values are extracted from the ranking result, and the target feature vectors corresponding to these feature values are identified; generally the feature vectors corresponding to the largest feature values are selected. Specifically, the feature vectors v_1, v_2, …, v_d corresponding to the first d feature values are determined as the target feature vectors.
Step S2078, performing visual analysis on each target feature vector to obtain a visual analysis result; the visual analysis result is used for representing the change direction of the principal component space corresponding to the target feature vector.
And carrying out visual analysis on the target feature vectors to obtain a visual result, which helps to understand the data change direction corresponding to each feature vector. Specifically, the feature vectors corresponding to the first d feature values are inspected visually one by one, and the change direction of the feature space corresponding to each feature vector is observed.
Step S2079, selecting a yaw angle characteristic vector for representing the change direction of the yaw angle of the face and a pitch angle characteristic vector for representing the change direction of the pitch angle of the face from target characteristic vectors according to the visual analysis result.
And selecting feature vectors for representing the change direction of the facial gestures, such as the yaw angle feature vector and the pitch angle feature vector, according to the visual analysis result, so as to facilitate subsequent gesture analysis and recognition. Specifically, two feature vectors are selected from the sorting result: the feature vector representing the change direction of the yaw angle of the face is recorded as v_yaw, and the feature vector representing the change direction of the pitch angle of the face is recorded as v_pitch.
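One way to carry out the visual analysis of step S2078 (purely illustrative; the embodiment does not prescribe a plotting tool) is to deform the mean shape along each candidate eigenvector, plot the result, and record by hand which component corresponds to yaw and which to pitch:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_component(mu, eigvec, idx, amplitude=3.0):
    """Plot the mean shape deformed along one principal component to judge its meaning."""
    for alpha, marker in [(-amplitude, "x"), (0.0, "o"), (amplitude, "+")]:
        shape = (mu + alpha * eigvec).reshape(-1, 2)
        plt.scatter(shape[:, 0], shape[:, 1], marker=marker, label=f"alpha={alpha:+.0f}")
    plt.gca().invert_yaxis()            # image coordinates: y grows downward
    plt.legend()
    plt.title(f"Principal component {idx}")
    plt.show()

# After inspection, record the components that visually sweep the face
# left-right (yaw) and up-down (pitch), e.g.:
# v_yaw, v_pitch = eigvecs[:, 1], eigvecs[:, 2]   # indices are hypothetical
```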
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the key point coordinates in the face key point standardization set are constructed into the two-dimensional key point matrix according to the specific rule, and the mean value vector of the two-dimensional key point matrix is calculated and used for subsequent standardization processing. Subtracting the mean vector from the two-dimensional key point matrix to obtain a standardized key point matrix so as to eliminate deviation in the data. Based on the standardized key point matrix, calculating to obtain a covariance matrix corresponding to the two-dimensional key point matrix for subsequent eigenvalue decomposition. And carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues, sorting the eigenvalues according to the size, and selecting eigenvectors corresponding to the first several eigenvalues as target eigenvectors. And carrying out visual analysis on the target feature vector, and displaying the change directions of the principal component space, including the change directions of the yaw angle and the pitch angle. Therefore, principal component analysis can help reduce the data dimension, extract the main characteristic which can represent the data change most, and help understand and analyze the change of the face gesture. Through eigenvalue decomposition and eigenvector selection, redundant information in the data can be reduced, the most important information is reserved, and the effectiveness and the interpretability of the data are improved. The main direction of the face posture change can be intuitively displayed through the visual analysis result, and the method is convenient for further research and application.
In this embodiment, a face pose prediction method of a driver is provided, which may be used for a vehicle, and fig. 4 is a flowchart of a face pose prediction method of a driver according to an embodiment of the present invention, as shown in fig. 4, the flowchart includes the following steps:
Step S301, a face image to be detected is obtained. Please refer to step S101 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S302, a plurality of key points in the face image to be detected are identified, and a key point set is obtained. Please refer to step S102 in the embodiment shown in fig. 1 in detail, which is not described herein.
Step S303, performing alignment processing on the face image to be detected by utilizing a pre-configured reference key point set and a key point set to obtain a processed standard key point set.
Specifically, the step S303 includes:
step S3031, a second coordinate average value is taken for the coordinates of the plurality of reference key points in the reference key point set, and the second coordinate average value is determined as the reference centroid of the reference face image corresponding to the reference key point set.
And extracting coordinates of a plurality of key points from the reference key point set, carrying out average calculation on the coordinates of the plurality of reference key points to obtain a second coordinate average value, and determining the calculated average value as a reference centroid of the reference face image corresponding to the reference key point set.
In step S3032, a plurality of actual coordinate positions corresponding to a plurality of keypoints in the set of keypoints are identified.
Analyzing the key point set to obtain the actual coordinate positions of a plurality of key points in the key point set. Specifically, the method can be realized by an image processing algorithm or a computer vision technology to extract the specific coordinates of each key point in the image, so as to provide an accurate data basis for subsequent face alignment and other tasks.
Step S3033, a third coordinate average value is obtained for the actual coordinate positions, and the third coordinate average value is determined as the actual centroid of the face image to be detected.
And carrying out average calculation on the actual coordinate positions to obtain a third coordinate average value, and determining the third coordinate average value obtained by calculation as the actual centroid coordinate of the face image to be detected.
Step S3034, alignment processing is carried out on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid, and a processed standard key point set is obtained.
And determining the offset and the rotation angle of the face image to be detected by calculating the position information of the reference centroid and the actual centroid, and carrying out translation and rotation operations on the face image to be detected according to the offset and the rotation angle so as to align the face image to be detected with the reference face image. The alignment results in a processed standard set of keypoints whose positions in space have been kept consistent with the reference set of keypoints.
According to the face gesture prediction method for the driver, provided by the embodiment of the invention, the second coordinates of the plurality of reference key point coordinates in the reference key point set are averaged to obtain the reference centroid of the reference face image corresponding to the reference key point set. And identifying a plurality of key points in the key point set of the face image to be detected, determining the coordinate positions of the key points in the actual image, and averaging the third coordinates of the actual coordinate positions to obtain the actual centroid of the face image to be detected. And carrying out alignment processing on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid. And (3) aligning the face image to be detected with the reference face image by adjusting the position of the face image to be detected, so as to obtain a processed standard key point set. Therefore, the alignment processing is carried out on the face image to be detected, so that the face image to be detected is more consistent with the reference face image in position, and the accuracy and reliability of subsequent processing tasks are improved. By calculating the reference centroid and the actual centroid, the spatial offset and the rotation difference in the face image to be detected can be eliminated, so that the subsequent operation is more stable and reliable. Through alignment processing, the face image to be detected can be converted into a standard key point set, so that the complexity of a subsequent processing task is simplified, and the efficiency and accuracy of an algorithm are improved.
In some alternative embodiments, step S3034 includes:
And b1, subtracting the reference centroid coordinates corresponding to the reference centroids from the reference key point coordinates in the reference key point set respectively to obtain the standardized reference key point coordinates of the reference face image.
And calculating the average coordinates of all key points in the reference key point set, namely the reference centroid coordinates corresponding to the reference centroid. For each key point in the reference key point set, subtracting the coordinates of the reference centroid from the coordinates of each key point to obtain the coordinates of each standardized reference key point of the reference face image, and adjusting the coordinate values of each standardized reference key point relative to the position of the reference centroid to enable the reference centroid to be a new origin.
And b2, subtracting the actual centroid coordinates corresponding to the actual centroids from any one of the actual coordinate positions in the key point set to obtain the standardized actual coordinate positions of the face image to be detected.
And calculating average coordinates of all key points in the key point set to be measured, namely actual centroid coordinates corresponding to the actual centroid. For any one actual coordinate position in the key point set to be detected, subtracting the coordinates of the actual centroid from the coordinates of the key point set to be detected to obtain the standardized actual coordinate position of the face image to be detected, wherein the coordinates are adjusted relative to the actual centroid, so that the actual centroid becomes a new origin.
And b3, calculating a rotation matrix according to the standardized reference key point coordinates and the standardized actual coordinate positions so as to rotate the actual coordinate positions to the positions of the reference key point coordinates.
And calculating to obtain a rotation matrix by a least square method and other methods according to the standardized reference key point coordinates and the standardized actual coordinate positions. The rotation matrix is typically a 2x2 matrix describing how to rotate the coordinate system to align the two sets of keypoint locations. And multiplying all actual key point positions (subjected to standardization processing) in the face image to be detected by the calculated rotation matrix to obtain new positions of the actual key point positions after rotation. Therefore, a plurality of actual key point positions in the face image to be detected are rotated to positions aligned with the reference key point positions, and the face alignment effect is achieved.
Specifically, Procrustes analysis is used to preprocess the key point set corresponding to the face image to be detected, obtaining the standard key point set s and the rotation matrix

R = [[cos θ, −sin θ], [sin θ, cos θ]]

where R is the rotation matrix used to correspondingly rotate all key points in the face image to be detected so that they are consistent with the direction of the reference face image (a minimal code sketch of this alignment is given after step b4).
And b4, obtaining a standard key point set according to the plurality of actual coordinate positions after the rotation matrix rotates.
And for each actual coordinate position in the face image to be detected, carrying out coordinate transformation by applying the calculated rotation matrix, and rotating the actual coordinate position to a reference key point position corresponding to the reference face image to obtain a standard key point set.
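A minimal sketch of steps b1 to b4, reusing the least-squares rotation from the training-stage alignment sketch; the names and shapes are assumptions, and the document's own steps do not include a scale correction, so none is applied here.

```python
import numpy as np

def align_to_reference(keypoints, ref_keypoints):
    """keypoints, ref_keypoints: (68, 2) arrays. Returns standard key point set and roll angle."""
    # b1/b2: subtract each set's centroid (reference centroid / actual centroid)
    q = ref_keypoints - ref_keypoints.mean(axis=0)
    p = keypoints - keypoints.mean(axis=0)
    # b3: rotation that turns the actual positions onto the reference positions
    num = np.sum(p[:, 0] * q[:, 1] - p[:, 1] * q[:, 0])
    den = np.sum(p[:, 0] * q[:, 0] + p[:, 1] * q[:, 1])
    theta = np.arctan2(num, den)                     # rotation angle, used as roll in step S304
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # b4: standard key point set after rotation, flattened to a 136-vector
    standard = (p @ rot.T).reshape(-1)
    return standard, theta
```

The inverse trigonometric step of S3041 is implicit here: theta is recovered with arctan2, and step S3042 simply takes it as the roll angle.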
In the above embodiment, each reference key point coordinate in the reference key point set is subtracted from the reference centroid coordinate corresponding to the reference centroid, so as to obtain each standardized reference key point coordinate of the reference face image. Subtracting the actual centroid coordinates corresponding to the actual centroids from any one of the actual coordinate positions in the key point set to obtain the standardized actual coordinate positions of the face image to be detected. According to the standardized reference key point coordinates and the standardized actual coordinate positions, a rotation matrix capable of rotating a plurality of actual coordinate positions to a plurality of reference key point coordinate positions is calculated. And transforming the actual coordinate position according to the rotation matrix to obtain the rotated actual coordinate position. And obtaining the processed standard key point set through the rotated actual coordinate position. Therefore, by calculating the position information of the reference centroid and the actual centroid and the transformation of the rotation matrix, more accurate alignment processing can be realized, so that the positions of the face image to be detected and the reference face image are more consistent. By standardizing the coordinates of the reference key points, the positions of the actual coordinates and the transformation of the rotation matrix, the spatial displacement and rotation difference in the face image to be detected can be eliminated, so that the subsequent processing is more accurate and reliable. By obtaining the standard key point set, the complexity of subsequent processing tasks can be simplified, and the stability and accuracy of the algorithm can be improved.
Step S304, determining a rolling angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set.
Specifically, the step S304 includes:
Step S3041, performing inverse trigonometric function operation on the rotation matrix to obtain a rotation angle between the reference key point set and the key point set.
And performing inverse trigonometric function operation on the rotation matrix to calculate the rotation angle between the reference key point set and the key point set. This angle can help determine how much rotation is required for the actual keypoint to align with the reference keypoint location.
Specifically, performing the inverse trigonometric function operation on the rotation matrix R gives θ_roll = arctan(R_21 / R_11), where θ_roll represents the rotation angle between the key point set and the reference face image.
Step S3042, determining the rotation angle as a rolling angle corresponding to the face image to be detected.
And determining the rotation angle as a rolling angle corresponding to the face image to be detected. Specifically, by setting the rotation angle to the roll angle corresponding to the face image to be measured, the rotation posture of the face in the image can be more intuitively understood and described.
In the above embodiment, the rotation matrix is subjected to inverse trigonometric function operation to calculate the rotation angle between the reference key point set and the key point set. The calculated rotation angle is determined to be the corresponding rolling angle of the face image to be detected, so that the rotation condition of the face image to be detected relative to the reference face image can be described more accurately by determining the rolling angle based on the relation between the reference key point set and the key point set of the face image to be detected, and the accuracy of subsequent processing is improved. The rolling angle is determined, so that the processing flow of the face image to be detected can be simplified, the adjustment and the alignment are more visual and convenient, and the follow-up tasks such as face recognition and feature extraction are facilitated. Determining the roll angle can help stabilize rotational transformations that may exist during processing, thereby improving the stability and robustness of the algorithm, ensuring that the processing results are more reliable and accurate.
Step S305, calculating to obtain a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by utilizing the characteristics of the key points in the standard key point set.
Specifically, the step S305 includes:
in step S3051, a difference between the standard keypoint set and the mean vector is calculated.
The difference value between the standard key point set and the mean value vector is calculated to measure the deviation condition between the key point position in the face image to be measured and the key point position in the reference face image.
And step S3052, multiplying the difference value by the yaw angle feature vector to obtain a yaw angle projection coefficient of the face image to be detected.
And multiplying the calculated difference value by the yaw angle feature vector to obtain a yaw angle projection coefficient of the face image to be detected. The yaw angle projection coefficient reflects the rotation degree of the human face on the horizontal plane and is helpful to describe the left-right deflection condition of the human face.
And step S3053, multiplying the difference value by the pitch angle characteristic vector to obtain a pitch angle projection coefficient of the face image to be detected.
And multiplying the same difference value by the pitch angle characteristic vector to obtain a pitch angle projection coefficient of the face image to be detected. The pitch angle projection coefficient is used for describing the rotation degree of the face on the vertical plane and helping to understand the up-down elevation angle condition of the face.
Specifically, after the standard key point set s is obtained, the projection coefficients of the face gesture feature spaces can be calculated as c_yaw = (s − μ) · v_yaw and c_pitch = (s − μ) · v_pitch, where c_yaw is the projection coefficient of the face image to be detected in the yaw angle feature space and c_pitch is the projection coefficient in the pitch angle feature space.
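Under the same assumed shapes as the earlier sketches, steps S3051 to S3053 reduce to two dot products:

```python
import numpy as np

def pose_projection(standard_keypoints, mu, v_yaw, v_pitch):
    """standard_keypoints, mu, v_yaw, v_pitch: flattened (136,) vectors."""
    diff = standard_keypoints - mu          # S3051: difference from the mean vector
    c_yaw = float(diff @ v_yaw)             # S3052: yaw angle projection coefficient
    c_pitch = float(diff @ v_pitch)         # S3053: pitch angle projection coefficient
    return c_yaw, c_pitch
```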
In the above embodiment, the difference value is obtained by subtracting the mean value vector from the key point data of the face image to be detected, and the yaw angle projection coefficient of the face image to be detected is obtained by multiplying the difference value by the yaw angle feature vector extracted in advance, thereby reflecting the offset degree of the face image to be detected in the yaw angle direction. And multiplying the difference value by the pitch angle characteristic vector to obtain a pitch angle projection coefficient of the face image to be detected, and reflecting the offset degree of the face image to be detected in the pitch angle direction. Therefore, the human face can be more accurately positioned and identified by calculating the yaw angle and pitch angle projection coefficients of the human face image to be detected, and personalized human face feature extraction is realized. By using the calculation method of the feature vector and the projection coefficient, the human face gesture can be simply and efficiently described and analyzed, and the calculation efficiency and accuracy are improved. The feature vector obtained by the principal component analysis can better capture the main change information in the face data, and is helpful for accurately and effectively representing the change of the yaw angle and the pitch angle of the face.
In some alternative embodiments, between step S305 and step S306, the method further comprises:
step b1, determining a first calculation result obtained by multiplying a standardized key point matrix and a yaw angle feature vector as a sample yaw angle projection coefficient; wherein the sample yaw angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a yaw angle feature space.
Specifically, performing a matrix dot product operation on the matrix X_c and the feature vector v_yaw gives the projection coefficients of the plurality of face sample images on the yaw angle feature space, p_yaw = X_c v_yaw, where each element of p_yaw corresponds to the projection coefficient of one face sample image on the yaw angle feature space.
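Step b1 is a single matrix-vector product; a short sketch, assuming the arrays come from the PCA sketch above:

```python
import numpy as np

# Xc: (n, 136) standardized key point matrix, v_yaw / v_pitch: (136,) feature vectors
p_yaw = Xc @ v_yaw        # sample yaw angle projection coefficients, shape (n,)
p_pitch = Xc @ v_pitch    # sample pitch angle projection coefficients (used in step c1 below)
```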
And b2, calculating to obtain yaw angle fitting parameters according to a preset yaw angle polynomial model, a yaw angle reference attitude angle and a sample yaw angle projection coefficient.
Specifically, a third order polynomial model of the yaw angle is built,

yaw(c) = w_3 c³ + w_2 c² + w_1 c + w_0

The projection coefficient matrix p_yaw of the face gesture sample data set in the yaw angle feature space and the labeling matrix y_yaw of the yaw angles are substituted into the yaw angle polynomial model, which is converted into the matrix form

A w = y_yaw

where A = [p_yaw³, p_yaw², p_yaw, 1] (each power taken element-wise, column by column) and w = (w_3, w_2, w_1, w_0)ᵀ. Solving this linear equation set by a linear algebra method yields the parameter vector w.
And b3, calculating to obtain a first mapping relation according to the yaw angle fitting parameter and the yaw angle polynomial model.
Specifically, substituting the solved parameter vector w into the yaw angle polynomial model gives

yaw(c) = w_3 c³ + w_2 c² + w_1 c + w_0    (2)

Equation (2) is the first mapping relationship between the yaw angle projection coefficient and the face attitude angle.
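The fit of steps b2 and b3 is an ordinary least-squares solve; the sketch below uses numpy.polyfit as a stand-in (an assumption; the embodiment only requires "a linear algebra method"):

```python
import numpy as np

def fit_cubic_mapping(proj_coeffs, angles):
    """proj_coeffs: (n,) sample projection coefficients; angles: (n,) labeled angles."""
    # Degree-3 least-squares fit: returns (w3, w2, w1, w0) of equation (2)
    w = np.polyfit(proj_coeffs, angles, deg=3)
    return np.poly1d(w)                      # callable mapping from coefficient to angle

# yaw_map = fit_cubic_mapping(p_yaw, sample_yaw)        # first mapping relation
# pitch_map = fit_cubic_mapping(p_pitch, sample_pitch)  # second mapping relation (step c3)
```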
In the above embodiment, the normalized key point matrix is multiplied by the yaw angle feature vector to obtain the first calculation result, that is, the sample yaw angle projection coefficient. And calculating fitting parameters of the yaw angle based on a preset yaw angle polynomial model, a yaw angle reference attitude angle and a sample yaw angle projection coefficient. These parameters can be used to fit the change in the face image in the yaw angle direction. And calculating a first mapping relation by using the yaw angle fitting parameters and the yaw angle polynomial model, wherein the first mapping relation is used for mapping the sample yaw angle projection coefficient to the actual yaw angle. Therefore, the standard key point matrix is multiplied by the yaw angle feature vector to obtain the sample yaw angle projection coefficient, and the projection coefficient can effectively represent the change of the face image in the yaw angle direction, so that a foundation is provided for the subsequent fitting parameter calculation. Through a preset yaw angle polynomial model and a yaw angle reference attitude angle, flexible parameter setting and adjustment can be performed according to actual requirements so as to meet the accuracy requirements of yaw angles in different scenes. By calculating the yaw angle fitting parameter and the first mapping relation, the sample yaw angle projection coefficient can be mapped to the actual yaw angle measurement, so that the accuracy and the precision of yaw angle measurement are improved.
In some optional embodiments, before the step S306, the method further includes:
Step c1, determining a second calculation result obtained by multiplying the standardized key point matrix and the pitch angle characteristic vector as a sample pitch angle projection coefficient; the sample pitch angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a pitch angle characteristic space.
Specifically, performing a matrix dot product operation on the matrix X_c and the feature vector v_pitch gives the projection coefficients of the plurality of face sample images on the pitch angle feature space, p_pitch = X_c v_pitch, where each element of p_pitch corresponds to the projection coefficient of one face sample image on the pitch angle feature space.
And c2, calculating to obtain a pitch angle fitting parameter according to a preset pitch angle polynomial model, a pitch angle reference attitude angle and a sample pitch angle projection coefficient.
Specifically, a third order polynomial model of the pitch angle is built,

pitch(c) = u_3 c³ + u_2 c² + u_1 c + u_0

The projection coefficient matrix p_pitch of the face gesture sample data set in the pitch angle feature space and the labeling matrix y_pitch of the pitch angles are substituted into the pitch angle polynomial model, which is converted into the matrix form

B u = y_pitch

where B = [p_pitch³, p_pitch², p_pitch, 1] and u = (u_3, u_2, u_1, u_0)ᵀ. Solving this linear equation set by a linear algebra method yields the parameter vector u.
And c3, calculating to obtain a second mapping relation according to the pitch angle fitting parameters and the pitch angle polynomial model.
Specifically, substituting the solved parameter vector u into the pitch angle polynomial model gives

pitch(c) = u_3 c³ + u_2 c² + u_1 c + u_0    (3)

Equation (3) is the second mapping relationship between the pitch angle projection coefficient and the face attitude angle.
In the above embodiment, the normalized key point matrix and the pitch angle feature vector are multiplied to obtain the second calculation result, that is, the sample pitch angle projection coefficient. And calculating to obtain fitting parameters of the pitch angle based on a preset pitch angle polynomial model, a pitch angle reference attitude angle and a sample pitch angle projection coefficient. And calculating to obtain a second mapping relation by using the pitch angle fitting parameters and the pitch angle polynomial model, wherein the second mapping relation is used for mapping the sample pitch angle projection coefficient to the actual pitch angle measurement. Therefore, the standard key point matrix is multiplied by the pitch angle characteristic vector to obtain the sample pitch angle projection coefficient, and the projection coefficient can effectively represent the change of the face image in the pitch angle direction, so that a foundation is provided for the subsequent fitting parameter calculation. Through a preset pitch angle polynomial model and a pitch angle reference attitude angle, flexible parameter setting and adjustment can be performed according to actual requirements so as to meet the accuracy requirements of the pitch angle in different scenes. The sample pitch angle projection coefficient can be mapped to the actual pitch angle measurement by calculating the pitch angle fitting parameter and the second mapping relation, so that the precision and accuracy of pitch angle measurement are improved.
Step S306, according to the yaw angle projection coefficient and the first mapping relation between the yaw angle projection coefficient and the face attitude angle, the yaw angle of the face in the face image to be detected is calculated, and according to the pitch angle projection coefficient and the second mapping relation between the pitch angle projection coefficient and the face attitude angle, the pitch angle of the face in the face image to be detected is calculated.
Specifically, after the projection coefficients c_yaw and c_pitch in step S3053 are obtained, the attitude angles of the face image to be detected can be predicted by combining the first mapping relation and the second mapping relation obtained from the training data: θ_yaw = yaw(c_yaw) and θ_pitch = pitch(c_pitch), where θ_yaw is the estimated face yaw angle and θ_pitch is the estimated face pitch angle.
Step S307, determining the rolling angle, the yaw angle and the pitch angle as the face gesture prediction result of the face image to be detected.
Specifically, the rolling angle θ_roll obtained in step S3041, together with the estimated yaw angle θ_yaw and pitch angle θ_pitch, can be used to represent the face gesture of the face image to be detected, namely the face gesture prediction result (θ_roll, θ_yaw, θ_pitch).
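Putting the pieces together, prediction at test time (steps S303 to S307) is a few lines under the assumptions above, reusing the helper functions from the earlier sketches:

```python
def predict_pose(keypoints, ref_keypoints, mu, v_yaw, v_pitch, yaw_map, pitch_map):
    """Returns (roll, yaw, pitch) for one detected (68, 2) key point set."""
    standard, roll = align_to_reference(keypoints, ref_keypoints)   # steps S303 / S304
    c_yaw, c_pitch = pose_projection(standard, mu, v_yaw, v_pitch)  # step S305
    yaw = float(yaw_map(c_yaw))        # step S306: first mapping relation, equation (2)
    pitch = float(pitch_map(c_pitch))  # step S306: second mapping relation, equation (3)
    return roll, yaw, pitch            # step S307: face pose prediction result
```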
According to the face posture prediction method for the driver provided by the embodiment of the invention, a cubic polynomial function is adopted to describe the posture change of the face in the two directions when the yaw angle and the pitch angle are modeled, so that the model has higher flexibility and can fit complex posture changes well. For describing the change of the human face in yaw angle and pitch angle, the cubic polynomial model can more accurately capture the fine changes of the human face posture. Meanwhile, as the curvature and inflection points of the curve are considered in the cubic polynomial model, the gesture change of the face under different angles can be better described, so that the model is more in line with the actual situation, and the measurement accuracy of the gesture angle is improved.
The embodiment also provides a device for predicting the face pose of the driver, which is used for implementing the above embodiment and the preferred embodiment, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a face pose prediction apparatus for a driver, as shown in fig. 5, including:
the first obtaining module 401 is configured to obtain a face image to be detected.
The identifying module 402 is configured to identify a plurality of key points in the face image to be detected, and obtain a set of key points.
The alignment module 403 is configured to perform alignment processing on the face image to be detected by using the pre-configured reference key point set and the key point set, so as to obtain a processed standard key point set.
The first determining module 404 is configured to determine a roll angle corresponding to the face image to be detected based on a relationship between the reference key point set and the corresponding key points in the key point set.
The first calculation module 405 is configured to calculate, using features of key points in the standard key point set, a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be measured.
The second calculating module 406 is configured to calculate a yaw angle of a face in the face image to be measured according to the yaw angle projection coefficient and a first mapping relationship between the yaw angle projection coefficient and the face pose angle, and calculate a pitch angle of the face in the face image to be measured according to the pitch angle projection coefficient and a second mapping relationship between the pitch angle projection coefficient and the face pose angle.
The second determining module 407 is configured to determine the roll angle, the yaw angle and the pitch angle as a face pose prediction result of the face image to be measured.
In some alternative embodiments, the apparatus further comprises:
The second acquisition module is used for acquiring a face gesture sample data set, wherein the face gesture sample data set comprises a plurality of face sample images marked with face key points and face gestures, the face key points correspond to different parts in a face, and the different parts are described through different numbers of face key points.
The extraction module is used for extracting a sample key point labeling set and a sample gesture angle labeling set of a plurality of face sample images from the face gesture sample data set. The sample key point labeling set is used for representing coordinate sets of a plurality of face key points of a plurality of face sample images, and the sample attitude angle labeling set is used for representing sample pitch angles and sample yaw angle sets corresponding to a plurality of face attitudes of the plurality of face sample images.
And the third determining module is used for taking a first coordinate average value of a plurality of sample coordinates corresponding to the specific key points included in the sample key point labeling set aiming at the specific key points in the face key points, and determining the first coordinate average value as the reference key point coordinates of the specific key points. The specific key point is any point in the key points of the human face.
And the fourth determining module is used for taking an angle average value of a plurality of face gesture angles corresponding to a plurality of face sample images included in the sample gesture angle annotation set, and determining the angle average value as a reference gesture angle of the face gesture sample data set. Wherein the reference attitude angles include a yaw angle reference attitude angle and a pitch angle reference attitude angle.
And a fifth determining module, configured to determine a plurality of reference keypoint coordinates and reference attitude angles as a reference keypoint set.
In some alternative embodiments, the alignment module 403 includes:
the first determining sub-module is used for taking a second coordinate average value for a plurality of reference key point coordinates in the reference key point set, and determining the second coordinate average value as a reference centroid of the reference face image corresponding to the reference key point set.
And the identification sub-module is used for identifying a plurality of actual coordinate positions corresponding to a plurality of key points in the key point set.
And the second determining submodule is used for taking a third coordinate average value for the actual coordinate positions and determining the third coordinate average value as the actual centroid of the face image to be detected.
And the alignment sub-module is used for carrying out alignment processing on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid to obtain a processed standard key point set.
In some alternative embodiments, the alignment sub-module includes:
the first calculation unit is used for subtracting the reference centroid coordinates corresponding to the reference centroid from the reference key point coordinates in the reference key point set respectively to obtain the standardized reference key point coordinates of the reference face image.
The second calculation unit is used for subtracting the actual centroid coordinates corresponding to the actual centroid from any one of the actual coordinate positions in the key point set to obtain the standardized actual coordinate position of the face image to be detected.
And the third calculation unit is used for calculating a rotation matrix according to the standardized reference key point coordinates and the standardized actual coordinate positions so as to rotate the actual coordinate positions to the positions of the reference key point coordinates.
And the rotating unit is used for obtaining a standard key point set according to the plurality of actual coordinate positions after the rotation of the rotating matrix.
In some alternative embodiments, the first determination module 404 includes:
And the operation sub-module is used for performing inverse trigonometric function operation on the rotation matrix to obtain the rotation angle between the reference key point set and the key point set.
And the third determining submodule is used for determining the rotation angle as a rolling angle corresponding to the face image to be detected.
In some alternative embodiments, the apparatus further comprises:
The preprocessing module is used for preprocessing a plurality of face sample images in the face gesture sample data set according to the reference key point set so as to eliminate scale and position differences among the face sample images and obtain a face key point standardized set.
And the principal component analysis module is used for carrying out principal component analysis on the standardized set of the key points of the human face to obtain the corresponding relation between the change direction of the principal component space corresponding to each feature vector and the change of the human face posture.
In some alternative embodiments, the principal component analysis module includes:
And the construction submodule is used for constructing a two-dimensional key point matrix according to the face key point standardized set.
And the first computing sub-module is used for computing the mean value vector of the two-dimensional key point matrix.
And the second calculation sub-module is used for subtracting the mean vector from the two-dimensional key point matrix to obtain a standardized key point matrix.
And the transposition sub-module is used for transposing the standardized key point matrix to obtain a transposed matrix of the standardized key point matrix.
And the third calculation sub-module is used for calculating and obtaining a covariance matrix corresponding to the two-dimensional key point matrix based on the transpose matrix, the standardized key point matrix and the number of the plurality of face sample images in the face gesture sample data set.
And the eigenvalue decomposition sub-module is used for carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues. The feature vector is used for representing the change direction of each sample key point in the face key point standardization set in each principal component space. The eigenvalues are used to characterize the weight coefficients of the individual eigenvectors.
And the sorting sub-module is used for sorting the characteristic values according to the order from big to small to obtain a sorting result.
And the extraction sub-module is used for extracting a plurality of target feature values from the sorting result and confirming target feature vectors corresponding to the plurality of target feature values.
And the visual analysis sub-module is used for performing visual analysis on each target feature vector to obtain a visual analysis result. The visual analysis result is used for representing the change direction of the principal component space corresponding to the target feature vector.
And the selecting sub-module is used for selecting a yaw angle characteristic vector used for representing the change direction of the yaw angle of the face and a pitch angle characteristic vector used for representing the change direction of the pitch angle of the face from the target characteristic vectors according to the visual analysis result.
In some alternative embodiments, the first computing module 405 includes:
and the fourth computing sub-module is used for computing the difference value between the standard key point set and the mean value vector.
And the fifth calculation sub-module is used for multiplying the difference value and the yaw angle characteristic vector to obtain a yaw angle projection coefficient of the face image to be detected.
And the sixth calculation sub-module is used for multiplying the difference value and the pitch angle characteristic vector to obtain a pitch angle projection coefficient of the face image to be detected.
In some alternative embodiments, the apparatus further comprises:
And the third calculation module is used for determining a first calculation result obtained by multiplying the standardized key point matrix and the yaw angle characteristic vector as a sample yaw angle projection coefficient. Wherein the sample yaw angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a yaw angle feature space.
And the fourth calculation module is used for calculating and obtaining yaw angle fitting parameters according to a preset yaw angle polynomial model, a yaw angle reference attitude angle and a sample yaw angle projection coefficient.
And the fifth calculation module is used for calculating to obtain a first mapping relation according to the yaw angle fitting parameter and the yaw angle polynomial model.
In some alternative embodiments, the apparatus further comprises:
And the sixth calculation module is used for determining a second calculation result obtained by multiplying the standardized key point matrix and the pitch angle characteristic vector as a sample pitch angle projection coefficient. The sample pitch angle projection coefficients comprise a plurality of projection coefficients of a plurality of face sample images on a pitch angle characteristic space.
And the seventh calculation module is used for calculating to obtain a pitch angle fitting parameter according to a preset pitch angle polynomial model, a pitch angle reference attitude angle and a sample pitch angle projection coefficient.
And the eighth calculation module is used for calculating to obtain a second mapping relation according to the pitch angle fitting parameter and the pitch angle polynomial model.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The face pose prediction apparatus of the driver in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and a memory that execute one or more software or fixed programs, and/or other devices that can provide the above functions.
According to the face gesture prediction device for the driver, after the face image to be detected is obtained, the face image to be detected is analyzed, a plurality of key points are identified, and the key points can reflect different characteristics and positions of the face. And carrying out alignment processing by utilizing a pre-configured reference key point set and a key point set of the face to be detected to obtain a standard key point set so as to eliminate deviation caused by the gesture of the face and enable the subsequent calculation to be more accurate. And determining the rolling angle of the face image to be detected, namely the rotation angle of the face on the horizontal plane by comparing the relation between the reference key point set and the key point set of the face image to be detected. And calculating the projection coefficients of the yaw angle and the pitch angle of the face image to be detected by utilizing the characteristics of the key points in the standard key point set. And calculating the yaw angle and the pitch angle of the face in the face image to be detected according to the projection coefficients of the yaw angle and the pitch angle and the mapping relation between the projection coefficients and the face attitude angle. And finally, the calculated rolling angle, yaw angle and pitch angle are used as the face gesture prediction result of the face image to be detected, so that the calculation dimension is greatly reduced, the interference of image blurring, illumination change and partial shielding on the picture pixel information is avoided, the accuracy and reliability of face gesture prediction are improved on the premise of accurate feature point detection, and the face gesture prediction method is simple, efficient and wider in application range.
The embodiment of the invention also provides computer equipment, which is provided with the face gesture prediction device of the driver shown in the figure 5.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, as shown in fig. 6, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 6.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device further comprises input means 30 and output means 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means, for example in fig. 6.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus, such as a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output means 40 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present invention also provide a computer readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code originally stored in a remote storage medium or a non-transitory machine readable storage medium and downloaded through a network to be stored in a local storage medium, so that the method described herein may be carried out by such software stored on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kind described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which, when executed by a computer, may, through the operation of the computer, invoke or provide the methods and/or technical solutions according to the present invention. Those skilled in the art will appreciate that the forms in which computer program instructions reside on a computer-readable medium include, but are not limited to, source files, executable files, installation package files, and the like; accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executing the instructions, the computer compiling the instructions and then executing the corresponding compiled program, the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (12)

1. A method for predicting a face pose of a driver, the method comprising:
Acquiring a face image to be detected;
Identifying a plurality of key points in the face image to be detected to obtain a key point set;
Aligning the face image to be detected by utilizing a pre-configured reference key point set and the key point set to obtain a processed standard key point set;
Determining a roll angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set;
Calculating a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by utilizing the characteristics of key points in the standard key point set;
According to the yaw angle projection coefficient and a first mapping relation between the yaw angle projection coefficient and the face attitude angle, calculating the yaw angle of the face in the face image to be detected, and according to the pitch angle projection coefficient and a second mapping relation between the pitch angle projection coefficient and the face attitude angle, calculating the pitch angle of the face in the face image to be detected;
And determining the roll angle, the yaw angle and the pitch angle as a face pose prediction result of the face image to be detected.
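By way of non-limiting illustration only, the overall flow of claim 1 can be sketched in Python with NumPy as follows; the function name, the argument names, and the detect_keypoints callable are hypothetical, and the reference key point set, PCA mean vector, yaw/pitch eigenvectors, and fitted polynomial parameters are assumed to have been prepared offline as described in the dependent claims.

import numpy as np

def predict_face_pose(image, detect_keypoints, reference_keypoints,
                      mean_vector, yaw_eigenvector, pitch_eigenvector,
                      yaw_poly, pitch_poly):
    # key point set of the face image to be detected, shape (N, 2)
    keypoints = detect_keypoints(image)
    # alignment (claims 3-4): centre both sets and rotate the detected
    # key points onto the reference layout with a least-squares rotation
    ref_centred = reference_keypoints - reference_keypoints.mean(axis=0)
    act_centred = keypoints - keypoints.mean(axis=0)
    u, _, vt = np.linalg.svd(act_centred.T @ ref_centred)
    rotation = vt.T @ u.T
    standard_keypoints = act_centred @ rotation.T
    # roll angle (claim 5): angle of the 2-D rotation matrix
    roll = np.degrees(np.arctan2(rotation[1, 0], rotation[0, 0]))
    # projection coefficients (claim 8)
    diff = standard_keypoints.flatten() - mean_vector
    yaw_coefficient = float(diff @ yaw_eigenvector)
    pitch_coefficient = float(diff @ pitch_eigenvector)
    # first and second mapping relations (claims 9-11): cubic polynomials
    yaw = float(np.polyval(yaw_poly, yaw_coefficient))
    pitch = float(np.polyval(pitch_poly, pitch_coefficient))
    return roll, yaw, pitch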
2. The method according to claim 1, wherein, before the acquiring of the face image to be detected, the method further comprises:
Acquiring a face pose sample data set, wherein the face pose sample data set comprises a plurality of face sample images marked with face key points and face poses, the face key points correspond to different parts in a face, and the different parts are described through different numbers of the face key points;
Extracting a sample key point annotation set and a sample attitude angle annotation set of the plurality of face sample images from the face pose sample data set; the sample key point annotation set is used for representing coordinate sets of a plurality of face key points of the plurality of face sample images, and the sample attitude angle annotation set is used for representing a set of sample pitch angles and sample yaw angles corresponding to a plurality of face poses of the plurality of face sample images;
for a specific key point among the face key points, taking a first coordinate average value of a plurality of sample coordinates, included in the sample key point annotation set, that correspond to the specific key point, and determining the first coordinate average value as a reference key point coordinate of the specific key point; wherein the specific key point is any point among the face key points;
taking an angle average value of a plurality of face attitude angles, included in the sample attitude angle annotation set, that correspond to the plurality of face sample images, and determining the angle average value as a reference attitude angle of the face pose sample data set; wherein the reference attitude angles include a yaw angle reference attitude angle and a pitch angle reference attitude angle;
and determining a plurality of the reference key point coordinates and the reference attitude angles as the reference key point set.
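A minimal sketch of the averaging in claim 2, assuming the annotated sample key points are stacked in an array of shape (S, N, 2) and the annotated angles in arrays of shape (S,); all names are illustrative.

import numpy as np

def build_reference_keypoint_set(sample_keypoints, sample_yaw, sample_pitch):
    # first coordinate average per specific key point -> reference key point coordinates
    reference_keypoints = sample_keypoints.mean(axis=0)   # (N, 2)
    # angle averages -> yaw angle and pitch angle reference attitude angles
    yaw_reference = float(sample_yaw.mean())
    pitch_reference = float(sample_pitch.mean())
    return reference_keypoints, yaw_reference, pitch_reference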
3. The method according to claim 2, wherein the aligning the face image to be detected by utilizing the pre-configured reference key point set and the key point set to obtain the processed standard key point set comprises:
taking a second coordinate average value for a plurality of reference key point coordinates in the reference key point set, and determining the second coordinate average value as a reference centroid of a reference face image corresponding to the reference key point set;
Identifying a plurality of actual coordinate positions corresponding to a plurality of key points in the key point set;
taking a third coordinate average value from the actual coordinate positions, and determining the third coordinate average value as an actual centroid of the face image to be detected;
And carrying out alignment processing on the face image to be detected by utilizing the position information of the reference centroid and the actual centroid to obtain a processed standard key point set.
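Illustratively, the reference centroid and the actual centroid of claim 3 are plain coordinate means over the two key point sets; the sketch below assumes both sets are (N, 2) arrays.

import numpy as np

def compute_centroids(reference_keypoints, detected_keypoints):
    # second coordinate average -> reference centroid of the reference face image
    reference_centroid = reference_keypoints.mean(axis=0)
    # third coordinate average -> actual centroid of the face image to be detected
    actual_centroid = detected_keypoints.mean(axis=0)
    return reference_centroid, actual_centroid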
4. The method according to claim 3, wherein the aligning the face image to be detected by utilizing the position information of the reference centroid and the actual centroid to obtain the processed standard key point set comprises:
Subtracting the reference centroid coordinates corresponding to the reference centroid from the reference key point coordinates in the reference key point set respectively to obtain standardized reference key point coordinates of the reference face image;
subtracting the actual centroid coordinates corresponding to the actual centroid from any one of the actual coordinate positions in the key point set to obtain a standardized actual coordinate position of the face image to be detected;
Calculating a rotation matrix according to the standardized reference key point coordinates and the standardized actual coordinate positions so as to rotate the actual coordinate positions to the positions of the reference key point coordinates;
And obtaining the standard key point set according to the plurality of actual coordinate positions after the rotation matrix rotates.
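As a non-limiting sketch of claim 4, the rotation can be estimated with a Kabsch/Procrustes-style least-squares fit (one common choice; the claim itself does not prescribe a particular solver); both inputs are (N, 2) arrays and the names are illustrative.

import numpy as np

def align_to_reference(reference_keypoints, detected_keypoints):
    # standardized reference key point coordinates and standardized actual coordinate positions
    ref_centred = reference_keypoints - reference_keypoints.mean(axis=0)
    act_centred = detected_keypoints - detected_keypoints.mean(axis=0)
    # rotation matrix that rotates the actual positions onto the reference positions
    u, _, vt = np.linalg.svd(act_centred.T @ ref_centred)
    rotation = vt.T @ u.T
    if np.linalg.det(rotation) < 0:        # guard against a reflection solution
        vt[-1, :] *= -1
        rotation = vt.T @ u.T
    standard_keypoints = act_centred @ rotation.T   # processed standard key point set
    return standard_keypoints, rotation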
5. The method of claim 4, wherein the determining the roll angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set comprises:
Performing inverse trigonometric function operation on the rotation matrix to obtain a rotation angle between the reference key point set and the key point set;
And determining the rotation angle as the roll angle corresponding to the face image to be detected.
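For a 2-D rotation matrix of the form [[cos t, -sin t], [sin t, cos t]], the inverse trigonometric function operation of claim 5 reduces to an arctangent; a one-line sketch under that assumption:

import numpy as np

def roll_from_rotation(rotation):
    # rotation angle between the reference key point set and the key point set, in degrees
    return np.degrees(np.arctan2(rotation[1, 0], rotation[0, 0]))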
6. The method of claim 2, further comprising, after the determining the plurality of the reference key point coordinates and the reference attitude angles as the reference key point set:
Preprocessing the plurality of face sample images in the face pose sample data set according to the reference key point set to eliminate scale and position differences among the plurality of face sample images, so as to obtain a face key point standardization set;
and carrying out principal component analysis on the face key point standardization set to obtain a corresponding relation between the change direction of the principal component space corresponding to each eigenvector and the face pose change.
7. The method of claim 6, wherein the carrying out principal component analysis on the face key point standardization set to obtain the corresponding relation between the change direction of the principal component space corresponding to each eigenvector and the face pose change comprises:
Constructing a two-dimensional key point matrix according to the face key point standardized set;
Calculating the mean vector of the two-dimensional key point matrix;
subtracting the mean value vector from the two-dimensional key point matrix to obtain a standardized key point matrix;
transposing the standardized key point matrix to obtain a transposed matrix of the standardized key point matrix;
Calculating to obtain a covariance matrix corresponding to the two-dimensional key point matrix based on the transpose matrix, the standardized key point matrix and the number of the plurality of face sample images in the face pose sample data set;
Performing eigenvalue decomposition on the covariance matrix to obtain eigenvectors and eigenvalues; the eigenvectors are used for representing the change direction of each sample key point in the face key point standardization set in each principal component space; the eigenvalues are used for representing weight coefficients of the eigenvectors;
Sorting the eigenvalues in descending order to obtain a sorting result;
Extracting a plurality of target eigenvalues from the sorting result, and determining target eigenvectors corresponding to the plurality of target eigenvalues;
Performing visual analysis on each target eigenvector to obtain a visual analysis result; the visual analysis result is used for representing the change direction of the principal component space corresponding to the target eigenvector;
And selecting, from the target eigenvectors according to the visual analysis result, a yaw angle eigenvector used for representing the change direction of the yaw angle of the face and a pitch angle eigenvector used for representing the change direction of the pitch angle of the face.
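A sketch of the principal component analysis of claims 6 and 7, assuming the preprocessed sample key points have been flattened into a matrix of shape (S, 2N); which of the returned eigenvectors tracks yaw and which tracks pitch is decided by the visual analysis the claim describes, so the caller is left to pick them from the leading components.

import numpy as np

def pca_on_keypoints(normalized_keypoints, n_components=4):
    # mean vector and standardized key point matrix
    mean_vector = normalized_keypoints.mean(axis=0)
    centred = normalized_keypoints - mean_vector
    # covariance from the transpose matrix, the standardized matrix, and the sample count
    covariance = centred.T @ centred / normalized_keypoints.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)   # symmetric matrix
    order = np.argsort(eigenvalues)[::-1]                    # sort large to small
    target_eigenvalues = eigenvalues[order[:n_components]]
    target_eigenvectors = eigenvectors[:, order[:n_components]]  # one eigenvector per column
    return mean_vector, target_eigenvalues, target_eigenvectors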
8. The method according to claim 7, wherein the calculating a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by utilizing the characteristics of key points in the standard key point set comprises:
calculating the difference value between the standard key point set and the mean vector;
multiplying the difference value by the yaw angle eigenvector to obtain the yaw angle projection coefficient of the face image to be detected;
and multiplying the difference value by the pitch angle eigenvector to obtain the pitch angle projection coefficient of the face image to be detected.
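A minimal sketch of claim 8, assuming the standard key point set of the image under test is flattened to the same 2N-dimensional layout used for the principal component analysis; all names are illustrative.

import numpy as np

def projection_coefficients(standard_keypoints, mean_vector,
                            yaw_eigenvector, pitch_eigenvector):
    # difference value between the standard key point set and the mean vector
    diff = standard_keypoints.flatten() - mean_vector
    yaw_coefficient = float(diff @ yaw_eigenvector)
    pitch_coefficient = float(diff @ pitch_eigenvector)
    return yaw_coefficient, pitch_coefficient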
9. The method of claim 7, wherein, prior to the calculating the yaw angle of the face in the face image to be detected, the method further comprises:
Determining a first calculation result, obtained by multiplying the standardized key point matrix by the yaw angle eigenvector, as a sample yaw angle projection coefficient; wherein the sample yaw angle projection coefficient comprises a plurality of projection coefficients of the plurality of face sample images on a yaw angle feature space;
Calculating to obtain yaw angle fitting parameters according to a preset yaw angle polynomial model, a yaw angle reference attitude angle and the sample yaw angle projection coefficient;
and calculating to obtain the first mapping relation according to the yaw angle fitting parameter and the yaw angle polynomial model.
10. The method of claim 7, further comprising, prior to the calculating the pitch angle of the face in the face image to be detected:
determining a second calculation result, obtained by multiplying the standardized key point matrix by the pitch angle eigenvector, as a sample pitch angle projection coefficient; wherein the sample pitch angle projection coefficient comprises a plurality of projection coefficients of the plurality of face sample images on a pitch angle feature space;
Calculating to obtain a pitch angle fitting parameter according to a preset pitch angle polynomial model, a pitch angle reference attitude angle and the sample pitch angle projection coefficient;
and calculating to obtain the second mapping relation according to the pitch angle fitting parameter and the pitch angle polynomial model.
11. The method according to claim 9 or 10, wherein the yaw angle polynomial model and the pitch angle polynomial model are both cubic polynomial models.
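Claims 9 to 11 describe fitting cubic polynomial mapping relations offline and evaluating them online. The sketch below assumes that per-sample target angles (for example, expressed relative to the corresponding reference attitude angle) are available alongside the sample projection coefficients; np.polyfit is simply one convenient least-squares fitter and is not mandated by the claims.

import numpy as np

def fit_mapping_relation(sample_coefficients, sample_angles, degree=3):
    # fitting parameters of the (cubic) polynomial model
    return np.polyfit(sample_coefficients, sample_angles, degree)

def apply_mapping_relation(fit_parameters, coefficient):
    # evaluate the mapping relation for a projection coefficient of the image under test
    return float(np.polyval(fit_parameters, coefficient))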
12. A face pose prediction apparatus of a driver, the apparatus comprising:
the first acquisition module is used for acquiring a face image to be detected;
The identification module is used for identifying a plurality of key points in the face image to be detected to obtain a key point set;
the alignment module is used for performing alignment processing on the face image to be detected by utilizing a pre-configured reference key point set and the key point set to obtain a processed standard key point set;
The first determining module is used for determining a roll angle corresponding to the face image to be detected based on the relation between the reference key point set and the corresponding key points in the key point set;
the first calculation module is used for calculating and obtaining a yaw angle projection coefficient and a pitch angle projection coefficient of the face image to be detected by utilizing the characteristics of key points in the standard key point set;
the second calculation module is used for calculating the yaw angle of the face in the face image to be detected according to the yaw angle projection coefficient and the first mapping relation between the yaw angle projection coefficient and the face attitude angle, and calculating the pitch angle of the face in the face image to be detected according to the pitch angle projection coefficient and the second mapping relation between the pitch angle projection coefficient and the face attitude angle;
and the second determining module is used for determining the roll angle, the yaw angle and the pitch angle as a face pose prediction result of the face image to be detected.
CN202410438255.XA 2024-04-12 2024-04-12 Method and device for predicting face pose of driver Pending CN118038560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410438255.XA CN118038560A (en) 2024-04-12 2024-04-12 Method and device for predicting face pose of driver

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410438255.XA CN118038560A (en) 2024-04-12 2024-04-12 Method and device for predicting face pose of driver

Publications (1)

Publication Number Publication Date
CN118038560A (en) 2024-05-14

Family

ID=90999127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410438255.XA Pending CN118038560A (en) 2024-04-12 2024-04-12 Method and device for predicting face pose of driver

Country Status (1)

Country Link
CN (1) CN118038560A (en)

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030072191A (en) * 2002-03-04 2003-09-13 삼성전자주식회사 Method of recognizing face using 2nd order ICA/PCA and apparatus thereof
US20040240708A1 (en) * 2003-05-30 2004-12-02 Microsoft Corporation Head pose assessment methods and systems
US20050180626A1 (en) * 2004-02-12 2005-08-18 Nec Laboratories Americas, Inc. Estimating facial pose from a sparse representation
CN1866271A (en) * 2006-06-13 2006-11-22 北京中星微电子有限公司 AAM-based head pose real-time estimating method and system
CN102043943A (en) * 2009-10-23 2011-05-04 华为技术有限公司 Method and device for obtaining human face pose parameter
CN102456127A (en) * 2010-10-21 2012-05-16 三星电子株式会社 Head posture estimation equipment and head posture estimation method
US20130329951A1 (en) * 2012-06-11 2013-12-12 Samsung Electronics Co., Ltd. Method and apparatus for estimating a pose of a head for a person
US20220036050A1 (en) * 2018-02-12 2022-02-03 Avodah, Inc. Real-time gesture recognition method and apparatus
CN110162568A (en) * 2019-05-24 2019-08-23 东北大学 A kind of three-dimensional data method for visualizing based on PCA-Radviz
CN110705355A (en) * 2019-08-30 2020-01-17 中国科学院自动化研究所南京人工智能芯片创新研究院 Face pose estimation method based on key point constraint
CN112347974A (en) * 2020-11-22 2021-02-09 上海祐云信息技术有限公司 Human head posture estimation algorithm and operator working state recognition system
US20220164603A1 (en) * 2020-11-25 2022-05-26 Beijing Baidu Netcom Science And Technology Co., Ltd. Data processing method, data processing apparatus, electronic device and storage medium
CN112597847A (en) * 2020-12-15 2021-04-02 深圳云天励飞技术股份有限公司 Face pose estimation method and device, electronic equipment and storage medium
US20220292878A1 (en) * 2021-03-10 2022-09-15 Canon Kabushiki Kaisha Apparatus and method for detecting facial pose, image processing system, and storage medium
CN115082978A (en) * 2021-03-10 2022-09-20 佳能株式会社 Face posture detection device, method, image processing system and storage medium
CN114399800A (en) * 2021-11-30 2022-04-26 际络科技(上海)有限公司 Human face posture estimation method and device
US20230281864A1 (en) * 2022-03-04 2023-09-07 Robert Bosch Gmbh Semantic SLAM Framework for Improved Object Pose Estimation
CN115713794A (en) * 2022-09-06 2023-02-24 杭州萤石软件有限公司 Image-based sight line drop point estimation method and device
CN115937931A (en) * 2022-11-08 2023-04-07 北京中科睿鉴科技有限公司 Face posture judgment method and face changing method
CN117037215A (en) * 2023-08-15 2023-11-10 匀熵智能科技(无锡)有限公司 Human body posture estimation model training method, estimation device and electronic equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ARYAMAN GUPTA et al.: "NOSE, EYES AND EARS: HEAD POSE ESTIMATION BY LOCATING FACIAL KEYPOINTS", IEEE, 31 December 2019 (2019-12-31) *
YUCHEN et al.: "Recurrent neural network for facial landmark detection", Elsevier, 31 December 2017 (2017-12-31) *
NIU XIAOXIA; WANG CHENGRU; GU GUANGHUA: "Three-dimensional shape model reconstruction based on a single face image", Computer Simulation, no. 04, 15 April 2011 (2011-04-15) *
LUO YUAN et al., Journal of Computer Applications, 31 December 2017 (2017-12-31) *

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
WO2021047232A1 (en) Interaction behavior recognition method, apparatus, computer device, and storage medium
CN103530599B (en) The detection method and system of a kind of real human face and picture face
US6788809B1 (en) System and method for gesture recognition in three dimensions using stereo imaging and color vision
JP5668091B2 (en) 3D camera pose estimation method
US7321370B2 (en) Method and apparatus for collating object
US9519968B2 (en) Calibrating visual sensors using homography operators
JP5352738B2 (en) Object recognition using 3D model
Azad et al. Stereo-based 6d object localization for grasping with humanoid robot systems
US20160253807A1 (en) Method and System for Determining 3D Object Poses and Landmark Points using Surface Patches
CN107085728B (en) Method and system for effectively scoring probe in image by using vision system
CN110176075B (en) System and method for simultaneous consideration of edges and normals in image features through a vision system
US20150310617A1 (en) Display control device and display control method
JP7316731B2 (en) Systems and methods for detecting and classifying patterns in images in vision systems
JP2005327076A (en) Parameter estimation method, parameter estimation device and collation method
WO2018154709A1 (en) Movement learning device, skill discrimination device, and skill discrimination system
JP6487642B2 (en) A method of detecting a finger shape, a program thereof, a storage medium of the program, and a system for detecting a shape of a finger.
CN112102294A (en) Training method and device for generating countermeasure network, and image registration method and device
CN113011401A (en) Face image posture estimation and correction method, system, medium and electronic equipment
CN108447092B (en) Method and device for visually positioning marker
Song et al. Fast preprocessing for robust face sketch synthesis
CN113793370A (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN114005169B (en) Face key point detection method and device, electronic equipment and storage medium
JP5704909B2 (en) Attention area detection method, attention area detection apparatus, and program
JP4921847B2 (en) 3D position estimation device for an object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination