CN111291590B - Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium - Google Patents

Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium

Info

Publication number
CN111291590B
CN111291590B (application CN201811485916.5A)
Authority
CN
China
Prior art keywords
value
image
opening
frame
eye
Prior art date
Legal status
Active
Application number
CN201811485916.5A
Other languages
Chinese (zh)
Other versions
CN111291590A (en)
Inventor
彭斐
毛茜
何俏君
尹超凡
李彦琳
谷俊
Current Assignee
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN201811485916.5A priority Critical patent/CN111291590B/en
Publication of CN111291590A publication Critical patent/CN111291590A/en
Application granted granted Critical
Publication of CN111291590B publication Critical patent/CN111291590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18 Eye characteristics, e.g. of the iris

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method, a device, computer equipment and a storage medium for detecting fatigue of a driver, wherein the method comprises the following steps: acquiring a face video of a target driver, and respectively detecting the eye opening degree in each frame of face image in the face video to obtain the eye opening value in each frame of face image; determining a first opening threshold and a second opening threshold according to the eye opening values, wherein the first opening threshold is larger than the second opening threshold; according to the eye opening values, the first opening threshold and the second opening threshold, counting a first image frame value for which the eye opening value is smaller than or equal to the first opening threshold and a second image frame value for which the eye opening value is smaller than or equal to the second opening threshold; and if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judgment threshold, judging that the target driver is in a fatigue state. By adopting the scheme of the invention, the accuracy of the fatigue detection result can be improved.

Description

Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for detecting fatigue of a driver, computer equipment and a storage medium.
Background
Traffic accidents have always been one of the most serious threats to the safety of lives and property, and most of them are caused by human factors on the part of drivers. During vehicle driving, driver fatigue is one of the important causes of serious traffic accidents and severely endangers traffic safety.
With the development of image recognition processing technology, the fatigue state of a driver is judged and an alarm is given out by recognizing and processing the facial image information of the driver in the driving process, so that a new solution is provided for preventing traffic accidents.
The traditional driver fatigue detection method based on facial image information identifies the open and closed states of the eyes and performs fatigue detection from the detection results of consecutive frames; the accuracy of its detection results is low.
Disclosure of Invention
In view of the above, it is necessary to provide a driver fatigue detection method, apparatus, computer device, and storage medium capable of improving the detection accuracy.
A driver fatigue detection method, the method comprising:
acquiring a face video of a target driver, and respectively detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes in each frame of face image;
determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
according to each human eye opening value, the first opening threshold and the second opening threshold, counting a first image frame value of which the human eye opening value is smaller than or equal to the first opening threshold and a second image frame value of which the human eye opening value is smaller than or equal to the second opening threshold;
if the ratio of the first image frame value to the second image frame value is larger than a preset fatigue judgment threshold value, judging that the target driver is in a fatigue state.
A driver fatigue detection apparatus, the apparatus comprising:
the detection module is used for acquiring a face video of a target driver, and detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes of each frame of face image;
the processing module is used for determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
a counting module, configured to count, according to each of the eye opening values, the first opening threshold and the second opening threshold, a first image frame value of which the eye opening value is smaller than or equal to the first opening threshold, and a second image frame value of which the eye opening value is smaller than or equal to the second opening threshold;
and the judging module is used for judging that the target driver is in a fatigue state if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judging threshold value.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a face video of a target driver, and respectively detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes in each frame of face image;
determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
according to each human eye opening value, the first opening threshold and the second opening threshold, counting a first image frame value of which the human eye opening value is smaller than or equal to the first opening threshold and a second image frame value of which the human eye opening value is smaller than or equal to the second opening threshold;
and if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judgment threshold value, judging that the target driver is in a fatigue state.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a face video of a target driver, and respectively detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes in each frame of face image;
determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
according to each human eye opening value, the first opening threshold and the second opening threshold, counting a first image frame value of which the human eye opening value is smaller than or equal to the first opening threshold and a second image frame value of which the human eye opening value is smaller than or equal to the second opening threshold;
and if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judgment threshold value, judging that the target driver is in a fatigue state.
According to the driver fatigue detection method, the driver fatigue detection device, the computer equipment and the storage medium, a first opening threshold and a second opening threshold are determined according to the eye opening values in the frames of face images of the face video, eye states are distinguished based on the first opening threshold and the second opening threshold, and the target driver is judged to be in a fatigue state based on a preset fatigue judgment threshold and the ratio of a first image frame value, for which the eye opening value is smaller than or equal to the first opening threshold, to a second image frame value, for which the eye opening value is smaller than or equal to the second opening threshold, so that the accuracy of the detection result can be improved.
Drawings
FIG. 1 is a diagram of an exemplary driver fatigue detection method;
FIG. 2 is a flow diagram of a driver fatigue detection method in one embodiment;
FIG. 3 is a schematic diagram illustrating a process of obtaining an opening value of a human eye according to an embodiment;
FIG. 4 is a schematic diagram illustrating a process for obtaining a first opening degree threshold and a second opening degree threshold according to an embodiment;
FIG. 5 is a schematic diagram illustrating an exemplary process for obtaining a facial feature image;
FIG. 6 is a schematic diagram of a training process for an eye feature point localization model in one embodiment;
FIG. 7 is a DPM feature extraction schematic in one embodiment;
FIG. 8 is a diagram of a root filter (left), a component filter (middle), and the 2-fold spatial model after Gaussian filtering (right) in one embodiment;
FIG. 9 is a diagram comparing the effects of a conventional HOG + SVM and the applied DPM + Latent-SVM (a) and the corresponding formulas (b) in one embodiment;
FIG. 10 is a diagram illustrating the effects of cascading iterations in one embodiment;
FIG. 11 is a diagram that illustrates a hybrid tree model that encodes topology changes due to viewpoints, in one embodiment;
FIG. 12 is a diagram illustrating the positioning results of the human eye feature points in one embodiment;
FIG. 13 is a block diagram showing the construction of a driver fatigue detection apparatus in another embodiment;
fig. 14 is an internal structural view of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that the terms "first" and "second" and the like in the description, the claims, and the drawings of the present application are used for distinguishing similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
The driver fatigue detection method provided by the application can be applied to the application environment shown in fig. 1. An infrared camera collects video information of the driver, and the collected video information can be input into the terminal to carry out driver fatigue detection. The infrared camera is preferably mounted on the steering column below the steering wheel of the car, and can communicate with the terminal in a wired or wireless manner. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, vehicle-mounted terminals, and portable wearable devices.
In one embodiment, as shown in fig. 2, a method for detecting fatigue of a driver is provided, which is described by taking the method as an example of being applied to a terminal, and comprises the following steps:
step S201: acquiring a face video of a target driver, and respectively detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes in each frame of face image;
here, the face video is obtained by photographing the face of the target driver.
Specifically, a face video of a target driver in one detection period may be acquired, and eye opening detection may be performed on each frame of face image in the face video, so as to obtain an eye opening value in each frame of the face image. The size of the detection period can be set according to actual needs.
Step S202: determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
generally, the first opening degree threshold value is also smaller than the maximum opening degree value of the eyes of the target driver, and the second opening degree threshold value is also larger than 0, the first opening degree threshold value being a threshold value for determining whether the eyes are in a fully open state, and the second opening degree threshold value being a threshold value for determining whether the eyes are in a closed state.
Step S203: according to each human eye opening value, the first opening threshold and the second opening threshold, counting a first image frame value of which the human eye opening value is smaller than or equal to the first opening threshold and a second image frame value of which the human eye opening value is smaller than or equal to the second opening threshold;
the eye opening value is larger than the first opening threshold value and indicates that the eyes are in a fully opened state, and the eye opening value is smaller than the second opening threshold value and indicates that the eyes are in a closed state. The first image frame value is the number of image frames in the face video whose eye opening value is less than or equal to the first opening threshold, and the second image frame value is the number of image frames in the face video whose eye opening value is less than or equal to the second opening threshold.
Specifically, a first image frame value and a second image frame value in one detection period may be counted according to each of the human eye opening value, the first opening threshold value, and the second opening threshold value.
Step S204: if the ratio of the first image frame value to the second image frame value is larger than a preset fatigue judgment threshold value, judging that the target driver is in a fatigue state;
the fatigue determination threshold value may be set according to actual conditions.
In the driver fatigue detection method, a face video of the target driver is obtained, and eye opening detection is performed on each frame of face image in the face video to obtain the eye opening value in each frame of face image. A first opening threshold and a second opening threshold are determined according to the eye opening values, the first opening threshold being larger than the second opening threshold. According to the eye opening values, the first opening threshold and the second opening threshold, a first image frame value for which the eye opening value is smaller than or equal to the first opening threshold and a second image frame value for which the eye opening value is smaller than or equal to the second opening threshold are counted, and if the ratio of the first image frame value to the second image frame value is larger than a preset fatigue judgment threshold, the target driver is judged to be in a fatigue state. In the scheme of this embodiment, the eye states are distinguished based on the first opening threshold and the second opening threshold, and the fatigue judgment is made from the ratio of the two frame counts and the preset fatigue judgment threshold, so the accuracy of the detection result can be improved.
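For illustration, a minimal sketch of this decision logic in Python follows (the function and parameter names, the 0.8/0.2 scale factors and the default threshold are assumptions; the ratio is computed as in the detailed embodiment later in this description, i.e. closed-eye frames over not-fully-open frames, while the claims phrase it as a ratio of the two frame counts):

```python
import numpy as np

def is_fatigued(eye_openness, k1=0.8, k2=0.2, fatigue_threshold=0.5):
    """Sketch of the fatigue decision for one detection period.

    eye_openness: per-frame eye opening values of the face video.
    k1, k2: assumed scale factors used to derive the first/second opening
            thresholds from the maximum opening value.
    fatigue_threshold: preset fatigue determination threshold T (assumed value).
    """
    openness = np.asarray(eye_openness, dtype=float)
    max_open = openness.max()            # maximum eye opening in the period
    t1 = k1 * max_open                   # first opening threshold (larger)
    t2 = k2 * max_open                   # second opening threshold (smaller)
    n1 = int(np.sum(openness <= t1))     # first image frame value
    n2 = int(np.sum(openness <= t2))     # second image frame value
    if n1 == 0:                          # eyes fully open in every frame
        return False
    # Ratio of the two frame counts; following the detailed embodiment,
    # it approaches 1 the longer the eyes stay closed.
    f = n2 / n1
    return f > fatigue_threshold
```

Called once per detection period with the list of per-frame opening values, this returns a boolean fatigue flag that could trigger a warning.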
In one embodiment, in order to eliminate the influence of the change of the distance between the human eye and the camera on the calculation of the human eye opening degree, a processing mode of normalizing the human eye opening degree value is provided. As shown in fig. 3, specifically, the above-mentioned detecting the eye opening degree of each frame of face image in the face video to obtain the eye opening degree value of each frame of face image may include:
step S301: respectively carrying out eye feature point positioning on each frame of the face image to obtain eye feature points in each frame of the face image;
step S302: respectively determining a human eye interpupillary distance value and a human eye opening original value in the face image of each frame according to the eye feature points in the face image of each frame;
specifically, the calibration point of the upper eyelid and the calibration point of the lower eyelid directly facing the pupil in each frame of the face image may be determined respectively according to the eye feature points in each frame of the face image, and the original value of the eye opening may be calculated according to the calibration point of the upper eyelid and the calibration point of the lower eyelid, where the original value of the eye opening may be equal to the distance between the calibration point of the upper eyelid and the calibration point of the lower eyelid.
Step S303: and respectively determining the human eye opening value in the face image of each frame according to the human eye interpupillary distance value and the human eye opening original value in the face image of each frame.
In particular, the eye opening value in each frame of the face image may be determined according to

H_i = (h_i / l_i) · C

where H_i represents the eye opening value in the face image of the i-th frame, h_i represents the original eye opening value in the face image of the i-th frame, l_i represents the inter-pupil distance value in the face image of the i-th frame, and C represents a correction parameter whose size can be set or adjusted according to the situation.
By adopting the scheme in the embodiment, the influence of the change of the distance between the human eyes and the camera on the calculation of the human eye opening degree can be effectively eliminated.
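As an illustration, the normalization could be implemented roughly as follows (a sketch; the function name and the example value of the correction parameter C are assumptions, and the form H_i = (h_i / l_i) · C is inferred from the linear relationship described above):

```python
def normalized_eye_openness(h_raw, pupil_distance, c=60.0):
    """Normalize the raw eye opening h by the inter-pupil distance l so the
    value is insensitive to the eye-camera distance: H = (h / l) * C.
    c is a chosen correction parameter (illustrative value)."""
    if pupil_distance <= 0:
        raise ValueError("pupil distance must be positive")
    return h_raw / pupil_distance * c
```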
In one embodiment, as shown in fig. 4, the determining the first opening threshold and the second opening threshold according to each of the eye opening values includes:
step S401: determining the maximum eye opening degree value according to the eye opening degree values;
specifically, the sizes of the eye opening values may be compared to obtain a maximum value of the eye opening values, and the maximum value may be used as the maximum eye opening value. Or sequencing the eye opening degree values in the descending order, taking the average value of the first N eye opening degree values in the sequencing, and taking the average value as the maximum eye opening degree value.
Step S402: multiplying the maximum opening degree value by a preset first proportional coefficient and a preset second proportional coefficient respectively to obtain a first opening degree threshold value and a second opening degree threshold value;
the first scaling factor is greater than the second scaling factor, and generally, the first scaling factor and the second scaling factor are both values greater than 0 and less than 1, and the magnitudes of the first scaling factor and the second scaling factor can be selected according to actual needs, preferably, the first scaling factor is 0.8, and the second scaling factor is 0.2.
In this embodiment, the first opening threshold and the second opening threshold are obtained by multiplying the maximum opening value by a preset first proportional coefficient and a preset second proportional coefficient respectively, so the algorithm is easy to implement; and because the first and second opening thresholds are determined from the maximum eye opening value, which is itself determined from the eye opening values, the detection accuracy can be further improved.
In one embodiment, as shown in fig. 5, the above-mentioned performing the facial feature point positioning on the facial image of each frame to obtain each facial feature image may include:
step S501: extracting a first DPM feature map, wherein the first DPM feature map is a DPM feature map of a current face image, and the current face image is any one frame of face image;
step S502: sampling the first DPM feature map, and extracting a second DPM feature map, wherein the second DPM feature map is a DPM feature map of an image obtained by sampling the first DPM feature map;
step S503: performing convolution operation on the first DPM characteristic diagram by using a pre-trained root filter to obtain a response diagram of the root filter;
step S504: performing convolution operation on the N times of the second DPM characteristic diagram by using a pre-trained component filter to obtain a response diagram of the component filter, wherein the resolution of the component filter is N times of that of the root filter, and N is a positive integer;
step S505: obtaining a target response diagram according to the response diagram of the root filter and the response diagram of the component filter;
step S506: and acquiring a current face feature image according to the target response image.
The facial images of the frames may be respectively used as the current facial image in this embodiment, and the steps S501 to S506 are respectively adopted to perform the facial feature point positioning, so as to obtain the facial feature images corresponding to the facial images of the frames.
In the embodiment, the face detection is performed by adopting the DPM target detection algorithm, so that the detection accuracy of the algorithm is improved, and the false detection rate and the missing detection rate can be reduced at the same time.
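For illustration only, a rough sketch of the response-map combination in steps S501 to S506 follows, assuming the DPM feature maps and trained filters are already available; part anchors, deformation costs and the fine Gaussian downsampling are simplified away, and all names are assumptions:

```python
import numpy as np
from scipy.signal import correlate2d

def combined_response(feat, feat_2x, root_filter, part_filters, part_weight=1.0):
    """Sketch: score map from a root filter on the base DPM feature map and
    component (part) filters on the 2x-resolution feature map.

    feat:      (H, W, C) DPM feature map of the image.
    feat_2x:   (2H, 2W, C) DPM feature map of the upsampled image.
    filters:   (h, w, C) arrays; part anchors/deformation are ignored here.
    """
    # Root response at base resolution (summed over feature channels).
    root_resp = sum(correlate2d(feat[..., c], root_filter[..., c], mode="same")
                    for c in range(feat.shape[-1]))
    # Part responses at 2x resolution, crudely subsampled back to base
    # resolution (the embodiment uses a fine Gaussian downsampling instead).
    part_resp = np.zeros_like(root_resp)
    for pf in part_filters:
        r = sum(correlate2d(feat_2x[..., c], pf[..., c], mode="same")
                for c in range(feat_2x.shape[-1]))
        part_resp += r[::2, ::2][:root_resp.shape[0], :root_resp.shape[1]]
    # Weighted combination; high values indicate likely face locations.
    return root_resp + part_weight * part_resp
```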
In one embodiment, the performing eye feature point positioning on each frame of the face image to obtain eye feature points in each frame of the face image may include: respectively carrying out face feature point positioning on the face image of each frame to obtain each face feature image; and respectively inputting each human face feature image into a preset eye feature point positioning model to obtain eye feature points in each frame of the human face image.
In one embodiment, as shown in fig. 6, the training process of the eye feature point location model may include:
step S601: acquiring a pixel value of each pixel point of a target image and a feature vector of each pixel point;
in this embodiment, the model used may be based on a hybrid tree and a shared component V pool. Each facial landmark is modeled as a part and global blending is used to capture topological changes due to the viewpoint.
Step S602: configuring a tree-structured part model according to the pixel values and the feature vectors, and determining a score function for the part configuration L;
wherein the score function is S(I, L, m) = App_m(I, L) + Shape_m(L) + α_m, with
App_m(I, L) = Σ_{i∈V_m} w_i^m · φ(I, l_i)
Shape_m(L) = Σ_{ij∈E_m} a_{ij}^m dx² + b_{ij}^m dx + c_{ij}^m dy² + d_{ij}^m dy
I denotes the target image, l_i = (x_i, y_i) represents the pixel location of the i-th part in the target image, w represents a part model obtained by modeling each facial feature point in the target image as a part, m indicates the mixture (the tree structure is a mixture), a, b, c and d represent elastic parameters, and α represents a mixture bias scalar;
In the present embodiment, App_m(I, L) sums the scores of placing the template w_i^m of part i at its location l_i, and φ(I, l_i) represents the feature vector of the target image I at pixel l_i.
In the present embodiment, Shape_m(L) expresses the arrangement score of the mixture-specific spatial arrangement L, where dx = x_i - x_j and dy = y_i - y_j indicate the displacement of the i-th part relative to the j-th part. Each parameter in the formula (a, b, c, d) can be interpreted as a spatial constraint between the different parts.
Step S603: obtaining optimal configuration parameters of each part of each hybrid type by calculating values of L and m which enable the score function to obtain a maximum value;
in particular, all hybrids may be enumerated, finding the best configuration parameters for each of the components for each hybrid.
Step S604: establishing a training sample set, wherein the training sample set comprises a positive sample and a negative sample which are set with labels, the positive sample is an image containing a human face, and the negative sample is an image not containing the human face;
in particular, assume a fully supervised scene, in which there are positive examples and mixed labels that contain faces and negative examples that do not contain faces. The shape parameters and appearance parameters may be learned differentially with a structural prediction framework.
Step S605: constructing a target vector according to the partial model, the elasticity parameters and the mixed bias scalar, and modifying the score function according to the target vector;
specifically, the partial model w, the elastic parameters (a, b, c, d), and the mixed bias scalar α are all placed into a vector β, and the scoring function described above is modified to the form: s (I, z) ═ β · Φ (I, z). The vector Φ (I, z) is sparse, having non-zero terms in a single interval corresponding to the hybrid m.
Step S606: learning to obtain the eye characteristic point positioning model according to the training sample set, the optimal configuration parameters, the modified score function and a predefined target prediction function;
wherein the eye feature point location model obtained by learning is:

arg min_{β, ξ_n ≥ 0}  (1/2)·||β||² + C·Σ_n ξ_n
s.t.  ∀n ∈ pos:  β · Φ(I_n, z_n) ≥ 1 - ξ_n
      ∀n ∈ neg, ∀z:  β · Φ(I_n, z) ≤ -1 + ξ_n

wherein β represents the target vector, z_n = {L_n, m_n}, C represents the penalty coefficient of the objective function, ξ_n represents the penalty term of the n-th sample, pos and neg respectively denote the positive and negative samples, K represents the number of target vectors, and k indexes the corresponding target vector.
In this embodiment, the face feature points and the eye feature points are each located by a machine learning algorithm, so the positioning accuracy is very high and the generalization to illumination and posture is very strong, which can improve the accuracy of calculating the degree of eye opening and closing.
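As a toy illustration of the score function used above, the following sketch evaluates S(I, L, m) = App_m(I, L) + Shape_m(L) + α_m for one candidate part configuration; the data layout (dictionaries keyed by part index) is an assumption:

```python
import numpy as np

def configuration_score(feature_maps, part_templates, edges, springs, alpha_m, locations):
    """Score one placement L of the parts of mixture m.

    feature_maps[i][(x, y)] -> feature vector phi(I, l_i) at pixel (x, y) (precomputed)
    part_templates[i]       -> template w_i^m for part i
    edges                   -> list of (i, j) pairs in the tree E_m
    springs[(i, j)]         -> (a, b, c, d) elastic parameters for edge (i, j)
    locations[i]            -> chosen pixel (x_i, y_i) for part i
    """
    # Appearance term: sum of template responses at the chosen locations.
    app = sum(float(np.dot(part_templates[i], feature_maps[i][locations[i]]))
              for i in part_templates)
    # Shape term: spring costs on the relative displacement of connected parts.
    shape = 0.0
    for (i, j) in edges:
        a, b, c, d = springs[(i, j)]
        dx = locations[i][0] - locations[j][0]
        dy = locations[i][1] - locations[j][1]
        shape += a * dx**2 + b * dx + c * dy**2 + d * dy
    return app + shape + alpha_m
```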
In order to facilitate an understanding of the present invention, a preferred embodiment of the present invention will be described in detail.
The driver fatigue detection method in this embodiment includes the steps of: the first step is as follows: inputting video information; the second step is that: detecting a human face; the third step: positioning face feature points; the fourth step: positioning human eye characteristic points; the fifth step: blink detection-calculating the eye opening, fatigue detection analysis.
The first step: collecting video information. A monocular infrared camera (mounted on the steering column below the steering wheel) inputs facial state information (images) of the driver in real time during driving. The frequency of the video input is 30 Hz and the image size of each frame is 1280 × 1080 pixels. The infrared camera can adapt to the different lighting conditions in the car and accurately capture the driver's head posture information and facial information.
The second step: face detection. For each frame of image of the input video, this embodiment performs face detection by using the DPM (Deformable Part Model) target detection algorithm. The DPM algorithm reuses part of the principles of the HOG algorithm: first, the picture is converted to grayscale; then, as in equation (1), the input image is normalized in color space by using a Gamma correction method:
I(x, y) = I(x, y)^gamma   (1)
The value of gamma is chosen according to the specific conditions (for example, 1/2 can be taken), which effectively reduces local shadow and illumination changes of the image. Next, gradient calculation is performed. The gradient reflects the change between adjacent pixels: where the change between adjacent pixels is flat the gradient is small, and where the change is sharp the gradient is large. The gradient of the continuous image f(x, y) at any pixel (x, y) is a vector:

∇f(x, y) = [G_x, G_y]^T = [∂f/∂x, ∂f/∂y]^T

where G_x is the gradient along the x-direction and G_y is the gradient along the y-direction. The magnitude and direction angle of the gradient can be expressed by the following formulas:

G(x, y) = sqrt(G_x² + G_y²),  θ(x, y) = arctan(G_y / G_x)

Pixel points in the digital image are calculated using differences:

G_x(x, y) = f(x+1, y) - f(x, y),  G_y(x, y) = f(x, y+1) - f(x, y)

Since the detection effect obtained by the gradient operation using the simple one-dimensional discrete differential template [-1, 0, 1] is the best, the calculation formulas used are:

G_x(x, y) = H(x+1, y) - H(x-1, y),  G_y(x, y) = H(x, y+1) - H(x, y-1)

where H(x, y) denotes the pixel value at (x, y). The magnitude and direction of the gradient at that pixel are then calculated as:

G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²),  α(x, y) = arctan(G_y(x, y) / G_x(x, y))
then, the whole target picture is divided into cell units (cells) which are not overlapped with each other and have the same size, and then the gradient size and direction of each cell unit are calculated. DPM retained the cell units of the HOG map and then normalized a certain cell unit on the map (8 × 8 cell unit in fig. 7) to its four cells in the diagonal neighborhood. Extracting signed HOG gradients, 0-360 degrees will yield 18 gradient vectors, extracting unsigned HOG gradients, 0-180 degrees will yield 9 gradient vectors. DPM extracts only unsigned features, generates 4 × 9-36 dimensional features, adds rows and columns to form 13 feature vectors (9 columns and 4 rows as shown in fig. 7), adds extracted 18-dimensional signed gradient features (18 columns and 18 rows as shown in fig. 18) to further improve accuracy, and finally obtains 13+ 18-31 dimensional gradient features.
As shown in fig. 8, the DPM model employs an 8 × 8 resolution Root filter (left) and a 4 × 4 resolution component filter (middle). Wherein the resolution of the middle graph is 2 times that of the left graph, and the size of the component filter is 2 times that of the root filter, so the gradient can be seen more finely. The right image is a 2-fold spatial model after gaussian filtering.
Firstly, a DPM feature map (DPM feature map of an original image) is extracted from an input image, gaussian pyramid upsampling (scaling image) is performed, and then the DPM feature map of the gaussian pyramid upsampled image is extracted. And carrying out convolution operation on the DPM characteristic diagram of the original image and the trained root filter to obtain a response diagram of the root filter. Meanwhile, performing convolution operation on the DPM characteristic diagram (sampled on a Gaussian pyramid) of the extracted 2-time image by using a trained component filter to obtain a response diagram of the component filter. The resulting response maps of the component filters are subjected to a fine gaussian downsampling operation so that the response maps of the root filter and the component filters have the same resolution. And finally, carrying out weighted average on the two images to obtain a final response image, wherein the response effect is better when the brightness is higher, and the human face is detected. Wherein the response value is expressed in the following formula:
score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + Σ_{i=1}^{n} D_{i,l}(2·(x_0, y_0) + v_i) + b   (8)

where x_0, y_0 and l_0 respectively represent the abscissa, ordinate and scale of the feature point; R_{0,l_0}(x_0, y_0) is the response score of the root model; D_{i,l}(2·(x_0, y_0) + v_i) is the response score of the i-th component model; 2·(x_0, y_0) indicates that the pixels of the component model are at 2 times the original resolution, hence the pixel coordinates are multiplied by 2; b is the offset coefficient between different model components, used for alignment with the model; and v_i is the deviation coefficient between the pixel point and the ideal detection point. The detailed response score formula of the component model is:

D_{i,l}(x, y) = max_{dx,dy} [ R_{i,l}(x + dx, y + dy) - d_i · φ_d(dx, dy) ]

Similar to equation (8), the larger the value of the objective function D_{i,l}(x, y), the better; the variables are dx and dy. In the above formula, (x, y) is the ideal position of the trained model; dx, dy is the offset from the ideal model position, ranging from the ideal position to the picture edge; R_{i,l}(x + dx, y + dy) is the matching score of the component model; d_i · φ_d(dx, dy) is the offset loss score of the component, where d_i is the offset loss coefficient and φ_d(dx, dy) is the displacement between a pixel point of the component model and the detection point of the component model. The formula shows that the higher the response of the component model and the smaller the distance between each component and the corresponding pixel point, the higher the response score, and the more likely it is that this is the object to be detected.
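A brute-force sketch of the component term D_{i,l}(x, y); real DPM implementations compute this with a generalized distance transform, so the exhaustive search and all names here are illustrative assumptions:

```python
import numpy as np

def part_displacement_score(part_response, anchor, d_coeffs, max_disp=4):
    """Best response of one part around its anchor, penalized by displacement.

    part_response: 2-D response map R_{i,l} of the part filter.
    anchor:        (x, y) ideal (trained) position of the part.
    d_coeffs:      (d1, d2, d3, d4) deformation coefficients for
                   phi_d(dx, dy) = (dx, dy, dx^2, dy^2).
    max_disp:      search radius in pixels (brute force instead of the
                   generalized distance transform used by DPM).
    """
    x0, y0 = anchor
    d1, d2, d3, d4 = d_coeffs
    h, w = part_response.shape
    best = -np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            x, y = x0 + dx, y0 + dy
            if 0 <= x < w and 0 <= y < h:
                penalty = d1 * dx + d2 * dy + d3 * dx**2 + d4 * dy**2
                best = max(best, part_response[y, x] - penalty)
    return best
```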
When training the model, the DPM features obtained above are trained. DPM uses Latent-SVM classification here, in which a latent variable is added to the Linear-SVM classification and can be used to determine which candidate within a positive sample is the true positive sample. There are many latent variables in LSVM because, after a bounding box is marked in a picture of a positive sample, a sample of maximal score needs to be proposed at a certain position and scale as the sample of a certain part. FIG. 9(a) compares the effects of an ordinary HOG + SVM and the applied DPM + Latent-SVM. The general formulas of HOG + SVM and DPM + Latent-SVM are shown in FIG. 9(b).
The third step: positioning the facial feature points. This embodiment uses the LBF algorithm, in which a cascade of regressors locates the facial feature points and the eye feature points within milliseconds. Let S = (x_1^T, x_2^T, ..., x_p^T)^T denote the shape vector, where x_i represents the (x, y) coordinates of the i-th facial feature point in image I. Each regressor r_t(·,·) uses the current picture I and the current shape estimate Ŝ^(t) to predict the updated shape vector; the specific formula is as follows:

Ŝ^(t+1) = Ŝ^(t) + r_t(I, Ŝ^(t))

where Ŝ^(t) indicates the current estimate of the shape vector S. The most important step in the cascade is that the regressor r_t(·,·) makes its predictions based on features, such as pixel grayscale values, that are computed from the image I and indexed relative to the current shape estimate Ŝ^(t). Geometric invariance is introduced in this process, and as the cascade progresses one can be more certain that the precise semantic location on the face is being indexed.
If the initial estimate Ŝ^(0) belongs to this space, the output range spanned by the ensemble is then ensured to lie in the linear subspace of the training data. This removes the need for additional constraints on the prediction, which greatly simplifies the method. Furthermore, the initial shape is simply chosen as the average shape of the training data, centered and scaled according to the bounding box output of the generic face detector.
Next, each regressor in the cascade is learned with a training data set ((I_1, S_1), ..., (I_n, S_n)), where I_i represents a face picture and S_i its shape vector. To learn the first regression function r_0, triplets of a training image, an initialized shape estimate and a target update step, (I_{π_i}, Ŝ_i^(0), ΔS_i^(0)), are created as follows:

π_i ∈ {1, ..., n}   (13)
Ŝ_i^(0) ∈ {S_1, ..., S_n} \ {S_{π_i}}   (14)
ΔS_i^(0) = S_{π_i} - Ŝ_i^(0)   (15)

where i = 1, ..., N. The total number of these triplets is set to N = nR, where R is the number of initializations used per image. Each initialized shape estimate for an image is sampled uniformly from (S_1, ..., S_n) without replacement.
Using a sum of squared error loss and gradient tree boosting, the regression function r_0 can be learned from this data. Given the training data {(I_{π_i}, Ŝ_i^(0), ΔS_i^(0))}, i = 1, ..., N, and a learning rate 0 < ν < 1, the specific process is as follows:

a. Initialization:

f_0(I, Ŝ^(0)) = arg min_γ Σ_{i=1}^{N} ||ΔS_i^(0) - γ||²

b. For k from 1 to K:

① For i = 1, ..., N, compute the residuals:

r_ik = ΔS_i^(0) - f_{k-1}(I_{π_i}, Ŝ_i^(0))

② Fit a regression tree to the residuals r_ik, giving a weak regression function g_k(I, Ŝ^(0)).

③ Update:

f_k(I, Ŝ^(0)) = f_{k-1}(I, Ŝ^(0)) + ν · g_k(I, Ŝ^(0))

c. Output:

r_0 = f_K

The triplet training data are in turn updated as:

Ŝ_i^(1) = Ŝ_i^(0) + r_0(I_{π_i}, Ŝ_i^(0))
ΔS_i^(1) = S_{π_i} - Ŝ_i^(1)

and the next regressor r_1 in the cascade is set up in the same way (with t = 1). This process is iterated until a cascade of T regressors r_0, r_1, ..., r_{T-1} gives a sufficient level of accuracy when combined.
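A condensed sketch of learning one cascade level by gradient tree boosting with shrinkage ν; scikit-learn's DecisionTreeRegressor stands in for the piecewise-constant weak regressor, and the feature extraction from (I, Ŝ^(t)) is assumed to be precomputed into a matrix:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_cascade_level(features, delta_shapes, n_trees=10, shrinkage=0.1, depth=4):
    """Learn r_t as a sum of regression trees fit to residual shape updates.

    features:     (N, F) shape-indexed features for the N training triplets.
    delta_shapes: (N, 2p) target updates Delta S_i^(t).
    Returns a predict(features) -> (N, 2p) function.
    """
    f0 = delta_shapes.mean(axis=0)                    # initialization (mean target)
    preds = np.tile(f0, (delta_shapes.shape[0], 1))
    trees = []
    for _ in range(n_trees):
        residuals = delta_shapes - preds              # r_ik
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(features, residuals)                 # weak regressor g_k
        trees.append(tree)
        preds += shrinkage * tree.predict(features)   # f_k = f_{k-1} + v * g_k
    def predict(x):
        out = np.tile(f0, (x.shape[0], 1))
        for t in trees:
            out += shrinkage * t.predict(x)
        return out
    return predict
```

Repeating this for each level, with the triplets' shape estimates updated between levels, yields the cascade described above.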
Each regression function r_t is a tree-based regression function fitted to the residual targets in the gradient boosting algorithm. On each split node of the regression tree, a decision is made based on thresholding the intensity difference between two pixels. The pixel coordinates used in the test, (u, v), are defined in a coordinate system based on the average shape. For a face image of arbitrary shape, we want to index points that have the same position relative to its shape as u and v have relative to the average shape. To achieve this, the image could be warped to the average shape based on the current shape estimate before extracting the features; but because a very sparse representation of the image's pixels is used, it is much more efficient to warp the positions of these points rather than warp the entire image.
Suppose k_u is the index of the facial landmark in the average shape that is closest to u, and define its offset from u as:

Δx_u = u - x̄_{k_u}

Then, for an image I_i with the shape S_i defined in it, the point in I_i that is qualitatively similar to u in the average-shape image is:

u′ = x_{i,k_u} + (1 / s_i) · R_i^T · Δx_u

where s_i and R_i are the scale and rotation matrix of the similarity transform that minimizes the sum of squared differences between the average-shape facial landmark points x̄_j and the warped points:

Σ_{j=1}^{p} || x̄_j - (s_i R_i x_{i,j} + t_i) ||²

v′ is defined similarly. Formally, each split is a decision involving three parameters θ = (τ, u, v) and is applied to each training and test sample as a threshold test on the intensity difference: the split fires if I_{π_i}(u′) - I_{π_i}(v′) > τ, and does not fire otherwise.
Here u′ and v′ are defined via the scale and rotation matrices. Computing the similarity transform, the most computationally expensive part of the process at test time, is only done once at each level of the cascade.
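A sketch of estimating the scale s_i and rotation R_i that best align a shape to the mean shape in the least-squares sense; this is a standard Procrustes-style fit and an illustrative implementation, not necessarily the exact procedure of the embodiment:

```python
import numpy as np

def similarity_to_mean_shape(shape, mean_shape):
    """Return scale s and rotation R minimizing sum_j ||mean_j - (s*R*shape_j + t)||^2.

    shape, mean_shape: (p, 2) arrays of landmark coordinates.
    """
    x = shape - shape.mean(axis=0)                   # center both point sets
    y = mean_shape - mean_shape.mean(axis=0)
    cov = y.T @ x                                    # 2x2 cross-covariance
    u, sing, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(u @ vt))               # guard against reflections
    rot = u @ np.diag([1.0, d]) @ vt                 # rotation R
    scale = (sing * [1.0, d]).sum() / (x**2).sum()   # scale s
    return scale, rot
```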
For each regression tree, the underlying function is approximated with a piecewise constant function in which a constant vector is fitted to each leaf node. To train the regression tree, a set of candidate splits, i.e. candidate θ, is randomly generated at each tree node. The θ that minimizes the sum of squared errors is then greedily chosen from these candidates. If Q is the index set of the training examples at the node, this corresponds to minimizing:

E(Q, θ) = Σ_{s∈{l,r}} Σ_{i∈Q_{θ,s}} || r_i - μ_{θ,s} ||²

where Q_{θ,s} is the set of indices of the samples sent to the left (s = l) or right (s = r) child by the split θ, r_i is the vector of all residuals computed for image i in the gradient boosting algorithm, and μ_{θ,s} is defined as:

μ_{θ,s} = (1 / |Q_{θ,s}|) Σ_{i∈Q_{θ,s}} r_i

The optimal split point can be found easily because, if the formula is rearranged and the factors that do not depend on θ are ignored, the following relationship is obtained:

arg max_θ Σ_{s∈{l,r}} |Q_{θ,s}| · μ_{θ,s}^T μ_{θ,s}

When evaluating different θ, only μ_{θ,l} needs to be computed, since μ_{θ,r} can be obtained from the overall mean μ and μ_{θ,l} as follows:

μ_{θ,r} = ( |Q| · μ - |Q_{θ,l}| · μ_{θ,l} ) / |Q_{θ,r}|
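A sketch of the split selection using the rearranged objective and the μ_{θ,r} shortcut above; the candidate pixel-difference features are assumed to be precomputed as columns of a matrix, and all names are illustrative:

```python
import numpy as np

def best_split(features, residuals, candidate_cols, thresholds):
    """Pick the candidate (feature column, threshold) maximizing
    sum over children of |Q_s| * mu_s^T mu_s at this node.

    features:       (N, F) pixel-difference features for the node's samples.
    residuals:      (N, 2p) residual targets r_i.
    candidate_cols: iterable of feature column indices to try.
    thresholds:     matching iterable of thresholds tau.
    """
    n = residuals.shape[0]
    total = residuals.sum(axis=0)                     # n * mu
    best_score, best_theta = -np.inf, None
    for col, tau in zip(candidate_cols, thresholds):
        left = features[:, col] > tau
        n_l = int(left.sum())
        n_r = n - n_l
        if n_l == 0 or n_r == 0:
            continue
        mu_l = residuals[left].mean(axis=0)
        mu_r = (total - n_l * mu_l) / n_r             # reuse the overall sum and mu_l
        score = n_l * (mu_l @ mu_l) + n_r * (mu_r @ mu_r)
        if score > best_score:
            best_score, best_theta = score, (col, tau)
    return best_theta
```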
the decision at each node is based on thresholding the difference in intensity values at a pair of pixels. This is a fairly simple test, but it is more powerful than a single threshold because it is relatively insensitive to global illumination variations. Unfortunately, a drawback of using pixel differences is that the number of possible segmentation (feature) candidates is quadratic in the number of pixels in the average image. This makes it difficult to find a good theta without searching for many theta. However, by considering the structure of the image data, such a limiting factor can be alleviated to some extent. We first introduce an index
p(u,v)αe-λ||u-v|| (29)
The pixel segmentation points within the distance range are easy to select, so that the number of prediction errors of the data set can be effectively reduced.
To handle missing labels, a variable w_{i,j} ranging between 0 and 1 is introduced (representing the weight of the j-th landmark point of the i-th image), and a new sum-of-squared-differences formula is derived:

E(Q, θ) = Σ_{s∈{l,r}} Σ_{i∈Q_{θ,s}} (r_i - μ_{θ,s})^T W_i (r_i - μ_{θ,s})

where W_i is a diagonal matrix built from the vector (w_{i,1}, ..., w_{i,p})^T. In addition, μ_{θ,s} becomes:

μ_{θ,s} = ( Σ_{i∈Q_{θ,s}} W_i )^{-1} Σ_{i∈Q_{θ,s}} W_i r_i

The gradient boosting algorithm must also be modified to take these weighting factors into account. This can be done simply by initializing the overall model with the weighted average of the targets and fitting the regression trees to the weighted residuals:

r_ik = W_i · ( ΔS_i^(t) - f_{k-1}(I_{π_i}, Ŝ_i^(t)) )
wherein the cascade iteration effect is shown in fig. 10.
The fourth step: and positioning the characteristic points of the human eyes. In this embodiment, the model used is based on a hybrid tree and a shared pool of components V. In this approach we model each facial landmark as a part and use global mixing to capture the topological changes due to the viewpoint. As shown in fig. 11, the hybrid tree model employed in the present embodiment encodes topology changes due to viewpoints.
Tree-structured part model: each tree-structured, linearly-parameterized model is written as T_m = (V_m, E_m), where m indicates the mixture and V_m ⊆ V. A picture is denoted I, and l_i = (x_i, y_i) represents the pixel location of part i. The score of a configuration of parts L is:

S(I, L, m) = App_m(I, L) + Shape_m(L) + α_m   (33)

App_m(I, L) = Σ_{i∈V_m} w_i^m · φ(I, l_i)   (34)

Equation (34) sums the scores of placing the template w_i^m of part i, tuned for mixture m, at its location l_i, where φ(I, l_i) is the feature vector extracted at pixel l_i of picture I.

Shape_m(L) = Σ_{ij∈E_m} a_{ij}^m dx² + b_{ij}^m dx + c_{ij}^m dy² + d_{ij}^m dy   (35)

Expression (35) represents the arrangement score of the mixture-specific spatial arrangement L, where dx = x_i - x_j and dy = y_i - y_j indicate the displacement of the i-th part relative to the j-th part. Each parameter in the formula (a, b, c, d) can be interpreted as a spatial constraint between the different parts. α_m represents a scalar bias for mixture m.
Since the method for positioning the eye feature points is mainly applied in the scheme of the embodiment, the factors of whole and part sharing are not considered.
The values of the parameters L and m that give the maximum value of the formula S(I, L, m) are found:

S*(I) = max_m max_L S(I, L, m)

Simply enumerate all mixtures and find the best configuration of parts for each mixture.

Because each mixture T_m = (V_m, E_m) is a tree structure, the inner maximization can be completed efficiently through dynamic programming; for brevity the message-passing equations are omitted. In this embodiment the total number of distinct part templates in the vocabulary is M′·|V|; assuming that the dimension of each part is D and there are N candidate locations, the total cost of evaluating all parts at all locations is:

O(D · N · M′ · |V|)

Distance transforms are then used, and the message-passing cost becomes O(N·M·|V|). This makes the overall model of this embodiment's solution linear in the number of parts and the image size.
Training the eye feature point location model. The scheme of this embodiment assumes a fully supervised scenario, in which there are positive samples with landmark and mixture labels and negative samples that do not contain faces. The shape and appearance parameters are learned discriminatively using a structured prediction framework. First, the edge structure E_m of each mixture needs to be estimated. Although deriving human body models with tree structures is a natural process, the tree structure of the human eye features is not obvious.
This embodiment uses the Chow-Liu algorithm to find the maximum-likelihood tree structure that best explains the positions of the feature points under a Gaussian distribution. Given labeled positive samples {I_n, L_n, m_n} and negative samples {I_n}, this embodiment defines a structured target prediction function and sets z_n = {L_n, m_n}. The formula S(I, L, m) is linear in the part models w, the elastic parameters (a, b, c, d) and the mixture bias α. Putting all these parameters into a vector β, the scoring function can then be written as follows:
S(I,z)=β·Φ(I,z) (38)
where the vector Φ (I, z) is sparse with non-zero terms in a single interval corresponding to mix m.
Next, a model of the following form is learned:

arg min_{β, ξ_n ≥ 0}  (1/2)·||β||² + C·Σ_n ξ_n   (39)
s.t.  ∀n ∈ pos:  β · Φ(I_n, z_n) ≥ 1 - ξ_n
      ∀n ∈ neg, ∀z:  β · Φ(I_n, z) ≤ -1 + ξ_n

In equation (39), C represents the penalty coefficient of the objective function (a hyper-parameter for which the most suitable value must be found by tuning), ξ_n represents the penalty term of the n-th sample (n indexes the samples), pos and neg respectively denote the positive and negative samples, K represents the number of target vectors β, and k indexes the corresponding target vector β.
Fig. 12 is a schematic diagram showing the result of locating the feature points of the human eye.
The fifth step: blink detection-calculating the eye opening and performing fatigue analysis. The longer eye closure time when blinking is one of the important indicators of driver fatigue. On the basis of the previous steps, the human eye area has been located and the human eye feature points have been found. In order to calculate the eye opening, in the present embodiment, first, a change in a distance between the eye and the camera is excluded to prevent the change from affecting the calculation of the eye opening. On the basis, the fatigue degree of the driver is judged by the proportion of the eye closing time in unit time. The details are as follows.
First, the open/closed state of the human eye is determined from the eye opening value. In the scheme of this embodiment, the eye feature points located in the previous step are selected, and the calibration points on the upper and lower eyelids directly facing the pupil are found to calculate the eye opening. However, many experiments and practical experience show that the eye opening appears smaller when the eye is farther from the camera and larger when the eye is closer to the camera. This is not beneficial to the later detection of driver fatigue, so the abnormal change of the eye opening caused by the change of the relative position between the eye and the camera is normalized. In the scheme of this embodiment, the inter-pupil distance l of the person can be measured using the eye feature points located in the previous step; a linear relationship exists between the pupil distance and the change of the eye opening. Assuming that the actually measured eye opening (equivalent to the original eye opening value) is h and the normalized eye opening is H, the eye opening value is corrected by the following formula:

H = (h / l) · C
where C represents a selected correction parameter.
Next is the division of the human eye state. In this embodiment, the maximum eye opening is obtained from the eye opening values obtained above and is recorded as MaxW. Assuming that the measured eye opening is W: state I means W > 80% MaxW, the eye is in a fully open state; state II means 20% MaxW < W ≤ 80% MaxW, the eye is in a half-open state; state III means W ≤ 20% MaxW, the eye is in the closed state.
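A small illustrative sketch of this three-state division (names and the exact return convention are assumptions):

```python
def eye_state(opening, max_opening, k1=0.8, k2=0.2):
    """Classify a frame as fully open (I), half open (II) or closed (III)."""
    if opening > k1 * max_opening:
        return "I"    # fully open
    if opening > k2 * max_opening:
        return "II"   # half open
    return "III"      # closed
```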
In this embodiment, the number of frames in the period (equivalent to the detection period) whose eye opening is smaller than or equal to 80% of the maximum eye opening is counted and recorded as n, and the number of frames in the period whose eye opening is smaller than or equal to 20% of the maximum eye opening is recorded as m; a ratio f can then be obtained, and the calculation formula is:

f = m / n
the closer f is to 1, the closer the driver is to fatigue. An experimental threshold value T (equivalent to the fatigue judgment threshold value) can be obtained through a large number of experiments, if f is larger than T, the driver is in a fatigue state, and early warning is performed in a voice mode to remind the driver of paying attention to fatigue driving.
In the scheme of this embodiment, the DPM algorithm is used for face detection, which greatly improves the detection accuracy of the algorithm, reduces the false detection rate and the missed detection rate, and improves robustness to illumination and face posture; the face feature points and the eye feature points are located with machine learning algorithms, giving very high positioning accuracy and very strong generalization to illumination and posture, so that finally the degree of eye opening and closing can be estimated accurately; and for fatigue detection, not only the open/closed eye state is used as the main criterion, but also the eye-closure time, the number of blinks per unit time, the eye opening degree and the like are used as fatigue criteria.
In one embodiment, as shown in fig. 13, there is provided a driver fatigue detection apparatus, including: a detection module 1301, a processing module 1302, a statistics module 1303 and a determination module 1304, wherein:
the detection module 1301 is configured to acquire a face video of a target driver, and perform eye opening detection on each frame of face image in the face video to obtain an eye opening value in each frame of face image;
a processing module 1302, configured to determine a first opening threshold and a second opening threshold according to each human eye opening value, where the first opening threshold is greater than the second opening threshold;
a statistic module 1303, configured to count, according to each eye opening value, the first opening threshold, and the second opening threshold, a first image frame value of which the eye opening value is smaller than or equal to the first opening threshold, and a second image frame value of which the eye opening value is smaller than or equal to the second opening threshold;
a determining module 1304, configured to determine that the target driver is in a fatigue state if a ratio of the first image frame value to the second image frame value is greater than a preset fatigue determination threshold.
In one embodiment, the detection module 1301 may perform eye feature point positioning on each frame of the face image to obtain an eye feature point in the face image of each frame, determine a pupil distance value and an eye opening original value in the face image of each frame according to the eye feature point in the face image of each frame, and determine an eye opening value in the face image of each frame according to the pupil distance value and the eye opening original value in the face image of each frame.
In an embodiment, the processing module 1302 may determine a maximum eye opening degree value according to each of the eye opening degree values, and multiply the maximum eye opening degree value by a preset first scaling coefficient and a preset second scaling coefficient, respectively, to obtain the first opening degree threshold and the second opening degree threshold.
In one embodiment, the detection module 1301 may perform face feature point positioning on each frame of the face image to obtain each face feature image; respectively inputting each face feature image into a preset eye feature point positioning model to obtain eye feature points in each frame of the face image;
wherein the training process of the eye feature point positioning model comprises the following steps: acquiring a pixel value of each pixel point of a target image and a feature vector of each pixel point; configuring a tree-structured part model according to the pixel values and the feature vectors, and determining a score function for the part configuration L, wherein the score function is S(I, L, m) = App_m(I, L) + Shape_m(L) + α_m; obtaining the optimal configuration parameters of each part of each mixture type by calculating the values of L and m that maximize the score function; establishing a training sample set, wherein the training sample set comprises positive samples and negative samples provided with labels, the positive samples being images containing a human face and the negative samples being images not containing a human face; constructing a target vector according to the part model, the elastic parameters and the mixed offset scalar, and modifying the score function according to the target vector; and learning the eye feature point positioning model according to the training sample set, the optimal configuration parameters, the modified score function and a predefined target prediction function;
wherein
App_m(I, L) = Σ_i w_i^m · φ(I, l_i), summed over the parts of mixture m, and
Shape_m(L) = Σ_(i,j) ( a·dx² + b·dx + c·dy² + d·dy ), summed over the edges (i, j) of the tree, with dx = x_i − x_j and dy = y_i − y_j;
I denotes the target image, l_i = (x_i, y_i) represents the position of the ith pixel point of the target image, φ(I, l_i) represents the feature vector at pixel l_i of the target image I, w represents the part model (the part model is obtained by modeling each facial feature in the target image as a part), m indicates the mixture type of the tree structure, a, b, c and d represent elastic parameters, and α represents a mixed offset scalar.
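For concreteness, the sketch below evaluates a score of this form for a single mixture on a toy two-part tree; the feature vectors, filter weights, tree layout and elastic parameters are all made-up values for illustration and are not taken from the patent.

import numpy as np

def score(features, weights, positions, edges, elastic, alpha):
    """S(I, L, m) = App_m(I, L) + Shape_m(L) + alpha_m for one mixture m.
    features[i]: feature vector phi(I, l_i) at the position of part i
    weights[i]:  filter w_i^m for part i
    positions[i]: pixel position l_i = (x_i, y_i) of part i
    edges: (i, j) pairs connected in the tree
    elastic[(i, j)]: elastic parameters (a, b, c, d) of that edge
    """
    app = sum(float(np.dot(weights[i], features[i])) for i in range(len(features)))
    shape = 0.0
    for i, j in edges:
        dx = positions[i][0] - positions[j][0]
        dy = positions[i][1] - positions[j][1]
        a, b, c, d = elastic[(i, j)]
        shape += a * dx ** 2 + b * dx + c * dy ** 2 + d * dy
    return app + shape + alpha

# Toy example: two parts connected by one edge.
feats = [np.array([0.2, 0.1]), np.array([0.3, 0.4])]
w = [np.array([1.0, 0.5]), np.array([0.8, 0.2])]
pos = [(10, 10), (12, 14)]
print(score(feats, w, pos, edges=[(0, 1)], elastic={(0, 1): (-0.1, 0.0, -0.1, 0.0)}, alpha=0.5))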
In one embodiment, the detection module 1301 may extract a first DPM feature map, where the first DPM feature map is the DPM feature map of the current face image and the current face image is any one frame of face image. The module performs sampling processing on the first DPM feature map and extracts a second DPM feature map, where the second DPM feature map is the DPM feature map of the image obtained by the sampling processing. It then performs a convolution operation on the first DPM feature map with a pre-trained root filter to obtain a response map of the root filter, and performs a convolution operation on the second DPM feature map, which is at N times the resolution, with a pre-trained component filter to obtain a response map of the component filter, where the resolution of the component filter is N times that of the root filter and N is a positive integer. Finally, it obtains a target response map from the response map of the root filter and the response map of the component filter, and acquires the current face feature image from the target response map.
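The sketch below illustrates how root and component filter responses can be combined in a DPM-style detector using simple 2D cross-correlation; the random filters, the single-channel feature maps, the choice of N = 2, and the use of scipy for correlation and resampling are illustrative assumptions rather than the patent's exact procedure.

import numpy as np
from scipy.signal import correlate2d
from scipy.ndimage import zoom

rng = np.random.default_rng(0)
feat_root = rng.random((32, 32))          # first DPM feature map at root resolution (single channel for simplicity)
feat_part = zoom(feat_root, 2, order=1)   # second DPM feature map at N = 2 times the resolution

root_filter = rng.random((6, 6))          # stands in for a pre-trained root filter
part_filter = rng.random((6, 6))          # stands in for one pre-trained component filter

root_response = correlate2d(feat_root, root_filter, mode='same')   # response map of the root filter
part_response = correlate2d(feat_part, part_filter, mode='same')   # response map of the component filter

# Bring the component response back to root resolution and sum to form a target response map.
part_at_root = zoom(part_response, 0.5, order=1)
target_response = root_response + part_at_root
print(target_response.shape)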
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 14. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a driver fatigue detection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse, among others.
It will be appreciated by those skilled in the art that the architecture shown in FIG. 14 is only a block diagram of part of the structure associated with the inventive arrangements and does not limit the computer devices to which the inventive arrangements may be applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the computer program implementing the driver fatigue detection method in any of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the method of driver fatigue detection in any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between these combinations, they should be considered to be within the scope of this specification.
The above-mentioned embodiments express only several embodiments of the present invention and are described in relatively specific detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A driver fatigue detection method, characterized in that the method comprises:
acquiring a face video of a target driver, and respectively detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes in each frame of face image;
determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
according to each human eye opening value, the first opening threshold and the second opening threshold, counting a first image frame value of which the human eye opening value is smaller than or equal to the first opening threshold and a second image frame value of which the human eye opening value is smaller than or equal to the second opening threshold;
and if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judgment threshold value, judging that the target driver is in a fatigue state.
2. The method of claim 1, wherein the detecting the eye opening degree of each frame of face image in the face video to obtain the eye opening degree value of each frame of face image comprises:
respectively carrying out eye feature point positioning on each frame of the face image to obtain eye feature points in each frame of the face image;
respectively determining a human eye interpupillary distance value and a human eye opening original value in the face image of each frame according to the eye feature points in the face image of each frame;
and respectively determining the human eye opening value in the face image of each frame according to the human eye interpupillary distance value and the human eye opening original value in the face image of each frame.
3. The driver fatigue detection method according to claim 1 or 2, wherein the determining a first opening degree threshold value and a second opening degree threshold value from each of the human eye opening degree values includes:
determining the maximum eye opening degree value according to the eye opening degree values;
and multiplying the maximum opening degree value by a preset first proportional coefficient and a preset second proportional coefficient respectively to obtain the first opening degree threshold value and the second opening degree threshold value.
4. The method for detecting driver fatigue according to claim 2, wherein the performing eye feature point positioning on the face image of each frame to obtain eye feature points in the face image of each frame includes:
respectively carrying out face feature point positioning on the face image of each frame to obtain each face feature image;
respectively inputting each face feature image into a preset eye feature point positioning model to obtain eye feature points in each frame of the face image;
wherein, the training process of the eye feature point positioning model comprises the following steps:
acquiring a pixel value of each pixel point of a target image and a feature vector of each pixel point;
configuring a tree-structured part model according to the pixel values and the feature vectors, and determining a score function for the part configuration L, wherein the score function is S(I, L, m) = App_m(I, L) + Shape_m(L) + α_m;
wherein
App_m(I, L) = Σ_i w_i^m · φ(I, l_i), summed over the parts of mixture m, and
Shape_m(L) = Σ_(i,j) ( a·dx² + b·dx + c·dy² + d·dy ), summed over the edges (i, j) of the tree, with dx = x_i − x_j and dy = y_i − y_j;
I denotes the target image, l_i = (x_i, y_i) represents the position of the ith pixel point of the target image, w represents the part model, m indicates the mixture type of the tree structure, the part model being obtained by modeling each facial feature in the target image as a part, a, b, c and d represent elastic parameters, and α represents a mixed offset scalar; φ(I, l_i) represents the feature vector at pixel l_i on the target image I;
obtaining optimal configuration parameters of each part of each hybrid type by calculating values of L and m which enable the score function to obtain a maximum value;
establishing a training sample set, wherein the training sample set comprises a positive sample and a negative sample which are set with labels, the positive sample is an image containing a human face, and the negative sample is an image not containing the human face;
constructing a target vector according to the partial model, the elasticity parameters and the mixed bias scalar, and modifying the score function according to the target vector;
and learning to obtain the eye characteristic point positioning model according to the training sample set, the optimal configuration parameters, the modified score function and a predefined target prediction function.
5. The method of claim 4, wherein the performing facial feature point location on the facial image of each frame to obtain each facial feature image comprises:
extracting a first DPM feature map, wherein the first DPM feature map is a DPM feature map of a current face image, and the current face image is any one frame of face image; DPM refers to the Deformable Part Model object detection algorithm;
sampling the first DPM feature map, and extracting a second DPM feature map, wherein the second DPM feature map is a DPM feature map of an image obtained by sampling the first DPM feature map;
performing convolution operation on the first DPM characteristic diagram by using a pre-trained root filter to obtain a response diagram of the root filter;
performing a convolution operation on the second DPM feature map, which is at N times the resolution, with a pre-trained component filter to obtain a response map of the component filter, wherein the resolution of the component filter is N times that of the root filter, and N is a positive integer;
obtaining a target response diagram according to the response diagram of the root filter and the response diagram of the component filter;
and acquiring a current face feature image according to the target response image.
6. The method of detecting driver fatigue as set forth in claim 4, wherein the eye feature point location model is:
β = (w, a, b, c, d, α)
min over β and ξ_n ≥ 0 of (1/2)·‖β‖² + C·Σ_n ξ_n
subject to, for all n ∈ pos: β·Φ(I_n, z_n) ≥ 1 − ξ_n
and, for all n ∈ neg and all z: β·Φ(I_n, z) ≤ −1 + ξ_n
wherein β represents the target vector, z_n = {L_n, m_n}, C represents the penalty factor of the objective function, ξ_n represents the penalty term of the nth sample, pos and neg respectively represent a positive sample and a negative sample, and K represents the number of the target vectors.
7. A driver fatigue detecting device, characterized in that the device comprises:
the detection module is used for acquiring a face video of a target driver, and detecting the opening degree of human eyes of each frame of face image in the face video to obtain the opening degree value of human eyes of each frame of face image;
the processing module is used for determining a first opening threshold value and a second opening threshold value according to each human eye opening value, wherein the first opening threshold value is larger than the second opening threshold value;
a counting module, configured to count, according to each of the eye opening values, the first opening threshold and the second opening threshold, a first image frame value of which the eye opening value is smaller than or equal to the first opening threshold, and a second image frame value of which the eye opening value is smaller than or equal to the second opening threshold;
and the judging module is used for judging that the target driver is in a fatigue state if the ratio of the first image frame value to the second image frame value is greater than a preset fatigue judging threshold value.
8. The driver fatigue detection device according to claim 7, characterized in that:
the detection module positions eye feature points of each frame of the face image to obtain eye feature points of each frame of the face image, determines a human eye pupil distance value and a human eye opening original value of each frame of the face image according to the eye feature points of each frame of the face image, and determines a human eye opening value of each frame of the face image according to the human eye pupil distance value and the human eye opening original value of each frame of the face image.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201811485916.5A 2018-12-06 2018-12-06 Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium Active CN111291590B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811485916.5A CN111291590B (en) 2018-12-06 2018-12-06 Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811485916.5A CN111291590B (en) 2018-12-06 2018-12-06 Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111291590A CN111291590A (en) 2020-06-16
CN111291590B true CN111291590B (en) 2021-03-19

Family

ID=71024334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811485916.5A Active CN111291590B (en) 2018-12-06 2018-12-06 Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111291590B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183220B (en) * 2020-09-04 2024-05-24 广州汽车集团股份有限公司 Driver fatigue detection method and system and computer storage medium thereof
CN112528792B (en) * 2020-12-03 2024-05-31 深圳地平线机器人科技有限公司 Fatigue state detection method, device, medium and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7742621B2 (en) * 2006-06-13 2010-06-22 Delphi Technologies, Inc. Dynamic eye tracking system
CN100462047C (en) * 2007-03-21 2009-02-18 汤一平 Safe driving auxiliary device based on omnidirectional computer vision
CN101732055B (en) * 2009-02-11 2012-04-18 北京智安邦科技有限公司 Method and system for testing fatigue of driver
CN102013013B (en) * 2010-12-23 2013-07-03 华南理工大学广州汽车学院 Fatigue driving monitoring method
CN104013414B (en) * 2014-04-30 2015-12-30 深圳佑驾创新科技有限公司 A kind of Study in Driver Fatigue State Surveillance System based on intelligent movable mobile phone
CN104881955B (en) * 2015-06-16 2017-07-18 华中科技大学 A kind of driver tired driving detection method and system
CN105354985B (en) * 2015-11-04 2018-01-12 中国科学院上海高等研究院 Fatigue driving monitoring apparatus and method
CN105574487A (en) * 2015-11-26 2016-05-11 中国第一汽车股份有限公司 Facial feature based driver attention state detection method
US10262219B2 (en) * 2016-04-21 2019-04-16 Hyundai Motor Company Apparatus and method to determine drowsiness of a driver
CN106530623B (en) * 2016-12-30 2019-06-07 南京理工大学 A kind of fatigue driving detection device and detection method
CN107169437A (en) * 2017-05-11 2017-09-15 南宁市正祥科技有限公司 The method for detecting fatigue driving of view-based access control model

Also Published As

Publication number Publication date
CN111291590A (en) 2020-06-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant