CN111291607B - Driver distraction detection method, driver distraction detection device, computer equipment and storage medium - Google Patents

Driver distraction detection method, driver distraction detection device, computer equipment and storage medium Download PDF

Info

Publication number
CN111291607B
Authority
CN
China
Prior art keywords
deviation angle
driver
angle value
value
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910466961.4A
Other languages
Chinese (zh)
Other versions
CN111291607A (en)
Inventor
彭斐
谷俊
尹超凡
何俏君
毛茜
李彦琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Publication of CN111291607A publication Critical patent/CN111291607A/en
Application granted granted Critical
Publication of CN111291607B publication Critical patent/CN111291607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 - Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a driver distraction detection method, a driver distraction detection device, computer equipment and a storage medium, wherein the method comprises the following steps: detecting a head deviation angle value and a sight line deviation angle value of a driver; determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value; determining a time judgment threshold according to the attention deviation angle value; and if the first duration time that the attention deviation angle value is greater than the preset first angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state. By adopting the scheme of the invention, the missed detection rate can be reduced.

Description

Driver distraction detection method, driver distraction detection device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a driver distraction detection method, a driver distraction detection device, computer equipment and a storage medium.
Background
Traffic accidents remain one of the most serious threats to life and property, and most of them are caused by human factors on the part of the driver. Distraction of the driver while the vehicle is moving is extremely dangerous: a distracted driver cannot properly observe the road ahead and, when a dangerous situation arises, usually has no time to react correctly, which leads to traffic accidents. Driver distraction at high vehicle speeds can have even more serious consequences.
With the development of image recognition and processing technology, the driver's distraction state can be judged, and an alarm raised, by recognizing and processing facial image information captured while driving, providing a new way to prevent traffic accidents.
There are two main conventional approaches to distraction detection. The first judges the driver's face orientation by detecting the eyes, nose and mouth in every captured frame, and treats frames in which the camera cannot capture complete eye, nose and mouth information as distraction; this approach has a high missed detection rate. The second judges the driver's attention state from the line of sight or the head orientation alone, and also has a high missed detection rate.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a driver distraction detection method, apparatus, computer device and storage medium capable of reducing the missed detection rate.
A driver distraction detection method, the method comprising:
detecting a head deviation angle value and a sight line deviation angle value of a driver;
determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value;
determining a time judgment threshold according to the attention deviation angle value;
and if the first duration time that the attention deviation angle value is greater than the preset first angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state.
A driver distraction detection apparatus, the apparatus comprising:
the detection module is used for detecting a head deviation angle value and a sight line deviation angle value of a driver;
the processing module is used for determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value, and determining a time judgment threshold value according to the attention deviation angle value;
the first judging module is used for judging that the driver is in a distraction state when the first duration time that the attention deviation angle value is larger than a preset first angle threshold value is larger than the time judging threshold value.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
detecting a head deviation angle value and a sight line deviation angle value of a driver;
determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value;
determining a time judgment threshold according to the attention deviation angle value;
and if the first duration time that the attention deviation angle value is greater than the preset first angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
detecting a head deviation angle value and a sight line deviation angle value of a driver;
determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value;
determining a time judgment threshold according to the attention deviation angle value;
and if the first duration time that the attention deviation angle value is greater than the preset first angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state.
The driver distraction detection method, device, computer equipment and storage medium detect a head deviation angle value and a sight line deviation angle value of the driver, determine an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value, determine a time judgment threshold according to the attention deviation angle value, and judge that the driver is in a distraction state if a first duration for which the attention deviation angle value is greater than a preset first angle threshold exceeds the time judgment threshold. In this scheme, distraction detection is performed according to the attention deviation angle value, which takes both the head deviation angle value and the gaze deviation angle value into account; the time judgment threshold is itself determined by the attention deviation angle value, and distraction is judged comprehensively against the combined thresholds (the first angle threshold and the time judgment threshold), so the accuracy of distraction detection can be improved and the missed detection rate reduced.
Drawings
FIG. 1 is a diagram of an application environment of a driver distraction detection method in one embodiment;
FIG. 2 is a schematic flow chart diagram of a driver distraction detection method in one embodiment;
FIG. 3 is a schematic flow chart of a driver distraction detection method in another embodiment;
FIG. 4 is a graph of an angle of attention deviation value versus a time evaluation threshold in one embodiment;
FIG. 5 is a schematic flow chart of obtaining a gaze deviation angle value in one embodiment;
FIG. 6 is a flow chart illustrating a process for obtaining a head deviation angle value according to one embodiment;
FIG. 7 is a schematic diagram illustrating an exemplary process for face detection and feature point location acquisition;
FIG. 8 is a DPM feature extraction schematic in one embodiment;
FIG. 9 is a diagram of a root filter (left), a component filter (middle) and the 2-fold spatial model after Gaussian filtering (right) in one embodiment;
FIG. 10 is a diagram illustrating the comparison of the effects of a conventional Hog + SVM and an applied DPM + Latent-SVM in one embodiment (a) and the formula comparison (b);
FIG. 11 is a diagram illustrating the effects of cascading iterations in one embodiment;
FIG. 12 is a CNN model architecture diagram in one embodiment;
FIG. 13 is a schematic view of a gaze estimation process based on the CNN model in one embodiment;
FIG. 14 is a result display diagram using picture rendering in one embodiment;
FIG. 15 is a diagram of an eyelid movement model in one embodiment;
FIG. 16 is a block diagram showing the construction of a driver distraction detecting apparatus according to another embodiment;
fig. 17 is an internal structural view of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that the terms "first" and "second" and the like in the description, the claims, and the drawings of the present application are used for distinguishing similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
The driver distraction detection method provided by the application can be applied to the application environment shown in fig. 1. An infrared camera collects video information of the driver, and this video information can be input to the terminal to detect driver distraction. The preferred mounting position of the infrared camera is on the steering column below the steering wheel. The infrared camera can communicate with the terminal in a wired or wireless manner. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, vehicle-mounted terminals, and portable wearable devices.
In one embodiment, as shown in fig. 2, a driver distraction detection method is provided, which is exemplified by applying the method to a terminal, and includes the following steps:
step S201: detecting a head deviation angle value and a sight line deviation angle value of a driver;
The head deviation angle value is a parameter representing the driver's head pose relative to the frontal (face-forward) pose. The gaze deviation angle value is the angle by which the current gaze direction deviates from the gaze direction when the eyes look straight ahead.
Step S202: determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value;
Specifically, the attention deviation angle value may be obtained by summing the head deviation angle value and the gaze deviation angle value. In the calculation, a positive direction may be set, the two signed angle values summed according to that positive direction, and the absolute value of the sum taken as the attention deviation angle value.
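As an informal illustration of this combination (not part of the claimed method's wording), a minimal Python sketch might look as follows; the sign convention is assumed to match the one defined later in the detailed description (positive to the left, negative to the right).

```python
def attention_deviation_angle(head_deg, gaze_deg):
    """Minimal sketch: signed head and gaze angles combined into the
    attention deviation angle (absolute value of their sum)."""
    return abs(head_deg + gaze_deg)

# e.g. head turned 30 deg left while the eyes look 10 deg right -> 20 deg
assert attention_deviation_angle(30.0, -10.0) == 20.0
```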
Step S203: determining a time judgment threshold according to the attention deviation angle value;
in this embodiment, the time evaluation threshold value is changed according to the change of the attention deviation angle value.
Step S204: and if the first duration time that the attention deviation angle value is greater than the preset first angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state.
The first angle threshold can be set according to actual needs; preferably it is 15 degrees, and it can also be selected within 10 to 20 degrees as required.
Specifically, when it is detected that the attention deviation angle value is greater than a preset first angle threshold, it is detected whether a first duration time for which the attention deviation angle value is greater than the preset first angle threshold is greater than the time judgment threshold, and if so, it is determined that the driver is in a distraction state.
In the driver distraction detection method, a head deviation angle value and a sight line deviation angle value of the driver are detected, an attention deviation angle value is determined according to the head deviation angle value and the sight line deviation angle value, a time judgment threshold is determined according to the attention deviation angle value, and the driver is judged to be in a distraction state if a first duration for which the attention deviation angle value is greater than a preset first angle threshold exceeds the time judgment threshold. In this embodiment, distraction detection is performed according to the attention deviation angle value, which takes both the head deviation angle value and the gaze deviation angle value into account; the time judgment threshold is determined by the attention deviation angle value, and distraction is judged comprehensively against the combined thresholds (the first angle threshold and the time judgment threshold), so detection accuracy can be improved and the missed detection rate reduced. In addition, the distraction detection of this embodiment can provide a measurement index for the hand-over of driving control in automatic driving and issue early warnings in time at dangerous moments; it has good real-time performance and can effectively reduce traffic accidents caused by distracted driving.
In one embodiment, a driver distraction detection method is provided, which is described by taking an example that the method is applied to a terminal, and includes the following steps:
step S301: detecting a head deviation angle value and a sight line deviation angle value of a driver;
step S302: determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value;
step S303: determining a time judgment threshold according to the attention deviation angle value;
in this embodiment, the steps S301 to S303 may refer to the steps S201 to S203, which are not described herein again.
Step S304: if the second duration time of the head deviation angle value which is greater than a preset second angle threshold value is greater than the time judgment threshold value, judging that the driver is in a distraction state;
the size of the second angle threshold can be determined according to actual needs, and preferably, the second angle threshold is 60 degrees, and can also take a value within 50 degrees to 70 degrees as needed.
Specifically, whether the head deviation angle value is larger than a preset second angle threshold value or not is detected, if yes, whether a second duration time that the head deviation angle value is larger than the preset second angle threshold value is larger than the time judgment threshold value or not is detected, and if yes, the driver is judged to be in a distraction state.
Step S305: if the head deviation angle value is not greater than the second angle threshold value, or the second duration time is not greater than the time evaluation threshold value, detecting whether a first duration time is greater than the time evaluation threshold value when the attention deviation angle value is detected to be greater than a preset first angle threshold value;
here, the first duration is a duration in which the angle of attention deviation value is greater than a preset first angle threshold.
Step S306: and if the first duration is greater than the time judgment threshold, judging that the driver is in a distraction state.
In this embodiment, the distraction detection is preferentially performed according to the head deviation angle value, and the distraction detection is performed according to the attention deviation angle value only when the head deviation angle value does not satisfy the condition for determining that the driver is in the distraction state.
In one embodiment, the determining the time evaluation threshold according to the attention deviation angle value may include: when the attention deviation angle value is larger than the second angle threshold value, the time evaluation threshold value is a first preset time value; when the attention deviation angle value is smaller than the first angle threshold value, the time evaluation threshold value is a second preset time value; the time evaluation threshold decreases linearly with increasing angle of attention deviation value when the angle of attention deviation value is between the first angle threshold and the second angle threshold.
As shown in fig. 4, an example is given in which the second angle threshold is 60°, the first angle threshold is 15°, the first preset time value is 2 s (2 seconds) and the second preset time value is 5 s (5 seconds); these values may be set to other values as needed.
In this embodiment, the time judgment threshold is made correspondingly smaller when attention deviates severely, because danger is then more likely to occur, so that the risk can be discovered as early as possible and avoided as far as possible.
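The piecewise-linear relation of fig. 4 can be sketched as follows; the function name and the default values (15°, 60°, 2 s, 5 s) are taken from the example above and are illustrative, not fixed by the method.

```python
def time_threshold_s(attention_deg, a_low=15.0, a_high=60.0, t_max=5.0, t_min=2.0):
    """Time judgment threshold as a function of the attention deviation angle:
    t_max below a_low, t_min above a_high, linear in between (cf. fig. 4)."""
    a = abs(attention_deg)
    if a >= a_high:
        return t_min                    # severe deviation: short tolerance
    if a <= a_low:
        return t_max                    # mild deviation: long tolerance
    frac = (a - a_low) / (a_high - a_low)
    return t_max - frac * (t_max - t_min)
```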
In one embodiment, as shown in fig. 5, the obtaining of the line-of-sight deviation angle value of the driver may include:
step S501: acquiring a face image of the driver, and positioning feature points of the face image to obtain feature point information;
specifically, each frame in the video may be used as a face image of the driver, and the face image is subjected to face feature positioning to obtain feature point information. The driver can be subjected to video acquisition through the monocular infrared camera arranged on the steering column below the steering wheel, so that the face state information of the driver in the driving process can be obtained. Simultaneously, the outer camera can adapt to the different light condition in the car, can accurately catch driver's head gesture information and facial information. The frequency of the video input may be 30Hz, and the image size per frame may be 1280 × 1080 pixels, but the camera and the installation location are not limited thereto.
Step S502: acquiring eyeball center coordinates and pupil center coordinates of the driver according to the characteristic point information;
wherein, the eyeball center coordinate and the pupil center coordinate are three-dimensional coordinates.
Step S503: determining the sight deviation angle value according to the eyeball center coordinate and the pupil center coordinate;
specifically, the direction vectors of the eyeball center and the pupil center of the driver may be determined according to the eyeball center coordinate and the pupil center coordinate, and then the gaze deviation angle value may be determined according to the direction vectors.
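A minimal sketch of step S503 is given below; the straight-ahead reference direction (here the +z axis) and the degree output are assumptions for illustration, since the embodiment only specifies that the gaze direction is the vector from the eyeball centre to the pupil centre.

```python
import numpy as np

def gaze_deviation_angle(eyeball_center, pupil_center, forward=(0.0, 0.0, 1.0)):
    """Angle between the gaze vector (eyeball centre -> pupil centre)
    and an assumed straight-ahead direction, in degrees."""
    gaze = np.asarray(pupil_center, float) - np.asarray(eyeball_center, float)
    gaze /= np.linalg.norm(gaze)
    fwd = np.asarray(forward, float)
    fwd /= np.linalg.norm(fwd)
    cos_angle = np.clip(np.dot(gaze, fwd), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))
```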
In one embodiment, as shown in fig. 6, the detecting the head deviation angle value of the driver may include:
step 601: acquiring a face image of the driver, and positioning feature points of the face image to obtain feature point information;
Step 601 may refer to the above description of step 501, which is not repeated here.
Step 602: determining a spatial position relationship between the current two-dimensional image of the driver and the front face three-dimensional model according to the feature point positioning information and the front face three-dimensional model of the driver;
Since the driver's head pose is measured relative to the frontal pose, the frontal three-dimensional face model can be used as the reference for pose measurement; the spatial position relationship between the current two-dimensional image (camera coordinate system) and the frontal three-dimensional face model (world coordinate system) is then the driver's head pose.
Step 603: and determining the head deviation angle value of the driver according to the spatial position relation.
In one embodiment, as shown in fig. 7, the above locating feature points of the face image to obtain feature point information may include:
step S701: extracting a first DPM feature map, wherein the first DPM feature map is a DPM feature map of the face image;
step S702: sampling the first DPM feature map, and extracting a second DPM feature map, wherein the second DPM feature map is a DPM feature map of an image obtained by sampling the first DPM feature map;
step S703: performing convolution operation on the first DPM characteristic diagram by using a pre-trained root filter to obtain a response diagram of the root filter;
step S704: performing convolution operation on the doubled second DPM characteristic diagram by using a pre-trained component filter to obtain a response diagram of the component filter;
step S705: obtaining a target response diagram according to the response diagram of the root filter and the response diagram of the component filter;
step S706: and determining a face area according to the target response image, and positioning the feature points of the face area to obtain feature point information.
In the embodiment, the face detection is performed by adopting the DPM target detection algorithm, so that the detection accuracy of the algorithm is improved, and the false detection rate and the missing detection rate can be reduced at the same time.
In one embodiment, the above process of detecting the driver's gaze deviation angle value uses a multi-modal convolutional neural network model, and the data samples of the convolutional neural network model are eye region images generated by rendering a plurality of dynamically controllable eye region models. By adopting the scheme of this embodiment, the shortage of training samples can be made up; at the same time, by adopting the multi-modal convolutional neural network model, the algorithm is accurate and strongly resistant to external conditions such as illumination.
In order to facilitate an understanding of the present invention, a preferred embodiment of the present invention will be described in detail.
The driver distraction detection method in this embodiment includes the following steps. The first step: collecting video information. The second step: face detection. The third step: locating the facial feature points. The fourth step: detecting the head pose in real time. The fifth step: gaze estimation and distraction judgment. These steps are described in detail below.
The first step: collecting video information. A monocular infrared camera (mounted on the steering column below the steering wheel) inputs the facial state information (images) of the driver in real time during driving. The frequency of the video input is 30 Hz, and the image size of each frame is 1280 × 1080 pixels.
The second step: face detection. For each frame of the input video, this embodiment performs face detection using the DPM (Deformable Part Model) object detection algorithm. The DPM algorithm reuses part of the HOG pipeline: first the picture is converted to grayscale; then, as in equation (1), the input image is normalized in color space using Gamma correction:
I(x, y) = I(x, y)^γ (1)
where the value of γ is chosen according to the specific situation (for example, 1/2 can be taken); this effectively reduces local shadows and illumination changes in the image. Next, gradient computation is performed. The gradient reflects the change between adjacent pixels: where the change is relatively flat the gradient is small, and where it is sharp the gradient is large. The gradient of any pixel (x, y) of the image f(x, y) is a vector:
∇f(x, y) = [G_x, G_y]^T = [∂f/∂x, ∂f/∂y]^T (2)

where G_x is the gradient along the x direction and G_y is the gradient along the y direction. The magnitude and direction angle of the gradient can be expressed by the following formula:

|∇f(x, y)| = √(G_x² + G_y²), θ(x, y) = arctan(G_y / G_x) (3)

For pixel points in a digital image, the gradient is computed using differences:

G_x ≈ f(x + 1, y) − f(x, y), G_y ≈ f(x, y + 1) − f(x, y) (4)

Since the detection effect obtained with the simple one-dimensional discrete differential template [−1, 0, 1] is the best, the formulas actually used are:

G_x(x, y) = H(x + 1, y) − H(x − 1, y), G_y(x, y) = H(x, y + 1) − H(x, y − 1) (5)

In these formulas, G_x(x, y), G_y(x, y) and H(x, y) denote the horizontal gradient, the vertical gradient and the pixel value of pixel (x, y), respectively, and the magnitude and direction of the gradient are then:

G(x, y) = √(G_x(x, y)² + G_y(x, y)²) (6)

α(x, y) = arctan(G_y(x, y) / G_x(x, y)) (7)
then, the whole target picture is divided into cell units (cells) which are not overlapped with each other and have the same size, and then the gradient size and direction of each cell unit are calculated. DPM retained the cell units of the HOG map and then normalized a certain cell unit on the map (8 x8 cell unit in fig. 8) to its four cells in the diagonal neighborhood. Extracting signed HOG gradients, 0-360 degrees will yield 18 gradient vectors, extracting unsigned HOG gradients, 0-180 degrees will yield 9 gradient vectors. DPM extracts only unsigned features, generates 4 × 9-36 dimensional features, adds rows and columns to form 13 feature vectors (9 columns and 4 rows as shown in fig. 8), adds extracted 18-dimensional signed gradient features (18 columns and 18 rows as shown in fig. 18) to further improve accuracy, and finally obtains 13+ 18-31 dimensional gradient features.
As shown in fig. 9, the DPM model employs a root filter (left) at 8 × 8 resolution and a component filter (middle) at 4 × 4 resolution. The resolution of the middle image is twice that of the left image, and the component filter is twice the size of the root filter, so the gradients can be seen more finely. The right image is the 2-fold spatial model after Gaussian filtering.
First, a DPM feature map of the input image (the DPM feature map of the original image) is extracted, Gaussian pyramid upsampling (image scaling) is performed, and the DPM feature map of the Gaussian-pyramid-upsampled image is extracted. The DPM feature map of the original image is convolved with the trained root filter to obtain the response map of the root filter. At the same time, the DPM feature map extracted from the 2-times image (Gaussian pyramid upsampled) is convolved with the trained component filters to obtain the response maps of the component filters. The resulting component-filter response maps are then Gaussian downsampled so that the root-filter and component-filter response maps have the same resolution. Finally, the maps are combined by a weighted average to obtain the final response map; the brighter the response, the better the match, and the face is detected accordingly. The response value is expressed by the following formula:
score(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + Σ_{i=1}^{n} D_{i,l_0−λ}(2(x_0, y_0) + v_i) + b (8)

where x_0, y_0 and l_0 respectively denote the abscissa, ordinate and scale of the feature point; R_{0,l_0}(x_0, y_0) is the response score of the root model; D_{i,l_0−λ}(2(x_0, y_0) + v_i) is the response score of the i-th component model; 2(x_0, y_0) indicates that the component model works at twice the resolution of the original, so the pixel coordinates are multiplied by 2; b is the offset coefficient between different model components used for alignment with the model; and v_i is the deviation coefficient between the pixel point and the ideal detection point. The detailed response score formula of the component model is:

D_{i,l}(x, y) = max_{dx,dy} [ R_{i,l}(x + dx, y + dy) − d_i · φ_d(dx, dy) ] (9)

Similar to equation (8), we want the objective function D_{i,l}(x, y) to be as large as possible, with dx and dy as the variables. In this formula, (x, y) is the ideal position of the model obtained in training; dx, dy is the offset from the ideal position, ranging from the ideal position to the picture edge; R_{i,l}(x + dx, y + dy) is the matching score of the component model; d_i · φ_d(dx, dy) is the offset loss score of the component; d_i is the offset loss coefficient; and φ_d(dx, dy) is the distance between a pixel point of the component model and the component's detection point. The formula shows that the higher the response of the component model and the smaller the distance between each component and its corresponding pixel point, the higher the response score and the more likely it is that the object to be detected is present.
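A rough, illustrative sketch of how the root and part responses of equation (8) can be combined is given below; the deformation cost term of equation (9) is omitted for brevity, and the helper names are assumptions rather than the actual implementation.

```python
import numpy as np
import cv2

def dpm_score_map(feat, feat_2x, root_filter, part_filters, b=0.0):
    """feat / feat_2x: 31-channel DPM feature maps (H x W x 31) of the
    original image and of the 2x upsampled image; filters have the same
    channel depth. Returns a score map where brighter means more face-like."""
    def response(f, filt):
        acc = np.zeros(f.shape[:2], np.float32)
        for c in range(f.shape[2]):                  # channel-wise correlation
            acc += cv2.filter2D(f[:, :, c], -1, filt[:, :, c])
        return acc

    score = response(feat, root_filter) + b          # root model response
    h, w = score.shape
    for pf in part_filters:                          # component model responses
        r = response(feat_2x, pf)
        score += cv2.resize(r, (w, h))               # back to the root resolution
    return score
```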
When training the model, the DPM features obtained above are trained. DPM here uses Latent-SVM classification, in which a latent variable is added to the Linear-SVM classifier and is used to determine which configuration of a positive sample is treated as the positive example. There are many latent variables in LSVM because, after a bounding box is marked on a positive-sample picture, a highest-scoring sample at some position and scale must be proposed as the exemplar of a given part. Fig. 10(a) compares the effect of the ordinary HOG + SVM with that of the applied DPM + Latent-SVM, and fig. 10(b) shows the corresponding general formulas.
The third step: locating the facial feature points. In this embodiment the LBF algorithm is used, and a cascade of regressors locates the facial feature points and the eye feature points within milliseconds. The shape vector is written S = (x_1^T, x_2^T, ..., x_p^T)^T, where x_i denotes the (x, y) coordinates of the i-th facial feature point in the image I. Each regressor r_t(·,·) uses the current picture I and the current shape estimate Ŝ^(t) to predict an updated shape vector:

Ŝ^(t+1) = Ŝ^(t) + r_t(I, Ŝ^(t))

The most important part of the cascade is that the regressor r_t(·,·) makes its predictions from features, such as pixel grayscale values, that are computed on the image I and indexed relative to the current shape estimate Ŝ^(t). Geometric invariance is introduced by this process, and as the cascade proceeds one can be more certain that precise semantic locations of the face are being indexed.
If the initial estimate Ŝ^(0) belongs to this space, then the output spanned by the ensemble is guaranteed to lie in the linear subspace of the training data, so no additional constraints on the predictions are needed, which greatly simplifies the method. Furthermore, the initial shape is simply chosen as the average shape of the training data, centred and scaled according to the bounding box output by the generic face detector.

Next, each regressor in the cascade is learned from a training data set (I_1, S_1), ..., (I_n, S_n), where I_i denotes a face picture and S_i its shape vector. To learn the first regression function r_0, triplets of a face image, an initial shape estimate and a target update step (I_π_i, Ŝ_i^(0), ΔS_i^(0)) are created as follows:

π_i ∈ {1, ..., n} (13)

Ŝ_i^(0) ∈ {S_1, ..., S_n} \ {S_π_i} (14)

ΔS_i^(0) = S_π_i − Ŝ_i^(0) (15)

where i = 1, ..., N. The total number of these triplets is set to N = nR, where R is the number of initializations used per image I. Each initial shape estimate for an image is sampled uniformly from (S_1, ..., S_n) without replacement.
From this data, the regression function r_0 can be learned using gradient tree boosting with a sum-of-squared-error loss. The specific algorithm is as follows.

Training data: {(I_π_i, Ŝ_i^(0), ΔS_i^(0))}, i = 1, ..., N; learning rate 0 < ν < 1.

a. Initialization:

f_0(I, Ŝ) = argmin_γ Σ_{i=1}^{N} || ΔS_i^(0) − γ ||²

b. For k from 1 to K:

① For i = 1, ..., N, compute the residuals:

r_ik = ΔS_i^(0) − f_{k−1}(I_π_i, Ŝ_i^(0))

② Fit a regression tree to the residuals r_ik, giving a weak regression function g_k(I, Ŝ).

③ Update:

f_k(I, Ŝ) = f_{k−1}(I, Ŝ) + ν g_k(I, Ŝ)

c. Output:

r_0(I, Ŝ) = f_K(I, Ŝ)
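The boosting loop above can be sketched in Python as follows; `fit_regression_tree` and `features` are hypothetical helpers standing in for the tree fitting and the shape-indexed feature extraction described in this section, and are assumptions for illustration only.

```python
import numpy as np

def learn_r0(images, init_shapes, targets, K, nu, fit_regression_tree, features):
    """Gradient tree boosting sketch: images[i], init_shapes[i], targets[i]
    correspond to the triplets (I_pi_i, S_hat_i^(0), Delta_S_i^(0))."""
    N = len(targets)
    targets = np.asarray(targets, dtype=np.float64)      # N x 2p
    f0 = targets.mean(axis=0)                            # a. initialization
    trees = []
    pred = np.tile(f0, (N, 1))                           # current f_{k-1} values
    for k in range(K):                                   # b. boosting rounds
        residuals = targets - pred                       # (1) r_ik
        g_k = fit_regression_tree(                       # (2) weak regressor
            [features(images[i], init_shapes[i]) for i in range(N)], residuals)
        trees.append(g_k)
        for i in range(N):                               # (3) update f_k
            pred[i] += nu * g_k(features(images[i], init_shapes[i]))

    def r0(image, shape):                                # c. output r_0 = f_K
        out = f0.copy()
        for g in trees:
            out += nu * g(features(image, shape))
        return out
    return r0
```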
The triplet training data is then updated in order to learn the next regressor r_1 in the cascade (here t = 0):

Ŝ_i^(t+1) = Ŝ_i^(t) + r_t(I_π_i, Ŝ_i^(t))

ΔS_i^(t+1) = S_π_i − Ŝ_i^(t+1)

This process is iterated until a cascade of T regressors r_0, r_1, ..., r_{T−1} has been learned which, combined, gives a sufficient level of accuracy.
Each regression function r_t is a tree-based regressor fitted to the residual targets in the gradient boosting algorithm. At each split node of a regression tree, a decision is made by thresholding the intensity difference between two pixels. The pixel coordinates (u, v) used in the test are defined in a coordinate system based on the mean shape. For a face image with an arbitrary shape, we want to index points that have the same positions relative to its shape as u and v have relative to the mean shape. To achieve this, the image could be warped to the mean shape based on the current shape estimate before extracting the features; but because only a very sparse set of pixels is used, it is far more efficient to warp the positions of these points rather than the entire image.
Suppose k_u is the index of the facial landmark in the mean shape that is closest to u, and define its offset from u as:

δx_u = u − x̄_{k_u}

Then, for a shape S_i defined in image I_i, the position in I_i that is qualitatively similar to u in the mean-shape image is:

u' = x_{i,k_u} + (1 / s_i) R_i^T δx_u

where s_i and R_i are the scale and rotation matrix of the similarity transform, both chosen to minimize the sum of squared differences between the mean-shape landmark points and the warped points:

Σ_{j=1}^{p} || x̄_j − (s_i R_i x_{i,j} + t_i) ||²

v' is defined similarly. Formally, each split is a decision involving three parameters θ = (τ, u, v) and is applied to each training and test example as:

h(I_π_i, Ŝ_i^(t), θ) = 1 if I_π_i(u') − I_π_i(v') > τ, and 0 otherwise

where u' and v' are defined using the scale and rotation matrices above. Computing the similarity transform, which is done only once per level of the cascade, is the most computationally expensive part of this process at test time.
For each regression tree we approximate the underlying function with a piecewise constant function, fitting a constant vector to each leaf node. To train the regression tree, we randomly generate a set of candidate splits, i.e. θ's, at each tree node, and then greedily choose the θ among these candidates that minimizes the sum of squared errors. If Q is the set of indices of the training examples at the node, this corresponds to minimizing:

E(Q, θ) = Σ_{s ∈ {l, r}} Σ_{i ∈ Q_{θ,s}} || r_i − μ_{θ,s} ||²

where Q_{θ,s} are the indices of the examples sent to the left (s = l) or right (s = r) child node by the split θ, r_i is the vector of all residuals computed for image i in the gradient boosting algorithm, and μ_{θ,s} is defined as:

μ_{θ,s} = (1 / |Q_{θ,s}|) Σ_{i ∈ Q_{θ,s}} r_i

The optimal split point can be found very efficiently because, if we rearrange the formula and omit the factors that do not depend on θ, we obtain the relationship:

argmin_θ E(Q, θ) = argmax_θ Σ_{s ∈ {l, r}} |Q_{θ,s}| μ_{θ,s}^T μ_{θ,s}

Hence, when evaluating different θ we only need to compute μ_{θ,l}, since μ_{θ,r} can be obtained from the node mean μ and μ_{θ,l} as follows:

μ_{θ,r} = (|Q| μ − |Q_{θ,l}| μ_{θ,l}) / |Q_{θ,r}|
the decision at each node is based on thresholding the difference in intensity values at a pair of pixels. This is a fairly simple test, but it is more powerful than a single threshold because it is relatively insensitive to global illumination variations. Unfortunately, a drawback of using pixel differences is that the number of possible segmentation (feature) candidates is quadratic in the number of pixels in the average image. This makes it difficult to find a good theta without searching for many theta. However, by considering the structure of the image data, such a limiting factor can be alleviated to some extent. We first introduce an index
p(u,v)αe-λ||u-v|| (29)
The pixel segmentation points within the distance range are easy to select, so that the number of prediction errors of the data set can be effectively reduced.
To handle missing labels, we introduce a weight w_{i,j} ranging between 0 and 1 for the j-th marker point of the i-th image, and derive a new sum-of-squared-error formula:

E(Q, θ) = Σ_{s ∈ {l, r}} Σ_{i ∈ Q_{θ,s}} (r_i − μ_{θ,s})^T W_i (r_i − μ_{θ,s})

where W_i is the diagonal matrix formed from the vector (w_{i,1}, ..., w_{i,p})^T. The corresponding formula for μ_{θ,s} is:

μ_{θ,s} = (Σ_{i ∈ Q_{θ,s}} W_i)^{−1} Σ_{i ∈ Q_{θ,s}} W_i r_i

The gradient boosting algorithm must also be modified to account for these weights. This can be done simply by initializing the ensemble model with the weighted average of the targets and by fitting the regression trees to the weighted residuals:

r_ik = W_{π_i} (ΔS_i^(t) − f_{k−1}(I_π_i, Ŝ_i^(t)))
the cascade iteration effect is shown in fig. 11:
the fourth step: and monitoring the head posture in real time. The method comprises the following steps of constructing a 3D face model of a current driver by utilizing a general face model and a front face 2D model, modeling an imaging process of a camera by selecting a four-parameter pinhole camera model without considering distortion factors, wherein an imaging relation between a three-dimensional space point and an image projection point is as follows:
Figure BDA0002079720640000132
in the formula (I), the compound is shown in the specification,
Figure BDA0002079720640000133
is an intrinsic parameter of the camera, fx,fyFocal lengths in the directions of the horizontal and vertical axes, cx,cyFor the principal point coordinates of the imaging surface of the camera, the 2D model and the 3D model of the face are both frontal faces, so the method has the advantages of simple structure, low cost and high precisionThe rotation matrix R in the formula is an identity matrix, i.e.
Figure BDA0002079720640000134
t=[tx,ty,tz]TFor head translation vectors, R and t are called camera extrinsic parameters, (X)i,Yi,Zi) Are homogeneous coordinates of spatial points.
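A minimal, purely illustrative sketch of this imaging relation is:

```python
import numpy as np

def project_point(K, R, t, X_w):
    """Project a 3D world point X_w to pixel coordinates (u, v) with the
    pinhole model: intrinsics K, extrinsics (R, t)."""
    X_c = R @ np.asarray(X_w, float) + np.asarray(t, float)   # world -> camera
    uvw = K @ X_c                                              # camera -> image plane
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```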
The driver's head pose is solved in real time using the EPnP algorithm. The EPnP algorithm computes the spatial position relationship between the camera coordinate system (two-dimensional image) and the world coordinate system (three-dimensional object) from the correspondence between a set of two-dimensional image points and a set of three-dimensional object points. Since the driver's head pose is measured relative to the frontal pose, the frontal-pose 3D face model can be used as the reference for pose measurement, and the computed spatial position relationship between the current two-dimensional image (camera coordinate system) and the frontal 3D face model (world coordinate system) is the driver's head pose.

The n feature points of the face region in the 2D face image are expressed as weighted sums of 4 non-coplanar virtual control points. Denoting the homogeneous coordinates of the feature points and of the virtual control points in the world coordinate system by p_i^w and c_j^w respectively, there exist weights α_ij such that

p_i^w = Σ_{j=1}^{4} α_ij c_j^w, with Σ_{j=1}^{4} α_ij = 1

and the same weights relate the feature points p_i^c and the virtual control points c_j^c in the camera coordinate system:

p_i^c = Σ_{j=1}^{4} α_ij c_j^c

Since the camera intrinsic parameters K are known, the projection of each feature point can be written as:

w_i [u_i, v_i, 1]^T = K Σ_{j=1}^{4} α_ij c_j^c

Substituting the third component into the first and second components yields two linear equations per feature point in the unknown camera-frame control point coordinates. Stacking them is written as the matrix equation M x = 0, where M is a 2n × 12 matrix and the unknown vector x, which contains the non-homogeneous coordinates of the 4 virtual control points in the camera coordinate system, belongs to the right null space of M. Experiments show that the number of basis vectors of this space is at most 4, so

x = Σ_{i=1}^{N} β_i v_i

where the v_i are right singular vectors of the M matrix, obtained by solving for the null-space eigenvectors of M^T M, and N takes the value 1, 2, 3 or 4. The β values for the different N are estimated using the invariance of distances under rigid-body transformation, and the value of N is determined by comparing the back-projection errors of the four different N values, so that x, i.e. the coordinates of the virtual control points in the camera coordinate system, is recovered. Finally, the feature point coordinates in the camera coordinate system are recovered, the two-dimensional-to-three-dimensional PnP problem is thereby converted into a rigid-motion problem, and the pose parameters are estimated with an existing fast algorithm.
The steps for calculating R and t are as follows:

(1) Compute the centroids: p̄^w = (1/n) Σ_i p_i^w, p̄^c = (1/n) Σ_i p_i^c.

(2) Remove the centre: q_i^w = p_i^w − p̄^w, q_i^c = p_i^c − p̄^c.

(3) Compute the H matrix: H = Σ_i q_i^w (q_i^c)^T.

(4) Compute the singular value decomposition H = U Λ V^T.

(5) Compute X = V U^T. If det(X) = 1, then R = X and t = p̄^c − R p̄^w; otherwise the sign of the corresponding row of X is reversed before setting R = X and recomputing t.
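These five steps are the classical SVD-based solution of the rigid-motion problem; a Python sketch (an illustration, not the patented implementation) is:

```python
import numpy as np

def rigid_transform(pts_w, pts_c):
    """R, t such that pts_c ≈ R @ pts_w + t, from n x 3 corresponding points."""
    pw_bar = pts_w.mean(axis=0)               # (1) centroids
    pc_bar = pts_c.mean(axis=0)
    qw = pts_w - pw_bar                       # (2) remove the centre
    qc = pts_c - pc_bar
    H = qw.T @ qc                             # (3) H matrix
    U, S, Vt = np.linalg.svd(H)               # (4) H = U Λ V^T
    X = Vt.T @ U.T                            # (5) candidate rotation
    if np.linalg.det(X) < 0:                  # reflection case: flip one axis
        Vt[-1, :] *= -1
        X = Vt.T @ U.T
    R = X
    t = pc_bar - R @ pw_bar
    return R, t
```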
The fifth step: gaze estimation and distraction detection. The line of sight is the direction vector formed by the 3D coordinates of the eyeball centre and the pupil centre. In this embodiment, a multimodal convolutional neural network (CNN) is used to learn the mapping from the input features, the 2D head rotation angle vector h and the eye image e, to the gaze angle vector g in the normalized space. The distinction between left and right eyes is insignificant in a person-independent training scenario, so both eyes are handled with a single regression function by flipping the eye images horizontally and mirroring h and g about the y-axis. As shown in fig. 12, the CNN model in this embodiment uses a LeNet-style architecture composed of two convolutional layers, two max-pooling layers and a fully connected layer. A linear regression layer that predicts the gaze angle vector g is trained on top of the fully connected layer; to build the multimodal CNN model from both eye images and head pose information, h is concatenated with the output of the fully connected layer, so the head information is encoded into the CNN model. The input to the network is a grayscale eye image e fixed to a size of 60 × 36 pixels. The kernel size of the two convolutional layers is 5 × 5 pixels, with 20 feature maps in the first layer and 50 in the second; the number of hidden units in the fully connected layer is 500, where each unit is connected to all feature maps of the previous convolutional layer and is computed by summing all activation values. The network output is the two-dimensional gaze angle vector g, consisting of two gaze angles, yaw φ and pitch θ. Fig. 13 is a schematic view of the gaze estimation process based on the CNN model.
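A sketch of this multimodal architecture follows; PyTorch is an assumption, since the patent does not name a framework. The layer sizes are taken from the description above, and training details such as the loss and normalization are omitted.

```python
import torch
import torch.nn as nn

class MultimodalGazeNet(nn.Module):
    """LeNet-style gaze CNN: 60x36 grayscale eye image e plus the 2D head
    angle vector h in, 2D gaze angle vector g = (yaw, pitch) out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5),   # conv1: 20 feature maps, 5x5
            nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5),  # conv2: 50 feature maps, 5x5
            nn.MaxPool2d(2),
        )
        # 36x60 -> 32x56 -> 16x28 -> 12x24 -> 6x12 feature maps
        self.fc = nn.Linear(50 * 6 * 12, 500)  # fully connected layer, 500 units
        self.regress = nn.Linear(500 + 2, 2)   # h concatenated, linear regression to g

    def forward(self, e, h):
        x = self.features(e)                   # e: (B, 1, 36, 60)
        x = torch.relu(self.fc(x.flatten(1)))
        x = torch.cat([x, h], dim=1)           # inject head pose information
        return self.regress(x)                 # g = (yaw, pitch)
```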
As shown in fig. 14, in the solution of this embodiment the data samples for training the CNN model are generated in large quantities by rendering a series of dynamic, controllable eye region models with UnityEyes, producing realistic eye region images. The eye model, including sclera, pupil, iris and cornea, shows realistic variations in shape (pupil dilation) and texture (iris colour, scleral veins). The head models used cover different genders, ethnicities and ages.
As shown in fig. 15, the eyelid movement model uses blend shapes for looking up and looking down and interpolates between them according to the global eyeball model, so the eyelids deform continuously to fit the eyeball pose. When the tissue surrounding the eye is compressed or stretched, skin details such as wrinkles and folds are weakened or exaggerated; downward-looking lids are simulated with smooth colour and displacement textures to remove wrinkles.
Distraction judgment: we define the angles of the line of sight and of the head relative to straight ahead as positive to the left and negative to the right. The attention deviation angle is the gaze deviation angle plus the head deviation angle. The distraction judgment logic is as follows:
if the head yaw angle > threshold 1 and the duration > threshold 3, the driver is distracted;
if the attention deviation angle > threshold 2 and the duration > threshold 3, the driver is distracted.
Threshold 1 (corresponding to the second angle threshold) is the maximum allowed head deviation angle; if the head deviation exceeds this angle for long enough, a distraction state is determined. The recommended value is 60°. Threshold 2 (corresponding to the first angle threshold) is the maximum allowed attention deviation angle; the recommended value is 15°. Threshold 3 (corresponding to the time judgment threshold) is the allowed attention deviation time, which is a function of the attention deviation angle θ as shown in fig. 4 and is not described again here.
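Putting the two rules together, an illustrative per-frame sketch (reusing the time_threshold_s helper sketched earlier) is given below; the small state dictionary used to measure durations across frames is an assumption for illustration, not part of the claimed wording.

```python
def is_distracted(head_deg, gaze_deg, now_s, state,
                  head_thresh=60.0, attn_thresh=15.0):
    """Two-rule judgment: head rule (threshold 1) first, then attention rule
    (threshold 2); both use the angle-dependent time threshold (threshold 3)."""
    attn_deg = abs(head_deg + gaze_deg)           # attention deviation angle
    t3 = time_threshold_s(attn_deg)               # threshold 3, cf. fig. 4

    if abs(head_deg) > head_thresh:               # rule 1: head deviation
        state.setdefault("head_since", now_s)
        if now_s - state["head_since"] > t3:
            return True
    else:
        state.pop("head_since", None)

    if attn_deg > attn_thresh:                    # rule 2: attention deviation
        state.setdefault("attn_since", now_s)
        if now_s - state["attn_since"] > t3:
            return True
    else:
        state.pop("attn_since", None)
    return False
```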
In the scheme of this embodiment, the DPM algorithm is used for face detection, which greatly improves detection accuracy, reduces both the false detection rate and the missed detection rate, and improves robustness to illumination and face pose; the facial feature points and the eye feature points are located with a machine learning algorithm, giving very high positioning accuracy and strong generalization to illumination and pose; a large amount of sample data is generated with a picture rendering technique, overcoming the shortage of training samples for the convolutional neural network model; the gaze direction is predicted with the CNN model, which is accurate and strongly resistant to external environmental conditions such as illumination; and, in combination with automatic driving, a measurement index is provided for the hand-over of driving control.
In one embodiment, as shown in fig. 16, there is provided a driver distraction detection apparatus including: a detection module 1601, a processing module 1602, and a first judging module 1603, wherein:
the detecting module 1601 is configured to detect a head deviation angle value and a gaze deviation angle value of a driver;
a processing module 1602, configured to determine an attention deviation angle value according to the head deviation angle value and the gaze deviation angle value, and determine a time evaluation threshold according to the attention deviation angle value;
a first judging module 1603, configured to judge that the driver is in the distraction state when a first duration of the attention deviation angle value being greater than a preset first angle threshold is greater than the time judging threshold.
The driver distraction detection apparatus in one embodiment may further include a second judgment module configured to judge that the driver is in a distraction state when a second duration in which the head deviation angle value is greater than the second angle threshold is greater than the time judgment threshold; the first judging module 1603 detects whether the attention deviation angle value is greater than a preset first angle threshold value when the head deviation angle value is not greater than a preset second angle threshold value or when the second duration is not greater than the time judging threshold value, detects whether the first duration is greater than the time judging threshold value if the attention deviation angle value is greater than the preset first angle threshold value, and judges that the driver is in a distraction state if the first duration is greater than the time judging threshold value.
In one embodiment, the processing module 1602 may set the time judgment threshold to a first preset time value when the attention deviation angle value is greater than the second angle threshold, set it to a second preset time value when the attention deviation angle value is less than the first angle threshold, and make it decrease linearly with increasing attention deviation angle value when the attention deviation angle value is between the first angle threshold and the second angle threshold.
In one embodiment, the detection module 1601 may obtain a face image of the driver, perform feature point positioning on the face image to obtain feature point information, obtain eyeball center coordinates and pupil center coordinates of the driver according to the feature point information, and determine the gaze deviation angle value according to the eyeball center coordinates and the pupil center coordinates.
In one embodiment, the detection module 1601 may obtain a face image of the driver, perform feature point positioning on the face image to obtain feature point information, determine a spatial position relationship between a current two-dimensional image of the driver and a front face three-dimensional model of the driver according to the feature point positioning information and the front face three-dimensional model of the driver, and determine a head deviation angle value of the driver according to the spatial position relationship.
In one embodiment, the detection module 1601 may extract a first DPM feature map, where the first DPM feature map is a DPM feature map of the face image; perform sampling processing on the first DPM feature map and extract a second DPM feature map, where the second DPM feature map is a DPM feature map of the image obtained by the sampling processing; perform a convolution operation on the first DPM feature map with a pre-trained root filter to obtain a response map of the root filter; perform a convolution operation on N times the second DPM feature map with a pre-trained component filter to obtain a response map of the component filter, where the resolution of the component filter is N times the resolution of the root filter and N is a positive integer; obtain a target response map from the response map of the root filter and the response map of the component filter; determine a face region according to the target response map; and perform feature point positioning on the face region to obtain the feature point information.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 17. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement the driver distraction detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, or an external keyboard, touch pad or mouse, and the like.
It will be appreciated by those skilled in the art that the architecture shown in fig. 17 is only a block diagram of some of the structures associated with the inventive arrangements and does not constitute a limitation on the computing devices to which the inventive arrangements may be applied, and a particular computing device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the driver distraction detection method in any of the above embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the driver distraction detection method in any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A driver distraction detection method, characterized in that the method comprises:
detecting a head deviation angle value and a sight line deviation angle value of a driver; determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value, wherein the attention deviation angle value is calculated from a preset direction, the head deviation angle value and the sight line deviation angle value;
determining a time judgment threshold according to the attention deviation angle value;
and if a first duration for which the attention deviation angle value is greater than a preset first angle threshold is greater than the time judgment threshold, judging that the driver is in a distraction state.
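Read procedurally, claim 1 amounts to tracking how long the attention deviation angle has stayed above the first angle threshold and comparing that duration with the angle-dependent time judgment threshold. A minimal Python sketch of that decision loop follows; the threshold constant and the time_threshold_fn helper are assumptions for illustration only.

```python
FIRST_ANGLE_THRESHOLD = 15.0  # degrees; illustrative value only

class DistractionDetector:
    def __init__(self, time_threshold_fn):
        self.time_threshold_fn = time_threshold_fn  # maps angle -> seconds
        self.deviation_start = None  # time at which the deviation began

    def update(self, attention_angle, timestamp):
        """Return True if the driver is currently judged distracted."""
        if attention_angle > FIRST_ANGLE_THRESHOLD:
            if self.deviation_start is None:
                self.deviation_start = timestamp
            first_duration = timestamp - self.deviation_start
            return first_duration > self.time_threshold_fn(attention_angle)
        self.deviation_start = None  # attention is back within range
        return False
```

Calling update() once per video frame with the current attention deviation angle value and a timestamp yields a per-frame distraction decision.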
2. The driver distraction detection method according to claim 1, further comprising, before judging that the driver is in the distraction state if the first duration for which the attention deviation angle value is greater than the preset first angle threshold is greater than the time judgment threshold:
if a second duration for which the head deviation angle value is greater than a preset second angle threshold is greater than the time judgment threshold, judging that the driver is in a distraction state;
if the head deviation angle value is not greater than the preset second angle threshold, or the second duration is not greater than the time judgment threshold, detecting whether the first duration for which the attention deviation angle value is greater than the preset first angle threshold is greater than the time judgment threshold.
3. The driver distraction detection method according to claim 1 or 2, wherein the determining a time judgment threshold value according to the attention deviation angle value comprises:
when the attention deviation angle value is greater than a second angle threshold, the time judgment threshold is a first preset time value;
when the attention deviation angle value is smaller than the first angle threshold, the time judgment threshold is a second preset time value;
when the attention deviation angle value is between the first angle threshold and the second angle threshold, the time judgment threshold decreases linearly as the attention deviation angle value increases.
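Claim 3 describes the time judgment threshold as a piecewise function of the attention deviation angle: a longer allowance below the first angle threshold, a shorter one above the second, and a linear decrease in between. The sketch below illustrates that shape with made-up constants; the patent does not disclose concrete angle or time values.

```python
import numpy as np

# Illustrative constants only; not values disclosed in the patent.
FIRST_ANGLE = 15.0    # degrees
SECOND_ANGLE = 45.0   # degrees
SECOND_TIME = 2.0     # seconds allowed at small deviation angles
FIRST_TIME = 0.5      # seconds allowed at large deviation angles

def time_threshold(attention_angle: float) -> float:
    """Time judgment threshold that decreases linearly with the angle."""
    if attention_angle >= SECOND_ANGLE:
        return FIRST_TIME
    if attention_angle <= FIRST_ANGLE:
        return SECOND_TIME
    # Linear interpolation between the two preset time values.
    return float(np.interp(attention_angle,
                           [FIRST_ANGLE, SECOND_ANGLE],
                           [SECOND_TIME, FIRST_TIME]))
```

A function of this form could be supplied as the time_threshold_fn in the decision-loop sketch after claim 1.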
4. The driver distraction detection method of claim 2, wherein obtaining the sight line deviation angle value of the driver comprises:
acquiring a face image of the driver, and positioning feature points of the face image to obtain feature point information;
acquiring eyeball center coordinates and pupil center coordinates of the driver according to the feature point information;
and determining the sight line deviation angle value according to the eyeball center coordinates and the pupil center coordinates.
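One geometric reading of claim 4, sketched below in Python, treats the offset of the pupil centre from the eyeball centre, normalised by an assumed eyeball radius, as the sine of the sight line deviation angle; the radius value and the planar simplification are assumptions rather than the patent's stated model.

```python
import numpy as np

EYEBALL_RADIUS_PX = 12.0  # assumed eyeball radius in image pixels

def gaze_deviation_angle(eyeball_center, pupil_center,
                         radius_px=EYEBALL_RADIUS_PX):
    """Approximate sight line deviation (degrees) from the two centres."""
    offset = (np.asarray(pupil_center, dtype=float)
              - np.asarray(eyeball_center, dtype=float))
    # The pupil's offset over the eyeball radius approximates sin(angle).
    ratio = np.clip(np.linalg.norm(offset) / radius_px, 0.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```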
5. The driver distraction detection method of claim 1, wherein the detecting the head deviation angle value of the driver comprises:
acquiring a face image of the driver, and positioning feature points of the face image to obtain feature point information;
determining a spatial position relationship between the current two-dimensional image of the driver and the front face three-dimensional model according to the feature point information and the front face three-dimensional model of the driver;
and determining the head deviation angle value of the driver according to the spatial position relation.
6. The method according to claim 4 or 5, wherein the positioning of the feature points of the face image to obtain the feature point information comprises:
extracting a first DPM feature map, wherein the first DPM feature map is a DPM feature map of the face image;
sampling the first DPM feature map, and extracting a second DPM feature map, wherein the second DPM feature map is a DPM feature map of an image obtained by sampling the first DPM feature map;
performing a convolution operation on the first DPM feature map by using a pre-trained root filter to obtain a response map of the root filter;
performing a convolution operation on the second DPM feature map by using a pre-trained component filter to obtain a response map of the component filter, wherein the resolution of the component filter is N times that of the root filter, and N is a positive integer;
obtaining a target response map according to the response map of the root filter and the response map of the component filter;
and determining a face region according to the target response map, and performing feature point positioning on the face region to obtain the feature point information.
7. The driver distraction detection method according to claim 1 or 2, wherein the process of detecting the sight line deviation angle value of the driver employs a multi-modal convolutional neural network model, the data samples of which are eye region images generated by rendering a plurality of dynamically controllable eye region models.
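A multi-modal gaze network of the kind referred to in claim 7 typically fuses an eye-region image branch with a second modality such as head pose. The PyTorch-style sketch below shows one such fusion; the input sizes, layer widths and the choice of head pose as the second modality are illustrative assumptions, and training on rendered eye-region images is not shown.

```python
import torch
import torch.nn as nn

class MultiModalGazeNet(nn.Module):
    """Toy multi-modal gaze regressor: grey eye patch plus head-pose angles."""
    def __init__(self):
        super().__init__()
        self.image_branch = nn.Sequential(      # input: 1 x 36 x 60 eye patch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),                        # 32 * 9 * 15 features
        )
        self.pose_branch = nn.Sequential(nn.Linear(2, 16), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(32 * 9 * 15 + 16, 64), nn.ReLU(),
            nn.Linear(64, 2),                    # gaze yaw and pitch
        )

    def forward(self, eye_image, head_pose):
        fused = torch.cat([self.image_branch(eye_image),
                           self.pose_branch(head_pose)], dim=1)
        return self.head(fused)
```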
8. A driver distraction detection device, characterized in that the device comprises:
the detection module is used for detecting a head deviation angle value and a sight line deviation angle value of a driver; the processing module is used for determining an attention deviation angle value according to the head deviation angle value and the sight line deviation angle value, and for determining a time judgment threshold according to the attention deviation angle value, wherein the attention deviation angle value is calculated from a preset direction, the head deviation angle value and the sight line deviation angle value;
the first judging module is used for judging that the driver is in a distraction state when a first duration for which the attention deviation angle value is greater than a preset first angle threshold is greater than the time judgment threshold.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201910466961.4A 2018-12-06 2019-05-31 Driver distraction detection method, driver distraction detection device, computer equipment and storage medium Active CN111291607B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018114867937 2018-12-06
CN201811486793 2018-12-06

Publications (2)

Publication Number Publication Date
CN111291607A CN111291607A (en) 2020-06-16
CN111291607B true CN111291607B (en) 2021-01-22

Family

ID=71025303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910466961.4A Active CN111291607B (en) 2018-12-06 2019-05-31 Driver distraction detection method, driver distraction detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111291607B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906633B (en) * 2021-03-18 2021-11-02 南通师范高等专科学校 Teaching optimization method based on student attention
CN115035502A (en) * 2022-07-08 2022-09-09 北京百度网讯科技有限公司 Driver behavior monitoring method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2275020A1 (en) * 2009-07-16 2011-01-19 Tobii Technology AB Eye detection unit using sequential data flow
CN105398369A (en) * 2014-09-04 2016-03-16 通用汽车环球科技运作有限责任公司 Motor Vehicle And Method For Operating A Motor Vehicle
CN105426827A (en) * 2015-11-09 2016-03-23 北京市商汤科技开发有限公司 Living body verification method, device and system
CN106909220A (en) * 2017-02-21 2017-06-30 山东师范大学 A kind of sight line exchange method suitable for touch-control
CN108108684A (en) * 2017-12-15 2018-06-01 杭州电子科技大学 A kind of attention detection method for merging line-of-sight detection
CN108446600A (en) * 2018-02-27 2018-08-24 上海汽车集团股份有限公司 A kind of vehicle driver's fatigue monitoring early warning system and method
WO2018170538A1 (en) * 2017-03-21 2018-09-27 Seeing Machines Limited System and method of capturing true gaze position data
CN108819900A (en) * 2018-06-04 2018-11-16 上海商汤智能科技有限公司 Control method for vehicle and system, vehicle intelligent system, electronic equipment, medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105405252B (en) * 2015-11-20 2018-08-24 广州视源电子科技股份有限公司 A kind of driving condition detection method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2275020A1 (en) * 2009-07-16 2011-01-19 Tobii Technology AB Eye detection unit using sequential data flow
CN105398369A (en) * 2014-09-04 2016-03-16 通用汽车环球科技运作有限责任公司 Motor Vehicle And Method For Operating A Motor Vehicle
CN105426827A (en) * 2015-11-09 2016-03-23 北京市商汤科技开发有限公司 Living body verification method, device and system
CN106909220A (en) * 2017-02-21 2017-06-30 山东师范大学 A kind of sight line exchange method suitable for touch-control
WO2018170538A1 (en) * 2017-03-21 2018-09-27 Seeing Machines Limited System and method of capturing true gaze position data
CN108108684A (en) * 2017-12-15 2018-06-01 杭州电子科技大学 A kind of attention detection method for merging line-of-sight detection
CN108446600A (en) * 2018-02-27 2018-08-24 上海汽车集团股份有限公司 A kind of vehicle driver's fatigue monitoring early warning system and method
CN108819900A (en) * 2018-06-04 2018-11-16 上海商汤智能科技有限公司 Control method for vehicle and system, vehicle intelligent system, electronic equipment, medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A review of fatigue driving detection methods; Bo Chunjuan et al.; Journal of Dalian Nationalities University; 2013-05-31; Vol. 15, No. 3; pp. 266-270 *

Also Published As

Publication number Publication date
CN111291607A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
EP3539054B1 (en) Neural network image processing apparatus
CN111709409B (en) Face living body detection method, device, equipment and medium
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
CN111274916B (en) Face recognition method and face recognition device
CN103514441B (en) Facial feature point locating tracking method based on mobile platform
Zhang et al. Deep learning in lane marking detection: A survey
CN108182397B (en) Multi-pose multi-scale human face verification method
KR101903127B1 (en) Gaze estimation method and apparatus
CN111160303B (en) Eye movement response information detection method and device, mobile terminal and storage medium
Zheng et al. Efficient face detection and tracking in video sequences based on deep learning
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN111291607B (en) Driver distraction detection method, driver distraction detection device, computer equipment and storage medium
CN113298742A (en) Multi-modal retinal image fusion method and system based on image registration
Kang et al. Real-time eye tracking for bare and sunglasses-wearing faces for augmented reality 3D head-up displays
CN111291590B (en) Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium
Shan et al. Driver gaze region estimation based on computer vision
Nuevo et al. Face tracking with automatic model construction
Wachs et al. Human posture recognition for intelligent vehicles
CN106909936B (en) Vehicle detection method based on double-vehicle deformable component model
CN115171189A (en) Fatigue detection method, device, equipment and storage medium
Kourbane et al. Skeleton-aware multi-scale heatmap regression for 2D hand pose estimation
CN117037204A (en) Tumble detection method, tumble detection device, electronic equipment and computer program product
Jeong et al. Facial landmark detection based on an ensemble of local weighted regressors during real driving situation
Choudhury et al. Human detection using orientation shape histogram and coocurrence textures
CN111640071A (en) Method for obtaining panoramic foreground target based on convolutional neural network frame difference repairing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant