CN111753674A - Fatigue driving detection and identification method based on deep learning - Google Patents

Fatigue driving detection and identification method based on deep learning

Info

Publication number
CN111753674A
Authority
CN
China
Prior art keywords
driver
face
eyes
fatigue driving
image
Prior art date
Legal status
Pending
Application number
CN202010505707.3A
Other languages
Chinese (zh)
Inventor
徐国保
姚旭
叶昌鑫
麦锐滔
赵霞
王骥
王立臣
李小立
陆晓珉
陈晓航
Current Assignee
Guangdong Ocean University
Original Assignee
Guangdong Ocean University
Priority date
Filing date
Publication date
Application filed by Guangdong Ocean University filed Critical Guangdong Ocean University
Priority to CN202010505707.3A priority Critical patent/CN111753674A/en
Publication of CN111753674A publication Critical patent/CN111753674A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/597 - Recognising the driver's state or behaviour, e.g. attention or drowsiness

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides a fatigue driving detection and identification method based on deep learning, comprising the following steps: collecting real-time video images of the driver driving; extracting the face image from the video images; aligning the key points of the face image with a facial feature point detection method, and segmenting the eye and mouth regions of the face image; establishing a three-channel convolutional neural network from the extracted face, eye and mouth images, and detecting whether the driver's eyes are open or closed and whether the driver is yawning; and calculating the percentage of time the driver's eyes are in the closed state per unit time which, combined with the frequency of yawning, completes an early warning mechanism for driver fatigue. The invention requires no direct contact with the driver, effectively removes the interference of environmental noise by automatically recognizing the face, runs in real time, and, by combining eye and mouth features, increases the weight of key features in the result, further improving accuracy and applicability.

Description

Fatigue driving detection and identification method based on deep learning
Technical Field
The invention relates to the technical field of deep learning, in particular to a fatigue driving detection and identification method based on deep learning.
Background
Fatigue driving is a major factor in traffic accidents. In 2014, a report by the U.S. National Highway Traffic Safety Administration (NHTSA) recorded 846 deaths related to drowsy driving, a figure that has remained essentially unchanged over the past decade. Between 2005 and 2009, an estimated 83,000 crashes per year were associated with drowsy driving. Against a background of rapid economic growth, motor vehicle ownership is growing exponentially and road mileage increases year by year; how to detect fatigue driving in order to reduce traffic accidents is therefore a difficult problem for academia.
The behaviors mainly exhibited by a fatigued driver are blinking, nodding, yawning and the like, with yawning being one of the main ones. Fatigue driving detection currently falls into two broad categories: methods based on behavioral characteristics and methods based on physiological characteristics. Both have a certain effect. Detection based on physiological signal characteristics is highly accurate, but the driver must be in direct physical contact with the detection equipment to acquire the signals, which may interfere with driving operations; the equipment is also too expensive, so such methods suit the laboratory better than practical application. Detection based on behavioral characteristics requires no direct contact with the detection device, has low equipment requirements on top of a vehicle's existing hardware, is highly practical and easy to popularize, but can be limited by the driver's personal habits, road conditions and vehicle model, and its detection accuracy is low in rain and snow or when road conditions are not ideal.
At present, fatigue detection methods mainly comprise those based on physiological characteristics, those based on behavioral characteristics and those based on deep learning. Physiological methods judge mainly from physiological signals and physiological response characteristics. Wu D et al. considered an important regression problem in brain-computer interface (BCI) technology, online driver drowsiness estimation from electroencephalogram (EEG) signals, and proposed an enhanced batch-mode active learning (EBMAL) regression method that improves baseline active learning by raising the reliability, representativeness and diversity of the selected samples, achieving better regression performance on fatigue driving detection. Cui Y et al. proposed feature-weighted episodic training (FWET) on EEG signals to eliminate the calibration requirement entirely; experiments show that feature weighting and episodic training are each effective, and that integrating them can further improve fatigue detection performance. Suman D et al. considered parameters such as blink duration, blink rate, eye opening time, eye closing time and peak speed, and proposed analyzing the electro-oculogram (EOG) to detect driver drowsiness. Although fatigue detection based on physiological characteristics keeps advancing, physiological signals such as brain waves and blink parameters are extremely susceptible to environmental noise during acquisition, and some systems require head-mounted equipment to acquire the signals, which causes certain inconvenience to the driver.
Behavioral characteristics comprise vehicle behavior and human behavior, and methods based on behavioral features mainly extract behavior signal information from the vehicle and the driver. Zhang Y et al. proposed a pulse control model (PCM) for vehicle lane keeping; its pulse-classified steering characteristics can recognize the normal driver state and highlight abnormal driving behavior, thereby identifying typical driving characteristics such as inattention and tiredness. Krajewski et al. studied a fatigue monitoring system based on steering-wheel behavior, achieving a recognition rate of 86.1% on a strong-fatigue data set. Baronti F et al. embedded a pressure sensor in the steering wheel to collect the pressure distribution while the driver was driving, which was then used to detect driver fatigue. Morris D M et al. proposed detecting fatigue driving from lane-position deviation indicators by analyzing lateral lane-position changes and differences in vehicle heading. Ding et al. proposed detecting inattentive driving behavior with a frequency-modulated continuous-wave radar system, obtaining an average accuracy of about 95% in extensive experiments in a real automotive environment. Sun et al. enabled real-time identification of fatigue status by determining the three most effective contextual features: continuous driving time, sleep duration and current time. Behavior-based methods work well for fatigue detection, but judgments from lane deviation or other vehicle signals are strongly limited by road conditions and vehicle model and are not easy to popularize, while judgments from steering behavior and steering-wheel pressure distribution generalize poorly because driving habits differ between drivers.
In recent years, deep learning has been highly successful in classification, decision-making, target detection and other fields, and this technology has also advanced fatigue detection research. Dwivedi et al. proposed a vision-based intelligent method for detecting driver drowsiness that uses a shallow convolutional neural network (CNN) to extract facial features, achieving 78% accuracy. Park et al. proposed a deep architecture called the Deep Drowsiness Detection (DDD) network, which learns effective features and detects driver drowsiness from RGB input video; it uses multiple fused networks for feature extraction and finally achieved 73% accuracy on a data set of their own making. Chiou C Y et al. proposed a new individual-based Hierarchical DMS (HDMS) with a two-layer working mechanism that outperforms the latest existing DMS methods in detecting normal, drowsy and distracted driving behaviors. The dangerous-state driving scheme of Lashkov I B et al. focuses on recognizing driver sleepiness and distraction, and improved the performance and efficiency of dangerous-state recognition in a smartphone-based prototype mobile application test. The basic idea of Chao Yan et al. is to detect the position of the driver's hand region using extracted features in order to predict possibly unsafe driving. Zhang F et al. used infrared video and proposed a CNN-based eye-state recognition method to detect fatigue. Although these recent deep-learning fatigue-detection methods each achieve a certain recognition capability on particular data sets, much remains to be improved. Both K. Dwivedi et al. and S. Park et al. improve fatigue-detection accuracy through binary classification, but in practical application their networks are too complex, so the running speed is too low to meet the requirements of real-time detection. Running speed can be raised by pruning an over-long network structure while preserving accuracy as far as possible, and by reducing the input dimensionality to cut the number of matrix operations; yet even the reduced network frame remains too complex, and taking the whole face directly as input gives the model poor noise immunity and also costs some accuracy. Chao Yan et al. predict unsafe driving from the hand region and Zhang F et al. detect fatigue from the eye state alone; relying on a single visual feature makes it hard to adapt to the complicated and varied cab backgrounds of real life. In addition, the degree of eye opening and closing varies from person to person, and irregular head or hand movements can also produce false alarms.
How to provide a fatigue detection method with strong interference resistance and high accuracy is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a method for detecting and identifying fatigue driving based on deep learning, which solves the problems in the prior art, can accurately detect the driving state of a driver and has strong anti-interference performance.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a method for detecting and identifying fatigue driving based on deep learning, which comprises the following steps:
a. collecting real-time video images of the driver driving;
b. extracting the face image from the video images;
c. aligning the key points of the face image with a facial feature point detection method, and segmenting the eye and mouth regions of the face image;
d. establishing a three-channel convolutional neural network from the extracted face, eye and mouth images, and detecting whether the driver's eyes are open or closed and whether the driver is yawning;
e. calculating the percentage of time the driver's eyes are in the closed state within a predetermined time and, combined with the frequency of yawning, completing the early warning of driver fatigue.
Preferably, in step b an Adaboost method based on rectangular features is used to extract the face image from the video image; the Adaboost method forms a strong classifier by cascading weak classifiers, and the expression of the weak classifier h(x) is as follows:
h_j(x) = 1, if p_j f_j(x) < p_j Θ_j; h_j(x) = 0, otherwise
where f(·) is the feature value, Θ is the threshold, p is the direction of the inequality, and j is the serial number of the weak classifier.
Preferably, different rectangular features are selected and slid continuously across a rectangular window of the detected picture; each time the rectangular window reaches a region, the feature of that region is calculated by subtracting the sum of the pixels in the white rectangle from the sum of the pixels in the gray rectangle, and the calculated value is the feature value of the region; the calculation of the rectangular feature values is accelerated by using an integral image, the basic data required for calculating a rectangular feature value being the sum of the pixels within the area.
Preferably, the integral image is calculated by accumulating the pixel points in the rectangular region formed by the point (x, y) and the image origin, according to the formulas:
II(X, Y) = Σ_{X'≤X, Y'≤Y} I(X', Y')
S(X,Y)=S(X,Y-1)+I(X,Y)
II(X,Y)=II(X-1,Y)+S(X,Y)
where II(X, Y) is the integral image, I(X, Y) is the pixel value of the original image, S(X, Y) is the cumulative row sum, and (X', Y') ranges over the pixel coordinates with X'≤X and Y'≤Y.
Preferably, the specific content of step d is as follows: after the eye, mouth and face regions of the driver are segmented, a three-channel convolutional neural network model is established, the features of the eyes, mouth and face are extracted through convolutional layers, and the features of the three channels are then merged to classify whether the driver is yawning; a branch is split off from the eye channel of the main network to recognize the state of the eyes in real time.
Preferably, in step e the percentage of time the driver's eyes are in the closed state within the predetermined time is found based on PERCLOS, a physical quantity measuring fatigue/drowsiness, defined as the ratio of the number of closed-eye image frames of the driver to the total number of frames over a continuous period of time, namely:
PERCLOS = (number of closed-eye frames / total number of frames) × 100%
the invention discloses the following technical effects: the method does not need to be in direct contact with a driver, effectively removes the interference of environmental noise by automatically identifying the face, has high real-time performance, combines the eye and mouth characteristics, improves the proportion of the influence of key characteristics on the result, further improves the detection precision of the driving state of the driver, and has better use value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flowchart of a method for detecting and identifying fatigue driving based on deep learning according to the present invention;
FIG. 2 is an exemplary diagram of a rectangular feature of the present invention;
FIG. 3 is a schematic diagram of an integral graph used in the present invention;
FIG. 4 is a contour diagram of the facial feature points;
FIG. 5 is a network framework diagram;
FIG. 6 shows examples from the augmented eye data set;
FIG. 7 shows examples from the YAWDD data set used in this embodiment;
FIG. 8 shows the detection results for different driving states of the driver in this embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1 to 8, the invention provides a method for detecting and identifying fatigue driving based on deep learning, which comprises the following steps:
the method comprises the steps of collecting a real-time video image of a driver driving, then extracting a face image in the video image by using an Adaboost method based on rectangular features, and selecting different rectangular features and enabling the different rectangular features to continuously shift and slide on a detected picture window. For each position of the rectangular window, the feature of the area is calculated, and the sum of the pixels in the white rectangle is subtracted from the sum of the pixels in the gray rectangle, as shown in fig. 2: (A) two rectangular features are shown in (a), (B), (C) three rectangular features are shown, and (D) four rectangular features are shown. The calculated value is the characteristic value of the area. Wherein the use of the integral map can be used to speed up the computation of the rectangular eigenvalue.
The basic data required for calculating a rectangular feature is the sum of the pixels in an area. The integral image is obtained by accumulating the pixel points in the rectangular region formed by the point (x, y) and the image origin, according to the formulas:
II(X, Y) = Σ_{X'≤X, Y'≤Y} I(X', Y')
S(X,Y)=S(X,Y-1)+I(X,Y)
II(X,Y)=II(X-1,Y)+S(X,Y)
where II(X, Y) is the integral image, I(X, Y) is the pixel value of the original image, S(X, Y) is the cumulative row sum, and (X', Y') ranges over the pixel coordinates of the image to be detected with X'≤X and Y'≤Y.
With the integral image shown in fig. 3, feature values are computed quickly: four array references suffice to obtain the sum of the pixels in rectangle D. The value of the integral image at position 1 is the sum of the pixels in rectangle A, the value at position 2 is A+B, the value at position 3 is A+C, and the value at position 4 is A+B+C+D. The sum over D is therefore II(4) + II(1) - (II(2) + II(3)); that is, the feature value of a rectangular feature depends only on the integral image at the corner points of its rectangle. This property greatly increases the speed, so the requirement of real-time detection can be met.
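As a concrete illustration of this four-corner lookup (our own sketch, not code disclosed in the patent), the following NumPy snippet builds an integral image and evaluates a two-rectangle feature; the feature layout and the example image are assumptions:

import numpy as np

def integral_image(img):
    # II(X, Y): sum of all pixels I(X', Y') with X' <= X and Y' <= Y
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum over the inclusive rectangle via II(4) + II(1) - II(2) - II(3)
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(ii, top, left, height, width):
    # Two-rectangle feature: gray (left half) minus white (right half)
    mid = left + width // 2
    gray = rect_sum(ii, top, left, top + height - 1, mid - 1)
    white = rect_sum(ii, top, mid, top + height - 1, left + width - 1)
    return gray - white

img = np.arange(36, dtype=np.int64).reshape(6, 6)  # toy 6x6 "image"
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 4))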
By cascading weak classifiers into a strong classifier, the Adaboost method can bring the classification to arbitrary precision. The weak classifier h(x) has the following expression:
h_j(x) = 1, if p_j f_j(x) < p_j Θ_j; h_j(x) = 0, otherwise
In the above formula, f(·) is the feature value, Θ is the threshold used to decide whether a window is a face, p is the direction of the inequality, and j is the serial number of the weak classifier. The specific training process of the Adaboost method is shown in table 1:
TABLE 1
[Table 1: the Adaboost training procedure, reproduced as an image in the original]
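In practice a trained cascade of such boosted classifiers is usually applied through an off-the-shelf detector. The sketch below is one possible realization of steps a and b using OpenCV's pretrained frontal-face Haar cascade; the camera index and detection parameters are assumptions, and the patent itself discloses no implementation code:

import cv2

# Pretrained Viola-Jones style detector (Haar rectangular features + Adaboost cascade)
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # in-vehicle camera; device index is an assumption
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The detector slides rectangular windows over the image at several scales
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face_img = gray[y:y + h, x:x + w]  # face region passed to later stages
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("driver", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()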
Then facial feature points are detected with a facial feature point detection method, the key points of the face image are aligned, and the eye and mouth regions of the face image are segmented. The method optimizes a squared-error loss function and trains an optimal model by gradient boosting over an ensemble of regression trees, finally detecting a 68-landmark model in the input image and aligning the face; the contours of the eyes, mouth and chin are shown in fig. 4.
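The 68-point model and gradient-boosted regression trees described here correspond to the ERT landmark detector available in the dlib library. A minimal sketch of locating the key points and cropping the eye and mouth regions might look as follows; the model file path, input file name and crop margin are assumptions:

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Pretrained 68-landmark ERT model distributed with dlib
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Index ranges of the 68-point scheme: eyes 36-47, mouth 48-67
EYE_IDX = range(36, 48)
MOUTH_IDX = range(48, 68)

def crop_region(gray, points, margin=5):
    # Bounding box around a landmark group, padded and clamped to the image
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    top, left = max(0, min(ys) - margin), max(0, min(xs) - margin)
    return gray[top:max(ys) + margin, left:max(xs) + margin]

gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    pts = shape.parts()
    eyes = crop_region(gray, [pts[i] for i in EYE_IDX])
    mouth = crop_region(gray, [pts[i] for i in MOUTH_IDX])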
As shown in fig. 5, after the eye, mouth and face regions are located they are segmented, and a three-channel convolutional neural network model is established. The features of the eyes, mouth and face are extracted through convolutional layers, the features of the three channels are finally merged to classify whether the driver is yawning, and a branch is split off from the eye channel of the main network to recognize the state of the eyes in real time.
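The patent does not disclose layer sizes, and its experiments used MXNet; for brevity the following PyTorch sketch only illustrates the topology of fig. 5 as described, with three convolutional channels whose features are merged for the yawn classifier and an eye-state head branching off the eye channel. All dimensions here are assumptions:

import torch
import torch.nn as nn

def conv_branch():
    # One feature-extraction channel: conv layers plus pooling (sizes assumed)
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> 32 * 4 * 4 features
    )

class ThreeChannelNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.face_branch = conv_branch()
        self.eye_branch = conv_branch()
        self.mouth_branch = conv_branch()
        self.yawn_head = nn.Linear(3 * 32 * 4 * 4, 2)  # yawning / not yawning
        self.eye_head = nn.Linear(32 * 4 * 4, 2)       # eyes open / closed branch

    def forward(self, face, eyes, mouth):
        f = self.face_branch(face)
        e = self.eye_branch(eyes)
        m = self.mouth_branch(mouth)
        yawn = self.yawn_head(torch.cat([f, e, m], dim=1))  # merged three-channel features
        eye_state = self.eye_head(e)  # real-time eye-state branch off the main network
        return yawn, eye_state

net = ThreeChannelNet()
yawn, eye_state = net(torch.randn(1, 1, 64, 64),
                      torch.randn(1, 1, 32, 64),
                      torch.randn(1, 1, 32, 64))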
Then the PERCLOS method is used, together with the frequency of yawning, to judge whether the driver is fatigued. Specifically, the PERCLOS principle judges fatigue within a predetermined time: PERCLOS is the ratio of the number of closed-eye image frames of the driver to the total number of frames over a continuous period of time, as given by the following formula:
PERCLOS = (number of closed-eye frames / total number of frames) × 100%
The percentage of time the driver's eyes are in the closed state within the predetermined time is calculated and, combined with the frequency of yawning, the early warning of driver fatigue is completed.
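Expressed as code, a sliding-window monitor combining PERCLOS with the yawning frequency might look like the sketch below; the window length and both alarm thresholds are illustrative assumptions, since the patent fixes no numeric values:

from collections import deque

class FatigueMonitor:
    # Sliding-window PERCLOS plus yawn-frequency early warning.
    # Thresholds are illustrative assumptions, not values from the patent.
    def __init__(self, fps=30, window_s=60, perclos_thr=0.4, yawn_thr=3):
        self.frames = deque(maxlen=fps * window_s)  # 1 = eyes closed in frame
        self.yawns = deque(maxlen=fps * window_s)   # 1 = yawn detected in frame
        self.perclos_thr = perclos_thr
        self.yawn_thr = yawn_thr

    def update(self, eyes_closed: bool, yawning: bool) -> bool:
        # Call once per classified video frame; returns True when warning fires
        self.frames.append(1 if eyes_closed else 0)
        self.yawns.append(1 if yawning else 0)
        perclos = sum(self.frames) / len(self.frames)  # closed frames / total frames
        seq = list(self.yawns)
        # Count yawn onsets (0 -> 1 transitions) within the window
        onsets = sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)
        return perclos > self.perclos_thr or onsets >= self.yawn_thr

monitor = FatigueMonitor()
if monitor.update(eyes_closed=True, yawning=False):
    print("fatigue warning")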
This embodiment also verifies the method of the invention. The MRL Eye Dataset serves as the blink data set and YAWDD as the yawning data set. The MRL data set contains 84900 images of 38 persons, collected by three different sensors, and therefore has a certain diversity. The invention adds randomness to the data set through scaling, histogram equalization and random rotation of ±20°, finally constructing a training set of 60900 images and a test set of 24000 images, as shown in fig. 6. The YAWDD data set can be used to verify methods such as face detection, facial feature extraction and yawning detection. It contains two video sets recording a number of driver behaviors with different facial features. The videos have a resolution of 640x480 with 24-bit true color, recorded at 30 frames per second. The first video set comprises 322 videos shot by a camera mounted below the in-car rearview mirror; 3 to 4 videos are recorded for every driver, covering various facial behaviors such as normal driving, talking, singing and yawning. The second video set contains 29 videos shot by a camera mounted above the dashboard; each driver has a single video covering three behaviors: normal driving, talking while driving, and yawning while driving. Example frames from the YAWDD data set are shown in fig. 7.
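The augmentation described (scaling, histogram equalization and random rotation within ±20°) can be sketched with OpenCV as follows; the scaling range is an assumption:

import random
import cv2
import numpy as np

def augment_eye_image(gray):
    # Scaling, histogram equalization and random +/-20 degree rotation,
    # as described for the eye data set (the scale range is assumed)
    h, w = gray.shape
    s = random.uniform(0.8, 1.2)                      # random scaling
    img = cv2.resize(gray, (int(w * s), int(h * s)))
    img = cv2.equalizeHist(img)                       # histogram equalization
    angle = random.uniform(-20, 20)                   # random rotation angle
    hh, ww = img.shape
    M = cv2.getRotationMatrix2D((ww / 2, hh / 2), angle, 1.0)
    return cv2.warpAffine(img, M, (ww, hh))

sample = np.random.randint(0, 256, (48, 48), dtype=np.uint8)
augmented = augment_eye_image(sample)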
In the experiment of this embodiment, training was performed on a server with two GTX Titan X GPUs. An RTX 2070 GPU was used for testing, and the system was deployed on an embedded x86 board with an Intel i5-6500. The GTX Titan X has 3072 shader units and ran at 1100 MHz (overclocked); the server unit used in this embodiment reaches at most 14.72 TFLOPS (FP32). The RTX 2070 has 2304 shader units and ran at stock settings, with a base clock of 1410 MHz boosting up to 1620 MHz; it can perform 7.465 TFLOPS (FP32). The test PC has an i5-9600K running on all cores at 5.3 GHz (overclocked) and 32 GB of memory. In the test environment of this embodiment, the i5-6500 embedded board reaches 700 GFLOPS (FP32) using iGPU acceleration (OpenCL). This device is less expensive than any of the GPUs mentioned above, and the extensibility and software ecosystem of an x86 processor make the board a suitable model that can be deployed as a small yet capable unit in a motor vehicle.
The system uses Ubuntu 16.04 and the deep learning framework is MXNet. The complete data set is stored on a local iSCSI server as a compressed large NumPy array for fast loading during training. The initial learning rate of network training is 0.01; after 60 epochs of training, the learning rate of each subsequent epoch is reduced through a natural exponential function. Finally the model was verified on YAWDD and the MRL Eye Dataset.
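The training schedule (learning rate 0.01 held for the first 60 epochs, then reduction through a natural exponential function) could be realized as in the following sketch; the decay constant is an assumption:

import math

BASE_LR = 0.01
WARM_EPOCHS = 60
DECAY = 0.05  # per-epoch decay constant after epoch 60 (assumed)

def learning_rate(epoch: int) -> float:
    # 0.01 for the first 60 epochs, then natural exponential decay
    if epoch < WARM_EPOCHS:
        return BASE_LR
    return BASE_LR * math.exp(-DECAY * (epoch - WARM_EPOCHS))

for epoch in (0, 59, 60, 90, 120):
    print(epoch, round(learning_rate(epoch), 5))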
Eye state:
table 2 shows that the accuracy of the Eye data MRL Eye Dataset in the model is verified, 24000 pieces of data are collected in the test set, and each of the eyes opened and closed accounts for half, and it can be seen that the accuracy is very high by using the extracted high-dimensional features as network input, which meets the application requirements.
TABLE 2
[Table 2: eye-state classification accuracy on the MRL Eye Dataset, reproduced as an image in the original]
Fatigue state detection:
the experimental accuracy of the method established by the present embodiment was verified to be 97% on the verification set YAWDD, and compared with other methods.
TABLE 3 Accuracy of methods validated on the YAWDD data set
[Table 3: accuracies of methods validated on the YAWDD data set, reproduced as an image in the original]
As can be seen from table 3, the method of this embodiment increases the weight of the key features by taking them as model inputs. Compared with other methods it has high accuracy, and the diversity of its inputs gives it strong interference resistance and applicability to different environments. The experimental results are shown in fig. 8.
The invention requires no direct contact with the driver, effectively removes the interference of environmental noise by automatically recognizing the face, runs in real time, and, by combining eye and mouth features, increases the weight of key features in the result, further improving accuracy; it has good applicability and is suitable for popularization in the industry.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall into the protection scope of the present invention.

Claims (6)

1. A detection and identification method for fatigue driving based on deep learning is characterized by comprising the following steps:
a. collecting real-time video images of the driver driving;
b. extracting the face image from the video images;
c. aligning the key points of the face image with a facial feature point detection method, and segmenting the eye and mouth regions of the face image;
d. establishing a three-channel convolutional neural network from the extracted face, eye and mouth images, and detecting whether the driver's eyes are open or closed and whether the driver is yawning;
e. calculating the percentage of time the driver's eyes are in the closed state within a predetermined time and, combined with the frequency of yawning, completing the early warning of driver fatigue.
2. The method for detecting and identifying fatigue driving based on deep learning of claim 1, wherein in step b an Adaboost method based on rectangular features is used to extract the face image from the video image, the Adaboost method forms a strong classifier by cascading weak classifiers, and the expression of the weak classifier h(x) is as follows:
h_j(x) = 1, if p_j f_j(x) < p_j Θ_j; h_j(x) = 0, otherwise
where f(·) is the feature value, Θ is the threshold, p is the direction of the inequality, and j is the serial number of the weak classifier.
3. The method for detecting and identifying fatigue driving based on deep learning of claim 2, wherein different rectangular features are selected and slid continuously across a rectangular window of the detected picture; each time the rectangular window reaches a region, the feature of that region is calculated by subtracting the sum of the pixels in the white rectangle from the sum of the pixels in the gray rectangle, and the calculated value is the feature value of the region; the calculation of the rectangular feature values is accelerated by using an integral image, the basic data required for calculating a rectangular feature value being the sum of the pixels within the area.
4. The method for detecting and identifying fatigue driving based on deep learning of claim 3, wherein the integral image is calculated by accumulating the pixel points in the rectangular region formed by the point (x, y) and the image origin, according to the formulas:
II(X, Y) = Σ_{X'≤X, Y'≤Y} I(X', Y')
S(X,Y)=S(X,Y-1)+I(X,Y)
II(X,Y)=II(X-1,Y)+S(X,Y)
where II(X, Y) is the integral image, I(X, Y) is the pixel value of the original image, S(X, Y) is the cumulative row sum, and (X', Y') ranges over the pixel coordinates with X'≤X and Y'≤Y.
5. The method for detecting and identifying fatigue driving based on deep learning of claim 1, wherein the specific content of step d is as follows: after the eye, mouth and face regions of the driver are segmented, a three-channel convolutional neural network model is established, the features of the eyes, mouth and face are extracted through convolutional layers, and the features of the three channels are then merged to classify whether the driver is yawning; a branch is split off from the eye channel of the main network to recognize the state of the eyes in real time.
6. The method for detecting and identifying fatigue driving based on deep learning of claim 1, wherein in step e the percentage of time the driver's eyes are in the closed state within the predetermined time is determined based on PERCLOS, a physical quantity measuring fatigue/drowsiness, defined as the ratio of the number of closed-eye image frames of the driver to the total number of frames over a continuous period of time, namely:
PERCLOS = (number of closed-eye frames / total number of frames) × 100%
CN202010505707.3A 2020-06-05 2020-06-05 Fatigue driving detection and identification method based on deep learning Pending CN111753674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010505707.3A CN111753674A (en) 2020-06-05 2020-06-05 Fatigue driving detection and identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010505707.3A CN111753674A (en) 2020-06-05 2020-06-05 Fatigue driving detection and identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN111753674A true CN111753674A (en) 2020-10-09

Family

ID=72674583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010505707.3A Pending CN111753674A (en) 2020-06-05 2020-06-05 Fatigue driving detection and identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN111753674A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258672A (en) * 2020-10-22 2021-01-22 山西交通职业技术学院 ETC charging system and method based on face recognition
CN112528843A (en) * 2020-12-07 2021-03-19 湖南警察学院 Motor vehicle driver fatigue detection method fusing facial features
CN112686161A (en) * 2020-12-31 2021-04-20 遵义师范学院 Fatigue driving detection method based on neural network
CN113239729A (en) * 2021-04-08 2021-08-10 南京交通职业技术学院 Fatigue driving identification method based on data fusion
CN113591615A (en) * 2021-07-14 2021-11-02 广州敏视数码科技有限公司 Multi-model-based driver smoking detection method
CN113792599A (en) * 2021-08-10 2021-12-14 东风电驱动***有限公司 Verification method and verification device for fatigue driving early warning system
CN114663863A (en) * 2022-02-24 2022-06-24 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
CN114677666A (en) * 2022-03-31 2022-06-28 东风商用车有限公司 Cab motion attitude detection method and system in vibration test
WO2023103206A1 (en) * 2021-12-06 2023-06-15 江苏航天大为科技股份有限公司 Driver fatigue detection method based on multiple strategies
WO2024045350A1 (en) * 2022-08-29 2024-03-07 天翼数字生活科技有限公司 Eye movement based liveness detection method and system based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809445A (en) * 2015-05-07 2015-07-29 吉林大学 Fatigue driving detection method based on eye and mouth states
CN110119676A (en) * 2019-03-28 2019-08-13 广东工业大学 A kind of Driver Fatigue Detection neural network based
CN110728241A (en) * 2019-10-14 2020-01-24 湖南大学 Driver fatigue detection method based on deep learning multi-feature fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809445A (en) * 2015-05-07 2015-07-29 吉林大学 Fatigue driving detection method based on eye and mouth states
CN110119676A (en) * 2019-03-28 2019-08-13 广东工业大学 A kind of Driver Fatigue Detection neural network based
CN110728241A (en) * 2019-10-14 2020-01-24 湖南大学 Driver fatigue detection method based on deep learning multi-feature fusion

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ZZYY0929: "Analysis of the Dlib library landmark algorithm (ERT, ensemble of regression trees)", CSDN *
Liu Jiangwei: "Driver fatigue state recognition technology based on human-eye detection", China Masters' Theses Full-text Database, Information Science and Technology *
Wu Zhan: "Near-infrared facial expression recognition based on a three-channel 3D convolutional neural network", China Masters' Theses Full-text Database, Information Science and Technology *
Zhou Yang: "Research on video-based face detection and recognition algorithms", China Doctoral and Masters' Theses Full-text Database (Masters), Information Science and Technology *
Chang Jinpeng et al.: "Research and implementation of an Android-based face-shape classification system", Science & Technology and Information *
Jiang Youyi: "Design and implementation of a multi-feature fatigue driving detection system", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258672A (en) * 2020-10-22 2021-01-22 山西交通职业技术学院 ETC charging system and method based on face recognition
CN112528843A (en) * 2020-12-07 2021-03-19 湖南警察学院 Motor vehicle driver fatigue detection method fusing facial features
CN112686161A (en) * 2020-12-31 2021-04-20 遵义师范学院 Fatigue driving detection method based on neural network
CN113239729A (en) * 2021-04-08 2021-08-10 南京交通职业技术学院 Fatigue driving identification method based on data fusion
CN113239729B (en) * 2021-04-08 2024-02-27 南京交通职业技术学院 Fatigue driving identification method based on data fusion
CN113591615A (en) * 2021-07-14 2021-11-02 广州敏视数码科技有限公司 Multi-model-based driver smoking detection method
CN113792599A (en) * 2021-08-10 2021-12-14 东风电驱动***有限公司 Verification method and verification device for fatigue driving early warning system
WO2023103206A1 (en) * 2021-12-06 2023-06-15 江苏航天大为科技股份有限公司 Driver fatigue detection method based on multiple strategies
CN114663863A (en) * 2022-02-24 2022-06-24 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
CN114677666A (en) * 2022-03-31 2022-06-28 东风商用车有限公司 Cab motion attitude detection method and system in vibration test
CN114677666B (en) * 2022-03-31 2024-05-31 东风商用车有限公司 Cab motion attitude detection method and system in vibration test
WO2024045350A1 (en) * 2022-08-29 2024-03-07 天翼数字生活科技有限公司 Eye movement based liveness detection method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN111753674A (en) Fatigue driving detection and identification method based on deep learning
US11783601B2 (en) Driver fatigue detection method and system based on combining a pseudo-3D convolutional neural network and an attention mechanism
Ramzan et al. A survey on state-of-the-art drowsiness detection techniques
Mandal et al. Towards detection of bus driver fatigue based on robust visual analysis of eye state
CN108791299B (en) Driving fatigue detection and early warning system and method based on vision
CN112241658B (en) Fatigue driving early warning method based on depth camera
CN103839379B (en) Automobile and driver fatigue early warning detecting method and system for automobile
Hossain et al. IOT based real-time drowsy driving detection system for the prevention of road accidents
Junaedi et al. Driver drowsiness detection based on face feature and PERCLOS
CN109460703B (en) Non-invasive fatigue driving identification method based on heart rate and facial features
CN110728241A (en) Driver fatigue detection method based on deep learning multi-feature fusion
CN111460950B (en) Cognitive distraction method based on head-eye evidence fusion in natural driving conversation behavior
CN109740477A (en) Study in Driver Fatigue State Surveillance System and its fatigue detection method
CN112016429A (en) Fatigue driving detection method based on train cab scene
Tang et al. Real-time image-based driver fatigue detection and monitoring system for monitoring driver vigilance
Zhao et al. Deep convolutional neural network for drowsy student state detection
Panicker et al. Open-eye detection using iris–sclera pattern analysis for driver drowsiness detection
Jia et al. Real-time fatigue driving detection system based on multi-module fusion
Walizad et al. Driver drowsiness detection system using convolutional neural network
Rajevenceltha et al. A novel approach for drowsiness detection using local binary patterns and histogram of gradients
Rani et al. Development of an Automated Tool for Driver Drowsiness Detection
Utomo et al. Driver fatigue prediction using different sensor data with deep learning
Yogesh et al. Driver drowsiness detection and alert system using YOLO
Manjula et al. Driver inattention monitoring system based on the orientation of the face using convolutional neural network
CN115937828A (en) Fatigue driving detection method and device and vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination