CN110096957B - Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion - Google Patents

Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion

Info

Publication number
CN110096957B
CN110096957B
Authority
CN
China
Prior art keywords
state
recognition
face
facial
fatigue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910236282.8A
Other languages
Chinese (zh)
Other versions
CN110096957A (en)
Inventor
刘星
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Tsingtech Microvision Electronic Technology Co ltd
Original Assignee
Suzhou Tsingtech Microvision Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Tsingtech Microvision Electronic Technology Co ltd filed Critical Suzhou Tsingtech Microvision Electronic Technology Co ltd
Priority to CN201910236282.8A priority Critical patent/CN110096957B/en
Publication of CN110096957A publication Critical patent/CN110096957A/en
Application granted granted Critical
Publication of CN110096957B publication Critical patent/CN110096957B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/254 - Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 - Fusion techniques of classification results of results relating to different input data, e.g. multimodal recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition, comprising the following steps: detecting the position of the human body in the collected behavior image; performing skeleton detection on the detected human body position to obtain the positions of the body parts in the image and the corresponding confidences, while predicting the association vector fields between the parts, which represent the connection relations between them, to obtain a human skeleton model; comparing a skeleton model of a predefined fatigue driving state with the obtained skeleton model to obtain a behavior state recognition result; processing the collected facial images to obtain facial expression state features, judging from them whether the driver is fatigued, and obtaining a facial state recognition result; and fusing the facial state recognition result with the behavior state recognition result to obtain the final detection result. Because the facial recognition result and the behavior recognition result are fused dynamically, the fatigue driving state can be judged accurately, with higher precision.

Description

Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion
Technical Field
The invention relates to the technical field of fatigue driving detection, in particular to a fatigue driving monitoring method and system based on fusion of facial recognition and behavior recognition.
Background
There are many methods for detecting the fatigue state of a driver; by type of detection they can be roughly classified into methods based on the driver's physiological signals, the driver's operating behavior, vehicle state information, and the driver's physiological response characteristics.
Fatigue detection based on physiological signals (EEG, ECG, and the like) is comparatively accurate, and because such signals differ little among healthy drivers they have good commonality; however, traditional physiological signal acquisition requires contact measurement, which brings considerable inconvenience and limitation to the practical application of driver fatigue detection.
The driver's operating behavior is affected not only by the fatigue state but also by personal habits, driving speed, road environment, and operating skill, so judging fatigue driving from operating behavior (such as steering wheel movements) requires accounting for many interference factors, which affects its accuracy.
The driver's fatigue state can also be estimated from vehicle running state information such as changes in the driving trajectory and lane departure, but the vehicle's running state is likewise related to many environmental factors such as vehicle characteristics and the road, and is strongly tied to the driver's experience and habits, so fatigue detection based on vehicle state information also involves many interference factors.
Methods based on the driver's physiological response characteristics infer the fatigue state from eye features, mouth movement features, and the like. Such information is considered an important indicator of fatigue: blink amplitude, blink frequency, average eye-closure time, yawning, and similar cues can be used directly for fatigue detection. However, because the habits and features of individual drivers differ, judging the driver's state from a single facial expression feature is not robust enough.
In short, detection schemes relying on a single index are limited by individual differences among drivers, mainly manifesting as low accuracy and susceptibility to deviation.
Therefore, how to introduce multi-sensor information fusion technology into driving fatigue detection, so as to improve the accuracy and real-time performance of driver fatigue detection, has become an urgent problem to solve.
Disclosure of Invention
To solve the above technical problems, the invention provides a fatigue driving monitoring method and system based on the fusion of facial recognition and behavior recognition, which dynamically fuses the facial recognition result with the behavior recognition result, can accurately judge the fatigue driving state, and achieves higher accuracy.
The technical scheme adopted by the invention is as follows:
a fatigue driving monitoring method based on fusion of facial recognition and behavior recognition comprises the following steps:
S01: detecting the position of the human body in the acquired behavior image with a first deep convolutional neural network;
S02: performing skeleton detection on the detected human body position with a second deep convolutional neural network to obtain the position of each part of the human body in the image and the corresponding confidence, while predicting the association vector field between the parts, representing the connection relations between the parts by the association vector field, and obtaining a human skeleton model from the obtained part positions and the association vector fields between the parts;
S03: comparing a skeleton model of a predefined fatigue driving state with the obtained skeleton model to obtain a behavior state recognition result p1(x);
S04: processing the collected facial images of the driver to obtain the driver's facial expression state characteristics, judging from these characteristics whether the driver is fatigued, and obtaining a facial state recognition result p2(x);
S05: fusing the facial state recognition result p2(x) with the behavior state recognition result p1(x) to obtain the final detection result p(x), namely: p(x) = w1·p1(x) + w2·p2(x);
where w1, w2 are the weights for feature fusion.
In a preferred technical solution, step S05 further includes dividing fatigue grades, the fatigue grades comprising stage I, mild fatigue; stage II, moderate fatigue; stage III, severe fatigue; when fatigue driving is determined, different alarm modes are adopted for the different fatigue grades.
In a preferred embodiment, during recognition w1, w2 are dynamically adjusted according to the values of p1 and p2, with the corresponding relation: w1 = p1*p1, w2 = p2*p2.
In a preferred embodiment, the step of detecting the position of the human body in the step S01 includes:
1) determining a plurality of candidate boxes in the image;
2) obtaining a multidimensional feature map from the first deep convolutional neural network;
3) obtaining a plurality of possible candidate boxes for each position of the image;
4) judging with a classifier whether the multidimensional features extracted from each candidate box belong to a specific class;
5) for candidate boxes belonging to a certain class, adjusting their positions with a regressor to obtain the target position that minimizes the loss function, i.e. the target box Rect(x, y, width, height), where x, y are the center-point coordinates, width is the width, and height is the height.
In a preferred technical solution, step S04 specifically includes:
S41: processing the acquired face image of the driver with a progressive calibration network to calibrate the face, the progressive calibration network comprising three network layers: the layer-1 network computes the face deflection angle and calibrates the face; the layer-2 network further computes the deflection angle and further calibrates the face; the layer-3 network computes the deflection angle, calibrates the face, and outputs the face classification, deflection angle, and bounding-box regression;
S42: locating the facial features, the facial features comprising the mouth, eyes, and head; separating the eye image and inputting it into a third deep convolutional neural network, which classifies the eye state and judges whether the eyes are closed; separating the mouth region image and inputting it into a fourth deep network, which classifies the mouth state and judges whether it is a yawning state; obtaining the head movement state by analyzing statistics of the head region trajectory, and judging whether the head is in a relatively static state;
S43: counting the mouth, eye, and head state characteristics over a period of time, cumulatively judging whether the eyes enter the eye-closed state, whether the mouth produces a yawning state, and whether the head is in a relatively static state, and obtaining the facial state recognition result p2(x).
The invention also discloses a fatigue driving monitoring system based on the fusion of facial recognition and behavior recognition, which comprises:
the human body position detection module is used for detecting the position of the human body in the acquired behavior image with a first deep convolutional neural network;
the human skeleton detection module is used for carrying out skeleton detection on the detected human body position by adopting a second deep convolutional neural network to obtain the position of each part of the human body in the image and the corresponding confidence coefficient, and simultaneously predicting the association vector field among the parts, representing the connection relation among the parts by the association vector field, and obtaining a human skeleton model according to the obtained position of each part and the association vector field among the parts;
the behavior state recognition module is used for comparing a skeleton model of a predefined fatigue driving state with the obtained skeleton model to obtain a behavior state recognition result p1(x);
Facial image processing and recognition module: processing the collected facial images of the driver to obtain the driver's facial expression state characteristics, judging from these characteristics whether the driver is fatigued, and obtaining a facial state recognition result p2(x);
And a fusion module: fusing the facial state recognition result p2(x) with the behavior state recognition result p1(x) to obtain the final detection result p(x), namely: p(x) = w1·p1(x) + w2·p2(x);
where w1, w2 are the weights for feature fusion.
In a preferred technical scheme, the system further comprises a judging and alarming module, used for dividing fatigue grades, the fatigue grades comprising stage I, mild fatigue; stage II, moderate fatigue; stage III, severe fatigue; when fatigue driving is determined, different alarm modes are adopted for the different fatigue grades.
In a preferred technical solution, in the fusion module, during recognition w1, w2 are dynamically adjusted according to the values of p1 and p2, with the corresponding relation: w1 = p1*p1, w2 = p2*p2.
In a preferred technical solution, the human body position detection step in the human body position detection module includes:
1) determining a plurality of candidate boxes in the image;
2) obtaining a multidimensional feature map from the first deep convolutional neural network;
3) obtaining a plurality of possible candidate boxes for each position of the image;
4) judging with a classifier whether the multidimensional features extracted from each candidate box belong to a specific class;
5) for candidate boxes belonging to a certain class, adjusting their positions with a regressor to obtain the target position that minimizes the loss function, i.e. the target box Rect(x, y, width, height), where x, y are the center-point coordinates, width is the width, and height is the height.
In a preferred technical solution, the specific processing steps of the facial image processing and recognition module include:
S41: processing the acquired face image of the driver with a progressive calibration network to calibrate the face, the progressive calibration network comprising three network layers: the layer-1 network computes the face deflection angle and calibrates the face; the layer-2 network further computes the deflection angle and further calibrates the face; the layer-3 network computes the deflection angle, calibrates the face, and outputs the face classification, deflection angle, and bounding-box regression;
S42: locating the facial features, the facial features comprising the mouth, eyes, and head; separating the eye image and inputting it into a third deep convolutional neural network, which classifies the eye state and judges whether the eyes are closed; separating the mouth region image and inputting it into a fourth deep network, which classifies the mouth state and judges whether it is a yawning state; obtaining the head movement state by analyzing statistics of the head region trajectory, and judging whether the head is in a relatively static state;
S43: counting the mouth, eye, and head state characteristics over a period of time, cumulatively judging whether the eyes enter the eye-closed state, whether the mouth produces a yawning state, and whether the head is in a relatively static state, and obtaining the facial state recognition result p2(x).
Compared with the prior art, the invention has the beneficial effects that:
1. According to the invention, the facial recognition result and the behavior recognition result are dynamically fused, so the fatigue driving state can be judged accurately and graded early-warning reminders can be issued in real time, giving higher accuracy.
2. The invention uses the state of a human skeleton model to represent the posture of the human body, so the driver's behavior state can be analyzed and predicted more accurately and the driver's behavior can be monitored.
Drawings
The invention is further described below with reference to the accompanying drawings and examples:
FIG. 1 is a flow chart of a fatigue driving monitoring method based on fusion of facial recognition and behavior recognition;
FIG. 2 is a schematic diagram showing the detection effect of the human body target according to the present invention;
FIG. 3 is a schematic diagram of the detection effect of the human skeleton model of the present invention;
fig. 4 is a schematic diagram of the rotating face detection of the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Examples
As shown in fig. 1, the fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition of the invention comprises the following steps:
the invention adopts a double-camera acquisition device to acquire an image S101, mainly comprises a face camera 1, acquires the image aiming at the face of a driver, and inputs the image to a face analysis module. The face is captured by the camera and input to the algorithm module for recognition and processing. The camera 1 adopts an MIPI digital camera, and the image resolution is 1920×1080. Supporting automatic exposure, automatic focusing and automatic white balance. The image output per second is not less than 30 frames.
The behavior camera 2 is mounted in front of the driver, angled downward by about 45 degrees and aimed at the driver's body; it collects behavior images of the driver and inputs them to the behavior analysis module. Camera 2 is an AHD analog camera with an image resolution of 1280×720; it supports automatic exposure, automatic focusing, and automatic white balance, and outputs no fewer than 30 frames per second.
The human body positioning detection module determines the position of the human body from the behavior image acquired by the behavior camera (S102). Based on deep learning technology, a lightweight convolutional neural network is designed and trained end to end in a single stage. The feature maps of different convolutional layers are detected separately, and the results are finally merged with non-maximum suppression to detect the human body, which greatly improves the real-time performance of the algorithm compared with the image-pyramid detection used in traditional algorithms. As shown in fig. 2, the human body is located and the target position Rect(x, y, width, height) is returned. The end-to-end deep-learning human body detection method comprises the following steps:
1) About 1000-2000 candidate boxes are determined in the image (using selective search).
2) The whole picture is input into the convolutional neural network to obtain 128-dimensional feature maps describing human body gradients, edges, and the like.
3) On the 256-channel feature map of size 51 × 39, 9 possible candidate windows are considered at each position of the image.
4) A classifier judges whether the 128-dimensional features extracted from each candidate box belong to a specific class.
5) For candidate boxes belonging to a certain class, a regressor adjusts the box position to obtain the target position Rect(x, y, width, height) that minimizes the loss function, where x, y are the center-point coordinates, width is the width, and height is the height of the object.
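As an illustration of the non-maximum suppression step used to merge detections from the different feature layers, the following Python sketch may help; the box format follows Rect(x, y, width, height) from step 5, while the function name and IoU threshold are illustrative assumptions rather than anything specified in this application:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over Rect(x, y, width, height) boxes.

    boxes:  (N, 4) array of center-x, center-y, width, height
    scores: (N,) array of classifier confidences
    Returns the indices of the boxes that are kept.
    """
    # Convert center/size format to corner coordinates.
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)

    order = scores.argsort()[::-1]   # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much.
        order = order[1:][iou <= iou_threshold]
    return keep
```

The candidates surviving this merge are the human body detections passed on to skeleton detection.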
A human skeleton is then obtained (S103). Skeleton detection on the detected human body yields the skeleton state, which represents the person's current action and behavior; as long as the state of the skeleton can be detected, the current behavior can be inferred, so the driver's driving behavior can be monitored through the skeleton detection model.
The human skeleton model uses a deep convolutional neural network to process the input image, outputs the positions of the body parts in the picture and the corresponding confidences, and predicts the association vector fields between parts, which represent the connection relations between them; finally, a greedy algorithm infers over the output part positions and association vector fields to obtain the human skeleton model.
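The application does not spell out the inference step, but the interplay of part positions and association vector fields can be sketched as follows: a candidate connection between two detected parts is scored by sampling the association field along the segment joining them, and high-scoring connections are then greedily assembled into a skeleton. All names and shapes below are assumptions, and the segment is assumed to lie inside the field:

```python
import numpy as np

def limb_score(paf_x, paf_y, part_a, part_b, num_samples=10):
    """Score the connection between two body-part candidates.

    paf_x, paf_y: 2-D association vector field (x- and y-component maps)
    part_a, part_b: (x, y) pixel coordinates of the two part candidates
    Returns the mean alignment between the field and the limb direction.
    """
    a = np.asarray(part_a, dtype=float)
    b = np.asarray(part_b, dtype=float)
    direction = b - a
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return 0.0
    direction /= norm

    # Sample the vector field at evenly spaced points along the segment.
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = (a + t * (b - a)).astype(int)
        field = np.array([paf_x[y, x], paf_y[y, x]])
        score += float(field @ direction)   # alignment with limb direction
    return score / num_samples
```

Connections with the highest scores are then matched greedily, part pair by part pair, into a full skeleton.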
The behavior state recognition result is then obtained from the classification result (S104). As shown in fig. 3, the obtained skeleton model is compared with the skeleton models of driver fatigue states defined in advance through training, and the distance between the vectors is computed with a support vector machine algorithm to obtain the similarity, i.e. the behavior state recognition result p1.
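A minimal sketch of this comparison, assuming each skeleton model is flattened into a fixed-length vector of joint coordinates and that labeled examples of fatigue and normal postures are available (the file names and feature encoding are placeholders, not from this application):

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training set: flattened skeleton vectors labeled
# 1 = fatigue posture, 0 = normal posture (hypothetical files).
X_train = np.load("skeleton_features.npy")   # shape (n_samples, n_joints * 2)
y_train = np.load("skeleton_labels.npy")

clf = SVC(kernel="rbf", probability=True)
clf.fit(X_train, y_train)

def behavior_state(skeleton_vector):
    """Return p1: probability that the current skeleton matches a fatigue posture."""
    return float(clf.predict_proba(skeleton_vector.reshape(1, -1))[0, 1])
```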
The face and the facial features are located from the image collected by the face camera (S105). From the collected facial image, the face and its features can be located and expression characteristics extracted to judge fatigue driving, for example with the method described in patent application 201310731567.1; the present application, however, proposes a new solution.
This solution adopts a progressive calibration network detector. Given an image, all face candidates are obtained according to the sliding-window and image-pyramid principles, and each candidate window is passed through the detector stage by stage. In each stage of the progressive calibration network, the detector rejects most candidates with low face confidence, performs bounding-box regression on the remaining face candidates, and calibrates their in-plane rotation. After each stage, non-maximum suppression (NMS) is used to merge highly overlapping candidates.
Layer 1 network: first makes a coarse judgment of the deflection angle, calibrates the face, and reduces the range of face deflection angles.
Layer 2 network: in the same way, further calibrates the face and further reduces the deflection angle range.
Layer 3 network: computes the deflection angle precisely; based on the calibration of the first two steps, the third-layer network directly outputs the face classification, deflection angle, and bounding-box regression. The face angle is calibrated progressively and the deflection angle is reduced step by step, so faces rotated at any angle can be handled; the result is shown in fig. 4.
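The coarse-to-fine calibration can be sketched as below. The three stage callables stand in for the three network layers and are assumptions; only the idea that each stage estimates part of the rotation and undoes it comes from the text:

```python
import cv2

def rotate(image, angle_deg):
    """Rotate the image about its center by angle_deg."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

def calibrate_face(face_img, stage1, stage2, stage3):
    """Progressively calibrate a rotated face crop.

    stage1/stage2/stage3 are placeholder callables for the three
    network layers; each returns an estimated deflection angle in
    degrees (coarse first, fine last).
    Returns the calibrated image and the total estimated rotation.
    """
    total = 0.0
    for stage in (stage1, stage2, stage3):      # coarse -> fine
        angle = stage(face_img)                 # e.g. 180, then +/-90, then a fine angle
        face_img = rotate(face_img, angle)      # undo the estimated rotation
        total += angle
    return face_img, total
```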
The mouth, eyes, and head are then modeled and classified (S106). The mouth, eye, and head regions are located; the eye image is separated and input to a CNN, which classifies the eye state and judges whether the eyes are closed. The mouth region image is separated and input to a CNN, which classifies the mouth state, for example whether it is yawning. The head movement state is obtained by analyzing statistics of the head region trajectory, judging whether the driver's head is in a relatively static state.
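As a hedged sketch of such an eye-state classifier (the architecture, the 24×24 grayscale input, and the class order are our assumptions; the text only states that a CNN classifies the eye region), a PyTorch version might look like:

```python
import torch
import torch.nn as nn

class EyeStateCNN(nn.Module):
    """Tiny binary classifier for eye crops: class 0 = open, 1 = closed (assumed)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 24x24 -> 12x12
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12x12 -> 6x6
        )
        self.classifier = nn.Linear(32 * 6 * 6, 2)

    def forward(self, x):                 # x: (batch, 1, 24, 24) grayscale eye crops
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Usage: probability that the eye is closed for one 24x24 grayscale crop.
model = EyeStateCNN().eval()
crop = torch.rand(1, 1, 24, 24)           # placeholder eye image
p_closed = torch.softmax(model(crop), dim=1)[0, 1].item()
```

The mouth-state (yawning) classifier would follow the same pattern on mouth crops.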
The facial state recognition result is obtained from the classification results (S107): the mouth, eye, and head state characteristics are counted over a period of time, and whether the eyes enter the eye-closed state, whether the mouth produces a yawning state, and whether the head is in a relatively static state are judged cumulatively to obtain the facial state recognition result p2.
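This cumulative judgment over a time window resembles a PERCLOS-style statistic and can be sketched as follows; the window length and the mixing weights are illustrative assumptions, not values given in this application:

```python
from collections import deque

class FacialStateAccumulator:
    """Accumulate per-frame eye/mouth/head states over a sliding window."""
    def __init__(self, window=150):              # e.g. 150 frames = 5 s at 30 fps
        self.eye_closed = deque(maxlen=window)
        self.yawning = deque(maxlen=window)
        self.head_static = deque(maxlen=window)

    def update(self, eye_closed, yawning, head_static):
        """Record the boolean per-frame classifications."""
        self.eye_closed.append(eye_closed)
        self.yawning.append(yawning)
        self.head_static.append(head_static)

    def p2(self):
        """Facial state recognition result p2 as a weighted mix of the
        three cumulative ratios (weights are illustrative)."""
        if not self.eye_closed:
            return 0.0
        r_eye = sum(self.eye_closed) / len(self.eye_closed)      # PERCLOS-like ratio
        r_yawn = sum(self.yawning) / len(self.yawning)
        r_head = sum(self.head_static) / len(self.head_static)
        return 0.5 * r_eye + 0.3 * r_yawn + 0.2 * r_head
```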
The fusion judgment strategy (S108) fuses the facial state recognition result and the behavior state recognition result to obtain the final detection result, namely:
p(x) = w1·p1(x) + w2·p2(x)
where w1, w2 are the feature fusion weights. The initial weights are w1 = 0.5, w2 = 0.5; during recognition, w1, w2 are dynamically adjusted according to the values of p1 and p2 by the corresponding relation: w1 = p1*p1, w2 = p2*p2.
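Taken together, the fusion rule as stated (initial weights 0.5, then w1 = p1*p1, w2 = p2*p2) is direct to implement:

```python
def fuse(p1, p2, first_frame=False):
    """Fuse behavior (p1) and facial (p2) recognition results into p(x)."""
    if first_frame:
        w1, w2 = 0.5, 0.5          # initial weights per the scheme
    else:
        w1, w2 = p1 * p1, p2 * p2  # dynamic weights from the stated relation
    return w1 * p1 + w2 * p2
```

Because each weight grows with its own probability, the more confident recognition channel dominates the fused score.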
After fatigue driving is determined, the fatigue grade is divided, and different forms of alarm are adopted for the different grades. The fatigue grades are stage I, mild fatigue; stage II, moderate fatigue; stage III, severe fatigue. The corresponding alarm behavior is as follows: in stage I the alarm prompt tone is gentle; in stage II the prompt tone is rapid and high in volume; in stage III the prompt tone is rapid and high in volume, and a vibrating seat can additionally be activated.
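A mapping from the fused score to the three alarm levels might look like the following; the thresholds are purely illustrative, since the application does not specify them:

```python
def fatigue_grade(p):
    """Map the fused detection result p(x) to an alarm level (thresholds assumed)."""
    if p < 0.3:
        return None                          # no fatigue detected
    if p < 0.5:
        return "I"    # mild fatigue: gentle prompt tone
    if p < 0.7:
        return "II"   # moderate fatigue: rapid, high-volume prompt tone
    return "III"      # severe fatigue: rapid, loud prompt tone + seat vibration
```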
It should be understood that the above-described embodiments merely illustrate or explain the principles of the present invention and in no way limit it. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention shall be included in the scope of the present invention. Furthermore, the appended claims are intended to cover all such changes and modifications as fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.

Claims (8)

1. A fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition, characterized by comprising the following steps:
S01: detecting the position of the human body in the acquired behavior image with a first deep convolutional neural network;
S02: performing skeleton detection on the detected human body position with a second deep convolutional neural network to obtain the position of each part of the human body in the image and the corresponding confidence, while predicting the association vector field between the parts, representing the connection relations between the parts by the association vector field, and obtaining a human skeleton model from the obtained part positions and the association vector fields between the parts;
S03: comparing a skeleton model of a predefined fatigue driving state with the obtained skeleton model to obtain a behavior state recognition result p1(x);
S04: processing the collected facial images of the driver to obtain facial expression state characteristics of the driver, judging whether to fatigue driving according to the facial expression state characteristics, and obtaining a facial state recognition result p 2 (x);
S05: result p of face status recognition 2 (x) Behavior state recognition result p 1 (x) Fusion is carried out to obtain a final detection result p (x), namely: p (x) =w 1 ·p 1 (x)+w 2 ·p 2 (x);
where w1, w2 are the weights for feature fusion;
step S05 further comprises dividing fatigue grades, the fatigue grades comprising stage I, mild fatigue; stage II, moderate fatigue; stage III, severe fatigue; and, when fatigue driving is determined, adopting different alarm modes for the different fatigue grades.
2. The fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition according to claim 1, wherein during recognition w1, w2 are dynamically adjusted according to the values of p1 and p2, with the corresponding relation: w1 = p1*p1, w2 = p2*p2.
3. The fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition according to claim 1, wherein the human body position detection step in step S01 includes:
1) determining a plurality of candidate boxes in the image;
2) obtaining a multidimensional feature map from the first deep convolutional neural network;
3) obtaining a plurality of possible candidate boxes for each position of the image;
4) judging with a classifier whether the multidimensional features extracted from each candidate box belong to a specific class;
5) for candidate boxes belonging to a certain class, adjusting their positions with a regressor to obtain the target position that minimizes the loss function, i.e. the target box Rect(x, y, width, height), where x, y are the center-point coordinates, width is the width, and height is the height.
4. The fatigue driving monitoring method based on the fusion of facial recognition and behavior recognition according to claim 1, wherein the specific step of step S04 includes:
S41: processing the acquired face image of the driver with a progressive calibration network to calibrate the face, the progressive calibration network comprising three network layers: the layer-1 network computes the face deflection angle and calibrates the face; the layer-2 network further computes the deflection angle and further calibrates the face; the layer-3 network computes the deflection angle, calibrates the face, and outputs the face classification, deflection angle, and bounding-box regression;
S42: locating the facial features, the facial features comprising the mouth, eyes, and head; separating the eye image and inputting it into a third deep convolutional neural network, which classifies the eye state and judges whether the eyes are closed; separating the mouth region image and inputting it into a fourth deep network, which classifies the mouth state and judges whether it is a yawning state; obtaining the head movement state by analyzing statistics of the head region trajectory, and judging whether the head is in a relatively static state;
S43: counting the mouth, eye, and head state characteristics over a period of time, cumulatively judging whether the eyes enter the eye-closed state, whether the mouth produces a yawning state, and whether the head is in a relatively static state, and obtaining the facial state recognition result p2(x).
5. A fatigue driving monitoring system based on fusion of facial recognition and behavior recognition, comprising:
the human body position detection module is used for detecting the position of the human body in the acquired behavior image with a first deep convolutional neural network;
the human skeleton detection module is used for carrying out skeleton detection on the detected human body position by adopting a second deep convolutional neural network to obtain the position of each part of the human body in the image and the corresponding confidence coefficient, and simultaneously predicting the association vector field among the parts, representing the connection relation among the parts by the association vector field, and obtaining a human skeleton model according to the obtained position of each part and the association vector field among the parts;
the behavior state recognition module is used for comparing a skeleton model of a predefined fatigue driving state with the obtained skeleton model to obtain a behavior state recognition result p1(x);
Facial image processing and recognition module: processing the collected facial images of the driver to obtain the driver's facial expression state characteristics, judging from these characteristics whether the driver is fatigued, and obtaining a facial state recognition result p2(x);
And a fusion module: fusing the facial state recognition result p2(x) with the behavior state recognition result p1(x) to obtain the final detection result p(x), namely: p(x) = w1·p1(x) + w2·p2(x);
where w1, w2 are the weights for feature fusion;
the fatigue grade detection device also comprises a judging and alarming module which is used for dividing the fatigue grade, wherein the fatigue grade comprises a stage I and mild fatigue; stage II, moderate fatigue; stage III, severe fatigue; and when the fatigue driving is judged, adopting different alarm modes for alarming the fatigue of different grades.
6. The fatigue driving monitoring system based on the fusion of facial recognition and behavior recognition according to claim 5, wherein in the fusion module, during recognition w1, w2 are dynamically adjusted according to the values of p1 and p2, with the corresponding relation: w1 = p1*p1, w2 = p2*p2.
7. The fatigue driving monitoring system based on fusion of facial recognition and behavior recognition according to claim 5, wherein the human body position detection step in the human body position detection module comprises:
1) determining a plurality of candidate boxes in the image;
2) obtaining a multidimensional feature map from the first deep convolutional neural network;
3) obtaining a plurality of possible candidate boxes for each position of the image;
4) judging with a classifier whether the multidimensional features extracted from each candidate box belong to a specific class;
5) for candidate boxes belonging to a certain class, adjusting their positions with a regressor to obtain the target position that minimizes the loss function, i.e. the target box Rect(x, y, width, height), where x, y are the center-point coordinates, width is the width, and height is the height.
8. The fatigue driving monitoring system based on the fusion of facial recognition and behavior recognition according to claim 5, wherein the specific processing steps of the facial image processing and recognition module include:
S41: processing the acquired face image of the driver with a progressive calibration network to calibrate the face, the progressive calibration network comprising three network layers: the layer-1 network computes the face deflection angle and calibrates the face; the layer-2 network further computes the deflection angle and further calibrates the face; the layer-3 network computes the deflection angle, calibrates the face, and outputs the face classification, deflection angle, and bounding-box regression;
S42: locating the facial features, the facial features comprising the mouth, eyes, and head; separating the eye image and inputting it into a third deep convolutional neural network, which classifies the eye state and judges whether the eyes are closed; separating the mouth region image and inputting it into a fourth deep network, which classifies the mouth state and judges whether it is a yawning state; obtaining the head movement state by analyzing statistics of the head region trajectory, and judging whether the head is in a relatively static state;
S43: counting the mouth, eye, and head state characteristics over a period of time, cumulatively judging whether the eyes enter the eye-closed state, whether the mouth produces a yawning state, and whether the head is in a relatively static state, and obtaining the facial state recognition result p2(x).
CN201910236282.8A 2019-03-27 2019-03-27 Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion Active CN110096957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910236282.8A CN110096957B (en) 2019-03-27 2019-03-27 Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910236282.8A CN110096957B (en) 2019-03-27 2019-03-27 Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion

Publications (2)

Publication Number Publication Date
CN110096957A CN110096957A (en) 2019-08-06
CN110096957B true CN110096957B (en) 2023-08-08

Family

ID=67443092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910236282.8A Active CN110096957B (en) 2019-03-27 2019-03-27 Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion

Country Status (1)

Country Link
CN (1) CN110096957B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852190B (en) * 2019-10-23 2022-05-20 华中科技大学 Driving behavior recognition method and system integrating target detection and gesture recognition
CN110781872A (en) * 2019-12-31 2020-02-11 南斗六星***集成有限公司 Driver fatigue grade recognition system with bimodal feature fusion
CN111695510A (en) * 2020-06-12 2020-09-22 浙江工业大学 Image-based fatigue detection method for computer operator
CN111882827A (en) * 2020-07-27 2020-11-03 复旦大学 Fatigue driving monitoring method, system and device and readable storage medium
CN111616718B (en) * 2020-07-30 2020-11-10 苏州清研微视电子科技有限公司 Method and system for detecting fatigue state of driver based on attitude characteristics
CN113524182B (en) * 2021-07-13 2023-05-16 东北石油大学 Device and method for intelligently adjusting distance between person and screen
CN113971216B (en) * 2021-10-22 2023-02-03 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and memory
CN115641570B (en) * 2022-12-26 2023-06-23 中国汽车技术研究中心有限公司 Driving behavior determination method, driving behavior determination device, electronic equipment and storage medium
CN117152670A (en) * 2023-10-31 2023-12-01 江西拓世智能科技股份有限公司 Behavior recognition method and system based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011065952A1 (en) * 2009-11-30 2011-06-03 Hewlett-Packard Development Company, L.P. Face recognition apparatus and methods
CN103714660A (en) * 2013-12-26 2014-04-09 苏州清研微视电子科技有限公司 System for achieving fatigue driving judgment on basis of image processing and fusion between heart rate characteristic and expression characteristic
CN106218405A (en) * 2016-08-12 2016-12-14 深圳市元征科技股份有限公司 Fatigue driving monitoring method and cloud server
WO2017193272A1 (en) * 2016-05-10 2017-11-16 深圳市赛亿科技开发有限公司 Vehicle-mounted fatigue pre-warning system based on human face recognition and pre-warning method
CN108038453A (en) * 2017-12-15 2018-05-15 罗派智能控制技术(上海)有限公司 A kind of driver's state-detection and identifying system based on RGBD
CN108089708A (en) * 2017-12-22 2018-05-29 西安交通大学 A kind of hanging gesture interaction method for improving gesture fatigue
CN108216252A (en) * 2017-12-29 2018-06-29 中车工业研究院有限公司 A kind of subway driver vehicle carried driving behavior analysis method, car-mounted terminal and system
CN108446678A (en) * 2018-05-07 2018-08-24 同济大学 A kind of dangerous driving behavior recognition methods based on skeleton character
CN109145868A (en) * 2018-09-11 2019-01-04 广州杰赛科技股份有限公司 A kind of Activity recognition method and apparatus assisting running training


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"疲劳驾驶检测方法研究";葛宇等;《科学技术创新》;20180505;全文 *

Also Published As

Publication number Publication date
CN110096957A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN110096957B (en) Fatigue driving monitoring method and system based on facial recognition and behavior recognition fusion
Saleh et al. Driving behavior classification based on sensor data fusion using LSTM recurrent neural networks
EP3223196B1 (en) A method and a device for generating a confidence measure for an estimation derived from images captured by a camera mounted on a vehicle
KR101607224B1 (en) Dynamic object classification
McCall et al. Lane change intent analysis using robust operators and sparse bayesian learning
JP6144656B2 (en) System and method for warning a driver that visual recognition of a pedestrian may be difficult
CN112700470A (en) Target detection and track extraction method based on traffic video stream
CN106687037A (en) Device, method, and computer program for detecting momentary sleep
JP2007122362A (en) State estimation method using neural network and state estimation apparatus using neural network
CN110516622A (en) A kind of gender of occupant, age and emotional intelligence recognition methods and system
CN111274862A (en) Device and method for generating a label object of a surroundings of a vehicle
CN112949345A (en) Fatigue monitoring method and system, automobile data recorder and intelligent cabin
Al Redhaei et al. Realtime driver drowsiness detection using machine learning
CN108520528A (en) Based on the mobile vehicle tracking for improving differential threshold and displacement field match model
CN105046285B (en) A kind of abnormal behaviour discrimination method based on kinematic constraint
CN116665155B (en) Underground trackless man-vehicle getting-off early warning safety management system based on image processing technology
CN117115752A (en) Expressway video monitoring method and system
US20230227044A1 (en) Apparatus, method, and computer program for monitoring driver
CN112926364A (en) Head posture recognition method and system, automobile data recorder and intelligent cabin
US20220301282A1 (en) Method for Analyzing Image Information Using Assigned Scalar Values
Brisimi et al. Sensing and classifying roadway obstacles: The street bump anomaly detection and decision support system
US20220284718A1 (en) Driving analysis device and driving analysis method
JP2020177557A (en) Information processor, information processing method, and program
WO2023108364A1 (en) Method and apparatus for detecting driver state, and storage medium
KR20230079606A (en) Deep Learning-based Road Condition Monitoring System and Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant