CN116630943A - Method, device, equipment and medium for constructing fatigue detection model of driver

Method, device, equipment and medium for constructing fatigue detection model of driver

Info

Publication number
CN116630943A
Authority
CN
China
Prior art keywords
model
residual error
depth residual
driver
network model
Prior art date
Legal status
Pending
Application number
CN202310554606.9A
Other languages
Chinese (zh)
Inventor
黄莉
冉光伟
刘棨
舒选才
周健珊
邓晨
张莹
刘俊峰
Current Assignee
Xinghe Zhilian Automobile Technology Co Ltd
Original Assignee
Xinghe Zhilian Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xinghe Zhilian Automobile Technology Co Ltd
Priority to CN202310554606.9A
Publication of CN116630943A
Legal status: Pending

Classifications

    • G06V 20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness (scenes inside a vehicle)
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/0499 Feedforward networks
    • G06N 3/08 Learning methods
    • G06V 10/764 Image or video recognition using classification, e.g. of video objects
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition using neural networks
    • G06V 40/168 Human faces: feature extraction; face representation
    • G06V 40/193 Eye characteristics: preprocessing; feature extraction
    • Y02T 10/40 Engine management systems

Abstract

The application discloses a method, a device, equipment and a medium for constructing a driver fatigue detection model. Facial image data of drivers in different fatigue states are collected as a data set; the driver's feature information is extracted from the data set using a pre-built feature extraction model; a Transformer model is used as the feature extractor of a deep residual network model, and cross-layer connections are added to the residual blocks of the deep residual network model to obtain an improved deep residual network model; the extracted feature information is input into the improved deep residual network model for training to obtain a fatigue detection model. A fatigue detection model is thus constructed that can accurately detect drivers' varied and complex manifestations of fatigue.

Description

Method, device, equipment and medium for constructing fatigue detection model of driver
Technical Field
The application relates to the technical field of vehicle control, in particular to a method, a device, equipment and a medium for constructing a driver fatigue detection model.
Background
In the field of traffic safety, fatigue driving is one of the major causes of traffic accidents: statistics attribute 25-30% of traffic accidents to fatigue driving, and about 40% of major traffic accidents; roughly 70% of drivers on highways have experienced fatigue driving. Driver fatigue detection is therefore very important for driving safety.
However, existing fatigue detection models have low detection accuracy and find it difficult to adapt to and accurately detect drivers' varied and complex manifestations of fatigue.
Disclosure of Invention
To solve the above problems, the application provides a method, a device, equipment and a medium for constructing a driver fatigue detection model that can accurately detect drivers' varied and complex manifestations of fatigue.
An embodiment of the application provides a method for constructing a driver fatigue detection model, comprising the following steps:
collecting facial image data of drivers in different fatigue states as a data set;
extracting the driver's feature information from the data set using a pre-built feature extraction model;
using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
Preferably, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
As an improvement of this scheme, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
As a preferred solution, using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
Preferably, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
Preferably, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
As a preferred embodiment, the method further comprises:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
An embodiment of the application also provides a device for constructing a driver fatigue detection model, comprising:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
Preferably, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
As an improvement of this scheme, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
As a preferred solution, using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
As a preferred solution, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
Preferably, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
Preferably, the device further comprises a preprocessing module for:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
An embodiment of the application also provides a terminal device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for constructing a driver fatigue detection model according to any of the above embodiments.
An embodiment of the application also provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for constructing a driver fatigue detection model according to any of the above embodiments.
The application provides a method, a device, equipment and a medium for constructing a driver fatigue detection model: facial image data of drivers in different fatigue states are collected as a data set; the driver's feature information is extracted from the data set using a pre-built feature extraction model; a Transformer model is used as the feature extractor of a deep residual network model, and cross-layer connections are added to the residual blocks of the deep residual network model to obtain an improved deep residual network model; the extracted feature information is input into the improved deep residual network model for training to obtain a fatigue detection model. A fatigue detection model is thus constructed that can accurately detect drivers' varied and complex manifestations of fatigue.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a fatigue detection model of a driver according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a driver fatigue detection model building device according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Referring to fig. 1, which is a schematic flow chart of a method for constructing a driver fatigue detection model according to an embodiment of the present application, the method includes steps S1 to S4:
S1, collecting facial image data of drivers in different fatigue states as a data set;
S2, extracting the driver's feature information from the data set using a pre-built feature extraction model;
S3, using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
S4, inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
When this embodiment is implemented, a data set is first needed for model training, so a large amount of facial image data and video data of drivers in different fatigue states is collected as the data set. The data set can be obtained from public databases or collected from actual driving scenarios.
Feature information for model training is extracted by the pre-built feature extraction model: the collected data set is input into the pre-built feature extraction model, which extracts the feature information used to analyze the fatigue state.
aiming at the problem that the traditional ResNet model uses a convolution layer as a feature extractor, only local features can be learned, and the feature extraction capability of the model is not strong, the application adopts a transform model as the feature extractor of the depth residual error network model, so that global features can be better learned, and the feature extraction capability of the model is improved.
Cross-layer connections are added to the residual blocks of the deep residual network model to obtain the improved deep residual network model. Fusing the Transformer model with the ResNet model strengthens the network's nonlinear fitting capability, avoiding the traditional ResNet model's shortcoming of insufficient nonlinear fitting capacity, which cannot cope with drivers' varied forms of fatigue.
The extracted feature information is input into the improved deep residual network model for training to obtain the fatigue detection model, and the trained model is then used to judge the fatigue state from the driver's facial features monitored in real time.
In this embodiment, using a Transformer as the feature extractor lets the network learn global features, improving its expressive capacity and performance, while the improved ResNet block, which fuses a Transformer block with a ResNet block, strengthens the network's nonlinear fitting capability. The application thus constructs a fatigue detection model that can accurately detect drivers' varied and complex manifestations of fatigue.
In yet another embodiment provided by the present application, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
In this embodiment, the extracted feature information may include multiple types of features, such as facial features, eye movements, head features and the degree of eye closure, all of which are used for model construction.
Specifically, the facial features, eye movements and head features can be identified by a pre-trained convolutional neural network, while the face recognition algorithm detects facial key points from which the degree of eye closure is calculated.
Extracting multiple types of feature information for model training widens the model's detection range, so that the final model can, for example, perform fatigue detection from eye and mouth features alone, which also gives the model extensibility.
In a further embodiment provided by the application, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
In implementations of this embodiment, the pre-trained convolutional neural network model used to extract facial features, eye movements and head features may be a VGG-16 network architecture or a deep residual network model.
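As a minimal sketch only, not the patent's disclosed implementation, the following Python snippet shows how a pre-trained VGG-16 backbone could serve as such a feature extractor; the torchvision weight enum and the file name driver_frame.jpg are assumptions of this example:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Load a VGG-16 pre-trained on ImageNet and keep only its convolutional
    # backbone as a frozen feature extractor (the classifier head is dropped).
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    backbone = vgg.features.eval()

    preprocess = T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # driver_frame.jpg is a hypothetical frame from the collected data set.
    img = Image.open("driver_frame.jpg").convert("RGB")
    with torch.no_grad():
        feat = backbone(preprocess(img).unsqueeze(0))  # feature map of shape (1, 512, 7, 7)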
Facial key points are detected by the face recognition algorithm, and the Dlib library is one model that can be used to calculate the degree of eye closure.
This embodiment presents two convolutional neural network models for extracting facial features, eye movements and head features; in other embodiments, other models may be used to extract them.
It should also be noted that this embodiment provides one face recognition algorithm for extracting the degree of eye closure; in other embodiments, other algorithms may be used to calculate it.
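One common way to quantify eye closure with Dlib, shown here as a hedged sketch rather than the patent's exact procedure, is the eye aspect ratio (EAR) computed from the standard 68-point landmark model; the landmark file path and input frame name are assumptions of this example:

    import cv2
    import dlib
    from scipy.spatial import distance

    detector = dlib.get_frontal_face_detector()
    # Standard 68-point landmark model distributed with dlib examples (assumed path).
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def eye_aspect_ratio(pts):
        # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); it approaches 0 as the eye closes.
        a = distance.euclidean(pts[1], pts[5])
        b = distance.euclidean(pts[2], pts[4])
        c = distance.euclidean(pts[0], pts[3])
        return (a + b) / (2.0 * c)

    frame = cv2.imread("driver_frame.jpg")  # hypothetical input frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # Landmarks 36-41 and 42-47 outline the left and right eyes.
        left = [(shape.part(i).x, shape.part(i).y) for i in range(36, 42)]
        right = [(shape.part(i).x, shape.part(i).y) for i in range(42, 48)]
        ear = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0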
In yet another embodiment of the present application, step S3 specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
In the implementation of this embodiment, the application uses an improved ResNet model for driver fatigue detection. In the existing ResNet model, a deep residual neural network, each ResNet block consists of two convolution layers and a residual connection, where the residual connection adds the output of the previous layer directly to the input of the current layer.
A Transformer is a self-attention network for processing sequence data. Unlike an RNN, a Transformer does not need to carry history information step by step during computation, so it can compute in parallel and is faster. Replacing the second convolution layer in the ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, lets the network learn global features better and learn identity mappings more easily, improving its expressive capacity and performance.
Introducing cross-layer connections (i.e. residual connections) alleviates the vanishing-gradient and exploding-gradient problems in deep networks. The cross-layer connection also adds the output of the previous layer to the input of the current layer, preserving the depth and information flow of the network.
To obtain a better representation, several improved ResNet blocks are used to build the whole model, with a global average pooling layer and a fully connected layer added on top to produce the classification result. The output of the last improved ResNet block is globally average-pooled and then taken as the input of the fully connected layer. The prediction result is finally computed using the Sigmoid function as the activation function.
Using the global average pooling layer and the fully connected layer to obtain the final classification result, together with the Sigmoid activation function, gives the algorithm better interpretability and stability.
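A minimal PyTorch sketch of such an improved ResNet block follows, under the assumption that the single-layer Transformer is realised with nn.TransformerEncoderLayer (which bundles multi-head attention and a feedforward network, together with its own internal residual connections) and that the convolutional feature map is flattened into a token sequence; the channel count and head number are illustrative choices, not values from the patent:

    import torch
    import torch.nn as nn

    class ImprovedResNetBlock(nn.Module):
        """ResNet block whose second convolution is replaced by a single-layer
        Transformer, with a cross-layer connection from input to output."""
        def __init__(self, channels: int, num_heads: int = 4):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            # Single Transformer encoder layer standing in for FFN(MHA(.)).
            self.transformer = nn.TransformerEncoderLayer(
                d_model=channels, nhead=num_heads,
                dim_feedforward=channels * 4, batch_first=True)

        def forward(self, x):
            f = self.relu(self.bn1(self.conv1(x)))   # F(x): first convolution
            b, c, h, w = f.shape
            seq = f.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
            t = self.transformer(seq)                # attention over all positions
            t = t.transpose(1, 2).reshape(b, c, h, w)
            return x + t                             # cross-layer connection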
In a further embodiment provided by the present application, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
In the implementation of this embodiment, the driver fatigue detection task is a binary classification problem requiring the training of a binary classifier f(x): R^d → {0, 1}, where x ∈ R^d denotes the input feature vector and {0, 1} are the class labels.
Considering the structure of a single ResNet block, with input x and output y, the calculation formula of the ResNet block is specifically: y = x + F(x);
where F(x) denotes the convolution operations within the block, i.e. the output of input x after a series of convolution transformations. This formula shows that the residual block implements an identity mapping by adding a cross-layer connection.
The method provided by the application uses a Transformer as the feature extractor of the ResNet model and improves the residual block. The Transformer network is described next.
Consider a single-layer Transformer model with input x and output y. The input information is first encoded by a multi-head attention mechanism and then fed into a feedforward neural network for a nonlinear transformation. The final output is obtained by adding a residual connection to the input: y = x + FFN(MHA(x));
where MHA(x) denotes the encoding of the input information under the multi-head attention mechanism, and FFN(·) denotes the computation of the feedforward neural network, i.e. a ReLU activation between two fully connected layers. This formula shows that the block implements an identity mapping by adding a cross-layer connection.
The calculation formula of the improved ResNet block is y_i = x_i + FFN(MHA(F(x_i)));
where F(x_i) denotes the feature vector after the first convolution layer, MHA(·) denotes the multi-head attention mechanism, and FFN(·) denotes the feedforward neural network. This formula shows that the residual block implements an identity mapping by adding a cross-layer connection and a Transformer model.
The optimization objective of the entire improved ResNet model is the same as that of an ordinary ResNet model, i.e. minimizing the cross-entropy loss function:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters, f(x_i) denotes the model prediction, ||·||_2 denotes the L2 norm, λ is the regularization parameter, W denotes the weight matrix of the fully connected layer, and N denotes the number of samples.
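As a sketch of this objective in code, assuming PyTorch, a hypothetical network composed of the ImprovedResNetBlock defined above, and an existing DataLoader named train_loader, the λ||W||² term can be approximated by applying weight decay to the fully connected weights only; all hyperparameter values here are assumptions:

    import torch
    import torch.nn as nn

    # Hypothetical network: stacked improved blocks plus the pooling/FC head.
    model = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3, padding=1),
        ImprovedResNetBlock(64),   # from the sketch above
        ImprovedResNetBlock(64),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

    criterion = nn.BCELoss()       # cross-entropy loss for sigmoid outputs

    # Apply the L2 penalty lambda * ||W||_2^2 only to the fully connected
    # weight matrices, mirroring the W term above; lambda = 1e-4 is assumed.
    fc_weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    fc_ids = {id(w) for w in fc_weights}
    rest = [p for p in model.parameters() if id(p) not in fc_ids]
    optimizer = torch.optim.Adam(
        [{"params": fc_weights, "weight_decay": 1e-4},
         {"params": rest, "weight_decay": 0.0}], lr=1e-3)

    for x, y in train_loader:      # train_loader is assumed to yield
        pred = model(x).squeeze(1) # (grayscale image batch, 0/1 fatigue labels)
        loss = criterion(pred, y.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()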
In yet another embodiment of the present application, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
When this embodiment is implemented, the calculation formula of the whole improved deep residual network model can be expressed as:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector, F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks, AvgPool(·) denotes the global average pooling layer, W_1 and W_2 denote the weight matrices of the two fully connected layers, and b_2 denotes the bias vector. The final output is activated by the Sigmoid function σ(·) to obtain the classification result.
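Read as code, and assuming PyTorch with an arbitrarily chosen hidden width, this classification head could be sketched as follows:

    import torch
    import torch.nn as nn

    class ClassificationHead(nn.Module):
        """Implements y = sigmoid(W2 * ReLU(W1 * AvgPool(F(x))) + b2)."""
        def __init__(self, channels: int, hidden: int = 128):  # hidden width is an assumed choice
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)     # global average pooling
            self.fc1 = nn.Linear(channels, hidden)  # weight matrix W1
            self.fc2 = nn.Linear(hidden, 1)         # weight matrix W2 and bias b2

        def forward(self, feat):
            z = self.pool(feat).flatten(1)          # AvgPool(F(x))
            return torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))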
In yet another embodiment provided by the present application, the method further comprises:
and performing scaling, clipping and graying operations on the data in the acquired data set.
When the embodiment is specifically implemented, the data in the data set is subjected to scaling, cutting and graying operations, so that the quality of the data set is improved, and the accuracy of a fatigue detection model obtained through subsequent training of the data set can be improved.
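A minimal OpenCV sketch of these three operations, with the target size an assumed choice, might look like:

    import cv2

    def preprocess(path: str, size=(224, 224)):
        img = cv2.imread(path)                 # load the raw frame
        h, w = img.shape[:2]
        side = min(h, w)                       # center-crop to a square
        top, left = (h - side) // 2, (w - side) // 2
        img = img[top:top + side, left:left + side]
        img = cv2.resize(img, size)            # scaling
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # graying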
Still another embodiment of the present application provides a device for constructing a driver fatigue detection model. Referring to fig. 2, which is a schematic structural diagram of the device according to an embodiment of the present application, the device includes:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
The device for constructing a driver fatigue detection model provided in this embodiment can execute all the steps and functions of the construction method provided in any of the above embodiments; the specific functions of the device are not repeated here.
Referring to fig. 3, which is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a driver fatigue detection model construction program. The processor, when executing the computer program, implements the steps of the above method embodiments, such as steps S1 to S4 shown in fig. 1; alternatively, the processor implements the functions of the modules in the above device embodiments.
The computer program may be divided into one or more modules that are stored in the memory and executed by the processor to complete the present application. The one or more modules may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program in the terminal device. For example, the computer program may be divided into several modules whose specific functions are described in detail in the method for constructing a driver fatigue detection model provided in any of the above embodiments and are not repeated here.
The terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of a terminal device and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the terminal device and connects the parts of the whole terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the device for constructing a driver fatigue detection model by running or executing the computer program and/or the modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another solid-state storage device.
If the integrated modules of the terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the above method embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that modifications and adaptations to the application may occur to one skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.

Claims (10)

1. A method for constructing a driver fatigue detection model, the method comprising:
collecting facial image data of drivers in different fatigue states as a data set;
extracting the driver's feature information from the data set using a pre-built feature extraction model;
using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
2. The method for constructing a driver fatigue detection model according to claim 1, wherein the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
3. The method for constructing a driver fatigue detection model according to claim 2, wherein the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
4. The method for constructing a driver fatigue detection model according to claim 1, wherein using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically comprises:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
5. The method for constructing a driver fatigue detection model according to claim 1, wherein the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
6. The method for constructing a driver fatigue detection model according to claim 1, wherein the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
7. The method for constructing a driver fatigue detection model according to claim 1, wherein the method further comprises:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
8. A device for constructing a driver fatigue detection model, comprising:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for constructing a driver fatigue detection model according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for constructing a driver fatigue detection model according to any one of claims 1 to 7.
CN202310554606.9A 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver Pending CN116630943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310554606.9A CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310554606.9A CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Publications (1)

Publication Number Publication Date
CN116630943A 2023-08-22

Family

ID=87635802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310554606.9A Pending CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Country Status (1)

Country Link
CN (1) CN116630943A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116959078A * 2023-09-14 2023-10-27 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN116959078B * 2023-09-14 2023-12-05 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination