CN116630943A - Method, device, equipment and medium for constructing fatigue detection model of driver

Method, device, equipment and medium for constructing fatigue detection model of driver

Info

Publication number
CN116630943A
Authority
CN
China
Prior art keywords
model
residual error
depth residual
driver
network model
Prior art date
Legal status
Pending
Application number
CN202310554606.9A
Other languages
Chinese (zh)
Inventor
黄莉
冉光伟
刘棨
舒选才
周健珊
邓晨
张莹
刘俊峰
Current Assignee
Xinghe Zhilian Automobile Technology Co Ltd
Original Assignee
Xinghe Zhilian Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xinghe Zhilian Automobile Technology Co Ltd
Priority to CN202310554606.9A
Publication of CN116630943A
Legal status: Pending

Classifications

    • G06V 20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness (scenes inside a vehicle)
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/0499 Feedforward networks
    • G06N 3/08 Learning methods
    • G06V 10/764 Image or video recognition using classification, e.g. of video objects
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition using neural networks
    • G06V 40/168 Human faces: feature extraction; face representation
    • G06V 40/193 Eye characteristics: preprocessing; feature extraction
    • Y02T 10/40 Engine management systems

Abstract

The application discloses a method, a device, equipment and a medium for constructing a driver fatigue detection model. Facial image data of drivers in different fatigue states are collected as a data set; the driver's feature information is extracted from the data set using a pre-built feature extraction model; a Transformer model is used as the feature extractor of a deep residual network model, and cross-layer connections are added to the residual blocks of the deep residual network model to obtain an improved deep residual network model; the extracted feature information is input into the improved deep residual network model for training to obtain a fatigue detection model. A fatigue detection model is thus constructed that can accurately detect drivers' varied and complex manifestations of fatigue.

Description

Method, device, equipment and medium for constructing fatigue detection model of driver
Technical Field
The application relates to the technical field of vehicle control, in particular to a method, a device, equipment and a medium for constructing a driver fatigue detection model.
Background
In the field of traffic safety, fatigue driving is one of the major causes of traffic accidents: statistics attribute 25-30% of traffic accidents to fatigue driving, and about 40% of major traffic accidents; roughly 70% of drivers on highways have experienced fatigue driving. Driver fatigue detection is therefore very important for driving safety.
However, existing fatigue detection models have low detection accuracy and find it difficult to adapt to and accurately detect drivers' varied and complex manifestations of fatigue.
Disclosure of Invention
To solve the above problems, the application provides a method, a device, equipment and a medium for constructing a driver fatigue detection model that can accurately detect drivers' varied and complex manifestations of fatigue.
An embodiment of the application provides a method for constructing a driver fatigue detection model, comprising the following steps:
collecting facial image data of drivers in different fatigue states as a data set;
extracting the driver's feature information from the data set using a pre-built feature extraction model;
using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
Preferably, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
As an improvement of this scheme, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
As a preferred solution, using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
Preferably, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
Preferably, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
As a preferred embodiment, the method further comprises:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
An embodiment of the application also provides a device for constructing a driver fatigue detection model, comprising:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
Preferably, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
As an improvement of this scheme, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
As a preferred solution, using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
As a preferred solution, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
Preferably, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
Preferably, the device further comprises a preprocessing module for:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
An embodiment of the application also provides a terminal device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for constructing a driver fatigue detection model according to any of the above embodiments.
An embodiment of the application also provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for constructing a driver fatigue detection model according to any of the above embodiments.
The application provides a method, a device, equipment and a medium for constructing a driver fatigue detection model: facial image data of drivers in different fatigue states are collected as a data set; the driver's feature information is extracted from the data set using a pre-built feature extraction model; a Transformer model is used as the feature extractor of a deep residual network model, and cross-layer connections are added to the residual blocks of the deep residual network model to obtain an improved deep residual network model; the extracted feature information is input into the improved deep residual network model for training to obtain a fatigue detection model. A fatigue detection model is thus constructed that can accurately detect drivers' varied and complex manifestations of fatigue.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a fatigue detection model of a driver according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a driver fatigue detection model building device according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Referring to fig. 1, which is a schematic flow chart of a method for constructing a driver fatigue detection model according to an embodiment of the present application, the method includes steps S1 to S4:
S1, collecting facial image data of drivers in different fatigue states as a data set;
S2, extracting the driver's feature information from the data set using a pre-built feature extraction model;
S3, using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
S4, inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
When this embodiment is implemented, a data set is first needed for model training, so a large amount of facial image data and video data of drivers in different fatigue states is collected as the data set. The data set can be obtained from public databases or collected from actual driving scenarios.
Feature information for model training is extracted by the pre-built feature extraction model: the collected data set is input into the pre-built feature extraction model, which extracts the feature information used to analyze the fatigue state.
aiming at the problem that the traditional ResNet model uses a convolution layer as a feature extractor, only local features can be learned, and the feature extraction capability of the model is not strong, the application adopts a transform model as the feature extractor of the depth residual error network model, so that global features can be better learned, and the feature extraction capability of the model is improved.
Cross-layer connections are added to the residual blocks of the deep residual network model to obtain the improved deep residual network model. Fusing the Transformer model with the ResNet model strengthens the network's nonlinear fitting capability, avoiding the traditional ResNet model's shortcoming of insufficient nonlinear fitting capacity, which cannot cope with drivers' varied forms of fatigue.
The extracted feature information is input into the improved deep residual network model for training to obtain the fatigue detection model, and the trained model is then used to judge the fatigue state from the driver's facial features monitored in real time.
In this embodiment, using a Transformer as the feature extractor lets the network learn global features, improving its expressive capacity and performance, while the improved ResNet block, which fuses a Transformer block with a ResNet block, strengthens the network's nonlinear fitting capability. The application thus constructs a fatigue detection model that can accurately detect drivers' varied and complex manifestations of fatigue.
In yet another embodiment provided by the present application, the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
In this embodiment, the extracted feature information may include multiple types of features, such as facial features, eye movements, head features and the degree of eye closure, all of which are used for model construction.
Specifically, the facial features, eye movements and head features can be identified by a pre-trained convolutional neural network, while the face recognition algorithm detects facial key points from which the degree of eye closure is calculated.
Extracting multiple types of feature information for model training widens the model's detection range, so that the final model can, for example, perform fatigue detection from eye and mouth features alone, which also gives the model extensibility.
In a further embodiment provided by the application, the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
In implementations of this embodiment, the pre-trained convolutional neural network model used to extract facial features, eye movements and head features may be a VGG-16 network architecture or a deep residual network model.
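As a minimal sketch only, not the patent's disclosed implementation, the following Python snippet shows how a pre-trained VGG-16 backbone could serve as such a feature extractor; the torchvision weight enum and the file name driver_frame.jpg are assumptions of this example:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Load a VGG-16 pre-trained on ImageNet and keep only its convolutional
    # backbone as a frozen feature extractor (the classifier head is dropped).
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    backbone = vgg.features.eval()

    preprocess = T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # driver_frame.jpg is a hypothetical frame from the collected data set.
    img = Image.open("driver_frame.jpg").convert("RGB")
    with torch.no_grad():
        feat = backbone(preprocess(img).unsqueeze(0))  # feature map of shape (1, 512, 7, 7)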
Facial key points are detected by the face recognition algorithm, and the Dlib library is one model that can be used to calculate the degree of eye closure.
This embodiment presents two convolutional neural network models for extracting facial features, eye movements and head features; in other embodiments, other models may be used to extract them.
It should also be noted that this embodiment provides one face recognition algorithm for extracting the degree of eye closure; in other embodiments, other algorithms may be used to calculate it.
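One common way to quantify eye closure with Dlib, shown here as a hedged sketch rather than the patent's exact procedure, is the eye aspect ratio (EAR) computed from the standard 68-point landmark model; the landmark file path and input frame name are assumptions of this example:

    import cv2
    import dlib
    from scipy.spatial import distance

    detector = dlib.get_frontal_face_detector()
    # Standard 68-point landmark model distributed with dlib examples (assumed path).
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def eye_aspect_ratio(pts):
        # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); it approaches 0 as the eye closes.
        a = distance.euclidean(pts[1], pts[5])
        b = distance.euclidean(pts[2], pts[4])
        c = distance.euclidean(pts[0], pts[3])
        return (a + b) / (2.0 * c)

    frame = cv2.imread("driver_frame.jpg")  # hypothetical input frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # Landmarks 36-41 and 42-47 outline the left and right eyes.
        left = [(shape.part(i).x, shape.part(i).y) for i in range(36, 42)]
        right = [(shape.part(i).x, shape.part(i).y) for i in range(42, 48)]
        ear = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0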
In yet another embodiment of the present application, step S3 specifically includes:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
In the implementation of this embodiment, the application uses an improved ResNet model for driver fatigue detection. In the existing ResNet model, a deep residual neural network, each ResNet block consists of two convolution layers and a residual connection, where the residual connection adds the output of the previous layer directly to the input of the current layer.
A Transformer is a self-attention network for processing sequence data. Unlike an RNN, a Transformer does not need to carry history information step by step during computation, so it can compute in parallel and is faster. Replacing the second convolution layer in the ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, lets the network learn global features better and learn identity mappings more easily, improving its expressive capacity and performance.
Introducing cross-layer connections (i.e. residual connections) alleviates the vanishing-gradient and exploding-gradient problems in deep networks. The cross-layer connection also adds the output of the previous layer to the input of the current layer, preserving the depth and information flow of the network.
To obtain a better representation, several improved ResNet blocks are used to build the whole model, with a global average pooling layer and a fully connected layer added on top to produce the classification result. The output of the last improved ResNet block is globally average-pooled and then taken as the input of the fully connected layer. The prediction result is finally computed using the Sigmoid function as the activation function.
Using the global average pooling layer and the fully connected layer to obtain the final classification result, together with the Sigmoid activation function, gives the algorithm better interpretability and stability.
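A minimal PyTorch sketch of such an improved ResNet block follows, under the assumption that the single-layer Transformer is realised with nn.TransformerEncoderLayer (which bundles multi-head attention and a feedforward network, together with its own internal residual connections) and that the convolutional feature map is flattened into a token sequence; the channel count and head number are illustrative choices, not values from the patent:

    import torch
    import torch.nn as nn

    class ImprovedResNetBlock(nn.Module):
        """ResNet block whose second convolution is replaced by a single-layer
        Transformer, with a cross-layer connection from input to output."""
        def __init__(self, channels: int, num_heads: int = 4):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            # Single Transformer encoder layer standing in for FFN(MHA(.)).
            self.transformer = nn.TransformerEncoderLayer(
                d_model=channels, nhead=num_heads,
                dim_feedforward=channels * 4, batch_first=True)

        def forward(self, x):
            f = self.relu(self.bn1(self.conv1(x)))   # F(x): first convolution
            b, c, h, w = f.shape
            seq = f.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
            t = self.transformer(seq)                # attention over all positions
            t = t.transpose(1, 2).reshape(b, c, h, w)
            return x + t                             # cross-layer connection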
In a further embodiment provided by the present application, the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
In the implementation of this embodiment, the driver fatigue detection task is a binary classification problem requiring the training of a binary classifier f(x): R^d → {0, 1}, where x ∈ R^d denotes the input feature vector and {0, 1} are the class labels.
Considering the structure of a single ResNet block, with input x and output y, the calculation formula of the ResNet block is specifically: y = x + F(x);
where F(x) denotes the convolution operations within the block, i.e. the output of input x after a series of convolution transformations. This formula shows that the residual block implements an identity mapping by adding a cross-layer connection.
The method provided by the application uses a Transformer as the feature extractor of the ResNet model and improves the residual block. The Transformer network is described next.
Consider a single-layer Transformer model with input x and output y. The input information is first encoded by a multi-head attention mechanism and then fed into a feedforward neural network for a nonlinear transformation. The final output is obtained by adding a residual connection to the input: y = x + FFN(MHA(x));
where MHA(x) denotes the encoding of the input information under the multi-head attention mechanism, and FFN(·) denotes the computation of the feedforward neural network, i.e. a ReLU activation between two fully connected layers. This formula shows that the block implements an identity mapping by adding a cross-layer connection.
The calculation formula of the improved ResNet block is y_i = x_i + FFN(MHA(F(x_i)));
where F(x_i) denotes the feature vector after the first convolution layer, MHA(·) denotes the multi-head attention mechanism, and FFN(·) denotes the feedforward neural network. This formula shows that the residual block implements an identity mapping by adding a cross-layer connection and a Transformer model.
The optimization objective of the entire improved ResNet model is the same as that of an ordinary ResNet model, i.e. minimizing the cross-entropy loss function:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters, f(x_i) denotes the model prediction, ||·||_2 denotes the L2 norm, λ is the regularization parameter, W denotes the weight matrix of the fully connected layer, and N denotes the number of samples.
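As a sketch of this objective in code, assuming PyTorch, a hypothetical network composed of the ImprovedResNetBlock defined above, and an existing DataLoader named train_loader, the λ||W||² term can be approximated by applying weight decay to the fully connected weights only; all hyperparameter values here are assumptions:

    import torch
    import torch.nn as nn

    # Hypothetical network: stacked improved blocks plus the pooling/FC head.
    model = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3, padding=1),
        ImprovedResNetBlock(64),   # from the sketch above
        ImprovedResNetBlock(64),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

    criterion = nn.BCELoss()       # cross-entropy loss for sigmoid outputs

    # Apply the L2 penalty lambda * ||W||_2^2 only to the fully connected
    # weight matrices, mirroring the W term above; lambda = 1e-4 is assumed.
    fc_weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    fc_ids = {id(w) for w in fc_weights}
    rest = [p for p in model.parameters() if id(p) not in fc_ids]
    optimizer = torch.optim.Adam(
        [{"params": fc_weights, "weight_decay": 1e-4},
         {"params": rest, "weight_decay": 0.0}], lr=1e-3)

    for x, y in train_loader:      # train_loader is assumed to yield
        pred = model(x).squeeze(1) # (grayscale image batch, 0/1 fatigue labels)
        loss = criterion(pred, y.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()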
In yet another embodiment of the present application, the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
When this embodiment is implemented, the calculation formula of the whole improved deep residual network model can be expressed as:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector, F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks, AvgPool(·) denotes the global average pooling layer, W_1 and W_2 denote the weight matrices of the two fully connected layers, and b_2 denotes the bias vector. The final output is activated by the Sigmoid function σ(·) to obtain the classification result.
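Read as code, and assuming PyTorch with an arbitrarily chosen hidden width, this classification head could be sketched as follows:

    import torch
    import torch.nn as nn

    class ClassificationHead(nn.Module):
        """Implements y = sigmoid(W2 * ReLU(W1 * AvgPool(F(x))) + b2)."""
        def __init__(self, channels: int, hidden: int = 128):  # hidden width is an assumed choice
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)     # global average pooling
            self.fc1 = nn.Linear(channels, hidden)  # weight matrix W1
            self.fc2 = nn.Linear(hidden, 1)         # weight matrix W2 and bias b2

        def forward(self, feat):
            z = self.pool(feat).flatten(1)          # AvgPool(F(x))
            return torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))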
In yet another embodiment provided by the present application, the method further comprises:
and performing scaling, clipping and graying operations on the data in the acquired data set.
When the embodiment is specifically implemented, the data in the data set is subjected to scaling, cutting and graying operations, so that the quality of the data set is improved, and the accuracy of a fatigue detection model obtained through subsequent training of the data set can be improved.
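A minimal OpenCV sketch of these three operations, with the target size an assumed choice, might look like:

    import cv2

    def preprocess(path: str, size=(224, 224)):
        img = cv2.imread(path)                 # load the raw frame
        h, w = img.shape[:2]
        side = min(h, w)                       # center-crop to a square
        top, left = (h - side) // 2, (w - side) // 2
        img = img[top:top + side, left:left + side]
        img = cv2.resize(img, size)            # scaling
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # graying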
Still another embodiment of the present application provides a device for constructing a driver fatigue detection model. Referring to fig. 2, which is a schematic structural diagram of the device according to an embodiment of the present application, the device includes:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
The device for constructing a driver fatigue detection model provided in this embodiment can execute all the steps and functions of the construction method provided in any of the above embodiments; the specific functions of the device are not repeated here.
Referring to fig. 3, which is a schematic structural diagram of a terminal device according to an embodiment of the present application. The terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a driver fatigue detection model construction program. The processor, when executing the computer program, implements the steps of the above method embodiments, such as steps S1 to S4 shown in fig. 1; alternatively, the processor implements the functions of the modules in the above device embodiments.
The computer program may be divided into one or more modules that are stored in the memory and executed by the processor to complete the present application. The one or more modules may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program in the terminal device. For example, the computer program may be divided into several modules whose specific functions are described in detail in the method for constructing a driver fatigue detection model provided in any of the above embodiments and are not repeated here.
The terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of a terminal device and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the terminal device and connects the parts of the whole terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or the modules, and the processor implements the various functions of the device for constructing a driver fatigue detection model by running or executing the computer program and/or the modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another solid-state storage device.
If the integrated modules of the terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the above method embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
It should be noted that modifications and adaptations to the application may occur to one skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.

Claims (10)

1. A method for constructing a driver fatigue detection model, the method comprising:
collecting facial image data of drivers in different fatigue states as a data set;
extracting the driver's feature information from the data set using a pre-built feature extraction model;
using a Transformer model as the feature extractor of a deep residual network model, and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
2. The method for constructing a driver fatigue detection model according to claim 1, wherein the feature information specifically includes facial features, eye movements and head features extracted by a pre-trained convolutional neural network, and the degree of eye closure extracted by a face recognition algorithm.
3. The method for constructing a driver fatigue detection model according to claim 2, wherein the convolutional neural network is specifically a VGG-16 network architecture or a deep residual network model;
the face recognition algorithm specifically uses the Dlib library.
4. The method for constructing a driver fatigue detection model according to claim 1, wherein using the Transformer model as the feature extractor of the deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain the improved deep residual network model specifically comprises:
replacing the second convolution layer in each ResNet block with a single-layer Transformer model, and adding the output of the previous layer to the input of the current layer through a cross-layer connection, to obtain an improved ResNet block;
building the deep residual network model from several improved ResNet blocks, applying global average pooling to the output of the last improved ResNet block and taking the result as the input of a fully connected layer, which outputs the classification result;
using the Sigmoid function as the activation function to compute the prediction result, thereby obtaining the improved deep residual network model.
5. The method for constructing a driver fatigue detection model according to claim 1, wherein the optimization objective of the improved deep residual network model is specifically:
min_θ L(θ) = -(1/N) Σ_{i=1}^{N} [ y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)) ] + λ||W||_2^2
where θ denotes the model parameters; f(x_i) denotes the model prediction for input x_i; ||·||_2 denotes the L2 norm; λ is the regularization parameter; y_i is the output of a single improved ResNet block, y_i = x_i + FFN(MHA(F(x_i))); F(x_i) denotes the feature vector after the first convolution layer of a single improved ResNet block; MHA(·) denotes the multi-head attention mechanism; FFN(·) denotes the feedforward neural network; N denotes the number of samples; and W denotes the weight matrix of the fully connected layer.
6. The method for constructing a driver fatigue detection model according to claim 1, wherein the calculation formula of the improved deep residual network model is:
y = σ(W_2 · ReLU(W_1 · AvgPool(F(x))) + b_2)
where x denotes the input feature vector; y denotes the output; F(x) denotes the feature matrix extracted by the stacked improved ResNet blocks; AvgPool(·) denotes the global average pooling layer; W_1 and W_2 denote the weight matrices of the two fully connected layers; b_2 denotes the bias vector; ReLU(·) denotes the activation function; and σ(·) denotes the Sigmoid activation function.
7. The method for constructing a driver fatigue detection model according to claim 1, wherein the method further comprises:
performing scaling, cropping and grayscale conversion on the data in the collected data set.
8. A device for constructing a driver fatigue detection model, comprising:
a data acquisition module for collecting facial image data of drivers in different fatigue states as a data set;
a feature extraction module for extracting the driver's feature information from the data set using a pre-built feature extraction model;
a model construction module for using a Transformer model as the feature extractor of a deep residual network model and adding cross-layer connections to the residual blocks of the deep residual network model to obtain an improved deep residual network model;
a model training module for inputting the extracted feature information into the improved deep residual network model for training to obtain a fatigue detection model.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for constructing a driver fatigue detection model according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for constructing a driver fatigue detection model according to any one of claims 1 to 7.
CN202310554606.9A 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver Pending CN116630943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310554606.9A CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310554606.9A CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Publications (1)

Publication Number Publication Date
CN116630943A 2023-08-22

Family

ID=87635802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310554606.9A Pending CN116630943A (en) 2023-05-16 2023-05-16 Method, device, equipment and medium for constructing fatigue detection model of driver

Country Status (1)

Country Link
CN (1) CN116630943A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116959078A * 2023-09-14 2023-10-27 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN116959078B * 2023-09-14 2023-12-05 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination