CN117975420A - Driver distraction identification method, intelligent cabin and electronic equipment - Google Patents

Driver distraction identification method, intelligent cabin and electronic equipment

Info

Publication number
CN117975420A
CN117975420A
Authority
CN
China
Prior art keywords
video
driver
images
time
videos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311782465.2A
Other languages
Chinese (zh)
Inventor
曾月
李斯
杨周龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongpu Software Co Ltd
Original Assignee
Dongpu Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongpu Software Co Ltd
Priority to CN202311782465.2A
Publication of CN117975420A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based driver distraction recognition method, an intelligent cabin and an electronic device. The method comprises the following steps: classifying in-vehicle images/videos according to their characteristic information, and processing the images/videos with a video enhancement program; correcting the processed images/videos with an L2 regularization model to generate a training set; then inputting the training set and a verification set into a temporal transformer model for training; and inputting acquired images/videos into the trained distraction recognition model, which issues corresponding early-warning prompt information according to its recognition result. The invention recognizes the driver's head pose and other body postures with computer vision technology, can monitor in real time whether the driver is glancing left and right, and can effectively improve the driver's safety awareness and alertness, thereby reducing the incidence of traffic accidents.

Description

Driver distraction identification method, intelligent cabin and electronic equipment
Technical Field
The invention relates to an identification method, an intelligent cabin and electronic equipment, in particular to a driver distraction identification method, an intelligent cabin and electronic equipment.
Background
Glancing left and right while driving is a common driver behavior. In some cases, however, it diverts the driver's attention from road conditions and increases the risk of traffic accidents. For example, if a driver fails to notice a vehicle approaching from the rear side while changing lanes, a dangerous situation may arise. Current vehicle driving management systems focus on monitoring driving data, positioning information and the like, while monitoring of the driver's behavior and attention state is relatively lacking. Traditional monitoring means can hardly identify the driver's head pose and gaze direction accurately and cannot meet practical needs, so improvement is required.
Disclosure of Invention
The invention aims to provide a driver distraction identification method, an intelligent cabin and an electronic device that realize intelligent real-time monitoring of driver behavior, overcoming the defects of the prior art.
The invention provides the following scheme:
a deep learning-based driver distraction identification method, comprising:
acquiring in-vehicle images/videos while the vehicle is running, and classifying the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos;
processing the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos;
performing time-series-based data processing on the images/videos with a temporal transformer model, and correcting the processed images/videos with an L2 regularization model to generate a training set;
inputting the training set and a verification set into the temporal transformer model for training, the training being based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model;
and inputting acquired images/videos into the trained distraction recognition model, and issuing corresponding early-warning prompt information according to the recognition result of the distraction recognition model.
Further, classifying the images according to the preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos further comprises defining the image/video characteristic information to include:
a first driving state, characterizing a scene in which the driver glances left and right;
a second driving state, characterizing normal driving by the driver;
and a third driving state, characterizing other unsafe driver behaviors belonging to neither the first nor the second driving state.
Further, the method further comprises the following steps:
establishing a driver driving-state image/video library: acquiring facial images/videos of the driver in left-and-right glancing scenes to form a corresponding image/video data set;
decomposing the images/videos into consecutive frame images based on a time sequence, and converting the consecutive frame images into digital matrices for labeling;
sending the labeled digital matrices to a neural network for learning, with iterative optimization and weight configuration, until the neural network outputs a first result variable capable of identifying the driver's left-and-right glancing scene;
and marking the first result variable as the first driving state, storing it for video enhancement processing, and using it as an input to the temporal transformer model.
Further, processing the images/videos with the video enhancement program according to the different driver driving states corresponding to the classified images/videos further comprises:
performing data fusion on the classified first video and second video using the VideoMix data enhancement technique, the first video being the driver's driving state in a first time period and the second video the driver's driving state in a second time period, the two periods being distributed along a time sequence;
receiving key frames and timestamp information of the first and second time periods, extracting features from each key frame to obtain the corresponding feature values, and synchronizing the first and second time periods according to the timestamp information;
and performing space-time fusion on the synchronized first and second time periods, comparing the feature values so as to fuse the video information of the first and second videos in space and time.
Further, performing the time-series-based data processing on the images/videos with the temporal transformer model and correcting the processed images/videos with the L2 regularization model to generate the training set further comprises:
acquiring each frame of the image/video and converting each frame into a feature vector;
establishing mapping relations between the time dimension and the feature vectors, and obtaining the dynamic-change information corresponding to these mappings through the weights between different mapping relations;
and inputting the dynamic-change information into a machine learning model and training it with an L2-regularized loss function.
Further, training the dynamic-change information with the L2-regularized loss function specifically comprises:
acquiring the parameters of the temporal transformer model and inputting them into a loss function, the loss function measuring the difference between the model's predictions and the true values;
establishing a penalty term of the loss function, the penalty term being obtained by computing the sum of squares of the temporal transformer's parameters and multiplying it by an adjustment coefficient, yielding the penalty-term value;
and detecting, against a preset threshold, whether the penalty term of the loss function indicates overfitting: if not, iterating until the penalty term exceeds the threshold, then stopping the iteration and recording the last result computed before the threshold was exceeded.
A deep-learning-based driver distraction identification system comprising:
an in-vehicle image/video acquisition and classification module, which acquires in-vehicle images/videos while the vehicle is running and classifies the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos;
an image/video enhancement processing module, which processes the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos;
a temporal transformer model correction module, which performs time-series-based data processing on the images/videos with the temporal transformer model and corrects the processed images/videos with an L2 regularization model to generate a training set;
a temporal transformer model training module, which inputs the training set and a verification set into the temporal transformer model for training, based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model;
and an early-warning prompt module, which inputs acquired images/videos into the trained distraction recognition model and issues corresponding early-warning prompt information according to its recognition result.
An intelligent cabin, in which the deep-learning-based driver distraction recognition system is provided.
An electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method.
A computer readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of the method.
Compared with the prior art, the invention has the following advantages:
The invention recognizes the driver's head pose and other body postures with computer vision technology and can monitor in real time whether the driver is glancing left and right. It can effectively improve the driver's safety awareness and alertness and thereby reduce the incidence of traffic accidents: by identifying head pose through computer vision and issuing timely warning signals or reminders to watch the road, it greatly reduces the potential risks caused by distraction and cuts down traffic accidents caused by distraction, fatigue and left-and-right glancing.
To intelligently address the hidden traffic-safety dangers caused by distraction, fatigue and left-and-right glancing, the invention improves the definition of the collected in-vehicle images/videos through classification and a video enhancement program; processes the data on a time-series basis with a temporal transformer model, its loss function and L2 regularization, preventing overfitting during operation; and increases the generalization ability of the data model, widening the applicability of the driver distraction identification method.
In addition, the method fuses the classified video/image data through the VideoMix data enhancement technique and extracts features based on key frames and timestamps, realizing space-time fusion of different videos and facilitating the application and training of the deep learning model.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for their description are briefly introduced below. The drawings described below show merely some embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a driver distraction identifying method.
Fig. 2 is a schematic diagram of a driver distraction recognition system.
Fig. 3 is a schematic structural view of the electronic device.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
As shown in fig. 1, the deep-learning-based driver distraction identification method includes:
Step S1: acquiring in-vehicle images/videos while the vehicle is running, and classifying the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos.
Preferably, the image/video characteristic information further includes:
a first driving state, characterizing a scene in which the driver glances left and right;
a second driving state, characterizing normal driving by the driver;
and a third driving state, characterizing other unsafe driver behaviors belonging to neither the first nor the second driving state.
According to common life and driving experience, a driver who frequently glances left and right while driving may be looking for road signs, confirming road conditions or attending to changes in the surrounding environment. One main purpose of the invention is to accurately capture these subtle action changes by combining vehicle-mounted intelligent devices and the intelligent cabin with computer vision analysis and similar means, and to issue corresponding warnings or reminders in time. Likewise, those skilled in the art will understand that the preset vehicle driving safety rules include not only traffic safety rules but also technical information such as the mechanical state and the electronic-device state of the vehicle.
Preferably, a temporal transformer can be added to the vehicle-mounted equipment, or a temporal transformer module added to the intelligent cabin, and combined with a deep learning model and a neural network model to intelligently identify scenes in which the driver glances left and right:
Establishing a driver driving-state image/video library: acquiring facial images/videos of the driver in left-and-right glancing scenes to form a corresponding image/video data set;
decomposing the images/videos into consecutive frame images based on a time sequence, and converting the consecutive frame images into digital matrices for labeling;
sending the labeled digital matrices to a neural network for learning, with iterative optimization and weight configuration, until the neural network outputs a first result variable capable of identifying the driver's left-and-right glancing scene;
and marking the first result variable as the first driving state, storing it for video enhancement processing, and using it as an input to the temporal transformer model.
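For illustration only, the frame-decomposition and labeling steps above might be sketched as follows; the file path, frame stride, grayscale conversion and the label value 0 for the first driving state are hypothetical choices rather than values fixed by this disclosure, and OpenCV/NumPy are assumed to be available:

```python
# Illustrative sketch only: decompose a clip into time-ordered frames,
# convert each frame into a numeric ("digital") matrix, and attach a
# driving-state label. Path, stride and label are placeholder assumptions.
import cv2
import numpy as np

def video_to_labeled_matrices(video_path: str, label: int, stride: int = 5):
    cap = cv2.VideoCapture(video_path)
    samples = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            matrix = gray.astype(np.float32) / 255.0  # matrix in [0, 1]
            samples.append((matrix, label))
        index += 1
    cap.release()
    return samples  # time-ordered (matrix, label) pairs

# e.g. glancing-around clips labeled as the first driving state
dataset = video_to_labeled_matrices("driver_glancing.mp4", label=0)
```

The resulting time-ordered (matrix, label) pairs could then feed the iterative optimization and weight configuration described above.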
However, since noise and blurred regions may be present when in-vehicle images/videos are acquired, video enhancement processing is required. Those skilled in the art will appreciate that video enhancement can also be applied to still images, for example by processing an image with a video enhancement software program; the principles of video enhancement are closely related to those of image processing, and solutions that process images with video enhancement techniques are widely used in the art.
To improve image/video processing quality, various attributes of the image/video should be adjusted, for example brightness, contrast and color saturation, and the image/video can also be processed on a time-series basis. In this embodiment, a temporal transformer model is employed to perform time-series-based data processing on the images/videos, converting the input time-series data into output data with different feature representations or distributions.
For example, when the vehicle travels on a night road under low-light conditions, it is difficult to acquire high-quality images/videos of the driver glancing left and right unless their brightness and contrast are improved. The night-road footage can therefore be given video enhancement processing, and the enhanced images/videos passed as input to the temporal transformer model TimeSformer, which detects and marks the behavior of the driver through feature vectors and generates an output result carrying the marking information.
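A minimal sketch of such a night-scene enhancement step, assuming OpenCV; the gain, bias and gamma values are illustrative only and would be tuned per camera:

```python
# Sketch of the low-light enhancement step: raise brightness and contrast
# before frames reach the temporal model. Parameter values are assumptions.
import cv2
import numpy as np

def enhance_low_light(frame, alpha=1.6, beta=40, gamma=1.5):
    # linear contrast (alpha) and brightness (beta) adjustment
    adjusted = cv2.convertScaleAbs(frame, alpha=alpha, beta=beta)
    # gamma correction via a lookup table to lift dark regions
    table = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype("uint8")
    return cv2.LUT(adjusted, table)
```

Applying such a transform before the frames reach TimeSformer keeps head and gaze movements distinguishable in dark cabins.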
Finally, the collected images/videos of the driver are taken as samples and classified into the following three categories. Category 1: glancing left and right, where the driver's attention shifts sideways, possibly looking for an intersection, a lane change or the like. Category 2: normal driving, where the driver keeps a normal forward gaze and focuses on the driving scene. Category 3: other unsafe behaviors, such as slumping forward, reclining against the seat, making a phone call, looking at a cell phone and other behaviors that may interfere with driving.
Step S2: processing the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos, thereby improving picture quality.
Specifically, the (image/video) data enhancement can be performed with VideoMix: data fusion is performed on the classified first video and second video using the VideoMix data enhancement technique, the first video being the driver's driving state in a first time period and the second video the driver's driving state in a second time period, the two periods being distributed along a time sequence;
key frames and timestamp information of the first and second time periods are received, features are extracted from each key frame to obtain the corresponding feature values, and the first and second time periods are synchronized according to the timestamp information;
and space-time fusion is performed on the synchronized first and second time periods, comparing the feature values so as to fuse the video information of the first and second videos in space and time.
Performing image/video processing with the VideoMix data enhancement technique increases the diversity of the training samples, improves the model's generalization ability, and strengthens its robustness across scenarios.
Exemplarily, based on the VideoMix data enhancement technique, the image/video attributes are changed by means of video clipping, frame interpolation, image transformation (translation, rotation and scaling), color-space transformation (brightness adjustment and contrast adjustment), data mixing and data fusion (weighted addition and maximum-taking), so that the model adapts better to inputs under different environmental conditions, effectively improving the performance and robustness of the machine learning and neural network algorithms.
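As a hedged sketch of a VideoMix-style fusion of two time-aligned clips, in the spirit of CutMix: a spatial patch of one clip is pasted into the other across all frames and the labels are mixed in proportion to the pasted area. The clip layout (T, H, W, C) and the Beta-distribution parameter are assumptions for illustration:

```python
# Illustrative VideoMix-style fusion of two synchronized clips.
import numpy as np

def videomix(clip_a, clip_b, label_a, label_b, alpha=1.0):
    # clips: arrays of shape (T, H, W, C), synchronized by timestamp
    lam = np.random.beta(alpha, alpha)
    t, h, w, _ = clip_a.shape
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    y = np.random.randint(0, h - cut_h + 1)
    x = np.random.randint(0, w - cut_w + 1)
    mixed = clip_a.copy()
    # paste the same spatial patch from clip B over every frame of clip A
    mixed[:, y:y + cut_h, x:x + cut_w, :] = clip_b[:, y:y + cut_h, x:x + cut_w, :]
    area = (cut_h * cut_w) / (h * w)
    # labels mixed in proportion to the pasted area
    mixed_label = (1 - area) * np.asarray(label_a) + area * np.asarray(label_b)
    return mixed, mixed_label
```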
Step S3: performing time-series-based data processing on the images/videos with the temporal transformer model TimeSformer, and correcting the processed images/videos with an L2 regularization model to generate a training set.
The temporal transformer model TimeSformer is a model for time-series data tasks. By modeling the time dimension and introducing components such as a self-attention mechanism and multilayer perceptrons, it effectively captures long-range dependencies in time series, has the advantage of parallel computation, and handles longer input sequences well.
Illustratively, the temporal transformer model processes time-series-based data, can establish correlations between different positions of different time series, and, by computing the correlation between one particular position and the other positions, provides a valuable training set with relatively high accuracy.
Specifically, the process of generating the training set with the temporal transformer model TimeSformer further includes:
acquiring each frame of the image/video and converting each frame into a feature vector;
establishing mapping relations between the time dimension and the feature vectors, and obtaining the dynamic-change information corresponding to these mappings through the weights between different mapping relations;
and inputting the dynamic-change information into a machine learning model and training it with an L2-regularized loss function, preventing the overfitting that arises when the model is too sensitive to the training data and fails to generalize to new samples.
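A minimal sketch of this temporal step, assuming PyTorch: each frame is embedded as a feature vector, and self-attention over the time dimension produces the weights between mapping relations from which the dynamic-change information is read off. All dimensions and layer sizes are illustrative assumptions:

```python
# Sketch of time-dimension self-attention over per-frame feature vectors.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, feat_dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, frame_features):  # (batch, T, feat_dim)
        # attention weights relate every time step to every other one
        attended, weights = self.attn(frame_features, frame_features, frame_features)
        return self.norm(frame_features + attended), weights

frames = torch.randn(2, 16, 256)  # 2 clips, 16 frames, 256-dim features
dynamics, attn_weights = TemporalAttention()(frames)
```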
By way of example, edge-aware processing of the images/videos may be performed using the L2-regularized optimization of the temporal transformer model TimeSformer.
Edge-aware processing with an L2-regularized optimization model means optimizing the model with the L2 regularization technique used in machine learning and deep learning, thereby realizing edge-aware processing of images or other data. An edge is a region of abrupt change in color, brightness or texture in an image, and the purpose of edge perception (edge detection) is to identify and extract the contour lines that form where objects have clear boundaries or where color or gray values change sharply; for example, the driver can be distinguished from the in-vehicle background through such contour lines. The loss function receives high-quality raw input data before processing, and because that input is properly constrained and normalized, the clearly visible characteristic points distinguishing different objects in the image are better identified, facilitating further image/video processing.
Without L2 regularization, detection or segmentation of the in-vehicle image/video would usually attend only to the in-vehicle region as a whole while neglecting the fine but important feature points around the driver (such as contour lines), possibly leading to inaccurate results or even misjudgments; with L2 regularization, this fine but decisive feature-point information is better retained and emphasized.
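For the edge-perception step, a simple sketch using standard edge detection; the Canny thresholds are assumptions, and this stands in for the contour-line extraction described above rather than reproducing the exact procedure of this disclosure:

```python
# Illustrative edge-perception step: extract contour lines that separate
# the driver from the cabin background. Threshold values are assumptions.
import cv2

def driver_contours(frame_gray):
    edges = cv2.Canny(frame_gray, threshold1=50, threshold2=150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return edges, contours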
Illustratively, the dynamic-change information is trained with the L2-regularized loss function as follows:
the parameters of the temporal transformer model are acquired and input into a loss function, the loss function measuring the difference between the model's predictions and the true values;
a penalty term of the loss function is established: the temporal transformer's parameters are squared and summed, then multiplied by an adjustment coefficient, yielding the penalty-term value; the adjustment coefficient represents the overall influence of the L2 regularization on the loss function and can be obtained from prior values, empirical values, or the corresponding technical manuals and databases;
and whether the penalty term of the loss function indicates overfitting is detected against a preset threshold: if not, the iteration continues until the penalty term exceeds the threshold, at which point the iteration stops and the last result computed before the threshold was exceeded is recorded. In this way a balance point can be found between underfitting and overfitting, giving the best fitting effect.
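A hedged sketch of this objective and stopping rule: the data term is mean squared error, the penalty term is the sum of squared parameters times an adjustment coefficient, and iteration stops once the penalty crosses a preset threshold. The coefficient, the threshold value and the MSE data term are illustrative assumptions:

```python
# Illustrative L2-regularized objective with a penalty-threshold stop.
import torch

def l2_regularized_loss(model, predictions, targets, reg_coeff=1e-4):
    data_loss = torch.nn.functional.mse_loss(predictions, targets)
    # penalty = (sum of squared parameters) * adjustment coefficient
    penalty = reg_coeff * sum((p ** 2).sum() for p in model.parameters())
    return data_loss + penalty, penalty

def train_until_threshold(model, batches, optimizer, threshold=0.05):
    last_good = None
    for inputs, targets in batches:
        preds = model(inputs)
        loss, penalty = l2_regularized_loss(model, preds, targets)
        if penalty.item() > threshold:  # preset overfitting check
            break                       # keep the previously recorded result
        last_good = loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return last_good
```

In practice most frameworks fold the same penalty into the optimizer as weight decay; it is written out explicitly here only to mirror the penalty-term description above.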
Step S4: inputting the training set and a verification set into the temporal transformer model for training, the training being based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model.
Linear regression establishes relationships between continuous variables; in the invention, relationships between time-series-based continuous variables are established by fitting the driver's posture image/video information during driving, relating independent and dependent variables so as to find the regularities between the driver's posture information and the time axis and driving distance.
In logistic regression, the driver posture information serves as an input feature and is mapped by a function to a value between 0 and 1; classification is performed on the basis of this probability, solving the classification problem of separating the driver's left-and-right glancing behavior from non-glancing behavior and forming corresponding data levels. A decision boundary can be learned from the training samples and used for prediction on new samples.
In the support vector machine, an optimal hyperplane can be found that separates the two classes of samples (glancing and non-glancing drivers) while maximizing the distance of the nearest sample points to the hyperplane. Support vector machines suit high-dimensional data and are highly robust; for example, a support vector machine may be used to distinguish different vehicles and their corresponding drivers.
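A sketch of the three listed algorithms, assuming scikit-learn as the implementation; the feature matrix, the three state labels (0 glancing, 1 normal, 2 other unsafe) and all hyperparameters are placeholders:

```python
# Illustrative use of the three listed training algorithms.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC

X = np.random.rand(300, 256)           # pooled per-clip feature vectors
y = np.random.randint(0, 3, size=300)  # three driving-state labels

logreg = LogisticRegression(max_iter=1000).fit(X, y)  # probabilistic classes
svm = SVC(kernel="rbf").fit(X, y)                     # max-margin hyperplane
# linear regression relating a posture feature to the time axis
trend = LinearRegression().fit(np.arange(300).reshape(-1, 1), X[:, 0])
```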
Step S5: inputting the acquired images/videos into the trained distraction recognition model, and issuing corresponding early-warning prompt information according to the recognition result of the distraction recognition model.
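Step S5 might then reduce to a small dispatch from the recognized state to a graded prompt; the state codes and message texts below are illustrative assumptions:

```python
# Illustrative mapping from recognition result to a graded warning.
WARNINGS = {
    0: "Caution: prolonged glancing left and right detected. Watch the road ahead.",
    1: None,  # normal driving, no warning
    2: "Warning: unsafe behavior detected. Please focus on driving.",
}

def dispatch_warning(state: int):
    message = WARNINGS.get(state)
    if message:
        print(message)  # a real cabin would drive a voice or HMI alert instead
```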
As shown in fig. 2, the deep-learning-based driver distraction identification system includes:
an in-vehicle image/video acquisition and classification module, which acquires in-vehicle images/videos while the vehicle is running and classifies the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos;
an image/video enhancement processing module, which processes the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos;
a temporal transformer model correction module, which performs time-series-based data processing on the images/videos with the temporal transformer model and corrects the processed images/videos with an L2 regularization model to generate a training set;
a temporal transformer model training module, which inputs the training set and a verification set into the temporal transformer model for training, based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model;
and an early-warning prompt module, which inputs acquired images/videos into the trained distraction recognition model and issues corresponding early-warning prompt information according to its recognition result.
It should be noted that although only some basic functional modules are disclosed in this embodiment, the system is not limited to them. Rather, one skilled in the art can add one or more functional modules to the basic modules to form countless further embodiments or technical solutions; the system is open rather than closed, and the scope of the claims is not limited to the disclosed basic functional modules merely because only individual basic modules are disclosed here. Meanwhile, for convenience of description, the above devices are described as divided into various units and modules by function. Of course, when implementing the invention, the functions of the units and modules may be realized in one or more pieces of software and/or hardware.
The above system embodiments are merely illustrative: the functional modules, units and subsystems in the system may or may not be physically separate, and may be located in one place or distributed over multiple different systems, subsystems or modules. Those skilled in the art may select some or all of the functional modules, units or subsystems according to actual needs to achieve the purposes of this embodiment, and can understand and implement the invention without inventive effort.
As shown in fig. 3, corresponding to the above deep-learning-based driver distraction identification method and system, the invention further provides an intelligent cabin, an electronic device and a storage medium:
an intelligent cabin in which a driver distraction recognition system based on deep learning is provided.
The intelligent cabin upgrades and retrofits the interior of the automobile with intelligent technology and intelligent equipment, improving user experience and convenience and bringing users a safer, more convenient and more pleasant journey. The intelligent cabin may offer virtual reality (VR) and/or augmented reality (AR) functions for immersive entertainment and provides rich interfaces and sensors. In this embodiment, for example, in-vehicle images/videos captured while the vehicle is running can be acquired through the cabin's interfaces and sensors; after computation by the on-board computer or a cloud server, the images/videos are input into the trained distraction recognition model, and different early-warning prompts are issued according to its recognition result.
An electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the deep-learning-based driver distraction identification method.
A computer readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device, causes the electronic device to perform the steps of a deep learning based driver distraction identification method.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic device includes one or more processors 710 and a storage device 720 (one processor 710 is taken as an example in fig. 3). The storage device 720 stores one or more programs; when executed by the one or more processors 710, these programs cause the processors 710 to implement the deep-learning-based driver distraction identification method of any embodiment of the present invention.
The electronic device may further include: an input device 730 and an output device 740.
The processor 710, storage device 720, input device 730 and output device 740 of the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 3.
The storage device 720 in the electronic device, as a computer-readable storage medium, may store one or more programs, such as software programs, computer-executable programs and modules, for example the program instructions/modules corresponding to the deep-learning-based driver distraction identification method provided in the embodiment of the present invention. By running the software programs, instructions and modules stored in the storage device 720, the processor 710 executes the various functional applications and data processing of the electronic device, i.e., implements the deep-learning-based driver distraction recognition method of the above method embodiment.
The storage 720 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the electronic device, etc. In addition, the storage 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, storage 720 may further include memory located remotely from processor 710, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 740 may include a display device such as a display screen.
When the one or more programs included in the above electronic device are executed by the one or more processors 710, they perform the operational steps of the deep-learning-based driver distraction identification method.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this description.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features found in other embodiments and lack others, combinations of features of different embodiments are meant to fall within the scope of the invention and to form further embodiments. For example, any embodiment claimed in the claims may be used in any combination.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A deep-learning-based driver distraction identification method, comprising:
acquiring in-vehicle images/videos while the vehicle is running, and classifying the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos;
processing the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos;
performing time-series-based data processing on the images/videos with a temporal transformer model, and correcting the processed images/videos with an L2 regularization model to generate a training set;
inputting the training set and a verification set into the temporal transformer model for training, the training being based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model;
and inputting acquired images/videos into the trained distraction recognition model, and issuing corresponding early-warning prompt information according to the recognition result of the distraction recognition model.
2. The deep-learning-based driver distraction identification method according to claim 1, wherein classifying the images in combination with the characteristic information of the in-vehicle images/videos according to the preset vehicle driving safety rules further comprises defining the image/video characteristic information to include:
a first driving state, characterizing a scene in which the driver glances left and right;
a second driving state, characterizing normal driving by the driver;
and a third driving state, characterizing other unsafe driver behaviors belonging to neither the first nor the second driving state.
3. The deep learning based driver distraction identification method of claim 2, further comprising:
establishing a driver driving-state image/video library: acquiring facial images/videos of the driver in left-and-right glancing scenes to form a corresponding image/video data set;
decomposing the images/videos into consecutive frame images based on a time sequence, and converting the consecutive frame images into digital matrices for labeling;
sending the labeled digital matrices to a neural network for learning, with iterative optimization and weight configuration, until the neural network outputs a first result variable capable of identifying the driver's left-and-right glancing scene;
and marking the first result variable as the first driving state, storing it for video enhancement processing, and using it as an input to the temporal transformer model.
4. The deep-learning-based driver distraction identification method according to claim 1, wherein processing the images/videos with the video enhancement program according to the different driver driving states corresponding to the classified images/videos further comprises:
performing data fusion on the classified first video and second video using the VideoMix data enhancement technique, the first video being the driver's driving state in a first time period and the second video the driver's driving state in a second time period, the two periods being distributed along a time sequence;
receiving key frames and timestamp information of the first and second time periods, extracting features from each key frame to obtain the corresponding feature values, and synchronizing the first and second time periods according to the timestamp information;
and performing space-time fusion on the synchronized first and second time periods, comparing the feature values so as to fuse the video information of the first and second videos in space and time.
5. The deep-learning-based driver distraction identification method according to claim 1, wherein performing the time-series-based data processing on the images/videos with the temporal transformer model, correcting the processed images/videos with the L2 regularization model and generating the training set further comprises:
acquiring each frame of the image/video and converting each frame into a feature vector;
establishing mapping relations between the time dimension and the feature vectors, and obtaining the dynamic-change information corresponding to these mappings through the weights between different mapping relations;
and inputting the dynamic-change information into a machine learning model and training it with an L2-regularized loss function.
6. The deep-learning-based driver distraction identification method according to claim 5, wherein training the dynamic-change information with the L2-regularized loss function specifically comprises:
acquiring the parameters of the temporal transformer model and inputting them into a loss function, the loss function measuring the difference between the model's predictions and the true values;
establishing a penalty term of the loss function, the penalty term being obtained by computing the sum of squares of the temporal transformer's parameters and multiplying it by an adjustment coefficient, yielding the penalty-term value;
and detecting, against a preset threshold, whether the penalty term of the loss function indicates overfitting: if not, iterating until the penalty term exceeds the threshold, then stopping the iteration and recording the last result computed before the threshold was exceeded.
7. A deep-learning-based driver distraction identification system, comprising:
an in-vehicle image/video acquisition and classification module, which acquires in-vehicle images/videos while the vehicle is running and classifies the images according to preset vehicle driving safety rules in combination with the characteristic information of the in-vehicle images/videos;
an image/video enhancement processing module, which processes the images/videos with a video enhancement program according to the different driver driving states corresponding to the classified images/videos;
a temporal transformer model correction module, which performs time-series-based data processing on the images/videos with the temporal transformer model and corrects the processed images/videos with an L2 regularization model to generate a training set;
a temporal transformer model training module, which inputs the training set and a verification set into the temporal transformer model for training, based on model training algorithms comprising at least linear regression, logistic regression and support vector machines, to generate a corresponding distraction recognition model;
and an early-warning prompt module, which inputs acquired images/videos into the trained distraction recognition model and issues corresponding early-warning prompt information according to its recognition result.
8. An intelligent cabin, characterized in that the intelligent cabin is provided with the deep-learning-based driver distraction recognition system according to claim 7.
9. An electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 6.
10. A computer readable storage medium, characterized in that it stores a computer program executable by an electronic device, which, when run on the electronic device, causes the electronic device to perform the steps of the method of any one of claims 1 to 6.
CN202311782465.2A 2023-12-22 2023-12-22 Driver distraction identification method, intelligent cabin and electronic equipment Pending CN117975420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311782465.2A CN117975420A (en) 2023-12-22 2023-12-22 Driver distraction identification method, intelligent cabin and electronic equipment


Publications (1)

Publication Number Publication Date
CN117975420A 2024-05-03

Family

ID=90853824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311782465.2A Pending CN117975420A (en) 2023-12-22 2023-12-22 Driver distraction identification method, intelligent cabin and electronic equipment

Country Status (1)

Country Link
CN (1) CN117975420A (en)


Legal Events

Date Code Title Description
PB01 Publication