CN111222477B - Vision-based method and device for detecting departure of hands from steering wheel - Google Patents


Info

Publication number
CN111222477B
CN111222477B (application CN202010026699.4A)
Authority
CN
China
Prior art keywords
steering wheel
picture
network
driver
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010026699.4A
Other languages
Chinese (zh)
Other versions
CN111222477A (en)
Inventor
戚治舟
王汉超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Ruiwei Information Technology Co ltd
Original Assignee
Xiamen Ruiwei Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Ruiwei Information Technology Co ltd filed Critical Xiamen Ruiwei Information Technology Co ltd
Priority to CN202010026699.4A priority Critical patent/CN111222477B/en
Publication of CN111222477A publication Critical patent/CN111222477A/en
Application granted granted Critical
Publication of CN111222477B publication Critical patent/CN111222477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a vision-based method for detecting whether the driver's hands have left the steering wheel, comprising the following steps: collecting sample data, annotating it, and performing network training and optimisation with the annotated data to obtain a model; converting the model into a model under ncnn; acquiring an infrared picture of the driver, preprocessing it, feeding it into the model, and parsing the model result to obtain the steering-wheel position; enlarging the steering-wheel region, selecting and cropping the set roi region, then preprocessing the cropped picture and feeding it into the model to judge whether the driver's hands have left the steering wheel; if neither of the driver's hands is on the steering wheel, an alarm is given; otherwise, no alarm is given. The invention also provides a corresponding device. The scheme effectively improves the detection rate of the model, reduces the network input size, and increases model speed.

Description

Vision-based method and device for detecting departure of hands from steering wheel
Technical Field
The invention relates to the field of computer technology, and in particular to a vision-based method and device for detecting whether the hands have left the steering wheel.
Background
When driving a vehicle, many factors interfere with safe driving, and drivers do not always observe traffic regulations and safe-operation rules; behaviours such as answering phone calls or smoking while driving endanger the safety of passengers. By judging when the driver's hands leave the steering wheel, the driver can be warned and such irregular behaviour corrected. At present there are three main approaches to detecting that a driver's hands have left the steering wheel:
(1) Based on the steering-wheel torque signal: the driver torque state is estimated from several electric power-steering signals, used to determine whether the driver is gripping the steering wheel, and compared against a high grip-torque threshold. This approach is fast, but its drawbacks are obvious: robustness is poor, the range of application is narrow, and only a preset empirical threshold can be used.
(2) Based on steering-wheel sensors measuring hand pressure or temperature: this relies on hardware sensors installed around the steering wheel, sensing through temperature or pressure whether both hands grip the wheel. The approach is fast, but its cost is higher, it is easily disturbed by external factors, and it is prone to false alarms.
(3) Based on machine vision: with the development of deep learning, computer vision has advanced rapidly through convolutional neural networks. The greatest strength of deep learning is that the features required by the target task are learned by the convolutional network, and in many fields its recognition rate can exceed that of the human eye. However, networks with excellent accuracy require considerable computing power, and the limited computing power of hardware devices makes many artificial-intelligence projects difficult to deploy.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a vision-based method and device for detecting whether the hands have left the steering wheel, which effectively improve the detection rate of the model, reduce the network input size, and increase model speed.
In a first aspect, the present invention provides a method comprising:
step 1, collecting sample data, marking the sample data, and performing network training and optimization by using the marked sample data to obtain a model;
step 2, converting the model into a model under ncnn;
step 3, obtaining an infrared picture of the driver, preprocessing it, feeding it into the model, and parsing the model result to obtain the steering-wheel position; enlarging the steering-wheel region, selecting and cropping the set roi region, then preprocessing the cropped picture and feeding it into the model to judge whether the driver's hands have left the steering wheel; if neither of the driver's hands is on the steering wheel, an alarm is given; otherwise, no alarm is given.
Further, step 1 is more specifically: collecting infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the internet;
annotating the steering wheel in the collected infrared pictures to obtain its position coordinates; enlarging the steering-wheel region, selecting the set roi region, and cropping the picture; then annotating the cropped pictures, marking separately the coordinates of hands holding the steering wheel, the coordinates of hands not holding the steering wheel, and the category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under caffe, selecting a MobileNet v2-yolov3 network by a steering wheel detection network and a two-hand leaving steering wheel identification network, setting learning rate and the number of training data each time by adopting an SGD (generalized name space) optimization learning method, performing data enhancement operation on pictures with different input sizes, performing training for set times by the steering wheel detection network and the two-hand leaving steering wheel identification network, stabilizing and converging network loss values, and performing pruning optimization on the steering wheel detection network to finally obtain a trained model.
Further, step 3 is more specifically:
step 31, after the vehicle is started, obtaining an infrared picture of the driver and preprocessing it; if no steering-wheel position exists yet, feeding the picture into the steering-wheel detection network, parsing the model result to obtain the steering-wheel position, and entering step 32; if a position already exists, going directly to step 32;
step 32, enlarging the steering-wheel region, selecting and cropping the set roi region, preprocessing the cropped picture, and feeding it into the hands-off-wheel recognition network to judge whether the driver's hands have left the steering wheel; if the recognition network continuously identifies, for the set time, that neither of the driver's hands is on the steering wheel, the steering-wheel detection network is called again: if the steering wheel is detected, an alarm is given; if it is not detected (the wheel is occluded), no alarm is needed; if within the set time the recognition network again continuously identifies that one or both of the driver's hands are on the steering wheel, no alarm is given.
In a second aspect, the present invention provides an apparatus comprising:
the training optimisation module, used to collect sample data, annotate it, and perform network training and optimisation with the annotated data to obtain a model;
the conversion module, which converts the model into a model under ncnn;
the detection module, used to obtain an infrared picture of the driver, preprocess it, feed it into the model, and parse the model result to obtain the steering-wheel position; to enlarge the steering-wheel region, select and crop the set roi region, then preprocess the cropped picture and feed it into the model to judge whether the driver's hands have left the steering wheel; if neither of the driver's hands is on the steering wheel, an alarm is given; otherwise, no alarm is given.
Further, the training optimisation module is more specifically for: collecting infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the internet;
annotating the steering wheel in the collected infrared pictures to obtain its position coordinates; enlarging the steering-wheel region, selecting the set roi region, and cropping the picture; then annotating the cropped pictures, marking separately the coordinates of hands holding the steering wheel, the coordinates of hands not holding the steering wheel, and the category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under caffe, selecting a MobileNet v2-yolov3 network by a steering wheel detection network and a two-hand leaving steering wheel identification network, setting learning rate and the number of training data each time by adopting an SGD (generalized name space) optimization learning method, performing data enhancement operation on pictures with different input sizes, performing training for set times by the steering wheel detection network and the two-hand leaving steering wheel identification network, stabilizing and converging network loss values, and performing pruning optimization on the steering wheel detection network to finally obtain a trained model.
Further, the detection module is more specifically:
the position unit, which, after the vehicle is started, obtains an infrared picture of the driver and preprocesses it; if no steering-wheel position exists yet, it feeds the picture into the steering-wheel detection network, parses the model result to obtain the steering-wheel position, and enters the alarm unit; if a position already exists, it enters the alarm unit directly;
the alarm unit, which enlarges the steering-wheel region, selects and crops the set roi region, preprocesses the cropped picture, and feeds it into the hands-off-wheel recognition network to judge whether the driver's hands have left the steering wheel; if the recognition network continuously identifies, for the set time, that neither of the driver's hands is on the steering wheel, the steering-wheel detection network is called again: if the steering wheel is detected, an alarm is given; if it is not detected (the wheel is occluded), no alarm is needed; if within the set time the recognition network again continuously identifies that one or both of the driver's hands are on the steering wheel, no alarm is given.
One or more of the technical solutions provided in the embodiments of the invention have at least the following technical effects or advantages:
(1) The in-vehicle monitoring camera is used directly: whatever the camera angle, the detection roi region can be cropped out by the steering-wheel detection algorithm, avoiding awkward constraints on camera installation. The range of application is wide and the cost is low.
(2) The combined judgment of the two network models, the steering-wheel detection model and the hands-off-wheel recognition model, better solves the problems of false alarms and external interference. When a person or object occludes the camera, or the camera is poorly placed, no false alarm is raised. Using the steering-wheel detection model to crop out the roi region of interest before detecting whether the hands have left the wheel effectively improves the detection rate of the model, reduces the network input size, and increases model speed.
(3) The scheme combines the lightweight MobileNet-V2 backbone with the post-processing of the yolov3 network, which has very good detection performance. Although two detection networks must cooperate, speed is hardly affected: the steering wheel needs to be detected only a few times, so that network's time cost is negligible, while the hands-off-wheel recognition network has a small input and runs fast, reaching 40-60 ms on ARM. On this fast premise, the method detects better and is more robust than other algorithms. In this scheme 328904 pictures were collected in total, with 25592 pictures in the test set; steering-wheel detection accuracy reaches more than 99%, and the accuracy of hands-off-wheel recognition reaches more than 95%.
The foregoing is only an overview of the technical solution of the invention. In order that the technical means of the invention may be understood more clearly and implemented according to the description, and to make the above and other objects, features and advantages of the invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
The invention will be further described with reference to examples of embodiments with reference to the accompanying drawings.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic illustration of a sample annotation according to the present invention;
FIG. 3 is a flowchart and steps of an on-line model deployment of the present invention.
Detailed Description
The overall idea of the technical solution in the embodiments of the present application is as follows:
(1) Steering-wheel detection is used to locate the steering wheel, and the roi region is found from the wheel position. The innovation is that the steering wheel is located accurately, little hardware computing power is required, and detection is very fast.
(2) The obtained steering-wheel region is enlarged, the roi region of interest is selected, and the roi region is fed into the convolutional network, which identifies whether the driver's hands have left the steering wheel; the network simultaneously outputs the positions of the hands and judges the state of each hand (gripping the wheel or not).
(3) To overcome false alarms caused by a person or object occluding the steering wheel while recognising whether the driver is holding it, multiple frames over a continuous period are used to judge whether the driver holds the wheel, the steering-wheel detector is used to judge whether the wheel is occluded, and the recognition results of the multiple frames are combined, thoroughly eliminating false alarms.
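The multi-frame judgment in point (3) can be sketched as a sliding-window vote over per-frame results. This is an illustrative sketch only: the patent does not specify a window size or threshold, so the values below are assumptions.

```python
from collections import deque

def make_hands_off_vote(window=15, threshold=0.8):
    """Sliding-window majority vote over per-frame hands-off results.
    `window` and `threshold` are illustrative values, not taken from
    the patent, which only says multiple frames over a continuous
    period are combined."""
    history = deque(maxlen=window)

    def update(hands_off_this_frame):
        history.append(bool(hands_off_this_frame))
        if len(history) < window:
            return False          # not enough evidence yet
        # fire only when most recent frames agree that hands are off
        return sum(history) / window >= threshold

    return update
```

A single noisy frame (a hand briefly hidden by the wheel rim) then cannot trigger an alarm on its own, which is the point of the multi-frame design.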
The scheme has two parts. The first part is model training: data collection, data format conversion, network selection, model training, and network optimisation. The second part is online service deployment: model conversion, plus writing the preprocessing, network-parsing, and other code under the mobile-side framework.
1. Detailed steps and flow of deep learning model training (as shown in FIG. 1)
(1) Data collection: infrared pictures of the driver are collected through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the internet, covering various scenes (day, night, strong light, dim light, backlight, and other conditions).
(2) Annotation of sample data (as in FIG. 2): the steering wheel is annotated in the collected infrared picture data to obtain its position coordinates; since the steering wheel of a given vehicle does not move, it only needs to be annotated once. After the steering-wheel region is enlarged and the roi region of interest is selected, the cropped picture is annotated, marking separately the hand coordinates and category information for hands holding the steering wheel and hands not holding it.
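The "enlarge the steering-wheel region, then crop the roi" step used in both annotation and inference can be sketched as follows. The expansion factor is an assumption; the patent only says the region is enlarged by a set amount.

```python
def expand_roi(box, img_w, img_h, scale=1.4):
    """Expand a steering-wheel box (x, y, w, h) about its centre and
    clamp it to the image bounds. The 1.4x factor is an assumed value;
    returns crop corners (x0, y0, x1, y1)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2        # box centre
    nw, nh = w * scale, h * scale        # enlarged size
    x0 = max(0, int(cx - nw / 2))
    y0 = max(0, int(cy - nh / 2))
    x1 = min(img_w, int(cx + nw / 2))
    y1 = min(img_h, int(cy + nh / 2))
    return x0, y0, x1, y1                # crop as img[y0:y1, x0:x1]
```

Cropping to this enlarged region keeps the hands just outside the wheel rim inside the network's field of view while still discarding most of the cabin, which is what lets the recognition network use a small input size.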
(3) Training and optimisation of the network: from the data collection and annotation above we finally obtain the training data. We use the caffe framework for network training, so the pictures and annotation data must be converted into lmdb training data under caffe. Next comes the network design. Because the algorithm must run on embedded devices (ARM), the lightweight MobileNet-V2 is chosen as the backbone. The MobileNet series was designed by Google specifically for mobile devices and greatly reduces the computation of the network, so choosing it as the backbone improves performance. For the post-processing we chose between ssd-style and yolov3-style post-processing; experiments showed that yolov3 post-processing detects small targets better, so the MobileNet-V2-yolov3 network was finally selected for both steering-wheel detection and hands-off-wheel recognition. Once the network and the lmdb training data are ready, training begins. The SGD optimisation method is used, the learning rate is set to 0.001, and the batch_size of the network training is 128 each time; data-enhancement operations such as random scaling are applied to pictures of different input sizes to improve the robustness of the network. After 150,000 iterations the network loss value is stable and converged; the steering-wheel detection network is then pruned, finally yielding the trained model. After training, testing on a data set of 30,000 pictures shows that the accuracy of the steering-wheel detection network exceeds 0.99 and that of the hands-off-wheel network exceeds 0.95.
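For orientation, the stated training settings (SGD, learning rate 0.001, 150,000 iterations) would map onto a Caffe solver file roughly as below. This is a hypothetical sketch, not the patent's actual file: the net file name, step schedule, momentum, and weight decay are assumptions, and the batch_size of 128 lives in the data layer of the net prototxt rather than in the solver.

```prototxt
# Hypothetical solver.prototxt consistent with the settings in the text.
net: "mobilenetv2_yolov3_train.prototxt"   # assumed file name
type: "SGD"
base_lr: 0.001
lr_policy: "multistep"                     # assumed schedule
stepvalue: 80000
stepvalue: 120000
gamma: 0.1
momentum: 0.9                              # assumed
weight_decay: 0.0005                       # assumed
max_iter: 150000
snapshot: 10000
snapshot_prefix: "models/wheel_det"
```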
Because the detection area of the hands-off-wheel network is the roi region around the steering wheel, its network input is small and its time cost low, reaching 40-60 ms on ARM, which meets the requirements well. The steering-wheel detection network, however, must detect over the full image; its input is large and its time cost reaches 400 ms. It is therefore pruned: pruning mainly reduces the outputs of some network layers and removes redundant features extracted by the network. From the target detection size it was inferred that one up-sampling layer contributes little in this network, so it was deleted, cutting the network's time cost from 400 ms to 100 ms while accuracy still exceeds 0.99.
2. Steps and flow of online model deployment (as in fig. 3):
(1) Model conversion: since the frame used by us is caffe during training, the frame is really deployed on the mobile terminal and needs to be transplanted to the mobile terminal frame. The domestic mobile terminal framework is better provided with a forward computation framework ncnn of the neural network with vacation and a deep neural network reasoning engine mnn of the Ali. Since ncnn uses a relatively large number, the model of caffe is converted into a model under ncnn.
(2) Writing the preprocessing, network-parsing, and other code: an infrared picture of the driver is obtained through the camera, preprocessed, and fed into the steering-wheel detection network; the model result is parsed to obtain the steering-wheel position (the model result is a relative value, and the real coordinates of the steering wheel on the picture are restored according to the picture size). Because the steering-wheel coordinates of a given vehicle do not change, the detection network only needs to be called once. The steering-wheel region is then enlarged, the roi region of interest selected and cropped, and the cropped picture preprocessed and fed into the hands-off-wheel network. If the network continuously identifies that neither of the driver's hands is on the steering wheel, the steering-wheel detection network is called again; if the steering wheel is detected, an alarm is issued, prompting the driver to put both hands on the wheel and drive safely.
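The coordinate restoration mentioned above (relative model output to real picture coordinates) can be sketched as below. The (cx, cy, w, h) output layout is an assumption based on the yolov3-style post-processing named earlier; ncnn itself simply returns whatever the model's final layer emits.

```python
def to_pixels(det, img_w, img_h):
    """Convert a detection with normalised centre/size (cx, cy, w, h),
    each in [0, 1], into an absolute pixel box (x, y, w, h).
    The normalised layout is an assumed convention, not ncnn API."""
    cx, cy, w, h = det
    bw, bh = w * img_w, h * img_h            # scale size to pixels
    x = int(cx * img_w - bw / 2)             # centre -> top-left corner
    y = int(cy * img_h - bh / 2)
    return x, y, int(bw), int(bh)
```

The restored box is what gets cached for the vehicle, so the full-image detection network never needs to run again in normal operation.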
Example 1
The embodiment provides a method comprising:
step 1, collecting infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the internet;
annotating the steering wheel in the collected infrared pictures to obtain its position coordinates; enlarging the steering-wheel region, selecting the set roi region, and cropping the picture; then annotating the cropped pictures, marking separately the coordinates of hands holding the steering wheel, the coordinates of hands not holding the steering wheel, and the category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under caffe, selecting a MobileNet v2-yolov3 network by a steering wheel detection network and a two-hand leaving steering wheel identification network, setting learning rate and the number of training data each time by adopting an SGD (generalized name space) optimization learning method, performing data enhancement operation on pictures with different input sizes, performing training for set times by the steering wheel detection network and the two-hand leaving steering wheel identification network, stabilizing and converging network loss values, and performing pruning optimization on the steering wheel detection network to finally obtain a trained model;
step 2, converting the model into a model under ncnn;
step 3, obtaining an infrared picture of the driver, preprocessing it, feeding it into the model, and parsing the model result to obtain the steering-wheel position; enlarging the steering-wheel region, selecting and cropping the set roi region, then preprocessing the cropped picture and feeding it into the model to judge whether the driver's hands have left the steering wheel; if neither of the driver's hands is on the steering wheel, an alarm is given; otherwise, no alarm is given.
Step 3 is more specifically:
step 31, after the vehicle is started, obtaining an infrared picture of the driver and preprocessing it; if no steering-wheel position exists yet, feeding the picture into the steering-wheel detection network, parsing the model result to obtain the steering-wheel position, and entering step 32; if a position already exists, going directly to step 32;
step 32, enlarging the steering-wheel region, selecting and cropping the set roi region, preprocessing the cropped picture, and feeding it into the hands-off-wheel recognition network to judge whether the driver's hands have left the steering wheel; if the recognition network continuously identifies, for the set time, that neither of the driver's hands is on the steering wheel, the steering-wheel detection network is called again: if the steering wheel is detected, an alarm is given; if it is not detected (the wheel is occluded), no alarm is needed; if within the set time the recognition network again continuously identifies that one or both of the driver's hands are on the steering wheel, no alarm is given.
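The two-stage flow of steps 31 and 32 can be sketched as a simple loop. This is an illustrative sketch only: `grab_ir_frame`, `detect_wheel`, `hands_on_wheel`, and `roi_from_wheel` are hypothetical placeholders standing in for the camera driver, the two ncnn networks, and the roi crop, and the frame count standing in for "the set time" is an assumed value.

```python
HANDS_OFF_LIMIT = 30  # frames of continuous "hands off" before re-checking (assumed)

def monitor(grab_ir_frame, detect_wheel, hands_on_wheel, roi_from_wheel):
    """Yield "alarm" events from an infrared frame stream; stops on None."""
    wheel_box = None          # cached: the wheel of a given car does not move
    off_count = 0
    while True:
        frame = grab_ir_frame()
        if frame is None:
            break
        if wheel_box is None:                   # step 31: locate the wheel once
            wheel_box = detect_wheel(frame)
            if wheel_box is None:
                continue
        roi = roi_from_wheel(frame, wheel_box)  # step 32: crop the set roi
        if hands_on_wheel(roi):
            off_count = 0                       # a hand returned: no alarm
        else:
            off_count += 1
            if off_count >= HANDS_OFF_LIMIT:
                # re-run wheel detection: alarm only if the wheel is visible,
                # i.e. "hands off" was not just an occluded camera
                if detect_wheel(frame) is not None:
                    yield "alarm"
                off_count = 0
```

The re-detection before alarming is what distinguishes genuine hands-off driving from a blocked or badly placed camera.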
Based on the same inventive concept, the present application also provides a device corresponding to the method of the first embodiment; see the second embodiment for details.
Example two
In this embodiment, there is provided an apparatus including:
the training optimisation module, used to collect infrared pictures of the driver through the in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera at the top of the door, and steering-wheel pictures crawled from the internet;
to annotate the steering wheel in the collected infrared pictures and obtain its position coordinates; to enlarge the steering-wheel region, select the set roi region, and crop the picture; then to annotate the cropped pictures, marking separately the coordinates of hands holding the steering wheel, the coordinates of hands not holding the steering wheel, and the category information;
the method comprises the steps of performing network training by using a caffe frame, converting marked infrared pictures into lmdb training data under caffe, selecting a MobileNet v2-yolov3 network by a steering wheel detection network and a two-hand leaving steering wheel identification network, setting learning rate and the number of training data each time by adopting an SGD (generalized name space) optimization learning method, performing data enhancement operation on pictures with different input sizes, performing training for set times by the steering wheel detection network and the two-hand leaving steering wheel identification network, stabilizing and converging network loss values, and performing pruning optimization on the steering wheel detection network to finally obtain a trained model;
the conversion module, which converts the model into a model under ncnn;
the detection module, used to obtain an infrared picture of the driver, preprocess it, feed it into the model, and parse the model result to obtain the steering-wheel position; to enlarge the steering-wheel region, select and crop the set roi region, then preprocess the cropped picture and feed it into the model to judge whether the driver's hands have left the steering wheel; if neither of the driver's hands is on the steering wheel, an alarm is given; otherwise, no alarm is given.
The detection module is more specifically:
the position unit, which, after the vehicle is started, obtains an infrared picture of the driver and preprocesses it; if no steering-wheel position exists yet, it feeds the picture into the steering-wheel detection network, parses the model result to obtain the steering-wheel position, and enters the alarm unit; if a position already exists, it enters the alarm unit directly;
the alarm unit, which enlarges the steering-wheel region, selects and crops the set roi region, preprocesses the cropped picture, and feeds it into the hands-off-wheel recognition network to judge whether the driver's hands have left the steering wheel; if the recognition network continuously identifies, for the set time, that neither of the driver's hands is on the steering wheel, the steering-wheel detection network is called again: if the steering wheel is detected, an alarm is given; if it is not detected (the wheel is occluded), no alarm is needed; if within the set time the recognition network again continuously identifies that one or both of the driver's hands are on the steering wheel, no alarm is given.
Since the device described in the second embodiment of the present invention implements the method described in the first embodiment, a person skilled in the art can understand its specific structure and variations from the description of that method, so a detailed description is omitted here. All devices used in the method of the first embodiment fall within the intended scope of protection of the present invention.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the embodiments described are illustrative only and are not intended to limit the scope of the invention; equivalent modifications and variations made in light of the spirit of the invention are covered by the claims of the present invention.

Claims (2)

1. A vision-based method for detecting that both hands have left the steering wheel, characterized by comprising the following steps:
step 1, collecting sample data, labeling the sample data, and performing network training and optimization with the labeled sample data to obtain a model;
step 2, converting the model into an ncnn model;
step 3, acquiring an infrared picture of the driver, preprocessing the picture and inputting it into the model, parsing the model output to obtain the steering wheel position, enlarging the steering wheel region to select a set ROI region, cropping out the ROI region, preprocessing the cropped picture and inputting it into the model to judge whether both of the driver's hands have left the steering wheel; if the driver has no hand on the steering wheel, raising an alarm; otherwise, raising no alarm;
wherein step 1 is further specifically: collecting infrared pictures of drivers through an in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera above the vehicle door, and steering wheel pictures crawled from the Internet;
labeling the steering wheel in the collected infrared pictures to obtain the position coordinates of the steering wheel; enlarging the steering wheel region to select a set ROI region and cropping the picture; then labeling the cropped pictures, marking respectively the hand coordinates of the driver when holding the steering wheel, the hand coordinates when not holding the steering wheel, and the category information;
performing network training with the Caffe framework: converting the labeled infrared pictures into LMDB training data under Caffe; adopting a MobileNetV2-YOLOv3 architecture for both the steering wheel detection network and the hands-off-wheel recognition network; using the SGD (stochastic gradient descent) optimization method, setting the learning rate and the batch size; applying data enhancement to pictures of different input sizes; training both networks for a set number of iterations until the network loss value stabilizes and converges; and pruning and optimizing the steering wheel detection network to finally obtain the trained model;
wherein step 3 is further specifically:
step 31, after the vehicle is started, acquiring an infrared picture of the driver and preprocessing it; if the steering wheel position is not yet available, inputting the picture into the steering wheel detection network, parsing the model output to obtain the steering wheel position, and entering step 32; if the steering wheel position is already available, entering step 32 directly;
step 32, enlarging the steering wheel region to select a set ROI region, cropping out the ROI region, preprocessing the cropped picture and inputting it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the hands-off-wheel recognition network continuously recognizes within a set time that the driver has no hand on the steering wheel, invoking the steering wheel detection network again: if the steering wheel is detected, raising an alarm; if the steering wheel is not detected, raising no alarm; if within the set time the hands-off-wheel recognition network again continuously recognizes that one or both of the driver's hands are on the steering wheel, raising no alarm.
2. A vision-based device for detecting that both hands have left the steering wheel, characterized by comprising:
a training optimization module for collecting sample data, labeling the sample data, and performing network training and optimization with the labeled sample data to obtain a model;
a conversion module for converting the model into an ncnn model;
a detection module for acquiring an infrared picture of the driver, preprocessing the picture and inputting it into the model, parsing the model output to obtain the steering wheel position, enlarging the steering wheel region to select a set ROI region, cropping out the ROI region, preprocessing the cropped picture and inputting it into the model to judge whether both of the driver's hands have left the steering wheel; if the driver has no hand on the steering wheel, raising an alarm; otherwise, raising no alarm;
wherein the training optimization module is further specifically configured to: collect infrared pictures of drivers through an in-vehicle monitoring camera, including pictures from a camera directly above the driver, pictures from a camera above the vehicle door, and steering wheel pictures crawled from the Internet;
label the steering wheel in the collected infrared pictures to obtain the position coordinates of the steering wheel; enlarge the steering wheel region to select a set ROI region and crop the picture; then label the cropped pictures, marking respectively the hand coordinates of the driver when holding the steering wheel, the hand coordinates when not holding the steering wheel, and the category information;
perform network training with the Caffe framework: convert the labeled infrared pictures into LMDB training data under Caffe; adopt a MobileNetV2-YOLOv3 architecture for both the steering wheel detection network and the hands-off-wheel recognition network; use the SGD (stochastic gradient descent) optimization method, setting the learning rate and the batch size; apply data enhancement to pictures of different input sizes; train both networks for a set number of iterations until the network loss value stabilizes and converges; and prune and optimize the steering wheel detection network to finally obtain the trained model;
wherein the detection module further specifically comprises:
a position unit which, after the vehicle is started, acquires an infrared picture of the driver and preprocesses it; if the steering wheel position is not yet available, the picture is input into the steering wheel detection network and the model output is parsed to obtain the steering wheel position, after which control passes to the alarm unit; if the steering wheel position is already available, control passes to the alarm unit directly;
an alarm unit which enlarges the steering wheel region to select a set ROI region, crops out the ROI region, preprocesses the cropped picture and inputs it into the hands-off-wheel recognition network to judge whether both of the driver's hands have left the steering wheel; if the hands-off-wheel recognition network continuously recognizes within a set time that the driver has no hand on the steering wheel, the steering wheel detection network is invoked again: if the steering wheel is detected, an alarm is raised; if the steering wheel is not detected, no alarm is needed; if within the set time the hands-off-wheel recognition network again continuously recognizes that one or both of the driver's hands are on the steering wheel, no alarm is raised.
CN202010026699.4A 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel Active CN111222477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010026699.4A CN111222477B (en) 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel


Publications (2)

Publication Number Publication Date
CN111222477A CN111222477A (en) 2020-06-02
CN111222477B true CN111222477B (en) 2023-05-30

Family

ID=70828361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010026699.4A Active CN111222477B (en) 2020-01-10 2020-01-10 Vision-based method and device for detecting departure of hands from steering wheel

Country Status (1)

Country Link
CN (1) CN111222477B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347891B * 2020-10-30 2022-02-22 Nanjing Youjia Technology Co., Ltd. Method for detecting drinking water state in cabin based on vision
CN112580627A * 2020-12-16 2021-03-30 Institute of Software, Chinese Academy of Sciences YOLOv3 target detection method based on domestic intelligent chip K210 and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013013487A1 * 2011-07-26 2013-01-31 South China University of Technology Device and method for monitoring driving behaviors of driver based on video detection
CN107679539A * 2017-09-18 2018-02-09 Zhejiang University A single convolutional neural network method integrating local receptive-field information and global information
CN110084803A * 2019-04-29 2019-08-02 Nanjing Xingcheng Intelligent Technology Co., Ltd. Fundus image quality evaluation method based on the human visual system
CN110135398A * 2019-05-28 2019-08-16 Xiamen Ruiwei Information Technology Co., Ltd. Computer-vision-based method for detecting both hands off the steering wheel
CN110222596A * 2019-05-20 2019-09-10 Zhejiang Leapmotor Technology Co., Ltd. A vision-based anti-cheating method for driving behavior analysis
CN110633701A * 2019-10-23 2019-12-31 Dream Innovation Technology (Shenzhen) Co., Ltd. Driver call detection method and system based on computer vision technology




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant