CN116071817A - Network architecture and training method of gesture recognition system for automobile cabin - Google Patents


Info

Publication number
CN116071817A
CN116071817A (application CN202211306446.8A)
Authority
CN
China
Prior art keywords
gesture
gesture recognition
image
sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211306446.8A
Other languages
Chinese (zh)
Inventor
刘新华
贺之彬
郝敬宾
华德正
祁鹏
刘晓帆
周皓
王晴晴
格热戈尔茨·罗尔奇克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202211306446.8A priority Critical patent/CN116071817A/en
Publication of CN116071817A publication Critical patent/CN116071817A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a network architecture and a training method for a gesture recognition system in an automobile cabin. The method comprises preparing training data, collecting real-time video output by an RGB camera and a thermal imaging camera, and extracting key feature information from the video. An operation execution system performs the relevant operations according to gesture information stored in an SQL Server database, and the data processing system issues the corresponding feedback according to the CoAttention recognition result. A gesture recognition device recognizes the driver's hand gestures and generates gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operation. This reduces the driver's manual interaction with the vehicle-mounted equipment: while the automobile is running, the driver can control the equipment without shifting the line of sight to press specific keys on the smart screen, avoiding potential safety hazards and improving safety during driving.

Description

Network architecture and training method of gesture recognition system for automobile cabin
Technical Field
The invention relates to a gesture recognition system for an automobile cabin, and in particular to a network architecture and a training method for such a system. It belongs to the technical field of intelligent vehicle-mounted equipment.
Background
With the continuous development of autonomous driving technology and the rising living standards of consumers, autonomous and assisted-driving automobiles powered by new energy are increasingly popular as everyday transportation. They bring great convenience to travel, markedly improve quality of life, and offer a brand-new driving experience.
However, most existing automobiles still have drawbacks in their on-board configuration. While driving, the driver can hardly avoid pressing keys on the center console to answer or dial a call through the on-board voice system, play music through the multimedia system, or start real-time route navigation. These operations easily distract the driver's attention, creating potential safety hazards and possibly more serious safety problems.
A vehicle-mounted intelligent control system is therefore a technical problem of particular concern to the industry and to vehicle owners.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems by providing a network architecture and a training method for a gesture recognition system in an automobile cabin. A gesture recognition device that stores preset gestures, and that is in signal connection with the corresponding intelligent vehicle-mounted equipment, is installed at a preset position on or adjacent to the driver's seat. The device recognizes the driver's hand gestures and generates gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operations. This enables safe driving, avoids potential safety hazards, and reduces the driver's manual operation of the equipment.
The invention achieves the above purpose through the following technical scheme. The network architecture of the automobile-cabin gesture recognition system comprises intelligent vehicle-mounted equipment, which includes: a storage module that pre-stores preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
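The bilateral adaptive Gamma enhancement of step S2 is not specified in detail here; the following is a minimal sketch, assuming a grayscale fused image in [0, 1] and a simple Retinex-style illumination estimate. The box-blur surround, the gamma range, and the function name are illustrative choices, not the patent's algorithm.

```python
import numpy as np

def adaptive_gamma_enhance(img, window=15):
    """Illustrative Retinex-inspired adaptive gamma enhancement (a sketch,
    not the patent's exact method) for a grayscale image in [0, 1]:
    each pixel's gamma varies with the locally estimated illumination."""
    k = int(window) | 1          # force an odd window size
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Box blur as a cheap stand-in for the Gaussian surround used in
    # Retinex-style illumination estimation.
    illum = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            illum += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    illum /= k * k
    # Dark regions (low illumination) get gamma < 1, i.e. brightening;
    # bright regions keep gamma near 1 and are left mostly unchanged.
    gamma = np.clip(illum, 0.3, 1.0)
    return np.clip(img, 0.0, 1.0) ** gamma
```

On a uniformly dark frame this raises the mean brightness while keeping values inside [0, 1], which is the behaviour the step S2 enhancement is after.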
As a still further aspect of the invention: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction, and reconstruction are performed for the special environment of the automobile cabin.
As a still further aspect of the invention: step S3 specifically comprises:
Step S3.1: input the gesture samples X_real collected in step S1, and set the iteration count and step size of MBGD.
Step S3.2: the generator produces sample pictures X_fake, taking randomly distributed noise Z ~ N(0, 1) as its input.
Step S3.3: the discriminator evaluates the real samples to obtain the real-sample score S_real = D(X_real), and evaluates the generated samples to obtain the generated-sample score S_fake = D(X_fake).
Step S3.4: the discriminator is updated iteratively with the mini-batch gradient descent algorithm, with discriminator objective L_D = log(S_real) + log(1 - S_fake).
Step S3.5: the generator is updated iteratively with the mini-batch gradient descent algorithm, with generator objective L_G = log(S_fake).
Step S3.6: lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps.
Step S3.7: repeat steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum iteration count, then output the generated samples X_fake.
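The quantities that steps S3.3-S3.6 manipulate can be sketched as plain NumPy functions. This is an illustrative reading of the two adversarial objectives, the mini-batch sampler, and the decay schedule; the function names are invented for the sketch and the actual network updates are omitted.

```python
import numpy as np

def decay(lr, step, every=400, rate=0.96):
    # Step S3.6: multiply the learning rate by 0.96 once every 400 steps.
    return lr * rate ** (step // every)

def discriminator_objective(s_real, s_fake):
    # Step S3.4: L_D = log(S_real) + log(1 - S_fake); the discriminator
    # drives this upward (it wants S_real -> 1 and S_fake -> 0).
    return np.log(s_real) + np.log(1.0 - s_fake)

def generator_objective(s_fake):
    # Step S3.5 (non-saturating form): L_G = log(S_fake); the generator
    # drives this upward (it wants the discriminator fooled).
    return np.log(s_fake)

def minibatches(samples, batch_size, rng):
    # MBGD: shuffle once per epoch, then yield fixed-size mini-batches
    # that together cover every sample exactly once.
    idx = rng.permutation(len(samples))
    for start in range(0, len(samples), batch_size):
        yield samples[idx[start:start + batch_size]]
```

A training loop would alternate one discriminator step and one generator step per mini-batch, applying `decay` to the learning rate as the global step count grows.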
As a still further aspect of the invention: step S6 specifically comprises:
Step S6.1: from the input image and its semantic feature vectors V and Q, compute the attended visual and semantic features V* and Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)  (1)
α_n = softmax(W_hv h_n)  (2)
V* = tanh(α_n V)  (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)  (4)
α_t = softmax(W_hq h_t)  (5)
Q* = tanh(α_t Q)  (6)
where W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weights, α_n and α_t denote attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively.
Step S6.2: compute the classification vector h_t with a linear layer, then apply softmax to obtain the probability distribution p_t over the gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
where W_o and W_h are hidden parameters of the linear layers.
Step S6.3: in the training phase, the parameters of the gesture recognition neural network are optimized by minimizing the cross entropy between the predicted distribution p_t and the label vector y_t, L = -Σ y_t log(p_t). Pre-training of the CoAttention gesture recognition model is then complete, and the model is uploaded to the SQL Server database for storage through the data transmission device.
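Equations (1)-(6) and the softmax classifier of step S6.2 can be sketched with NumPy. The tensor shapes, the shared memory vector m_b, and the choice of O* as the concatenation of V* and Q* are assumptions made for illustration; the patent text leaves these details open.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def co_attention(V, Q, m_b, Wv, Wq, Wm, Whv, Whq):
    # Equations (1)-(6). Assumed shapes for the sketch:
    # V: (n, d) visual region features, Q: (t, d) semantic token features,
    # m_b: (d,) shared memory vector, Wv/Wq/Wm: (d, d), Whv/Whq: (d,).
    h_n = np.tanh(V @ Wv) * np.tanh(Wm @ m_b)   # (1) elementwise gate
    alpha_n = softmax(h_n @ Whv)                # (2) weights over n regions
    V_star = np.tanh(alpha_n @ V)               # (3) attended visual feature
    h_t = np.tanh(Q @ Wq) * np.tanh(Wm @ m_b)   # (4)
    alpha_t = softmax(h_t @ Whq)                # (5) weights over t tokens
    Q_star = np.tanh(alpha_t @ Q)               # (6) attended semantic feature
    return V_star, Q_star, alpha_n, alpha_t

def classify(V_star, Q_star, Wo, Wh):
    # Step S6.2; O* is taken here as the concatenation of V* and Q*
    # (an assumption -- the description leaves O* unspecified).
    O_star = np.concatenate([V_star, Q_star])   # (2d,)
    h = np.tanh(Wo @ O_star)                    # Wo: (d, 2d)
    return softmax(Wh @ h)                      # Wh: (classes, d)
```

Both attention vectors sum to 1 over their regions or tokens, and the classifier output is a proper probability distribution over the predefined answer set of step S7.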
A training device for an automobile-cabin gesture recognition system, the training device comprising a processor that includes a gesture recognition device or a training device for the gesture recognition network.
As still further aspects of the invention: the training device further comprises a memory for storing the gesture recognition result.
The beneficial effects of the invention are as follows:
Aiming at the shortage of vehicle-mounted gesture datasets, the invention provides the MGAN gesture sample generation model, which effectively supplies the samples required for training the deep learning network. The invention further provides a gesture recognition network with a co-attention mechanism (VQA), combining co-attention with a 3D-CNN video feature extraction network to recognize gestures in real time; this improves recognition accuracy and distinguishes more precisely the different information expressed by different gestures. The driver's hand gestures are recognized by the gesture recognition device and turned into gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operations. The driver no longer needs to shift the line of sight to press specific keys on the smart screen while the automobile is running, which realizes safe driving, avoids potential safety hazards, reduces manual operation of the equipment, and improves safety during driving.
Drawings
FIG. 1 is a flow chart of a pre-training network of the present invention;
fig. 2 is a flow chart of the MGAN of the present invention;
FIG. 3 is a flow chart of a pre-training of a gesture recognition network of the present invention;
FIG. 4 is a flow chart of the gesture recognition system of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
As shown in figs. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, which includes: a storage module pre-storing preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
In the embodiment of the invention, in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction, and reconstruction are performed for the special environment of the automobile cabin.
In the embodiment of the present invention, step S3 specifically comprises:
Step S3.1: input the gesture samples X_real collected in step S1, and set the iteration count and step size of MBGD.
Step S3.2: the generator produces sample pictures X_fake, taking randomly distributed noise Z ~ N(0, 1) as its input.
Step S3.3: the discriminator evaluates the real samples to obtain the real-sample score S_real = D(X_real), and evaluates the generated samples to obtain the generated-sample score S_fake = D(X_fake).
Step S3.4: the discriminator is updated iteratively with the mini-batch gradient descent algorithm, with discriminator objective L_D = log(S_real) + log(1 - S_fake).
Step S3.5: the generator is updated iteratively with the mini-batch gradient descent algorithm, with generator objective L_G = log(S_fake).
Step S3.6: lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps.
Step S3.7: repeat steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum iteration count, then output the generated samples X_fake.
Example two
As shown in figs. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, which includes: a storage module pre-storing preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
In the embodiment of the present invention, step S6 specifically comprises:
Step S6.1: from the input image and its semantic feature vectors V and Q, compute the attended visual and semantic features V* and Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)  (1)
α_n = softmax(W_hv h_n)  (2)
V* = tanh(α_n V)  (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)  (4)
α_t = softmax(W_hq h_t)  (5)
Q* = tanh(α_t Q)  (6)
where W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weights, α_n and α_t denote attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively.
Step S6.2: compute the classification vector h_t with a linear layer, then apply softmax to obtain the probability distribution p_t over the gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
where W_o and W_h are hidden parameters of the linear layers.
Step S6.3: in the training phase, the parameters of the gesture recognition neural network are optimized by minimizing the cross entropy between the predicted distribution p_t and the label vector y_t, L = -Σ y_t log(p_t). Pre-training of the CoAttention gesture recognition model is then complete, and the model is uploaded to the SQL Server database for storage through the data transmission device.
a training device of a gesture recognition system for an automobile cabin comprises a memory for storing gesture recognition results.
In an embodiment of the present invention, the training device comprises a processor, and the processor comprises a gesture recognition device or a training device of a gesture recognition network.
The gesture recognition network is obtained by training on the to-be-processed images combined with weight vectors. The coordinate information comprises gesture-frame coordinates and/or key-point coordinates; the gesture classification information marks the gesture in the gesture-frame image as one of a plurality of preset gestures; and the background information comprises a foreground image and a background image.
In operation, RGB images shot by the RGB camera and heat maps shot by the thermal imaging camera are collected in advance and transmitted to the vehicle-mounted data processing center through the data transmission device. After the RGB and thermal images are registered and fused, the image features are enhanced with the bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory. The data processing center expands the gesture sample images with the MBGD-optimized deep convolutional generative adversarial network (MGAN). Labels are added to the expanded gesture pictures, and the expanded pictures and labels are fed into the subsequent CoAttention neural network, completing pre-training of the CoAttention classification and recognition network. Real-time video output by the RGB camera and the thermal imaging camera is then acquired and transmitted to the data processing center, where the ResNet50 model, with its high accuracy and low complexity, serves as the frame-level feature extractor for the key feature information in the video. The data processing system classifies and recognizes this information with the pre-trained CoAttention model and uploads the recognition information to the SQL Server database through the data transmission device. The operation execution system performs the relevant operations according to the gesture information in the SQL Server database, and the data processing system issues the corresponding feedback according to the CoAttention recognition result, which closes the loop.
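The final step of the closed loop, where the operation execution system acts on a recognized gesture, might look like the following sketch. The gesture labels, the mapped operations, and the confidence threshold are all hypothetical, introduced only to illustrate the dispatch; the patent does not define a specific gesture vocabulary.

```python
# Hypothetical gesture-to-operation table sketching the closed loop
# described above; names are illustrative, not taken from the patent.
GESTURE_ACTIONS = {
    "swipe_left": "previous_track",
    "swipe_right": "next_track",
    "palm_open": "answer_call",
    "fist": "hang_up",
}

def execute_gesture(label, confidence, threshold=0.8):
    """Return the operation for a recognized gesture, or None when the
    classifier's confidence is below the threshold or the label is
    unknown (fail safe: the vehicle does nothing when uncertain)."""
    if confidence < threshold:
        return None
    return GESTURE_ACTIONS.get(label)
```

Returning None for low-confidence or unknown labels keeps the system from triggering vehicle operations on ambiguous recognitions, which matches the safety aims stated above.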
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted only for clarity; the specification should be taken as a whole, and the technical solutions in the embodiments may be combined appropriately to form other implementations understandable to those skilled in the art.

Claims (7)

1. A network architecture of a gesture recognition system for an automobile cabin, characterized in that: the network architecture comprises intelligent vehicle-mounted equipment, which includes a storage module pre-storing preset gestures, a gesture acquisition module for capturing the user's hand movements, a gesture recognition module for comparing the captured movements with the preset gestures for similarity and generating comparison results, and a data processing module for receiving the comparison results, generating gesture instructions, and executing the operations corresponding to those instructions.
2. A training method for a gesture recognition system for an automobile cabin, the training method comprising:
step S1, manufacturing sample data by using RGB images shot by a high-definition camera and a thermal image shot by a thermal imaging camera, and transmitting the images to a vehicle-mounted data processing center through a data transmission device;
step S2, after registering and fusing the RGB image and the thermal image, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
step S3, expanding the gesture sample images of step S1 with a gradient-descent-optimized deep convolutional generative adversarial network;
step S4, adding bounding boxes and labels to the expanded gesture pictures generated in step S3, and inputting the generated expanded pictures and labels into the subsequent neural network;
step S5, adopting a ResNet50 model as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image of the detected gesture; each parallel 3D-CNN stream is a 3D-ResNet50 model; the outputs of the 3 3D-CNNs are concatenated and fed into the next stage of the model; a fully connected layer of 512 units is placed after each 3D-ResNet50, this number being fixed independently of the dataset;
step S6, obtaining the most relevant image gesture features and corresponding semantics with the proposed co-attention gesture recognition network (VQA), and connecting the attended image and semantic feature vectors to a softmax layer for final result output;
step S7, designing a dynamic gesture recognition application around the trained CoAttention gesture recognition model as its core, with the size of the predefined answer set in the application set to the number of gesture labels in the dataset;
step S8, transmitting the real-time data acquired by the device into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the related gesture recognition information to an SQL Server database for storage through the data transmission device.
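The Retinex-based enhancement of step S2 can be illustrated with a minimal sketch: estimate the illumination component with a local surround, then apply a per-pixel gamma that brightens dim regions and compresses bright ones. The surround size and the gamma mapping below are assumptions for illustration, not the patented bilateral adaptive Gamma algorithm itself:

```python
import numpy as np

def local_mean(img, k=15):
    """Box-filter illumination estimate via an integral image (edge-padded)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge").astype(np.float64)
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def adaptive_gamma_enhance(img, k=15):
    """Retinex-style idea: dark illumination -> gamma < 1 (brighten),
    bright illumination -> gamma > 1 (compress)."""
    x = img.astype(np.float64) / 255.0
    illum = local_mean(x, k)                 # illumination estimate
    gamma = 2.0 ** ((illum - 0.5) / 0.5)     # per-pixel adaptive gamma
    return (np.clip(x ** gamma, 0, 1) * 255).astype(np.uint8)

dark = np.full((32, 32), 25, dtype=np.uint8)       # underexposed cabin region
enhanced = adaptive_gamma_enhance(dark)            # brightened
bright = np.full((32, 32), 230, dtype=np.uint8)    # overexposed region
compressed = adaptive_gamma_enhance(bright)        # slightly darkened
```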
3. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction and reconstruction are performed for the special environment of the automobile cabin.
4. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S3 specifically comprises:
step S3.1, inputting the gesture samples X_real collected in step S1, and setting the number of iterations and the step size of MBGD (mini-batch gradient descent);
step S3.2, generating sample pictures X_fake with the generator, and adding randomly distributed noise Z ~ N(0, 1) to the sample pictures X_fake;
step S3.3, computing with the discriminator the real-sample score, obtaining the real-sample discriminator output S_real = D(X_real), and computing the generated-sample score, obtaining the generated-sample discriminator output S_fake = D(X_fake);
step S3.4, iteratively updating the discriminator with the mini-batch gradient descent algorithm, the discriminator objective being L_D = log(S_real) + log(1 - S_fake);
step S3.5, iteratively updating the generator with the mini-batch gradient descent algorithm, the generator objective being L_G = log(S_fake);
step S3.6, decaying the learning rate as lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps;
step S3.7, repeating steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum number of iterations, and outputting the generated samples X_fake.
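The per-step quantities of steps S3.4-S3.6 can be written out directly; the function names below are illustrative assumptions, and only the formulas stated in the claim are implemented:

```python
import math

def d_objective(s_real, s_fake):
    # Discriminator objective of step S3.4: L_D = log(S_real) + log(1 - S_fake)
    return math.log(s_real) + math.log(1.0 - s_fake)

def g_objective(s_fake):
    # Generator objective of step S3.5 (non-saturating form): L_G = log(S_fake)
    return math.log(s_fake)

def decay(lr0, step, every=400, rate=0.96):
    # Staircase decay of step S3.6: multiply by 0.96 every 400 steps.
    return lr0 * rate ** (step // every)

# Early in training the discriminator scores real samples high and fake
# samples low, so L_D is near its maximum of 0 while L_G is very negative.
ld = d_objective(0.9, 0.1)
lg = g_objective(0.1)
lr_at_800 = decay(1e-3, 800)   # two decay intervals completed
```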
5. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S6 specifically comprises:
step S6.1, according to the feature vectors V and Q of the input image and its semantics, calculating the attended visual and semantic features V*, Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)    (1)
α_n = softmax(W_hv h_n)    (2)
V* = tanh(α_n V)    (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)    (4)
α_t = softmax(W_hq h_t)    (5)
Q* = tanh(α_t Q)    (6)
wherein W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weight matrices, α_n, α_t denote the attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively;
step S6.2, computing the classification vector h_t with a linear layer and applying softmax to obtain the probability distribution p_t over the different gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
wherein W_o and W_h are the hidden parameters of the linear layers;
step S6.3, in the training phase, optimizing the gesture recognition neural network parameters by minimizing the cross entropy with respect to the label vector y_t,
L = -Σ_t y_t log(p_t);
the pre-training of the CoAttention gesture recognition model is thereby completed, and the model is uploaded to the SQL Server database for storage through the data transmission device.
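A minimal numpy sketch of one attention branch (Eqs. (1)-(3)), the classification head of step S6.2, and the cross-entropy of step S6.3. The dimensions, the memory vector m_b, and the use of V* in place of O* are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(V, m_b, W_v, W_m, w_hv):
    """One co-attention branch: gate region features against the memory
    vector (Eq. 1), score regions (Eq. 2), pool to an attended feature (Eq. 3)."""
    h = np.tanh(W_v @ V) * np.tanh(W_m @ m_b)[:, None]
    alpha = softmax(w_hv @ h)          # attention weights over the n regions
    return np.tanh(V @ alpha), alpha   # attended feature V*

d, n, k, classes = 8, 5, 6, 4          # feature dim, #regions, hidden dim, #gestures
V    = rng.standard_normal((d, n))     # visual features, one column per region
m_b  = rng.standard_normal(d)          # shared memory/context vector
W_v  = rng.standard_normal((k, d))
W_m  = rng.standard_normal((k, d))
w_hv = rng.standard_normal(k)

V_star, alpha = attend(V, m_b, W_v, W_m, w_hv)

# Classification head of step S6.2 (V* stands in for O* here):
W_o = rng.standard_normal((k, d))
W_h = rng.standard_normal((classes, k))
h_t = np.tanh(W_o @ V_star)
p_t = softmax(W_h @ h_t)               # probability distribution over gesture classes

# Cross-entropy of step S6.3 against a one-hot label y_t:
y_t = np.eye(classes)[1]
loss = -np.sum(y_t * np.log(p_t))
```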
6. A training device of a gesture recognition system for an automobile cabin, characterized in that: the training device comprises a processor, the processor comprising the gesture recognition device or the training device of the gesture recognition network.
7. The training device of the gesture recognition system for an automobile cabin according to claim 6, further comprising a memory for storing the gesture recognition results.
CN202211306446.8A 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin Pending CN116071817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211306446.8A CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211306446.8A CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Publications (1)

Publication Number Publication Date
CN116071817A true CN116071817A (en) 2023-05-05

Family

ID=86182739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211306446.8A Pending CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Country Status (1)

Country Link
CN (1) CN116071817A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218716A (en) * 2023-08-10 2023-12-12 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN117218716B (en) * 2023-08-10 2024-04-09 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN117351557A (en) * 2023-08-17 2024-01-05 中国矿业大学 Vehicle-mounted gesture recognition method for deep learning
CN117152843A (en) * 2023-09-06 2023-12-01 世优(北京)科技有限公司 Digital person action control method and system
CN117152843B (en) * 2023-09-06 2024-05-07 世优(北京)科技有限公司 Digital person action control method and system

Similar Documents

Publication Publication Date Title
CN110021051B (en) Human image generation method based on generation of confrontation network through text guidance
CN116071817A (en) Network architecture and training method of gesture recognition system for automobile cabin
CN110399821B (en) Customer satisfaction acquisition method based on facial expression recognition
CN107679491A (en) A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN109740419A (en) A kind of video behavior recognition methods based on Attention-LSTM network
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
Rani et al. Object detection and recognition using contour based edge detection and fast R-CNN
CN107194344B (en) Human behavior recognition method adaptive to bone center
CN113807340B (en) Attention mechanism-based irregular natural scene text recognition method
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN111914911A (en) Vehicle re-identification method based on improved depth relative distance learning model
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
CN115861981A (en) Driver fatigue behavior detection method and system based on video attitude invariance
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
Shi et al. Learning attention-enhanced spatiotemporal representation for action recognition
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN116645287B (en) Diffusion model-based image deblurring method
CN117726809A (en) Small sample semantic segmentation method based on information interaction enhancement
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
Han et al. Feature fusion and adversary occlusion networks for object detection
CN110929632A (en) Complex scene-oriented vehicle target detection method and device
CN110942463A (en) Video target segmentation method based on generation countermeasure network
CN116311251A (en) Lightweight semantic segmentation method for high-precision stereoscopic perception of complex scene
KR102279772B1 (en) Method and Apparatus for Generating Videos with The Arrow of Time
CN110688986B (en) 3D convolution behavior recognition network method guided by attention branches

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination