CN116071817A - Network architecture and training method of gesture recognition system for automobile cabin - Google Patents


Info

Publication number
CN116071817A
CN116071817A (application CN202211306446.8A)
Authority
CN
China
Prior art keywords
gesture
gesture recognition
image
sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211306446.8A
Other languages
Chinese (zh)
Inventor
刘新华
贺之彬
郝敬宾
华德正
祁鹏
刘晓帆
周皓
王晴晴
格热戈尔茨·罗尔奇克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202211306446.8A priority Critical patent/CN116071817A/en
Publication of CN116071817A publication Critical patent/CN116071817A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a network architecture and a training method for a gesture recognition system in an automobile cabin. The method comprises preparing training data, collecting real-time video output by an RGB camera and a thermal imaging camera, and extracting key feature information from the video. An operation execution system performs the relevant operations according to gesture information stored in an SQL Server database, and the data processing system issues the corresponding feedback according to the CoAttention recognition result. A gesture recognition device recognizes the driver's hand gestures and generates gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operation. This reduces the driver's manual interaction with the vehicle-mounted equipment: while the automobile is running, the driver can control the equipment without shifting the line of sight to press specific keys on the smart screen, avoiding potential safety hazards and improving safety during driving.

Description

Network architecture and training method of gesture recognition system for automobile cabin
Technical Field
The invention relates to a gesture recognition system for an automobile cabin, and in particular to a network architecture and a training method for such a system. It belongs to the technical field of intelligent vehicle-mounted equipment.
Background
With the continuous development of autonomous driving technology and the rising living standards of consumers, autonomous and assisted-driving automobiles powered by new energy are increasingly popular as everyday transportation. They bring great convenience to travel, markedly improve quality of life, and offer a brand-new driving experience.
However, most existing automobiles still have drawbacks in their on-board configuration. While driving, the driver can hardly avoid pressing keys on the center console to answer or dial a call through the on-board voice system, play music through the multimedia system, or start real-time route navigation. These operations easily distract the driver's attention, creating potential safety hazards and possibly more serious safety problems.
A vehicle-mounted intelligent control system is therefore a technical problem of particular concern to the industry and to vehicle owners.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems by providing a network architecture and a training method for a gesture recognition system in an automobile cabin. A gesture recognition device that stores preset gestures, and that is in signal connection with the corresponding intelligent vehicle-mounted equipment, is installed at a preset position on or adjacent to the driver's seat. The device recognizes the driver's hand gestures and generates gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operations. This enables safe driving, avoids potential safety hazards, and reduces the driver's manual operation of the equipment.
The invention achieves the above purpose through the following technical scheme. The network architecture of the automobile-cabin gesture recognition system comprises intelligent vehicle-mounted equipment, which includes: a storage module that pre-stores preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
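The bilateral adaptive Gamma enhancement of step S2 is not specified in detail here; the following is a minimal sketch, assuming a grayscale fused image in [0, 1] and a simple Retinex-style illumination estimate. The box-blur surround, the gamma range, and the function name are illustrative choices, not the patent's algorithm.

```python
import numpy as np

def adaptive_gamma_enhance(img, window=15):
    """Illustrative Retinex-inspired adaptive gamma enhancement (a sketch,
    not the patent's exact method) for a grayscale image in [0, 1]:
    each pixel's gamma varies with the locally estimated illumination."""
    k = int(window) | 1          # force an odd window size
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Box blur as a cheap stand-in for the Gaussian surround used in
    # Retinex-style illumination estimation.
    illum = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            illum += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    illum /= k * k
    # Dark regions (low illumination) get gamma < 1, i.e. brightening;
    # bright regions keep gamma near 1 and are left mostly unchanged.
    gamma = np.clip(illum, 0.3, 1.0)
    return np.clip(img, 0.0, 1.0) ** gamma
```

On a uniformly dark frame this raises the mean brightness while keeping values inside [0, 1], which is the behaviour the step S2 enhancement is after.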
As a still further aspect of the invention: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction, and reconstruction are performed for the special environment of the automobile cabin.
As a still further aspect of the invention: step S3 specifically comprises:
Step S3.1: input the gesture samples X_real collected in step S1, and set the iteration count and step size of MBGD.
Step S3.2: the generator produces sample pictures X_fake, taking randomly distributed noise Z ~ N(0, 1) as its input.
Step S3.3: the discriminator evaluates the real samples to obtain the real-sample score S_real = D(X_real), and evaluates the generated samples to obtain the generated-sample score S_fake = D(X_fake).
Step S3.4: the discriminator is updated iteratively with the mini-batch gradient descent algorithm, with discriminator objective L_D = log(S_real) + log(1 - S_fake).
Step S3.5: the generator is updated iteratively with the mini-batch gradient descent algorithm, with generator objective L_G = log(S_fake).
Step S3.6: lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps.
Step S3.7: repeat steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum iteration count, then output the generated samples X_fake.
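The quantities that steps S3.3-S3.6 manipulate can be sketched as plain NumPy functions. This is an illustrative reading of the two adversarial objectives, the mini-batch sampler, and the decay schedule; the function names are invented for the sketch and the actual network updates are omitted.

```python
import numpy as np

def decay(lr, step, every=400, rate=0.96):
    # Step S3.6: multiply the learning rate by 0.96 once every 400 steps.
    return lr * rate ** (step // every)

def discriminator_objective(s_real, s_fake):
    # Step S3.4: L_D = log(S_real) + log(1 - S_fake); the discriminator
    # drives this upward (it wants S_real -> 1 and S_fake -> 0).
    return np.log(s_real) + np.log(1.0 - s_fake)

def generator_objective(s_fake):
    # Step S3.5 (non-saturating form): L_G = log(S_fake); the generator
    # drives this upward (it wants the discriminator fooled).
    return np.log(s_fake)

def minibatches(samples, batch_size, rng):
    # MBGD: shuffle once per epoch, then yield fixed-size mini-batches
    # that together cover every sample exactly once.
    idx = rng.permutation(len(samples))
    for start in range(0, len(samples), batch_size):
        yield samples[idx[start:start + batch_size]]
```

A training loop would alternate one discriminator step and one generator step per mini-batch, applying `decay` to the learning rate as the global step count grows.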
As a still further aspect of the invention: step S6 specifically comprises:
Step S6.1: from the input image and its semantic feature vectors V and Q, compute the attended visual and semantic features V* and Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)  (1)
α_n = softmax(W_hv h_n)  (2)
V* = tanh(α_n V)  (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)  (4)
α_t = softmax(W_hq h_t)  (5)
Q* = tanh(α_t Q)  (6)
where W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weights, α_n and α_t denote attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively.
Step S6.2: compute the classification vector h_t with a linear layer, then apply softmax to obtain the probability distribution p_t over the gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
where W_o and W_h are hidden parameters of the linear layers.
Step S6.3: in the training phase, the parameters of the gesture recognition neural network are optimized by minimizing the cross entropy between the predicted distribution p_t and the label vector y_t, L = -Σ y_t log(p_t). Pre-training of the CoAttention gesture recognition model is then complete, and the model is uploaded to the SQL Server database for storage through the data transmission device.
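Equations (1)-(6) and the softmax classifier of step S6.2 can be sketched with NumPy. The tensor shapes, the shared memory vector m_b, and the choice of O* as the concatenation of V* and Q* are assumptions made for illustration; the patent text leaves these details open.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def co_attention(V, Q, m_b, Wv, Wq, Wm, Whv, Whq):
    # Equations (1)-(6). Assumed shapes for the sketch:
    # V: (n, d) visual region features, Q: (t, d) semantic token features,
    # m_b: (d,) shared memory vector, Wv/Wq/Wm: (d, d), Whv/Whq: (d,).
    h_n = np.tanh(V @ Wv) * np.tanh(Wm @ m_b)   # (1) elementwise gate
    alpha_n = softmax(h_n @ Whv)                # (2) weights over n regions
    V_star = np.tanh(alpha_n @ V)               # (3) attended visual feature
    h_t = np.tanh(Q @ Wq) * np.tanh(Wm @ m_b)   # (4)
    alpha_t = softmax(h_t @ Whq)                # (5) weights over t tokens
    Q_star = np.tanh(alpha_t @ Q)               # (6) attended semantic feature
    return V_star, Q_star, alpha_n, alpha_t

def classify(V_star, Q_star, Wo, Wh):
    # Step S6.2; O* is taken here as the concatenation of V* and Q*
    # (an assumption -- the description leaves O* unspecified).
    O_star = np.concatenate([V_star, Q_star])   # (2d,)
    h = np.tanh(Wo @ O_star)                    # Wo: (d, 2d)
    return softmax(Wh @ h)                      # Wh: (classes, d)
```

Both attention vectors sum to 1 over their regions or tokens, and the classifier output is a proper probability distribution over the predefined answer set of step S7.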
A training device for an automobile-cabin gesture recognition system, the training device comprising a processor that includes a gesture recognition device or a training device for the gesture recognition network.
As still further aspects of the invention: the training device further comprises a memory for storing the gesture recognition result.
The beneficial effects of the invention are as follows:
Aiming at the shortage of vehicle-mounted gesture datasets, the invention provides the MGAN gesture sample generation model, which effectively supplies the samples required for training the deep learning network. The invention further provides a gesture recognition network with a co-attention mechanism (VQA), combining co-attention with a 3D-CNN video feature extraction network to recognize gestures in real time; this improves recognition accuracy and distinguishes more precisely the different information expressed by different gestures. The driver's hand gestures are recognized by the gesture recognition device and turned into gesture instructions, so that the corresponding intelligent vehicle-mounted equipment executes the matching operations. The driver no longer needs to shift the line of sight to press specific keys on the smart screen while the automobile is running, which realizes safe driving, avoids potential safety hazards, reduces manual operation of the equipment, and improves safety during driving.
Drawings
FIG. 1 is a flow chart of a pre-training network of the present invention;
fig. 2 is a flow chart of the MGAN of the present invention;
FIG. 3 is a flow chart of a pre-training of a gesture recognition network of the present invention;
FIG. 4 is a flow chart of the gesture recognition system of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
As shown in figs. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, which includes: a storage module pre-storing preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
In the embodiment of the invention, in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction, and reconstruction are performed for the special environment of the automobile cabin.
In the embodiment of the present invention, step S3 specifically comprises:
Step S3.1: input the gesture samples X_real collected in step S1, and set the iteration count and step size of MBGD.
Step S3.2: the generator produces sample pictures X_fake, taking randomly distributed noise Z ~ N(0, 1) as its input.
Step S3.3: the discriminator evaluates the real samples to obtain the real-sample score S_real = D(X_real), and evaluates the generated samples to obtain the generated-sample score S_fake = D(X_fake).
Step S3.4: the discriminator is updated iteratively with the mini-batch gradient descent algorithm, with discriminator objective L_D = log(S_real) + log(1 - S_fake).
Step S3.5: the generator is updated iteratively with the mini-batch gradient descent algorithm, with generator objective L_G = log(S_fake).
Step S3.6: lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps.
Step S3.7: repeat steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum iteration count, then output the generated samples X_fake.
Example two
As shown in figs. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, which includes: a storage module pre-storing preset gestures; a gesture acquisition module that captures the user's hand movements; a gesture recognition module that compares the captured movements with the preset gestures for similarity and generates comparison results; and a data processing module that receives the comparison results, generates gesture instructions, and executes the operations corresponding to those instructions.
A training method of the gesture recognition system for an automobile cabin comprises the following steps:
Step S1: create sample data from RGB images shot by a high-definition camera and thermal images shot by a thermal imaging camera, and transmit the images to the vehicle-mounted data processing center through a data transmission device.
Step S2: after registering and fusing the RGB and thermal images, enhance the image features with a bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory.
Step S3: expand the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD).
Step S4: add bounding boxes and labels to the expanded gesture pictures generated in step S3, and feed the expanded pictures and labels into the subsequent neural network.
Step S5: adopt the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams; each stream receives a different RGB image of the detected gesture, and each parallel stream is a 3D-ResNet50 model. The outputs of the 3 streams are concatenated and fed into the next stage of the model. A fully connected layer of 512 units follows each 3D-ResNet50; this width is fixed, independent of the dataset.
Step S6: the gesture recognition network with the proposed co-attention mechanism (VQA) obtains the most relevant image gesture features and their corresponding semantics, and the attended image and semantic feature vectors are connected to a softmax layer for final result output.
Step S7: design a dynamic gesture recognition application around the trained CoAttention gesture recognition model, setting the size of the application's predefined answer set to the number of gesture labels in the dataset.
Step S8: feed the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, classify and recognize gestures according to the key feature information contained in the real-time video, and upload the gesture recognition information to the SQL Server database through the data transmission device for storage.
In the embodiment of the present invention, step S6 specifically comprises:
Step S6.1: from the input image and its semantic feature vectors V and Q, compute the attended visual and semantic features V* and Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)  (1)
α_n = softmax(W_hv h_n)  (2)
V* = tanh(α_n V)  (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)  (4)
α_t = softmax(W_hq h_t)  (5)
Q* = tanh(α_t Q)  (6)
where W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weights, α_n and α_t denote attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively.
Step S6.2: compute the classification vector h_t with a linear layer, then apply softmax to obtain the probability distribution p_t over the gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
where W_o and W_h are hidden parameters of the linear layers.
Step S6.3: in the training phase, the parameters of the gesture recognition neural network are optimized by minimizing the cross entropy between the predicted distribution p_t and the label vector y_t, L = -Σ y_t log(p_t). Pre-training of the CoAttention gesture recognition model is then complete, and the model is uploaded to the SQL Server database for storage through the data transmission device.
a training device of a gesture recognition system for an automobile cabin comprises a memory for storing gesture recognition results.
In an embodiment of the present invention, the training device comprises a processor, and the processor comprises a gesture recognition device or a training device of a gesture recognition network.
The gesture recognition network is obtained by training on the to-be-processed images combined with weight vectors. The coordinate information comprises gesture-frame coordinates and/or key-point coordinates; the gesture classification information marks the gesture in the gesture-frame image as one of a plurality of preset gestures; and the background information comprises a foreground image and a background image.
In operation, RGB images shot by the RGB camera and heat maps shot by the thermal imaging camera are collected in advance and transmitted to the vehicle-mounted data processing center through the data transmission device. After the RGB and thermal images are registered and fused, the image features are enhanced with the bilateral adaptive Gamma enhancement algorithm based on an improvement of Retinex theory. The data processing center expands the gesture sample images with the MBGD-optimized deep convolutional generative adversarial network (MGAN). Labels are added to the expanded gesture pictures, and the expanded pictures and labels are fed into the subsequent CoAttention neural network, completing pre-training of the CoAttention classification and recognition network. Real-time video output by the RGB camera and the thermal imaging camera is then acquired and transmitted to the data processing center, where the ResNet50 model, with its high accuracy and low complexity, serves as the frame-level feature extractor for the key feature information in the video. The data processing system classifies and recognizes this information with the pre-trained CoAttention model and uploads the recognition information to the SQL Server database through the data transmission device. The operation execution system performs the relevant operations according to the gesture information in the SQL Server database, and the data processing system issues the corresponding feedback according to the CoAttention recognition result, which closes the loop.
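The final step of the closed loop, where the operation execution system acts on a recognized gesture, might look like the following sketch. The gesture labels, the mapped operations, and the confidence threshold are all hypothetical, introduced only to illustrate the dispatch; the patent does not define a specific gesture vocabulary.

```python
# Hypothetical gesture-to-operation table sketching the closed loop
# described above; names are illustrative, not taken from the patent.
GESTURE_ACTIONS = {
    "swipe_left": "previous_track",
    "swipe_right": "next_track",
    "palm_open": "answer_call",
    "fist": "hang_up",
}

def execute_gesture(label, confidence, threshold=0.8):
    """Return the operation for a recognized gesture, or None when the
    classifier's confidence is below the threshold or the label is
    unknown (fail safe: the vehicle does nothing when uncertain)."""
    if confidence < threshold:
        return None
    return GESTURE_ACTIONS.get(label)
```

Returning None for low-confidence or unknown labels keeps the system from triggering vehicle operations on ambiguous recognitions, which matches the safety aims stated above.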
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted only for clarity; the specification should be taken as a whole, and the technical solutions in the embodiments may be combined appropriately to form other implementations understandable to those skilled in the art.

Claims (7)

1. A network architecture of a gesture recognition system for an automobile cabin, characterized in that: the network architecture comprises intelligent vehicle-mounted equipment, which includes a storage module pre-storing preset gestures, a gesture acquisition module for capturing the user's hand movements, a gesture recognition module for comparing the captured movements with the preset gestures for similarity and generating comparison results, and a data processing module for receiving the comparison results, generating gesture instructions, and executing the operations corresponding to those instructions.
2. A training method for a gesture recognition system for an automobile cabin, the training method comprising:
step S1, manufacturing sample data by using RGB images shot by a high-definition camera and a thermal image shot by a thermal imaging camera, and transmitting the images to a vehicle-mounted data processing center through a data transmission device;
step S2, after registering and fusing the RGB image and the thermal image, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
step S3, expanding the gesture sample images of step S1 with a gradient-descent-optimized deep convolutional generative adversarial network;
step S4, adding bounding boxes and labels to the expanded gesture pictures generated in step S3, and inputting the generated expanded pictures and labels into the subsequent neural network;
step S5, adopting a ResNet50 model as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image of the detected gesture; each parallel 3D-CNN stream is a 3D-ResNet50 model; the outputs of the 3 3D-CNNs are concatenated and fed into the next stage of the model; a fully connected layer of 512 units is placed after each 3D-ResNet50, this number being fixed independently of the dataset;
step S6, obtaining the most relevant image gesture features and corresponding semantics with the proposed co-attention gesture recognition network (VQA), and connecting the attended image and semantic feature vectors to a softmax layer for final result output;
step S7, designing a dynamic gesture recognition application around the trained CoAttention gesture recognition model as its core, with the size of the predefined answer set in the application set to the number of gesture labels in the dataset;
step S8, transmitting the real-time data acquired by the device into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the related gesture recognition information to an SQL Server database for storage through the data transmission device.
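The Retinex-based enhancement of step S2 can be illustrated with a minimal sketch: estimate the illumination component with a local surround, then apply a per-pixel gamma that brightens dim regions and compresses bright ones. The surround size and the gamma mapping below are assumptions for illustration, not the patented bilateral adaptive Gamma algorithm itself:

```python
import numpy as np

def local_mean(img, k=15):
    """Box-filter illumination estimate via an integral image (edge-padded)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge").astype(np.float64)
    c = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def adaptive_gamma_enhance(img, k=15):
    """Retinex-style idea: dark illumination -> gamma < 1 (brighten),
    bright illumination -> gamma > 1 (compress)."""
    x = img.astype(np.float64) / 255.0
    illum = local_mean(x, k)                 # illumination estimate
    gamma = 2.0 ** ((illum - 0.5) / 0.5)     # per-pixel adaptive gamma
    return (np.clip(x ** gamma, 0, 1) * 255).astype(np.uint8)

dark = np.full((32, 32), 25, dtype=np.uint8)       # underexposed cabin region
enhanced = adaptive_gamma_enhance(dark)            # brightened
bright = np.full((32, 32), 230, dtype=np.uint8)    # overexposed region
compressed = adaptive_gamma_enhance(bright)        # slightly darkened
```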
3. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction and reconstruction are performed for the special environment of the automobile cabin.
4. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S3 specifically comprises:
step S3.1, inputting the gesture samples X_real collected in step S1, and setting the number of iterations and the step size of MBGD (mini-batch gradient descent);
step S3.2, generating sample pictures X_fake with the generator, and adding randomly distributed noise Z ~ N(0, 1) to the sample pictures X_fake;
step S3.3, computing with the discriminator the real-sample score, obtaining the real-sample discriminator output S_real = D(X_real), and computing the generated-sample score, obtaining the generated-sample discriminator output S_fake = D(X_fake);
step S3.4, iteratively updating the discriminator with the mini-batch gradient descent algorithm, the discriminator objective being L_D = log(S_real) + log(1 - S_fake);
step S3.5, iteratively updating the generator with the mini-batch gradient descent algorithm, the generator objective being L_G = log(S_fake);
step S3.6, decaying the learning rate as lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps;
step S3.7, repeating steps S3.2 to S3.5 until the whole network reaches the optimum or the maximum number of iterations, and outputting the generated samples X_fake.
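The per-step quantities of steps S3.4-S3.6 can be written out directly; the function names below are illustrative assumptions, and only the formulas stated in the claim are implemented:

```python
import math

def d_objective(s_real, s_fake):
    # Discriminator objective of step S3.4: L_D = log(S_real) + log(1 - S_fake)
    return math.log(s_real) + math.log(1.0 - s_fake)

def g_objective(s_fake):
    # Generator objective of step S3.5 (non-saturating form): L_G = log(S_fake)
    return math.log(s_fake)

def decay(lr0, step, every=400, rate=0.96):
    # Staircase decay of step S3.6: multiply by 0.96 every 400 steps.
    return lr0 * rate ** (step // every)

# Early in training the discriminator scores real samples high and fake
# samples low, so L_D is near its maximum of 0 while L_G is very negative.
ld = d_objective(0.9, 0.1)
lg = g_objective(0.1)
lr_at_800 = decay(1e-3, 800)   # two decay intervals completed
```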
5. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S6 specifically comprises:
step S6.1, according to the feature vectors V and Q of the input image and its semantics, calculating the attended visual and semantic features V*, Q*:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)    (1)
α_n = softmax(W_hv h_n)    (2)
V* = tanh(α_n V)    (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)    (4)
α_t = softmax(W_hq h_t)    (5)
Q* = tanh(α_t Q)    (6)
wherein W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weight matrices, α_n, α_t denote the attention weights, and V and Q are the visual and semantic feature vectors of the input image, respectively;
step S6.2, computing the classification vector h_t with a linear layer and applying softmax to obtain the probability distribution p_t over the different gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
wherein W_o and W_h are the hidden parameters of the linear layers;
step S6.3, in the training phase, optimizing the gesture recognition neural network parameters by minimizing the cross entropy with respect to the label vector y_t,
L = -Σ_t y_t log(p_t);
the pre-training of the CoAttention gesture recognition model is thereby completed, and the model is uploaded to the SQL Server database for storage through the data transmission device.
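A minimal numpy sketch of one attention branch (Eqs. (1)-(3)), the classification head of step S6.2, and the cross-entropy of step S6.3. The dimensions, the memory vector m_b, and the use of V* in place of O* are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(V, m_b, W_v, W_m, w_hv):
    """One co-attention branch: gate region features against the memory
    vector (Eq. 1), score regions (Eq. 2), pool to an attended feature (Eq. 3)."""
    h = np.tanh(W_v @ V) * np.tanh(W_m @ m_b)[:, None]
    alpha = softmax(w_hv @ h)          # attention weights over the n regions
    return np.tanh(V @ alpha), alpha   # attended feature V*

d, n, k, classes = 8, 5, 6, 4          # feature dim, #regions, hidden dim, #gestures
V    = rng.standard_normal((d, n))     # visual features, one column per region
m_b  = rng.standard_normal(d)          # shared memory/context vector
W_v  = rng.standard_normal((k, d))
W_m  = rng.standard_normal((k, d))
w_hv = rng.standard_normal(k)

V_star, alpha = attend(V, m_b, W_v, W_m, w_hv)

# Classification head of step S6.2 (V* stands in for O* here):
W_o = rng.standard_normal((k, d))
W_h = rng.standard_normal((classes, k))
h_t = np.tanh(W_o @ V_star)
p_t = softmax(W_h @ h_t)               # probability distribution over gesture classes

# Cross-entropy of step S6.3 against a one-hot label y_t:
y_t = np.eye(classes)[1]
loss = -np.sum(y_t * np.log(p_t))
```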
6. A training device of a gesture recognition system for an automobile cabin, characterized in that: the training device comprises a processor, the processor comprising the gesture recognition device or the training device of the gesture recognition network.
7. The training device of the gesture recognition system for an automobile cabin according to claim 6, further comprising a memory for storing the gesture recognition results.
CN202211306446.8A 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin Pending CN116071817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211306446.8A CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211306446.8A CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Publications (1)

Publication Number Publication Date
CN116071817A true CN116071817A (en) 2023-05-05

Family

ID=86182739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211306446.8A Pending CN116071817A (en) 2022-10-25 2022-10-25 Network architecture and training method of gesture recognition system for automobile cabin

Country Status (1)

Country Link
CN (1) CN116071817A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218716A (en) * 2023-08-10 2023-12-12 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN117218716B (en) * 2023-08-10 2024-04-09 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN117351557A (en) * 2023-08-17 2024-01-05 中国矿业大学 Vehicle-mounted gesture recognition method for deep learning
CN117152843A (en) * 2023-09-06 2023-12-01 世优(北京)科技有限公司 Digital person action control method and system
CN117152843B (en) * 2023-09-06 2024-05-07 世优(北京)科技有限公司 Digital person action control method and system

Similar Documents

Publication Publication Date Title
CN110021051B (en) Human image generation method based on generation of confrontation network through text guidance
CN116071817A (en) Network architecture and training method of gesture recognition system for automobile cabin
CN110399821B (en) Customer satisfaction acquisition method based on facial expression recognition
CN107679491A (en) A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN109740419A (en) A kind of video behavior recognition methods based on Attention-LSTM network
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
Rani et al. Object detection and recognition using contour based edge detection and fast R-CNN
CN107194344B (en) Human behavior recognition method adaptive to bone center
CN113807340B (en) Attention mechanism-based irregular natural scene text recognition method
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN111914911A (en) Vehicle re-identification method based on improved depth relative distance learning model
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
CN115861981A (en) Driver fatigue behavior detection method and system based on video attitude invariance
CN115731597A (en) Automatic segmentation and restoration management platform and method for mask image of face mask
Shi et al. Learning attention-enhanced spatiotemporal representation for action recognition
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN116645287B (en) Diffusion model-based image deblurring method
CN117726809A (en) Small sample semantic segmentation method based on information interaction enhancement
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
Han et al. Feature fusion and adversary occlusion networks for object detection
CN110929632A (en) Complex scene-oriented vehicle target detection method and device
CN110942463A (en) Video target segmentation method based on generation countermeasure network
CN116311251A (en) Lightweight semantic segmentation method for high-precision stereoscopic perception of complex scene
KR102279772B1 (en) Method and Apparatus for Generating Videos with The Arrow of Time
CN110688986B (en) 3D convolution behavior recognition network method guided by attention branches

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination