CN116071817A - Network architecture and training method of gesture recognition system for automobile cabin - Google Patents
- Publication number
- CN116071817A (application CN202211306446.8A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- gesture recognition
- image
- sample
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a network architecture and training method for an automobile-cabin gesture recognition system. The method comprises training data preparation; collection of the real-time video output by an RGB camera and a thermal imaging camera and extraction of the key feature information in the video; execution of the relevant operations by the operation execution system according to the gesture information in the SQL Server database; and corresponding feedback by the data processing system according to the CoAttention recognition result. The driver's gesture actions are recognized by the gesture recognition device and gesture instructions are generated, so that the corresponding intelligent vehicle-mounted equipment executes the corresponding operations. This realizes safe driving, avoids potential safety hazards, and reduces the driver's manual operation of the equipment: during driving, the driver can control the intelligent vehicle-mounted equipment without shifting the line of sight to press specific keys on the smart screen, improving safety while driving.
Description
Technical Field
The invention relates to a gesture recognition system for an automobile cabin, and in particular to a network architecture and training method of such a system, belonging to the technical field of intelligent vehicle-mounted equipment.
Background
With the continuous development of automatic driving technology and the steady improvement of living standards, purchasing new-energy vehicles with automatic or assisted driving as everyday transportation has become increasingly popular. This brings great convenience to travel, markedly improves quality of life, and offers a brand-new driving experience.
However, most existing automobiles still have certain drawbacks in their on-board configuration. During driving, the driver can hardly avoid operating keys on the center console: starting the on-board voice system to answer or dial a call, operating the multimedia system to play music, starting the navigation system for real-time route guidance, and so on. Such operations easily distract the driver, creating potential safety hazards and possibly leading to more serious safety problems.
An on-board intelligent control system is therefore a technical problem of particular concern to the industry and to vehicle owners.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems by providing a network architecture and training method for an automobile-cabin gesture recognition system. A gesture recognition device, which stores preset gestures and is in signal connection with the corresponding intelligent vehicle-mounted equipment, is arranged at a preset position of the driver's seat or adjacent to it. The driver's gesture actions are recognized by the gesture recognition device and gesture instructions are generated, so that the corresponding intelligent vehicle-mounted equipment executes the corresponding operations according to those instructions, realizing safe driving, avoiding potential safety hazards, and reducing the driver's manual operation of the intelligent vehicle-mounted equipment.
The invention realizes the above purpose through the following technical scheme: the network architecture of the automobile-cabin gesture recognition system comprises intelligent vehicle-mounted equipment, wherein the intelligent vehicle-mounted equipment comprises a storage module pre-storing preset gestures, a gesture acquisition module for acquiring the gesture actions of a user, a gesture recognition module for comparing the similarity of the acquired gesture actions of the user with the preset gestures and generating a comparison result, and a data processing module for receiving the comparison result, generating a gesture instruction, and executing the operation corresponding to the gesture instruction;
a training method of a gesture recognition system for an automobile cabin comprises the following steps:
Step S1, producing sample data from the RGB images captured by a high-definition camera and the thermal images captured by a thermal imaging camera, and transmitting the images to the on-board data processing center through a data transmission device;
Step S2, after registering and fusing the RGB images and the thermal images, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
Step S3, expanding the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (Mini-Batch Gradient Descent, MBGD);
Step S4, adding bounding boxes and labels to the gesture expansion pictures generated from the gesture images of step S2, and inputting the generated expansion pictures and their labels into the subsequent neural network;
Step S5, adopting the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image (the RGB image of the detected gesture); each parallel 3D-CNN stream is a 3D-ResNet50 model; the outputs of the 3 streams are concatenated and fed into the next stage of the model; a fully connected layer of 512 units follows each 3D-ResNet50, the number of units being fixed regardless of the dataset;
Step S6, the gesture recognition network (VQA) with the proposed co-attention mechanism obtains the most relevant image gesture features and the corresponding semantics, and the attended image and semantic feature vectors are fed to a softmax layer for final result output;
Step S7, designing a dynamic gesture recognition application around the trained CoAttention gesture recognition model, with the size of the application's predefined answer set equal to the number of gesture labels in the dataset;
Step S8, transmitting the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the recognition information to an SQL Server database through the data transmission device for storage.
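The bilateral adaptive Gamma enhancement of step S2 is not specified in detail here; the following is a minimal sketch of adaptive gamma correction in the same spirit, assuming only that the gamma exponent is derived from the frame's mean brightness. The function name and the brightness heuristic are illustrative, not the patented algorithm.

```python
import numpy as np

def adaptive_gamma_enhance(img):
    """Illustrative adaptive gamma enhancement (not the patented algorithm).

    The gamma exponent is derived from the image's mean brightness so that
    dark cabin frames are brightened while mid-bright frames pass through
    almost unchanged.  `img` is a float array with values in [0, 1].
    """
    img = np.clip(img, 0.0, 1.0)
    mean = img.mean()
    # Heuristic: a mean of 0.5 yields gamma = 1 (identity); darker images
    # get gamma < 1, which brightens them.
    gamma = np.log(0.5) / np.log(max(mean, 1e-6))
    return np.power(img, gamma)
```

For example, a uniformly dark frame with mean 0.25 receives gamma 0.5 and is lifted toward mid-gray, which is the behavior one would want before feature extraction in a dim cabin.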
As a still further aspect of the invention: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction and reconstruction are performed for the special environment of the automobile cabin.
As a still further aspect of the invention, step S3 specifically includes:
Step S3.1, inputting the gesture samples X_real collected in step S1, and setting the number of MBGD iterations and the step size;
Step S3.2, generating the sample pictures X_fake with the generator, injecting randomly distributed noise Z ~ N(0, 1) into the generation;
Step S3.3, scoring the real samples with the discriminator to obtain the real-sample output S_real = D(X_real), and scoring the generated samples to obtain the generated-sample output S_fake = D(X_fake);
Step S3.4, iteratively updating the discriminator with the mini-batch gradient descent algorithm, using the discriminator objective L_D = log(S_real) + log(1 - S_fake);
Step S3.5, iteratively updating the generator with the mini-batch gradient descent algorithm, using the generator objective L_G = log(S_fake);
Step S3.6, lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps;
Step S3.7, repeating steps S3.2 to S3.5 until the whole network reaches its optimum or the maximum number of iterations, and outputting the generated samples X_fake.
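Steps S3.3 to S3.6 can be sketched as plain functions. The staircase interpretation of decay(lr, 400, 0.96) and the callable discriminator D are assumptions made for illustration; a real implementation would compute these terms inside an MBGD training loop.

```python
import numpy as np

def discriminator_scores(D, x_real, x_fake):
    """Step S3.3: score real and generated samples with discriminator D."""
    return D(x_real), D(x_fake)  # S_real, S_fake

def discriminator_objective(s_real, s_fake):
    """Step S3.4: L_D = log(S_real) + log(1 - S_fake)."""
    return np.log(s_real) + np.log(1.0 - s_fake)

def generator_objective(s_fake):
    """Step S3.5: L_G = log(S_fake)."""
    return np.log(s_fake)

def decay(lr, step, every=400, rate=0.96):
    """Step S3.6 (assumed staircase form): multiply lr by 0.96 every 400 steps."""
    return lr * rate ** (step // every)
```

A well-trained discriminator drives S_real toward 1 and S_fake toward 0, maximizing L_D, while the generator pushes S_fake toward 1, maximizing L_G, which is the standard adversarial tug-of-war the steps describe.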
As a still further aspect of the invention, step S6 specifically includes:
Step S6.1, computing the attended visual and semantic features V*, Q* from the input image and its visual and semantic feature vectors V and Q:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)   (1)
α_n = softmax(W_hv h_n)   (2)
V* = tanh(α_n V)   (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)   (4)
α_t = softmax(W_hq h_t)   (5)
Q* = tanh(α_t Q)   (6)
wherein W_v, W_q, W_m, W_hv and W_hq denote hidden-layer weights, α_n and α_t denote the attention weights, and V and Q are the visual and semantic feature vectors of the input image;
Step S6.2, computing the classification vector h_t with a linear layer and applying softmax to obtain the probability distribution p_t over the different gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
wherein W_o and W_h are the hidden parameters of the linear layers and O* is the combined attended feature;
Step S6.3, in the training phase, optimizing the parameters of the gesture recognition neural network by minimizing the cross entropy against the label vector y_t; this completes the pre-training of the CoAttention gesture recognition model, which is uploaded through the data transmission device to the SQL Server database for storage;
A training device for an automobile-cabin gesture recognition system comprises a processor, the processor comprising a gesture recognition device or a training device for the gesture recognition network.
As still further aspects of the invention: the training device further comprises a memory for storing the gesture recognition result.
The beneficial effects of the invention are as follows:
aiming at the problem of insufficient vehicle-mounted gesture data sets, the invention provides a gesture sample generation model of an MGAN network, which effectively solves the sample problem required by deep learning network training; the invention provides a gesture recognition network VQA with a common attention mechanism, which combines the common attention machine with a video feature extraction network adopting 3DCNN to realize the real-time recognition of gestures, improves the accuracy of gesture recognition, and can more accurately distinguish different information expressed among different gestures; the gesture action of the driver is recognized through the corresponding gesture recognition device, and the gesture instruction is generated, so that the corresponding intelligent vehicle-mounted equipment executes corresponding operation according to the corresponding gesture instruction, safe driving is realized, potential safety hazards are avoided, the operation of the driver on the intelligent vehicle-mounted equipment is reduced, the driver does not need to transfer the line of sight to manually operate specific keys on the intelligent screen during the running of the automobile, the intelligent vehicle-mounted equipment can be controlled to perform corresponding operation, and the safety performance in the driving process is improved.
Drawings
FIG. 1 is a flow chart of a pre-training network of the present invention;
fig. 2 is a flow chart of the MGAN of the present invention;
FIG. 3 is a flow chart of a pre-training of a gesture recognition network of the present invention;
FIG. 4 is a flow chart of the gesture recognition system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, wherein the intelligent vehicle-mounted equipment comprises a storage module pre-storing preset gestures, a gesture acquisition module for acquiring the gesture actions of a user, a gesture recognition module for comparing the similarity of the acquired gesture actions of the user with the preset gestures and generating a comparison result, and a data processing module for receiving the comparison result, generating a gesture instruction, and executing the operation corresponding to the gesture instruction.
A training method of a gesture recognition system for an automobile cabin comprises the following steps:
Step S1, producing sample data from the RGB images captured by a high-definition camera and the thermal images captured by a thermal imaging camera, and transmitting the images to the on-board data processing center through a data transmission device;
Step S2, after registering and fusing the RGB images and the thermal images, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
Step S3, expanding the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (Mini-Batch Gradient Descent, MBGD);
Step S4, adding bounding boxes and labels to the gesture expansion pictures generated from the gesture images of step S2, and inputting the generated expansion pictures and their labels into the subsequent neural network;
Step S5, adopting the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image (the RGB image of the detected gesture); each parallel 3D-CNN stream is a 3D-ResNet50 model; the outputs of the 3 streams are concatenated and fed into the next stage of the model; a fully connected layer of 512 units follows each 3D-ResNet50, the number of units being fixed regardless of the dataset;
Step S6, the gesture recognition network (VQA) with the proposed co-attention mechanism obtains the most relevant image gesture features and the corresponding semantics, and the attended image and semantic feature vectors are fed to a softmax layer for final result output;
Step S7, designing a dynamic gesture recognition application around the trained CoAttention gesture recognition model, with the size of the application's predefined answer set equal to the number of gesture labels in the dataset;
Step S8, transmitting the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the recognition information to an SQL Server database through the data transmission device for storage.
In the embodiment of the invention, in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction and reconstruction are performed for the special environment of the automobile cabin.
In the embodiment of the present invention, step S3 specifically includes:
Step S3.1, inputting the gesture samples X_real collected in step S1, and setting the number of MBGD iterations and the step size;
Step S3.2, generating the sample pictures X_fake with the generator, injecting randomly distributed noise Z ~ N(0, 1) into the generation;
Step S3.3, scoring the real samples with the discriminator to obtain the real-sample output S_real = D(X_real), and scoring the generated samples to obtain the generated-sample output S_fake = D(X_fake);
Step S3.4, iteratively updating the discriminator with the mini-batch gradient descent algorithm, using the discriminator objective L_D = log(S_real) + log(1 - S_fake);
Step S3.5, iteratively updating the generator with the mini-batch gradient descent algorithm, using the generator objective L_G = log(S_fake);
Step S3.6, lr = decay(lr, 400, 0.96), i.e. the learning rate decays by a factor of 0.96 every 400 steps;
Step S3.7, repeating steps S3.2 to S3.5 until the whole network reaches its optimum or the maximum number of iterations, and outputting the generated samples X_fake.
Example two
As shown in fig. 1 to 4, a network architecture of a gesture recognition system for an automobile cabin comprises intelligent vehicle-mounted equipment, wherein the intelligent vehicle-mounted equipment comprises a storage module pre-storing preset gestures, a gesture acquisition module for acquiring the gesture actions of a user, a gesture recognition module for comparing the similarity of the acquired gesture actions of the user with the preset gestures and generating a comparison result, and a data processing module for receiving the comparison result, generating a gesture instruction, and executing the operation corresponding to the gesture instruction.
A training method of a gesture recognition system for an automobile cabin comprises the following steps:
Step S1, producing sample data from the RGB images captured by a high-definition camera and the thermal images captured by a thermal imaging camera, and transmitting the images to the on-board data processing center through a data transmission device;
Step S2, after registering and fusing the RGB images and the thermal images, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
Step S3, expanding the gesture sample images from step S1 with a deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (Mini-Batch Gradient Descent, MBGD);
Step S4, adding bounding boxes and labels to the gesture expansion pictures generated from the gesture images of step S2, and inputting the generated expansion pictures and their labels into the subsequent neural network;
Step S5, adopting the ResNet50 model, which offers high accuracy at low complexity, as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image (the RGB image of the detected gesture); each parallel 3D-CNN stream is a 3D-ResNet50 model; the outputs of the 3 streams are concatenated and fed into the next stage of the model; a fully connected layer of 512 units follows each 3D-ResNet50, the number of units being fixed regardless of the dataset;
Step S6, the gesture recognition network (VQA) with the proposed co-attention mechanism obtains the most relevant image gesture features and the corresponding semantics, and the attended image and semantic feature vectors are fed to a softmax layer for final result output;
Step S7, designing a dynamic gesture recognition application around the trained CoAttention gesture recognition model, with the size of the application's predefined answer set equal to the number of gesture labels in the dataset;
Step S8, transmitting the real-time data acquired by the equipment into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the recognition information to an SQL Server database through the data transmission device for storage.
In the embodiment of the present invention, step S6 specifically includes:
Step S6.1, computing the attended visual and semantic features V*, Q* from the input image and its visual and semantic feature vectors V and Q:
h_n = tanh(W_v V) ⊙ tanh(W_m m_b)   (1)
α_n = softmax(W_hv h_n)   (2)
V* = tanh(α_n V)   (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b)   (4)
α_t = softmax(W_hq h_t)   (5)
Q* = tanh(α_t Q)   (6)
wherein W_v, W_q, W_m, W_hv and W_hq denote hidden-layer weights, α_n and α_t denote the attention weights, and V and Q are the visual and semantic feature vectors of the input image;
Step S6.2, computing the vector p_t with a linear layer and the softmax function to represent the probability distribution over the different gesture classes:
h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)
wherein W_o and W_h are the hidden parameters of the linear layers and O* is the combined attended feature;
Step S6.3, in the training phase, optimizing the parameters of the gesture recognition neural network by minimizing the cross entropy against the label vector y_t; this completes the pre-training of the CoAttention gesture recognition model, which is uploaded through the data transmission device to the SQL Server database for storage;
a training device of a gesture recognition system for an automobile cabin comprises a memory for storing gesture recognition results.
In an embodiment of the present invention, the training device comprises a processor, and the processor comprises a gesture recognition device or a training device of a gesture recognition network.
The gesture recognition network is obtained by training on the images to be processed combined with the weight vector; the coordinate information comprises gesture-frame coordinates and/or key-point coordinates; the gesture classification information marks the gesture in the gesture-frame image as belonging to one of a plurality of preset gestures; and the background information comprises a foreground image and a background image.
RGB images captured by the RGB camera and thermal images captured by the thermal imaging camera are collected in advance and transmitted to the on-board data processing center through the data transmission device. After the RGB and thermal images are registered and fused, the image features are enhanced with the bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory. The data processing center expands the gesture sample images with the deep convolutional generative adversarial network (MGAN) optimized by mini-batch gradient descent (MBGD). Labels are added to the gesture expansion pictures, and the generated pictures and labels are input into the subsequent CoAttention neural network, completing the pre-training of the CoAttention classification and recognition network. Real-time video output by the RGB camera and the thermal imaging camera is then acquired and transmitted to the on-board data processing center through the data transmission device. The ResNet50 model, offering high accuracy at low complexity, is adopted as the frame-level feature extractor to extract the key feature information in the video. The data processing system classifies and recognizes the extracted key feature information with the pre-trained CoAttention classification and recognition model, and uploads the recognition information to the SQL Server database through the data transmission device for storage. The operation execution system performs the relevant operations according to the gesture information in the SQL Server database, and the data processing system issues the corresponding feedback according to the CoAttention recognition result, ending the whole closed-loop process.
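The closed-loop flow above can be sketched as an orchestration skeleton in which every stage (camera capture, CoAttention classifier, SQL Server upload, operation executor) is a pluggable callable; all names and the stand-in implementations are illustrative assumptions, not the patent's code.

```python
from dataclasses import dataclass, field

@dataclass
class GesturePipeline:
    """Skeleton of the closed loop: capture -> recognize -> store -> execute.

    Each stage is a callable so real cameras, the trained CoAttention model,
    a database client, and the operation execution system can be swapped in.
    """
    capture: callable    # returns one fused RGB+thermal frame sequence
    recognize: callable  # maps frames to a gesture label
    store: callable      # uploads the result (e.g. to SQL Server)
    execute: callable    # performs the operation for the gesture
    log: list = field(default_factory=list)

    def step(self):
        frames = self.capture()
        gesture = self.recognize(frames)
        self.store(gesture)
        self.execute(gesture)
        self.log.append(gesture)
        return gesture

# Stand-in stages for illustration only.
pipeline = GesturePipeline(
    capture=lambda: ["frame0", "frame1"],
    recognize=lambda frames: "volume_up",
    store=lambda g: None,
    execute=lambda g: None,
)
```

Running `pipeline.step()` in a loop reproduces the closed-loop behavior described above, with each recognized gesture stored and acted upon before the next capture.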
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity; the specification should be taken as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other implementations understandable to those skilled in the art.
Claims (7)
1. A network architecture of a gesture recognition system for an automobile cabin, characterized in that: the network architecture comprises intelligent vehicle-mounted equipment, wherein the intelligent vehicle-mounted equipment comprises a storage module pre-storing preset gestures, a gesture acquisition module for acquiring the gesture actions of a user, a gesture recognition module for comparing the similarity of the acquired gesture actions of the user with the preset gestures and generating a comparison result, and a data processing module for receiving the comparison result, generating a gesture instruction, and executing the operation corresponding to the gesture instruction.
2. A training method for a gesture recognition system for an automobile cabin, the training method comprising:
step S1, producing sample data from RGB images captured by a high-definition camera and thermal images captured by a thermal imaging camera, and transmitting the images to a vehicle-mounted data processing center through a data transmission device;
step S2, after registering and fusing the RGB images and the thermal images, performing image feature enhancement with a bilateral adaptive Gamma enhancement algorithm improved on the basis of Retinex theory;
step S3, expanding the gesture sample images from step S1 with a deep convolutional generative adversarial network optimized by gradient descent;
step S4, adding bounding boxes and labels to the expanded gesture pictures generated in step S3, and inputting the generated expanded pictures and labels into the subsequent neural network;
step S5, adopting the ResNet50 model as the frame-level feature extraction network, with 3 parallel 3D-CNN streams, each stream receiving a different RGB image of the detected gesture and each being a 3D-ResNet50 model; the outputs of the 3 streams are concatenated and fed into the next stage of the model; a fully connected layer of 512 units follows each 3D-ResNet50, the number of units being fixed independent of the dataset;
step S6, obtaining the most relevant image gesture features and their corresponding semantics with the proposed co-attention gesture recognition network VQA, and connecting the attended image and semantic feature vectors to a softmax layer for the final result output;
step S7, designing a dynamic gesture recognition application with the trained CoAttention gesture recognition model as its core, the size of the predefined answer set in the application being set to the number of gesture labels in the dataset;
step S8, transmitting the real-time data acquired by the device into the dynamic gesture recognition application through the data transmission device, performing gesture classification and recognition according to the key feature information contained in the real-time video, and uploading the resulting gesture recognition information to an SQL Server database through the data transmission device for storage.
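As a minimal shape-level sketch of step S5's frame-level extraction, the three parallel 3D streams each map a clip to a feature vector, the three outputs are concatenated, and a 512-unit fully connected layer follows. The `backbone` stub and the random projection weights below are hypothetical stand-ins for the 3D-ResNet50 models, which the claim does not specify in code form:

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(clip):
    # stand-in for one 3D-ResNet50 stream: global-average-pool a clip of
    # shape (frames, height, width, channels) down to a 2048-dim descriptor
    pooled = clip.mean(axis=(0, 1, 2))               # (channels,)
    W = np.ones((pooled.size, 2048)) / pooled.size   # dummy fixed projection
    return pooled @ W                                # (2048,)

def frame_level_features(clips):
    # three parallel streams, one per input clip; outputs are concatenated
    feats = [backbone(c) for c in clips]             # 3 x (2048,)
    fused = np.concatenate(feats)                    # (6144,)
    W_fc = rng.standard_normal((fused.size, 512)) * 0.01
    return np.tanh(fused @ W_fc)                     # (512,), fixed size
                                                     # independent of dataset

clips = [rng.random((8, 32, 32, 3)) for _ in range(3)]
out = frame_level_features(clips)
print(out.shape)  # (512,)
```

The 512-unit output size stays constant regardless of the dataset, matching the claim's "fixed independent of the dataset" requirement.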
3. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein: in steps S2-S3, the selected network structures and algorithms balance accuracy against code complexity, and image enhancement, feature extraction and reconstruction are performed for the special environment of the automobile cabin.
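The illumination-adaptive Gamma enhancement of step S2 can be sketched in the Retinex spirit: estimate the illumination with a smoothing filter, then apply a per-pixel gamma driven by that estimate. The claim does not give the exact bilateral algorithm; the box-blur illumination estimate and the `2**(illum - 0.5)` illumination-to-gamma mapping below are illustrative assumptions only:

```python
import numpy as np

def estimate_illumination(v, k=15):
    # Retinex-style illumination estimate: separable box blur of the V channel
    pad = k // 2
    padded = np.pad(v, pad, mode="edge")
    kern = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode="valid"), 0, rows)

def adaptive_gamma(v):
    # per-pixel gamma: dark regions (illum < 0.5) get gamma < 1 and are
    # brightened; bright regions get gamma > 1 and are compressed
    illum = estimate_illumination(v)
    gamma = np.power(2.0, illum - 0.5)   # hypothetical mapping, not from the claim
    return np.power(np.clip(v, 1e-6, 1.0), gamma)

dark = np.full((32, 32), 0.1)            # uniformly dark patch in [0, 1]
result = adaptive_gamma(dark)
print(result.mean() > 0.1)  # True: the dark patch is brightened
```

A production version would replace the box blur with a bilateral filter so that edges in the illumination map are preserved, which is what the "bilateral" qualifier in the claim suggests.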
4. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S3 specifically comprises:
step S3.1, inputting the gesture samples X_real collected in step S1, and setting the iteration number and step size of MBGD;
step S3.2, generating sample pictures X_fake with the generator, and adding randomly distributed noise Z ~ N(0, 1) to the sample pictures X_fake;
step S3.3, computing the real-sample score with the discriminator to obtain S_real = D(X_real), and computing the generated-sample score to obtain S_fake = D(X_fake);
step S3.4, iteratively updating the discriminator with the mini-batch gradient descent algorithm using the discriminator loss L_D = log(S_real) + log(1 - S_fake);
step S3.5, iteratively updating the generator with the mini-batch gradient descent algorithm using the generator loss L_G = log(S_fake);
step S3.6, decaying the learning rate as lr = decay(lr, 400, 0.96), i.e. multiplying the learning rate by 0.96 every 400 steps.
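One pass of steps S3.2-S3.6 can be sketched as follows. The generator and discriminator here are tiny linear/sigmoid stubs over 1-D "images" (purely illustrative; the claim's networks are deep convolutional), and only the loss computations and the learning-rate decay schedule follow the claim:

```python
import numpy as np

rng = np.random.default_rng(0)

def decay(lr, step, every=400, rate=0.96):
    # step S3.6: multiply the learning rate by 0.96 for every 400 steps taken
    return lr * (rate ** (step // every))

w_d = rng.standard_normal(4) * 0.1    # stub discriminator weights
w_g = rng.standard_normal(4) * 0.1    # stub generator weights

def D(x):                             # sigmoid score in (0, 1)
    return 1.0 / (1.0 + np.exp(-(x * w_d).sum(axis=-1)))

def G(z):                             # stub linear generator
    return z * w_g

x_real = rng.random((8, 4))           # mini-batch of real gesture samples
z = rng.standard_normal((8, 4))       # noise Z ~ N(0, 1)
x_fake = G(z)

s_real, s_fake = D(x_real), D(x_fake)                   # step S3.3
loss_d = np.mean(np.log(s_real) + np.log(1 - s_fake))   # step S3.4, D ascends
loss_g = np.mean(np.log(s_fake))                        # step S3.5, G ascends

lr0 = 1e-3
print(decay(lr0, 800) == lr0 * 0.96 ** 2)  # True: two decay periods elapsed
```

In a full DCGAN loop the mini-batch gradients of `loss_d` and `loss_g` would be applied to the discriminator and generator parameters in alternation, with `decay` refreshing the step size.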
5. The training method of a gesture recognition system for an automobile cabin according to claim 2, wherein step S6 specifically comprises:
step S6.1, computing the attended visual and semantic features V* and Q* of the input image from the feature vectors V and Q of the image and its semantics:

h_n = tanh(W_v V) ⊙ tanh(W_m m_b) (1)
α_n = softmax(W_hv h_n) (2)
V* = tanh(α_n V) (3)
h_t = tanh(W_q Q) ⊙ tanh(W_m m_b) (4)
α_t = softmax(W_hq h_t) (5)
Q* = tanh(α_t Q) (6)

wherein W_v, W_q, W_m, W_hv, W_hq denote hidden-layer weights, α_n and α_t denote the attention weights, and V and Q denote the visual and semantic feature vectors of the input image, respectively;
step S6.2, computing the classification vector h_t with the linear layer, and applying softmax to obtain the probability distribution p_t over the gesture classifications:

h_t = tanh(W_o O*)
p_t = softmax(W_h h_t)

wherein W_o and W_h are the hidden parameters of the linear layer;
step S6.3, in the training phase, optimizing the gesture recognition neural network parameters by minimizing the cross entropy on the label vector y_t, thereby completing the pre-training of the CoAttention gesture recognition model, which is uploaded to the SQL Server database for storage through the data transmission device.
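The co-attention computation of equations (1)-(6) and the classification of step S6.2 can be sketched directly in NumPy. The weights are random placeholders, `m_b` is assumed to be a shared memory/context vector, and `O*` is assumed to be the concatenation of the attended features V* and Q* (the claim leaves both unspecified):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_cls = 16, 10

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

V = rng.random(d)     # visual feature vector of the input image
Q = rng.random(d)     # semantic feature vector
m_b = rng.random(d)   # shared memory/context vector (assumed)

W_v, W_q, W_m = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W_hv, W_hq = (rng.standard_normal((d, d)) * 0.1 for _ in range(2))

h_n = np.tanh(W_v @ V) * np.tanh(W_m @ m_b)   # Eq. (1), ⊙ = elementwise product
a_n = softmax(W_hv @ h_n)                     # Eq. (2), attention weights
V_star = np.tanh(a_n * V)                     # Eq. (3), attended visual feature

h_t = np.tanh(W_q @ Q) * np.tanh(W_m @ m_b)   # Eq. (4)
a_t = softmax(W_hq @ h_t)                     # Eq. (5)
Q_star = np.tanh(a_t * Q)                     # Eq. (6), attended semantic feature

O = np.concatenate([V_star, Q_star])          # O*: joined attended features (assumed)
W_o = rng.standard_normal((d, 2 * d)) * 0.1   # linear-layer hidden parameters
W_h = rng.standard_normal((n_cls, d)) * 0.1
p = softmax(W_h @ np.tanh(W_o @ O))           # step S6.2: class distribution p_t
print(abs(p.sum() - 1.0) < 1e-9)  # True: a valid probability distribution
```

Training per step S6.3 would then minimize the cross entropy between `p` and the one-hot label vector y_t by gradient descent on the weight matrices.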
6. A training device of a gesture recognition system for an automobile cabin, characterized in that: the training device comprises a processor, and the processor comprises a gesture recognition device or a training device of a gesture recognition network.
7. The training device of a gesture recognition system for an automobile cabin according to claim 6, characterized in that: it further comprises a memory for storing the gesture recognition results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211306446.8A CN116071817A (en) | 2022-10-25 | 2022-10-25 | Network architecture and training method of gesture recognition system for automobile cabin |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211306446.8A CN116071817A (en) | 2022-10-25 | 2022-10-25 | Network architecture and training method of gesture recognition system for automobile cabin |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116071817A true CN116071817A (en) | 2023-05-05 |
Family
ID=86182739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211306446.8A Pending CN116071817A (en) | 2022-10-25 | 2022-10-25 | Network architecture and training method of gesture recognition system for automobile cabin |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116071817A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117218716A (en) * | 2023-08-10 | 2023-12-12 | 中国矿业大学 | DVS-based automobile cabin gesture recognition system and method |
CN117218716B (en) * | 2023-08-10 | 2024-04-09 | 中国矿业大学 | DVS-based automobile cabin gesture recognition system and method |
CN117351557A (en) * | 2023-08-17 | 2024-01-05 | 中国矿业大学 | Vehicle-mounted gesture recognition method for deep learning |
CN117152843A (en) * | 2023-09-06 | 2023-12-01 | 世优(北京)科技有限公司 | Digital person action control method and system |
CN117152843B (en) * | 2023-09-06 | 2024-05-07 | 世优(北京)科技有限公司 | Digital person action control method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110021051B (en) | Human image generation method based on generation of confrontation network through text guidance | |
CN116071817A (en) | Network architecture and training method of gesture recognition system for automobile cabin | |
CN110399821B (en) | Customer satisfaction acquisition method based on facial expression recognition | |
CN107679491A (en) | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data | |
CN109740419A (en) | A kind of video behavior recognition methods based on Attention-LSTM network | |
CN112784929B (en) | Small sample image classification method and device based on double-element group expansion | |
Rani et al. | Object detection and recognition using contour based edge detection and fast R-CNN | |
CN107194344B (en) | Human behavior recognition method adaptive to bone center | |
CN113807340B (en) | Attention mechanism-based irregular natural scene text recognition method | |
CN112232395B (en) | Semi-supervised image classification method for generating countermeasure network based on joint training | |
CN111914911A (en) | Vehicle re-identification method based on improved depth relative distance learning model | |
CN113378949A (en) | Dual-generation confrontation learning method based on capsule network and mixed attention | |
CN115861981A (en) | Driver fatigue behavior detection method and system based on video attitude invariance | |
CN115731597A (en) | Automatic segmentation and restoration management platform and method for mask image of face mask | |
Shi et al. | Learning attention-enhanced spatiotemporal representation for action recognition | |
CN117710841A (en) | Small target detection method and device for aerial image of unmanned aerial vehicle | |
CN116645287B (en) | Diffusion model-based image deblurring method | |
CN117726809A (en) | Small sample semantic segmentation method based on information interaction enhancement | |
CN115984949B (en) | Low-quality face image recognition method and equipment with attention mechanism | |
Han et al. | Feature fusion and adversary occlusion networks for object detection | |
CN110929632A (en) | Complex scene-oriented vehicle target detection method and device | |
CN110942463A (en) | Video target segmentation method based on generation countermeasure network | |
CN116311251A (en) | Lightweight semantic segmentation method for high-precision stereoscopic perception of complex scene | |
KR102279772B1 (en) | Method and Apparatus for Generating Videos with The Arrow of Time | |
CN110688986B (en) | 3D convolution behavior recognition network method guided by attention branches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||