CN108985223B - Human body action recognition method - Google Patents
- Publication number: CN108985223B (application CN201810766185.5A)
- Authority
- CN
- China
- Prior art keywords
- network
- sequence
- deep learning
- convolution
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention provides a human body action recognition method based on deep learning. The method comprises two stages, training and recognition, and the network used in both stages contains a sequence feature extraction module. This module comprises a color-image deep learning network, an optical flow deep learning network, and a CNN network; the color-image network contains three LSTM layers and the optical flow network contains two LSTM layers. With the LSTM layers, the recognition method can learn long image sequences, so the temporal information of the sequence video is better exploited and detection accuracy is effectively improved. In addition, a four-layer convolution network is used to change the receptive field of the feature codes, so that partial sub-sequences of the image sequence also participate in determining the detection result.
Description
Technical Field
The invention belongs to the field of machine learning, and particularly relates to a human motion recognition method.
Background
Traditional human body motion recognition attaches acquisition equipment, such as biological or mechanical sensors, to the person's body. This contact-based motion detection can cause the wearer discomfort or fatigue. With the development of technology, this recognition mode has gradually been replaced by image-based recognition methods.
Deep learning has brought breakthrough progress to machine learning and a new direction to human motion recognition. Unlike traditional recognition methods, deep learning automatically learns high-level features from low-level ones, avoiding feature selection that is overly task-dependent and tuning processes that consume long periods of time.
Disclosure of Invention
In the prior art, human body action recognition directly uses a fully connected layer and performs detection on the whole feature. This causes problems: when an action is relatively fast, the picture sub-sequence containing the action is much shorter than the unit sequence length set for detection, and the action may go undetected. The prior art also ignores the historical information of the sequence images, so detection accuracy remains to be improved. Based on this, the human body action recognition method adopts the following technical scheme:
The human body action recognition method is based on deep learning and comprises two stages, training and recognition. The network used in both stages contains a sequence feature extraction module comprising a color-image deep learning network, an optical flow deep learning network, and a CNN network; the color-image network contains three LSTM layers and the optical flow network contains two LSTM layers.
Further, the hidden layer of each LSTM layer has 200 neurons.
Further, the training phase includes the steps of:
step 1, acquiring an action video, splitting it into frame images, computing optical flow maps, extracting one frame every 16 frames as a sequence center frame, and labeling the action position;
step 2, generating from the video sequence images: sequence picture samples with labels, center-frame picture samples with position labels, and sequence optical-flow picture samples with labels, for training the corresponding feature extraction models;
Step 3, sending the sequence picture samples and labels into the color-image deep learning network, the center-frame picture samples and position labels into the CNN network, and the sequence optical-flow picture samples into the optical flow deep learning network, and extracting features;
Step 4, fusing the features extracted by the three network models to generate the feature codes corresponding to the video sequences;
Step 5, sending the feature codes into a convolution network, which varies the receptive field of the video sequence features over different time scales;
step 6, sending the feature-code samples with different receptive fields into a video recognition network to generate a recognition model;
And 7, training iteratively until the recognition model converges.
Further, the feature codes of the video sequence in the recognition stage are generated by the sequence feature extraction module, and are recognized and classified after the convolution network changes their receptive field.
Further, the convolution network adopts a four-layer structure.
Compared with the prior art, the invention has the beneficial effects that:
1. The redesigned deep learning network structure extracts the features of the video sequence better, giving high motion recognition accuracy.
2. A four-layer convolution network changes the receptive field of the video sequence feature codes. This effectively solves the problem that an action cannot be detected when the picture sub-sequence containing it is much shorter than the complete sequence, while preserving real-time recognition.
Drawings
FIG. 1 is a flow chart of the model training of the present invention;
FIG. 2 is a color map deep learning network workflow diagram;
FIG. 3 is an optical flow deep learning network workflow diagram;
FIG. 4 is a CNN network workflow diagram;
FIG. 5 is a flow chart of the motion recognition of the present invention;
Fig. 6 is a convolutional layer network workflow diagram.
Detailed Description
As shown in fig. 1, the training phase of the human motion recognition method of the present invention includes:
step 1, acquiring an action video, splitting it into frame images, computing optical flow maps, extracting one frame every 16 frames as a sequence center frame, and labeling the action position;
step 2, sending the video sequence images into the image sequence processing unit, the center-frame image processing unit, and the optical flow sequence processing unit, respectively, to generate sequence picture samples with labels, center-frame picture samples with position labels, and sequence optical-flow picture samples with labels, for training the corresponding feature extraction models;
Step 3, sending the sequence picture samples and labels into the color-image deep learning network, the center-frame picture samples and position labels into the CNN network, and the sequence optical-flow picture samples into the optical flow deep learning network, and extracting features;
Step 4, fusing the features extracted by the three network models to generate the feature codes corresponding to the video sequences;
Step 5, sending the feature codes into a convolution network, which varies the receptive field of the video sequence features over different time scales;
step 6, sending the feature-code samples with different receptive fields into a video recognition network to generate a recognition model;
And 7, training iteratively until the recognition model converges.
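Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 16-frame interval comes from step 1, while the choice of the window midpoint as the "center frame" and the helper name are assumptions.

```python
def sample_center_frames(num_frames, interval=16):
    """Split a clip into consecutive 16-frame windows and pick one
    frame per window as the sequence center frame (step 1).
    Using the window midpoint is an assumed interpretation."""
    starts = range(0, num_frames - interval + 1, interval)
    return [s + interval // 2 for s in starts]

# Example: a 48-frame action video yields three sequence center frames.
print(sample_center_frames(48))  # [8, 24, 40]
```

Each sampled center frame would then be paired with its surrounding image sequence and optical-flow sequence to form the three kinds of training samples of step 2.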
The image sequence processing unit, the center-frame image processing unit, the optical flow sequence processing unit, the color-image deep learning network, the CNN network, the optical flow deep learning network, and the feature fusion unit together form the sequence feature extraction module.
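The feature fusion unit (step 4) can be sketched minimally as below. The patent does not specify the fusion operator or the feature dimensions, so concatenation and the sizes used here are assumptions.

```python
import numpy as np

def fuse_features(color_feat, flow_feat, pos_feat):
    """Fuse the features from the color-image network, the optical flow
    network, and the CNN (SSD) position branch into one feature code.
    Concatenation is an assumed fusion operator."""
    return np.concatenate([color_feat, flow_feat, pos_feat])

# Hypothetical dimensions for one video sequence: two 200-d LSTM outputs
# plus a 4-d action-position feature from the center frame.
code = fuse_features(np.zeros(200), np.zeros(200), np.zeros(4))
print(code.shape)  # (404,)
```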
Because human motion is continuous while the acquired image frames are discrete, the historical information of preceding frames is correlated with the current frame. Deep learning networks for this task are mainly built as CNN networks, and the invention constructs a color-image deep learning network and an optical flow deep learning network on this CNN basis. The CNN network adopts an SSD network layer to extract the specific position of the action in the key frames. As shown in figs. 2 and 3, the color-image deep learning network adds three LSTM layers and the optical flow deep learning network adds two LSTM layers; the hidden layer of each LSTM layer has 200 neurons. Adding the LSTM layers enables the recognition method to learn long image sequences. Compared with algorithms that recognize from a single frame only, the reconstructed deep learning network makes better use of the temporal information of the sequence video and effectively improves detection accuracy.
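The role of the added LSTM layers can be illustrated with a minimal NumPy cell. This is a sketch under stated assumptions, not the patent's network: the hidden size of 200 follows the text above, while the 512-d frame features, gate layout, and initialization are standard LSTM conventions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gates stacked as [input, forget, cell, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f, g, o = (sigmoid(z[:H]), sigmoid(z[H:2*H]),
                  np.tanh(z[2*H:3*H]), sigmoid(z[3*H:]))
    c_new = f * c + i * g          # cell state carries long-term history
    h_new = o * np.tanh(c_new)     # hidden state summarizes the sequence so far
    return h_new, c_new

# 200 hidden neurons (per the patent); 512-d frame features are an assumption.
H, D, T = 200, 512, 16
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (4 * H, D))
U = rng.normal(0, 0.01, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(T, D)):  # iterate over a 16-frame sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (200,)
```

The final hidden state accumulates information from every frame in the sequence, which is how the LSTM layers expose the historical information that a single-frame CNN discards.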
As shown in fig. 5, the recognition stage in the human motion recognition method of the present invention includes:
step 1, acquiring an action video, splitting it into frame images, computing optical flow maps, extracting one frame every 16 frames as a sequence center frame, and labeling the action position;
step 2, generating the feature codes corresponding to the video sequence with the sequence feature extraction module;
step 3, sending the feature codes into the convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 4, classifying the feature codes with different receptive fields;
And 5, obtaining the human body action recognition result.
As shown in fig. 6, the convolution network used in training and recognition has a four-layer structure and changes the receptive field of the feature codes: after passing through the four convolution layers, the receptive field has been changed four times. The purpose is to let partial sub-sequences of a given length also participate in determining the detection result, i.e. the result is decided jointly by the whole feature-code data and parts of it. The convolution network consists of temporal convolutions; each layer uses a one-dimensional convolution with kernel size 9 (conv9) and stride 1, and each convolution layer is paired with a pooling layer.
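The growth of the receptive field through the four conv9 + pooling pairs can be computed with the standard recurrence r ← r + (k − 1)·j, j ← j·s. This is a sketch: the pooling kernel and stride of 2 are assumptions, since the text only states that each convolution layer is paired with a pooling layer.

```python
def receptive_fields(layers):
    """Track receptive field r and jump j through a stack of 1-D layers
    given as (kernel, stride) pairs, using r += (k - 1) * j; j *= s."""
    r, j, out = 1, 1, []
    for k, s in layers:
        r += (k - 1) * j
        j *= s
        out.append(r)
    return out

# Four conv9 (stride 1) layers, each followed by an assumed 2/2 pooling layer.
stack = [(9, 1), (2, 2)] * 4
rf = receptive_fields(stack)
print(rf[1::2])  # receptive field after each conv+pool pair: [10, 28, 64, 136]
```

Under these assumptions, successive layers see progressively longer spans of the feature code, which matches the stated goal of letting both the whole code and its parts contribute to the detection result.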
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (1)
1. The human body action recognition method is based on a deep learning technology and is characterized in that the human body action recognition method comprises two stages of training and recognition, a sequence feature extraction module is arranged in a network used in the training and recognition stages, the sequence feature extraction module comprises a color image deep learning network, an optical flow deep learning network and a CNN network, the color image deep learning network is added with three LSTM layers on the basis of the CNN network, the optical flow deep learning network is added with two LSTM layers on the basis of the CNN network, and the CNN network adopts an SSD network layer;
the number of neurons in the hidden layer in the LSTM layer is 200;
the training phase comprises the following steps:
Step 1, acquiring an action video, splitting it into frame images, computing optical flow maps, extracting one frame every 16 frames as a sequence center frame, and labeling the action position;
step 2, generating from the video sequence images: sequence picture samples with labels; center-frame picture samples with position labels; and sequence optical-flow picture samples with labels, used for training the corresponding feature extraction models;
Step 3, sending the sequence picture sample and the label into a color image deep learning network, sending the center frame picture sample and the position label into a CNN network, sending the sequence optical flow picture sample into the optical flow deep learning network, and extracting features;
Step 4, fusing the extracted features of the three network models to generate feature codes corresponding to the video sequences;
Step 5, sending the feature codes into a convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 6, sending the feature code samples with different receptive fields into a video recognition network to generate a recognition model;
step 7, iterative training is carried out until the recognition model converges;
The feature codes of the video sequence in the identification stage are generated by the sequence feature extraction module, and the feature codes are identified after the receptive field is changed by the convolution network;
the convolution network adopts a four-layer structure and is formed by time sequence convolution;
Each convolution layer in the convolution network uses one-dimensional convolution, the step length is 1, and each convolution layer is matched with one pooling layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810766185.5A CN108985223B (en) | 2018-07-12 | 2018-07-12 | Human body action recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810766185.5A CN108985223B (en) | 2018-07-12 | 2018-07-12 | Human body action recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108985223A CN108985223A (en) | 2018-12-11 |
CN108985223B true CN108985223B (en) | 2024-05-07 |
Family
ID=64537893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810766185.5A Active CN108985223B (en) | 2018-07-12 | 2018-07-12 | Human body action recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108985223B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685213B (en) * | 2018-12-29 | 2022-01-07 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring training sample data and terminal equipment |
CN110084259B (en) * | 2019-01-10 | 2022-09-20 | 谢飞 | Facial paralysis grading comprehensive evaluation system combining facial texture and optical flow characteristics |
CN109902565B (en) * | 2019-01-21 | 2020-05-05 | 深圳市烨嘉为技术有限公司 | Multi-feature fusion human behavior recognition method |
CN109919031B (en) * | 2019-01-31 | 2021-04-09 | 厦门大学 | Human behavior recognition method based on deep neural network |
CN110544301A (en) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | Three-dimensional human body action reconstruction system, method and action training system |
CN112257568B (en) * | 2020-10-21 | 2022-09-20 | 中国人民解放军国防科技大学 | Intelligent real-time supervision and error correction system and method for individual soldier queue actions |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933417A (en) * | 2015-06-26 | 2015-09-23 | 苏州大学 | Behavior recognition method based on sparse spatial-temporal characteristics |
CN106845351A (en) * | 2016-05-13 | 2017-06-13 | 苏州大学 | It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term |
CN107273800A (en) * | 2017-05-17 | 2017-10-20 | 大连理工大学 | A kind of action identification method of the convolution recurrent neural network based on attention mechanism |
CN107292247A (en) * | 2017-06-05 | 2017-10-24 | 浙江理工大学 | A kind of Human bodys' response method and device based on residual error network |
CN107463949A (en) * | 2017-07-14 | 2017-12-12 | 北京协同创新研究院 | A kind of processing method and processing device of video actions classification |
CN108108699A (en) * | 2017-12-25 | 2018-06-01 | 重庆邮电大学 | Merge deep neural network model and the human motion recognition method of binary system Hash |
CN108229338A (en) * | 2017-12-14 | 2018-06-29 | 华南理工大学 | A kind of video behavior recognition methods based on depth convolution feature |
- 2018-07-12: CN application CN201810766185.5A, patent CN108985223B, status Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933417A (en) * | 2015-06-26 | 2015-09-23 | 苏州大学 | Behavior recognition method based on sparse spatial-temporal characteristics |
CN106845351A (en) * | 2016-05-13 | 2017-06-13 | 苏州大学 | It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term |
CN107273800A (en) * | 2017-05-17 | 2017-10-20 | 大连理工大学 | A kind of action identification method of the convolution recurrent neural network based on attention mechanism |
CN107292247A (en) * | 2017-06-05 | 2017-10-24 | 浙江理工大学 | A kind of Human bodys' response method and device based on residual error network |
CN107463949A (en) * | 2017-07-14 | 2017-12-12 | 北京协同创新研究院 | A kind of processing method and processing device of video actions classification |
CN108229338A (en) * | 2017-12-14 | 2018-06-29 | 华南理工大学 | A kind of video behavior recognition methods based on depth convolution feature |
CN108108699A (en) * | 2017-12-25 | 2018-06-01 | 重庆邮电大学 | Merge deep neural network model and the human motion recognition method of binary system Hash |
Non-Patent Citations (5)
- Shreyank Jyoti et al., "Expression Empowered ResiDen Network for Facial Action Unit Detection", arXiv, 2018-06-14, Section 1.
- Jeff Donahue et al., "Long-term Recurrent Convolutional Networks for Visual Recognition and Description", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015-10-15, Sections 1 and 4, Fig. 1.
- Yang Ping et al., "A sign language gesture recognition method based on fused multi-sensor information", Space Medicine & Medical Engineering, 2012-08, Vol. 25, No. 4, Abstract.
- Wang Xinpei, "Research on abnormal behavior classification algorithms based on two-stream CNN", China Master's Theses Full-text Database, Information Science and Technology, 2018, No. 2, I138-2191.
Also Published As
Publication number | Publication date |
---|---|
CN108985223A (en) | 2018-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108985223B (en) | Human body action recognition method | |
CN109543667B (en) | Text recognition method based on attention mechanism | |
US11657230B2 (en) | Referring image segmentation | |
CN107766894B (en) | Remote sensing image natural language generation method based on attention mechanism and deep learning | |
CN110580500B (en) | Character interaction-oriented network weight generation few-sample image classification method | |
CN109671102B (en) | Comprehensive target tracking method based on depth feature fusion convolutional neural network | |
CN110046671A (en) | A kind of file classification method based on capsule network | |
CN112949647B (en) | Three-dimensional scene description method and device, electronic equipment and storage medium | |
CN110135461B (en) | Hierarchical attention perception depth measurement learning-based emotion image retrieval method | |
CN111753189A (en) | Common characterization learning method for few-sample cross-modal Hash retrieval | |
CN113536922A (en) | Video behavior identification method for weighting fusion of multiple image tasks | |
CN113516152B (en) | Image description method based on composite image semantics | |
CN113449801B (en) | Image character behavior description generation method based on multi-level image context coding and decoding | |
Tian et al. | Aligned dynamic-preserving embedding for zero-shot action recognition | |
CN108960171B (en) | Method for converting gesture recognition into identity recognition based on feature transfer learning | |
Mogan et al. | Gait-ViT: Gait recognition with vision transformer | |
Duwairi et al. | Automatic recognition of Arabic alphabets sign language using deep learning. | |
CN116452805A (en) | Transformer-based RGB-D semantic segmentation method of cross-modal fusion network | |
CN115690549A (en) | Target detection method for realizing multi-dimensional feature fusion based on parallel interaction architecture model | |
Aksoy et al. | Detection of Turkish sign language using deep learning and image processing methods | |
Al-Obodi et al. | A Saudi Sign Language recognition system based on convolutional neural networks | |
Singh et al. | A sparse coded composite descriptor for human activity recognition | |
CN112216379A (en) | Disease diagnosis system based on intelligent joint learning | |
CN116578738B (en) | Graph-text retrieval method and device based on graph attention and generating countermeasure network | |
CN105046193B (en) | A kind of human motion recognition method based on fusion rarefaction representation matrix |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||