CN108985223B - Human body action recognition method

Info

Publication number
CN108985223B
Authority
CN
China
Prior art keywords
network
sequence
deep learning
convolution
recognition
Prior art date
Legal status
Active
Application number
CN201810766185.5A
Other languages
Chinese (zh)
Other versions
CN108985223A (en)
Inventor
张德馨
史玉坤
Current Assignee
Tianjin Isecure Technology Co ltd
Original Assignee
Tianjin Isecure Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Isecure Technology Co ltd
Priority to CN201810766185.5A
Publication of CN108985223A
Application granted
Publication of CN108985223B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention provides a human body action recognition method based on deep learning. The method comprises a training stage and a recognition stage, and the network used in both stages contains a sequence feature extraction module comprising a color image deep learning network, an optical flow deep learning network and a CNN network; the color image deep learning network contains three LSTM layers and the optical flow deep learning network contains two LSTM layers. With the LSTM layers added, the recognition method can learn long image sequences, so the temporal information of the video sequence is used more fully and detection accuracy is effectively improved. In addition, a four-layer convolution network is used in the deep learning network to vary the receptive field of the feature code, so that parts of the image sequence also participate in determining the detection result.

Description

Human body action recognition method
Technical Field
The invention belongs to the field of machine learning, and particularly relates to a human body action recognition method.
Background
Traditional human body action recognition attaches acquisition equipment, such as biological or mechanical sensors, to the person's body. It is therefore a contact-based motion detection method and can cause discomfort or fatigue. With the development of technology, this mode of recognition has gradually been replaced by image-based recognition methods.
Deep learning has brought breakthrough progress to machine learning and has opened a new direction for human body action recognition. Unlike traditional recognition methods, deep learning can automatically learn high-level features from low-level features, which solves the problems that feature selection depends too heavily on the task and that the tuning process is time-consuming.
Disclosure of Invention
In the prior art, human body action recognition feeds the whole feature directly into a fully connected layer, so detection is based on the entire feature. This causes problems: for example, when the action is fast, the picture sub-sequence that actually contains the action is much shorter than the unit complete sequence length set for detection, and the action may go undetected. Moreover, the prior art does not consider the historical information of the sequence images, so detection accuracy still needs improvement. On this basis, the human body action recognition method adopts the following technical scheme:
The human body action recognition method is based on deep learning and comprises a training stage and a recognition stage. The network used in both stages contains a sequence feature extraction module, which comprises a color image deep learning network, an optical flow deep learning network and a CNN network; the color image deep learning network contains three LSTM layers and the optical flow deep learning network contains two LSTM layers.
Further, the hidden layer of each LSTM layer has 200 neurons.
Further, the training stage includes the following steps:
Step 1: acquire an action video, split it into frame images, compute the optical flow maps, extract one image every 16 frames as the sequence center frame, and mark the action position;
Step 2: from the video sequence images, respectively generate the sequence picture samples and labels, the center-frame picture samples and position labels, and the sequence optical flow picture samples and labels, which are used to train the corresponding feature extraction models;
Step 3: send the sequence picture samples and labels into the color image deep learning network, the center-frame picture samples and position labels into the CNN network, and the sequence optical flow picture samples into the optical flow deep learning network, and extract features;
Step 4: fuse the features extracted by the three network models to generate the feature code of the video sequence;
Step 5: send the feature code into a convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 6: send the feature code samples with different receptive fields into a video recognition network to generate a recognition model;
Step 7: iterate training until the recognition model converges.
Further, in the recognition stage the feature code of the video sequence is generated by the sequence feature extraction module, and the feature code is recognized and classified after the convolution network has varied its receptive field.
Further, the convolution network adopts a four-layer structure.
Compared with the prior art, the invention has the beneficial effects that:
1. The redesigned deep learning network structure extracts the features of the video sequence more effectively, giving high action recognition accuracy.
2. The four-layer convolution network varies the receptive field over the video sequence feature code; while preserving real-time recognition, it effectively solves the problem that an action cannot be detected when the picture sub-sequence containing it is much shorter than the complete sequence.
Drawings
FIG. 1 is a flow chart of the model training of the present invention;
FIG. 2 is a workflow diagram of the color image deep learning network;
FIG. 3 is a workflow diagram of the optical flow deep learning network;
FIG. 4 is a workflow diagram of the CNN network;
FIG. 5 is a flow chart of the action recognition of the present invention;
FIG. 6 is a workflow diagram of the convolution network.
Detailed Description
As shown in FIG. 1, the training stage of the human body action recognition method of the present invention includes:
Step 1: acquire an action video, split it into frame images, compute the optical flow maps, extract one image every 16 frames as the sequence center frame, and mark the action position;
Step 2: send the video sequence images into the image sequence processing unit, the center-frame image processing unit and the optical flow sequence processing unit respectively, generating the sequence picture samples and labels, the center-frame picture samples and position labels, and the sequence optical flow picture samples and labels, which are used to train the corresponding feature extraction models (a preprocessing sketch for Steps 1 and 2 follows this list);
Step 3: send the sequence picture samples and labels into the color image deep learning network, the center-frame picture samples and position labels into the CNN network, and the sequence optical flow picture samples into the optical flow deep learning network, and extract features;
Step 4: fuse the features extracted by the three network models to generate the feature code of the video sequence;
Step 5: send the feature code into a convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 6: send the feature code samples with different receptive fields into a video recognition network to generate a recognition model;
Step 7: iterate training until the recognition model converges.
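To make Steps 1 and 2 concrete, the following Python sketch shows one plausible preprocessing pipeline. The 16-frame interval follows the description above; the Farnebäck dense optical flow routine, the choice of the middle frame of each unit as the center frame, and all function and variable names are illustrative assumptions, not details fixed by the patent.

```python
import cv2  # OpenCV: frame extraction and dense optical flow

def preprocess_video(path, interval=16):
    """Steps 1-2 (one plausible reading): split the video into frames,
    compute a dense optical flow map between consecutive frames, and
    take the middle frame of each `interval`-frame unit sequence as
    the sequence center frame."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        return [], [], []
    frames, flows = [prev], []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense flow is an assumption; the patent only says
        # that an optical flow map is computed.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
        frames.append(frame)
        prev_gray = gray
    cap.release()
    centers = frames[interval // 2::interval]  # one center frame per unit
    return frames, flows, centers
```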
The image sequence processing unit, the center-frame image processing unit, the optical flow sequence processing unit, the color image deep learning network, the CNN network, the optical flow deep learning network and the feature fusion unit together form the sequence feature extraction module.
Because human motion is continuous while the acquired image frames are discrete, the historical information of previous frames is correlated with the current frame. Deep learning networks are mostly built as CNN networks; the invention constructs the color image deep learning network and the optical flow deep learning network on a CNN basis. The CNN network adopts an SSD network layer to extract the specific position of the action in the key frame. As shown in FIG. 2 and FIG. 3, the color image deep learning network adds three LSTM layers and the optical flow deep learning network adds two, where the hidden layer of each LSTM layer has 200 neurons. With the LSTM layers added, the recognition method can learn long image sequences; compared with algorithms that recognize from a single picture, the reconstructed deep learning network makes better use of the temporal information of the video sequence and effectively improves detection accuracy.
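For illustration, a minimal PyTorch sketch of the color image branch and of the feature fusion of Step 4 is given below. The patent fixes only the LSTM layer counts (three for the color branch, two for the optical flow branch), the 200 hidden neurons, and the use of an SSD layer in the separate CNN network; the per-frame backbone, the feature dimensions, and the concatenation rule for fusion are assumptions introduced for the example.

```python
import torch
import torch.nn as nn

class ColorSequenceNet(nn.Module):
    """Color image branch: per-frame CNN features followed by three
    stacked LSTM layers with 200 hidden neurons each. The optical flow
    branch would follow the same pattern with lstm_layers=2."""
    def __init__(self, feat_dim=256, hidden=200, lstm_layers=3):
        super().__init__()
        # Placeholder per-frame backbone; the patent does not fix this
        # architecture (only the SSD layer in the separate CNN network).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Stacked LSTM layers: three here, two in the flow branch.
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=lstm_layers,
                            batch_first=True)

    def forward(self, clip):                  # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = self.cnn(clip.flatten(0, 1))      # (B*T, feat_dim)
        out, _ = self.lstm(x.view(b, t, -1))  # (B, T, hidden)
        return out[:, -1]                     # last-step feature: (B, 200)

def fuse(color_feat, flow_feat, position_feat):
    """Step 4 (assumed fusion rule): concatenate the three branch
    features into one feature code for the video sequence."""
    return torch.cat([color_feat, flow_feat, position_feat], dim=1)
```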
As shown in FIG. 5, the recognition stage of the human body action recognition method of the present invention includes:
Step 1: acquire an action video, split it into frame images, compute the optical flow maps, extract one image every 16 frames as the sequence center frame, and mark the action position;
Step 2: generate the feature code of the video sequence with the sequence feature extraction module;
Step 3: send the feature code into the convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 4: classify the feature codes with different receptive fields;
Step 5: obtain the human body action recognition result.
As shown in FIG. 6, the convolution network used in training and recognition has a four-layer structure and serves to vary the receptive field of the feature code: after the feature code passes through the four convolution layers, its receptive field has been changed four times. The purpose of varying the receptive field is to let parts of an image sequence of a given length also participate in determining the detection result, i.e. the result is decided jointly by the whole feature code and by parts of it. The convolution network is composed of temporal convolutions; each layer uses a conv9 one-dimensional convolution with stride 1, and each convolution layer is paired with one pooling layer.
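Under the same caveat, the following sketch shows one way to realize the four-layer temporal convolution network: each layer is a one-dimensional convolution over the time axis with stride 1, reading "conv9" as a kernel of width 9, and each convolution layer is paired with a pooling layer, so the temporal receptive field over the feature code grows at every stage. The channel count, padding, pooling type and classification head are illustrative assumptions.

```python
import torch.nn as nn

class ReceptiveFieldNet(nn.Module):
    """Four temporal conv layers (kernel 9, stride 1), each paired with
    a pooling layer; every stage widens the receptive field over the
    feature code, so sub-sequences also influence the decision."""
    def __init__(self, in_ch=64, num_actions=10):
        super().__init__()
        layers = []
        for _ in range(4):                      # four-layer structure
            layers += [nn.Conv1d(in_ch, in_ch, kernel_size=9,
                                 stride=1, padding=4),
                       nn.ReLU(),
                       nn.MaxPool1d(2)]         # one pooling layer per conv
        self.body = nn.Sequential(*layers)
        # Classification head (assumed): pool to length 1 and classify.
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                  nn.Linear(in_ch, num_actions))

    def forward(self, code):    # code: (B, channels, time steps)
        return self.head(self.body(code))
```

With four MaxPool1d(2) stages, a feature code covering a 16-frame unit sequence is reduced to length 1 before classification, so both the whole code and its pooled sub-segments influence the result.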
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A human body action recognition method based on deep learning, characterized in that the method comprises a training stage and a recognition stage; a sequence feature extraction module is provided in the network used in the training and recognition stages; the sequence feature extraction module comprises a color image deep learning network, an optical flow deep learning network and a CNN network; the color image deep learning network adds three LSTM layers on the basis of the CNN network, the optical flow deep learning network adds two LSTM layers on the basis of the CNN network, and the CNN network adopts an SSD network layer;
the hidden layer of each LSTM layer has 200 neurons;
the training stage comprises the following steps:
Step 1: acquire an action video, split it into frame images, compute the optical flow maps, extract one image every 16 frames as the sequence center frame, and mark the action position;
Step 2: from the video sequence images, respectively generate the sequence picture samples and labels, the center-frame picture samples and position labels, and the sequence optical flow picture samples and labels, which are used to train the corresponding feature extraction models;
Step 3: send the sequence picture samples and labels into the color image deep learning network, the center-frame picture samples and position labels into the CNN network, and the sequence optical flow picture samples into the optical flow deep learning network, and extract features;
Step 4: fuse the features extracted by the three network models to generate the feature code of the video sequence;
Step 5: send the feature code into a convolution network, which varies the receptive field of the video sequence features over different time scales;
Step 6: send the feature code samples with different receptive fields into a video recognition network to generate a recognition model;
Step 7: iterate training until the recognition model converges;
in the recognition stage, the feature code of the video sequence is generated by the sequence feature extraction module, and the feature code is recognized after the convolution network has varied its receptive field;
the convolution network adopts a four-layer structure and is composed of temporal convolutions; and
each convolution layer in the convolution network uses a one-dimensional convolution with stride 1, and each convolution layer is paired with one pooling layer.
CN201810766185.5A 2018-07-12 2018-07-12 Human body action recognition method Active CN108985223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810766185.5A 2018-07-12 2018-07-12 Human body action recognition method

Publications (2)

Publication Number Publication Date
CN108985223A CN108985223A (en) 2018-12-11
CN108985223B (en) 2024-05-07

Family

ID=64537893


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685213B (en) * 2018-12-29 2022-01-07 百度在线网络技术(北京)有限公司 Method and device for acquiring training sample data and terminal equipment
CN110084259B (en) * 2019-01-10 2022-09-20 谢飞 Facial paralysis grading comprehensive evaluation system combining facial texture and optical flow characteristics
CN109902565B (en) * 2019-01-21 2020-05-05 深圳市烨嘉为技术有限公司 Multi-feature fusion human behavior recognition method
CN109919031B (en) * 2019-01-31 2021-04-09 厦门大学 Human behavior recognition method based on deep neural network
CN110544301A (en) * 2019-09-06 2019-12-06 广东工业大学 Three-dimensional human body action reconstruction system, method and action training system
CN112257568B (en) * 2020-10-21 2022-09-20 中国人民解放军国防科技大学 Intelligent real-time supervision and error correction system and method for individual soldier queue actions

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933417A (en) * 2015-06-26 2015-09-23 苏州大学 Behavior recognition method based on sparse spatial-temporal characteristics
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN107273800A (en) * 2017-05-17 2017-10-20 大连理工大学 A kind of action identification method of the convolution recurrent neural network based on attention mechanism
CN107292247A (en) * 2017-06-05 2017-10-24 浙江理工大学 A kind of Human bodys' response method and device based on residual error network
CN107463949A (en) * 2017-07-14 2017-12-12 北京协同创新研究院 A kind of processing method and processing device of video actions classification
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN108108699A (en) * 2017-12-25 2018-06-01 重庆邮电大学 Merge deep neural network model and the human motion recognition method of binary system Hash

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Shreyank Jyoti et al.; Expression Empowered ResiDen Network for Facial Action Unit Detection; arXiv; 2018-06-14; Section 1 *
Jeff Donahue et al.; Long-term Recurrent Convolutional Networks for Visual Recognition and Description; 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015-10-15; Sections 1 and 4, Fig. 1 *
Yang Ping et al.; A sign language gesture recognition method based on fused multi-sensor information; Space Medicine & Medical Engineering; 2012-08; Vol. 25, No. 4; Abstract *
Wang Xinpei; Research on abnormal behavior classification algorithms based on two-stream CNN; China Master's Theses Full-text Database, Information Science & Technology; 2018, No. 2; I138-2191 *

Also Published As

Publication number Publication date
CN108985223A (en) 2018-12-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant