CN111310668B - Gait recognition method based on skeleton information - Google Patents

Gait recognition method based on skeleton information

Info

Publication number
CN111310668B
CN111310668B (application CN202010100136.5A)
Authority
CN
China
Prior art keywords: gait, node, sample, sequence, dimension
Prior art date
Legal status
Active
Application number
CN202010100136.5A
Other languages
Chinese (zh)
Other versions
CN111310668A (en)
Inventor
***
You Zhaoyang (尤昭阳)
Bi Sheng (毕胜)
Liu Xiang (刘祥)
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN202010100136.5A
Publication of CN111310668A
Application granted
Publication of CN111310668B
Legal status: Active

Classifications

    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/047 — Probabilistic or stochastic networks
    • G06N 3/084 — Learning methods; backpropagation, e.g. using gradient descent


Abstract

The invention provides a gait recognition method based on skeleton information, comprising the following steps: acquiring a gait video sequence; performing pose estimation on the gait video sequence with OpenPose to obtain a gait keypoint sequence; constructing a space-time skeleton sequence; inputting the adjacency matrix and the gait keypoint sequence into a multi-scale space-time graph convolutional network for training; and, after training, testing with the trained model, extracting gait features, and performing feature matching. The invention represents the body as human keypoints, introduces a graph convolutional neural network suited to the graph structure, improves the connection scheme and partition strategy, combines cross-entropy loss and contrastive loss through a twin (siamese) mechanism, and fuses the shallow, middle, and deep features of the network, improving the robustness of gait recognition to a certain extent.

Description

Gait recognition method based on skeleton information
Technical Field
The invention relates to the technical field of pattern recognition, in particular to a gait recognition method based on skeleton information.
Background
Gait recognition is an emerging biometric recognition technology that aims to identify people by their walking posture in a video sequence. Compared with other biometric technologies, it is non-contact, works at long range, and is difficult to disguise. It therefore has clear advantages in the security and intelligent-surveillance fields, although gait recognition applications still face problems in real, complex environments.
In recent years, research institutions at home and abroad have paid increasing attention to gait recognition technology. The prior art is mainly divided into the following two types:
1. Model-based methods. These methods segment the body into blocks or acquire body joints, and fit the motion through the joints or partial motion trajectories of the body. They rely mainly on the static characteristics of each body block and the motion trajectories of the joints for modeling, and include two-dimensional and three-dimensional models.
2. Non-model-based methods. These methods collect the appearance of human walking and gait characteristics such as feature parameters, without reconstructing a walking gait model. There are roughly three representations: the gait energy image, the silhouette sequence, and the human keypoint sequence.
In recent years, with the great progress of deep learning in many areas of computer vision, a large number of deep-learning-based gait recognition methods have emerged. For example, one line of work describes a gait sequence with a gait energy image and trains a matching model with a deep convolutional neural network to match a person's identity. When the viewing angle of the walking human varies widely, however, the extracted multi-view gait features have insufficient representational power, and robustness to clothing, carried objects, and the like is low. Another line of work performs gait feature representation and matching with pose information: human keypoint coordinates are obtained from the gait video sequence with a pose estimation algorithm, the gait keypoint sequence is trained with a convolutional neural network and a long short-term memory network, and hand-crafted features are introduced for gait recognition. However, existing gait recognition technology still has the following defects:
1. Gait recognition via a gait silhouette or gait energy image places high requirements on silhouette quality and background; because the silhouette is strongly affected by illumination and complex backgrounds, it is often extracted incompletely.
2. Covariates such as clothing and carried objects cannot be separated from the human body, which reduces recognition accuracy.
Disclosure of Invention
In view of the above technical problems, a gait recognition method based on skeleton information is provided. The invention represents the body as human keypoints, introduces a graph convolutional neural network suited to the graph structure, improves the connection scheme and partition strategy, combines cross-entropy loss and contrastive loss through a twin (siamese) mechanism, and fuses the shallow, middle, and deep features of the network, improving the robustness of gait recognition to a certain extent.
The invention adopts the following technical means:
a gait recognition method based on skeleton information comprises the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence by using OpenPose to obtain a gait keypoint sequence;
S3, constructing a space-time skeleton sequence;
S4, inputting the adjacency matrix and the gait keypoint sequence into a multi-scale space-time graph convolutional network for training;
S5, after training, testing with the trained model, extracting gait features, and performing feature matching.
Further, the step S3 specifically includes:
S31, naturally connecting the human-body keypoints of the gait keypoint sequence in space; at the same time, introducing symmetry to connect symmetric joint points (only the symmetric keypoints of the legs are connected, because symmetry between the arms is lost when an object is carried); and connecting the same keypoints across frames in time;
S32, defining a sampling function. The neighbor set of a node v_ti is defined as:
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
where B(v_ti) is the neighbor set of node v_ti; v denotes a node; d(v_tj, v_ti) is the shortest-path distance between the two nodes; and D is typically taken as 1. The sampling function is thus defined as: p(v_tj, v_ti) = v_tj.
S33, selecting a partition strategy: the neighborhood set is divided into four subsets. The root node itself is the first subset; among the asymmetric nodes, those closer to the center of gravity than the root node form the second subset and those farther from the center of gravity form the third subset; the symmetric nodes are defined as the fourth subset, i.e.:
[Equation image in original (BDA0002386658570000031): formal definition of the four-subset partition.]
S34, defining a weight function. The neighborhood set having been divided into four subsets, each subset carries a numeric label, and a mapping function l_ti maps each node to its subset label: l_ti: B(v_ti) → {0, …, K−1}, with K = 4. The weight function is then defined as: w(v_ti, v_tj) = w′(l_ti(v_tj)).
S35, extending the spatial graph convolution to the space-time domain. The space-time neighbor set is defined as: B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ }, where B(v_ti) is the neighbor set of node v_ti; v denotes a node; K denotes a distance; and Γ controls the temporal range included in the neighborhood, i.e., the temporal convolution kernel size.
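The spatial skeleton graph of steps S31–S33 can be sketched as follows. This is a minimal illustration assuming the 18-keypoint OpenPose (COCO) layout; the exact bone edge list, the leg-symmetry pairs (hips, knees, ankles), and the per-joint `radius` values (distance to the skeleton's center of gravity, computed per frame in practice) are assumptions for illustration, not taken verbatim from the patent.

```python
import numpy as np

NUM_JOINTS = 18  # OpenPose COCO keypoint layout (assumed)

# Natural (bone) connections between the 18 keypoints (illustrative edge list).
NATURAL_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
    (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13),
    (0, 14), (14, 16), (0, 15), (15, 17),
]
# Only leg keypoints are connected symmetrically (hips, knees, ankles),
# because symmetry between the arms is lost when an object is carried.
SYMMETRIC_EDGES = [(8, 11), (9, 12), (10, 13)]

def build_adjacency(num_joints=NUM_JOINTS):
    """Undirected adjacency with self-loops: the D = 1 neighborhood of S32."""
    adj = np.eye(num_joints, dtype=np.float32)
    for i, j in NATURAL_EDGES + SYMMETRIC_EDGES:
        adj[i, j] = adj[j, i] = 1.0
    return adj

def subset_label(root, neighbor, radius):
    """Four-subset partition of S33. `radius[k]` is joint k's distance to
    the center of gravity (an assumed representation)."""
    if neighbor == root:
        return 0  # first subset: the root node itself
    if (root, neighbor) in SYMMETRIC_EDGES or (neighbor, root) in SYMMETRIC_EDGES:
        return 3  # fourth subset: symmetric counterpart
    if radius[neighbor] < radius[root]:
        return 1  # second subset: closer to the center of gravity
    return 2      # third subset: farther from the center of gravity
```

The same adjacency matrix is what step S4 later feeds, together with the keypoint sequence, into the graph convolutional network.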
Further, the process of training the multi-scale space-time graph convolutional neural network is specifically as follows:
S41, after a sample is selected, randomly selecting one sample from all samples with the same ID as the selected sample as a positive sample, and randomly selecting one sample from all samples with a different ID as a negative sample;
S42, adopting a twin (siamese) mechanism: the selected sample is input into branch 1, and the positive and negative samples are input into branch 2 in turn, branch 1 and branch 2 sharing parameters;
S43, classifying the selected sample's features in branch 1 with SoftMax and a cross-entropy loss function;
S44, comparing the features of the selected sample and the positive sample, and of the selected sample and the negative sample, with a contrastive loss function; pairs from the same ID are labeled 1 and pairs from different IDs are labeled 0.
S45, adding the two losses, the total loss being:
Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)],
where Lid is the cross-entropy loss and Lc is the contrastive loss; back-propagation is then performed to update the network.
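The combined loss of S41–S45 can be sketched in NumPy. The margin value and the helper names are assumptions; the patent fixes only the SoftMax cross-entropy term on branch 1, the contrastive term on the pairs, the 0.5 weighting, and the label convention (1 = same ID, 0 = different ID).

```python
import numpy as np

def softmax_cross_entropy(logits, targets):
    """SoftMax + cross-entropy over the branch-1 classifier output."""
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def contrastive_loss(f1, f2, label, margin=1.0):
    """Pull same-ID pairs (label 1) together; push different-ID pairs
    (label 0) at least `margin` apart. Margin is an assumed value."""
    dist = np.linalg.norm(f1 - f2, axis=1)
    return np.mean(label * dist**2
                   + (1 - label) * np.maximum(margin - dist, 0.0)**2)

def total_loss(logits, targets, f_sample, f_pos, f_neg):
    """Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)]."""
    n = len(f_sample)
    l_id = softmax_cross_entropy(logits, targets)
    l_c = (contrastive_loss(f_sample, f_pos, np.ones(n))
           + contrastive_loss(f_sample, f_neg, np.zeros(n)))
    return l_id + 0.5 * l_c
```

Since branch 1 and branch 2 share parameters, the same feature extractor would produce `f_sample`, `f_pos`, and `f_neg` in practice.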
Further, in the step S4, training the multi-scale space-time graph convolutional neural network also includes the following setup:
step 1, inputting a gait sequence of dimension [3, 100, 18], where 3 means the input keypoint features have 3 channels (the X coordinate, the Y coordinate, and the confidence C), 100 means the temporal dimension is 100 frames, and 18 means each frame has 18 keypoints;
step 2, the first three layers output 64 channels with convolution kernel size (9, 3), where 9 is the temporal kernel size and 3 is the spatial kernel size; the output dimension is [64, 100, 18];
step 3, the middle three layers output 128 channels with kernel size (9, 3) and output dimension [128, 50, 18], the temporal convolution stride being 2 in the fourth layer;
step 4, the last three layers output 256 channels with kernel size (9, 3) and output dimension [256, 25, 18], the temporal convolution stride being 2 in the seventh layer;
step 5, performing global average pooling, after which the feature dimension becomes 256;
step 6, performing dimension exchange on the layer-1 output features [64, 100, 18] and average pooling to obtain an 18-dimensional feature;
step 7, performing dimension exchange on the layer-5 output features [128, 50, 18] and average pooling to obtain an 18-dimensional feature;
step 8, expressing the gait feature by fusing shallow and deep features: the 18-dimensional layer-1 feature, the 18-dimensional layer-5 feature, and the 256-dimensional last-layer feature are concatenated into a 292-dimensional feature;
step 9, classifying the 292-dimensional feature with a SoftMax classifier.
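The multi-scale feature dimensions of steps 1–9 can be checked with a short sketch. The exact pooling axes behind "dimension exchange" are an assumption (here: averaging over channels and time for the shallow and middle branches); the tensor shapes and the final 292-dimensional concatenation follow the text.

```python
import numpy as np

# Placeholder activations with the shapes stated in steps 2-4: [C, T, V]
x1 = np.random.rand(64, 100, 18)   # layer-1 output (shallow)
x5 = np.random.rand(128, 50, 18)   # layer-5 output (middle)
x9 = np.random.rand(256, 25, 18)   # last-layer output (deep)

shallow = x1.mean(axis=(0, 1))     # pool over channels and time -> 18-D (step 6)
mid = x5.mean(axis=(0, 1))         # -> 18-D (step 7)
deep = x9.mean(axis=(1, 2))        # global average pooling -> 256-D (step 5)

# Step 8: concatenate shallow, middle, and deep features: 18 + 18 + 256 = 292
feature = np.concatenate([shallow, mid, deep])
assert feature.shape == (292,)
```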
The invention also provides a testing method for the gait recognition method based on skeleton information, comprising the following steps:
step I: inputting the gait keypoint sequence to be tested;
step II: extracting gait features with the trained network and applying L2 (two-norm) normalization to the features;
step III: performing steps I and II on the samples in the sample library, so that feature vectors represent both the gait sequence of the pedestrian to be retrieved and the gait sequences of the pedestrians in the search library;
step IV: calculating the distance between the gait sequence of the pedestrian to be retrieved and the gait sequences in the search library, i.e., for one gait sequence to be retrieved, calculating the distance between its features and those of all pedestrian gait sequences in the search library;
step V: sorting the samples in the search library by the calculated distance from small to large; the earlier a sample ranks, the more likely its ID matches that of the pedestrian to be retrieved.
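The test procedure (steps I–V) can be sketched as follows. `rank_gallery` is a hypothetical helper name, and Euclidean distance on L2-normalized features is an assumption consistent with the two-norm normalization of step II; the patent does not name the specific distance metric.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Two-norm normalization of feature vectors (step II)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def rank_gallery(probe_feat, gallery_feats):
    """Steps IV-V: return gallery indices sorted by ascending distance to
    the probe; the earliest index is the most likely ID match."""
    probe = l2_normalize(probe_feat)
    gallery = l2_normalize(gallery_feats)
    dists = np.linalg.norm(gallery - probe, axis=1)
    return np.argsort(dists)
```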
Compared with the prior art, the invention has the following advantages:
1. To counter the influence of covariates such as backpacks and clothing, the proposed gait recognition method represents the gait with a keypoint sequence, solving the low robustness of silhouette- and energy-image-based gait features under covariate conditions and improving recognition accuracy under covariate influence.
2. Exploiting the symmetry of gait, symmetry is introduced when the space-time skeleton sequence is constructed, and the symmetric leg joint information is added to the adjacency matrix, which strengthens the association between related nodes, reduces the noise caused by inaccurate joint estimation, and improves recognition accuracy.
3. Because a deep convolutional neural network extracts high-level features, the representation is limited to high-level semantic information and cannot describe static features; the multi-scale fusion of shallow, middle, and deep features therefore enriches the expression of gait features and improves recognition accuracy.
For the above reasons, the method can be widely applied in pattern recognition and related fields.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a specific arrangement of a multi-scale space-time convolutional neural network according to the present invention.
FIG. 3 is a schematic diagram of the test method of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the invention provides a gait recognition method based on skeleton information, comprising the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence by using OpenPose to obtain a gait keypoint sequence;
S3, constructing a space-time skeleton sequence;
Further, as a preferred embodiment of the present invention, the step S3 specifically includes:
S31, naturally connecting the human-body keypoints of the gait keypoint sequence in space; because a person's walking posture is symmetric, symmetry is also introduced to connect symmetric joint points (only the symmetric keypoints of the legs are connected, because symmetry between the arms is lost when an object is carried); and connecting the same keypoints across frames in time;
S32, defining a sampling function. The neighbor set of a node v_ti is defined as:
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
where B(v_ti) is the neighbor set of node v_ti; v denotes a node; d(v_tj, v_ti) is the shortest-path distance between the two nodes; and D is typically taken as 1. The sampling function is thus defined as: p(v_tj, v_ti) = v_tj.
S33, selecting a partition strategy: the neighborhood set is divided into four subsets. The root node itself is the first subset; among the asymmetric nodes, those closer to the center of gravity than the root node form the second subset and those farther from the center of gravity form the third subset; the symmetric nodes are defined as the fourth subset, i.e.:
[Equation image in original (BDA0002386658570000071): formal definition of the four-subset partition.]
S34, defining a weight function. The neighborhood set having been divided into four subsets, each subset carries a numeric label, and a mapping function l_ti maps each node to its subset label: l_ti: B(v_ti) → {0, …, K−1}, with K = 4. The weight function is then defined as: w(v_ti, v_tj) = w′(l_ti(v_tj)).
S35, extending the spatial graph convolution to the space-time domain. The space-time neighbor set is defined as: B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ }, where B(v_ti) is the neighbor set of node v_ti; v denotes a node; K denotes a distance; and Γ controls the temporal range included in the neighborhood, i.e., the temporal convolution kernel size.
S4, inputting the adjacency matrix and the gait keypoint sequence into a multi-scale space-time graph convolutional network for training;
further, as a preferred embodiment of the present invention, as shown in fig. 2, the step S4 further includes the following setup procedure before training the multiscale space-time diagram convolutional neural network:
step 1, inputting a gait sequence, wherein the dimension is [3,100,18], 3 is that 3 channels are provided for inputting key point characteristics, X and Y coordinates and confidence coefficient C are respectively provided, 100 is that the time dimension is 100 frames, and 18 is that 18 key points are provided for each frame;
step 2, outputting 64 channels from the first three layers, wherein the convolution kernel size is (9, 3), 9 is the time convolution kernel size, 3 is the space convolution kernel size, and the output dimension is [64,100,18];
step 3, outputting 128 channels by three layers in the middle, wherein the convolution kernel size is (9, 3), the output dimension is [128,50,18], and the convolution step length of the time dimension is 2 in the fourth layer;
step 4, outputting 256 channels from the rear three layers, wherein the convolution kernel size is (9, 3), the output dimension is [256,25,18], and the convolution step length of the time dimension is 2 in the seventh layer;
step 5, carrying out global average pooling, wherein after pooling, the characteristic dimension is changed into 256 dimensions;
step 6, carrying out dimension exchange on the characteristics [64,100,18] output by the first layer, and carrying out average pooling to obtain 18-dimensional characteristics;
step 7, carrying out dimension exchange on the output characteristics [128,50,18] of the fifth layer, and carrying out average pooling to obtain 18-dimensional characteristics;
step 8, because the deep convolutional neural network extracts high-level features, the high-level semantic information is singly represented, and static features cannot be described, gait features are represented by adopting a mode of fusing shallow features and deep features, and 18-dimensional features of a first layer, 18-dimensional features of a fifth layer and 256-dimensional features of a last layer are spliced to become 292-dimensional features;
and 9, classifying 292-dimensional features by adopting a softMax classifier.
In this embodiment, the CASIA-B dataset is used, where NM denotes normal walking, BG denotes walking with a carried bag, and CL denotes walking in a coat, as shown in the following table:
[Table image in original (BDA0002386658570000081): split of CASIA-B sequences by walking condition.]
The process of training the multi-scale space-time graph convolutional neural network is specifically as follows:
In the training stage, the aim is to train the network to extract features that represent pedestrians, so the network is trained in a classification manner. The specific steps are:
S41, after a sample is selected, randomly selecting one sample from all samples with the same ID as the selected sample as a positive sample, and randomly selecting one sample from all samples with a different ID as a negative sample;
S42, adopting a twin (siamese) mechanism: the selected sample is input into branch 1, and the positive and negative samples are input into branch 2 in turn, branch 1 and branch 2 sharing parameters;
S43, classifying the selected sample's features in branch 1 with SoftMax and a cross-entropy loss function;
S44, comparing the features of the selected sample and the positive sample, and of the selected sample and the negative sample, with a contrastive loss function; pairs from the same ID are labeled 1 and pairs from different IDs are labeled 0.
S45, adding the two losses, the total loss being:
Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)],
where Lid is the cross-entropy loss and Lc is the contrastive loss; back-propagation is then performed to update the network.
S5, after training, testing with the trained model, extracting gait features, and performing feature matching.
As shown in fig. 3, the invention further provides a testing method for the gait recognition method based on skeleton information, comprising the following steps:
step I: inputting the gait keypoint sequence to be tested;
step II: extracting gait features with the trained network and applying L2 (two-norm) normalization to the features;
step III: performing steps I and II on the samples in the sample library, so that feature vectors represent both the gait sequence of the pedestrian to be retrieved and the gait sequences of the pedestrians in the search library;
step IV: calculating the distance between the gait sequence of the pedestrian to be retrieved and the gait sequences in the search library, i.e., for one gait sequence to be retrieved, calculating the distance between its features and those of all pedestrian gait sequences in the search library;
step V: sorting the samples in the search library by the calculated distance from small to large; the earlier a sample ranks, the more likely its ID matches that of the pedestrian to be retrieved.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features replaced with equivalents, and that such modifications and substitutions do not depart from the spirit of the technical solutions of the embodiments of the present invention.

Claims (2)

1. A gait recognition method based on skeleton information, characterized by comprising the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence by using OpenPose to obtain a gait keypoint sequence;
S3, constructing a space-time skeleton sequence; the step S3 specifically comprising:
S31, naturally connecting the human-body keypoints of the gait keypoint sequence in space; meanwhile, introducing symmetry to connect symmetric joint points, only the symmetric keypoints of the legs being connected because symmetry between the arms is lost when an object is carried; and connecting the same keypoints across frames in time;
S32, defining a sampling function. The neighbor set of a node v_ti is defined as:
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
where B(v_ti) is the neighbor set of node v_ti; v denotes a node; d(v_tj, v_ti) is the shortest-path distance between the two nodes; and D is typically taken as 1. The sampling function is thus defined as: p(v_tj, v_ti) = v_tj.
S33, selecting a partition strategy: the neighborhood set is divided into four subsets. The root node itself is the first subset; among the asymmetric nodes, those closer to the center of gravity than the root node form the second subset and those farther from the center of gravity form the third subset; the symmetric nodes are defined as the fourth subset, i.e.:
[Equation image in original (FDA0004209251530000011): formal definition of the four-subset partition.]
S34, defining a weight function. The neighborhood set having been divided into four subsets, each subset carries a numeric label, and a mapping function l_ti maps each node to its subset label: l_ti: B(v_ti) → {0, …, K−1}, with K = 4. The weight function is then defined as: w(v_ti, v_tj) = w′(l_ti(v_tj));
S35, extending the spatial graph convolution to the space-time domain. The space-time neighbor set is defined as: B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ }, where B(v_ti) is the neighbor set of node v_ti; v denotes a node; K denotes a distance; and Γ controls the temporal range included in the neighborhood, i.e., the temporal convolution kernel size;
s4, inputting the adjacency matrix and gait key point sequences into a multi-scale space-time diagram convolution network for training; the process of training the multi-scale space-time diagram convolutional neural network is specifically as follows:
s41, after selecting samples, randomly selecting one sample from all samples with the same ID as the selected samples as a positive sample, and randomly selecting one sample from all samples with different IDs as a negative sample;
s42, adopting a twin mechanism, inputting the selected sample into the branch 1, and sequentially inputting the positive sample and the negative sample into the branch 2, wherein the branch 1 and the branch 2 share parameters;
s43, classifying the selected sample characteristics in the branch 1 by adopting SoftMax and a cross entropy loss function;
S44, comparing the features of the selected sample with those of the positive sample, and with those of the negative sample, using a contrastive loss function; pairs from the same ID are labeled 1, and pairs from different IDs are labeled 0;
S45, adding the two losses, the total loss being:
Loss = Lid + 0.5 × [Lc(sample, pos, 1) + Lc(sample, neg, 0)], where Lid is the cross-entropy loss and Lc is the contrastive loss; back-propagation is then performed to update the network;
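The total loss of S45 can be sketched numerically as follows (a minimal sketch; the margin value of the contrastive loss is an assumption not specified in the claim, and a standard margin-based contrastive loss is used for Lc):

```python
import numpy as np

def contrastive_loss(f1, f2, label, margin=1.0):
    """Lc: pulls same-ID pairs (label 1) together and pushes
    different-ID pairs (label 0) at least `margin` apart."""
    d = np.linalg.norm(f1 - f2)
    return label * d**2 + (1 - label) * max(margin - d, 0.0)**2

def cross_entropy(logits, target):
    """Lid: SoftMax followed by cross-entropy on the branch-1 classification."""
    p = np.exp(logits - logits.max())     # numerically stable SoftMax
    p /= p.sum()
    return -np.log(p[target])

def total_loss(logits, target, f_anchor, f_pos, f_neg):
    # Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)]
    Lid = cross_entropy(logits, target)
    Lc_pos = contrastive_loss(f_anchor, f_pos, 1)
    Lc_neg = contrastive_loss(f_anchor, f_neg, 0)
    return Lid + 0.5 * (Lc_pos + Lc_neg)
```

The 0.5 weight balances the classification term against the two pairwise terms; in a deep-learning framework the same expression would be built from differentiable ops and back-propagated to update the shared branch parameters.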
the training of the multi-scale spatio-temporal graph convolutional neural network further comprises the following configuration:
step 1, inputting a gait sequence of dimension [3, 100, 18], where 3 is the number of input channels of the key-point features (the X and Y coordinates and the confidence C), 100 is the number of frames in the time dimension, and 18 is the number of key points per frame;
step 2, the first three layers output 64 channels with a convolution kernel of size (9, 3), where 9 is the temporal kernel size and 3 is the spatial kernel size; the output dimension is [64, 100, 18];
step 3, the middle three layers output 128 channels with a kernel of size (9, 3); the output dimension is [128, 50, 18], the temporal convolution stride being 2 in the fourth layer;
step 4, the last three layers output 256 channels with a kernel of size (9, 3); the output dimension is [256, 25, 18], the temporal convolution stride being 2 in the seventh layer;
step 5, performing global average pooling, after which the feature becomes a 256-dimensional vector;
step 6, permuting the dimensions of the first-layer output [64, 100, 18] and average-pooling to obtain an 18-dimensional feature;
step 7, permuting the dimensions of the fifth-layer output [128, 50, 18] and average-pooling to obtain an 18-dimensional feature;
step 8, expressing the gait feature by fusing shallow, middle, and deep features: the 18-dimensional first-layer feature, the 18-dimensional fifth-layer feature, and the 256-dimensional last-layer feature are concatenated into a 292-dimensional feature;
step 9, classifying the 292-dimensional feature with a SoftMax classifier;
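The dimension flow of steps 1 to 8 can be traced with a short sketch (illustrative only; it computes shapes rather than running convolutions, and the per-layer channel/stride plan is read directly from the steps above):

```python
def stgcn_shapes(in_shape=(3, 100, 18)):
    """Trace feature dimensions through the nine-layer network:
    channels 64/128/256 per block of three layers, temporal stride 2
    in the fourth and seventh layers; the key-point dimension (18)
    is preserved by the graph convolution."""
    c, t, v = in_shape
    plan = [(64, 1)] * 3 + [(128, 2), (128, 1), (128, 1)] \
         + [(256, 2), (256, 1), (256, 1)]
    shapes = []
    for out_c, stride in plan:
        t = t // stride
        c = out_c
        shapes.append((c, t, v))
    deep = shapes[-1][0]      # step 5: global average pooling -> 256-dim
    shallow = shapes[0][2]    # step 6: layer-1 output pooled -> 18-dim
    middle = shapes[4][2]     # step 7: layer-5 output pooled -> 18-dim
    fused = shallow + middle + deep   # step 8: 18 + 18 + 256 = 292
    return shapes, fused
```

Running it confirms the intermediate dimensions [64, 100, 18], [128, 50, 18], [256, 25, 18] and the fused 292-dimensional feature fed to the SoftMax classifier.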
S5, after training, testing with the trained model: extracting gait features and performing feature matching.
2. A method of testing the gait recognition method based on skeleton information as claimed in claim 1, comprising the steps of:
step I: inputting a gait key point sequence to be tested;
step II: extracting gait features with the trained network and applying two-norm (L2) normalization to the features;
step III: performing the operations of steps I and II on the samples in the sample library, so that feature vectors represent both the gait sequence of the pedestrian to be searched and the gait sequences of the pedestrians in the search library;
step IV: calculating the distance between the gait sequence of the pedestrian to be searched and the gait sequences in the search library, i.e., for one gait sequence of the pedestrian to be searched, calculating the distance between its features and those of all gait sequences in the search library;
step V: sorting the samples in the search library by the calculated distance from small to large; the earlier a sample ranks, the more likely its ID is consistent with that of the pedestrian to be searched.
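Steps II to V can be sketched as follows (a minimal retrieval sketch, assuming Euclidean distance between L2-normalized features; the claim does not fix a specific distance metric):

```python
import numpy as np

def rank_gallery(probe_feat, gallery_feats, gallery_ids):
    """Steps II-V: L2-normalize features, compute the distance from the
    probe sequence to every gallery sequence, and sort the gallery by
    increasing distance (smaller distance = more likely the same ID)."""
    probe = probe_feat / np.linalg.norm(probe_feat)
    gallery = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dists = np.linalg.norm(gallery - probe, axis=1)
    order = np.argsort(dists)
    return [gallery_ids[i] for i in order]
```

The first entry of the returned ranking is the gallery pedestrian most likely to share the ID of the probe.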
CN202010100136.5A 2020-02-18 2020-02-18 Gait recognition method based on skeleton information Active CN111310668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010100136.5A CN111310668B (en) 2020-02-18 2020-02-18 Gait recognition method based on skeleton information


Publications (2)

Publication Number Publication Date
CN111310668A CN111310668A (en) 2020-06-19
CN111310668B true CN111310668B (en) 2023-06-23

Family

ID=71147331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010100136.5A Active CN111310668B (en) 2020-02-18 2020-02-18 Gait recognition method based on skeleton information

Country Status (1)

Country Link
CN (1) CN111310668B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860291A (en) * 2020-07-16 2020-10-30 上海交通大学 Multi-mode pedestrian identity recognition method and system based on pedestrian appearance and gait information
CN112101176B (en) * 2020-09-09 2024-04-05 元神科技(杭州)有限公司 User identity recognition method and system combining user gait information
CN112434655B (en) * 2020-12-07 2022-11-08 安徽大学 Gait recognition method based on adaptive confidence map convolution network
CN112633222B (en) * 2020-12-30 2023-04-28 民航成都电子技术有限责任公司 Gait recognition method, device, equipment and medium based on countermeasure network
CN112906599A (en) * 2021-03-04 2021-06-04 杭州海康威视数字技术股份有限公司 Gait-based personnel identity identification method and device and electronic equipment
CN112926522B (en) * 2021-03-30 2023-11-24 广东省科学院智能制造研究所 Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN113191230A (en) * 2021-04-20 2021-07-30 内蒙古工业大学 Gait recognition method based on gait space-time characteristic decomposition
CN114052726A (en) * 2021-11-25 2022-02-18 湖南中科助英智能科技研究院有限公司 Thermal infrared human body gait recognition method and device in dark environment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090232353A1 (en) * 2006-11-10 2009-09-17 University Of Maryland Method and system for markerless motion capture using multiple cameras
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"Design of Real-Time Measurement System with Vision/IMU for Close-Range Semi-Physical Rendezvous and Docking Simulation";Zhenshen Qu 等;《Proceedings of 2016 IEEE Chinese Guidance, Navigation and Control Conference (IEEE CGNCC2016)》;全文 *
"图像和深度图中的动作识别与手势姿态估计";李瑞;《中国博士学位论文全文数据库 信息科技辑》;全文 *
"基于图卷积的骨架行为识别";董安 等;《图形图像》;全文 *
"基于时空图卷积网络的人体运动状态识别研究";宋宪 等;《新材料、新技术、新工艺在竞技体育的研发与应用(二)》;全文 *
"智能监控***中行人重识别方法研究";***;《中国博士学位论文全文数据库 信息科技辑》;全文 *
张向荣 等.《模式识别》.西安:西安电子科技大学出版社,2019,西安:西安电子科技大学出版社. *


Similar Documents

Publication Publication Date Title
CN111310668B (en) Gait recognition method based on skeleton information
CN109919031B (en) Human behavior recognition method based on deep neural network
CN108537136B (en) Pedestrian re-identification method based on attitude normalization image generation
CN111666843B (en) Pedestrian re-recognition method based on global feature and local feature splicing
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN109614853B (en) Bilinear pedestrian re-identification network construction method based on body structure division
CN111428658B (en) Gait recognition method based on modal fusion
CN110598543A (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
Ji et al. Human-centric clothing segmentation via deformable semantic locality-preserving network
CN111160225B (en) Human body analysis method and device based on deep learning
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
CN111985332B (en) Gait recognition method of improved loss function based on deep learning
CN113221625A (en) Method for re-identifying pedestrians by utilizing local features of deep learning
CN112819065A (en) Unsupervised pedestrian sample mining method and unsupervised pedestrian sample mining system based on multi-clustering information
JP2022082493A (en) Pedestrian re-identification method for random shielding recovery based on noise channel
CN111582154A (en) Pedestrian re-identification method based on multitask skeleton posture division component
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN113221770A (en) Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning
CN109670423A (en) A kind of image identification system based on deep learning, method and medium
CN116206327A (en) Image classification method based on online knowledge distillation
CN113869105A (en) Human behavior recognition method
Özbay et al. 3D Human Activity Classification with 3D Zernike Moment Based Convolutional, LSTM-Deep Neural Networks.
CN110348395B (en) Skeleton behavior identification method based on space-time relationship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant