CN113191230A - Gait recognition method based on gait space-time characteristic decomposition - Google Patents

Gait recognition method based on gait space-time characteristic decomposition

Info

Publication number
CN113191230A
Authority
CN
China
Prior art keywords
gait
space
time
frame
decomposition
Prior art date
Legal status
Pending
Application number
CN202110426681.8A
Other languages
Chinese (zh)
Inventor
云静
高硕
邢红梅
张丽霞
刘利民
Current Assignee
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A gait recognition method based on gait space-time characteristic decomposition comprises: preprocessing the obtained video and converting an original gait sequence containing several frames into a dynamic skeleton sequence expressed in array form; extracting the subject's gait spatio-temporal features along the two dimensions of time and space with a spatio-temporal graph convolutional network, and fusing them to form a gait feature map; optimally decomposing the parameters of each mode's factor matrix in the gait feature map based on CP decomposition to obtain the main gait features; and identifying the object to be identified based on the main gait features. Compared with the prior art, the method avoids interference from external factors such as clothing and backpacks by using the dynamic human skeleton sequence, and thereby improves gait recognition accuracy.

Description

Gait recognition method based on gait space-time characteristic decomposition
Technical Field
The invention belongs to the technical field of artificial intelligence and pattern recognition, and particularly relates to a gait recognition method based on gait space-time characteristic decomposition.
Background
Currently, with the continuous improvement of AI algorithm accuracy and the rapid growth of application scenarios, face recognition and iris recognition have become the most widely deployed representative biometric technologies. However, both still place relatively strict requirements on the external environment; especially in large, crowded places, recognition accuracy is affected by factors such as lighting, occlusion, camera mounting angle, and the degree of subject cooperation.
Gait recognition is an emerging biometric technology and an uncontrolled recognition mode that aims to identify people by their walking posture. Compared with static biometric features such as fingerprints, faces, palm prints and veins, gait is a dynamic feature: it supports long-distance identification, requires no active cooperation, and adapts well to the environment. Gait recognition is beginning to enter industry fields such as security, transportation and manufacturing, where it is an AI application that brings innovation. As a biometric technology for identity authentication in crowded places, gait recognition can compensate well for the application shortcomings of face recognition thanks to its long recognition distance, wide applicability and cooperation-free operation, and is therefore receiving growing attention.
Most importantly, gait recognition is harder to spoof than other biometric modalities. Almost all recognition modes suffer from occlusion, but gait has a unique advantage: because the recognition distance is long, the system has more time to adjust and respond in real time to new changes (for example, an occlusion being removed). In addition, gait recognition uses whole-body information and supports full 360-degree viewing angles, so recognition remains possible even when the illumination changes, the clothing changes, or the face is completely occluded. In criminal investigation, even criminals with counter-reconnaissance awareness who evade face recognition systems through makeup and masking find it difficult to disguise their walking posture. The technology is therefore playing an increasingly important role in the criminal investigation field with its unique application advantages, and is more flexible and efficient than other recognition methods.
Most existing gait recognition technologies obtain the human silhouette by background subtraction and then extract human gait features from that silhouette. This approach is strongly affected by body shape: for example, clothing of different thicknesses worn in different seasons, or a backpack, greatly perturbs the silhouette and thereby interferes with the extraction of gait features.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a gait recognition method based on gait space-time characteristic decomposition. A gait sequence of a walking person is captured by a camera, the person's dynamic skeleton sequence is extracted with a human posture estimation tool, and gait features are extracted from the skeleton sequence with a spatio-temporal graph convolutional network. To further eliminate interference factors that degrade recognition performance, the invention also decomposes the gait feature map to obtain the main gait features, thereby realizing identification of the person's identity.
In order to achieve the purpose, the invention adopts the technical scheme that:
a gait recognition method based on gait space-time characteristic decomposition comprises the following steps:
step 1, modeling human body posture
Preprocessing the obtained video, and converting an original gait sequence containing a plurality of frames into a dynamic skeleton sequence represented in an array form;
step 2, extracting gait space-time characteristics
Extracting gait space-time characteristics of the actor based on two dimensions of time and space by using a space-time graph convolution network, and fusing the gait space-time characteristics to form a gait characteristic graph;
step 3, decomposing the gait space-time characteristics
Performing optimal decomposition on parameters of each order factor matrix in the gait feature map based on CP decomposition to obtain main gait features;
step 4, classification and identification
Identifying an object to be identified based on the primary gait features.
Compared with the prior art, the gait recognition method avoids interference from external factors such as clothing and backpacks by using the dynamic human skeleton sequence, and thereby improves gait recognition accuracy.
Drawings
FIG. 1 is a block flow diagram of the present invention.
FIG. 2 is a complete flow diagram of the present invention including details.
Fig. 3 is a human bone space-time diagram constructed in accordance with the present invention.
FIG. 4 is a process of the present invention for extracting timing characteristics using TCN in the time dimension.
Fig. 5 is a flow chart in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
As shown in fig. 1 and fig. 2, the gait recognition method based on gait spatiotemporal feature decomposition of the invention comprises:
step 1, modeling human body posture
The original gait sequence of the subject contains many gait-irrelevant interference factors, such as the subject's clothing and backpack, so the acquired video needs to be preprocessed: the original gait sequence is converted into a dynamic skeleton sequence, the estimated joint positions in the pixel coordinate system are used as input, and gait feature extraction then proceeds in the next step. The invention therefore processes the original gait sequence, containing several frames, with a human posture estimation tool, converts it into a dynamic skeleton sequence expressed in array form, and applies that sequence in the subsequent gait feature extraction step. The specific method is as follows:
step 1.1, the original gait sequence is converted into a dynamic skeleton sequence expressed in array form using an open-source pre-trained human posture estimation tool such as OpenPose. The estimated joint positions in the pixel coordinate system are used as input and the original RGB frames are discarded. To obtain the joint positions, the resolution of all videos is adjusted to 340 × 256 and the OpenPose algorithm is then used to locate 18 joints in each frame; OpenPose gives the 2D coordinates (X, Y) in the pixel coordinate system and the confidence θ of the 18 human joints;
step 1.2, for the 18 generated human joint coordinates and corresponding confidences, each joint is represented by a tuple (X, Y, θ); Part Affinity Fields (PAF) encode 2D vectors of limb position and orientation in the image domain, and part detection confidence maps (CMP) mark the confidence of each joint point. Through these two branches, the relation between each joint position and the whole body is learned jointly, so one skeleton frame is recorded as an array of 18 tuples. For the multi-person case, the invention selects the person with the highest average joint confidence in each segment. A clip with T frames is thus converted into a skeleton sequence of these tuples; in practice, the invention represents the skeleton coordinates of the sequence with a tensor of dimensions (18, 3, T).
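The array representation described above can be sketched in a few lines; the following is a minimal numpy illustration (not the patent's own code) of stacking per-frame (X, Y, θ) keypoint arrays into the (18, 3, T) tensor:

```python
import numpy as np

def skeleton_tensor(frames):
    """Stack per-frame 18-joint (X, Y, theta) arrays into an (18, 3, T) tensor.

    `frames` is a list of T arrays, each of shape (18, 3): one
    (X, Y, confidence) tuple per joint, as produced by a pose estimator.
    """
    # stack to (T, 18, 3), then reorder axes to joints x channels x time
    return np.stack(frames, axis=0).transpose(1, 2, 0)

# two placeholder frames (zeros stand in for real OpenPose detections)
frames = [np.zeros((18, 3)) for _ in range(2)]
print(skeleton_tensor(frames).shape)  # (18, 3, 2)
```

In a real pipeline, `frames` would be filled from the pose estimation step; only the axis ordering matters here.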
Step 2, extracting gait space-time characteristics
The walking dynamic skeleton sequence of the subject contains gait information in both the temporal and spatial dimensions. The invention extracts the subject's gait spatio-temporal features along these two dimensions by means of a spatio-temporal graph convolutional network (ST-GCN) and fuses them into a gait feature map, which further improves the performance of the method. The specific method is as follows:
step 2.1, 18 skeleton points (i.e., joint points) are obtained from the 2D pose estimation. The invention uses the 2D or 3D coordinates of each human joint in each frame to represent a skeleton sequence, links all joint vectors in each frame into a feature vector, and uses a spatio-temporal graph convolutional network (ST-GCN) to form a multi-layer representation of the skeleton sequence. A skeleton spatio-temporal graph G = (V, E) is further constructed, as shown in fig. 3, where V is the set of nodes containing all joint points of the skeleton sequence, V = {v_ti | t = 1, ..., T, i = 1, ..., N}, v_ti is the i-th joint point in frame t, T is the number of frames, and N is the number of joint points. E is the set of edges and consists of two subsets: the first subset E_S contains the links between joint points within each frame, E_S = {v_ti v_tj | (i, j) ∈ H}, where H represents the set of human joint point pairs; the second subset E_F represents the links between corresponding joint points in consecutive frames, E_F = {v_ti v_(t+1)i}, and each edge in E_F represents the trajectory of a particular joint over time. When the skeleton sequence is input into the spatio-temporal graph convolutional network, the feature vector F(v_ti) of joint point i in frame t comprises the joint point coordinates and confidence, where the joint point coordinates refer to the coordinates of the joint point in the 2D coordinate system.
The bone space-time diagram G of the invention can be constructed by the following two steps:
firstly, connecting the same joint point between adjacent frames to obtain an edge between the frames, and representing the time sequence relation of the corresponding joint point of the human body;
and secondly, constructing a space diagram according to the connection relation of the natural skeletons of the human body in each frame. The establishment of such links is dependent on natural structures, with no manual design.
Step 2.2, in the constructed skeleton spatio-temporal graph, within a single frame at time t there are N joint points V_t and the intra-skeleton edges E_S(t) = {v_ti v_tj | (i, j) ∈ H}. By the definition of convolution on 2D natural images or feature maps, these can be regarded as two-dimensional grids, and the feature map output by a convolution is also a 2D grid. With stride 1 and appropriate padding, the output feature map keeps the same size as the input image.
Referring to fig. 4, in a single frame, the spatial convolution of the extracted dynamic bone sequence using GCN yields the gait spatial characteristics of the actor, as follows:
B(l+1)=σ(YB(l)W(l))
wherein B^(l) and B^(l+1) are respectively the inputs to layer l and layer l+1 of the convolutional network, W^(l) is the weight matrix between layer l and layer l+1, Y is the N × N adjacency matrix associated with the skeleton spatio-temporal graph G, and σ is a nonlinear activation function. Each weight matrix in the GCN represents a convolution kernel; applying multiple convolution kernels to the input of the spatio-temporal graph convolutional network yields a feature tensor as output, i.e., the gait spatial features;
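The spatial layer can be sketched directly from the equation B(l+1) = σ(Y B(l) W(l)); the following rough numpy rendering uses ReLU as σ and an identity matrix as a stand-in for the adjacency Y (the real network derives and normalizes Y from the skeleton links):

```python
import numpy as np

def gcn_layer(B, Y, W):
    """One spatial graph-convolution layer: B_next = ReLU(Y @ B @ W)."""
    return np.maximum(Y @ B @ W, 0.0)

N, C_in, C_out = 18, 3, 8           # joints, input channels, output channels
Y = np.eye(N)                       # stand-in adjacency; real Y encodes skeleton links
B = np.ones((N, C_in))              # per-joint input features for one frame
W = np.ones((C_in, C_out)) / C_in   # one toy convolution kernel (weight matrix)
out = gcn_layer(B, Y, W)
print(out.shape)  # (18, 8)
```

Stacking several such layers, each with its own W, gives the multi-layer spatial feature extractor described above.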
The GCN learns local features of adjacent joints in space; on this basis, the TCN is used to learn local features of joint changes over time.
In the temporal convolution, the convolution kernel size is K × 1 and each step completes the convolution of 1 node over K key frames; the stride is set to 1, i.e., the convolution of the next node is performed after 1 node is completed. For a convolution kernel of size K × K, an input feature map f_in with c channels, the output of a single channel at position x is:

f_out(x) = Σ_{h=1}^{K} Σ_{w=1}^{K} f_in(p(x, h, w)) · w(h, w)

wherein h indexes the key frames, w indexes the neighbor nodes, p(x, h, w) is a sampling function used to enumerate the neighbor nodes at position x, and w(h, w) is the weight of the neighbor node with respect to the root node;
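The K × 1 temporal convolution can be read off the sampling formula directly; the following one-dimensional sketch (illustrative only: a single channel, one joint, zero padding) convolves one joint's feature sequence with a K-frame kernel:

```python
import numpy as np

def temporal_conv(f_in, kernel):
    """K x 1 temporal convolution for one joint: at each frame x, sum
    a window of K key frames weighted by the kernel, a direct reading
    of f_out(x) = sum_h f_in(p(x, h)) * w(h) for the 1-D case."""
    K = len(kernel)
    T = len(f_in)
    pad = K // 2
    padded = np.pad(f_in, pad)  # zero padding keeps the output length at T
    return np.array([sum(padded[x + h] * kernel[h] for h in range(K))
                     for x in range(T)])

seq = np.array([1.0, 2.0, 3.0, 4.0])
print(temporal_conv(seq, np.array([0.0, 1.0, 0.0])))  # identity kernel -> [1. 2. 3. 4.]
```

A learned kernel (rather than the identity used here) aggregates each joint's trajectory over its K-frame temporal neighborhood, which is exactly the TCN step.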
Finally, the two-dimensional features are fused. The spatial graph CNN (in reality a GCN, since the GCN follows the CNN convolution principle) can extract gait spatial features within a single frame; the CNN principle is then extended to the time dimension, so that after the spatial features of a joint point are extracted, temporal features (i.e., the TCN) are extracted along that joint point's time dimension. Extended to the spatio-temporal domain, the concept of the neighborhood is expanded to also include temporally connected joints:
B(vti)={vqj|d(vtj,vti)≤K,|q-t|≤[Γ/2]}
thereby forming the weights of the joint-point spatio-temporal features. Since the weights correspond to the gait features, i.e., the features of each joint point correspond to a weight parameter, this is equivalent to obtaining the gait feature map. Here B(v_ti) denotes the set of neighbor nodes of v_ti; besides nodes in the same frame as v_ti, the set also includes temporally connected joints; d(v_tj, v_ti) represents the minimum path length from v_tj to v_ti; q ranges over all frames containing node v_ti; Γ represents the temporal kernel size.
Step 3, decomposing the gait space-time characteristics
During human walking the joints move in small local groups; however, the gait feature map formed in step 2 contains the motion information of all joints and includes redundant features irrelevant to gait, so the feature map needs to be optimized. The invention adopts the idea of CP decomposition to optimally decompose the parameters of each mode's factor matrix in the feature map, eliminating the interference factors that degrade gait recognition performance and obtaining the main gait features, so that the whole model achieves its best performance. The method is as follows:
step 3.1: the higher-order tensor X ∈ R^(D×V×T) is decomposed into the sum of R rank-1 tensors, i.e.

X ≈ Σ_{r=1}^{R} h_r ∘ j_r ∘ k_r

where ∘ denotes the vector outer product, R denotes the total number of rank-1 tensors in the decomposition, h_r ∈ R^D, j_r ∈ R^V, k_r ∈ R^T, and H = [h_1, h_2, ..., h_R], J = [j_1, j_2, ..., j_R], K = [k_1, k_2, ..., k_R] are the factor matrices formed by combining these vectors. The decomposition of the gait spatio-temporal features is expressed by:

X ≈ [[H, J, K]] = Σ_{r=1}^{R} h_r ∘ j_r ∘ k_r

H, J and K are then normalized and a weight vector λ ∈ R^R is extracted, giving the decomposition structure of X:

X ≈ [[λ; H, J, K]] = Σ_{r=1}^{R} λ_r h_r ∘ j_r ∘ k_r

where h_r, j_r and k_r represent the feature factors decomposed from the feature map in each dimension;
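The decomposition structure can be made concrete with a small numpy example; this sketch (illustrative sizes and random factors, not the patent's data) rebuilds a tensor X from λ and the factor matrices H, J, K as a weighted sum of R outer products:

```python
import numpy as np

# illustrative sizes: D feature channels, V joints, T frames, rank R
D, V, T, R = 4, 18, 10, 3
rng = np.random.default_rng(0)
H, J, K = rng.random((D, R)), rng.random((V, R)), rng.random((T, R))
lam = rng.random(R)  # weight vector lambda extracted after normalizing the factors

# X = sum_r lambda_r * (h_r outer j_r outer k_r)
X = np.einsum('r,dr,vr,tr->dvt', lam, H, J, K)
print(X.shape)  # (4, 18, 10)
```

Each rank-1 term h_r ∘ j_r ∘ k_r couples one feature pattern, one joint pattern and one temporal pattern; keeping only the strongest terms is what yields the main gait features.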
step 3.2: the value range of R is limited by computing a weak upper bound on the maximum rank of the tensor, and R is then traversed iteratively from 1 until a suitable solution is found:
rank(X)≤min{DV,DT,VT}
step 3.3: after the decomposition structure and R of X are obtained, the matrices H, J and K are optimized to obtain a reasonable decomposition structure of X. Before optimization, X is matricized (unfolded) along each mode to obtain the matrices X_(1), X_(2) and X_(3), as follows:

X_(1) ≈ H(K ⊙ J)^T

X_(2) ≈ J(K ⊙ H)^T

X_(3) ≈ K(J ⊙ H)^T

where ⊙ denotes the Khatri-Rao product.
Alternating least squares (ALS) can be used to optimize X_(1), X_(2) and X_(3), i.e., to optimize the matrices H, J and K and obtain a reasonable decomposition structure of X. The process solves subproblems of the form

min over Ĥ of ‖X_(1) − Ĥ(K ⊙ J)^T‖_F

wherein Ĥ = H diag(λ). The ALS method fixes two factor matrices, solves for the remaining one, and repeats the whole process until a convergence criterion is met;
h can be solved by fixing J and K first, yielding:
Figure BDA0003029789380000075
wherein
Figure BDA0003029789380000076
Then, an optimization result is obtained:
Figure BDA0003029789380000081
f represents a pseudo-inverse, and iteration is repeated until a solution that the target function stops descending is found; to facilitate understanding and implementing the optimization process, the present invention modifies the above formula into
Figure BDA0003029789380000082
The process of ALS requires repeated iterations to converge until a solution is found where the objective function stops decreasing. By this method, X can be used separately(2)And X(3)To obtain corresponding
Figure BDA0003029789380000083
And
Figure BDA0003029789380000084
Finally, the invention obtains the optimal decomposition structure of the gait feature map:

X ≈ [[λ; H, J, K]] = Σ_{r=1}^{R} λ_r h_r ∘ j_r ∘ k_r

which constitutes the main gait features.
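The ALS updates above can be sketched end to end; the following compact illustration (not the patent's implementation) performs a rank-R CP decomposition of a third-order tensor by alternating least squares, with the mode unfoldings ordered to match the Khatri-Rao products:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I, R) and (J, R) -> (I*J, R)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(X, R, n_iter=200):
    """Rank-R CP decomposition of a 3rd-order tensor by alternating least
    squares, using updates of the form H = X_(1)(K kr J)((K^T K)*(J^T J))^+."""
    D, V, T = X.shape
    rng = np.random.default_rng(0)
    H, J, K = rng.random((D, R)), rng.random((V, R)), rng.random((T, R))
    # mode-n unfoldings, column order chosen to match khatri_rao's row order
    X1 = X.transpose(0, 2, 1).reshape(D, -1)   # D x (T*V)
    X2 = X.transpose(1, 2, 0).reshape(V, -1)   # V x (T*D)
    X3 = X.transpose(2, 1, 0).reshape(T, -1)   # T x (V*D)
    for _ in range(n_iter):
        H = X1 @ khatri_rao(K, J) @ np.linalg.pinv((K.T @ K) * (J.T @ J))
        J = X2 @ khatri_rao(K, H) @ np.linalg.pinv((K.T @ K) * (H.T @ H))
        K = X3 @ khatri_rao(J, H) @ np.linalg.pinv((J.T @ J) * (H.T @ H))
    return H, J, K

# sanity check on a synthetic exact rank-2 tensor
rng = np.random.default_rng(1)
Ht, Jt, Kt = rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))
X = np.einsum('dr,vr,tr->dvt', Ht, Jt, Kt)
H, J, K = cp_als(X, R=2)
X_hat = np.einsum('dr,vr,tr->dvt', H, J, K)
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(err)  # should be tiny for an exact low-rank tensor
```

On an exact low-rank tensor the reconstruction error drops toward zero; on a real gait feature map one would keep the R strongest components as the main gait features.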
Step 4, classification and identification
Identifying an object to be identified based on the primary gait features.
Fig. 5 is a schematic flow chart of an embodiment of the present invention. The embodiment runs on a cloud computing platform composed of 10 NVIDIA Tesla P100 servers with VMware ESXi 5, a 20 TB disk array and a 1000 Mbps network switch, on which a Hadoop cluster is deployed.
First, a camera captures the original gait sequence of a tester while walking; OpenPose then performs posture modeling on the tester to obtain the dynamic skeleton sequence of the original gait sequence, locating 18 joint positions in each frame. An original gait sequence of several frames is thus transformed into a dynamic skeleton sequence in array form. The preprocessed dynamic skeleton sequence is then input to the ST-GCN to extract gait features; after all spatio-temporal convolutional layers, the joint feature dimension increases, the number of joints stays unchanged, and the number of key frames decreases. To enhance the expressive power of the model, every spatio-temporal convolutional layer is followed by a ReLU nonlinearity, and every layer except the fully connected layer is followed by a normalization layer to ensure convergence. A third-order gait tensor feature map is then obtained for each gait sequence. After the gait feature map is obtained, CP decomposition is performed to optimally decompose the parameters of each mode's factor matrix, i.e., interference factors such as redundant gait-irrelevant features are deleted and the main gait features are extracted. Finally, Softmax classification yields a similarity of 0.98 with the tester; by this similarity score the tester is judged to be the subject under test, and the identification is correct.
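The final Softmax classification step can be sketched as follows; a minimal illustration (with hypothetical scores, not the embodiment's trained model) of turning per-identity scores into probabilities and picking the best match:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# hypothetical per-identity scores from a linear head over the main gait features
logits = np.array([0.2, 4.1, 0.5])
probs = softmax(logits)
print(int(np.argmax(probs)))  # -> 1, the best-matching enrolled identity
```

In the embodiment, the probability (similarity) of the winning class, 0.98 for the tester, is what decides that the identification is correct.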
This example shows that the gait recognition method based on gait space-time characteristic decomposition fuses the subject's gait information from the two dimensions of time and space and fully extracts the gait features. In addition, to prevent overfitting and remove unnecessary redundant features, the CP decomposition used by the method optimizes the gait feature map according to the feature weights and deletes gait-irrelevant feature vectors with low correlation coefficients, which guarantees the superior performance of the method.
Although the present invention has been described by way of preferred embodiments, the present invention is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.

Claims (9)

1. A gait recognition method based on gait space-time characteristic decomposition is characterized by comprising the following steps:
step 1, modeling human body posture
Preprocessing the obtained video, and converting an original gait sequence containing a plurality of frames into a dynamic skeleton sequence represented in an array form;
step 2, extracting gait space-time characteristics
Extracting gait space-time characteristics of an actor based on two dimensions of time and space by using a space-time graph convolutional network (ST-GCN), and fusing the gait space-time characteristics to form a gait characteristic graph;
step 3, decomposing the gait space-time characteristics
Performing optimal decomposition on parameters of each order factor matrix in the gait feature map based on CP decomposition to obtain main gait features;
step 4, classification and identification
Identifying an object to be identified based on the primary gait features.
2. The gait recognition method based on gait spatio-temporal feature decomposition according to claim 1, characterized in that in step 1 the original gait sequence is converted into the dynamic skeleton sequence using an open-source pre-trained OpenPose human posture estimation tool.
3. A gait recognition method based on gait spatiotemporal feature decomposition according to claim 2, characterized in that the specific method of transformation is as follows:
step 1.1, the estimated joint positions in the pixel coordinate system are used as input, the original RGB frames are discarded, the resolution of all videos is adjusted to 340 × 256, and the OpenPose algorithm is then used to obtain the positions of 18 joints in each frame, including the 2D coordinates (X, Y) in the pixel coordinate system and the confidence θ of the 18 human joints;
step 1.2, each joint is represented by a tuple (X, Y, θ); Part Affinity Fields (PAF) encode 2D vectors of limb position and orientation in the image domain, and part detection confidence maps (CMP) mark the confidence of each joint point; through these two branches, the relation between each joint position and the whole body is learned jointly, so one skeleton frame is recorded as an array of 18 tuples.
4. A gait recognition method based on gait spatiotemporal feature decomposition according to claim 3, characterized in that for the multi-person case, the person with the highest average joint confidence in each segment is selected.
5. A gait recognition method based on gait space-time characteristic decomposition according to claim 3, characterized in that the specific method of extracting the gait space-time characteristic in the step 2 is as follows:
step 2.1, a skeleton sequence is represented by the 2D or 3D coordinates of each human joint in each frame, all joint vectors in each frame are linked into a feature vector, a spatio-temporal graph convolutional network is used to form a multi-layer representation of the skeleton sequence, and a skeleton spatio-temporal graph G = (V, E) is further constructed, wherein V is the set of nodes containing all joint points of the skeleton sequence, V = {v_ti | t = 1, ..., T, i = 1, ..., N}, v_ti is the i-th joint point in frame t, T is the number of frames, and N is the number of joint points; E is the set of edges and consists of two subsets: the first subset E_S contains the links between joint points within each frame, E_S = {v_ti v_tj | (i, j) ∈ H}, wherein H represents the set of human joint point pairs; the second subset E_F represents the links between corresponding joint points in consecutive frames, E_F = {v_ti v_(t+1)i}, and each edge in E_F represents the trajectory of a particular joint over time; when the skeleton sequence is input into the spatio-temporal graph convolutional network, the feature vector F(v_ti) of joint point i in frame t comprises the joint point coordinates and confidence, wherein the joint point coordinates refer to the coordinates of the joint point in the 2D coordinate system;
step 2.2, GCN is used to perform spatial convolution on the extracted dynamic skeleton sequence to obtain the subject's gait spatial features within a single frame, TCN is used to perform temporal convolution on the extracted dynamic skeleton sequence to obtain the subject's gait temporal features across frames, and finally the two-dimensional features are fused to form the gait feature map.
6. The gait recognition method based on gait spatiotemporal feature decomposition according to claim 5, characterized in that the bone spatiotemporal image G is constructed by two steps:
firstly, connecting the same joint point between adjacent frames to obtain an edge between the frames, and representing the time sequence relation of the corresponding joint point of the human body;
and secondly, constructing a space diagram according to the connection relation of the natural skeletons of the human body in each frame.
7. A gait recognition method based on gait spatiotemporal feature decomposition according to claim 5, characterized in that the bone spatiotemporal image G is spatially convolved with GCN, as follows:
B(l+1)=σ(YB(l)W(l))
wherein B^(l) and B^(l+1) are respectively the inputs to layer l and layer l+1 of the convolutional network, W^(l) is the weight matrix between layer l and layer l+1, Y is the N × N adjacency matrix associated with the skeleton spatio-temporal graph G, and σ is a nonlinear activation function; each weight matrix in the GCN represents a convolution kernel, and multiple convolution kernels are applied to the input of the spatio-temporal graph convolutional network to obtain a feature tensor as output, i.e., the gait spatial features;
the skeleton spatiotemporal graph G is convolved in the time dimension with the TCN as follows:
in the temporal convolution, the convolution kernel has size K×1: each convolution covers K key frames at a single node, the stride is set to 1, i.e., the kernel moves 1 frame at a time, and after one node is finished the next node is convolved; for a convolution kernel of size K×K, an input feature map f_in and channel number c, the output of a single channel at position x is

f_out(x) = Σ_h Σ_w f_in(p(x, h, w)) · w(h, w)

where h indexes the key frames, w indexes the neighbor nodes, p(x, h, w) is a sampling function that enumerates the neighbor nodes at position x, and w(h, w) is the weight of each neighbor node with respect to the root node;
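A simplified single-node sketch of the stride-1 temporal convolution described above (reduced to one joint with a dense kernel; the function name and shapes are hypothetical, not part of the claims):

```python
import numpy as np

def temporal_conv(f_in, w):
    """Temporal convolution at one joint: slide a length-K kernel w over the
    frame axis with stride 1 (moving one frame at a time).

    f_in: (T, C) features of one node over T key frames, C channels
    w:    (K, C) kernel weights
    returns: (T - K + 1,) single-channel output
    """
    T, _ = f_in.shape
    K = w.shape[0]
    out = np.empty(T - K + 1)
    for x in range(T - K + 1):
        out[x] = np.sum(f_in[x:x + K] * w)  # sum over K frames and all channels
    return out

f_in = np.arange(10, dtype=float).reshape(5, 2)  # T = 5 frames, C = 2 channels
w = np.ones((3, 2))                              # K = 3 summing kernel
y = temporal_conv(f_in, w)                       # 5 - 3 + 1 = 3 output positions
```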
the spatial graph CNN is extended to the spatiotemporal domain, i.e., the concept of the neighborhood is extended to also contain temporally connected joints:

B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ ⌊Γ/2⌋ }

forming the spatiotemporal features of human gait, where B(v_ti) denotes the neighbor set of v_ti, which contains, besides the joint points in the same frame as v_ti, the joints connected in time; d(v_tj, v_ti) denotes the shortest distance from v_tj to v_ti; q ranges over the frames containing node v_ti; and Γ denotes the temporal kernel size.
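The extended neighborhood B(v_ti) can be enumerated with a short sketch (function name and graph representation are hypothetical): a BFS on the intra-frame skeleton gives the graph distance d, and the temporal constraint |q − t| ≤ ⌊Γ/2⌋ selects the frames.

```python
from collections import deque

def st_neighbors(adj, t, i, K, Gamma, num_frames):
    """Spatiotemporal neighbor set B(v_ti): pairs (q, j) with graph distance
    d(v_tj, v_ti) <= K and frame offset |q - t| <= Gamma // 2."""
    # BFS distances from joint i over the intra-frame skeleton graph
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {(q, j)
            for j, d in dist.items() if d <= K
            for q in range(num_frames) if abs(q - t) <= Gamma // 2}

# 3-joint chain 0-1-2; spatial radius K = 1, temporal kernel Gamma = 3
adj = {0: [1], 1: [0, 2], 2: [1]}
nbrs = st_neighbors(adj, t=1, i=0, K=1, Gamma=3, num_frames=3)
# joints {0, 1} within distance 1, in frames {0, 1, 2}: six (frame, joint) pairs
```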
8. A gait recognition method based on gait spatiotemporal feature decomposition according to claim 1, characterized in that the method of optimizing decomposition in step 3 is as follows:
step 3.1: the higher-order tensor X ∈ R^(D×V×T) is decomposed into the sum of R rank-1 tensors, i.e.

X ≈ Σ_{r=1}^{R} h_r ∘ j_r ∘ k_r

where ∘ denotes the vector outer product, R denotes the total number of rank-1 tensors in the decomposition, h_r ∈ R^D, j_r ∈ R^V, k_r ∈ R^T, and H = [h_1, h_2, ..., h_R], J = [j_1, j_2, ..., j_R], K = [k_1, k_2, ..., k_R] denote the factor matrices obtained by combining these vectors; the process of gait spatiotemporal feature decomposition is expressed by the following formula:

X ≈ [[H, J, K]] = Σ_{r=1}^{R} h_r ∘ j_r ∘ k_r

then H, J and K are normalized and the weight vector λ ∈ R^R is extracted, giving the decomposition structure of X:

X ≈ [[λ; H, J, K]] = Σ_{r=1}^{R} λ_r (h_r ∘ j_r ∘ k_r)

where h_1, ..., h_R, j_1, ..., j_R and k_1, ..., k_R are the feature factors decomposed from the feature map in each dimension;
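The rank-R sum of outer products can be reproduced in a few lines of NumPy (a schematic sketch with illustrative shapes, not the patented pipeline; the helper name `cp_compose` is hypothetical):

```python
import numpy as np

def cp_compose(lmbda, H, J, K):
    """Rebuild X from the weighted CP form X = sum_r lambda_r (h_r o j_r o k_r)."""
    D, V, T = H.shape[0], J.shape[0], K.shape[0]
    X = np.zeros((D, V, T))
    for r in range(lmbda.size):
        # outer product h_r o j_r o k_r via broadcasting
        X += lmbda[r] * H[:, r, None, None] * J[None, :, r, None] * K[None, None, :, r]
    return X

# rank R = 2 example with D = 2, V = 3, T = 4 (shapes illustrative only)
rng = np.random.default_rng(0)
H = rng.normal(size=(2, 2))
J = rng.normal(size=(3, 2))
K = rng.normal(size=(4, 2))
lmbda = np.array([1.0, 0.5])
X = cp_compose(lmbda, H, J, K)   # tensor of shape (2, 3, 4)
```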
step 3.2: the value range of R is bounded by a weak upper bound on the maximal tensor rank, and R is traversed iteratively from 1 until a suitable solution is found:

rank(X) ≤ min{DV, DT, VT}
step 3.3: the matrices H, J, K are optimized to obtain a reasonable decomposition structure for X.
9. The gait recognition method based on gait spatiotemporal feature decomposition according to claim 7, characterized in that in step 3.3, before optimization, X is matricized along each dimension to obtain the matrices X_(1), X_(2) and X_(3), as follows:

X_(1) ≈ H(K ⊙ J)^T
X_(2) ≈ J(K ⊙ H)^T
X_(3) ≈ K(J ⊙ H)^T

where ⊙ denotes the Khatri–Rao (column-wise Kronecker) product;
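The mode-n matricization and the Khatri–Rao identity X_(1) ≈ H(K ⊙ J)^T can be checked numerically with a small sketch (helper names hypothetical; the unfolding uses the convention where earlier remaining modes vary fastest, which is what makes the identity hold):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product: row i*B.rows + j = A[i] * B[j]."""
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(X, mode):
    """Mode-n matricization X_(n), earlier remaining modes varying fastest."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

# an exact rank-2 tensor built from known factors H (D x R), J (V x R), K (T x R)
rng = np.random.default_rng(1)
H = rng.normal(size=(2, 2))
J = rng.normal(size=(3, 2))
K = rng.normal(size=(4, 2))
X = np.einsum('dr,vr,tr->dvt', H, J, K)
X1 = unfold(X, 0)   # X_(1), shape (2, 12); satisfies X_(1) = H (K kr J)^T
```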
alternating least squares (ALS) is used to optimize X_(1), X_(2) and X_(3), i.e., to optimize the matrices H, J and K, so as to obtain a reasonable decomposition structure of X; for H, the subproblem is

min_{Ĥ} ‖ X_(1) − Ĥ(K ⊙ J)^T ‖_F

where Ĥ = H · diag(λ);

the ALS method fixes two of the factor matrices, solves for the remaining one, and repeats the whole process until a certain convergence criterion is met;
when J and K are fixed to solve for H, this becomes

Ĥ = argmin_{Ĥ} ‖ X_(1) − Ĥ(K ⊙ J)^T ‖_F

and the optimization result is

Ĥ = X_(1) [(K ⊙ J)^T]^† = X_(1) (K ⊙ J)(K^T K ∗ J^T J)^†

where † denotes the pseudo-inverse and ∗ the Hadamard (element-wise) product; the iteration is repeated until a solution at which the objective function stops decreasing is found;
Ĵ and K̂ are obtained analogously from X_(2) and X_(3), respectively;
finally, the optimal decomposition structure of the gait feature map is obtained:

X ≈ [[λ; H, J, K]] = Σ_{r=1}^{R} λ_r (h_r ∘ j_r ∘ k_r)

which constitutes the principal gait features.
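The alternating least-squares procedure of step 3.3 can be sketched compactly in NumPy (a schematic sketch, not the patented implementation: λ is folded into the factor matrices for simplicity, the number of iterations is fixed rather than driven by a convergence criterion, and all names and shapes are illustrative):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product."""
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(X, mode):
    """Mode-n matricization, earlier remaining modes varying fastest."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def cp_als(X, R, n_iter=200, seed=0):
    """Rank-R CP decomposition by alternating least squares: fix two of the
    factor matrices H, J, K, solve the third in closed form using the
    pseudo-inverse update H = X_(1)(K kr J)(K^T K * J^T J)^+, and repeat."""
    D, V, T = X.shape
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(D, R))
    J = rng.normal(size=(V, R))
    K = rng.normal(size=(T, R))
    for _ in range(n_iter):
        H = unfold(X, 0) @ khatri_rao(K, J) @ np.linalg.pinv((K.T @ K) * (J.T @ J))
        J = unfold(X, 1) @ khatri_rao(K, H) @ np.linalg.pinv((K.T @ K) * (H.T @ H))
        K = unfold(X, 2) @ khatri_rao(J, H) @ np.linalg.pinv((J.T @ J) * (H.T @ H))
    return H, J, K

# recover an exact rank-2 tensor from its unfoldings (shapes illustrative only)
rng = np.random.default_rng(42)
Ht = rng.normal(size=(4, 2))
Jt = rng.normal(size=(5, 2))
Kt = rng.normal(size=(6, 2))
X = np.einsum('dr,vr,tr->dvt', Ht, Jt, Kt)
H, J, K = cp_als(X, R=2)
X_hat = np.einsum('dr,vr,tr->dvt', H, J, K)   # should closely reconstruct X
```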
CN202110426681.8A 2021-04-20 2021-04-20 Gait recognition method based on gait space-time characteristic decomposition Pending CN113191230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110426681.8A CN113191230A (en) 2021-04-20 2021-04-20 Gait recognition method based on gait space-time characteristic decomposition


Publications (1)

Publication Number Publication Date
CN113191230A true CN113191230A (en) 2021-07-30

Family

ID=76977762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110426681.8A Pending CN113191230A (en) 2021-04-20 2021-04-20 Gait recognition method based on gait space-time characteristic decomposition

Country Status (1)

Country Link
CN (1) CN113191230A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113599776A (en) * 2021-08-05 2021-11-05 北京理工大学 Real-time push-up counting and standard judging method and system
CN113887486A (en) * 2021-10-20 2022-01-04 山东大学 Abnormal gait recognition method and system based on convolution of space-time attention enhancement graph
CN114052726A (en) * 2021-11-25 2022-02-18 湖南中科助英智能科技研究院有限公司 Thermal infrared human body gait recognition method and device in dark environment
WO2023188216A1 (en) * 2022-03-30 2023-10-05 富士通株式会社 Information processing program, information processing method, and information processing device
WO2023188217A1 (en) * 2022-03-30 2023-10-05 富士通株式会社 Information processing program, information processing method, and information processing device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111310668A (en) * 2020-02-18 2020-06-19 大连海事大学 Gait recognition method based on skeleton information
CN112395945A (en) * 2020-10-19 2021-02-23 北京理工大学 Graph volume behavior identification method and device based on skeletal joint points


Non-Patent Citations (2)

Title
LIU NA: "Robust tensor decomposition model under a Bayesian framework for infrared small target detection", China Master's Theses Full-text Database (Information Science and Technology) *
HE TAO et al.: "Running behavior detection based on tensor decomposition", Journal of Wuyi University (Natural Science Edition) *


Similar Documents

Publication Publication Date Title
CN113191230A (en) Gait recognition method based on gait space-time characteristic decomposition
CN110135375B (en) Multi-person attitude estimation method based on global information integration
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
CN106056050B (en) Multi-view gait recognition method based on self-adaptive three-dimensional human motion statistical model
CN111539884B (en) Neural network video deblurring method based on multi-attention mechanism fusion
WO2022247147A1 (en) Methods and systems for posture prediction
CN110472604B (en) Pedestrian and crowd behavior identification method based on video
CN112347861B (en) Human body posture estimation method based on motion feature constraint
CN114882421B (en) Skeleton behavior recognition method based on space-time characteristic enhancement graph convolution network
CN110378208B (en) Behavior identification method based on deep residual error network
CN107833239B (en) Optimization matching target tracking method based on weighting model constraint
CN110728183A (en) Human body action recognition method based on attention mechanism neural network
CN111241963B (en) First person view video interactive behavior identification method based on interactive modeling
CN115830652B (en) Deep palm print recognition device and method
CN116258757A (en) Monocular image depth estimation method based on multi-scale cross attention
CN106971176A (en) Tracking infrared human body target method based on rarefaction representation
CN117373116A (en) Human body action detection method based on lightweight characteristic reservation of graph neural network
CN116246338B (en) Behavior recognition method based on graph convolution and transducer composite neural network
CN109165586B (en) Intelligent image processing method for AI chip
CN116453025A (en) Volleyball match group behavior identification method integrating space-time information in frame-missing environment
Horiuchi et al. Spectral normalization and relativistic adversarial training for conditional pose generation with self-attention
CN115641644A (en) Twin MViT-based multi-view gait recognition method
Zhang et al. Human skeleton graph attention convolutional for video action recognition
CN115797841B (en) Quadruped behavior recognition method based on self-adaptive space-time diagram attention transducer network
Psaltis et al. Deep 3d flow features for human action recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210730