CN110796110A

CN110796110A - Human behavior identification method and system based on graph convolution network

Info

Publication number: CN110796110A
Application number: CN201911070446.0A
Authority: CN
Inventors: 朱光明; 张亮; 杨露; 李洪升; 沈沛意; 宋娟
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-11-05
Filing date: 2019-11-05
Publication date: 2020-02-14
Anticipated expiration: 2039-11-05
Also published as: CN110796110B

Abstract

The invention discloses a human behavior identification method and system based on a graph convolution network. The identification method comprises the following steps: extracting human skeleton information from an image containing human behavior, acquiring a human joint point position information sequence, and constructing a topological graph sequence of arbitrary length of the human skeleton; performing feature extraction and adaptive evolution of the topological structure on the topological graph sequence through a space-time graph convolution network based on topology learnable graph convolution, to obtain new node features fusing local space-time features and a topological graph sequence with a new topological structure; extracting features through a graph convolution long short-term memory neural network; obtaining global space-time features by a global pooling operation; and performing human behavior recognition based on the global space-time features through a classifier. The method directly learns the features of the whole graph, expands the weight matrix in graph convolution to the structure of the whole topological graph, learns the relationship between any two nodes in the graph without being limited by the topological structure, and achieves high identification accuracy.

Description

Human behavior identification method and system based on graph convolution network

Technical Field

The invention belongs to the technical field of artificial intelligence, and relates to a human behavior identification method and a human behavior identification system based on a graph convolution network, which can be used for action identification of a topological graph sequence.

Background

Convolutional neural networks have achieved tremendous success in many areas, but they rely on data represented on a grid structure. Data in many fields does not have a grid structure, and data in irregular domains usually takes the form of a topological graph, which makes it difficult to generalize convolutional neural networks to the graph domain. To preserve the properties of convolution on graphs, a transition matrix is usually defined at each node and a weight matrix is defined according to the degree of the node, so that graph convolution can learn on different topological subgraphs; a corresponding spatial partitioning rule is then designed according to the number of neighborhood subsets of the nodes of the space-time graph. Existing adaptive graph convolution can only learn the adaptive topological relationship between adjacent nodes, and its ability to learn relationships between more distant nodes is insufficient. Furthermore, because of the limitations of the transition matrix in graph convolution, long-term temporal relationships between the topological graphs of a sequence are often not modeled effectively.

The topology of the topological graph data is usually fixed at all layers of the network, but the natural topology is not necessarily optimal, so the graph convolution network with the ability to learn arbitrary topologies has great significance to the convolutional neural network in the field of topological structure data.

Disclosure of Invention

In order to solve the problems, the invention provides a human behavior identification method based on a graph convolution network, which directly learns the characteristics of the whole graph, expands a weight matrix in the graph convolution to the structure of the whole topological graph, learns the relationship between any two nodes in the graph without being limited by the topological structure, and has high identification accuracy; meanwhile, a recurrent neural network is introduced to model the long-term time relation of the topological graph sequence, so that the problems in the prior art are solved.

Another object of the invention is to provide a human behavior identification system based on the graph convolution network.

The invention adopts the technical scheme that a human behavior identification method based on a graph convolution network comprises the following steps:

S1, extracting human skeleton information from an image containing human behavior, obtaining a human joint point position information sequence, and constructing a topological graph sequence of arbitrary length of the human skeleton by taking each joint point as a node and the bones between the joint points as edges;

S2, performing feature extraction and adaptive evolution of the topological structure on the topological graph sequence through a space-time graph convolution network based on topology learnable graph convolution, to obtain new node features fusing local space-time features and a topological graph sequence with a new topological structure;

S3, extracting the features of the new topological graph sequence through a graph convolution long short-term memory neural network to obtain a topological graph sequence with long-term space-time features;

S4, further fusing the features of the topological graph sequence by a global pooling operation to obtain global space-time features;

S5, recognizing human behavior by a classifier based on the global space-time features.

Further, in the step S1, the topological graph sequence of the human skeleton is composed of a plurality of topological graph structures, and the topological graph structures are represented by formula (1-1);

G＝(V,E)＝(f_v,w_E) (1-1)

wherein G is the topological graph structure of a human skeleton; the node set V = {v_ti | t = 1, …, T; i = 1, …, N} represents the human joints, T is the number of frames in the sequence, N is the number of joints, and the node set V comprises all the nodes of the skeleton sequence at every moment; the edge set E consists of a spatial-domain edge set and a time-domain edge set: the spatial-domain edge set E_S = {v_ti v_tj | (i, j) ∈ H} represents the edges between node i and node j in the t-th frame, where H is the set of natural connections of the human joints, and the time-domain edge set E_T = {v_ti v_(t+1)i} represents the connections of the same node between consecutive frames; f_v represents the feature vectors of the nodes, and w_E represents the connection weights of the edges.

Further, step S2 specifically comprises:

S21, the space-time graph convolution network based on topology learnable graph convolution comprises a plurality of graph convolution blocks; in each graph convolution block, the spatial-domain features and the time-domain features are learned separately to obtain a node feature vector fusing local space-time features;

spatial-domain feature learning: the spatial-domain features are learned by a node feature learning function to obtain a node feature vector fusing local spatial-domain features, as shown in formula (1-2):

wherein W is the node feature learning parameter matrix; the formula yields the feature vector of node v_i, where v_i is the i-th node in the topological graph; W_m represents the m-th dimension of the matrix W; the feature vector corresponding to node v_i is the content stored in the data structure corresponding to that node; and M represents the corresponding dimension of the vector or matrix; the learned spatial-domain features are normalized by a batch normalization function and finally processed by a rectified linear unit (ReLU) activation function;

time-domain feature learning: the time-domain features are learned by a time-domain convolution function, and the learned time-domain features are then normalized by a batch normalization function;

S22, after spatial-domain feature learning, the spatial-domain feature vectors are fused by the node fusion function GFusion(·) to obtain the connection weights of the new edge set; GFusion(·) is implemented as a matrix multiplication between the topology learnable fusion weights, which have a specific initialization, and the node features, as shown in formula (1-3):

wherein L represents the topology learnable fusion parameter matrix; the formula yields the feature vector of node v_i; L_ij is the learnable fusion weight between nodes v_i and v_j, initialized from the normalized adjacency matrix or from the all-zero matrix; v_j ranges over all nodes in the topological graph other than v_i; "⊙" denotes the element-wise product of two matrices; the remaining symbol denotes the feature vector of node v_j; the topology learnable fusion parameter matrix L is adaptive and is implemented by a two-dimensional convolution with a 1×1 kernel or by matrix multiplication;

S23, substituting the node feature vector fusing the local space-time features and the connection weights of the new edge set into formula (1-1) to obtain a topological graph sequence with a new topological structure.

Further, the topological graph sequence with long-term spatio-temporal characteristics in step S3 is determined according to equation (1-4):

F_vt＝GCNLSTM(STGCN(I)) (1-4)

wherein F_vt is the long-term space-time feature of node v in the t-th frame, I is the human skeleton topological graph sequence shown in formula (1-1), STGCN is the space-time graph convolution network based on topology learnable graph convolution, and GCNLSTM is the graph convolution long short-term memory network, whose specific implementation is shown in formula (1-5):

wherein W_xi and W_hi are the weights of the input and the hidden state in the input gate; W_xf and W_hf are the weights of the input and the hidden state in the forget gate; W_xo and W_ho are the weights of the input and the hidden state in the output gate; W_xc is the weight of the input in the cell state and W_hc is the weight of the hidden state in the cell state; "*_g" represents the graph convolution operation; X_t is the input at the current time, H_t is the hidden state at the current time, and H_t-1 is the hidden state at the previous time; b_i, b_f, b_o and b_c are the biases of the input gate, the forget gate, the output gate and the cell state, respectively; σ is the sigmoid function; i_t, f_t and o_t are the gate values of the input gate, the forget gate and the output gate, respectively; C_t-1 is the cell state at the previous time and C_t is the cell state at the current time t; the products between gate values and states are Hadamard products; and tanh is the hyperbolic tangent function.
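The symbol definitions for the graph convolution long short-term memory network correspond to a standard LSTM cell in which the matrix products are replaced by graph convolutions. The following is a hedged reconstruction from those definitions alone; the exact layout of the formula, the operator notation $*_g$ for graph convolution and $\circ$ for the Hadamard product are assumptions, not a reproduction of the original formula image.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} *_g X_t + W_{hi} *_g H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} *_g X_t + W_{hf} *_g H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} *_g X_t + W_{ho} *_g H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} *_g X_t + W_{hc} *_g H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

In this reading, the forget gate scales the previous cell state, the input gate scales the candidate state, and the output gate scales the hidden state that is passed to the next frame.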

Further, the step S4 is specifically performed according to the following steps:

S41, first performing a mean pooling operation on all node features at each moment to obtain a feature vector for each moment, as shown in formula (1-6):

F_t＝GPooling(F_vt) (1-6)

wherein F_vt is the long-term space-time feature, F_t is the fused feature vector at time t, and GPooling(·) is the node feature mean pooling function, which performs a mean pooling operation over all nodes of the feature graph at each moment to obtain the feature vector of that moment;

S42, aggregating the feature vectors of all moments by a time-domain global mean pooling operation to obtain the global space-time feature, as shown in formula (1-7):

F＝TPooling(F_t) (1-7)

wherein F_t is the feature vector at time t, F is the global space-time feature obtained by fusion, and TPooling(·) is the time-domain global mean pooling function, which pools the feature vectors of all moments to obtain the global space-time feature.
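The two pooling steps S41 and S42 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the long-term features are stored as an array of shape (T, N, C); the clip lengths, the joint count of 25 and the channel count of 64 are hypothetical values chosen for the example. It shows why sequences of different duration end up as feature vectors of the same size.

```python
import numpy as np

def gpooling(F_vt):
    """Node mean pooling (S41): average over the node axis, (T, N, C) -> (T, C)."""
    return F_vt.mean(axis=1)

def tpooling(F_t):
    """Time-domain mean pooling (S42): average over the time axis, (T, C) -> (C,)."""
    return F_t.mean(axis=0)

rng = np.random.default_rng(0)
# Hypothetical clips: 25 joints, 64 channels, different numbers of frames.
short_clip = rng.normal(size=(20, 25, 64))
long_clip = rng.normal(size=(300, 25, 64))

F_short = tpooling(gpooling(short_clip))
F_long = tpooling(gpooling(long_clip))
print(F_short.shape, F_long.shape)
```

Because both means are taken over whole axes, the length of the global feature depends only on the channel count, not on the number of frames or joints.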

Further, the step S5 is specifically represented by the formula (1-8);

where C is the number of behavior classes; C_k is the k-th behavior class; S_k and S_i are the probabilities, computed by a conventional fully connected layer, that the global space-time feature F belongs to the k-th and the i-th behavior class, respectively; and e is a constant.
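These definitions describe a Softmax-style normalization over the C behavior classes, which is the classifier named in the detailed description. A sketch of that form follows, assuming that S_k is the output of the fully connected layer for class k before normalization and writing the resulting class probability as $P(C_k \mid F)$; this notation is an assumption and is not taken from the original formula image.

```latex
P(C_k \mid F) = \frac{e^{S_k}}{\sum_{i=1}^{C} e^{S_i}}, \qquad k = 1, \dots, C
```

The predicted behavior is then the class with the largest value of $P(C_k \mid F)$.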

Further, the topology learnable fusion parameter matrix L and the node feature learning parameter matrix W are learned and optimized through back propagation.

Further, the spatial edge set of the topological graph with the new topological structure is determined as follows: when the fusion weight between node v_ti and node v_tj of the t-th frame is not 0, a spatial relationship exists between node v_ti and node v_tj, and a new edge is formed.

A behavior recognition system based on space-time graph convolution and a graph convolution long short-term memory network, which adopts the above human behavior recognition method based on the graph convolution network, comprises:

the topological graph sequence construction module is used for extracting human skeleton information from an input image, acquiring a human joint point position information sequence, and constructing a topological graph sequence of a human skeleton by taking all joint points as nodes and bones among the joint points as edges;

the space-time graph convolution network is used for carrying out feature extraction and adaptive evolution of a topological structure on the topological graph sequence to obtain new node features fusing local space-time features and the topological graph sequence with the new topological structure;

the graph convolution long short-term memory neural network is used for extracting the features of the topological graph sequence with the new topological structure to obtain a topological graph sequence with long-term space-time features;

the global pooling module is used for further fusing the characteristics of the topological graph sequence to obtain global space-time characteristics;

and the classifier is used for carrying out human behavior identification based on the global space-time characteristics.

The invention has the beneficial effects that:

(1) The graph convolution is separated into two operations, feature learning and node fusion. By expanding the range of node fusion, new topological graphs beyond the manually set topological structure are learned; the adaptive relationship of the whole topological graph is learned through a specifically initialized topology learnable fusion parameter matrix, and the weight matrix in the graph convolution is expanded to the whole topological graph structure, so that not only the relationship between connected nodes but also the relationship between two unconnected nodes can be learned, which makes the method flexible and adaptable. The features of the whole topological graph structure are learned directly, so that a human skeleton sequence is converted into deep space-time features while the integrity of the topological structure features is preserved throughout learning; the features of the whole topological graph are extracted more effectively, the identification accuracy is improved, and the insufficient ability of existing adaptive graph convolution to learn topological graphs beyond the manually set topological structure is addressed.

(2) The method incorporates a recurrent neural network to learn the long-term space-time features of the topological graph sequence and applies them to human behavior recognition based on skeleton sequence data; it effectively learns the features of topological structure data and addresses the insufficient ability of existing adaptive graph convolution to model long-term temporal features of topological graph sequences.

(3) The invention can adopt state-of-the-art classifiers to improve performance and has good flexibility and extensibility. By converting the action sequence into a global space-time feature, it effectively handles the inconsistent duration of different actions; it effectively models the relationship between body parts that lack a physical connection, realizes learning on a dynamic topological structure sequence, and can be applied to skeleton-sequence-based human behavior recognition, gesture recognition, facial expression recognition and the like.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of a behavior recognition method based on space-time graph convolution and a graph convolution long short-term memory network according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of obtaining a new topological graph sequence by the topology learnable graph convolution according to an embodiment of the present invention.

FIG. 3 is a block diagram of a behavior recognition system based on space-time graph convolution and a graph convolution long short-term memory network according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the invention provides a human behavior recognition method based on space-time graph convolution and a graph convolution long short-term memory network. As shown in FIG. 1, the method comprises the following steps:

S1, extracting human skeleton information from an image containing human behavior, obtaining a human joint point position information sequence, and constructing a topological graph sequence of the human skeleton by taking each joint point as a node and the bones between the joint points as edges, wherein the skeleton information is usually extracted from a color image or a depth image;

the topological graph sequence of the human skeleton consists of a plurality of topological graph structures, and the topological graph structures are represented by formula (1);

G＝(V,E)＝(f_v,w_E) (1)

wherein G is the topological graph structure of a human skeleton; the node set V = {v_ti | t = 1, …, T; i = 1, …, N} represents the human joints, T is the number of frames in the sequence, N is the number of joints, and the node set V comprises all the nodes of the skeleton sequence at every moment; the edge set E consists of a spatial-domain edge set and a time-domain edge set: the spatial-domain edge set E_S = {v_ti v_tj | (i, j) ∈ H} represents the edges between node i and node j in the t-th frame, where H is the set of natural connections of the human joints, and the time-domain edge set E_T = {v_ti v_(t+1)i} represents the connections of the same node between consecutive frames; f_v represents the feature vectors of the nodes, and w_E represents the connection weights of the edges.
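As an illustration of how the spatial and temporal edge sets might be represented in practice, the sketch below builds an adjacency matrix for one frame and lists the temporal edges between consecutive frames. The 5-joint skeleton, the added self-loops and the symmetric normalization are assumptions made for the example; the text only states that H is the set of natural joint connections.

```python
import numpy as np

# Hypothetical 5-joint skeleton (not from the patent):
# 0 = head, 1 = neck, 2 = left hand, 3 = right hand, 4 = hip.
H = [(0, 1), (1, 2), (1, 3), (1, 4)]
N = 5

def spatial_adjacency(num_joints, bones):
    """Symmetric adjacency matrix for one frame, with assumed self-loops."""
    A = np.eye(num_joints)
    for i, j in bones:
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (an assumed choice)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def temporal_edges(num_frames, num_joints):
    """Edges linking the same joint i in frames t and t + 1."""
    return [((t, i), (t + 1, i))
            for t in range(num_frames - 1)
            for i in range(num_joints)]

A = spatial_adjacency(N, H)
A_norm = normalize_adjacency(A)
E_T = temporal_edges(num_frames=3, num_joints=N)
print(A)
print(len(E_T))
```

In this toy skeleton the two hands (joints 2 and 3) have no direct edge, which is the kind of pair the topology learnable fusion weights are meant to be able to connect.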

S2, performing feature extraction and adaptive evolution of a topological structure on the topological graph sequence through a space-time graph convolution network based on topological learnable Graph Convolution (GCN), and obtaining new node features fusing local space-time features and a topological graph sequence with a new topological structure;

The topology learnable graph convolution is completed in two steps, node feature learning and node feature fusion; the topology learnable graph convolution function is composed of a node feature learning function and a node fusion function, as shown in formula (2):

wherein GraphConv(·) is the topology learnable graph convolution function, GFusion(·) is the node fusion function, L represents the topology learnable fusion parameter matrix, and FConv(·) is the node feature learning function; the output of the topology learnable graph convolution function is the new node features fusing local space-time features and a topological graph sequence with a new topological structure.

S21, the space-time graph convolution network based on topology learnable graph convolution comprises a plurality of graph convolution blocks; in each graph convolution block the spatial-domain features and the time-domain features are learned separately to obtain the node feature vector f'_v fusing local space-time features;

Spatial-domain features: the spatial-domain features are learned by a node feature learning function to obtain a node feature vector fusing local spatial-domain features, as shown in formula (3):

wherein W is the node feature learning parameter matrix; the formula yields the feature vector of node v_i, where v_i is the i-th node in the topological graph; W_m represents the m-th dimension of the matrix W; the feature vector corresponding to node v_i is the content stored in the data structure corresponding to that node; and M represents the dimension corresponding to the vector or matrix. The learned spatial-domain features are normalized with a batch normalization (BN) function, which accelerates convergence, mitigates overfitting, makes the network less sensitive to weight initialization and allows a larger learning rate; finally, the features are processed with a rectified linear unit (ReLU), which reduces computation, helps avoid vanishing gradients and mitigates overfitting.

Time-domain features: the time-domain features are learned by a time-domain convolution function, and the learned time-domain features are then normalized by a batch normalization function. The convolution applied to the spatial-domain features is a graph convolution, because the spatial-domain features have a topological graph structure; the convolution applied to the time-domain features is an ordinary convolution, because the time-domain features are grid-structured data rather than a topological structure.

S22, after spatial-domain feature learning, the spatial-domain features are fused by the node fusion function GFusion(·) to obtain the connection weights of the new edge set.

GFusion(·) is implemented as a matrix multiplication between the topology learnable fusion weights, which have a specific initialization, and the node features, as shown in formula (4):

wherein L represents the topology learnable fusion parameter matrix; the formula yields the feature vector of node v_i; L_ij is the learnable fusion weight between nodes v_i and v_j, initialized from the normalized adjacency matrix or from the all-zero matrix; v_j ranges over all nodes in the topological graph other than v_i (including, but not limited to, the nodes adjacent to v_i), so that the relationship between any two nodes, not only between nodes connected by an edge, can be learned; "⊙" denotes the element-wise product of two matrices; and the remaining symbol denotes the feature vector of node v_j. The topology learnable fusion parameter matrix L is adaptive and is implemented by a two-dimensional convolution with a 1×1 kernel or by matrix multiplication, so that not only the fusion weights of existing edges but also the fusion weight between any two nodes can be learned. For example, in the human skeleton there is no natural connection between the joints of the left and right hands, yet the two are often correlated when an action is performed, and this implicit relationship can be learned by the topology learnable graph convolution. The topology learnable fusion parameter matrix L and the node feature learning parameter matrix W are learned and optimized through back propagation.
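The separation into node feature learning and node fusion can be sketched with two matrix products. This is a minimal NumPy illustration under stated assumptions: `fconv` applies one shared linear map W to every node, `gfusion` multiplies the dense N×N matrix L with the node features, batch normalization is omitted, and the 4-joint chain, the row normalization and the manually edited entry standing in for a gradient update are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fconv(X, W):
    """Node feature learning: the same linear map W applied to every node.
    X: (N, C_in) node features, W: (C_in, C_out)."""
    return X @ W

def gfusion(L, F):
    """Node fusion: row i of the result mixes the features of all nodes j
    with weights L[i, j]. L: (N, N), F: (N, C_out)."""
    return L @ F

def graph_conv(X, W, L):
    """Feature learning, then fusion, then ReLU (batch normalization omitted)."""
    return np.maximum(gfusion(L, fconv(X, W)), 0.0)

# Hypothetical 4-joint chain 0-1-2-3 with self-loops, row-normalized.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
L = A / A.sum(axis=1, keepdims=True)   # initialization from the adjacency

X = rng.normal(size=(4, 3))            # N = 4 nodes, C_in = 3
W = rng.normal(size=(3, 8))            # C_out = 8
out = graph_conv(X, W, L)

# L is a dense (N, N) parameter, so training may assign a nonzero weight
# to a pair the skeleton does not connect, e.g. joints 0 and 3.
L_learned = L.copy()
L_learned[0, 3] = 0.2                  # stand-in for a gradient update
out_learned = graph_conv(X, W, L_learned)
print(out.shape)
```

Keeping L as a full matrix rather than a masked adjacency is what lets the learning range extend from connected nodes to any pair of nodes; the cost is N×N fusion parameters per layer.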

In the embodiment, the space-time graph convolution network based on topology learnable graph convolution has 7 graph convolution blocks, and the numbers of channels of the graph convolution blocks are 64, 128, 256 and 256, respectively; the number of graph convolution blocks and the corresponding numbers of channels are not subject to specific requirements and are hyper-parameter settings of the neural network.

S23, substituting the node feature vector f'_v fusing the local space-time features and the connection weights of the new edge set into formula (1) to obtain a topological graph sequence with a new topological structure, as shown in formula (5):

wherein G_GCN is the new topological graph sequence, V represents the node set, E'_S is the edge set of the topological graph with the new topological structure, f'_v is the node feature vector fusing the local space-time features, and the remaining weight symbol denotes the connection weights of the new edge set.

In the spatial domain, the edge set of the topological graph with the new topological structure is E'_S = {v_ti v_tj | L_ij ≠ 0}; that is, when the fusion weight between node v_ti and node v_tj of the t-th frame is not 0, a spatial relationship exists between the two nodes, a new edge is formed, and the topological structure of the topological graph is updated. The time-domain convolution operation does not change the topology of the graph; it only updates the features of each graph node. Only the topology learnable graph convolution proposed by the invention updates the graph node features and the graph topology at the same time; it is applied only to the spatial-domain graph convolution, and the time-domain convolution still uses the known time-domain convolution method.
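Reading the new spatial edge set off the fusion matrix can be sketched as follows. The helper name `spatial_edge_set`, the exclusion of self-pairs and the optional tolerance `tol` are assumptions for illustration; the text itself only requires the fusion weight to be nonzero.

```python
import numpy as np

def spatial_edge_set(L, tol=0.0):
    """Return the pairs (i, j), i != j, whose fusion weight L[i, j] is nonzero.
    `tol` is an assumed numerical threshold; the text only says 'not 0'."""
    n = L.shape[0]
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and abs(L[i, j]) > tol}

# Hypothetical learned fusion matrix for a 3-joint graph.
L = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
edges = spatial_edge_set(L)
print(sorted(edges))
```

Since L need not be symmetric, the pair (2, 0) can be an edge while (0, 2) is not, so the resulting edge set is directed in general.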

In FIG. 2, a topological graph sequence of T frames is first input, wherein the topological graph of the t-th frame is represented by formula (1); the topological graph has N nodes, each node has a feature vector f_v, and C_in represents the number of input channels of the space-time graph convolutional neural network. Feature learning is performed on the input topological graph sequence through the node feature learning parameter matrix W and the learnable fusion weight parameter matrix L in the network, wherein the connection weight L_ij of the edge between the i-th joint and the j-th joint in the t-th frame is the entry in the i-th row and j-th column of the matrix L. A new topological graph sequence is obtained through the space-time graph convolutional neural network, in which each node has a new feature vector f'_v, and C_out represents the number of output channels of the network.

Existing adaptive graph convolution learns only the relationships between connected nodes: it first extracts the features of subgraphs in a bottom-up manner and then fuses them to obtain the features of the whole graph. The invention separates graph convolution into two operations, feature learning and node fusion; by expanding the range of node fusion, new topological graphs beyond the manually set topological structure are learned, and the adaptive relationship of the whole topological graph is learned through the specifically initialized topology learnable fusion parameter matrix L. The weight matrix in the graph convolution is expanded to the whole topological graph structure, so that the relationship between any two nodes in the graph is learned without the limitation of the topological structure, and the learning range is expanded from connected nodes to any two nodes. Instead of the bottom-up manner, the features of the whole graph are learned directly by learning the relationship between any two nodes; the number of parameters is larger than for learning subgraphs, but the extracted features are more effective.

The invention replaces learning the features of sub-topological graphs and then fusing them with learning the features of the whole topological graph directly. To keep the number of parameters and the amount of computation comparable to other adaptive graph convolutional neural networks, the invention uses a simple topology learnable convolution with a specific initialization, which shows better performance.

S3, extracting the features of the new topological graph sequence through a graph convolution long short-term memory neural network (GCNLSTM) to obtain a topological graph sequence with long-term space-time features, as shown in formula (6):

F_vt＝GCNLSTM(STGCN(I)) (6)

wherein I is the human skeleton topological graph sequence shown in formula (1), STGCN is the space-time graph convolution network based on topology learnable graph convolution, and GCNLSTM is the graph convolution long short-term memory network, whose specific implementation is shown in formula (7); F_vt is the long-term space-time feature of node v in the t-th frame, namely the new feature vector of the node.

The graph convolution long short-term memory neural network is shown in formula (7):

wherein W_xi and W_hi are the weights of the input and the hidden state in the input gate; W_xf and W_hf are the weights of the input and the hidden state in the forget gate; W_xo and W_ho are the weights of the input and the hidden state in the output gate; W_xc is the weight of the input in the cell state and W_hc is the weight of the hidden state in the cell state; "*_g" represents the graph convolution operation; X_t is the input at the current time, H_t is the hidden state at the current time, and H_t-1 is the hidden state at the previous time; b_i, b_f, b_o and b_c are the biases of the input gate, the forget gate, the output gate and the cell state, respectively; σ is the sigmoid function; i_t, f_t and o_t are the gate values of the input gate, the forget gate and the output gate, respectively; C_t-1 is the cell state at the previous time and C_t is the cell state at the current time t; the products between gate values and states are Hadamard products; and tanh is the hyperbolic tangent function.
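One time step of such a cell can be sketched in NumPy by following the symbol definitions: each gate combines a graph convolution of the current input with a graph convolution of the previous hidden state, and the cell and hidden states are updated with Hadamard products. The form of `gconv` (mixing by a fixed matrix A, then a linear map), the uniform placeholder matrix A, the parameter initialization and all sizes are assumptions made for this example, not the patent's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gconv(A, X, W):
    """Assumed graph convolution: mix node features with A, then apply W.
    A: (N, N), X: (N, C_in), W: (C_in, C_out)."""
    return A @ X @ W

def gcn_lstm_step(A, X_t, H_prev, C_prev, p):
    """One time step of an LSTM whose gate pre-activations use gconv."""
    i_t = sigmoid(gconv(A, X_t, p["W_xi"]) + gconv(A, H_prev, p["W_hi"]) + p["b_i"])
    f_t = sigmoid(gconv(A, X_t, p["W_xf"]) + gconv(A, H_prev, p["W_hf"]) + p["b_f"])
    o_t = sigmoid(gconv(A, X_t, p["W_xo"]) + gconv(A, H_prev, p["W_ho"]) + p["b_o"])
    g_t = np.tanh(gconv(A, X_t, p["W_xc"]) + gconv(A, H_prev, p["W_hc"]) + p["b_c"])
    C_t = f_t * C_prev + i_t * g_t      # element-wise (Hadamard) products
    H_t = o_t * np.tanh(C_t)
    return H_t, C_t

def init_params(c_in, c_hid, rng):
    """Small random weights and zero biases for the four gate groups."""
    p = {}
    for gate in "ifoc":
        p[f"W_x{gate}"] = rng.normal(scale=0.1, size=(c_in, c_hid))
        p[f"W_h{gate}"] = rng.normal(scale=0.1, size=(c_hid, c_hid))
        p[f"b_{gate}"] = np.zeros(c_hid)
    return p

rng = np.random.default_rng(0)
N, C_in, C_hid, T = 5, 4, 6, 7          # hypothetical sizes
A = np.full((N, N), 1.0 / N)            # placeholder mixing matrix
params = init_params(C_in, C_hid, rng)
X = rng.normal(size=(T, N, C_in))       # a sequence of T graph frames

H = np.zeros((N, C_hid))
C = np.zeros((N, C_hid))
outputs = []
for t in range(T):
    H, C = gcn_lstm_step(A, X[t], H, C, params)
    outputs.append(H)
F_vt = np.stack(outputs)                # per-frame, per-node features
print(F_vt.shape)
```

The hidden state of every frame is kept, so the output is again a sequence of graph features, one feature vector per node and frame, which is what the subsequent pooling steps consume.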

The space-time graph convolution network can only learn short-term high-level space-time features, and for a data sequence with temporal relationships, short-term high-level features are not sufficient for pattern recognition. The recurrent neural network performs long-term temporal modeling on the learned short-term high-level space-time features and fully learns the temporal relationships in the sequence, which markedly improves pattern recognition.

S4, further fusing the features of the topological graph sequence by using global pooling operation to obtain global space-time features, effectively solving the problem of inconsistent duration time between different actions, and effectively modeling the relationship between body parts lacking physical connection;

S41, first performing a mean pooling operation on all node features at each moment to obtain a feature vector for each moment, as shown in formula (8):

F_t＝GPooling(F_vt) (8)

wherein F_vt is the long-term space-time feature, F_t is the fused feature vector at time t, and GPooling(·) is the node feature mean pooling function, which performs a mean pooling operation over all nodes of the feature graph at each moment to obtain the feature vector of that moment.

S42, aggregating the feature vectors of all moments by a time-domain global mean pooling operation to obtain the global space-time feature, as shown in formula (9):

F＝TPooling(F_t) (9)

S5, human behaviors are recognized from the global spatio-temporal features using a Softmax classifier, which is flexible and extensible and improves performance;

specifically, as shown in formula (10):

where C is the number of behavior classes; C_k is the k-th behavior class (common behavior classes include drinking, eating, brushing teeth, combing hair, reading and similar actions); S_k and S_i are the scores, computed by a conventional fully connected layer, for the global spatio-temporal feature F belonging to the k-th and the i-th behavior class, respectively; and e is the base of the natural logarithm, approximately 2.718.
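The classification step can be sketched as follows, assuming formula (10) is the standard Softmax, e^{S_k} divided by the sum of e^{S_i} over all C classes, applied to the outputs of a fully connected layer. The names `softmax`, `classify`, `weight` and `bias`, and the random values, are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Standard Softmax over class scores S_1..S_C."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow; result is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

def classify(feature, weight, bias):
    """Fully connected layer followed by Softmax; returns the class index and probabilities."""
    scores = feature @ weight + bias   # one score per behavior class
    probs = softmax(scores)
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(0)
num_classes, feat_dim = 5, 8
feature = rng.standard_normal(feat_dim)              # stands in for the global feature F
weight = rng.standard_normal((feat_dim, num_classes))
bias = np.zeros(num_classes)

label, probs = classify(feature, weight, bias)
```

The predicted behavior class is the one with the largest probability; the probabilities sum to 1 across the C classes.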

An embodiment of the invention also discloses a behavior recognition system based on spatio-temporal graph convolution and a graph convolution long short-term memory network. As shown in Fig. 3, the system adopts the above human behavior recognition method based on a graph convolution network and comprises:

The invention was compared on two data sets with the spatio-temporal graph convolutional network (ST-GCN), one of the most advanced graph convolutional networks at present. On the Kinetics-Skeleton data set, the best recognition accuracy of the method reaches 36.2%, which is 5.5 percentage points higher than that of ST-GCN; on the NTU-RGBD data set, the best recognition accuracy of the method reaches 89.2%, which is 7.7 percentage points higher than that of ST-GCN. Compared with existing human behavior recognition methods based on spatio-temporal graph convolution, the method directly learns the features of the whole graph, so the learned features better represent human skeleton information, and it can learn the relationship between two joints that have no physical connection, which benefits human behavior recognition. In addition, learning the temporal relationships of the sequence with the graph convolution long short-term memory network captures long-term temporal features, which is superior to learning those relationships with ordinary convolution.

The method can be used for human behavior recognition, including action recognition, gesture recognition and facial expression recognition: based on human skeleton information, human joints are taken as the nodes of a topological graph to form a topological graph sequence, which is then recognized by the method. For example, the method can be applied to purchasing behavior recognition in unmanned supermarkets; in home life, an intelligent robot can achieve better human-computer interaction by recognizing human behaviors; and in security monitoring, the behaviors of specific persons in specific places can be recognized. The method can also be applied to data analysis and other applications whose data have a relational structure.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A human behavior recognition method based on graph convolution network is characterized by comprising the following steps:

2. The method for recognizing human body behaviors based on graph convolution network as claimed in claim 1, wherein in step S1, the topological graph sequence of the human body skeleton is composed of a plurality of topological graph structures, and the topological graph structures are represented by formula (1-1);

G＝(V,E)＝(f_v,w_E) (1-1)

3. The human behavior recognition method based on graph convolution network of claim 2, wherein the step S2 specifically includes:

wherein W is the node feature learning parameter matrix; the feature-vector term is that of node v_i, the i-th node in the topological graph; and W_m represents the m-th dimension of the matrix W,

wherein L represents the topology-learnable fusion parameter matrix; the feature-vector term is that of node v_i; L_ij is the learnable fusion weight between nodes v_i and v_j, initialized from the normalized adjacency matrix or from an all-zero matrix; v_j denotes all nodes in the topological graph other than node v_i; and "⊙" represents the element-wise product of two matrices,

4. The method for recognizing human body behaviors based on graph convolution network as claimed in claim 3, wherein the topological graph sequence with long-term spatiotemporal features in step S3 is determined according to equation (1-4):

F_vt＝GCNLSTM(STGCN(I)) (1-4)

wherein F_vt is the long-term spatio-temporal feature of node v in the t-th frame, I is the human skeleton topological graph sequence shown in formula (1-1), STGCN is the spatio-temporal graph convolution network based on topology-learnable graph convolution, and GCNLSTM is the graph convolution long short-term memory network, whose specific implementation is shown in formula (1-5):

wherein W_xi and W_hi are the weights of the input and the hidden state in the input gate; W_xf and W_hf are the weights of the input and the hidden state in the forget gate; W_xo and W_ho are the weights of the input and the hidden state in the output gate; W_xc is the weight of the input in the cell state; W_hc is the weight of the hidden state in the cell state; "+_g" represents the graph convolution operation; X_t is the input at the current time; H_t is the hidden state at the current time; H_{t-1} is the hidden state at the previous time; b_i, b_f, b_o and b_c are the biases of the input gate, the forget gate, the output gate and the cell state, respectively; σ is the Sigmoid function; i_t, f_t and o_t are the gate values of the input, forget and output gates, respectively; and C_{t-1} is the cell state at the previous time.

5. The method for recognizing human body behaviors based on graph convolution network according to claim 4, wherein the step S4 is specifically performed according to the following steps:

wherein F_vt is the long-term spatio-temporal feature, F_t is the fused feature vector at time t, and GPooling() is the node feature mean pooling function, which performs a mean pooling operation over all nodes of the feature map at each time to obtain the feature vector at that time;

6. The method for recognizing human body behaviors based on graph convolution network according to claim 5, wherein the step S5 is specifically represented by formula (1-8);

7. The human behavior recognition method based on the graph convolution network, characterized in that the topology learnable fusion parameter matrix L and the node feature learning parameter matrix W are both learned and optimized through back propagation.

8. The human behavior recognition method based on the graph convolution network, characterized in that the spatial edge set of the topological graph with the new topological structure is determined as follows: in frame t, when the fusion weight between node v_ti and node v_tj is not 0, a spatial relationship exists between node v_ti and node v_tj, and a new edge is formed.

9. A behavior recognition system based on spatio-temporal graph convolution and a graph convolution long short-term memory network, characterized in that it adopts the human behavior recognition method based on graph convolution network as claimed in any one of claims 1-8, and comprises: