CN114821804A - Attention mechanism-based action recognition method for graph convolution neural network - Google Patents

Attention mechanism-based action recognition method for graph convolution neural network Download PDF

Info

Publication number
CN114821804A
CN114821804A
Authority
CN
China
Prior art keywords
graph convolution
matrix
module
attention
neural network
Prior art date
Legal status
Pending
Application number
CN202210547472.3A
Other languages
Chinese (zh)
Inventor
翟晓东
汝乐
凌涛
凌婧
Current Assignee
Jiangsu Austin Photoelectric Technology Co ltd
Original Assignee
Jiangsu Austin Photoelectric Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Austin Photoelectric Technology Co ltd filed Critical Jiangsu Austin Photoelectric Technology Co ltd
Priority to CN202210547472.3A priority Critical patent/CN114821804A/en
Publication of CN114821804A publication Critical patent/CN114821804A/en
Pending legal-status Critical Current


Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an attention-mechanism-based action recognition method for a graph convolutional neural network, comprising the following steps: step 1, obtain video stream data of the human action type to be recognized and obtain human skeleton data through a pose estimation algorithm to serve as a human skeleton data set; step 2, construct a coordination attention module, calculate the coordination features produced by the limbs and trunk during human motion, obtain a center-of-gravity matrix with the coordination features, and add it to the human skeleton data set; step 3, input the human skeleton data set into a two-stream graph convolutional neural network and output the predicted action. An importance attention module is further added to the two-stream graph convolutional neural network. The action recognition model of the invention improves the final classification accuracy and makes the existing two-stream adaptive graph convolution model better suited to the action recognition task.

Description

Attention mechanism-based action recognition method for graph convolution neural network
Technical Field
The invention relates to the technical field of video motion recognition, in particular to a motion recognition method of a graph convolution neural network based on an attention mechanism.
Background
In the field of machine learning, action recognition is a very important task. It appears in many everyday scenarios such as autonomous driving, human-computer interaction, and public safety, so the task has attracted increasing attention. With the explosive development of machine learning and deep learning in recent years, many high-performing action recognition algorithms have emerged, and algorithms based on spatial-temporal graph convolution have achieved excellent performance.
The theory of human balance during motion states that, to avoid falling, the body must continuously adjust its posture so that the position of its center of gravity remains essentially unchanged; athletes in particular maintain balance through actions such as swinging the arms and extending the legs. Ordinary people likewise need balance in their daily behavior, relying on the cooperation of the limbs and trunk to avoid falling. Therefore, while completing a given action, the limbs follow a roughly fixed movement trajectory: in "running", for example, when the left foot steps forward the right arm must swing backward so that the position of the center of gravity is kept unchanged; otherwise the person risks falling.
In addition, the importance of each joint differs across different human actions, and there is often more than one important joint, so existing models cannot focus well on extracting these features. Moreover, because the physical connections of the human body are fixed, the topology used by the graph convolutional neural network is fixed when extracting features, and the network cannot readily notice, from a global perspective, the mutual features among several important joints, which in most actions are not directly connected. For example, in the action of "clapping hands", the nodes of the two hands are not directly connected in the human skeleton graph and are far apart, yet the two hands are the essential components of the action, and the feature changes are concentrated on them.
Although skeleton-based action recognition algorithms have achieved excellent results on public data sets, current algorithms ignore both of these problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an action recognition method and apparatus for a graph convolutional neural network based on a dual attention mechanism.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a method for identifying the action of a graph convolution neural network based on an attention mechanism, which is characterized by comprising the following steps of:
step 1, video stream data of human motion types to be identified are obtained, and human skeleton type data are obtained through a posture estimation algorithm and serve as a human skeleton data set.
Step 2, constructing a coordination attention module, calculating coordination characteristics generated by limbs and trunk in the human body movement process, acquiring a gravity center matrix with the coordination characteristics, and adding the gravity center matrix into a human body skeleton data set;
and 3, inputting the human skeleton data set into a double-flow graph convolution neural network, and outputting a prediction action.
Further, the coordination attention module in step 2 is used to partition the human skeleton, calculate the center-of-gravity matrix and calculate the coordination matrix, so as to obtain the center-of-gravity matrix with coordination features, specifically comprising the following steps:
Step 2.1, divide the human skeleton graph into 5 regions according to the structure of the human body, corresponding respectively to the head, the left arm, the right arm, the left leg and the right leg, to obtain 5 region subgraphs.
Step 2.2, calculating the gravity center point of each area;
the center of gravity point coordinates of each region are calculated using the following formula:
$w_x = \frac{1}{n} \sum_{i=1}^{n} x_i$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of each joint point in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points of the region. The $y$ and $z$ coordinates of the region's center of gravity are calculated in the same way:

$w_y = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad w_z = \frac{1}{n} \sum_{i=1}^{n} z_i$
The resulting center of gravity matrix is shown in the form:
$(w_1, w_2, w_3, w_4, w_5)$
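As a concrete illustration of steps 2.1–2.2, the following NumPy sketch computes the region centers of gravity; the 6-joint toy skeleton and its partition into 5 regions are assumptions for illustration only, not the actual skeleton partition of the invention.

```python
import numpy as np

def region_centers_of_gravity(joints, regions):
    """Center of gravity of each skeleton region, per step 2.2.

    joints:  (V, 3) array of joint coordinates (x, y, z).
    regions: list of 5 index lists (head, left arm, right arm,
             left leg, right leg) -- the partition of step 2.1.
    Returns the (5, 3) center-of-gravity matrix (w1, ..., w5).
    """
    # Each w is the mean of the joint coordinates in the region.
    return np.stack([joints[idx].mean(axis=0) for idx in regions])

# Toy example with a hypothetical 6-joint skeleton split into 5 regions.
joints = np.array([[0.0, 1.7, 0.0],   # head
                   [-0.3, 1.2, 0.0],  # left hand
                   [0.3, 1.2, 0.0],   # right hand
                   [-0.1, 0.0, 0.0],  # left foot
                   [0.1, 0.0, 0.0],   # right foot
                   [0.0, 0.9, 0.0]])  # hip (shared by the leg regions)
regions = [[0], [1], [2], [3, 5], [4, 5]]
W = region_centers_of_gravity(joints, regions)
```

Using the region centers of gravity rather than the raw joints keeps the coordination computation of the following steps at a fixed 5-element size regardless of how many joints each region contains.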
Step 2.3, calculate the coordination between every two regions using the covariance matrix.
The covariance matrix calculation formula is as follows:
$\mathrm{cov}(w_i, w_j) = E\bigl[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\bigr], \quad i, j = 1, 2, \ldots, 5$

where cov(·) denotes the covariance, $w_i$ and $w_j$ are the center-of-gravity coordinates of the regions in $(w_1, w_2, w_3, w_4, w_5)$, and $\bar{w}_i$ denotes the average of the center-of-gravity coordinates of each region.
Step 2.4, calculate the coordination matrix according to the above formula; its form is as follows:

$\begin{pmatrix} \mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5) \end{pmatrix}$
Step 2.5, from step 2.4, 3 groups of coordination matrices of size 5 × 5 are obtained, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices can be used to represent the coordination characteristics of the body. $w_x$, $w_y$ and $w_z$ are compressed to the same size as the center-of-gravity matrix by summing column by column; taking the first column as an example:

$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$
Step 2.6, add the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features:

$(w_1 + w_1',\ w_2 + w_2',\ w_3 + w_3',\ w_4 + w_4',\ w_5 + w_5')$

where, for $i = 1, 2, \ldots, 5$,

$w_i' = \sum_{k=1}^{5} \mathrm{cov}(w_k, w_i)$
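Steps 2.3–2.6 can be sketched as follows for one coordinate axis. The patent does not state what the covariance is taken over; this sketch assumes it is computed over the frames of the sequence, which matches the idea of coordination between region trajectories.

```python
import numpy as np

def coordination_features(w_seq):
    """Sketch of steps 2.3-2.6 for one coordinate axis (e.g. x).

    w_seq: (T, 5) array -- one coordinate of the 5 region centers of
           gravity over T frames.  Covariance over the frame axis is
           an assumption; the patent only names the covariance of the
           center-of-gravity coordinates.
    Returns the 5-element center-of-gravity vector of the last frame
    with the compressed coordination features added (step 2.6).
    """
    coord = np.cov(w_seq.T, bias=True)   # 5x5 coordination matrix (step 2.4)
    compressed = coord.sum(axis=0)       # column-by-column sum w_i' (step 2.5)
    return w_seq[-1] + compressed        # w_i + w_i' (step 2.6)

rng = np.random.default_rng(0)
w_seq = rng.normal(size=(4, 5))          # 4 frames, 5 regions
out = coordination_features(w_seq)
```

Since the covariance matrix is symmetric, the column sum of step 2.5 equals the corresponding row sum, so either convention gives the same compressed vector.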
Furthermore, owing to the limitation of the human-body topology graph, a graph convolution model has difficulty learning the relationships among the various end nodes, which are often the essential components of an action. In addition, deep graph convolution models are prone to over-smoothing of features, so a deep model is not suitable. The invention introduces an attention mechanism into the model to acquire the importance features of the nodes in the feature map and transfer them to the original feature map. The proposed importance attention module operates directly on the feature map, without using the adjacency matrix when extracting importance features, and can effectively overcome the limitation imposed by the graph convolutional neural network.
The two-stream graph convolutional neural network in step 3 includes the importance attention module; the two-stream graph convolutional neural network with the importance attention module is constructed in the following steps:
Step 3.1, constructing an adaptive graph convolution module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, a time graph convolution layer convt, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the space map convolution layer and the time convolution layer;
step 3.2, building an adaptive graph convolution module with an importance attention module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, an importance attention module, a time graph convolution layer convt, an importance attention module, an additional random discarding processing Dropout and a residual error connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the two importance attention modules;
Step 3.3, construct the J-stream module and the B-stream module.
The J-stream module and the B-stream module have the same structure: each comprises a data BN layer, 9 adaptive graph convolution modules, and 1 adaptive graph convolution module with an importance attention module, with the data BN layer placed in front of the 10 adaptive graph convolution modules to normalize the input data.
Step 3.4, building a double-flow graph convolution neural network with an important attention module
The double-flow graph convolutional neural network with the importance attention module comprises a J flow module, a B flow module, an importance attention module and two SoftMax layers; inputting a human skeleton data set with harmony characteristics into the dual-flow graph convolutional neural network, processing the human skeleton data set to generate joint data and skeleton data, respectively inputting the joint data and the skeleton data into the J flow module and the B flow module, executing a global average pooling layer after the processing to pool the characteristic mapping of different samples into the same size, respectively passing through the importance attention module and then reaching two SoftMax layers to obtain classification scores of the two flows, and then adding the two scores to obtain a fused score and predict an action label.
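The score fusion at the end of step 3.4 can be sketched directly; the 3-class logits below are stand-ins for the J-stream and B-stream network outputs, which the patent does not specify numerically.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_streams(j_logits, b_logits):
    """Step 3.4 fusion: the SoftMax classification scores of the
    J (joint) and B (bone) streams are added, and the action label
    is predicted from the fused score."""
    fused = softmax(j_logits) + softmax(b_logits)
    return fused, int(np.argmax(fused))

j_logits = np.array([1.0, 3.0, 0.5])   # hypothetical 3-class J-stream output
b_logits = np.array([0.8, 2.5, 1.0])   # hypothetical 3-class B-stream output
fused, label = fuse_two_streams(j_logits, b_logits)
```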
Further, the importance attention module in step 3 comprises two convolution layers and a Softmax layer; the specific data processing process is as follows:
Step 3.1.1, the feature map A is fed into the two convolution layers to obtain two new feature maps B and C, where $A, B, C \in \mathbb{R}^{(N \times M) \times C \times T \times V}$,
in which $N \times M$ is the product of the batch size and the number of people, C denotes the number of channels, T the number of action frames, and V the number of joint points.
Step 3.1.2, reshape the feature maps B and C into $\mathbb{R}^{C \times D}$,
where $D = (N \times M) \times T \times V$ denotes the number of feature points on each channel.
Step 3.1.3, perform a matrix multiplication between the transpose of feature map B and feature map C, and compute the position attention feature map $S \in \mathbb{R}^{D \times D}$ using the Softmax layer:

$s_{ji} = \dfrac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{D} \exp(B_i \cdot C_j)}$

where $s_{ji}$ denotes the influence of the i-th position on the features of the j-th position in the feature map.
Step 3.1.4, reshape the feature map S, multiply it by a scale coefficient α, and add the result to the input feature map A to obtain the final output feature $M \in \mathbb{R}^{(N \times M) \times C \times T \times V}$. The initial value of α is set to 0, and a larger weight is learned gradually.
The specific calculation is expressed as:

$M_j = \alpha \sum_{i=1}^{D} s_{ji} A_i + A_j$

so the feature M at each position is a weighted sum of the features at all positions plus the original feature. It therefore has a global view and can selectively aggregate context information according to spatial attention.
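Steps 3.1.1–3.1.4 can be sketched with NumPy on an already-flattened feature map. The two 1 × 1 convolutions that produce B and C are stood in for by random projection matrices, since the patent does not give trained weights; with the initial α = 0 the module reduces to the identity, as in the text.

```python
import numpy as np

def importance_attention(A, alpha=0.0, seed=0):
    """Sketch of the importance attention module (steps 3.1.1-3.1.4).

    A: (C, D) feature map, flattened so that D = (N*M)*T*V is the
       number of feature points per channel (step 3.1.2).
    """
    rng = np.random.default_rng(seed)
    C, D = A.shape
    Wb = rng.normal(size=(C, C))          # stand-in for the first conv
    Wc = rng.normal(size=(C, C))          # stand-in for the second conv
    B, Cf = Wb @ A, Wc @ A                # step 3.1.1: feature maps B and C

    logits = B.T @ Cf                     # (D, D): entry [i, j] = B_i . C_j
    S = np.exp(logits - logits.max(axis=0, keepdims=True))
    S = S / S.sum(axis=0, keepdims=True)  # step 3.1.3: s_ji, softmax over i

    # Step 3.1.4: M_j = alpha * sum_i s_ji A_i + A_j (residual).
    return alpha * (A @ S) + A
```

With `alpha=0.0` the output equals the input A exactly, matching the initialization described above; training would then gradually increase α.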
Further, the spatial graph convolution layer convs in step 3.1 learns the topology of the network in an end-to-end manner; it is determined by the adjacency matrix $A_k$ and takes the following form:

$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$
where $K_v$ is set to 3, $W_k$ is a weight matrix, and $A_k$ is an N × N adjacency matrix that represents the physical structure of the human body. $B_k$ is also an N × N adjacency matrix, and $C_k$ represents a data-dependent graph that learns a unique graph for each sample. To determine whether a connection exists between two nodes and how strong that connection is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, shown in the following form:

$C_k = \mathrm{softmax}\bigl(f_{in}^{T} W_{\theta k}^{T} W_{\phi k} f_{in}\bigr)$

where N denotes the number of all nodes. The model uses the dot product to calculate the similarity of two nodes in the embedding space; the resulting $C_k$ is an N × N similarity matrix whose values are normalized to the range 0–1.
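The data-dependent graph $C_k$ can be sketched as follows; the embedding dimensions and random weights are illustrative assumptions standing in for the learned $W_{\theta k}$ and $W_{\phi k}$.

```python
import numpy as np

def softmax(z, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(f_in, W_theta, W_phi):
    """Sketch of the data-dependent graph C_k: the two node embeddings
    are compared with a dot product and normalized with softmax (the
    normalized Gaussian of the text), giving an N x N similarity
    matrix with values in [0, 1]."""
    # f_in: (C, N) input features; W_theta, W_phi: (Ce, C) embeddings.
    theta, phi = W_theta @ f_in, W_phi @ f_in   # (Ce, N) each
    return softmax(theta.T @ phi, axis=1)       # C_k: (N, N)

rng = np.random.default_rng(1)
f_in = rng.normal(size=(3, 4))        # C=3 channels, N=4 skeleton nodes
W_theta = rng.normal(size=(2, 3))     # Ce=2 embedding channels (assumed)
W_phi = rng.normal(size=(2, 3))
Ck = adaptive_adjacency(f_in, W_theta, W_phi)
```

Because each row of $C_k$ is a softmax, the similarities from a given node to all others sum to 1, which keeps the learned graph comparable in scale to the normalized physical adjacency $A_k$.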
In a second aspect, an embodiment of the present invention provides an attention-mechanism-based graph convolutional neural network structure, which comprises a human skeleton data set generation module, a coordination feature extraction module, and a two-stream graph convolutional neural network.
The human skeleton data set generation module is used to acquire video stream data of the human action type to be recognized, process the imported video stream data with an existing pose estimation algorithm to obtain human skeleton data and human skeleton graphs, obtain the coordinates and confidence feature of each key node, and generate a human skeleton data set;
the coordination characteristic extraction module is used for calculating coordination characteristics generated during human body movement, dividing a human body skeleton diagram into 5 partitions, combining the definition of the center of gravity and the definition of a covariance matrix, calculating the interrelation between every two centers of gravity on each partition to obtain a group of coordination characteristic matrixes, and endowing the coordination characteristic matrixes to the original characteristic diagram so that the input data of the model has human body coordination characteristics;
the double-flow graph convolution neural network is used for carrying out action recognition on the human skeleton data set containing the harmony characteristics. The double-flow graph convolution neural network further comprises an importance feature extraction module, more key joints in human motion are extracted in the global visual angle range by using an improved space attention mechanism, the module is directly operated on the feature graph, the limitation of topological graph connection can be overcome, and more important joints for motion composition are extracted from the global visual angle.
The invention has the beneficial effects that: on the one hand, inspired by the theory of human motion coordination, the invention designs a module capable of extracting the coordination features produced during human motion; on the other hand, addressing the shortcomings of existing graph convolution models and building on existing attention mechanisms, an improved spatial attention module is provided to learn the more important joint features over the global scope. The action recognition model of the invention improves the final classification accuracy and makes the existing two-stream adaptive graph convolution model better suited to the action recognition task.
Drawings
FIG. 1 is a schematic view of a human skeletal zone strategy according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a module for coordinating attention according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an adaptive graph convolution model according to an embodiment of the invention.
Fig. 4 is a schematic diagram of a motion recognition model of a dual attention mechanism-based graph convolution neural network according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of an importance attention module according to an embodiment of the present invention.
FIG. 6 is a diagram of an adaptive graph convolution model fused with the importance attention module according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
The action recognition method of the graph convolution neural network based on the attention mechanism comprises the following steps:
step 1, acquiring video stream data of a human body action type to be identified, and processing the imported video stream data by adopting an existing posture estimation algorithm to generate a human body skeleton data set.
Step 2, constructing a coordination attention module, calculating coordination characteristics generated by limbs and trunk in the human body movement process, and adding the coordination characteristics into a human body skeleton data set;
the coordination attention module is a calculation unit and is used for partitioning the human skeleton, calculating a gravity center matrix and calculating a coordination matrix so as to acquire the gravity center matrix with coordination characteristics, and the coordination attention module specifically comprises the following steps:
Step 2.1, divide the human skeleton graph into 5 regions according to the structure of the human body, corresponding respectively to the head, the left arm, the right arm, the left leg and the right leg, to obtain 5 region subgraphs, as shown in fig. 1.
Step 2.2, calculate the center of gravity of each region according to its definition in mathematics and physics. To reduce the model's computational load, the module uses the region centers of gravity to calculate coordination, which requires far less computation than using the joints directly and effectively avoids the problem that the numbers of nodes in the different parts are inconsistent. The coordinates of the center of gravity of each region are calculated using the following formula:
$w_x = \frac{1}{n} \sum_{i=1}^{n} x_i$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of each joint point in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points of the region. The $y$ and $z$ coordinates are calculated in the same way:

$w_y = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad w_z = \frac{1}{n} \sum_{i=1}^{n} z_i$
The resulting center of gravity matrix is shown in the form:
$(w_1, w_2, w_3, w_4, w_5)$
Step 2.3, according to its definition, the covariance matrix can be used to calculate the coordination between two regions. The invention rewrites the covariance formula so that $X_i$ and $X_j$ are taken to be $w_i$ and $w_j$ from $(w_1, w_2, w_3, w_4, w_5)$, $i, j = 1, 2, \ldots, 5$, as shown below:

$\mathrm{cov}(w_i, w_j) = E\bigl[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\bigr]$

where cov(·) denotes the covariance, $w_i$ and $w_j$ are the center-of-gravity coordinates of the regions, and $\bar{w}_i$ denotes the average of the center-of-gravity coordinates of each region; the values of i and j may be equal.
Step 2.4, calculate the coordination matrix according to the above formula; its form is as follows:

$\begin{pmatrix} \mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5) \end{pmatrix}$
Step 2.5, from step 2.4, 3 groups of coordination matrices of size 5 × 5 are obtained, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices can be used to represent the coordination characteristics of the body. $w_x$, $w_y$ and $w_z$ are compressed to the same size as the center-of-gravity matrix by summing column by column; taking the first column as an example:

$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$
Step 2.6, add the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features:

$(w_1 + w_1',\ w_2 + w_2',\ w_3 + w_3',\ w_4 + w_4',\ w_5 + w_5')$

where, for $i = 1, 2, \ldots, 5$,

$w_i' = \sum_{k=1}^{5} \mathrm{cov}(w_k, w_i)$
and 3, constructing an adaptive graph convolution layer with an important attention mechanism. Fig. 3 is a schematic diagram of an adaptive graph convolution model according to an embodiment of the present invention, fig. 4 is a schematic diagram of a motion recognition model of a graph convolution neural network based on a dual attention mechanism according to an embodiment of the present invention, fig. 5 is a schematic diagram of an importance attention module according to an embodiment of the present invention, and fig. 6 is a schematic diagram of an adaptive graph convolution model with an importance attention module according to an embodiment of the present invention.
Step 3.1, constructing an adaptive graph convolution model with an importance attention module, wherein the adaptive graph convolution model comprises a space graph convolution layer convs, a first importance attention module, a time graph convolution layer convt, a second importance attention module, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; wherein Dropout is set to 0.5; the first importance attention module and the second importance attention module are respectively connected with a batch standardization layer and an activation function layer;
Step 3.1.1, the spatial graph convolution produces a feature map A, $A \in \mathbb{R}^{(N \times M) \times C \times T \times V}$, where $N \times M$ is the product of the batch size and the number of people, C denotes the number of channels, T the number of action frames, and V the number of joint points. The feature map A is then fed into two 1 × 1 convolution layers to obtain two new feature maps B and C of the same shape.
Step 3.1.2, reshape the feature maps B and C into $\mathbb{R}^{C \times D}$, where $D = (N \times M) \times T \times V$ denotes the number of feature points on each channel.
Step 3.1.3, perform a matrix multiplication operation on the transposes of B and C and calculate a position attention feature map S using the Softmax layer, here
Figure BDA0003649990070000093
The calculation formula of the attention feature map is shown in the following form:
Figure BDA0003649990070000094
s ji showing the influence of the ith position on the characteristics of the jth position on the characteristic diagram.
Step 3.1.4, the characteristic diagram S is recombined and multiplied by a scale coefficient alpha, and the obtained product is added with the characteristic diagram A to obtain the final output characteristic M, wherein
Figure BDA0003649990070000095
The initial value of α is set to 0, and can be gradually learned to be largerThe weight of (c). The representation form is as follows:
Figure BDA0003649990070000096
E j it is shown that the feature M for each location is a weighted sum of all location features and the original features. It has a global view and can selectively aggregate context information according to spatial attention.
Step S4, construct the two-stream graph convolutional neural network with the importance attention module, taking the data set obtained in step 1 as the network's input data and the predicted action label as its output data. The two attention modules of step 2 and step 3 are merged into the two-stream graph convolutional neural network, as shown in fig. 4.
Step S4.1, build the spatial graph convolution layer convs, which learns the topology of the network in an end-to-end manner and is determined by the adjacency matrix $A_k$:

$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$
in the formula K v Is set to be 3, W k Is a weight matrix, A k Is an N x N contiguous matrix that represents the physical structure of the human body. B is k Is also an N contiguous matrix, but B k The interior value has no specific constraint condition, which means that the graph is completely learned from training data, and based on the data-driven task, the model can completely learn the graph from the target task, considering B k The value in (2) may be any value that can represent not only the physical structure of the human body but also the connection strength between adjacent nodes. C k Can represent a data correlation graph which can learn a unique graph for each sample, and in order to determine whether a connection exists between two adjacent nodes and how the connection strength is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, as shown in the following form:
Figure BDA0003649990070000101
N in the formula represents the number of all nodes. The model uses dot product to calculate the similarity of two nodes in the embedding space, and the obtained C k The method is an NxN similarity matrix, and the model normalizes values to be 0-1.
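As a sketch of this adaptive spatial graph convolution, the NumPy fragment below computes $f_{out} = \sum_k W_k f_{in}(A_k + B_k + C_k)$ for a single sample. The weight lists `W_theta`, `W_phi` and `W_out` are hypothetical stand-ins for the learned embedding and output convolutions; this illustrates the formula rather than the full trained layer.

```python
import numpy as np

def adaptive_graph_conv(f_in, A_list, B_list, W_theta, W_phi, W_out):
    """f_in: (C_in, N) features per joint. A_k: fixed physical adjacency.
    B_k: freely learned adjacency. C_k: per-sample similarity graph built
    from a SoftMax-normalized dot product in an embedding space."""
    K_v = len(A_list)                              # K_v is set to 3 in the patent
    f_out = 0.0
    for k in range(K_v):
        theta = W_theta[k] @ f_in                  # (C_e, N) embedded nodes
        phi = W_phi[k] @ f_in                      # (C_e, N)
        logits = theta.T @ phi                     # (N, N) node similarities
        Ck = np.exp(logits - logits.max(axis=1, keepdims=True))
        Ck /= Ck.sum(axis=1, keepdims=True)        # rows normalized to [0, 1]
        f_out = f_out + W_out[k] @ f_in @ (A_list[k] + B_list[k] + Ck)
    return f_out
```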
S4.2, building an adaptive graph convolution module; the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, a random dropout operation (Dropout) and a residual connection, connected in sequence, where Dropout is set to 0.5; a batch normalization layer and an activation function layer follow the spatial graph convolution layer and the temporal convolution layer respectively;

S4.3, building an adaptive graph convolution module with an importance attention module; on the basis of the original adaptive graph convolution module, the importance attention module proposed in step S3 is added after the spatial graph convolution and the temporal convolution respectively;
s4.4, stacking adaptive graph convolution modules to build the adaptive graph convolution network; the model comprises 10 layers in total: 9 adaptive graph convolution modules and 1 adaptive graph convolution module with the importance attention module proposed in step 4.3, with module output channels of 64, 128, 256 and 256 respectively. A data BN layer is added at the beginning to normalize the input data; a global average pooling layer is executed after the 10th layer to pool the feature maps of different samples to the same size, and the final model output is sent to a SoftMax classifier to obtain the prediction result;
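The tail of the stacked network (global average pooling followed by the SoftMax classifier) can be sketched as follows; `W_cls` is a hypothetical classifier weight matrix, standing in for the learned fully-connected layer.

```python
import numpy as np

def classify_head(feature_map, W_cls):
    """Global-average-pool a (C, T, V) feature map to a C-vector, then
    apply a linear classifier followed by SoftMax, as in step S4.4."""
    pooled = feature_map.mean(axis=(1, 2))     # (C,) pooled over time and joints
    logits = W_cls @ pooled                    # (num_classes,)
    z = np.exp(logits - logits.max())          # numerically stable SoftMax
    return z / z.sum()                         # class probabilities
```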
s4.5, constructing the action recognition model of the graph convolutional neural network with a dual attention mechanism; the skeleton data with coordination features is divided into joint-stream data and bone-stream data, which are input separately into the model built in step S4.4 to obtain the SoftMax classification scores of the two streams; the two scores are then added to obtain a fused score, and the action label is predicted.
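The two-stream fusion of step S4.5 reduces to an element-wise addition of the two SoftMax score vectors followed by an argmax; a minimal sketch:

```python
import numpy as np

def fuse_two_streams(joint_scores, bone_scores):
    """Add the SoftMax scores of the joint stream and the bone stream,
    then predict the label with the highest fused score (step S4.5)."""
    fused = joint_scores + bone_scores
    return fused, int(np.argmax(fused))
```

For example, joint scores `[0.1, 0.7, 0.2]` and bone scores `[0.2, 0.5, 0.3]` fuse to `[0.3, 1.2, 0.5]`, predicting class 1.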
The invention also provides an attention-based graph convolutional neural network. As shown in fig. 4, the network comprises a human skeleton data set generation module, a coordination attention module and a dual-flow graph convolutional neural network construction module.

The human skeleton data set generation module is used for acquiring video stream data of the human action types to be identified, processing the imported video stream data with an existing pose estimation algorithm to obtain human skeleton data and a human skeleton graph, obtaining the coordinates and confidence features of each key node, and generating a human skeleton data set;

the coordination attention module is used for calculating the coordination features generated during human motion: the human skeleton graph is divided into 5 partitions and, combining the definition of the center of gravity with that of the covariance matrix, the interrelation between every two centers of gravity is calculated over the partitions to obtain a group of coordination feature matrices, which are applied to the original feature map so that the model's input data carries human coordination features;

the dual-flow graph convolutional neural network construction module is used for constructing a dual-flow graph convolutional neural network, in which joint data and bone data serve respectively as the input features of the two streams and the predicted action label is the output data.

The dual-flow graph convolutional neural network comprises an importance attention module, which uses an improved spatial attention mechanism to extract the joints most critical to human motion within a global view. The module operates directly on the feature map, so it can overcome the limitation of topological graph connectivity and extract, from a global perspective, the joints that contribute more to the composition of the motion.
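The coordination attention computation summarized above (per-partition centers of gravity, pairwise covariance, column-wise compression, addition to the center-of-gravity matrix) can be sketched for one coordinate axis as follows. This is a NumPy illustration under assumptions: each partition is given as an array of one joint coordinate over time, and `np.cov` supplies the covariance.

```python
import numpy as np

def coordination_features(regions):
    """Coordination sketch for one coordinate axis (e.g. x).
    regions: list of 5 arrays, each (n_joints, frames), holding one coordinate
    of the joints of a body partition over time."""
    # Per-frame center of gravity of each partition: w = (1/n) * sum of joints
    W = np.stack([r.mean(axis=0) for r in regions])   # (5, frames)
    # 5x5 coordination (covariance) matrix between partition centroids
    coord = np.cov(W)
    # Column-by-column compression: w'_j = sum_i cov(w_i, w_j)
    compressed = coord.sum(axis=0)                    # (5,)
    # Add the compressed coordination matrix to the center-of-gravity matrix
    return W.mean(axis=1) + compressed
```

For motion with constant centroids the covariance vanishes and the output reduces to the plain center-of-gravity vector.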
Through the network of the second embodiment of the invention, the aim of identifying human actions in the video stream is achieved. The network provided by the embodiment of the invention can execute the action recognition method based on the adaptive graph convolutional neural network provided by any embodiment of the invention, and has the corresponding functions and beneficial effects of the executed method.

Claims (5)

1. An action recognition method for a graph convolutional neural network based on an attention mechanism, characterized by comprising the following steps:
step 1, acquiring video stream data of the human action types to be identified, and obtaining human skeleton data through a pose estimation algorithm as a human skeleton data set;
step 2, constructing a coordination attention module, calculating the coordination features generated by the limbs and trunk during human motion, obtaining a center-of-gravity matrix with coordination features, and adding it to the human skeleton data set;
and step 3, inputting the human skeleton data set into a dual-flow graph convolutional neural network and outputting a predicted action.
2. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 1, wherein the coordination attention module in step 2 performs partitioning, center-of-gravity matrix calculation and coordination matrix calculation on the human skeleton to obtain a center-of-gravity matrix with coordination features, specifically comprising the following steps:
step 2.1, dividing a human skeleton map into 5 regions according to the structure of a human body, wherein the 5 regions respectively correspond to a head, a left arm, a right arm, a left leg and a right leg to obtain 5 region sub-maps;
step 2.2, calculating the gravity center point of each area;
the center-of-gravity coordinates of each region are calculated using the following formula:

$$w_x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $w_x$ is the abscissa of the region's center of gravity, $x_i$ is the abscissa of joint point $i$ in the region, and $i = 1, 2, \ldots, n$ indexes the $n$ joint points in the region; the y and z coordinates of the region's center of gravity are calculated in the same way:

$$w_y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad w_z = \frac{1}{n}\sum_{i=1}^{n} z_i$$
the resulting center of gravity matrix is shown in the form:
$$(w_1, w_2, w_3, w_4, w_5)$$
step 2.3, calculating the coordination between every two areas according to the covariance matrix;
the covariance calculation formula is as follows:

$$\mathrm{cov}(w_i, w_j) = E\big[(w_i - \bar{w}_i)(w_j - \bar{w}_j)\big]$$

where $w_i$ and $w_j$ ($i, j = 1, 2, \ldots, 5$) are center-of-gravity coordinates taken from $(w_1, w_2, w_3, w_4, w_5)$, $\mathrm{cov}(\cdot)$ denotes the covariance, and $\bar{w}_i$, $\bar{w}_j$ denote the means of the center-of-gravity coordinates of the corresponding regions;
step S2.4, calculating the coordination matrix according to the above formula; its form is as follows:

$$\begin{pmatrix}
\mathrm{cov}(w_1, w_1) & \cdots & \mathrm{cov}(w_1, w_5) \\
\vdots & \ddots & \vdots \\
\mathrm{cov}(w_5, w_1) & \cdots & \mathrm{cov}(w_5, w_5)
\end{pmatrix}$$
step S2.5, 3 groups of 5×5 coordination matrices are obtained according to step 2.4, denoted $w_x$, $w_y$ and $w_z$ respectively; these 3 groups of matrices represent the coordination features of the body. $w_x$, $w_y$ and $w_z$ are compressed to the dimension of the center-of-gravity matrix by adding column by column; taking the first column as an example, the compression is as follows:

$$w_1' = \mathrm{cov}(w_1, w_1) + \mathrm{cov}(w_2, w_1) + \mathrm{cov}(w_3, w_1) + \mathrm{cov}(w_4, w_1) + \mathrm{cov}(w_5, w_1)$$
s2.6, adding the center-of-gravity matrix and the compressed coordination matrix to obtain the center-of-gravity matrix with coordination features

$$(\tilde{w}_1, \tilde{w}_2, \tilde{w}_3, \tilde{w}_4, \tilde{w}_5)$$

wherein

$$\tilde{w}_i = w_i + w_i', \qquad i = 1, 2, \ldots, 5$$

with $w_i'$ the $i$-th element of the compressed coordination matrix.
3. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 1, wherein the dual-flow graph convolutional neural network in step 3 comprises an importance attention module, and a dual-flow graph convolutional neural network with the importance attention module is constructed, comprising the following steps:
Step 3.1, constructing an adaptive graph convolution module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, a time graph convolution layer convt, an additional random discarding treatment Dropout and a residual connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are respectively connected behind the space map convolution layer and the time convolution layer;
step 3.2, building an adaptive graph convolution module with an importance attention module, wherein the adaptive graph convolution module comprises a space graph convolution layer convs, an importance attention module, a time graph convolution layer convt, an importance attention module, an additional random discarding processing Dropout and a residual error connection which are sequentially connected; in addition, a batch standardization layer and an activation function layer are connected behind the two importance attention modules respectively;
step 3.3, constructing J flow module and B flow module
The J-stream module and the B-stream module have the same structure; each comprises a data BN layer, 9 adaptive graph convolution modules and 1 adaptive graph convolution module with an importance attention module, where the data BN layer is added before the 10 adaptive graph convolution modules to normalize the input data;
step 3.4, building a double-flow graph convolution neural network with an important attention module
The dual-flow graph convolutional neural network with the importance attention module comprises a J-stream module, a B-stream module, an importance attention module and two SoftMax layers. The human skeleton data set with coordination features is input into the network and processed to generate joint data and bone data, which are input into the J-stream module and the B-stream module respectively; after this processing, a global average pooling layer is executed to pool the feature maps of different samples to the same size, and each stream passes through the importance attention module and then a SoftMax layer to obtain its classification score; the two scores are then added to obtain a fused score and predict the action label.
4. The attention-mechanism-based action recognition method for a graph convolutional neural network according to claim 3, wherein the importance attention module comprises two convolution layers and a SoftMax layer, with the following specific data processing procedure:
step 3.1.1, the feature map A is fed into two convolution layers to obtain two new feature maps B and C, where

$$A \in \mathbb{R}^{(N \times M) \times C \times T \times V}$$

in which N×M is the product of batch size and number of persons, C represents the number of channels, T represents the number of action frames, and V represents the number of joint points;
step 3.1.2, the feature maps B and C are reshaped so that

$$B, C \in \mathbb{R}^{C \times D}$$

where $D = N \times M \times T \times V$ denotes the number of feature points on each channel;
step 3.1.3, matrix multiplication is performed on the transpose of feature map B with feature map C, and the position attention feature map S is calculated using the SoftMax layer, where

$$S \in \mathbb{R}^{D \times D}$$

The attention feature map is calculated as follows:

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{D} \exp(B_i \cdot C_j)}$$
where $s_{ji}$ represents the influence of the $i$-th position on the feature of the $j$-th position in the feature map;
step 3.1.4, the feature map S is reshaped, multiplied by a scale coefficient α, and added to the input feature map A to obtain the final output feature M, where

$$M \in \mathbb{R}^{(N \times M) \times C \times T \times V}$$

The initial value of α is set to 0, and a larger weight can be learned gradually; the specific calculation is expressed as follows:

$$E_j = \alpha \sum_{i=1}^{D} (s_{ji} A_i) + A_j$$

where $E_j$ indicates that the output feature M at each position is a weighted sum of the features at all positions plus the original feature.
5. The method of claim 3, wherein the spatial graph convolution layer convs learns the topology of the graph from the adjacency matrix $A_k$ in an end-to-end manner, obtaining the following form:

$$f_{out} = \sum_{k=1}^{K_v} W_k f_{in} (A_k + B_k + C_k)$$

In the formula, $K_v$ is set to 3, $W_k$ is a weight matrix, and $A_k$ is an N×N adjacency matrix representing the physical structure of the human body; $B_k$ is also an N×N adjacency matrix; $C_k$ represents a data-dependent graph used to learn a unique graph for each sample. To determine whether a connection exists between two nodes and how strong it is, the model uses a normalized Gaussian function to calculate the similarity of the two nodes, as follows:

$$f(v_i, v_j) = \frac{e^{\theta(v_i)^{\mathsf T} \phi(v_j)}}{\sum_{j=1}^{N} e^{\theta(v_i)^{\mathsf T} \phi(v_j)}}$$

In the formula, N represents the number of all nodes. The model uses a dot product to calculate the similarity of two nodes in the embedding space; the resulting $C_k$ is an N×N similarity matrix whose values are normalized to the range 0–1.
CN202210547472.3A 2022-05-18 2022-05-18 Attention mechanism-based action recognition method for graph convolution neural network Pending CN114821804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210547472.3A CN114821804A (en) 2022-05-18 2022-05-18 Attention mechanism-based action recognition method for graph convolution neural network


Publications (1)

Publication Number Publication Date
CN114821804A true CN114821804A (en) 2022-07-29

Family

ID=82515933


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189054A (en) * 2023-02-27 2023-05-30 江南大学 Man-machine cooperation method and man-machine cooperation system based on neural network
CN116071809A (en) * 2023-03-22 2023-05-05 鹏城实验室 Face space-time representation generation method based on multi-class representation space-time interaction
CN116091496A (en) * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B (en) * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination