CN113378656A - Action identification method and device based on self-adaptive graph convolution neural network

Action identification method and device based on self-adaptive graph convolution neural network

Info

Publication number
CN113378656A
Authority
CN
China
Prior art keywords
data
graph convolution
neural network
action
node
Prior art date
Legal status
Granted
Application number
CN202110564099.8A
Other languages
Chinese (zh)
Other versions
CN113378656B (en)
Inventor
胡凯
丁益武
陆美霞
黄昱锟
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202110564099.8A priority Critical patent/CN113378656B/en
Publication of CN113378656A publication Critical patent/CN113378656A/en
Application granted granted Critical
Publication of CN113378656B publication Critical patent/CN113378656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a motion recognition method and device based on a self-adaptive graph convolution neural network, wherein the method comprises the following steps: S1, generating a human skeleton data set; S2, taking the angles between adjacent bone edges as deep spatial features; S3, calculating the average energy change value of each key node and taking the average energy change value as a deep temporal feature; S4, constructing a dual-flow graph convolutional neural network; S5, expanding the dual-flow graph convolutional neural network by connecting 2 newly added sub-networks in parallel to construct an action recognition model, wherein the newly added sub-networks are respectively used for processing the spatial features and the temporal features; the action recognition model simultaneously processes joint data, bone data, deep spatial features and deep temporal features, and calculates the corresponding action type. The invention can effectively improve the recognition accuracy of graph convolution networks in the field of action recognition.

Description

Action identification method and device based on self-adaptive graph convolution neural network
Technical Field
The invention relates to the technical field of action recognition in video streams, and in particular to a motion recognition method and device based on an adaptive graph convolution neural network.
Background
In the field of machine learning, motion recognition is a very important task. It can be applied in many everyday scenes such as autonomous driving, human-computer interaction and public safety, so the task has attracted more and more attention. With the explosive development of machine learning and deep learning in recent years, many motion recognition algorithms with excellent performance have emerged, and algorithms based on spatio-temporal graph convolution have achieved particularly strong results.
Existing action recognition algorithms based on graph neural networks only use very shallow features. First, the coordinates of human key points obtained by a pose estimation algorithm and the confidence of those coordinates are used directly as features, while the positional relationship between key points and bones is neglected; for example, the key point at the shoulder depends on where the upper body is located, while it in turn determines the position of the upper arm. Second, there is no clear distinction regarding the duration of an action: falling and lying down, for example, look similar, yet falling is obviously faster than lying down. These problems indicate that existing methods still do not sufficiently extract the information contained in the data.
Thus, although skeleton-based action recognition algorithms have achieved excellent results on public data sets, current algorithms use only relatively shallow features, do not consider the association between the nodes and edges of the skeleton data or the association between edges, and offer no effective solution to the problem that actions such as "fall" and "lie down" are hard to distinguish.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a motion recognition method and a device based on a self-adaptive graph convolution neural network.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present invention provides a motion recognition method based on an adaptive graph convolution neural network, where the motion recognition method includes:
s1, acquiring video stream data of the human body action type to be identified, processing the imported video stream data by adopting the existing posture estimation algorithm to obtain human body skeleton type data and human body skeleton graphics, generating and simultaneously obtaining coordinates and confidence characteristics of each key node, and generating a human body skeleton data set;
s2, calculating the change of angular momentum when the skeleton rotates around a key node in the process of human motion, and taking the angle between adjacent skeleton edges as a deep spatial feature;
s3, extracting energy information in the action duration time of the human body, accumulating the angle difference generated by the rotation of the skeleton around the key nodes to obtain the sum of angle change in the action duration time, dividing the accumulated sum of the angle difference corresponding to each key node by the number of the key frames of the current action, and calculating to obtain the average energy change value of each key node as the deep time characteristic;
s4, constructing a dual-flow graph convolutional neural network, wherein joint data and bone data are respectively used as input data of J flow and B flow, and predicted action labels are used as output data;
s5, expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel, and constructing an action recognition model, wherein the newly-added sub-networks are respectively used for processing the spatial characteristics and the time characteristics; the action recognition model is used for simultaneously processing joint data, skeleton data, deep spatial features and deep temporal features, and calculating to obtain corresponding action types.
Optionally, in step S2, the step of calculating the change of angular momentum when the bone rotates around the key node in the human motion process, and using the variable of the angle between the adjacent bone edges as the deep spatial feature includes the following steps:
s21, calculating angles between all adjacent bones according to the coordinates and physical connection of each key node in the human skeleton data set; when the degree of the node is 1, the node only has one edge and does not calculate an angle; when the degree of the node is 2, namely one node is connected with two edges, an angle smaller than 180 degrees is calculated; when the degree of the node is 3, namely one node is connected with 3 edges, calculating 3 angles; when the degree of the node is 4, 4 angles are calculated;
s22, aiming at all angles in n key frames in the whole action duration, combining the calculated angles into a matrix form according to the sequence of key nodes and video frames, and expanding the obtained angle matrix to be:
Figure BDA0003080208800000021
wherein m is the total number of angles,
Figure BDA0003080208800000022
is the value of the ith angle in the jth key frame, i is 1,2, …, m, j is 1,2, …, n;
s23, subtracting the corresponding angle of the corresponding key point of the previous frame from the angle of any key point of the next frame to obtain the angle difference formed by the surrounding edge of the same node between the adjacent frames; calculating an angle difference matrix delta theta formed by surrounding bones of adjacent frames by taking the same node as a central point:
Figure BDA0003080208800000023
in the formula (I), the compound is shown in the specification,
Figure BDA0003080208800000024
is the value of the mth angle in the (n-1) th key frame.
Optionally, in step S3, the process of extracting the energy information within the action duration of the human body, accumulating the angle differences generated by the rotation of the bones around the key nodes to obtain the sum of the angle changes within the action duration, and dividing the accumulated sum of angle differences corresponding to each key node by the number of key frames of the current action to obtain the average energy change value of each key node as the deep temporal feature includes the following steps:
S31, accumulating and summing the calculated angle difference matrix Δθ in time order to obtain the sum of angle changes $\theta_I$ on each node, expressed as:

$$\theta_I=\left[\sum_{j=1}^{n-1}\Delta\theta_1^{\,j},\ \sum_{j=1}^{n-1}\Delta\theta_2^{\,j},\ \cdots,\ \sum_{j=1}^{n-1}\Delta\theta_{m-1}^{\,j}\right]$$

wherein the subscripts "1 to m-1" denote the numbers of the key nodes and the superscripts "1 to n-1" of $\Delta\theta_i^{\,j}$ index the key frames, forming a 1 × (m-1) energy matrix $\theta_I$;
S32, dividing the $\theta_I$ obtained in step S31 by the number of frames of the current action to obtain the average energy $\theta_a$ of the current action, wherein $\theta_a=\theta_I/n$ and n is the number of key frames extracted by the pose estimation algorithm.
Optionally, in step S4, the process of constructing the dual-flow graph convolutional neural network includes the following steps:
step 4.1: building an adaptive graph convolution layer; the adaptive graph convolution layer is used for optimizing the topology of the network together with the other parameters of the network in an end-to-end learning manner, the skeleton graph is unique to different layers and samples, and the topology of the graph is determined by an adjacency matrix $A_k$ and a mask $M_k$, $A_k$ determining whether a connection exists between two vertices and $M_k$ determining the strength of the connection, giving the following expression:

$$f_{out}=\sum_{k}^{K_v}W_k f_{in}\left(A_k+B_k+C_k\right)$$

in the formula, $K_v$ is the kernel size of the spatial dimension, set to 3; $W_k$ is a weight matrix, k ∈ [0,3]; $A_k=\Lambda_k^{-\frac{1}{2}}\bar{A}_k\Lambda_k^{-\frac{1}{2}}$ is an N × N adjacency matrix representing the physical structure of the human body, $\Lambda_k^{ii}=\sum_j\bar{A}_k^{ij}+\alpha$ being the normalized diagonal matrix; $B_k$ is an N × N adjacency matrix whose elements are trained and optimized together with the adaptive graph convolution layer, the values of $B_k$ are not limited, and the elements of the matrix are arbitrary values that indicate the presence and strength of a connection between two joints; $C_k$ is a data-dependent graph used to learn a unique graph for each sample;
to determine whether a connection exists between two vertices and the strength of the connection, a normalized embedded Gaussian function is used to calculate the similarity between the two vertices:

$$f(v_i,v_j)=\frac{e^{\theta(v_i)^{T}\phi(v_j)}}{\sum_{j=1}^{N}e^{\theta(v_i)^{T}\phi(v_j)}}$$

wherein N represents the total number of key points, and $v_i$ and $v_j$ are the feature information on the nodes;
given the input feature matrix, two embedding functions θ(·) and φ(·) change the dimension of the input from $C_{in}\times T\times N$ to $C_e\times T\times N$; the two embedded feature matrices are rearranged and reshaped into an $N\times C_eT$ matrix and a $C_eT\times N$ matrix, which are multiplied to obtain a similarity matrix, and $C_k$ is calculated using the following formula:

$$C_k=\mathrm{softmax}\left(f_{in}^{T}W_{\theta k}^{T}W_{\phi k}f_{in}\right)$$

in the formula, $W_\theta$ and $W_\phi$ are the parameters of the embedding functions θ(·) and φ(·);
step 4.2: building an adaptive graph convolution module; the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, an additional random dropout operation Dropout and a residual connection which are connected in sequence; the Dropout rate is set to 0.5; a batch normalization layer and an activation function layer are connected after the spatial graph convolution layer convs and the temporal graph convolution layer convt, respectively;
step 4.3: stacking the adaptive graph convolution modules to build an adaptive graph convolution network; the adaptive graph convolution network comprises 9 adaptive graph convolution modules, whose numbers of output channels are 64, 64, 64, 64, 128, 128, 128, 256 and 256, respectively; a data BN layer is added at the beginning to normalize the input data, a global average pooling layer pools the feature maps of different samples to the same size, and the final output is sent to a SoftMax classifier to obtain the prediction;
step 4.4: building a dual-flow graph convolutional neural network;
calculating data of joints and data of bones, inputting the joint data and the bone data into J flow and B flow respectively, adding SoftMax scores of the two flows to obtain a fusion score and predicting an action label.
Optionally, in step S5, the step of calculating the corresponding action type includes:
s51, expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel on the basis of 2 existing sub-networks of the double-flow graph convolution neural network, and building an action recognition model;
s52, respectively introducing the bone data, the joint data, the angle change between bones and the energy generated by the motion into four sub-networks of the motion recognition model to obtain corresponding prediction scores; the action recognition model also comprises an accumulator and a SoftMax classifier, and after the 4 prediction scores are added by the accumulator, the accumulated result is led into the SoftMax classifier to obtain a final classification result; the final classification result S is calculated as:
S=S1W1+S2W2+S3W3+S4W4
in the formula, S1, S2, S3 and S4 are the prediction score results of the 4 sub-networks respectively, and W1, W2, W3 and W4 are the weights of the 4 sub-networks, which are hyper-parameters.
In a second aspect, an embodiment of the present invention provides a motion recognition apparatus based on an adaptive graph-convolution neural network, where the motion recognition apparatus includes:
the human body skeleton data set generation module is used for acquiring video stream data of human body action types to be identified, processing the imported video stream data by adopting the existing posture estimation algorithm to obtain human body skeleton type data and human body skeleton graphs, generating and simultaneously obtaining coordinates and confidence coefficient characteristics of each key node, and generating a human body skeleton data set;
the spatial feature extraction module is used for calculating the change of angular momentum when bones rotate around key nodes in the human motion process, and taking the variable of the angle between adjacent bone edges as a deep spatial feature;
the time characteristic extraction module is used for extracting energy information in the action duration time of a human body, accumulating the angle difference generated by the rotation of the skeleton around the key nodes to obtain the total of angle change in the action duration time, dividing the accumulated sum of the angle difference corresponding to each key node by the number of key frames of the current action, and calculating to obtain the average energy change value of each key node as the deep time characteristic;
the double-flow graph convolution neural network construction module is used for constructing a double-flow graph convolution neural network, wherein joint data and bone data are respectively used as input data of J flow and B flow, and a predicted action tag is used as output data;
the action recognition model building module is used for expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel and building an action recognition model, wherein the 2 newly-added sub-networks are respectively used for processing the spatial characteristics and the temporal characteristics;
and the action recognition model is used for simultaneously processing joint data, bone data, deep spatial features and deep temporal features and calculating to obtain corresponding action types.
Optionally, the dual-flow graph convolutional neural network includes 2 sub-networks; the joint data and the bone data are respectively used as input data of 2 sub-networks, and corresponding prediction scores are obtained after sub-network processing.
Optionally, each of the sub-networks or newly added sub-networks includes 9 adaptive graph convolution modules, whose numbers of output channels are 64, 64, 64, 64, 128, 128, 128, 256 and 256, respectively; a data BN layer is added at the beginning to normalize the input data, a global average pooling layer pools the feature maps of different samples to the same size, and the final output is sent to a SoftMax classifier to obtain the prediction;
the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, an additional random dropout operation Dropout and a residual connection which are connected in sequence; the Dropout rate is set to 0.5; a batch normalization layer and an activation function layer are connected after the spatial graph convolution layer convs and the temporal graph convolution layer convt, respectively.
In a third aspect, the present embodiment refers to an electronic device comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement an adaptive graph convolution neural network based action recognition method as previously described.
In a fourth aspect, the present embodiment refers to a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for motion recognition based on an adaptive graph convolution neural network as described above.
The invention has the beneficial effects that:
on one hand, the invention is inspired by angular momentum in robot dynamics, and calculates the change of the angular momentum when bones rotate around key points in the process of human motion, so that the variable of the angle between adjacent bone edges is introduced as a deep spatial feature; on the other hand, energy information in the action duration of the human body is extracted, the obtained angle differences are accumulated to obtain the total sum of angle changes in the action duration, and finally the angle sum on each node is divided by the number of key frames of the current action, and the obtained result is used as deep time characteristics. By adding the spatial characteristic of angle change and the temporal characteristic of average energy change, the final classification accuracy can be greatly improved by the action recognition model, and the advantages of a skeleton data set in the action recognition field are fully utilized by time-space combination, so that the conventional double-flow self-adaptive graph convolution network is more suitable for the task of action recognition.
Drawings
Fig. 1 is a flowchart of an action recognition method based on an adaptive graph convolution neural network according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of node labels of the public data set NTU-RGB + D data set according to the embodiment of the present invention.
FIG. 3 is a diagram of an adaptive graph rolling module according to an embodiment of the invention.
Fig. 4 is a schematic diagram of an adaptive graph convolution network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a dual-stream adaptive graph convolution network according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a motion recognition model according to an embodiment of the present invention.
Fig. 7 is a schematic view of an identification process of the motion identification model according to the embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used in the present invention are for clarity of description only and are not intended to limit the implementable scope of the invention; changes or adjustments of their relative relationships, without substantial change to the technical content, shall also be regarded as falling within the scope of the invention.
Example one
Fig. 1 is a flowchart of an action recognition method based on an adaptive graph-convolution neural network according to an embodiment of the present invention. The present embodiment may be used for recognizing human body motion in a video stream through a server or other devices, and the method may be performed by an adaptive graph-convolution neural network-based motion recognition apparatus, which may be implemented in software and/or hardware, and may be integrated in an electronic device, such as an integrated server device.
Referring to fig. 1, the motion recognition method includes:
s1, obtaining video stream data of the human body action type to be identified, processing the imported video stream data by adopting the existing posture estimation algorithm to obtain human body skeleton type data and human body skeleton graphics, generating and simultaneously obtaining the coordinates and confidence coefficient characteristics of each key node, and generating a human body skeleton data set.
Specifically, the existing posture estimation algorithm is used for processing video stream data into data of a human skeleton type, so that a human skeleton graph is obtained, meanwhile, the characteristics of coordinates, confidence coefficient and the like of each key point are obtained, and the data are stored in a text file for use in the subsequent steps.
For convenience of description, the algorithm is tested by taking 10 videos as an example, actions of people in the videos cover actions in the NTU-RGB + D data set, and finally a node label of the NTU-RGB + D data set shown in fig. 2 is obtained, in fig. 2, there are 25 key nodes, and serial numbers are 1 to 25 respectively.
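As a minimal sketch of this preprocessing step, assuming some existing pose estimation algorithm is wrapped by a hypothetical callable `estimate_pose` that returns the 25 key-node coordinates and confidences for one frame (the function name and the output file name are illustrative, not taken from the patent), the skeleton data set could be collected and stored roughly as follows:

```python
import json

def build_skeleton_dataset(frames, estimate_pose, out_path="skeleton.txt"):
    """Collect (x, y, confidence) for the 25 key nodes of every key frame.

    `estimate_pose` is a hypothetical callable wrapping an existing pose
    estimation algorithm; it returns a list of 25 (x, y, confidence) tuples
    for a single video frame.
    """
    skeleton_sequence = []
    for frame in frames:
        keypoints = estimate_pose(frame)          # 25 x (x, y, confidence)
        skeleton_sequence.append(keypoints)
    # store the skeleton data in a text file for use in the subsequent steps
    with open(out_path, "w") as f:
        json.dump(skeleton_sequence, f)
    return skeleton_sequence
```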
S2, calculating the change of angular momentum when the skeleton rotates around the key node in the process of human motion, and taking the angle between the adjacent skeleton edges as the deep spatial feature.
The purpose of step S2 is to calculate the change Δθ of the angles between all key points and bones. The innovation of step 2 is that it makes full use of the positional relationship between bones during human motion: the skeleton graph extracted from human actions is similar to the joints defined in robotics, so the angular-momentum variable from robot dynamics is introduced, and the information in the data is exploited by calculating the change of angular momentum generated when the human body moves. Since the mass of a person cannot be estimated, only the angle is retained; the angle represents the relationship between key points and joints during human motion and expands the spatial information available to the algorithm. The method specifically comprises the following steps:
step 2.1: and calculating the included angle between the bones. Calculating the angle between two adjacent skeletons according to the coordinate and physical connection of each node in the human skeleton data set, wherein when the degree of the node is 1, namely the node only has one edge, the angle does not need to be calculated; when the degree of the node is 2, namely one node is connected with two edges, only an angle smaller than 180 degrees needs to be calculated; when the degree of a node is 3, namely one node is connected with 3 edges, 3 angles need to be calculated; similarly, a node with 4 degrees needs to calculate 4 angles. As shown in FIG. 2, taking node 21 as an example, 4 angles formed between 4 bones of the node are calculated
$\theta_{tl}^{21}$, $\theta_{tr}^{21}$, $\theta_{ll}^{21}$ and $\theta_{lr}^{21}$, where "tl" denotes the top-left angle, "tr" the top-right angle, "ll" the lower-left angle, "lr" the lower-right angle, and "21" indicates that the angles are centered on the 21st node; nodes of degree 2 and degree 3 are handled in the same manner. Equations (1) to (4) compute the 4 angles around node 21 as the center point; each angle is computed from the horizontal coordinate "x" and the vertical coordinate "y" of node 21 and of the two adjacent nodes that bound the angle, the subscript of a coordinate denoting the node label. All other nodes with degree greater than 2 can calculate the angles of all their adjacent bones by substituting the corresponding node coordinates into the same formulas.
Step 2.2: and combining the 26 angles obtained in the step 2.1 into a matrix form according to the sequence of the nodes. Arranging all angles in the first frame according to the node sequence, if nodes with the degree of more than 2 are arranged according to the principle of from left to right and from top to bottom, taking the first frame as an example, and obtaining an angle matrix of
$$\left(\theta_1^1,\ \theta_2^1,\ \cdots,\ \theta_{26}^1\right)$$

where the superscript "1" denotes the first frame and the subscripts "1, 2, …, 26" indicate that there are 26 angles in total in the first frame. All angles of the n frames of the entire action duration are combined in the same way and then expanded row by row into the angle matrix shown below:

$$\theta=\begin{bmatrix}\theta_1^1&\theta_2^1&\cdots&\theta_{26}^1\\\theta_1^2&\theta_2^2&\cdots&\theta_{26}^2\\\vdots&\vdots&&\vdots\\\theta_1^n&\theta_2^n&\cdots&\theta_{26}^n\end{bmatrix}$$
step 2.3: and calculating the change delta theta of the angle formed by the surrounding bones by taking the same node as a central point between adjacent frames. Subtracting all angles of the previous frame from the angles of all key points of the next frame to obtain an angle difference formed by the peripheral edge of the same node between adjacent frames, and calculating the angle difference by using the angle matrix known in the step 2.2 to obtain an angle difference matrix delta theta, wherein the matrix expression form is as follows:
$$\Delta\theta=\begin{bmatrix}\theta_1^2-\theta_1^1&\theta_2^2-\theta_2^1&\cdots&\theta_{26}^2-\theta_{26}^1\\\vdots&\vdots&&\vdots\\\theta_1^n-\theta_1^{n-1}&\theta_2^n-\theta_2^{n-1}&\cdots&\theta_{26}^n-\theta_{26}^{n-1}\end{bmatrix}$$

where $\theta_{26}^{\,n-1}$ is the value of the 26th angle in the (n-1)-th key frame.
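Steps 2.2 and 2.3 can be expressed compactly with NumPy; a sketch assuming the 26 per-frame angles of step 2.1 have already been computed:

```python
import numpy as np

def angle_difference_matrix(angles_per_frame):
    """Build the n x 26 angle matrix and the (n-1) x 26 difference matrix.

    `angles_per_frame` is a list of n key frames, each a list of the 26 angles
    arranged in the fixed node order described in step 2.2.
    """
    theta = np.asarray(angles_per_frame, dtype=np.float32)   # shape (n, 26)
    delta_theta = theta[1:] - theta[:-1]                      # shape (n-1, 26)
    return theta, delta_theta
```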
And S3, extracting energy information within the action duration of the human body, accumulating the angle differences generated by the rotation of the bones around the key nodes to obtain the sum of the angle changes within the action duration, and dividing the accumulated sum of angle differences corresponding to each key node by the number of key frames of the current action to obtain the average energy change value of each key node as the deep temporal feature.
Step S3 is inspired by the concept of integration in mathematics: the total energy $\theta_I$ generated during the duration of the action is calculated, and this result is then divided by the number of key frames extracted by the pose estimation algorithm to obtain the average energy change. The innovation of step S3 is to represent the sum of the energy generated by the whole action by the sum of the angle changes accumulated once the action is completed, and then to divide $\theta_I$ by the number of key frames to obtain the average energy change; this further exploits the temporal characteristics of the skeleton data set and is an effective solution to the problem that similar actions such as falling and lying down are hard to distinguish. The method specifically comprises the following steps:
Step 3.1: the angle difference matrix Δθ calculated in step 2.3 is accumulated and summed in time order to obtain the sum of the angle changes $\theta_I$ on each node, expressed as follows:
$$\theta_I=\left[\theta_1,\ \theta_2,\ \cdots,\ \theta_{25}\right],\qquad \theta_i=\sum_{j=1}^{n-1}\Delta\theta_i^{\,j}$$

where the subscripts "1 to 25" denote the node labels and the superscripts "1 to n-1" of $\Delta\theta_i^{\,j}$ index the key frames; $\theta_2$ to $\theta_{25}$ are accumulated in the same manner, finally forming a 1 × 25 energy matrix $\theta_I$.
Step 3.2: the $\theta_I$ obtained in step 3.1 is divided by the number of frames of the current action to obtain the average energy of the action $\theta_a$, where

$$\theta_a=\frac{\theta_I}{n}$$

n represents the number of key frames extracted by the pose estimation algorithm, and $\theta_1$ to $\theta_{25}$ are the sums on each node calculated in step 3.1.
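A corresponding sketch of steps 3.1 and 3.2, which reduce Δθ to the average-energy temporal feature:

```python
import numpy as np

def average_energy(delta_theta, n_key_frames):
    """Sum the angle changes over time and normalise by the number of key frames.

    `delta_theta` has shape (n-1, num_nodes); the result theta_a is the
    1 x num_nodes average-energy feature used as the deep temporal feature.
    """
    theta_I = delta_theta.sum(axis=0)          # total angle change per node
    theta_a = theta_I / float(n_key_frames)    # average energy per node
    return theta_a
```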
And S4, constructing a dual-flow graph convolutional neural network, wherein joint data and bone data are used as input data of the J stream and the B stream respectively, and the predicted action labels are used as output data. The method specifically comprises the following steps:
Step 4.1: an adaptive graph convolution layer (AGC) is built. The topology of the network is optimized together with the other parameters of the network in an end-to-end learning manner, and the skeleton graph is unique to different layers and samples, which greatly improves the flexibility of the model. More specifically, the topology of the graph is actually determined by an adjacency matrix and a mask, namely $A_k$ and $M_k$: $A_k$ determines whether a connection exists between two vertices, and $M_k$ determines the strength of the connection. Thus, the expression of formula (5) is obtained:

$$f_{out}=\sum_{k}^{K_v}W_k f_{in}\left(A_k+B_k+C_k\right)\qquad(5)$$

In the above formula, $K_v$ is the kernel size of the spatial dimension, set to 3, k ∈ [0,3], and $W_k$ is a weight matrix. $A_k=\Lambda_k^{-\frac{1}{2}}\bar{A}_k\Lambda_k^{-\frac{1}{2}}$ is an N × N adjacency matrix representing the physical structure of the human body, where $\Lambda_k^{ii}=\sum_j\bar{A}_k^{ij}+\alpha$ is the normalized diagonal matrix and α is set to 0.001 in order to prevent empty rows. $B_k$ also represents an N × N adjacency matrix, but unlike $A_k$, the elements of $B_k$ are trained and optimized together during the training process; $B_k$ is not restricted, which means that this graph is completely learned from the training data, and in this data-driven manner the model can learn a graph that is completely specific to the recognition task and more individualized for the different information contained in different layers. An element of the matrix can be any value: it indicates not only the existence of a connection between two joints but also the strength of the connection, playing the same role as the attention mechanism performed by $M_k$. $C_k$ is a data-dependent graph that learns a unique graph for each sample; in order to determine whether a connection exists between two vertices and the strength of that connection, the similarity between the two vertices is calculated with the normalized embedded Gaussian function shown in formula (6):

$$f(v_i,v_j)=\frac{e^{\theta(v_i)^{T}\phi(v_j)}}{\sum_{j=1}^{N}e^{\theta(v_i)^{T}\phi(v_j)}}\qquad(6)$$

where N represents the total number of key points, and $v_i$ and $v_j$ are the feature information on the nodes. More specifically, given the input feature matrix, two embedding functions θ(·) and φ(·) change the dimension of the input from $C_{in}\times T\times N$ to $C_e\times T\times N$; the two embedded feature matrices are rearranged and reshaped into an $N\times C_eT$ matrix and a $C_eT\times N$ matrix, which are multiplied to form a similarity matrix, and $C_k$ is calculated using formula (7):

$$C_k=\mathrm{softmax}\left(f_{in}^{T}W_{\theta k}^{T}W_{\phi k}f_{in}\right)\qquad(7)$$

In the above formula, $W_\theta$ and $W_\phi$ are the parameters of the embedding functions θ(·) and φ(·).
Step 4.2: the adaptive graph convolution module is built. The convolution in the time dimension is the same as in the classic spatio-temporal graph convolutional network (ST-GCN), and both the spatial graph convolution layer and the temporal graph convolution layer are followed by a batch normalization (BN) layer and an activation function (ReLU) layer. As shown in fig. 3, one basic block is a combination of a spatial graph convolution layer (convs), a temporal graph convolution layer (convt) and an additional random dropout operation (Dropout); the Dropout rate is set to 0.5, and a residual connection is added to each block for stable training.
Step 4.3: the adaptive graph convolution network is built. The adaptive graph convolution network (AGCN) is obtained by stacking the modules of step 4.2, as shown in fig. 4, with 9 modules in total whose numbers of output channels are 64, 64, 64, 64, 128, 128, 128, 256 and 256, respectively. A data BN layer is added at the beginning to normalize the input data, and a global average pooling layer pools the feature maps of different samples to the same size. The final output is sent to the SoftMax classifier to obtain the prediction.
Step 4.4: and building a double-flow network. Referring to fig. 5, the data of the joints and the data of the bones are calculated first, then the joint data and the bone data are input into the J stream and the B stream respectively, and finally the SoftMax scores of the two streams are added to obtain a fusion score and predict an action label.
S5, expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel, and constructing an action recognition model, wherein the newly-added sub-networks are respectively used for processing the spatial characteristics and the time characteristics; the action recognition model is used for simultaneously processing joint data, skeleton data, deep spatial features and deep temporal features, and calculating to obtain corresponding action types. And modifying the structure of the network again, and expanding the feature input on the basis of keeping the feature extraction method of the original model. The model consists of 4 sub-networks, wherein 2 sub-networks are kept unchanged on the basis of the original double-flow self-adaptive graph convolution neural network, and the other 2 sub-networks are respectively used for extracting the characteristics of space and time, and the specific steps comprise:
step 5.1: and building a spatio-temporal feature expansion graph convolution neural network model. Based on the double-flow adaptive graph convolution network described in step 4, the motion recognition model of this embodiment is shown in fig. 6. The motion recognition model consists of 4 sub-networks, 2 sub-networks keep the existing double-flow self-adaptive graph convolution network unchanged, and the rest 2 sub-networks are used for extracting features in space and time. And finally, each sub-network obtains a predicted score through a SoftMax classifier, and then the 4 scores are added to obtain a final classification result. The final classification score is S, which is expressed as equation (8):
S=S1W1+S2W2+S3W3+S4W4 (8)
In the above formula, S1, S2, S3 and S4 represent the prediction score results of the 4 sub-networks, and W1, W2, W3 and W4 represent their weights, which are hyper-parameters whose magnitudes can be adjusted according to the results.
Step 5.2: the model of this patent is trained. First, the data are preprocessed and the data structures in the NTU-RGB+D public data set are recombined, and the angle difference matrix Δθ and the average energy change matrix $\theta_a$ are solved according to the formulas in step 2 and step 3; then the two spatio-temporal feature matrices Δθ and $\theta_a$ are input into the adaptive graph convolutional neural network model. The optimization strategy of the model adopts stochastic gradient descent (SGD) with a Nesterov momentum of 0.9, the batch size (Batch_Size) is set to 64, the weight decay is set to 0.0001, and the number of training epochs is set to 64; the other two sub-networks are used to process the data of the original 2s-AGCN, and finally the classification scores calculated by the 4 networks are added to obtain the final total score and thus the final classification result. The training flow chart is shown in fig. 7.
Example two
The embodiment provides an action recognition device based on an adaptive graph convolution neural network, which comprises a human skeleton data set generation module, a spatial feature extraction module, a temporal feature extraction module, a double-flow graph convolution neural network construction module, an action recognition model construction module and an action recognition model.
And the human body skeleton data set generating module is used for acquiring video stream data of the human body action types to be identified, processing the imported video stream data by adopting the existing posture estimation algorithm to obtain human body skeleton type data and human body skeleton graphs, generating and simultaneously obtaining the coordinates and confidence characteristics of each key node, and generating a human body skeleton data set.
And the spatial feature extraction module is used for calculating the change of angular momentum when the bones rotate around the key nodes in the human motion process, and taking the variable of the angle between the adjacent bone edges as the deep spatial feature.
And the time characteristic extraction module is used for extracting energy information in the action duration time of the human body, accumulating the angle difference generated by the rotation of the skeleton around the key nodes to obtain the total of the angle change in the action duration time, dividing the accumulated sum of the angle difference corresponding to each key node by the number of the key frames of the current action, and calculating to obtain the average energy change value of each key node as the deep time characteristic.
The double-flow graph convolution neural network construction module is used for constructing a double-flow graph convolution neural network, wherein joint data and bone data are respectively used as input data of J flow and B flow, and predicted action labels are used as output data.
And the action recognition model building module is used for expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel and building an action recognition model, and the 2 newly-added sub-networks are respectively used for processing the spatial characteristics and the temporal characteristics.
And the action recognition model is used for simultaneously processing joint data, bone data, deep spatial features and deep temporal features and calculating to obtain corresponding action types.
Optionally, the dual-flow graph convolutional neural network includes 2 sub-networks; the joint data and the bone data are respectively used as input data of 2 sub-networks, and corresponding prediction scores are obtained after sub-network processing.
In some examples, each sub-network or newly added sub-network includes 9 adaptive graph convolution modules, whose numbers of output channels are 64, 64, 64, 64, 128, 128, 128, 256 and 256, respectively; a data BN layer is added at the beginning to normalize the input data, a global average pooling layer pools the feature maps of different samples to the same size, and the final output is sent to the SoftMax classifier to obtain the prediction. The adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, an additional random dropout operation Dropout and a residual connection which are connected in sequence; the Dropout rate is set to 0.5; a batch normalization layer and an activation function layer are connected after the spatial graph convolution layer convs and the temporal graph convolution layer convt, respectively.
The action recognition device of the second embodiment of the invention achieves the aim of recognizing human actions in a video stream. The action recognition device provided by the embodiment of the invention can execute the action recognition method based on the self-adaptive graph convolution neural network provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed method.
EXAMPLE III
The embodiment of the application provides an electronic device, which comprises a processor, a memory, an input device and an output device; in the electronic device, the number of the processors can be one or more; the processor, memory, input devices, and output devices in the electronic device may be connected by a bus or other means.
The memory, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the detection method in the embodiments of the present invention. The processor executes various functional applications and data processing of the electronic device by running the software programs, instructions and modules stored in the memory, that is, the method for recognizing the action based on the adaptive graph convolution neural network provided by the embodiment of the invention is realized.
The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may include a keyboard, a mouse, and the like. The output device may include a display device such as a display screen.
Example four
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for motion recognition based on an adaptive graph convolution neural network is implemented as described above.
Of course, the storage medium containing computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the action recognition method based on the adaptive graph convolution neural network provided by any embodiment of the present invention.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (10)

1. A motion recognition method based on an adaptive graph convolution neural network is characterized by comprising the following steps:
s1, acquiring video stream data of the human body action type to be identified, processing the imported video stream data by adopting the existing posture estimation algorithm to obtain human body skeleton type data and human body skeleton graphics, generating and simultaneously obtaining coordinates and confidence characteristics of each key node, and generating a human body skeleton data set;
s2, calculating the change of angular momentum when the skeleton rotates around a key node in the process of human motion, and taking the angle between adjacent skeleton edges as a deep spatial feature;
s3, extracting energy information in the action duration time of the human body, accumulating the angle difference generated by the rotation of the skeleton around the key nodes to obtain the sum of angle change in the action duration time, dividing the accumulated sum of the angle difference corresponding to each key node by the number of the key frames of the current action, and calculating to obtain the average energy change value of each key node as the deep time characteristic;
s4, constructing a dual-flow graph convolutional neural network, wherein joint data and bone data are respectively used as input data of J flow and B flow, and predicted action labels are used as output data;
s5, expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel, and constructing an action recognition model, wherein the newly-added sub-networks are respectively used for processing the spatial characteristics and the time characteristics; the action recognition model is used for simultaneously processing joint data, skeleton data, deep spatial features and deep temporal features, and calculating to obtain corresponding action types.
2. The method for motion recognition based on adaptive graph convolution neural network of claim 1, wherein in step S2, the process of calculating the change of angular momentum when the bone rotates around the key node during the human motion process, and using the variable of the angle between the adjacent bone edges as the deep spatial feature includes the following steps:
s21, calculating angles between all adjacent bones according to the coordinates and physical connection of each key node in the human skeleton data set; when the degree of the node is 1, the node only has one edge and does not calculate an angle; when the degree of the node is 2, namely one node is connected with two edges, an angle smaller than 180 degrees is calculated; when the degree of the node is 3, namely one node is connected with 3 edges, calculating 3 angles; when the degree of the node is 4, 4 angles are calculated;
s22, aiming at all angles in n key frames in the whole action duration, combining the calculated angles into a matrix form according to the sequence of key nodes and video frames, and expanding the obtained angle matrix to be:
Figure FDA0003080208790000011
wherein m is the total number of angles,
Figure FDA0003080208790000012
is the value of the ith angle in the jth key frame, i is 1,2, …, m, j is 1,2, …, n;
s23, subtracting the corresponding angle of the corresponding key point of the previous frame from the angle of any key point of the next frame to obtain the angle difference formed by the surrounding edge of the same node between the adjacent frames; calculating an angle difference matrix delta theta formed by surrounding bones of adjacent frames by taking the same node as a central point:
Figure FDA0003080208790000013
in the formula (I), the compound is shown in the specification,
Figure FDA0003080208790000014
is the value of the mth angle in the (n-1) th key frame.
3. The method for motion recognition based on adaptive graph convolution neural network of claim 2, wherein in step S3, the step of extracting energy information within the motion duration of the human body, accumulating the angle differences generated by the rotation of the bone around the key nodes to obtain a total of angle changes within the motion duration, dividing the accumulated total of angle differences corresponding to each key node by the number of key frames of the current motion to calculate an average energy change value of each key node, and using the average energy change value as the deep temporal feature comprises the following steps:
s31, accumulating and summing the calculated angle difference matrix delta theta according to the time sequence to obtain the angle change sum theta on each nodeI,θIThe expression of (A) is as follows:
Figure FDA0003080208790000021
wherein the subscript "1 to m-1" represents the number of the key node,
Figure FDA0003080208790000022
"1-n-1" in the superscript represents key frames, constituting a 1 × (m-1) energy matrix θI
S21, theta obtained in the step S31IDividing the current action frame number to obtain the average energy theta of the current actionaWherein
Figure FDA0003080208790000023
And n is the number of key frames extracted by the attitude estimation algorithm.
4. The method for motion recognition based on an adaptive graph convolution neural network of claim 1, wherein in step S4, the process of constructing the dual-flow graph convolutional neural network includes the following steps:
step 4.1: building an adaptive graph convolution layer; the adaptive graph convolution layer is used for optimizing the topology of the network together with the other parameters of the network in an end-to-end learning manner, the skeleton graph is unique to different layers and samples, and the topology of the graph is determined by an adjacency matrix $A_k$ and a mask $M_k$, $A_k$ determining whether a connection exists between two vertices and $M_k$ determining the strength of the connection, giving the following expression:

$$f_{out}=\sum_{k}^{K_v}W_k f_{in}\left(A_k+B_k+C_k\right)$$

in the formula, $K_v$ is the kernel size of the spatial dimension, set to 3; $W_k$ is a weight matrix, k ∈ [0,3]; $A_k=\Lambda_k^{-\frac{1}{2}}\bar{A}_k\Lambda_k^{-\frac{1}{2}}$ is an N × N adjacency matrix representing the physical structure of the human body, $\Lambda_k^{ii}=\sum_j\bar{A}_k^{ij}+\alpha$ being the normalized diagonal matrix; $B_k$ is an N × N adjacency matrix whose elements are trained and optimized together with the adaptive graph convolution layer, the values of $B_k$ are not limited, and the elements of the matrix are arbitrary values that indicate the presence and strength of a connection between two joints; $C_k$ is a data-dependent graph used to learn a unique graph for each sample;
to determine whether a connection exists between two vertices and the strength of the connection, a normalized embedded Gaussian function is used to calculate the similarity between the two vertices:

$$f(v_i,v_j)=\frac{e^{\theta(v_i)^{T}\phi(v_j)}}{\sum_{j=1}^{N}e^{\theta(v_i)^{T}\phi(v_j)}}$$

wherein N represents the total number of key points, and $v_i$ and $v_j$ are the feature information on the nodes;
given the input feature matrix, two embedding functions θ(·) and φ(·) change the dimension of the input from $C_{in}\times T\times N$ to $C_e\times T\times N$; the two embedded feature matrices are rearranged and reshaped into an $N\times C_eT$ matrix and a $C_eT\times N$ matrix, which are multiplied to obtain a similarity matrix, and $C_k$ is calculated using the following formula:

$$C_k=\mathrm{softmax}\left(f_{in}^{T}W_{\theta k}^{T}W_{\phi k}f_{in}\right)$$

in the formula, $W_\theta$ and $W_\phi$ are the parameters of the embedding functions θ(·) and φ(·);
step 4.2: building an adaptive graph convolution module; the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, an additional random dropout operation Dropout and a residual connection which are connected in sequence; the Dropout rate is set to 0.5; a batch normalization layer and an activation function layer are connected after the spatial graph convolution layer convs and the temporal graph convolution layer convt, respectively;
step 4.3: stacking the adaptive graph convolution modules to build an adaptive graph convolution network; the adaptive graph convolution network comprises 9 adaptive graph convolution modules, whose numbers of output channels are 64, 64, 64, 64, 128, 128, 128, 256 and 256, respectively; a data BN layer is added at the beginning to normalize the input data, a global average pooling layer pools the feature maps of different samples to the same size, and the final output is sent to a SoftMax classifier to obtain the prediction;
step 4.4: building a dual-flow graph convolutional neural network;
calculating data of joints and data of bones, inputting the joint data and the bone data into J flow and B flow respectively, adding SoftMax scores of the two flows to obtain a fusion score and predicting an action label.
5. The method for motion recognition based on adaptive graph-convolution neural network of claim 4, wherein in step S5, the step of calculating the corresponding motion type includes:
s51, expanding the double-flow graph convolution neural network, connecting 2 newly-added sub-networks in parallel on the basis of 2 existing sub-networks of the double-flow graph convolution neural network, and building an action recognition model;
s52, respectively introducing the bone data, the joint data, the angle change between bones and the energy generated by the motion into four sub-networks of the motion recognition model to obtain corresponding prediction scores; the action recognition model also comprises an accumulator and a SoftMax classifier, and after the 4 prediction scores are added by the accumulator, the accumulated result is led into the SoftMax classifier to obtain a final classification result; the final classification result S is calculated as:
S = S1W1 + S2W2 + S3W3 + S4W4
in the formula, S1, S2, S3 and S4 are the prediction score results of the 4 sub-networks, respectively; W1, W2, W3 and W4 are the weights of the 4 sub-networks and are hyper-parameters.
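A sketch of the weighted accumulation in claim 5; the weight values shown are placeholders for the hyper-parameters W1 to W4, which the claim leaves unspecified, and the stream order is assumed.

    def four_stream_fuse(scores, weights=(1.0, 1.0, 0.5, 0.5)):
        # scores: four (batch, num_classes) prediction-score tensors S1..S4 from the
        # joint, bone, bone-angle and motion-energy sub-networks (order assumed).
        # weights: hyper-parameters W1..W4 (placeholder values).
        fused = sum(w * s for w, s in zip(weights, scores))        # S = sum_i Si * Wi
        return F.softmax(fused, dim=-1).argmax(dim=-1)             # final classification result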
6. An action recognition device based on an adaptive graph convolution neural network, characterized in that the action recognition device comprises:
a human body skeleton data set generation module, used for acquiring video stream data of the human body action type to be identified, processing the imported video stream data with an existing pose estimation algorithm to obtain human body skeleton data and human body skeleton graphs, obtaining the coordinate and confidence features of each key node at the same time, and generating a human body skeleton data set;
a spatial feature extraction module, used for calculating the change of angular momentum when the bones rotate around the key nodes during human motion, and taking the change of the angle between adjacent bone edges as the deep spatial feature;
a temporal feature extraction module, used for extracting the energy information over the duration of a human action: the angle differences generated as the bones rotate around the key nodes are accumulated to obtain the total angle change over the action duration, and the accumulated angle difference of each key node is divided by the number of key frames of the current action to obtain the average energy change value of each key node as the deep temporal feature (a sketch of the spatial and temporal feature computations follows this claim);
a dual-flow graph convolution neural network construction module, used for constructing a dual-flow graph convolution neural network, wherein the joint data and the bone data are respectively used as the input data of the J flow and the B flow, and the predicted action label is used as the output data;
an action recognition model building module, used for expanding the dual-flow graph convolution neural network by connecting 2 newly added sub-networks in parallel to build the action recognition model, wherein the 2 newly added sub-networks are used for processing the spatial features and the temporal features, respectively;
and the action recognition model, used for simultaneously processing the joint data, the bone data, the deep spatial features and the deep temporal features, and calculating the corresponding action type.
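The two handcrafted features used by the spatial and temporal feature extraction modules can be sketched in NumPy as follows; the (parent, node, child) triples describing adjacent bone edges and the array shapes are assumptions of this sketch.

    import numpy as np

    def bone_angles(joints, bone_triples):
        # Angle at each key node between its two adjacent bone edges, per frame.
        # joints: (T, N, D) keypoint coordinates; bone_triples: list of (parent, node, child).
        T = joints.shape[0]
        angles = np.zeros((T, len(bone_triples)))
        for i, (p, n, c) in enumerate(bone_triples):
            u = joints[:, p] - joints[:, n]
            v = joints[:, c] - joints[:, n]
            cos = (u * v).sum(-1) / (np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-6)
            angles[:, i] = np.arccos(np.clip(cos, -1.0, 1.0))
        return angles

    def angle_change_and_energy(joints, bone_triples):
        # Deep spatial feature: frame-to-frame change of the inter-bone angles.
        # Deep temporal feature: accumulated angle difference of each key node
        # divided by the number of key frames (average energy change value).
        angles = bone_angles(joints, bone_triples)             # (T, K)
        d_angle = np.abs(np.diff(angles, axis=0))               # (T-1, K)
        avg_energy = d_angle.sum(axis=0) / max(angles.shape[0], 1)
        return d_angle, avg_energy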
7. The action recognition device based on the adaptive graph convolution neural network according to claim 6, wherein the dual-flow graph convolution neural network comprises 2 sub-networks; the joint data and the bone data are respectively used as the input data of the 2 sub-networks, and the corresponding prediction scores are obtained after processing by the sub-networks.
8. The action recognition device based on the adaptive graph convolution neural network according to claim 7, wherein each of the sub-networks or newly added sub-networks comprises 9 adaptive graph convolution modules whose numbers of output channels are 64, 128, 256 and 256, respectively; a data BN layer is added at the beginning to normalize the input data, a global average pooling layer pools the feature maps of different samples to the same size, and the final output is sent to a SoftMax classifier to obtain the prediction;
the adaptive graph convolution module comprises a spatial graph convolution layer convs, a temporal graph convolution layer convt, a random dropout operation (Dropout) and a residual connection which are connected in sequence; the Dropout rate is set to 0.5; a batch normalization layer and an activation function layer are connected after the spatial graph convolution layer convs and after the temporal graph convolution layer convt, respectively.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the action recognition method based on the adaptive graph convolution neural network according to any one of claims 1 to 5.
10. A computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the action recognition method based on the adaptive graph convolution neural network according to any one of claims 1 to 5.
CN202110564099.8A 2021-05-24 2021-05-24 Action recognition method and device based on self-adaptive graph convolution neural network Active CN113378656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564099.8A CN113378656B (en) 2021-05-24 2021-05-24 Action recognition method and device based on self-adaptive graph convolution neural network

Publications (2)

Publication Number Publication Date
CN113378656A true CN113378656A (en) 2021-09-10
CN113378656B CN113378656B (en) 2023-07-25

Family

ID=77571555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564099.8A Active CN113378656B (en) 2021-05-24 2021-05-24 Action recognition method and device based on self-adaptive graph convolution neural network

Country Status (1)

Country Link
CN (1) CN113378656B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401106A (en) * 2019-01-02 2020-07-10 ***通信有限公司研究院 Behavior identification method, device and equipment
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
CN111476181A (en) * 2020-04-13 2020-07-31 河北工业大学 Human skeleton action recognition method
CN111950485A (en) * 2020-08-18 2020-11-17 中科人工智能创新技术研究院(青岛)有限公司 Human body behavior identification method and system based on human body skeleton
CN112528811A (en) * 2020-12-02 2021-03-19 建信金融科技有限责任公司 Behavior recognition method and device
CN112633209A (en) * 2020-12-29 2021-04-09 东北大学 Human action recognition method based on graph convolution neural network
CN112749671A (en) * 2021-01-19 2021-05-04 澜途集思生态科技集团有限公司 Human behavior recognition method based on video

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LEI SHI et al.: "Skeleton-based action recognition with multi-stream adaptive graph convolutional networks", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 29, pages 9532-9545, XP011815656, DOI: 10.1109/TIP.2020.3028207 *
NING SUN et al.: "Multi-stream slowFast graph convolutional networks for skeleton-based action recognition", IMAGE AND VISION COMPUTING, pages 1-9 *
YI-FAN SONG et al.: "Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition", PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, pages 1625-1633 *
LI LONG: "Research on human skeleton keypoint action recognition methods incorporating an attention mechanism", China Masters' Theses Full-text Database (Information Science and Technology), no. 2020, pages 138-1137 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511553A (en) * 2022-02-25 2022-05-17 中山大学孙逸仙纪念医院 Arthritis scoring method and device based on hand X-ray image
CN114618147A (en) * 2022-03-08 2022-06-14 电子科技大学 Taijiquan rehabilitation training action recognition method
CN114618147B (en) * 2022-03-08 2022-11-15 电子科技大学 Taijiquan rehabilitation training action recognition method
CN114821640A (en) * 2022-04-12 2022-07-29 杭州电子科技大学 Skeleton action identification method based on multi-stream multi-scale expansion space-time diagram convolution network
CN114821640B (en) * 2022-04-12 2023-07-18 杭州电子科技大学 Skeleton action recognition method based on multi-flow multi-scale expansion space-time diagram convolutional network

Also Published As

Publication number Publication date
CN113378656B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
CN113378656B (en) Action recognition method and device based on self-adaptive graph convolution neural network
WO2021258967A1 (en) Neural network training method and device, and data acquisition method and device
CN111291739B (en) Face detection and image detection neural network training method, device and equipment
EP3757905A1 (en) Deep neural network training method and apparatus
CN113408455B (en) Action identification method, system and storage medium based on multi-stream information enhanced graph convolution network
CN106897714A (en) A kind of video actions detection method based on convolutional neural networks
CN107045631A (en) Facial feature points detection method, device and equipment
CN111191630B (en) Performance action recognition method suitable for intelligent interactive viewing scene
WO2020107847A1 (en) Bone point-based fall detection method and fall detection device therefor
CN113254927B (en) Model processing method and device based on network defense and storage medium
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
WO2022052782A1 (en) Image processing method and related device
CN112699837A (en) Gesture recognition method and device based on deep learning
CN111738074B (en) Pedestrian attribute identification method, system and device based on weak supervision learning
CN114821804A (en) Attention mechanism-based action recognition method for graph convolution neural network
CN113516227A (en) Neural network training method and device based on federal learning
CN113239884A (en) Method for recognizing human body behaviors in elevator car
CN113688765A (en) Attention mechanism-based action recognition method for adaptive graph convolution network
CN114743273A (en) Human skeleton behavior identification method and system based on multi-scale residual error map convolutional network
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN114140841A (en) Point cloud data processing method, neural network training method and related equipment
CN113158791A (en) Human-centered image description labeling method, system, terminal and medium
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN115346264A (en) Human behavior recognition algorithm based on multi-stream graph convolution residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant