CN113158970A - Action identification method and system based on fast and slow dual-flow graph convolutional neural network - Google Patents

Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Info

Publication number
CN113158970A
CN113158970A (application CN202110510781.9A)
Authority
CN
China
Prior art keywords
fast
slow
features
branch
human body
Prior art date
Legal status
Granted
Application number
CN202110510781.9A
Other languages
Chinese (zh)
Other versions
CN113158970B (en)
Inventor
高跃 (Gao Yue)
陈自强 (Chen Ziqiang)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110510781.9A
Publication of CN113158970A
Application granted
Publication of CN113158970B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an action recognition method and system based on a fast and slow dual-flow graph convolutional neural network, wherein the method comprises the following steps: acquiring human skeleton joint features; regularizing the human skeleton joint features and reshaping the feature tensor of one batch; duplicating the processed human skeleton joint features to obtain two identical copies, which are input to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning; and eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function. The method addresses the weak temporal modeling of the prior art and better captures temporal information and fast and slow motion information.

Description

Action identification method and system based on fast and slow dual-flow graph convolutional neural network
Technical Field
The invention relates to the technical field of action recognition based on skeleton information, and in particular to an action recognition method and system based on a fast and slow dual-flow graph convolutional neural network.
Background
In action recognition based on skeleton information, methods based on graph convolutional neural networks are currently the mainstream. However, the graph convolutional neural network was designed for feature extraction on a single static graph structure and is weak at extracting temporal information. Human skeleton information is time-series, continuous graph-structured data, and can also be regarded as dynamic graph data. For the action recognition task, capturing only the spatial structure information of the static graph (single-frame skeleton information) while ignoring temporal information cannot achieve satisfactory performance. In general, for actions that can be distinguished from a single static frame, graph-convolution-based methods achieve good performance; but some actions whose static frames resemble those of other actions can only be distinguished with additional temporal motion information, which requires the model to have stronger temporal modeling capability.
Many current graph-convolution-based methods focus their design on capturing spatial structure information, improving model performance by defining adaptive adjacency matrices, new graph structure modeling methods, new node connections, and the like. Compared with ST-GCN, which first applied GCNs to human skeleton action recognition, these methods achieve a certain performance improvement. For temporal modeling, however, they simply follow the two-dimensional convolution used by ST-GCN, with little improvement.
In RGB-video-based methods, modeling temporal information and joint spatio-temporal information has long been an important topic; researchers use optical flow modalities to model motion information, or use 3D convolutional networks to model temporal and spatial information jointly. In recent years, the convolutional-neural-network-based method SlowFast has been highly successful in RGB-video-based action recognition.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the invention is to propose an action recognition method based on a fast and slow dual-flow graph convolutional neural network, which builds on graph convolutional methods and uses a fast-slow dual-flow graph convolutional network to better capture temporal information and fast and slow motion information, thereby improving the accuracy of action recognition.
The second purpose of the invention is to provide a motion recognition system based on the fast and slow dual-flow graph convolutional neural network.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides an action recognition method based on a fast and slow dual-flow graph convolutional neural network, including the following steps:
step S10, acquiring human skeleton joint features;
step S20, regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
step S30, duplicating the human skeleton joint features processed in step S20 to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and step S40, eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
Optionally, in an embodiment of the present application, the step S10 includes the following steps:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is:
(C,T,M,V)
where C is the number of feature channels, equal to 3 and representing the three-dimensional coordinates (x, y, z) of the joint points; T is the number of frames of the action; M is the number of persons performing the action; and V is the number of human joint points.
Optionally, in an embodiment of the present application, the step S20 includes the following steps:
the data is regularized; batch training is used in the training process, and the feature shape of one batch tensor is:
(B,C,T,M,V)
first, the batch tensor is reshaped to:
(B,M*V*C,T)
then, a one-dimensional batch regularization module is applied to the temporal dimension T, and the features are reshaped back to the original shape (B, C, T, M, V).
Optionally, in an embodiment of the present application, the specific steps in step S30 include:
each branch comprises a plurality of consecutively stacked graph convolution blocks, and each graph convolution block comprises a spatial graph convolution layer and a temporal convolution layer; the temporal convolution layer is a two-dimensional convolution module with kernel size (t, 1), where t is the temporal receptive field of the convolution kernel; each of the two convolution layers is followed by a batch regularization layer and a ReLU activation function, ensuring that the features of each channel keep the same distribution; the computation of the graph convolution block is described by the following formula:
f_out = Σ_{k=1..K_v} W_k f_in (A_k + B_k + C_k)
where A_k is the predefined adjacency matrix; B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and learns the potential association of any two nodes, and C_k is a matrix computed from the sample features that describes the sample-specific node associations.
Optionally, in an embodiment of the present application, the following two formulas respectively describe the feature shapes of the input features of the graph convolution blocks at the same stage:
f_fast^in = (B, βC, αT, V, M)
f_slow^in = (B, C, T, V, M)
The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to the frame rate of the slow branch in the initial input features. In the fast branch, the channel number β_iC_i is significantly smaller than the channel number C_i of the slow-branch graph convolution block at the same stage, where i is the block index and β_i is a value less than 1, e.g. 1/3; the V of the two branches is identical, both being the number of graph nodes.
Optionally, in one embodiment of the present application, a cross-connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch. Since the feature shapes of f_fast and f_slow are (B, βC, αT, V, M) and (B, C, T, V, M) respectively, a two-dimensional convolution layer is first used for feature shape conversion, followed by a batch regularization layer and a ReLU function; the two features are then fused by concatenation or addition.
Optionally, in an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
In order to achieve the above object, a second embodiment of the present application provides an action recognition system based on a fast-slow dual-flow graph convolutional neural network, including the following modules:
the acquisition module is used for acquiring human skeleton joint features;
the processing module is used for regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
the generating module is used for duplicating the human skeleton joint features processed by the processing module to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and the determining module is used for eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
In order to achieve the above object, a third aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the action recognition method based on the fast and slow dual-flow graph convolutional neural network according to the embodiment of the first aspect of the present application.
To achieve the above object, a fourth embodiment of the present application provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the action recognition method based on the fast and slow dual-flow graph convolutional neural network as described in the embodiment of the first aspect of the present application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of an action identification method based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application;
FIG. 3 is a diagram illustrating the change of the feature shapes of the input features of the fast and slow branches with the increase of the number of the convolution blocks according to the embodiment of the present application;
fig. 4 is a schematic view of a cross-connection module according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an action recognition system based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes an action recognition method based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present invention with reference to the accompanying drawings.
As shown in fig. 1, to achieve the above object, an embodiment of a first aspect of the present invention provides an action recognition method based on a fast and slow dual-flow graph convolutional neural network, including the following steps:
step S10, acquiring human skeleton joint features;
step S20, regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
step S30, duplicating the human skeleton joint features processed in step S20 to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and step S40, eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
In an embodiment of the present application, the step S10 further includes the following steps:
human skeleton joint features are obtained from public data sets such as NTU RGB+D, and the feature shape of each sample is:
(C,T,M,V)
where C is the number of feature channels, equal to 3 and representing the three-dimensional coordinates (x, y, z) of the joint points; T is the number of frames of the action; M is the number of persons performing the action; and V is the number of human joint points.
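For illustration only (this snippet is not part of the original filing; PyTorch and the NTU RGB+D dimension values are assumptions), a single sample tensor with this (C, T, M, V) layout could be constructed as follows:

    import torch

    # Hypothetical sample shaped (C, T, M, V): 3-D joint coordinates over
    # 300 frames, up to 2 performers, 25 joints (NTU RGB+D conventions).
    sample = torch.randn(3, 300, 2, 25)
    C, T, M, V = sample.shape  # 3, 300, 2, 25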
In an embodiment of the present application, the step S20 further includes the following steps:
the data is regularized; batch training is used in the training process, and the feature shape of one batch tensor is:
(B,C,T,M,V)
first, the batch tensor is reshaped to:
(B,M*V*C,T)
then, a one-dimensional batch regularization module is applied to the temporal dimension T, and the features are reshaped back to the original shape (B, C, T, M, V).
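A minimal PyTorch sketch of this step, assuming BatchNorm1d as the one-dimensional batch regularization module (class and variable names are illustrative, not from the patent):

    import torch
    import torch.nn as nn

    class DataBatchNorm(nn.Module):
        # Step S20 scheme: reshape (B, C, T, M, V) -> (B, M*V*C, T), apply a
        # one-dimensional batch regularization over T, then reshape back.
        def __init__(self, channels, joints, persons):
            super().__init__()
            self.bn = nn.BatchNorm1d(persons * joints * channels)

        def forward(self, x):
            B, C, T, M, V = x.shape
            x = x.permute(0, 3, 4, 1, 2).reshape(B, M * V * C, T)
            x = self.bn(x)  # regularize along the temporal dimension T
            return x.reshape(B, M, V, C, T).permute(0, 3, 4, 1, 2)

    # Example: a batch of 8 samples with C=3, T=300, M=2, V=25.
    y = DataBatchNorm(3, 25, 2)(torch.randn(8, 3, 300, 2, 25))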
As shown in fig. 2, our network structure contains two branches, which we call fast and slow branches, respectively.
In an embodiment of the present application, further, the specific steps in step S30 include:
each branch comprises a plurality of consecutively stacked graph convolution blocks, and each graph convolution block comprises a spatial graph convolution layer and a temporal convolution layer; the temporal convolution layer is a two-dimensional convolution module with kernel size (t, 1), where t is the temporal receptive field of the convolution kernel; each of the two convolution layers is followed by a batch regularization layer and a ReLU activation function, ensuring that the features of each channel keep the same distribution; the computation of the graph convolution block is described by the following formula:
f_out = Σ_{k=1..K_v} W_k f_in (A_k + B_k + C_k)
where A_k is the predefined adjacency matrix; B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and learns the potential association of any two nodes, and C_k is a matrix computed from the sample features that describes the sample-specific node associations.
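One possible PyTorch reading of such a graph convolution block is sketched below; the 2s-AGCN-style adaptive adjacency (fixed A_k, learned B_k, data-dependent C_k) follows the formula above, while the subset count K, the embedding width, and the time-pooled computation of C_k are simplifying assumptions rather than the patent's exact design:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveGraphConv(nn.Module):
        # Spatial graph convolution over (A_k + B_k + C_k), in the spirit of
        # 2s-AGCN: A_k fixed, B_k learned (initialized to A_k), C_k computed
        # from the sample features.
        def __init__(self, in_ch, out_ch, A, embed=16):
            super().__init__()
            K = A.shape[0]  # number of adjacency subsets (assumed)
            self.register_buffer('A', A.clone())
            self.B = nn.Parameter(A.clone())
            self.theta = nn.ModuleList(nn.Conv2d(in_ch, embed, 1) for _ in range(K))
            self.phi = nn.ModuleList(nn.Conv2d(in_ch, embed, 1) for _ in range(K))
            self.W = nn.ModuleList(nn.Conv2d(in_ch, out_ch, 1) for _ in range(K))

        def forward(self, x):  # x: (N, C, T, V)
            out = 0
            for k in range(len(self.W)):
                # C_k: sample-specific adjacency from embedded similarities,
                # pooled over time here for brevity
                q = self.theta[k](x).mean(dim=2)  # (N, embed, V)
                p = self.phi[k](x).mean(dim=2)
                Ck = F.softmax(torch.einsum('nev,new->nvw', q, p), dim=-1)
                agg = torch.einsum('nctv,vw->nctw', x, self.A[k] + self.B[k]) \
                    + torch.einsum('nctv,nvw->nctw', x, Ck)
                out = out + self.W[k](agg)
            return out

    class GCNBlock(nn.Module):
        # One graph convolution block: spatial graph convolution followed by a
        # (t, 1) temporal convolution, each with batch regularization and ReLU.
        def __init__(self, in_ch, out_ch, A, t=9, stride=1):
            super().__init__()
            self.gcn = AdaptiveGraphConv(in_ch, out_ch, A)
            self.bn1 = nn.BatchNorm2d(out_ch)
            self.tcn = nn.Conv2d(out_ch, out_ch, kernel_size=(t, 1),
                                 stride=(stride, 1), padding=((t - 1) // 2, 0))
            self.bn2 = nn.BatchNorm2d(out_ch)

        def forward(self, x):
            x = F.relu(self.bn1(self.gcn(x)))
            return F.relu(self.bn2(self.tcn(x)))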
Optionally, in an embodiment of the present application, the following two formulas respectively describe the feature shapes of the input features of the graph convolution blocks at the same stage:
f_fast^in = (B, βC, αT, V, M)
f_slow^in = (B, C, T, V, M)
The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to the frame rate of the slow branch in the initial input features. In the fast branch, the channel number β_iC_i is significantly smaller than the channel number C_i of the slow-branch graph convolution block at the same stage, where i is the block index and β_i is a value less than 1, e.g. 1/3; the V of the two branches is identical, both being the number of graph nodes.
In one embodiment of the present application, further, assuming that there are N graph convolution blocks in the network structure: in the slow branch, the frame rate is reduced by the stride of the temporal convolution layer within each graph convolution block, so that T_1 ≥ T_2 ≥ … ≥ T_N; on the other hand, the number of output channels gradually increases from block to block to improve the slow branch's ability to capture graph spatial structure information, so that C_1 ≤ C_2 ≤ … ≤ C_N. In the fast branch, the stride of the convolution kernel in the temporal convolution layers of all graph convolution blocks is set to 1 to ensure that the frame rate is not reduced; therefore, the temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to the frame rate of the slow branch in the initial input features. In the fast branch, the channel number β_iC_i is significantly smaller than the channel number C_i of the slow-branch graph convolution block at the same stage, where i is the block index and β_i is a value less than 1, such as 1/3. The V of the two branches is identical, both being the number of graph nodes.
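Under the same assumptions, the two branches could be assembled as below, reusing GCNBlock from the previous sketch; the channel progressions echo the embodiment described later (slow: 3, 128, 256, 512; fast: 3, 32, 64, 128), while the strides, block count, and identity adjacency are placeholders:

    import torch
    import torch.nn as nn

    # GCNBlock is taken from the previous sketch; A is a placeholder stack of
    # K=3 adjacency subsets over V=25 joints (identity, for illustration only).
    A = torch.eye(25).unsqueeze(0).repeat(3, 1, 1)

    def make_branch(channels, strides):
        return nn.Sequential(*[GCNBlock(channels[i], channels[i + 1], A,
                                        stride=strides[i])
                               for i in range(len(strides))])

    # Slow branch: strided temporal convolutions reduce the frame rate
    # (T_1 >= T_2 >= ... >= T_N) while channels grow (C_1 <= C_2 <= ... <= C_N).
    slow = make_branch([3, 128, 256, 512], strides=[1, 2, 2])
    # Fast branch: stride 1 keeps the full frame rate αT_1; channel numbers
    # are a β (< 1) fraction of the slow branch's at each stage.
    fast = make_branch([3, 32, 64, 128], strides=[1, 1, 1])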
In one embodiment of the present application, further, as shown in FIG. 4, a cross-connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch. Since the feature shapes of f_fast and f_slow are (B, βC, αT, V, M) and (B, C, T, V, M) respectively, a two-dimensional convolution layer is first used for feature shape transformation, then a batch regularization layer and a ReLU function are added, and the two features are then fused by splicing or addition. The above process can be described by the following formulas:
f̂_fast = Conv2D(f_fast)
f̂_fast = ReLU(BN(f̂_fast))
f_slow' = Fuse(f_slow, f̂_fast)
where Conv2D is a two-dimensional convolution layer, BN is a batch regularization layer, ReLU is the activation function, and Fuse is the fusion function; the fusion can be performed by summation (Sum) or splicing (Concatenation), and the two modes perform comparably.
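A minimal sketch of such a cross-connection module, assuming sum fusion and assuming the person dimension M has been folded into the batch axis (a common implementation convention, not stated in the patent); a temporally strided Conv2D maps the fast feature (B, βC, αT, V) onto the slow feature's (B, C, T, V):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossConnection(nn.Module):
        # Fuses fast-branch features into the slow branch: Conv2D for feature
        # shape conversion, then BN and ReLU, then fusion by summation.
        def __init__(self, fast_ch, slow_ch, alpha):
            super().__init__()
            # kernel/stride alpha on the temporal axis aligns alpha*T with T
            self.conv = nn.Conv2d(fast_ch, slow_ch, kernel_size=(alpha, 1),
                                  stride=(alpha, 1))
            self.bn = nn.BatchNorm2d(slow_ch)

        def forward(self, f_fast, f_slow):
            x = F.relu(self.bn(self.conv(f_fast)))  # feature shape conversion
            return f_slow + x  # Fuse by summation (Sum)

    # Example with alpha = 2: fast (B, 32, 64, 25) fused onto slow (B, 128, 32, 25).
    fuse = CrossConnection(fast_ch=32, slow_ch=128, alpha=2)
    out = fuse(torch.randn(8, 32, 64, 25), torch.randn(8, 128, 32, 25))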
Further, the present embodiment inserts cross-connection modules between the two branches to share information between them. In the experiments of this embodiment, 10 graph convolution blocks are used, where the numbers of input channels of the slow branch and the fast branch are 3, 128, 256, 512 and 3, 32, 64, 128, respectively.
In an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
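A sketch of this classification head under the same folded-M convention; the channel width and class count in the example are illustrative assumptions (e.g., concatenated slow and fast features, 60 NTU RGB+D classes):

    import torch
    import torch.nn as nn

    class ClassificationHead(nn.Module):
        # Global pooling removes T and V, the M persons are averaged, a fully
        # connected layer maps to the action categories, Softmax gives scores.
        def __init__(self, in_ch, num_classes):
            super().__init__()
            self.fc = nn.Linear(in_ch, num_classes)

        def forward(self, x, persons):  # x: (B*M, C, T, V)
            x = x.mean(dim=(2, 3))  # global pooling over T and V
            x = x.view(-1, persons, x.size(1)).mean(dim=1)  # average over M
            return torch.softmax(self.fc(x), dim=1)  # score per action class

    # Example: 640 fused channels (e.g., 512 slow + 128 fast), 60 classes.
    head = ClassificationHead(in_ch=640, num_classes=60)
    scores = head(torch.randn(16 * 2, 640, 16, 25), persons=2)  # (16, 60)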
To achieve the above object, as shown in fig. 5, an embodiment of a second aspect of the present application provides an action recognition system based on the fast-slow dual-flow graph convolutional neural network, which includes the following modules:
the acquisition module is used for acquiring human skeleton joint features;
the processing module is used for regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
the generating module is used for duplicating the human skeleton joint features processed by the processing module to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and the determining module is used for eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
In order to implement the foregoing embodiments, the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method implements the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to the embodiments of the present application.
In order to implement the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application is implemented.
Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and not restrictive of the application of the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An action recognition method based on a fast and slow dual-flow graph convolutional neural network, characterized by comprising the following steps:
step S10, acquiring human skeleton joint features;
step S20, regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
step S30, duplicating the human skeleton joint features processed in step S20 to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and step S40, eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
2. The method of claim 1, wherein the step S10 includes the following steps:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is:
(C,T,M,V)
where C is the number of feature channels, equal to 3 and representing the three-dimensional coordinates (x, y, z) of the joint points; T is the number of frames of the action; M is the number of persons performing the action; and V is the number of human joint points.
3. The method of claim 1, wherein the step S20 includes the following steps:
the data is regularized; batch training is used in the training process, and the feature shape of one batch tensor is:
(B,C,T,M,V)
first, the batch tensor is reshaped to:
(B,M*V*C,T)
then, a one-dimensional batch regularization module is applied to the temporal dimension T, and the features are reshaped back to the original shape (B, C, T, M, V).
4. The method as claimed in claim 1, wherein the step S30 includes the following steps:
each branch comprises a plurality of consecutively stacked graph convolution blocks, and each graph convolution block comprises a spatial graph convolution layer and a temporal convolution layer; the temporal convolution layer is a two-dimensional convolution module with kernel size (t, 1), where t is the temporal receptive field of the convolution kernel; each of the two convolution layers is followed by a batch regularization layer and a ReLU activation function, ensuring that the features of each channel keep the same distribution; the computation of the graph convolution block is described by the following formula:
f_out = Σ_{k=1..K_v} W_k f_in (A_k + B_k + C_k)
where A_k is the predefined adjacency matrix; B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and learns the potential association of any two nodes, and C_k is a matrix computed from the sample features that describes the sample-specific node associations.
5. The method of claim 4, wherein the following two formulas respectively describe the feature shapes of the input features of the graph convolution blocks at the same stage:
f_fast^in = (B, βC, αT, V, M)
f_slow^in = (B, C, T, V, M)
The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to the frame rate of the slow branch in the initial input features. In the fast branch, the channel number β_iC_i is significantly smaller than the channel number C_i of the slow-branch graph convolution block at the same stage, where i is the block index and β_i is a value less than 1, e.g. 1/3; the V of the two branches is identical, both being the number of graph nodes.
6. The method of claim 4, wherein a cross-connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch; since the feature shapes of f_fast and f_slow are (B, βC, αT, V, M) and (B, C, T, V, M) respectively, a two-dimensional convolution layer is first used for feature shape conversion, followed by a batch regularization layer and a ReLU function, and the two features are then fused by splicing or addition.
7. The method as claimed in claim 1, wherein in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time T, graph nodes V and number of persons M; the features are mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
8. An action recognition system based on a fast and slow dual-flow graph convolutional neural network, characterized by comprising:
the acquisition module, used for acquiring human skeleton joint features;
the processing module, used for regularizing the human skeleton joint features, wherein the feature tensor of one batch is reshaped, a one-dimensional batch regularization module is applied along the temporal dimension, and the features are then reshaped back to their original shape;
the generating module, used for duplicating the human skeleton joint features processed by the processing module to obtain two identical copies, inputting the two copies to the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network, respectively, for feature learning, and fusing the learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow dual-flow graph convolutional network have the same network structure but different network parameter configurations and input features;
and the determining module, used for eliminating the extra dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-7 when executing the computer program.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-7.
CN202110510781.9A 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network Active CN113158970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Publications (2)

Publication Number Publication Date
CN113158970A (en) 2021-07-23
CN113158970B CN113158970B (en) 2023-02-07

Family

Family ID: 76874442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110510781.9A Active CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Country Status (1)

Country Link
CN (1) CN113158970B (en)

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114201475A (en) * 2022-02-16 2022-03-18 北京市农林科学院信息技术研究中心 Dangerous behavior supervision method and device, electronic equipment and storage medium
CN114550027A (en) * 2022-01-18 2022-05-27 清华大学 Vision-based motion video fine analysis method and device


Patent Citations (8)

Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
US20200285944A1 (en) * 2019-03-08 2020-09-10 Adobe Inc. Graph convolutional networks with motif-based attention
CN110059598A (en) * 2019-04-08 2019-07-26 南京邮电大学 The Activity recognition method of the long time-histories speed network integration based on posture artis
CN112131908A (en) * 2019-06-24 2020-12-25 北京眼神智能科技有限公司 Action identification method and device based on double-flow network, storage medium and equipment
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111860128A (en) * 2020-06-05 2020-10-30 南京邮电大学 Human skeleton behavior identification method based on multi-stream fast-slow graph convolution network
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112381004A (en) * 2020-11-17 2021-02-19 华南理工大学 Framework-based double-flow self-adaptive graph convolution network behavior identification method

Non-Patent Citations (5)

Title
CHENG-HUNG LIN et al.: "SlowFast-GCN: A Novel Skeleton-Based Action Recognition Framework", 2020 International Conference on Pervasive Artificial Intelligence (ICPAI)
LEI SHI et al.: "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
NING SUN et al.: "Multi-stream slowFast graph convolutional networks for skeleton-based action recognition", Image and Vision Computing
ZHANG Yijia et al.: "Improved human action recognition algorithm based on two-stream convolutional neural network", Computer Measurement & Control (计算机测量与控制)
CHEN Li et al.: "Research on feature-balanced YOLOv3 pedestrian detection based on multi-view data fusion", CAAI Transactions on Intelligent Systems (智能系统学报)


Also Published As

Publication number Publication date
CN113158970B (en) 2023-02-07

Similar Documents

Publication Publication Date Title
CN112308200B (en) Searching method and device for neural network
CN109558862B (en) Crowd counting method and system based on attention thinning framework of space perception
CN111476719B (en) Image processing method, device, computer equipment and storage medium
Zhang et al. Progressive hard-mining network for monocular depth estimation
CN113449857A (en) Data processing method and data processing equipment
CN111738231A (en) Target object detection method and device, computer equipment and storage medium
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
CN111480169A (en) Method, system and apparatus for pattern recognition
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN111754396A (en) Face image processing method and device, computer equipment and storage medium
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
JP7357176B1 (en) Night object detection, training method and device based on self-attention mechanism in frequency domain
CN111160225B (en) Human body analysis method and device based on deep learning
US20230326173A1 (en) Image processing method and apparatus, and computer-readable storage medium
CN110222718A (en) The method and device of image procossing
CN111079507A (en) Behavior recognition method and device, computer device and readable storage medium
Zhang et al. Progressive point cloud upsampling via differentiable rendering
CN110688897A (en) Pedestrian re-identification method and device based on joint judgment and generation learning
He et al. Learning scene dynamics from point cloud sequences
CN114359289A (en) Image processing method and related device
Angelopoulou et al. Fast 2d/3d object representation with growing neural gas
CN113554656B (en) Optical remote sensing image example segmentation method and device based on graph neural network
CN113065529A (en) Motion recognition method and system based on inter-joint association modeling
CN111274901B (en) Gesture depth image continuous detection method based on depth gating recursion unit

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant