CN111353447B - Human skeleton behavior recognition method based on graph convolution network - Google Patents

Human skeleton behavior recognition method based on graph convolution network

Info

Publication number: CN111353447B (granted; earlier published as CN111353447A)
Application number: CN202010146319.0A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 曹江涛, 赵挺, 洪恺临
Current assignee (original assignee): Liaoning Shihua University
Legal status: Active

Classifications

    • G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A human skeleton behavior recognition method based on a graph convolution network belongs to the fields of computer vision and deep learning, and comprises the steps of: obtaining human skeleton video frames and carrying out normalization processing; constructing an intrinsic-dependency connection graph of the human joints corresponding to each frame, and constructing an extrinsic-dependency connection graph for each individual and an interactive-dependency connection graph between the two individuals; obtaining the joint connection graph of the interaction as a whole; assigning weight values to each edge of each connection graph of the human joints; performing graph convolution operations to obtain the spatial features of the skeleton sequence; and performing time-series modeling with a long short-term memory network to obtain the corresponding category of the interaction behavior. According to the invention, the intrinsic-dependency connection edges can learn basic human behavior features, the extrinsic-dependency connection edges can learn additional behavior features, and the interactive-dependency connection edges can better learn the interaction relationship between the two persons, so that the motion relationship of the two-person interaction behavior is better represented and the recognition performance is improved.

Description

Human skeleton behavior recognition method based on graph convolution network
Technical Field
The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a human skeleton behavior recognition method based on a graph convolutional network.
Background
Video-based human behavior recognition and understanding is a frontier direction of intense interest in the fields of image processing and computer vision. With the fusion and development of deep learning and computer vision techniques, behavior recognition has been widely applied in video analysis, intelligent surveillance, human-computer interaction, augmented reality, video retrieval and other fields. Two-person (double) interaction behavior is more common in daily life, and more difficult to recognize, than single-person actions. Research on double interaction is mainly divided into work based on RGB data and work based on skeleton joint-point data. Traditional RGB video has poor robustness due to factors such as illumination change, occlusion and complex backgrounds. Skeleton-based joint-point data contains compact three-dimensional positions of the main body joints and is robust to changes in viewpoint, body scale and movement speed. Therefore, behavior recognition based on skeleton joint-point data has received increasing attention in recent years.
Double interaction behavior recognition methods based on skeleton joint points mainly fall into two categories: methods based on hand-crafted features and methods based on deep learning. For the first category, Vemulapalli et al. [1] represent the human skeleton as points in a Lie group and carry out temporal modeling and classification in the Lie algebra. Weng et al. [2] extend the Naive Bayes Nearest Neighbor (NBNN) method to space-time and use stage-to-class distances to classify behaviors. The feature design of such methods is complex and costly, and their recognition accuracy is difficult to improve further. Deep-learning-based methods can be further divided into CNN-based models and RNN-based models. CNN-based methods convert the joint-point data into pictures and then feed them into a network for learning and classification; such methods ignore the timing information in the video. RNN-based methods can effectively model time-series information, but ignore the dependencies between joints and the interaction relationship between the two persons. (See [1] Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa. Human action recognition by representing 3D skeletons as points in a Lie group. In CVPR, pages 588–595, 2014. [2] Junwu Weng, Chaoqun Weng, and Junsong Yuan. Spatio-temporal Naive-Bayes nearest-neighbor for skeleton-based action recognition. In CVPR, pages 4171–4180, 2017.)
Recently, with the popularity of graph convolutional networks (GCN, Graph Convolutional Network), many researchers have also applied the GCN method to experiments in the field of behavior recognition. However, current research is mainly aimed at single-person behavior, mostly adopts the natural connection graph of the human body, and ignores the dependency relationships between non-naturally-connected joints of the human body. In existing applications to double interaction, the two persons are split into two individuals and modeled separately, ignoring the interactive dependency relationship between them.
Disclosure of Invention
Aiming at the problems and shortcomings of the prior art, the invention provides a double interaction behavior recognition method based on a graph convolutional network, which comprises the steps of: obtaining a double interaction skeleton video; normalizing the joint-point coordinates of the acquired video; constructing an intra-human-joint dependency graph, an individual external dependency graph and an interactive dependency graph; assigning different weights to the connection edges of the three joint connection graphs; feeding the result into a graph convolution network to learn and extract spatial features; feeding the spatial features obtained for each frame into a long short-term memory network for time-series modeling; and obtaining the recognition result of the interactive behavior category.
The method specifically comprises the following steps:
step S10, shooting video: starting a camera, recording double interaction videos, collecting skeleton videos of various interactive actions performed by different actors as training videos of the interactive actions, labeling the interactive action meaning of each training video, and establishing a video training set.
Step S20, carrying out normalization processing on a preset video frame in the acquired skeleton video to serve as a skeleton sequence to be identified.
Step S30, for each frame in the skeleton sequence to be identified, constructing a corresponding internal-dependency connection graph of the human joints according to the joint-point coordinates, wherein the joint points are the nodes of the graph and the natural connections between the joint points are the internal-dependency connection edges of the graph; constructing external-dependency connection edges for each single person and interactive-dependency connection edges between the two persons; together these form the human body joint connection graph of each frame of the skeleton sequence to be identified;
step S40, respectively distributing weights to edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be identified, and obtaining corresponding human joint connection graphs with different weight values;
step S50, performing graph convolution operation on the human body joint connection graphs with different weight values corresponding to each frame of the skeleton sequence to be identified, and obtaining the spatial characteristics of the skeleton sequence to be identified;
and step S60, performing time sequence modeling on the time dimension based on the spatial characteristics of the skeleton sequence to be identified, and obtaining the behavior category of the skeleton sequence to be identified.
Further, "a frame of a preset video in the acquired skeleton video is normalized and then used as a skeleton sequence to be identified", the method is as follows:
step S11, performing preset equidistant sampling on the obtained original skeleton video to serve as a training and recognition skeleton sequence;
step S12, carrying out rotation, translation and scale normalization processing on the joint point coordinates of each frame in the obtained skeleton sequence to obtain the skeleton sequence to be identified, wherein the specific method comprises the following steps:
$$\hat{x}_i^t = R^{-1}\left(x_i^t - o_R\right), \qquad i \in J,\; t \in T$$

where $x_i^t$ is the $i$-th joint coordinate of the originally acquired $t$-th frame, $J$ and $T$ denote the sets of joint points and acquired frames, and $\hat{x}_i^t$ is the processed coordinate value.

The rotation matrix $R$ and rotation origin $o_R$ are defined as follows:

$$R = \left[\frac{v_1}{\|v_1\|},\; \frac{v_2 - \mathrm{proj}_{v_1}(v_2)}{\left\|v_2 - \mathrm{proj}_{v_1}(v_2)\right\|},\; \frac{v_1 \times v_2}{\|v_1 \times v_2\|}\right]$$

$$o_R = \frac{x_{\mathrm{lhip}} + x_{\mathrm{rhip}}}{2}$$

where $v_1$ is the vector perpendicular to the ground, $v_2$ is the difference vector between the left and right hip joints of the initial skeleton in each sequence, $\mathrm{proj}_{v_1}(v_2)$ and $v_1 \times v_2$ respectively denote the projection of $v_2$ onto $v_1$ and the cross product of the two vectors, and $x_{\mathrm{lhip}}$ and $x_{\mathrm{rhip}}$ denote the coordinates of the left and right hip joints of the initial skeleton of each sequence.
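For illustration, the rotation-translation normalization of step S12 can be sketched in numpy as follows; the choice of ground-normal vector, the joint layout and the function name are assumptions for this sketch, not fixed by the patent:

```python
import numpy as np

def normalize_skeleton(x, left_hip, right_hip, up=np.array([0.0, 1.0, 0.0])):
    """Rotate and translate a skeleton sequence into a body-centred frame (a sketch).

    x        : (T, J, 3) array of joint coordinates for T frames and J joints.
    left_hip, right_hip : (3,) coordinates of the hips in the initial skeleton.
    up       : assumed ground-normal direction v1 (illustrative).
    """
    v1 = up / np.linalg.norm(up)               # vector perpendicular to the ground
    v2 = right_hip - left_hip                  # difference vector between the two hips
    v2_orth = v2 - np.dot(v2, v1) * v1         # v2 minus its projection onto v1
    v2_orth /= np.linalg.norm(v2_orth)
    v3 = np.cross(v1, v2_orth)                 # cross product completes the basis
    R = np.stack([v1, v2_orth, v3], axis=1)    # orthonormal rotation matrix (columns)
    o_R = (left_hip + right_hip) / 2.0         # rotation origin: hip midpoint
    # R is orthonormal, so R^{-1} = R^T; right-multiplying rows by R applies R^T.
    return (x - o_R) @ R
```

Because the rotation is orthonormal, distances between joints are preserved while the hip midpoint of the initial skeleton is mapped to the origin.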
Further, "for each frame in the skeleton sequence to be identified, a corresponding internal-dependency connection graph of the human joints is constructed according to the joint-point coordinates, the joint points being the nodes of the graph and the natural connections between the joint points being the internal-dependency connection edges of the graph; single-person external-dependency connection edges and double-person interactive-dependency connection edges are constructed, which together form the human body joint connection graph of each frame of the skeleton sequence to be identified"; the method is as follows:
Human body modeling is carried out by regarding the double interaction in each frame as one whole-structure graph $G(x, W)$, where $x \in \mathbb{R}^{2N \times 3}$ contains the three-dimensional coordinates of the $2N$ joints and $W$ is a $2N \times 2N$ weighted adjacency matrix:

$$W = \begin{bmatrix} W_1 & w_{1,2} \\ w_{1,2}^{\top} & W_2 \end{bmatrix}, \qquad (W_k)_{mn} = \begin{cases} \alpha, & \text{joints } m,n \text{ of person } k \text{ are naturally connected} \\ \beta, & \text{joints } m,n \text{ of person } k \text{ form an external-dependency edge} \end{cases}$$

$$(w_{1,2})_{mn} = \gamma, \qquad \text{node } m \text{ of the first person and node } n \text{ of the second person form an interactive-dependency edge}$$

where $\alpha$, $\beta$, $\gamma$ respectively denote the weights corresponding to the intrinsic dependency, the extrinsic dependency and the interactive dependency relationships.
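A minimal sketch of assembling the $2N \times 2N$ weighted adjacency matrix from the three edge types is given below; the concrete edge lists are illustrative placeholders, since the patent does not enumerate the external-dependency and interactive-dependency edges at this point:

```python
import numpy as np

def build_weighted_adjacency(n_joints, intra_edges, extra_edges, inter_edges,
                             alpha=3.0, beta=1.0, gamma=5.0):
    """Build the 2N x 2N weighted adjacency matrix W for the two-person graph.

    intra_edges : natural (intrinsic-dependency) bone connections, per person.
    extra_edges : additional external-dependency connections, per person.
    inter_edges : (m, n) pairs linking joint m of person 1 to joint n of person 2.
    The edge sets passed in are illustrative, not the patent's exact choices.
    """
    W = np.zeros((2 * n_joints, 2 * n_joints))
    for person in (0, 1):                     # same structure for both individuals
        off = person * n_joints
        for m, n in intra_edges:
            W[off + m, off + n] = W[off + n, off + m] = alpha
        for m, n in extra_edges:
            W[off + m, off + n] = W[off + n, off + m] = beta
    for m, n in inter_edges:                  # interactive-dependency block w_{1,2}
        W[m, n_joints + n] = W[n_joints + n, m] = gamma
    return W
```

The resulting matrix is symmetric, with two identical diagonal blocks for the individuals and off-diagonal blocks carrying the interactive-dependency weights.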
Further, "weights are respectively allocated to edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be identified, so as to obtain corresponding human joint connection graphs with different weight values", and the method comprises the following steps:
α = 3, β = 1 and γ = 5, so as to emphasize the internal connection relationships over the additional external connection relationships, while most strongly highlighting the interactive connection relationships.
Further, "carrying out graph convolution operation on the human body joint connection graph with different weight values corresponding to each frame graph of the skeleton sequence to be identified, and obtaining the spatial characteristics of the skeleton sequence to be identified", wherein the method comprises the following steps:
$$f_{\mathrm{out}} = g_\theta *_G f_{\mathrm{in}}$$

where $*_G$ denotes the graph convolution operation and $g_\theta$ denotes the graph convolution kernel; $W$ is the weighted adjacency matrix of the human body joint connection graph.

The graph convolution kernel is calculated as follows. The graph Laplacian, normalized over the spectral domain, is $L = I_n - D^{-1/2} W D^{-1/2}$, where $D$ is the degree matrix with $D_{ii} = \sum_j w_{ij}$. $L$ is scaled to $\tilde{L} = \frac{2}{\lambda_{\max}} L - I_n$, where $\lambda_{\max}$ is the largest eigenvalue of $L$ and $T_k$ denotes the Chebyshev polynomials. The convolution operation can then be expressed as:

$$g_\theta *_G x = \sum_{k=0}^{K-1} \eta_k\, T_k(\tilde{L})\, x$$

where $\eta \in [\eta_0, \eta_1, \ldots, \eta_{K-1}]$ are trainable parameters and $K$ is the size of the graph convolution kernel.
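The Chebyshev-polynomial spectral graph convolution described above can be sketched in numpy as follows; this is a small single-call sketch under the stated definitions, not the patent's trained layer:

```python
import numpy as np

def chebyshev_graph_conv(W, x, eta):
    """One Chebyshev graph convolution g_eta *_G x (a numpy sketch).

    W   : (n, n) weighted adjacency matrix of the joint graph.
    x   : (n, c) node feature matrix (e.g. 3D joint coordinates).
    eta : list of K coefficients eta_0 .. eta_{K-1} (trainable in practice).
    """
    n = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(n)          # rescale spectrum to [-1, 1]
    # Chebyshev recurrence: T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x
    Tk_prev, Tk = x, L_tilde @ x
    out = eta[0] * Tk_prev
    for k in range(1, len(eta)):
        out = out + eta[k] * Tk
        Tk_prev, Tk = Tk, 2.0 * L_tilde @ Tk - Tk_prev
    return out
```

With K = 1 the operation reduces to a scaled identity; larger K aggregates features from K-hop neighborhoods of the weighted joint graph.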
Further, "based on the spatial features of the skeleton sequence to be identified, time-series modeling is performed in the time dimension to obtain the behavior category of the skeleton sequence to be identified"; the method is as follows:
The spatial feature information of each frame obtained by the graph convolution operation is flattened by a fully connected layer, fed into a long short-term memory network for time-series modeling, and classified with softmax to obtain the final interactive behavior classification result.
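The LSTM-plus-softmax temporal classifier can be sketched as a single numpy forward pass; the gate ordering, weight shapes and parameter names below are illustrative assumptions rather than the patent's exact network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_softmax_classify(feats, params):
    """Run per-frame spatial features through one LSTM layer, then softmax.

    feats  : (T, d) sequence of flattened per-frame spatial features f_t.
    params : dict with LSTM weights Wx (4h, d), Wh (4h, h), b (4h,) and a
             classifier Wc (n_classes, h), bc (n_classes,).  Shapes and the
             gate ordering (i, f, g, o) are assumptions of this sketch.
    """
    Wx, Wh, b = params["Wx"], params["Wh"], params["b"]
    h_dim = Wh.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for f_t in feats:                           # time-series modeling frame by frame
        z = Wx @ f_t + Wh @ h + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)              # cell state update
        h = o * np.tanh(c)                      # hidden state
    logits = params["Wc"] @ h + params["bc"]
    p = np.exp(logits - logits.max())
    return p / p.sum()                          # softmax class probabilities
```

The predicted interactive behavior category is then the argmax of the returned probability vector.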
The advantages and effects of the invention are as follows:
The double interaction behavior recognition method based on the graph convolutional network constructs a weighted joint connection graph to which the double interaction dependency relationship is added, adopts the graph convolution network to obtain discriminative double interaction spatial features, and then feeds them into a long short-term memory network to model the dynamic temporal relationship, thereby improving recognition accuracy.
Drawings
FIG. 1 is a flow chart of the double interaction behavior recognition method based on a graph convolutional network;
FIG. 2 is a schematic illustration of the intrinsic-dependency, extrinsic-dependency and interactive-dependency joint graphs constructed by the present invention;
FIG. 3 is a flowchart of the algorithm of the present invention;
FIG. 4 is a diagram of an LSTM module cell;
FIG. 5 is the confusion matrix of the invention's test results on the NTU RGB+D dataset.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
The invention discloses a double interaction behavior recognition method based on a graph convolutional network, which comprises the following steps:
step S10, shooting video: starting a camera, recording double interaction videos, collecting skeleton videos of various interactive actions performed by different actors as training videos of the interactive actions, labeling the interactive action meaning of each training video, and establishing a video training set.
Step S20, carrying out normalization processing on a preset video frame in the acquired skeleton video to serve as a skeleton sequence to be identified.
Step S30, for each frame in the skeleton sequence to be identified, constructing a corresponding internal-dependency connection graph of the human joints according to the joint-point coordinates, wherein the joint points are the nodes of the graph and the natural connections between the joint points are the internal-dependency connection edges of the graph; constructing external-dependency connection edges for each single person and interactive-dependency connection edges between the two persons; together these form the human body joint connection graph of each frame of the skeleton sequence to be identified;
step S40, respectively distributing weights to edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be identified, and obtaining corresponding human joint connection graphs with different weight values;
step S50, performing graph convolution operation on the human body joint connection graphs with different weight values corresponding to each frame of the skeleton sequence to be identified, and obtaining the spatial characteristics of the skeleton sequence to be identified;
and step S60, performing time sequence modeling on the time dimension based on the spatial characteristics of the skeleton sequence to be identified, and obtaining the behavior category of the skeleton sequence to be identified.
In order to more clearly describe the double interaction behavior recognition method based on a graph convolutional network of the present invention, each step of the method embodiment of the present invention is described in detail below with reference to fig. 1.
Step S10, shooting video: starting a camera, recording double interaction videos, collecting skeleton videos of various interactive actions performed by different actors as training videos of the interactive actions, labeling the interactive action meaning of each training video, and establishing a video training set.
With the development of image processing technology, a Microsoft Kinect camera can be directly adopted to obtain the skeleton video of the two interacting persons, and the corresponding joint-point data is stored.
Step S20, carrying out normalization processing on a preset video frame in the acquired skeleton video to serve as a skeleton sequence to be identified.
Due to subject variation and viewing-angle variation during shooting, normalization processing is carried out in the data processing stage; the specific method is as follows:
$$\hat{x}_i^t = R^{-1}\left(x_i^t - o_R\right), \qquad i \in J,\; t \in T$$

where $x_i^t$ is the $i$-th joint coordinate of the originally acquired $t$-th frame, $J$ and $T$ denote the sets of joint points and acquired frames, and $\hat{x}_i^t$ is the processed coordinate value.

The rotation matrix $R$ and rotation origin $o_R$ are defined as follows:

$$R = \left[\frac{v_1}{\|v_1\|},\; \frac{v_2 - \mathrm{proj}_{v_1}(v_2)}{\left\|v_2 - \mathrm{proj}_{v_1}(v_2)\right\|},\; \frac{v_1 \times v_2}{\|v_1 \times v_2\|}\right]$$

$$o_R = \frac{x_{\mathrm{lhip}} + x_{\mathrm{rhip}}}{2}$$

where $v_1$ is the vector perpendicular to the ground, $v_2$ is the difference vector between the left and right hip joints of the initial skeleton in each sequence, $\mathrm{proj}_{v_1}(v_2)$ and $v_1 \times v_2$ respectively denote the projection of $v_2$ onto $v_1$ and the cross product of the two vectors, and $x_{\mathrm{lhip}}$ and $x_{\mathrm{rhip}}$ denote the coordinates of the left and right hip joints of the initial skeleton of each sequence.
Step S30, for each frame in the skeleton sequence to be identified, constructing a corresponding internal-dependency connection graph of the human joints according to the joint-point coordinates, wherein the joint points are the nodes of the graph and the natural connections between the joint points are the internal-dependency connection edges of the graph; constructing external-dependency connection edges for each single person and interactive-dependency connection edges between the two persons, the three parts together forming the human body joint connection graph of each frame of the skeleton sequence to be identified; the method is as follows:
Human body modeling is carried out by regarding the double interaction in each frame as one whole-structure graph $G(x, W)$, where $x \in \mathbb{R}^{2N \times 3}$ contains the three-dimensional coordinates of the $2N$ joints and $W$ is a $2N \times 2N$ weighted adjacency matrix:

$$W = \begin{bmatrix} W_1 & w_{1,2} \\ w_{1,2}^{\top} & W_2 \end{bmatrix}, \qquad (W_k)_{mn} = \begin{cases} \alpha, & \text{joints } m,n \text{ of person } k \text{ are naturally connected} \\ \beta, & \text{joints } m,n \text{ of person } k \text{ form an external-dependency edge} \end{cases}$$

$$(w_{1,2})_{mn} = \gamma, \qquad \text{node } m \text{ of the first person and node } n \text{ of the second person form an interactive-dependency edge}$$

where $\alpha$, $\beta$, $\gamma$ respectively denote the weights corresponding to the intrinsic dependency, the extrinsic dependency and the interactive dependency relationships.
Step S40, respectively distributing weights to edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be identified, and obtaining corresponding human joint connection graphs with different weight values:
Weight assignment: α = 3, β = 1 and γ = 5, so as to emphasize the internal connection relationships over the additional external connection relationships, while most strongly highlighting the interactive connection relationships.
Step S50, performing graph convolution operation on the human body joint connection graph with different weight values corresponding to each frame graph of the skeleton sequence to be identified, and obtaining the spatial characteristics of the skeleton sequence to be identified:
given a T-frame video, a graph G is constructed according to the method of claim 3 1 ,G 2 ,...,G T ]Graph G constructed for each t-frame T It is input into the picture scroll layer:
Figure SMS_26
wherein represents a graph convolution operation;
Figure SMS_27
representing the graph convolution kernel. W is a weighted adjacency matrix of the human body joint connection diagram.
The concrete graph convolution kernel is calculated as follows:
the graph laplace normalizes over the spectral domain: l=i n -D -1/2 WD -1/2 Wherein D is the angular matrix, D ii =∑ j w ij Scaling L to
Figure SMS_28
Representation->
Figure SMS_29
Wherein lambda is max Is the maximum characteristic value of L, T k Is chebyshev polynomials. The convolution operation can be expressed as:
Figure SMS_30
here eta e eta 01 ...,η K-1 ]Is a training parameter and K is the size of the graph convolution kernel.
Step S60, based on the spatial features of the skeleton sequence to be identified, time-series modeling is performed in the time dimension to obtain the behavior category of the skeleton sequence to be identified:
The spatial feature information $f_t$ of each frame obtained by the graph convolution operation is flattened by the fully connected layer, fed into a long short-term memory network for time-series modeling, and classified by softmax to obtain the final interactive behavior recognition result.
The dataset used to validate the algorithm is presented next. The NTU RGB+D dataset is currently the largest skeleton-based behavior recognition dataset, with more than 56,000 sequences and 4 million frames covering 60 action classes, each skeleton having 25 joint points; it includes both single-person and two-person actions. In this embodiment, the 11 kinds of double interaction behaviors in NTU RGB+D are adopted as the dataset.
The dataset has two evaluation protocols: cross-subject (CS) and cross-view (CV). The proposed method is evaluated here using the CV protocol.
Under the CV evaluation criterion, the data captured by cameras 2 and 3 are used for training and the data captured by camera 1 is used for testing. The final recognition rate is 88%, an evident recognition effect. The confusion matrix is shown in fig. 5.
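The recognition rate and confusion matrix reported for the evaluation can be computed with a small helper; this is a generic sketch of the metric, not code from the patent:

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, n_classes):
    """Accumulate a confusion matrix and the overall recognition rate.

    y_true, y_pred : iterables of integer class labels.
    Returns (cm, acc): cm[t, p] counts samples of true class t predicted as p,
    and acc is the fraction of correct predictions (the recognition rate).
    """
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    acc = np.trace(cm) / cm.sum()
    return cm, acc
```

Rows of the matrix can be normalized by their sums to obtain the per-class percentages typically displayed in a confusion-matrix figure.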
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (4)

1. A human skeleton behavior recognition method based on a graph convolutional network, characterized in that the recognition method comprises: obtaining a double interaction skeleton video; normalizing the joint-point coordinates of the acquired video; constructing an intra-human-joint dependency graph, an individual external dependency graph and an interactive dependency graph; assigning different weights to the connection edges of the three joint connection graphs; feeding the result into a graph convolution network to learn and extract spatial features; feeding the spatial features obtained for each frame into a long short-term memory network for time-series modeling; and obtaining the recognition result of the interactive behavior category;
the identification method specifically comprises the following steps:
step S10, shooting video: starting a camera, recording double interaction videos, collecting skeleton videos of various interactive actions performed by different actors as interactive action training videos, labeling the interactive action meaning of each training video, and establishing a video training set;
step S20, carrying out normalization processing on a preset video frame in the acquired skeleton video to serve as a skeleton sequence to be identified;
step S30, for each frame in the skeleton sequence to be identified, constructing a corresponding internal-dependency connection graph of the human joints according to the joint-point coordinates, wherein the joint points are the nodes of the graph and the natural connections between the joint points are the internal-dependency connection edges of the graph; constructing external-dependency connection edges for each single person and interactive-dependency connection edges between the two persons; together these form the human body joint connection graph of each frame of the skeleton sequence to be identified;
step S40, respectively distributing weights to edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be identified, and obtaining corresponding human joint connection graphs with different weight values;
step S50, performing graph convolution operation on the human body joint connection graphs with different weight values corresponding to each frame of the skeleton sequence to be identified, and obtaining the spatial characteristics of the skeleton sequence to be identified;
step S60, performing time sequence modeling on the time dimension based on the spatial characteristics of the skeleton sequence to be identified to obtain the behavior category of the skeleton sequence to be identified;
in the step S30, "for each frame in the skeleton sequence to be identified, a corresponding internal-dependency connection graph of the human joints is constructed according to the joint-point coordinates, the joint points being the nodes of the graph and the natural connections between the joint points being the internal-dependency connection edges of the graph; single-person external-dependency connection edges and double-person interactive-dependency connection edges are constructed, which together form the human body joint connection graph of each frame of the skeleton sequence to be identified"; the method is as follows:
Human body modeling is carried out by regarding the double interaction in each frame as one whole-structure graph $G(x, W)$, where $x \in \mathbb{R}^{2N \times 3}$ contains the three-dimensional coordinates of the $2N$ joints and $W$ is a $2N \times 2N$ weighted adjacency matrix:

$$W = \begin{bmatrix} W_1 & w_{1,2} \\ w_{1,2}^{\top} & W_2 \end{bmatrix}, \qquad (W_k)_{mn} = \begin{cases} \alpha, & \text{joints } m,n \text{ of person } k \text{ are naturally connected} \\ \beta, & \text{joints } m,n \text{ of person } k \text{ form an external-dependency edge} \end{cases}$$

$$(w_{1,2})_{mn} = \gamma, \qquad \text{node } m \text{ of the first person and node } n \text{ of the second person form an interactive-dependency edge}$$

where $\alpha$, $\beta$, $\gamma$ respectively denote the weights corresponding to the internal dependency relationship, the external dependency relationship and the interactive dependency relationship;
in the step S40, "weights are respectively assigned to the edges of the three kinds of joint connection graphs corresponding to each frame of the skeleton sequence to be identified, to obtain the corresponding human joint connection graphs with different weight values", the method is as follows:
the weights are set to α = 3, β = 1, γ = 5, so that the internal connection relationship is emphasized over the additional external connection relationship, while the interactive connection relationship is highlighted most strongly.
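The weighted adjacency construction of claims S30 and S40 can be sketched in a few lines of numpy. This is a toy example, not the patent's implementation: it assumes N = 3 joints per person, and the three edge lists are hypothetical (the actual joint count and edge sets depend on the skeleton format captured).

```python
import numpy as np

# Toy two-person adjacency for claim 1: N = 3 joints per person, so W is 2N x 2N.
# The edge lists below are illustrative assumptions only.
N = 3
ALPHA, BETA, GAMMA = 3.0, 1.0, 5.0       # internal / external / interactive weights

internal_edges = [(0, 1), (1, 2)]        # natural bone connections within a person
external_edges = [(0, 2)]                # added long-range intra-person edge
interactive_edges = [(2, 2)]             # e.g. hand-to-hand contact across persons

W = np.zeros((2 * N, 2 * N))
for person in (0, 1):                    # identical intra-person blocks W_1 and W_2
    off = person * N
    for m, n in internal_edges:
        W[off + m, off + n] = W[off + n, off + m] = ALPHA
    for m, n in external_edges:
        W[off + m, off + n] = W[off + n, off + m] = BETA
for m, n in interactive_edges:           # cross-person blocks W_{1,2} and W_{2,1}
    W[m, N + n] = W[N + n, m] = GAMMA
```

Because every assignment is mirrored, W stays symmetric, matching an undirected joint connection graph.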
2. The human skeleton behavior recognition method based on graph convolution network of claim 1, characterized in that in the step S20, "the skeleton sequence to be identified is the skeleton sequence obtained after normalizing preset video frames of the acquired skeleton video", the method is as follows:
step S11, performing preset equidistant sampling on the obtained original skeleton video to serve as a training and recognition skeleton sequence;
step S12, performing rotation, translation and scale normalization on the joint point coordinates of each frame in the obtained skeleton sequence to obtain the skeleton sequence to be identified, specifically:
x̂_i^t = R^{−1}(x_i^t − o_R), i ∈ J, t ∈ T
wherein x_i^t is the i-th joint coordinate value of the originally acquired t-th frame, J and T represent the sets of joint points and acquired frames, and x̂_i^t is the processed coordinate value;
the rotation matrix R and the rotation origin o_R are defined as follows:
R = [ (v_2 − proj_{v1}(v_2)) / ‖v_2 − proj_{v1}(v_2)‖, v_1 / ‖v_1‖, (v_1 × v_2) / ‖v_1 × v_2‖ ]
o_R = (x_lhip + x_rhip) / 2
wherein v_1 and v_2 are the vector perpendicular to the ground and the difference vector between the left and right hip joints of the initial skeleton in each sequence, proj_{v1}(v_2) and v_1 × v_2 respectively represent the projection of v_2 onto v_1 and the outer product of these two vectors, and x_lhip and x_rhip represent the coordinates of the left and right hip joints of the initial skeleton of each sequence.
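The step S12 normalization can be sketched in numpy. Two details the claim leaves inside its formula images are assumed here: the rotation origin o_R is taken as the hip midpoint, and R's columns are the orthonormal frame built from v1 and v2 by Gram-Schmidt orthogonalization plus their cross product.

```python
import numpy as np

def normalize_skeleton(seq, lhip0, rhip0):
    """Rotation/translation normalization sketch for step S12.

    seq: (T, J, 3) joint coordinates; lhip0 / rhip0: left and right hip
    joints of the first frame of the sequence.  The rotation origin and
    the exact column order of R are assumptions, not the patent's spec.
    """
    v1 = np.array([0.0, 1.0, 0.0])            # vector perpendicular to the ground
    v2 = rhip0 - lhip0                        # left-to-right hip difference vector
    u1 = v1 / np.linalg.norm(v1)
    u2 = v2 - np.dot(v2, u1) * u1             # v2 minus proj_{v1}(v2)
    u2 = u2 / np.linalg.norm(u2)
    u3 = np.cross(u1, u2)                     # cross product of unit vectors: unit
    R = np.stack([u2, u1, u3], axis=1)        # columns = new body axes
    o_R = (lhip0 + rhip0) / 2.0               # assumed rotation origin (hip midpoint)
    return (seq - o_R) @ R                    # applies R^T to each (x - o_R)

lhip = np.array([-1.0, 0.0, 0.0])
rhip = np.array([1.0, 0.0, 0.0])
seq = np.stack([np.stack([lhip, rhip])])      # one frame, two joints
norm = normalize_skeleton(seq, lhip, rhip)    # hips land on the first body axis
```

In the toy call above the hip midpoint maps to the origin and the hip line is aligned with the first axis of the new frame, which is the intended effect of the normalization.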
3. The human skeleton behavior recognition method based on graph convolution network according to claim 1, wherein in the step S50, "the graph convolution operation is performed on the human body joint connection graphs with different weight values corresponding to each frame of the skeleton sequence to be identified, so as to obtain the spatial features of the skeleton sequence to be identified", the method is as follows:
given a T-frame video, the graph sequence [G_1, G_2, ..., G_T] is constructed, and the graph G_t constructed for each frame t is input into the graph convolution layer:
f_t = g_θ ∗ G_t
wherein ∗ represents the graph convolution operation and g_θ represents the graph convolution kernel, W being the weighted adjacency matrix of the human body joint connection graph;
the specific graph convolution kernel is calculated as follows:
the graph Laplacian is normalized over the spectral domain: L = I_n − D^{−1/2} W D^{−1/2}, wherein D is the diagonal degree matrix, D_ii = Σ_j w_ij; L is scaled to
L̃ = 2L/λ_max − I_n
wherein λ_max is the maximum eigenvalue of L and T_k is the Chebyshev polynomial, so that the convolution operation can be expressed as:
g_θ ∗ G_t = Σ_{k=0}^{K−1} η_k T_k(L̃) x_t
wherein η ∈ [η_0, η_1, ..., η_{K−1}] are training parameters and K is the size of the graph convolution kernel.
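The Chebyshev-polynomial graph convolution of claim 3 can be sketched in numpy. The η coefficients are passed in directly here (in the patent they are trained parameters), and the recurrence T_0 = I, T_1 = L̃, T_k = 2 L̃ T_{k−1} − T_{k−2} is the standard Chebyshev construction.

```python
import numpy as np

def chebyshev_gconv(W, x, eta):
    """W: (n, n) weighted adjacency, x: (n, c) node features,
    eta: (K,) filter coefficients eta_0 .. eta_{K-1}."""
    n = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))   # guard isolated nodes
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.max(np.linalg.eigvalsh(L))            # L is symmetric
    L_tilde = 2.0 * L / lam_max - np.eye(n)            # eigenvalues scaled to [-1, 1]
    T_prev, T_cur = np.eye(n), L_tilde                 # T_0 and T_1
    out = eta[0] * (T_prev @ x)
    for k in range(1, len(eta)):                       # sum eta_k * T_k(L_tilde) @ x
        out = out + eta[k] * (T_cur @ x)
        T_prev, T_cur = T_cur, 2.0 * L_tilde @ T_cur - T_prev
    return out

W_demo = np.array([[0.0, 1.0], [1.0, 0.0]])
x_demo = np.array([[1.0], [2.0]])
y_demo = chebyshev_gconv(W_demo, x_demo, np.array([1.0]))  # K = 1: identity filter
```

With K = 1 the filter reduces to η_0 · I, so the output equals the input features, which gives a quick sanity check on the recurrence.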
4. The human skeleton behavior recognition method based on graph convolution network according to claim 1, wherein in the step S60, "time sequence modeling is performed in the time dimension based on the spatial features of the skeleton sequence to be identified, to obtain the behavior category of the skeleton sequence to be identified", the method is as follows:
the spatial feature information f_t of each frame obtained by the graph convolution operation is expanded by a fully connected layer, then fed into a long short-term memory network for time sequence modeling, and classified by softmax to obtain the final interactive behavior recognition result.
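Claim 4's classification head (fully connected expansion, LSTM time-sequence modeling, softmax) can be sketched with a single-cell numpy LSTM. All weights here are random placeholders and the layer sizes are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_sequence(feats, d_fc, d_h, n_classes):
    """feats: (T, d_in) flattened per-frame spatial features f_t."""
    T, d_in = feats.shape
    W_fc = rng.standard_normal((d_in, d_fc)) * 0.1         # fully connected layer
    W_g = rng.standard_normal((4, d_fc + d_h, d_h)) * 0.1  # i, f, g, o gate weights
    W_cls = rng.standard_normal((d_h, n_classes)) * 0.1    # softmax classifier
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for t in range(T):                                     # time sequence modeling
        x = feats[t] @ W_fc
        z = np.concatenate([x, h])
        i = sigmoid(z @ W_g[0])                            # input gate
        f = sigmoid(z @ W_g[1])                            # forget gate
        g = np.tanh(z @ W_g[2])                            # cell candidate
        o = sigmoid(z @ W_g[3])                            # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return softmax(h @ W_cls)                              # class probabilities

probs = classify_sequence(rng.standard_normal((5, 12)), d_fc=8, d_h=6, n_classes=4)
```

The last hidden state is used for classification; biases are omitted for brevity.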
CN202010146319.0A 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network Active CN111353447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010146319.0A CN111353447B (en) 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network


Publications (2)

Publication Number Publication Date
CN111353447A CN111353447A (en) 2020-06-30
CN111353447B true CN111353447B (en) 2023-07-04

Family

ID=71194272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010146319.0A Active CN111353447B (en) 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network

Country Status (1)

Country Link
CN (1) CN111353447B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329562B * 2020-10-23 2024-05-14 Jiangsu University Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network
CN112668550B * 2021-01-18 2023-12-19 Shenyang Aerospace University Double interaction behavior recognition method based on joint point-depth joint attention RGB modal data
CN113128425A * 2021-04-23 2021-07-16 Shanghai University of International Business and Economics Semantic self-adaptive graph network method for human action recognition based on skeleton sequence
CN113283400B * 2021-07-19 2021-11-12 Chengdu Koala Youran Technology Co., Ltd. Skeleton action identification method based on selective hypergraph convolutional network
CN113792712A * 2021-11-15 2021-12-14 Changsha Hisense Intelligent *** Research Institute Co., Ltd. Action recognition method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN107301370A (en) * 2017-05-08 2017-10-27 上海大学 A kind of body action identification method based on Kinect three-dimensional framework models
CN110045823A (en) * 2019-03-12 2019-07-23 北京邮电大学 A kind of action director's method and apparatus based on motion capture
CN110197195A (en) * 2019-04-15 2019-09-03 深圳大学 A kind of novel deep layer network system and method towards Activity recognition

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9489570B2 (en) * 2013-12-31 2016-11-08 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition
CN108304795B (en) * 2018-01-29 2020-05-12 清华大学 Human skeleton behavior identification method and device based on deep reinforcement learning
CA2995242A1 (en) * 2018-02-15 2019-08-15 Wrnch Inc. Method and system for activity classification
CN108764107B (en) * 2018-05-23 2020-09-11 中国科学院自动化研究所 Behavior and identity combined identification method and device based on human body skeleton sequence
CN108985259B (en) * 2018-08-03 2022-03-18 百度在线网络技术(北京)有限公司 Human body action recognition method and device
CN109376720B (en) * 2018-12-19 2022-01-18 杭州电子科技大学 Action classification method based on joint point space-time simple cycle network and attention mechanism
CN110222611B (en) * 2019-05-27 2021-03-02 中国科学院自动化研究所 Human skeleton behavior identification method, system and device based on graph convolution network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition; Chenyang Si et al.; arXiv; full text *
Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks; Lei Shi et al.; arXiv; full text *
Behavior recognition algorithm based on CNN and bidirectional LSTM; Wu Xiaoying et al.; Computer Engineering and Design (No. 02); full text *
Skeleton-based action recognition based on graph convolution; Dong An et al.; Modern Computer (No. 02); full text *
Two-person interactive behavior recognition based on holistic and individual segmentation fusion; Cao Jiangtao et al.; Journal of Liaoning Petrochemical University; Vol. 39 (No. 06); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant