CN111582220A - Skeleton point behavior identification system based on shift diagram convolution neural network and identification method thereof - Google Patents


Info

Publication number
CN111582220A
Authority
CN
China
Prior art keywords: image, points, joint, vector, point
Prior art date
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202010419839.4A
Other languages
Chinese (zh)
Other versions
CN111582220B (en)
Inventors
Zhang Yifan (张一帆)
Cheng Ke (程科)
Cheng Jian (程健)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202010419839.4A
Publication of CN111582220A
Application granted
Publication of CN111582220B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Computing Systems (AREA)
  • Psychiatry (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network, comprising: an image acquisition module for acquiring behavior images; an image processing module for processing the behavior images acquired by the image acquisition module; an extraction module for extracting skeleton points from the images processed by the image processing module; and a behavior recognition module for recognizing the behavior features of the extracted skeleton points. The invention designs a behavior recognition module built on a novel graph convolution that reduces the computational cost of graph convolution. Unlike the traditional graph convolution, the shift graph convolution does not enlarge its receptive range by enlarging the convolution kernel; instead, a novel shift operation shifts and concatenates the graph features. It thereby achieves the same or even higher recognition accuracy while markedly reducing computation and increasing computation speed, and avoids the growth in computation that the traditional graph convolution incurs as the convolution kernel grows.

Description

Skeleton point behavior identification system based on shift diagram convolution neural network and identification method thereof
Technical Field
The invention relates to a skeleton point behavior recognition system based on a shift graph convolutional neural network, which belongs to the field of general image data processing or generation (G06T), and in particular to the field of motion analysis (G06T 7/20).
Background
In the behavior recognition task, owing to constraints of data volume and algorithms, behavior recognition models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeleton point data can solve this problem well.
In skeleton point data, the human body is represented by the coordinates of several predefined key joint points in the camera coordinate system. Such data can be conveniently obtained with a depth camera or various pose estimation algorithms, and is typically modeled as a graph processed by graph convolution.
In the conventional graph convolution method, however, the modeled convolution kernel covers only the neighborhood of one point. In the skeleton point behavior recognition task, some behaviors (such as clapping) require modeling the positional relationship of points that are physically far apart (such as the two hands), which requires increasing the convolution kernel size of the graph convolution model. Since the computational cost of graph convolution grows as the convolution kernel grows, the conventional graph convolution is computationally expensive.
Disclosure of Invention
The purpose of the invention is as follows: to provide a skeleton point behavior recognition system based on a shift graph convolutional neural network that solves the problems existing in the prior art.
The technical scheme is as follows: a skeleton point behavior recognition system based on a shift graph convolutional neural network, comprising:
an image acquisition module for acquiring behavior images;
an image processing module for processing the behavior images acquired by the image acquisition module;
a skeleton point extraction module for extracting skeleton points from the images processed by the image processing module;
and a behavior recognition module for recognizing the skeleton point behavior features extracted by the extraction module.
In a further embodiment, the image acquisition module is based on an image acquisition device comprising cameras placed in an equilateral triangle and a rotating device arranged at the tail of each camera; the rotating device comprises a rotating shaft fixedly connected with the camera and a rotating motor sleeved on the rotating shaft.
In a further embodiment, the image acquisition module captures human behavior through the three groups of cameras placed in an equilateral triangle, and the behavior images acquired by the three groups of cameras (front, rear and side views) are then displayed on the computer terminal so that the image processing module can compare and process the images.
In a further embodiment, the image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map; when detecting image edges with the Kirsch edge detection operator, a 3 × 3 convolution template traverses the pixel points of the image, the gray values of the pixels in the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five pixels is computed; the eight 3 × 3 convolution templates, numbered 1 to 8, are shown as figures in the original;
all pixels of the original image are processed in turn with the eight convolution templates, the computed edge intensity is thresholded, the final edge points are extracted, and edge detection is complete;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
and 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image.
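The edge-detection procedure above can be sketched in Python. This is a minimal illustration rather than the patented implementation: the template weights (+5 for the three adjacent pixels, -3 for the remaining five) are the standard Kirsch weights, and the function name, threshold value, and single-pass maximum (in place of the two-buffer scheme of steps 2 to 4) are illustrative assumptions.

```python
import numpy as np

# The eight 3x3 Kirsch templates (one per compass direction). Each weights
# three adjacent pixels by +5 and the remaining five by -3, matching the
# "weighted sum difference" described above.
KIRSCH_TEMPLATES = [
    np.array([[ 5,  5,  5], [-3,  0, -3], [-3, -3, -3]]),  # N
    np.array([[ 5,  5, -3], [ 5,  0, -3], [-3, -3, -3]]),  # NW
    np.array([[ 5, -3, -3], [ 5,  0, -3], [ 5, -3, -3]]),  # W
    np.array([[-3, -3, -3], [ 5,  0, -3], [ 5,  5, -3]]),  # SW
    np.array([[-3, -3, -3], [-3,  0, -3], [ 5,  5,  5]]),  # S
    np.array([[-3, -3, -3], [-3,  0,  5], [-3,  5,  5]]),  # SE
    np.array([[-3, -3,  5], [-3,  0,  5], [-3, -3,  5]]),  # E
    np.array([[-3,  5,  5], [-3,  0,  5], [-3, -3, -3]]),  # NE
]

def kirsch_edges(gray: np.ndarray, threshold: float = 255.0) -> np.ndarray:
    """Edge map: maximum response over the eight templates, then thresholded."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    best = np.zeros((h, w))
    for k in KIRSCH_TEMPLATES:
        # Apply the template by explicit shifting (no SciPy dependency).
        resp = np.zeros((h, w))
        for dy in range(3):
            for dx in range(3):
                resp += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
        best = np.maximum(best, resp)  # keep the larger response, as in steps 3-4
    return (best >= threshold).astype(np.uint8)
```

Keeping the per-pixel maximum over the eight directional responses is equivalent to the repeated pairwise comparison of steps 3 and 4.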
In a further embodiment, the extraction module is configured to extract skeleton points from the image processed by the image processing module: after the image processing module finishes processing the image acquired by the image acquisition module, pre-recorded skeleton points are matched to the human body edge map according to the actor body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map.
In a further embodiment, the extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have different skeleton sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size;
first, one person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and all vectors from the root node to the points directly connected to it are computed. Dividing each vector by its modulus gives its unit direction vector (modulus 1); multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinate gives the corrected coordinate of the point directly connected to the root node, which is recorded as the normalized coordinate value of that skeleton point. The root node is then updated in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
Input: the length l_i of each limb in the reference skeleton, and the skeleton point coordinate values to be normalized;
the first step: define p as the root node coordinate;
the second step: give p the initial value of the root node coordinate;
the third step: for all limbs (s_i, e_i), proceed in turn according to a breadth-first search strategy;
the fourth step: compute e_i − s_i;
the fifth step: compute the unit direction vector d_i = (e_i − s_i) / ‖e_i − s_i‖;
the sixth step: compute e_i′ = s_i′ + l_i · d_i, and save the value of e_i′ to the set A;
the seventh step: return to the third step until all limbs in the skeleton have been traversed;
Output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs of the body, l_i represents the length of the i-th limb in the reference skeleton, and s_i and e_i respectively represent the coordinate values of the start node and end node of the i-th limb. Computing the values of all e_i′ yields all corrected skeleton point coordinates, scaling the estimated size while ensuring that the angles between the limbs remain unchanged;
when the included angle between the limbs changes, the included angle between the vectors is selected to describe the skeleton points so as to avoid the skeleton point deviation when the included angle between the limbs changes;
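The size-normalization algorithm above can be sketched as follows. The parent-array representation of the skeleton, the function name, and the per-joint indexing of reference limb lengths are illustrative assumptions; the breadth-first traversal and the update "corrected parent + reference length × unit direction" follow the steps described.

```python
import numpy as np
from collections import deque

def normalize_skeleton(joints, parents, ref_lengths, root=0):
    """Rescale every limb to its reference length while keeping all limb
    directions (and hence the angles between limbs) unchanged.

    joints:      (J, 3) array of raw 3-D joint coordinates
    parents:     parents[i] is the parent joint of i (root's parent = -1)
    ref_lengths: ref_lengths[i] = reference length of the limb parents[i] -> i
    """
    corrected = np.zeros_like(joints, dtype=np.float64)
    corrected[root] = joints[root]            # the root node is kept as-is
    # Build a child adjacency list for the breadth-first traversal.
    children = {j: [] for j in range(len(joints))}
    for j, p in enumerate(parents):
        if p >= 0:
            children[p].append(j)
    queue = deque([root])
    while queue:
        p = queue.popleft()
        for c in children[p]:
            v = joints[c] - joints[p]         # raw limb vector
            d = v / np.linalg.norm(v)         # unit direction vector (modulus 1)
            # corrected child = corrected parent + reference length * direction
            corrected[c] = corrected[p] + ref_lengths[c] * d
            queue.append(c)
    return corrected
```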
the steps for solving the included angle of the human joint vector are as follows:
obtaining the angle of a certain joint point, firstly obtaining three joint points used in angle calculation, capturing three-dimensional coordinate values of the joint points by using Kinect, constructing a structural vector between the three joint points, and then obtaining the size of a joint vector included angle by adopting an inverse cosine law;
determining the angle of the first joint
Figure DEST_PATH_IMAGE028
For example;
selecting other two joint points connected with the first joint to obtain three-dimensional coordinate values of the joint points captured by the Kinect, wherein the other two joint points are expressed as
Figure DEST_PATH_IMAGE030
Figure DEST_PATH_IMAGE032
The first joint point is represented as
Figure DEST_PATH_IMAGE034
Constructing an inter-joint structure vector, the first joint point to
Figure 583064DEST_PATH_IMAGE030
Point vector
Figure DEST_PATH_IMAGE036
=
Figure DEST_PATH_IMAGE038
First joint point to
Figure 167629DEST_PATH_IMAGE032
Point vector
Figure DEST_PATH_IMAGE040
=
Figure DEST_PATH_IMAGE042
Figure 148354DEST_PATH_IMAGE032
Point-to-point
Figure 97856DEST_PATH_IMAGE030
Vector of
Figure DEST_PATH_IMAGE044
Computing vectors
Figure 200417DEST_PATH_IMAGE036
Sum vector
Figure 650990DEST_PATH_IMAGE040
Angle of (2)
Figure 814118DEST_PATH_IMAGE028
Size:
Figure DEST_PATH_IMAGE046
wherein ,
Figure 872203DEST_PATH_IMAGE028
the range of the angle is between 0 degree and 180 degrees, in order to enable the representation based on the included angle of the joint vector to be more accurate, representative joint angles are selected for representation according to the importance ranking of the joint angles in the action process, and then the position of the bone point is corrected through size normalization and angle correction.
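The arccosine computation of the joint angle can be written directly. The function name and argument order are illustrative assumptions; the three coordinates would come from the Kinect capture described above.

```python
import numpy as np

def joint_angle(j, a, b):
    """Angle at joint j formed by the two limbs j->a and j->b, in degrees.
    Uses the arccos of the normalized dot product; result lies in [0, 180]."""
    u = np.asarray(a, float) - np.asarray(j, float)   # vector from joint to a
    v = np.asarray(b, float) - np.asarray(j, float)   # vector from joint to b
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```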
In a further embodiment, the behavior recognition module is mainly used to recognize the behavior features of the extracted skeleton points. Neighboring features are shifted and concatenated according to the adjacency relations of the graph, and after concatenation a single 1 × 1 convolution yields the computed behavior features. For a graph with N nodes, let the feature dimension be C, so that the feature size is N × C. Suppose node v has n nodes adjacent to it, with neighbor set B(v) = {u_1, …, u_n}. For the v-th node, the shift graph module divides its features equally into n + 1 parts: the first part retains the node's own features, and the latter n parts are shifted in from its neighbor nodes' features. Mathematically:
F′_v = F_v[0 : C/(n+1)] ‖ F_{u_1}[C/(n+1) : 2C/(n+1)] ‖ … ‖ F_{u_n}[nC/(n+1) : C]
wherein u_1, …, u_n ∈ B(v), the subscripts in square brackets follow Python indexing, and the double vertical bars denote concatenation along the feature dimension.
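The shift-and-concatenate step followed by a single 1 × 1 convolution can be sketched as follows. The function names, the dict-of-lists adjacency, and letting the last part absorb the remainder when C is not divisible by n + 1 are illustrative assumptions, not the patent's exact scheme.

```python
import numpy as np

def shift_graph_features(feats, neighbors):
    """Shift operation of the shift graph convolution.

    feats:     (N, C) node feature matrix
    neighbors: neighbors[v] = list of nodes adjacent to v
    For node v with n neighbors, the C channels are split into n + 1 equal
    parts: the first part keeps v's own features, and part k is taken
    (shifted) from the k-th neighbor at the same channel positions.
    """
    N, C = feats.shape
    out = feats.copy()
    for v in range(N):
        n = len(neighbors[v])
        step = C // (n + 1)                  # equal channel split
        for k, u in enumerate(neighbors[v], start=1):
            lo, hi = k * step, ((k + 1) * step if k < n else C)
            out[v, lo:hi] = feats[u, lo:hi]  # Python-style slicing, as in the formula
    return out

def shift_gcn_layer(feats, neighbors, W):
    """Shift, then one 1x1 convolution: a per-node linear map W of shape (C, C_out)."""
    return shift_graph_features(feats, neighbors) @ W
```

Because the shift merely copies slices instead of applying a kernel that grows with the neighborhood, the only learned computation is the single (C, C_out) matrix of the 1 × 1 convolution, which is what keeps the cost independent of the neighborhood size.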
A recognition method based on the shift graph convolutional neural network skeleton point behavior recognition system comprises the following steps:
step 1, first control the cameras to rotate through the image acquisition module, thereby acquiring human behavior feature images; the rotating motor turns the rotating shaft, which in turn rotates the camera, so that the position of the camera is adjusted;
step 2, the image acquisition module captures human behavior through the three groups of cameras placed in an equilateral triangle, and the behavior images acquired by the three groups of cameras (front, rear and side views) are then displayed on the computer terminal so that the image processing module can compare and process the images;
step 3, the image processing module mainly processes the human behavior images acquired by the image acquisition module into human body edge maps; when detecting image edges with the Kirsch edge detection operator, a 3 × 3 convolution template traverses the pixel points of the image, the gray values of the pixels in the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five pixels is computed;
all pixels of the original image are processed in turn with the eight convolution templates, the computed edge intensity is thresholded, the final edge points are extracted, and edge detection is complete;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
step 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image;
step 4, after the human behavior feature image has been processed, the extraction module extracts skeleton points from the image processed by the image processing module; pre-recorded skeleton points are matched to the human body edge map according to the actor body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map;
step 5, after the extraction of the skeleton points is complete, the correction module corrects the positions of the skeleton points. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have different skeleton sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size. First, one person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and all vectors from the root node to the points directly connected to it are computed; dividing each vector by its modulus gives its unit direction vector (modulus 1), multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinate gives the corrected coordinate of the connected point, which is recorded as the normalized coordinate value of that skeleton point; the root node is updated in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The correction method scales the estimated size while ensuring that the angles between the limbs remain unchanged;
when the angles between the limbs change, the angles between vectors are selected to describe the skeleton points, so as to avoid skeleton point deviation when the limb angles change;
the steps for solving the human joint vector angle are as follows:
to obtain the angle at a joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the joint vector angle by the inverse cosine law;
take determining the angle θ of the first joint as an example;
select the two other joint points connected to the first joint and obtain the three-dimensional coordinate values of the joint points captured by the Kinect; denote the two other joint points as A and B, and the first joint point as J;
construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A − J; the vector from the first joint point to point B is v = B − J; the vector from point B to point A is w = A − B;
compute the angle θ between vector u and vector v:
cos θ = (u · v) / (‖u‖ ‖v‖), i.e. θ = arccos((u · v) / (‖u‖ ‖v‖))
wherein the angle θ ranges from 0° to 180°. To make the representation based on the joint vector angle more accurate, representative joint angles are selected according to the importance ranking of the joint angles during the action, and the positions of the skeleton points are then corrected through size normalization and angle correction;
step 6, after the correction of the skeleton points is complete, the behavior recognition module recognizes the skeleton point behavior. Neighboring features are shifted and concatenated according to the adjacency relations of the graph, and after concatenation a single 1 × 1 convolution yields the computed behavior features. For a graph with N nodes, let the feature dimension be C, so that the feature size is N × C. Suppose node v has n nodes adjacent to it, with neighbor set B(v) = {u_1, …, u_n}. For the v-th node, the shift graph module divides its features equally into n + 1 parts: the first part retains the node's own features, and the latter n parts are shifted in from its neighbor nodes' features. Mathematically:
F′_v = F_v[0 : C/(n+1)] ‖ F_{u_1}[C/(n+1) : 2C/(n+1)] ‖ … ‖ F_{u_n}[nC/(n+1) : C]
wherein u_1, …, u_n ∈ B(v), the subscripts in square brackets follow Python indexing, and the double vertical bars denote concatenation along the feature dimension; the skeleton point behavior features are thus recognized.
Beneficial effects: the invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network, in which a behavior recognition module is designed to recognize the behavior of the skeleton points using a novel graph convolution that markedly reduces the computational cost of graph convolution and thereby differs from the traditional graph convolution.
Drawings
FIG. 1 is a schematic diagram of the shift graph convolution for skeleton point behavior recognition of the present invention.
FIG. 2 is a schematic diagram of the local graph of the present invention.
FIG. 3 is a schematic diagram of the non-local graph of the present invention.
FIG. 4 is a schematic diagram of conventional graph convolution for skeleton point behavior recognition.
FIG. 5 is a table comparing the accuracy and computational complexity of the shift graph convolution with conventional graph convolution methods.
Detailed Description
Through the applicant's research and analysis, the reason for this problem (the large computational cost of traditional graph convolution) is that in the traditional graph convolution method, the modeled convolution kernel can cover only the neighborhood of one point. In the skeleton point behavior recognition task, however, some behaviors (such as clapping) require modeling the positional relationship of points that are physically far apart (such as the two hands), which requires increasing the convolution kernel size of the graph convolution model; and since the computational cost of graph convolution grows with the kernel, the traditional graph convolution is computationally expensive. The behavior recognition module is therefore designed to recognize the behavior of the skeleton points with a novel graph convolution that markedly reduces this cost. Unlike the traditional graph convolution, the shift graph convolution does not enlarge its sensing range by enlarging the convolution kernel but shifts and concatenates the graph features through a novel shift operation, achieving the same or even higher recognition accuracy while markedly reducing computation and increasing computation speed, and avoiding the growth in computation that the traditional graph convolution incurs as the convolution kernel grows.
A skeleton point behavior recognition system based on a shift graph convolutional neural network comprises: an image acquisition module for acquiring behavior images; an image processing module for processing the behavior images acquired by the image acquisition module; a skeleton point extraction module for extracting skeleton points from the images processed by the image processing module; and a behavior recognition module for recognizing the skeleton point behavior features extracted by the extraction module.
The present invention does not prescribe a particular method of skeleton point extraction. There are many methods for extracting human skeleton points, for example: capturing images from a camera and then obtaining the human skeleton points with an algorithm; obtaining them directly from a Kinect camera; or having the person wear acceleration sensors so that the skeleton positions are obtained directly. The present invention is concerned with how to perform behavior recognition once the skeleton points have been acquired. The invention therefore does not limit the method of skeleton point extraction, and any extraction method falls within its scope; in this embodiment, however, a correction module is provided to identify and correct the image, and the image acquisition device is modified accordingly to capture images from multiple angles.
The image acquisition module is based on an image acquisition device comprising cameras placed at the vertices of an equilateral triangle, each fitted at its tail with a rotating device; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the shaft.
The image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back, and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them.
The image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map. When detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five. The convolution templates (numbered 1–4 and 5–8) are the eight compass rotations of the mask

 5  5  5
-3  0 -3
-3 -3 -3

in which the three selected neighbors are weighted 5 and the remaining five border pixels are weighted -3.
All pixels in the original image are processed in turn with the eight convolution templates; the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection. The steps of Krisch-operator edge detection are as follows: step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program.
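The two-buffer procedure above amounts to keeping, for each pixel, the maximum response over the eight templates and then thresholding. Below is a minimal, hedged sketch in Python (the Krisch operator is commonly spelled "Kirsch" in the literature); the mask weights are the standard values matching the description (5 for the three selected neighbors, -3 for the rest), and the threshold value is an illustrative assumption, since no value is given in the text.

```python
import numpy as np

# Base compass template: the three selected neighbours weighted 5,
# the remaining five border pixels -3, the centre 0.
BASE = np.array([[5.0, 5.0, 5.0],
                 [-3.0, 0.0, -3.0],
                 [-3.0, -3.0, -3.0]])

def _rotations(mask):
    """The eight templates are rotations of BASE around the centre pixel."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [mask[r] for r in ring]
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for i, r in enumerate(ring):
            m[r] = vals[(i - k) % 8]
        masks.append(m)
    return masks

KIRSCH_MASKS = _rotations(BASE)

def kirsch_edges(gray, threshold=255.0):
    """Per-pixel maximum of the eight template responses (the 'larger gray
    value' kept in buffer image 1 above), followed by a threshold test."""
    gray = np.asarray(gray, dtype=float)
    g = np.pad(gray, 1, mode="edge")          # replicate borders
    h, w = gray.shape
    strength = np.zeros((h, w))
    for mask in KIRSCH_MASKS:                 # the eight templates in turn
        for y in range(h):
            for x in range(w):
                resp = float((g[y:y + 3, x:x + 3] * mask).sum())
                strength[y, x] = max(strength[y, x], resp)
    return (strength >= threshold).astype(np.uint8)
```

On a uniform region every template response is zero (the weights sum to zero), so only gray-level transitions survive the threshold.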
The extraction module extracts the bone points from the image processed by the image processing module: once the image processing module has finished processing the image acquired by the image acquisition module, pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map.
The extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size.
First, one person's skeleton is chosen as the reference skeleton. For a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed. Each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each point directly connected to the root. These coordinates are recorded as the normalized coordinate values of the corresponding skeleton points. The root node is then updated in the order given by a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
input: the length of the i-th limb in the reference skeleton, denoted l_i; prepare the normalized bone point coordinate values;
first step: define p_root as the root node coordinate;
second step: give p_root its initial value p_0, the original coordinate of the body center point;
third step: process all limbs (a_i, b_i) in turn according to a breadth-first search strategy;
fourth step: compute the limb vector v_i = b_i - a_i;
fifth step: compute the direction vector u_i = v_i / |v_i|;
sixth step: compute the corrected coordinate b_i' = a_i' + l_i · u_i, where a_i' is the already-corrected coordinate of the start node, and save the value of b_i' to the set A;
seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs, l_i denotes the length of the i-th limb in the reference skeleton, and (a_i, b_i) denote the coordinate values of the starting node and ending node of the i-th limb. Computing the values of b_i' for all limbs thus yields all the corrected skeleton point coordinates, scaling the estimated skeleton to the reference size while keeping the included angles between the limbs unchanged.
When the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation caused by the change in limb angle.
the steps for solving the included angle of the human joint vector are as follows:
To obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the size of the joint vector angle by the inverse cosine law.
Take the determination of the angle θ at the first joint as an example. Select the two other joint points connected to the first joint and obtain the three-dimensional coordinate values captured by the Kinect; denote the two other joint points A = (x_1, y_1, z_1) and B = (x_2, y_2, z_2), and the first joint point J = (x_0, y_0, z_0). Construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A - J = (x_1 - x_0, y_1 - y_0, z_1 - z_0); the vector from the first joint point to point B is v = B - J = (x_2 - x_0, y_2 - y_0, z_2 - z_0); and the vector from point B to point A is w = A - B. Compute the angle θ between vector u and vector v:

θ = arccos( (u · v) / (|u| |v|) )

where θ ranges between 0° and 180°. To make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action; the bone point positions are then corrected through size normalization and angle correction.
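The inverse-cosine step above is a direct computation; a minimal sketch follows (the coordinates in the test are made-up example values, and the clipping of the cosine is a numerical-safety addition not mentioned in the text).

```python
import numpy as np

def joint_angle(joint, a, b):
    """Angle in degrees (0-180) at `joint` between the vectors
    joint->a and joint->b, via the inverse cosine law."""
    u = np.asarray(a, dtype=float) - np.asarray(joint, dtype=float)
    v = np.asarray(b, dtype=float) - np.asarray(joint, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against cos values just outside [-1, 1] from rounding
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
```

For example, a right-angled elbow (the two vectors perpendicular) yields 90°, and fully extended, collinear limbs yield 180°, the upper end of the stated range.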
The behavior identification module mainly recognizes the extracted bone point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features. For a graph of n nodes, let the feature dimension be d, so that the feature map has size n × d. Suppose node v has m nodes adjacent to it, with adjacent node set N(v) = {v_1, …, v_m}. For node v, the shift map module divides its features equally into m + 1 parts: the first part retains its own features, and the latter m parts are shifted in from the features of its neighbor nodes. Mathematically:

F̃_v = F_v[0 : d/(m+1)] ‖ F_{v_1}[d/(m+1) : 2d/(m+1)] ‖ … ‖ F_{v_m}[m·d/(m+1) : d]

where the subscript ranges use Python-style slice notation and the double vertical line ‖ denotes splicing along the feature dimension. To understand this formula intuitively, we take a graph of 7 nodes with 20-dimensional features as an example, as shown in fig. 2 and 3. Here we discuss two cases:
1. the neighborhood of each point contains only the physically connected positions; we call this the local design, shown in fig. 2;
2. the neighborhood of each point contains the entire human skeleton map; we call this the non-local design, shown in fig. 3.
For both designs we take node 1 and node 2, respectively, as examples, explained in detail below.
In fig. 2, node 1 has 1 adjacent node (node 2), so its features are divided equally into 1 + 1 = 2 parts: the first retains its own features (the part of node 1 labeled 1) and the second is shifted in from node 2 (the part of node 1 labeled 2). In fig. 2, node 2 has 3 adjacent nodes (nodes 1, 3 and 4), so its features are divided equally into 3 + 1 = 4 parts: the first retains its own features (the part of node 2 labeled 2) and the last 3 parts are shifted in from nodes 1, 3 and 4 respectively (the parts of node 2 labeled 1, 3 and 4).
In fig. 3, every other node is adjacent to any given node, so the current node receives shifted features from all other nodes; node 1 and node 2 are shown as examples in fig. 3. After the shift, the resulting features take on a spiral appearance, the result of the thorough mixing of the features of different nodes. Experiments show that of the two shift graph convolution designs, the non-local design is more accurate on the behavior recognition task, because it fuses the features of different nodes better: efficient feature fusion is possible even between nodes that are far apart.
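The non-local shift can be sketched concretely as below. This is a minimal sketch assuming the feature dimension d is divisible by the node count n (in the 7-node, 20-dimensional example of fig. 3 the parts are uneven, which this sketch ignores for simplicity).

```python
import numpy as np

def nonlocal_shift(feat):
    """Non-local shift splice: every node's features are split into n equal
    parts; part k of node v is taken from node (v + k) mod n, so part 0
    keeps the node's own features and later parts come from ever more
    distant nodes, producing the spiral pattern described above.
    feat: (n, d) feature map with d divisible by n (simplifying assumption)."""
    n, d = feat.shape
    part = d // n
    out = np.empty_like(feat)
    for v in range(n):
        for k in range(n):
            s = k * part
            out[v, s:s + part] = feat[(v + k) % n, s:s + part]
    return out
```

After this splice, a single 1 × 1 convolution (a linear map over the d channels) produces the output features, which is where the saving over enlarging the graph kernel comes from: the shift itself involves no multiplications, only memory movement.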
It is worth noting that, at the same recognition accuracy, the proposed shift graph convolution costs less than one third of the computation of traditional graph convolution, which matters greatly for fast recognition; the saved convolution computations make the method faster (compare fig. 1 and fig. 4). Another aspect is that the shift operation can be implemented with pointers in the C++ or CUDA languages, so it can be deployed very efficiently on a CPU or GPU.
Our main experiment is shown in fig. 5. ST-GCN, Adaptive-GCN and Adaptive-NL GCN are three typical traditional GCN methods. Our shift graph convolution (Shift GCN) includes both the Local Shift GCN and Non-Local Shift GCN designs. As the table shows, the FLOPs (number of floating-point operations, representing computational complexity) of our method are less than one third of those of traditional graph convolution, which is very important for fast recognition. Moreover, the accuracy of our method is higher than that of the traditional graph convolution methods.
In addition, we also compare traditional graph convolutions with a reduced adjacency matrix, i.e. the models with the "one A" suffix (a single adjacency matrix): their computation is comparable to ours, but their accuracy drops significantly. This means that when the computation of traditional graph convolution is reduced, its accuracy degrades markedly, whereas our shift graph convolution (Shift GCN) achieves accuracy exceeding all previous algorithms at a small computational cost.
Description of the working principle: first, the image acquisition module controls the cameras to rotate so as to acquire images of human behavior characteristics: the rotating motor turns the rotating shaft, which in turn rotates the camera, adjusting its position. The image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them. The image processing module mainly processes the acquired human behavior image into a human body edge map: when detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of three adjacent pixels' gray values and the weighted sum of the remaining five; all pixels in the original image are processed in turn with the eight convolution templates, the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection. The steps of Krisch-operator edge detection are as follows:
step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program;
after the human behavior characteristic image has been processed, the extraction module extracts the skeletal points from it: pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map. After extraction, the positions of the skeleton points are corrected by the correction module: people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ, and the skeleton sizes need to be normalized to the same size. First, one person's skeleton is chosen as the reference skeleton; for a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed; each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each connected point; these are recorded as the normalized coordinate values of the corresponding skeleton points; the root node is then updated in the order given by a breadth-first search, and the steps are repeated until all skeleton point values have been corrected. This correction scales the estimated skeleton while keeping the included angles between the limbs unchanged. When the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation. The steps for solving the angle of a human joint vector are: to obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinates with the Kinect; construct the structure vectors between them, and obtain the joint vector angle by the inverse cosine law. To make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action; the bone point positions are then corrected through size normalization and angle correction. After the skeleton points have been corrected, the behavior recognition module performs skeleton point behavior recognition: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features.
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings; however, the invention is not limited to the specific details of these embodiments. Various equivalent changes can be made to the technical solution within the technical idea of the invention, and all such equivalent changes fall within its protection scope.

Claims (7)

1. A skeleton point behavior identification system based on a shift graph convolutional neural network, characterized by comprising:
the behavior recognition module is used for recognizing and extracting the bone point behavior characteristics extracted by the extraction module;
the behavior recognition module mainly recognizes the extracted bone point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features; for a graph of n nodes, let the feature dimension be d, so that the feature map has size n × d; suppose node v has m nodes adjacent to it, with adjacent node set N(v) = {v_1, …, v_m}; for node v, the shift map module divides its features equally into m + 1 parts, the first part retaining its own features and the latter m parts being shifted in from the features of its neighbor nodes, mathematically expressed as:

F̃_v = F_v[0 : d/(m+1)] ‖ F_{v_1}[d/(m+1) : 2d/(m+1)] ‖ … ‖ F_{v_m}[m·d/(m+1) : d]

where the subscript ranges use Python-style slice notation and the double vertical line ‖ denotes splicing along the feature dimension.
2. The system of claim 1, characterized in that it further comprises an image acquisition module for acquiring behavior images; the image acquisition module is based on an image acquisition device comprising cameras placed at the vertices of an equilateral triangle, each fitted at its tail with a rotating device; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the shaft.
3. The system of claim 2, characterized in that the image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back, and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them.
4. The system of claim 1, characterized in that it further comprises an image processing module for processing the acquired behavior images; the image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map; when detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five; the convolution templates (numbered 1–4 and 5–8) are the eight compass rotations of the mask

 5  5  5
-3  0 -3
-3 -3 -3

in which the three selected neighbors are weighted 5 and the remaining five border pixels are weighted -3; all pixels in the original image are processed in turn with the eight templates, the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection;
the steps of Krisch-operator edge detection are as follows:
step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program.
5. The system of claim 1, characterized in that it further comprises a skeleton point extraction module for extracting skeleton points from the image processed by the image processing module; once the image processing module has finished processing the image acquired by the image acquisition module, pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map.
6. The system of claim 5, characterized in that the extraction module further comprises a correction module; when the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ, and the skeleton sizes need to be normalized to the same size;
first, one person's skeleton is chosen as the reference skeleton; for a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed; each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each connected point; these coordinates are recorded as the normalized coordinate values of the corresponding skeleton points; the root node is then updated in the order given by a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected; the algorithm is as follows:
input: the length of the i-th limb in the reference skeleton, denoted l_i; prepare the normalized bone point coordinate values;
first step: define p_root as the root node coordinate;
second step: give p_root its initial value p_0, the original coordinate of the body center point;
third step: process all limbs (a_i, b_i) in turn according to a breadth-first search strategy;
fourth step: compute the limb vector v_i = b_i - a_i;
fifth step: compute the direction vector u_i = v_i / |v_i|;
sixth step: compute the corrected coordinate b_i' = a_i' + l_i · u_i, where a_i' is the already-corrected coordinate of the start node, and save the value of b_i' to the set A;
seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs, l_i denotes the length of the i-th limb in the reference skeleton, and (a_i, b_i) denote the coordinate values of the starting node and ending node of the i-th limb; computing the values of b_i' for all limbs yields all the corrected skeleton point coordinates, scaling the estimated skeleton while keeping the included angles between the limbs unchanged;
when the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation;
the steps for solving the angle of a human joint vector are as follows:
to obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the size of the joint vector angle by the inverse cosine law;
take the determination of the angle θ at the first joint as an example: select the two other joint points connected to the first joint and obtain their three-dimensional coordinate values captured by the Kinect, denoting the two other joint points A = (x_1, y_1, z_1) and B = (x_2, y_2, z_2) and the first joint point J = (x_0, y_0, z_0); construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A - J, the vector from the first joint point to point B is v = B - J, and the vector from point B to point A is w = A - B; compute the angle θ between vector u and vector v:

θ = arccos( (u · v) / (|u| |v|) )

where θ ranges between 0° and 180°; to make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action, and the bone point positions are then corrected through size normalization and angle correction.
7. A recognition method of the skeleton point behavior identification system based on the shift graph convolutional neural network of claim 1, comprising:
step 1, firstly, controlling a camera to rotate through an image acquisition module, and further acquiring a human behavior characteristic image; the rotating motor rotates to drive the rotating shaft to rotate, and then the camera is driven to rotate through the rotating shaft, so that the position of the camera is adjusted;
step 2, the image acquisition module carries out human body shooting behaviors through three groups of cameras which are placed in an equilateral triangle, and then behavior images acquired by the three groups of cameras are respectively displayed on a computer terminal before, after and at the side parts of the behavior images are installed, so that the image processing module can compare and process the images;
step 3, the image processing module processes the human behavior image acquired by the image acquisition module into a human body edge image; when detecting edges with the Kirsch edge detection operator, the pixels of the image are traversed with 3 x 3 convolution templates: for each pixel, the gray values of its eight neighbors are examined, and the difference between the weighted sum of three adjacent pixels and the weighted sum of the remaining five pixels is calculated;
all pixels of the original image are processed in turn with the eight convolution templates, the edge strength of each pixel is calculated, a threshold is applied for detection, and the final edge points are extracted to complete edge detection;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
step 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image;
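As a concrete illustration, the eight-template procedure above can be sketched with NumPy as follows; the mask coefficients are the standard Kirsch ones and the threshold value is an illustrative assumption, since the patent text specifies neither:

```python
import numpy as np

# The eight 3x3 Kirsch direction templates (standard coefficients:
# three 5-weighted neighbors against five -3-weighted neighbors).
KIRSCH_TEMPLATES = [
    np.array([[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]]),
    np.array([[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]]),
    np.array([[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]]),
    np.array([[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]]),
    np.array([[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]]),
    np.array([[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]]),
    np.array([[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]]),
    np.array([[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]]),
]

def kirsch_edges(gray, threshold=255):
    """Edge strength = largest absolute response over the 8 templates,
    then thresholded to binary edge points."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.int32), 1, mode="edge")
    strength = np.zeros((h, w), dtype=np.int32)
    for t in KIRSCH_TEMPLATES:
        resp = np.zeros((h, w), dtype=np.int32)
        for dy in range(3):
            for dx in range(3):
                resp += t[dy, dx] * padded[dy:dy + h, dx:dx + w]
        # Keep the larger of the running maximum and the new response,
        # mirroring the image-1 / image-2 comparison in the steps above.
        strength = np.maximum(strength, np.abs(resp))
    return (strength >= threshold).astype(np.uint8) * 255
```

On a uniform region the template coefficients sum to zero, so only gray-level transitions survive the threshold.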
step 4, after the human behavior feature image has been processed, the extraction module extracts skeleton points from the image produced by the image processing module: the human body edge map is matched against pre-stored skeleton points according to the body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map;
step 5, after skeleton point extraction is completed, the correction module corrects the skeleton point positions. Because people differ in body type, when people of different body types perform the same group of actions the three-dimensional coordinates of their skeleton points differ, so the skeletons must be normalized to the same size. First a person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is chosen as the root node, and for every point directly connected to the root node the vector from the root node to that point is computed. Each vector is divided by its module length to obtain a unit direction vector (module length 1), which is multiplied by the length of the corresponding bone in the reference skeleton; adding the resulting vector to the coordinates of the root node gives the corrected coordinate of that directly connected point, which is recorded as the normalized coordinate value of the corresponding skeleton point. The connected points then serve as new root nodes in the order of a breadth-first search, and the procedure is repeated until the values of all skeleton points have been corrected. This correction scales the estimated size while keeping the included angles between the limbs unchanged;
when the included angles between the limbs themselves change, the included angles between vectors are used to describe the skeleton points, so as to avoid skeleton point deviation as the limb angles change;
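The breadth-first normalization can be sketched as follows; the skeleton interface (a coordinate dictionary, an adjacency dictionary and per-bone reference lengths) is an assumed representation for illustration, not one defined by the patent:

```python
from collections import deque
import numpy as np

def normalize_skeleton(coords, adjacency, ref_lengths, root):
    """Rescale one frame of skeleton data to reference bone lengths while
    keeping every bone direction (hence every inter-limb angle) unchanged.

    coords      : {joint: np.array([x, y, z])} for one frame
    adjacency   : {joint: [directly connected joints]}
    ref_lengths : {(parent, child): bone length in the reference skeleton}
    root        : the body-centre joint chosen as the starting root node
    """
    fixed = {root: coords[root].copy()}
    queue = deque([root])                # breadth-first traversal order
    while queue:
        parent = queue.popleft()
        for child in adjacency[parent]:
            if child in fixed:           # already corrected (e.g. the parent)
                continue
            v = coords[child] - coords[parent]        # original bone vector
            direction = v / np.linalg.norm(v)         # unit direction (module length 1)
            length = ref_lengths[(parent, child)]     # reference bone length
            # Corrected coordinate: corrected parent position plus the
            # reference-length bone laid along the original direction.
            fixed[child] = fixed[parent] + direction * length
            queue.append(child)
    return fixed
```

For example, a two-bone chain of length 2 is shrunk onto unit reference bones without changing the 90-degree elbow angle between them.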
the steps for solving the included angle of the human joint vector are as follows:
obtaining the angle of a certain joint point, firstly obtaining three joint points used in angle calculation, capturing three-dimensional coordinate values of the joint points by using Kinect, constructing a structural vector between the three joint points, and then obtaining the size of a joint vector included angle by adopting an inverse cosine law;
determining the angle of the first joint
Figure 309219DEST_PATH_IMAGE027
For example;
selecting other two joint points connected with the first joint to obtain three-dimensional coordinate values of the joint points captured by the Kinect, wherein the other two joint points are expressed as
Figure 868376DEST_PATH_IMAGE028
Figure 456483DEST_PATH_IMAGE029
The first joint point is represented as
Figure 177315DEST_PATH_IMAGE030
Constructing an inter-joint structure vector, the first joint point to
Figure 749241DEST_PATH_IMAGE028
Point vector
Figure 846510DEST_PATH_IMAGE031
=
Figure 882600DEST_PATH_IMAGE032
First joint point to
Figure 649698DEST_PATH_IMAGE029
Point vector
Figure 567976DEST_PATH_IMAGE033
=
Figure 609881DEST_PATH_IMAGE034
Figure 31635DEST_PATH_IMAGE029
Point-to-point
Figure 235215DEST_PATH_IMAGE028
Vector of
Figure 578471DEST_PATH_IMAGE035
Computing vectors
Figure 283122DEST_PATH_IMAGE031
Sum vector
Figure 697398DEST_PATH_IMAGE033
Angle of (2)
Figure 462092DEST_PATH_IMAGE027
Size:
Figure 964749DEST_PATH_IMAGE038
wherein ,
Figure 941932DEST_PATH_IMAGE027
the range of the angle is between 0 degree and 180 degrees, in order to enable the representation based on the included angle of the joint vector to be more accurate, representative joint angles are selected for representation according to the importance ranking of the joint angles in the action process, and then the positions of the skeleton points are corrected through size normalization and angle correction;
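A minimal sketch of this law-of-cosines angle computation follows, assuming the three joint points are given as three-dimensional coordinate triples; the clipping of the cosine is added only to guard against floating-point round-off:

```python
import numpy as np

def joint_angle(p0, p1, p2):
    """Angle at joint p0 between the vectors toward p1 and p2,
    computed via the law of cosines, returned in degrees (0..180)."""
    v1 = np.asarray(p1, float) - np.asarray(p0, float)   # p0 -> p1
    v2 = np.asarray(p2, float) - np.asarray(p0, float)   # p0 -> p2
    v3 = np.asarray(p1, float) - np.asarray(p2, float)   # p2 -> p1
    a, b, c = np.linalg.norm(v1), np.linalg.norm(v2), np.linalg.norm(v3)
    # Law of cosines: c^2 = a^2 + b^2 - 2ab*cos(theta)
    cos_theta = (a**2 + b**2 - c**2) / (2 * a * b)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

For instance, a joint at the origin with neighbors on the x- and y-axes yields 90 degrees, and collinear opposite-side neighbors yield 180 degrees.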
step 6, after skeleton point correction is completed, the behavior recognition module performs skeleton point behavior recognition. The behavior features of neighboring nodes are shifted and spliced according to the adjacency relation of the graph; after the splicing, a single 1 x 1 convolution suffices to obtain the computed behavior features. For a graph of n skeleton point nodes with feature dimension c, the feature map has size n × c. Suppose node i has m adjacent nodes, with neighbor set N(i) = {j1, j2, ..., jm}. The shift-map module divides the feature of the i-th node equally into m + 1 parts: the first part retains the node's own feature, and the latter m parts are shifted in from the features of its neighbor nodes. Mathematically:

F'_i = F_i[0 : c/(m+1)] || F_{j1}[c/(m+1) : 2c/(m+1)] || ... || F_{jm}[m·c/(m+1) : c]

wherein the subscript ranges use Python-style indexing, and the double vertical lines denote splicing along the feature dimension; in this way the behavior features of the skeleton points are recognized.
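The per-node shift operation can be sketched as follows, assuming the node features are held in an (n, c) NumPy array and that c is divisible by m + 1 (the equal-split case written out above); the subsequent 1 x 1 convolution is simply a linear map shared across the shifted feature vectors:

```python
import numpy as np

def shift_node_features(F, neighbors, i):
    """Shift-splice for node i: keep the first 1/(m+1) slice of its own
    feature and take successive equal slices from its m neighbors,
    concatenated along the feature dimension (the || operator above).

    F         : (n, c) feature matrix; c must be divisible by m + 1 here
    neighbors : list [j1, ..., jm] of the nodes adjacent to node i
    """
    c = F.shape[1]
    m = len(neighbors)
    step = c // (m + 1)
    parts = [F[i, 0:step]]                      # first share: node i itself
    for k, j in enumerate(neighbors, start=1):  # remaining m shares: neighbors
        parts.append(F[j, k * step:(k + 1) * step])
    return np.concatenate(parts)                # Python-style slice indices
```

Stacking the shifted vectors of all n nodes yields an n × c matrix, to which the single 1 x 1 convolution is applied per node to produce the output behavior features.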
CN202010419839.4A 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof Active CN111582220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419839.4A CN111582220B (en) 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof


Publications (2)

Publication Number Publication Date
CN111582220A true CN111582220A (en) 2020-08-25
CN111582220B CN111582220B (en) 2023-05-26

Family

ID=72123047


Country Status (1)

Country Link
CN (1) CN111582220B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN111340011A (en) * 2020-05-18 2020-06-26 中国科学院自动化研究所南京人工智能芯片创新研究院 Self-adaptive time sequence shift neural network time sequence behavior identification method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN Minjie: "Multimodal action recognition based on a deep learning framework", Jisuanji Yu Xiandaihua (Computer and Modernization) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112009717A (en) * 2020-08-31 2020-12-01 南京迪沃航空技术有限公司 Airport bulk cargo loader, machine leaning anti-collision system for bulk cargo loader and anti-collision method of machine leaning anti-collision system
CN112009717B (en) * 2020-08-31 2022-08-02 南京迪沃航空技术有限公司 Airport bulk cargo loader, machine leaning anti-collision system for bulk cargo loader and anti-collision method of machine leaning anti-collision system
CN113158782A (en) * 2021-03-10 2021-07-23 浙江工业大学 Multi-person concurrent interaction behavior understanding method based on single-frame image
CN113158782B (en) * 2021-03-10 2024-03-26 浙江工业大学 Multi-person concurrent interaction behavior understanding method based on single-frame image
CN113627409A (en) * 2021-10-13 2021-11-09 南通力人健身器材有限公司 Body-building action recognition monitoring method and system
CN114463840A (en) * 2021-12-31 2022-05-10 北京工业大学 Skeleton-based method for recognizing human body behaviors through shift graph convolution network
JP7485154B1 (en) 2023-05-19 2024-05-16 トヨタ自動車株式会社 Video Processing System

Also Published As

Publication number Publication date
CN111582220B (en) 2023-05-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant