CN111582220A - Skeleton point behavior identification system based on shift diagram convolution neural network and identification method thereof - Google Patents


Info

Publication number
CN111582220A
Authority
CN
China
Prior art keywords: image, points, joint, vector, point
Prior art date
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202010419839.4A
Other languages
Chinese (zh)
Other versions
CN111582220B (en)
Inventors
Zhang Yifan (张一帆)
Cheng Ke (程科)
Cheng Jian (程健)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202010419839.4A
Publication of CN111582220A
Application granted
Publication of CN111582220B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Computing Systems (AREA)
  • Psychiatry (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network, comprising: an image acquisition module for acquiring behavior images; an image processing module for processing the behavior images acquired by the image acquisition module; an extraction module for extracting skeleton points from the images processed by the image processing module; and a behavior recognition module for recognizing the behavior features of the extracted skeleton points. The invention designs a behavior recognition module built on a novel graph convolution that reduces the computational cost of graph convolution. Unlike the traditional graph convolution, the shift graph convolution does not enlarge its receptive range by enlarging the convolution kernel; instead, a novel shift operation shifts and concatenates the graph features. It thereby achieves the same or even higher recognition accuracy while markedly reducing computation and increasing computation speed, and avoids the growth in computation that the traditional graph convolution incurs as the convolution kernel grows.

Description

Skeleton point behavior identification system based on shift diagram convolution neural network and identification method thereof
Technical Field
The invention relates to a skeleton point behavior recognition system based on a shift graph convolutional neural network, which belongs to the field of general image data processing or generation (G06T), and in particular to the field of motion analysis (G06T 7/20).
Background
In the behavior recognition task, owing to constraints of data volume and algorithms, behavior recognition models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeleton point data can solve this problem well.
In skeleton point data, the human body is represented by the coordinates of several predefined key joint points in the camera coordinate system. Such data can be conveniently obtained with a depth camera or various pose estimation algorithms, and is typically modeled as a graph processed by graph convolution.
In the conventional graph convolution method, however, the modeled convolution kernel covers only the neighborhood of one point. In the skeleton point behavior recognition task, some behaviors (such as clapping) require modeling the positional relationship of points that are physically far apart (such as the two hands), which requires increasing the convolution kernel size of the graph convolution model. Since the computational cost of graph convolution grows as the convolution kernel grows, the conventional graph convolution is computationally expensive.
Disclosure of Invention
The purpose of the invention is as follows: to provide a skeleton point behavior recognition system based on a shift graph convolutional neural network that solves the problems existing in the prior art.
The technical scheme is as follows: a skeleton point behavior recognition system based on a shift graph convolutional neural network, comprising:
an image acquisition module for acquiring behavior images;
an image processing module for processing the behavior images acquired by the image acquisition module;
a skeleton point extraction module for extracting skeleton points from the images processed by the image processing module;
and a behavior recognition module for recognizing the skeleton point behavior features extracted by the extraction module.
In a further embodiment, the image acquisition module is based on an image acquisition device comprising cameras placed in an equilateral triangle and a rotating device arranged at the tail of each camera; the rotating device comprises a rotating shaft fixedly connected with the camera and a rotating motor sleeved on the rotating shaft.
In a further embodiment, the image acquisition module captures human behavior through the three groups of cameras placed in an equilateral triangle, and the behavior images acquired by the three groups of cameras (front, rear and side views) are then displayed on the computer terminal so that the image processing module can compare and process the images.
In a further embodiment, the image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map; when detecting image edges with the Kirsch edge detection operator, a 3 × 3 convolution template traverses the pixel points of the image, the gray values of the pixels in the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five pixels is computed; the eight 3 × 3 convolution templates, numbered 1 to 8, are shown as figures in the original;
all pixels of the original image are processed in turn with the eight convolution templates, the computed edge intensity is thresholded, the final edge points are extracted, and edge detection is complete;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
and 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image.
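The edge-detection procedure above can be sketched in Python. This is a minimal illustration rather than the patented implementation: the template weights (+5 for the three adjacent pixels, -3 for the remaining five) are the standard Kirsch weights, and the function name, threshold value, and single-pass maximum (in place of the two-buffer scheme of steps 2 to 4) are illustrative assumptions.

```python
import numpy as np

# The eight 3x3 Kirsch templates (one per compass direction). Each weights
# three adjacent pixels by +5 and the remaining five by -3, matching the
# "weighted sum difference" described above.
KIRSCH_TEMPLATES = [
    np.array([[ 5,  5,  5], [-3,  0, -3], [-3, -3, -3]]),  # N
    np.array([[ 5,  5, -3], [ 5,  0, -3], [-3, -3, -3]]),  # NW
    np.array([[ 5, -3, -3], [ 5,  0, -3], [ 5, -3, -3]]),  # W
    np.array([[-3, -3, -3], [ 5,  0, -3], [ 5,  5, -3]]),  # SW
    np.array([[-3, -3, -3], [-3,  0, -3], [ 5,  5,  5]]),  # S
    np.array([[-3, -3, -3], [-3,  0,  5], [-3,  5,  5]]),  # SE
    np.array([[-3, -3,  5], [-3,  0,  5], [-3, -3,  5]]),  # E
    np.array([[-3,  5,  5], [-3,  0,  5], [-3, -3, -3]]),  # NE
]

def kirsch_edges(gray: np.ndarray, threshold: float = 255.0) -> np.ndarray:
    """Edge map: maximum response over the eight templates, then thresholded."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    best = np.zeros((h, w))
    for k in KIRSCH_TEMPLATES:
        # Apply the template by explicit shifting (no SciPy dependency).
        resp = np.zeros((h, w))
        for dy in range(3):
            for dx in range(3):
                resp += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
        best = np.maximum(best, resp)  # keep the larger response, as in steps 3-4
    return (best >= threshold).astype(np.uint8)
```

Keeping the per-pixel maximum over the eight directional responses is equivalent to the repeated pairwise comparison of steps 3 and 4.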
In a further embodiment, the extraction module is configured to extract skeleton points from the image processed by the image processing module: after the image processing module finishes processing the image acquired by the image acquisition module, pre-recorded skeleton points are matched to the human body edge map according to the actor body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map.
In a further embodiment, the extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have different skeleton sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size;
first, one person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and all vectors from the root node to the points directly connected to it are computed. Dividing each vector by its modulus gives its unit direction vector (modulus 1); multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinate gives the corrected coordinate of the point directly connected to the root node, which is recorded as the normalized coordinate value of that skeleton point. The root node is then updated in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
Input: the length l_i of each limb in the reference skeleton, and the skeleton point coordinate values to be normalized;
the first step: define p as the root node coordinate;
the second step: give p the initial value of the root node coordinate;
the third step: for all limbs (s_i, e_i), proceed in turn according to a breadth-first search strategy;
the fourth step: compute e_i − s_i;
the fifth step: compute the unit direction vector d_i = (e_i − s_i) / ‖e_i − s_i‖;
the sixth step: compute e_i′ = s_i′ + l_i · d_i, and save the value of e_i′ to the set A;
the seventh step: return to the third step until all limbs in the skeleton have been traversed;
Output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs of the body, l_i represents the length of the i-th limb in the reference skeleton, and s_i and e_i respectively represent the coordinate values of the start node and end node of the i-th limb. Computing the values of all e_i′ yields all corrected skeleton point coordinates, scaling the estimated size while ensuring that the angles between the limbs remain unchanged;
when the included angle between the limbs changes, the included angle between the vectors is selected to describe the skeleton points so as to avoid the skeleton point deviation when the included angle between the limbs changes;
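The size-normalization algorithm above can be sketched as follows. The parent-array representation of the skeleton, the function name, and the per-joint indexing of reference limb lengths are illustrative assumptions; the breadth-first traversal and the update "corrected parent + reference length × unit direction" follow the steps described.

```python
import numpy as np
from collections import deque

def normalize_skeleton(joints, parents, ref_lengths, root=0):
    """Rescale every limb to its reference length while keeping all limb
    directions (and hence the angles between limbs) unchanged.

    joints:      (J, 3) array of raw 3-D joint coordinates
    parents:     parents[i] is the parent joint of i (root's parent = -1)
    ref_lengths: ref_lengths[i] = reference length of the limb parents[i] -> i
    """
    corrected = np.zeros_like(joints, dtype=np.float64)
    corrected[root] = joints[root]            # the root node is kept as-is
    # Build a child adjacency list for the breadth-first traversal.
    children = {j: [] for j in range(len(joints))}
    for j, p in enumerate(parents):
        if p >= 0:
            children[p].append(j)
    queue = deque([root])
    while queue:
        p = queue.popleft()
        for c in children[p]:
            v = joints[c] - joints[p]         # raw limb vector
            d = v / np.linalg.norm(v)         # unit direction vector (modulus 1)
            # corrected child = corrected parent + reference length * direction
            corrected[c] = corrected[p] + ref_lengths[c] * d
            queue.append(c)
    return corrected
```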
the steps for solving the included angle of the human joint vector are as follows:
obtaining the angle of a certain joint point, firstly obtaining three joint points used in angle calculation, capturing three-dimensional coordinate values of the joint points by using Kinect, constructing a structural vector between the three joint points, and then obtaining the size of a joint vector included angle by adopting an inverse cosine law;
determining the angle of the first joint
Figure DEST_PATH_IMAGE028
For example;
selecting other two joint points connected with the first joint to obtain three-dimensional coordinate values of the joint points captured by the Kinect, wherein the other two joint points are expressed as
Figure DEST_PATH_IMAGE030
Figure DEST_PATH_IMAGE032
The first joint point is represented as
Figure DEST_PATH_IMAGE034
Constructing an inter-joint structure vector, the first joint point to
Figure 583064DEST_PATH_IMAGE030
Point vector
Figure DEST_PATH_IMAGE036
=
Figure DEST_PATH_IMAGE038
First joint point to
Figure 167629DEST_PATH_IMAGE032
Point vector
Figure DEST_PATH_IMAGE040
=
Figure DEST_PATH_IMAGE042
Figure 148354DEST_PATH_IMAGE032
Point-to-point
Figure 97856DEST_PATH_IMAGE030
Vector of
Figure DEST_PATH_IMAGE044
Computing vectors
Figure 200417DEST_PATH_IMAGE036
Sum vector
Figure 650990DEST_PATH_IMAGE040
Angle of (2)
Figure 814118DEST_PATH_IMAGE028
Size:
Figure DEST_PATH_IMAGE046
wherein ,
Figure 872203DEST_PATH_IMAGE028
the range of the angle is between 0 degree and 180 degrees, in order to enable the representation based on the included angle of the joint vector to be more accurate, representative joint angles are selected for representation according to the importance ranking of the joint angles in the action process, and then the position of the bone point is corrected through size normalization and angle correction.
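The arccosine computation of the joint angle can be written directly. The function name and argument order are illustrative assumptions; the three coordinates would come from the Kinect capture described above.

```python
import numpy as np

def joint_angle(j, a, b):
    """Angle at joint j formed by the two limbs j->a and j->b, in degrees.
    Uses the arccos of the normalized dot product; result lies in [0, 180]."""
    u = np.asarray(a, float) - np.asarray(j, float)   # vector from joint to a
    v = np.asarray(b, float) - np.asarray(j, float)   # vector from joint to b
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```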
In a further embodiment, the behavior recognition module is mainly used to recognize the behavior features of the extracted skeleton points. Neighboring features are shifted and concatenated according to the adjacency relations of the graph, and after concatenation a single 1 × 1 convolution yields the computed behavior features. For a graph with N nodes, let the feature dimension be C, so that the feature size is N × C. Suppose node v has n nodes adjacent to it, with neighbor set B(v) = {u_1, …, u_n}. For the v-th node, the shift graph module divides its features equally into n + 1 parts: the first part retains the node's own features, and the latter n parts are shifted in from its neighbor nodes' features. Mathematically:
F′_v = F_v[0 : C/(n+1)] ‖ F_{u_1}[C/(n+1) : 2C/(n+1)] ‖ … ‖ F_{u_n}[nC/(n+1) : C]
wherein u_1, …, u_n ∈ B(v), the subscripts in square brackets follow Python indexing, and the double vertical bars denote concatenation along the feature dimension.
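The shift-and-concatenate step followed by a single 1 × 1 convolution can be sketched as follows. The function names, the dict-of-lists adjacency, and letting the last part absorb the remainder when C is not divisible by n + 1 are illustrative assumptions, not the patent's exact scheme.

```python
import numpy as np

def shift_graph_features(feats, neighbors):
    """Shift operation of the shift graph convolution.

    feats:     (N, C) node feature matrix
    neighbors: neighbors[v] = list of nodes adjacent to v
    For node v with n neighbors, the C channels are split into n + 1 equal
    parts: the first part keeps v's own features, and part k is taken
    (shifted) from the k-th neighbor at the same channel positions.
    """
    N, C = feats.shape
    out = feats.copy()
    for v in range(N):
        n = len(neighbors[v])
        step = C // (n + 1)                  # equal channel split
        for k, u in enumerate(neighbors[v], start=1):
            lo, hi = k * step, ((k + 1) * step if k < n else C)
            out[v, lo:hi] = feats[u, lo:hi]  # Python-style slicing, as in the formula
    return out

def shift_gcn_layer(feats, neighbors, W):
    """Shift, then one 1x1 convolution: a per-node linear map W of shape (C, C_out)."""
    return shift_graph_features(feats, neighbors) @ W
```

Because the shift merely copies slices instead of applying a kernel that grows with the neighborhood, the only learned computation is the single (C, C_out) matrix of the 1 × 1 convolution, which is what keeps the cost independent of the neighborhood size.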
A recognition method based on the shift graph convolutional neural network skeleton point behavior recognition system comprises the following steps:
step 1, first control the cameras to rotate through the image acquisition module, thereby acquiring human behavior feature images; the rotating motor turns the rotating shaft, which in turn rotates the camera, so that the position of the camera is adjusted;
step 2, the image acquisition module captures human behavior through the three groups of cameras placed in an equilateral triangle, and the behavior images acquired by the three groups of cameras (front, rear and side views) are then displayed on the computer terminal so that the image processing module can compare and process the images;
step 3, the image processing module mainly processes the human behavior images acquired by the image acquisition module into human body edge maps; when detecting image edges with the Kirsch edge detection operator, a 3 × 3 convolution template traverses the pixel points of the image, the gray values of the pixels in the neighborhood around each pixel are examined one by one, and the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five pixels is computed;
all pixels of the original image are processed in turn with the eight convolution templates, the computed edge intensity is thresholded, the final edge points are extracted, and edge detection is complete;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
step 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image;
step 4, after the human behavior feature image has been processed, the extraction module extracts skeleton points from the image processed by the image processing module; pre-recorded skeleton points are matched to the human body edge map according to the actor body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map;
step 5, after the extraction of the skeleton points is complete, the correction module corrects the positions of the skeleton points. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have different skeleton sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size. First, one person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and all vectors from the root node to the points directly connected to it are computed; dividing each vector by its modulus gives its unit direction vector (modulus 1), multiplying this direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinate gives the corrected coordinate of the connected point, which is recorded as the normalized coordinate value of that skeleton point; the root node is updated in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The correction method scales the estimated size while ensuring that the angles between the limbs remain unchanged;
when the angles between the limbs change, the angles between vectors are selected to describe the skeleton points, so as to avoid skeleton point deviation when the limb angles change;
the steps for solving the human joint vector angle are as follows:
to obtain the angle at a joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the joint vector angle by the inverse cosine law;
take determining the angle θ of the first joint as an example;
select the two other joint points connected to the first joint and obtain the three-dimensional coordinate values of the joint points captured by the Kinect; denote the two other joint points as A and B, and the first joint point as J;
construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A − J; the vector from the first joint point to point B is v = B − J; the vector from point B to point A is w = A − B;
compute the angle θ between vector u and vector v:
cos θ = (u · v) / (‖u‖ ‖v‖), i.e. θ = arccos((u · v) / (‖u‖ ‖v‖))
wherein the angle θ ranges from 0° to 180°. To make the representation based on the joint vector angle more accurate, representative joint angles are selected according to the importance ranking of the joint angles during the action, and the positions of the skeleton points are then corrected through size normalization and angle correction;
step 6, after the correction of the skeleton points is complete, the behavior recognition module recognizes the skeleton point behavior. Neighboring features are shifted and concatenated according to the adjacency relations of the graph, and after concatenation a single 1 × 1 convolution yields the computed behavior features. For a graph with N nodes, let the feature dimension be C, so that the feature size is N × C. Suppose node v has n nodes adjacent to it, with neighbor set B(v) = {u_1, …, u_n}. For the v-th node, the shift graph module divides its features equally into n + 1 parts: the first part retains the node's own features, and the latter n parts are shifted in from its neighbor nodes' features. Mathematically:
F′_v = F_v[0 : C/(n+1)] ‖ F_{u_1}[C/(n+1) : 2C/(n+1)] ‖ … ‖ F_{u_n}[nC/(n+1) : C]
wherein u_1, …, u_n ∈ B(v), the subscripts in square brackets follow Python indexing, and the double vertical bars denote concatenation along the feature dimension; the skeleton point behavior features are thus recognized.
Beneficial effects: the invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network, in which a behavior recognition module is designed to recognize the behavior of the skeleton points using a novel graph convolution that markedly reduces the computational cost of graph convolution and thereby differs from the traditional graph convolution.
Drawings
FIG. 1 is a schematic diagram of the shift graph convolution for skeleton point behavior recognition of the present invention.
FIG. 2 is a schematic diagram of the local graph of the present invention.
FIG. 3 is a schematic diagram of the non-local graph of the present invention.
FIG. 4 is a schematic diagram of conventional graph convolution for skeleton point behavior recognition.
FIG. 5 is a table comparing the accuracy and computational complexity of the shift graph convolution with conventional graph convolution methods.
Detailed Description
Through the applicant's research and analysis, the reason for this problem (the large computational cost of traditional graph convolution) is that in the traditional graph convolution method, the modeled convolution kernel can cover only the neighborhood of one point. In the skeleton point behavior recognition task, however, some behaviors (such as clapping) require modeling the positional relationship of points that are physically far apart (such as the two hands), which requires increasing the convolution kernel size of the graph convolution model; and since the computational cost of graph convolution grows with the kernel, the traditional graph convolution is computationally expensive. The behavior recognition module is therefore designed to recognize the behavior of the skeleton points with a novel graph convolution that markedly reduces this cost. Unlike the traditional graph convolution, the shift graph convolution does not enlarge its sensing range by enlarging the convolution kernel but shifts and concatenates the graph features through a novel shift operation, achieving the same or even higher recognition accuracy while markedly reducing computation and increasing computation speed, and avoiding the growth in computation that the traditional graph convolution incurs as the convolution kernel grows.
A skeleton point behavior recognition system based on a shift graph convolutional neural network comprises: an image acquisition module for acquiring behavior images; an image processing module for processing the behavior images acquired by the image acquisition module; a skeleton point extraction module for extracting skeleton points from the images processed by the image processing module; and a behavior recognition module for recognizing the skeleton point behavior features extracted by the extraction module.
The present invention does not prescribe a particular method of skeleton point extraction. There are many methods for extracting human skeleton points, for example: capturing images from a camera and then obtaining the human skeleton points with an algorithm; obtaining them directly from a Kinect camera; or having the person wear acceleration sensors so that the skeleton positions are obtained directly. The present invention is concerned with how to perform behavior recognition once the skeleton points have been acquired. The invention therefore does not limit the method of skeleton point extraction, and any extraction method falls within its scope; in this embodiment, however, a correction module is provided to identify and correct the image, and the image acquisition device is modified accordingly to capture images from multiple angles.
The image acquisition module is based on an image acquisition device comprising cameras placed at the vertices of an equilateral triangle, each fitted at its tail with a rotating device; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the shaft.
The image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back, and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them.
The image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map. When detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five. The convolution templates (numbered 1–4 and 5–8) are the eight compass rotations of the mask

 5  5  5
-3  0 -3
-3 -3 -3

in which the three selected neighbors are weighted 5 and the remaining five border pixels are weighted -3.
All pixels in the original image are processed in turn with the eight convolution templates; the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection. The steps of Krisch-operator edge detection are as follows: step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program.
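The two-buffer procedure above amounts to keeping, for each pixel, the maximum response over the eight templates and then thresholding. Below is a minimal, hedged sketch in Python (the Krisch operator is commonly spelled "Kirsch" in the literature); the mask weights are the standard values matching the description (5 for the three selected neighbors, -3 for the rest), and the threshold value is an illustrative assumption, since no value is given in the text.

```python
import numpy as np

# Base compass template: the three selected neighbours weighted 5,
# the remaining five border pixels -3, the centre 0.
BASE = np.array([[5.0, 5.0, 5.0],
                 [-3.0, 0.0, -3.0],
                 [-3.0, -3.0, -3.0]])

def _rotations(mask):
    """The eight templates are rotations of BASE around the centre pixel."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [mask[r] for r in ring]
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for i, r in enumerate(ring):
            m[r] = vals[(i - k) % 8]
        masks.append(m)
    return masks

KIRSCH_MASKS = _rotations(BASE)

def kirsch_edges(gray, threshold=255.0):
    """Per-pixel maximum of the eight template responses (the 'larger gray
    value' kept in buffer image 1 above), followed by a threshold test."""
    gray = np.asarray(gray, dtype=float)
    g = np.pad(gray, 1, mode="edge")          # replicate borders
    h, w = gray.shape
    strength = np.zeros((h, w))
    for mask in KIRSCH_MASKS:                 # the eight templates in turn
        for y in range(h):
            for x in range(w):
                resp = float((g[y:y + 3, x:x + 3] * mask).sum())
                strength[y, x] = max(strength[y, x], resp)
    return (strength >= threshold).astype(np.uint8)
```

On a uniform region every template response is zero (the weights sum to zero), so only gray-level transitions survive the threshold.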
The extraction module extracts the bone points from the image processed by the image processing module: once the image processing module has finished processing the image acquired by the image acquisition module, pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map.
The extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ; the skeleton sizes therefore need to be normalized to the same size.
First, one person's skeleton is chosen as the reference skeleton. For a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed. Each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each point directly connected to the root. These coordinates are recorded as the normalized coordinate values of the corresponding skeleton points. The root node is then updated in the order given by a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:
input: the length of the i-th limb in the reference skeleton, denoted l_i; prepare the normalized bone point coordinate values;
first step: define p_root as the root node coordinate;
second step: give p_root its initial value p_0, the original coordinate of the body center point;
third step: process all limbs (a_i, b_i) in turn according to a breadth-first search strategy;
fourth step: compute the limb vector v_i = b_i - a_i;
fifth step: compute the direction vector u_i = v_i / |v_i|;
sixth step: compute the corrected coordinate b_i' = a_i' + l_i · u_i, where a_i' is the already-corrected coordinate of the start node, and save the value of b_i' to the set A;
seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs, l_i denotes the length of the i-th limb in the reference skeleton, and (a_i, b_i) denote the coordinate values of the starting node and ending node of the i-th limb. Computing the values of b_i' for all limbs thus yields all the corrected skeleton point coordinates, scaling the estimated skeleton to the reference size while keeping the included angles between the limbs unchanged.
When the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation caused by the change in limb angle.
the steps for solving the included angle of the human joint vector are as follows:
To obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the size of the joint vector angle by the inverse cosine law.
Take the determination of the angle θ at the first joint as an example. Select the two other joint points connected to the first joint and obtain the three-dimensional coordinate values captured by the Kinect; denote the two other joint points A = (x_1, y_1, z_1) and B = (x_2, y_2, z_2), and the first joint point J = (x_0, y_0, z_0). Construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A - J = (x_1 - x_0, y_1 - y_0, z_1 - z_0); the vector from the first joint point to point B is v = B - J = (x_2 - x_0, y_2 - y_0, z_2 - z_0); and the vector from point B to point A is w = A - B. Compute the angle θ between vector u and vector v:

θ = arccos( (u · v) / (|u| |v|) )

where θ ranges between 0° and 180°. To make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action; the bone point positions are then corrected through size normalization and angle correction.
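The inverse-cosine step above is a direct computation; a minimal sketch follows (the coordinates in the test are made-up example values, and the clipping of the cosine is a numerical-safety addition not mentioned in the text).

```python
import numpy as np

def joint_angle(joint, a, b):
    """Angle in degrees (0-180) at `joint` between the vectors
    joint->a and joint->b, via the inverse cosine law."""
    u = np.asarray(a, dtype=float) - np.asarray(joint, dtype=float)
    v = np.asarray(b, dtype=float) - np.asarray(joint, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against cos values just outside [-1, 1] from rounding
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
```

For example, a right-angled elbow (the two vectors perpendicular) yields 90°, and fully extended, collinear limbs yield 180°, the upper end of the stated range.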
The behavior identification module mainly recognizes the extracted bone point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features. For a graph of n nodes, let the feature dimension be d, so that the feature map has size n × d. Suppose node v has m nodes adjacent to it, with adjacent node set N(v) = {v_1, …, v_m}. For node v, the shift map module divides its features equally into m + 1 parts: the first part retains its own features, and the latter m parts are shifted in from the features of its neighbor nodes. Mathematically:

F̃_v = F_v[0 : d/(m+1)] ‖ F_{v_1}[d/(m+1) : 2d/(m+1)] ‖ … ‖ F_{v_m}[m·d/(m+1) : d]

where the subscript ranges use Python-style slice notation and the double vertical line ‖ denotes splicing along the feature dimension. To understand this formula intuitively, we take a graph of 7 nodes with 20-dimensional features as an example, as shown in fig. 2 and 3. Here we discuss two cases:
1. the neighborhood of each point contains only the physically connected positions; we call this the local design, shown in fig. 2;
2. the neighborhood of each point contains the entire human skeleton map; we call this the non-local design, shown in fig. 3.
For both designs we take node 1 and node 2, respectively, as examples, explained in detail below.
In fig. 2, node 1 has 1 adjacent node (node 2), so its features are divided equally into 1 + 1 = 2 parts: the first retains its own features (the part of node 1 labeled 1) and the second is shifted in from node 2 (the part of node 1 labeled 2). In fig. 2, node 2 has 3 adjacent nodes (nodes 1, 3 and 4), so its features are divided equally into 3 + 1 = 4 parts: the first retains its own features (the part of node 2 labeled 2) and the last 3 parts are shifted in from nodes 1, 3 and 4 respectively (the parts of node 2 labeled 1, 3 and 4).
In fig. 3, every other node is adjacent to any given node, so the current node receives shifted features from all other nodes; node 1 and node 2 are shown as examples in fig. 3. After the shift, the resulting features take on a spiral appearance, the result of the thorough mixing of the features of different nodes. Experiments show that of the two shift graph convolution designs, the non-local design is more accurate on the behavior recognition task, because it fuses the features of different nodes better: efficient feature fusion is possible even between nodes that are far apart.
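The non-local shift can be sketched concretely as below. This is a minimal sketch assuming the feature dimension d is divisible by the node count n (in the 7-node, 20-dimensional example of fig. 3 the parts are uneven, which this sketch ignores for simplicity).

```python
import numpy as np

def nonlocal_shift(feat):
    """Non-local shift splice: every node's features are split into n equal
    parts; part k of node v is taken from node (v + k) mod n, so part 0
    keeps the node's own features and later parts come from ever more
    distant nodes, producing the spiral pattern described above.
    feat: (n, d) feature map with d divisible by n (simplifying assumption)."""
    n, d = feat.shape
    part = d // n
    out = np.empty_like(feat)
    for v in range(n):
        for k in range(n):
            s = k * part
            out[v, s:s + part] = feat[(v + k) % n, s:s + part]
    return out
```

After this splice, a single 1 × 1 convolution (a linear map over the d channels) produces the output features, which is where the saving over enlarging the graph kernel comes from: the shift itself involves no multiplications, only memory movement.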
It is worth noting that, at the same recognition accuracy, the proposed shift graph convolution costs less than one third of the computation of traditional graph convolution, which matters greatly for fast recognition; the saved convolution computations make the method faster (compare fig. 1 and fig. 4). Another aspect is that the shift operation can be implemented with pointers in the C++ or CUDA languages, so it can be deployed very efficiently on a CPU or GPU.
Our main experiment is shown in fig. 5. ST-GCN, Adaptive-GCN and Adaptive-NL GCN are three typical traditional GCN methods. Our shift graph convolution (Shift GCN) includes both the Local Shift GCN and Non-Local Shift GCN designs. As the table shows, the FLOPs (number of floating-point operations, representing computational complexity) of our method are less than one third of those of traditional graph convolution, which is very important for fast recognition. Moreover, the accuracy of our method is higher than that of the traditional graph convolution methods.
In addition, we also compare traditional graph convolutions with a reduced adjacency matrix, i.e. the models with the "one A" suffix (a single adjacency matrix): their computation is comparable to ours, but their accuracy drops significantly. This means that when the computation of traditional graph convolution is reduced, its accuracy degrades markedly, whereas our shift graph convolution (Shift GCN) achieves accuracy exceeding all previous algorithms at a small computational cost.
Description of the working principle: first, the image acquisition module controls the cameras to rotate so as to acquire images of human behavior characteristics: the rotating motor turns the rotating shaft, which in turn rotates the camera, adjusting its position. The image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them. The image processing module mainly processes the acquired human behavior image into a human body edge map: when detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of three adjacent pixels' gray values and the weighted sum of the remaining five; all pixels in the original image are processed in turn with the eight convolution templates, the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection. The steps of Krisch-operator edge detection are as follows:
step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program;
after the human behavior characteristic image has been processed, the extraction module extracts the skeletal points from it: pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map. After extraction, the positions of the skeleton points are corrected by the correction module: people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ, and the skeleton sizes need to be normalized to the same size. First, one person's skeleton is chosen as the reference skeleton; for a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed; each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each connected point; these are recorded as the normalized coordinate values of the corresponding skeleton points; the root node is then updated in the order given by a breadth-first search, and the steps are repeated until all skeleton point values have been corrected. This correction scales the estimated skeleton while keeping the included angles between the limbs unchanged. When the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation. The steps for solving the angle of a human joint vector are: to obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinates with the Kinect; construct the structure vectors between them, and obtain the joint vector angle by the inverse cosine law. To make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action; the bone point positions are then corrected through size normalization and angle correction. After the skeleton points have been corrected, the behavior recognition module performs skeleton point behavior recognition: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features.
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings; however, the invention is not limited to the specific details of these embodiments. Various equivalent changes can be made to the technical solution within the technical idea of the invention, and all such equivalent changes fall within its protection scope.

Claims (7)

1. A skeleton point behavior identification system based on a shift graph convolutional neural network, characterized by comprising:
the behavior recognition module is used for recognizing and extracting the bone point behavior characteristics extracted by the extraction module;
the behavior recognition module mainly recognizes the extracted bone point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relations of the graph, and after splicing only a single 1 × 1 convolution is needed to obtain the computed behavior features; for a graph of n nodes, let the feature dimension be d, so that the feature map has size n × d; suppose node v has m nodes adjacent to it, with adjacent node set N(v) = {v_1, …, v_m}; for node v, the shift map module divides its features equally into m + 1 parts, the first part retaining its own features and the latter m parts being shifted in from the features of its neighbor nodes, mathematically expressed as:

F̃_v = F_v[0 : d/(m+1)] ‖ F_{v_1}[d/(m+1) : 2d/(m+1)] ‖ … ‖ F_{v_m}[m·d/(m+1) : d]

where the subscript ranges use Python-style slice notation and the double vertical line ‖ denotes splicing along the feature dimension.
2. The system of claim 1, characterized in that it further comprises an image acquisition module for acquiring behavior images; the image acquisition module is based on an image acquisition device comprising cameras placed at the vertices of an equilateral triangle, each fitted at its tail with a rotating device; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the shaft.
3. The system of claim 2, characterized in that the image acquisition module photographs human behavior through the three groups of cameras placed in an equilateral triangle, mounted at the front, back, and side; the behavior images acquired by the three groups are displayed separately on a computer terminal so that the image processing module can process and compare them.
4. The system of claim 1, characterized in that it further comprises an image processing module for processing the acquired behavior images; the image processing module mainly processes the human behavior image acquired by the image acquisition module into a human body edge map; when detecting image edges with the Krisch edge detection operator, a 3 × 3 convolution template traverses the pixels of the image, examining the gray values of the pixels adjacent to each pixel one by one and computing the difference between the weighted sum of the gray values of three adjacent pixels and the weighted sum of the remaining five; the convolution templates (numbered 1–4 and 5–8) are the eight compass rotations of the mask

 5  5  5
-3  0 -3
-3 -3 -3

in which the three selected neighbors are weighted 5 and the remaining five border pixels are weighted -3; all pixels in the original image are processed in turn with the eight templates, the edge intensity of each pixel is calculated, a threshold test is applied, and the final edge points are extracted, completing edge detection;
the steps of Krisch-operator edge detection are as follows:
step 1, acquire a pointer to the data area of the original image;
step 2, establish two buffers of the same size as the original image for holding copies of it; initialize both to the original image copy, denoting them image 1 and image 2;
step 3, in each buffer set one Krisch template for the convolution operation; traverse the pixels of the copy in each of the two buffers, perform the convolution one pixel at a time, compare the computed results, store the larger value in image 1, and copy image 1 into the buffer image 2;
step 4, repeat step 3 with the remaining six templates, one at a time, each time keeping the larger gray value of image 1 and image 2 in the buffer image 1;
step 5, copy the processed image 1 back into the original image data; the edge processing of the image is thus realized in the program.
5. The system of claim 1, characterized in that it further comprises a skeleton point extraction module for extracting skeleton points from the image processed by the image processing module; once the image processing module has finished processing the image acquired by the image acquisition module, pre-entered bone point positions for the body type closest to the person in the acquired image are matched onto the human body edge map, and the matched bone points are then displayed on the edge map.
6. The system of claim 5, characterized in that the extraction module further comprises a correction module; when the image acquisition module acquires human behavior images, people of different body types performing the same set of actions have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ, and the skeleton sizes need to be normalized to the same size;
first, one person's skeleton is chosen as the reference skeleton; for a given frame of skeleton data, the body center point is chosen as the root node, and all vectors from the root node to the points directly connected to it are computed; each vector is divided by its modulus to obtain its direction vector (of modulus 1); multiplying the direction vector by the length of the corresponding limb in the reference skeleton and adding the result to the root node's coordinates yields the corrected coordinate of each connected point; these coordinates are recorded as the normalized coordinate values of the corresponding skeleton points; the root node is then updated in the order given by a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected; the algorithm is as follows:
input: the length of the i-th limb in the reference skeleton, denoted l_i; prepare the normalized bone point coordinate values;
first step: define p_root as the root node coordinate;
second step: give p_root its initial value p_0, the original coordinate of the body center point;
third step: process all limbs (a_i, b_i) in turn according to a breadth-first search strategy;
fourth step: compute the limb vector v_i = b_i - a_i;
fifth step: compute the direction vector u_i = v_i / |v_i|;
sixth step: compute the corrected coordinate b_i' = a_i' + l_i · u_i, where a_i' is the already-corrected coordinate of the start node, and save the value of b_i' to the set A;
seventh step: return to the third step until all limbs in the skeleton have been traversed;
output: the skeleton point coordinates stored in the set A are the corrected coordinates;
where i indexes the limbs, l_i denotes the length of the i-th limb in the reference skeleton, and (a_i, b_i) denote the coordinate values of the starting node and ending node of the i-th limb; computing the values of b_i' for all limbs yields all the corrected skeleton point coordinates, scaling the estimated skeleton while keeping the included angles between the limbs unchanged;
when the included angles between the limbs themselves change, the angles between joint vectors are used to describe the skeleton points, avoiding skeleton point deviation;
the steps for solving the angle of a human joint vector are as follows:
to obtain the angle at a certain joint point, first obtain the three joint points used in the angle calculation and capture their three-dimensional coordinate values with the Kinect; construct the structure vectors between the three joint points, and then obtain the size of the joint vector angle by the inverse cosine law;
take the determination of the angle θ at the first joint as an example: select the two other joint points connected to the first joint and obtain their three-dimensional coordinate values captured by the Kinect, denoting the two other joint points A = (x_1, y_1, z_1) and B = (x_2, y_2, z_2) and the first joint point J = (x_0, y_0, z_0); construct the inter-joint structure vectors: the vector from the first joint point to point A is u = A - J, the vector from the first joint point to point B is v = B - J, and the vector from point B to point A is w = A - B; compute the angle θ between vector u and vector v:

θ = arccos( (u · v) / (|u| |v|) )

where θ ranges between 0° and 180°; to make the representation based on joint vector angles more accurate, representative joint angles are selected according to the ranking of each joint angle's importance in the action, and the bone point positions are then corrected through size normalization and angle correction.
7. A recognition method of the skeleton point behavior identification system based on the shift graph convolutional neural network of claim 1, comprising:
step 1, firstly, controlling a camera to rotate through an image acquisition module, and further acquiring a human behavior characteristic image; the rotating motor rotates to drive the rotating shaft to rotate, and then the camera is driven to rotate through the rotating shaft, so that the position of the camera is adjusted;
step 2, the image acquisition module carries out human body shooting behaviors through three groups of cameras which are placed in an equilateral triangle, and then behavior images acquired by the three groups of cameras are respectively displayed on a computer terminal before, after and at the side parts of the behavior images are installed, so that the image processing module can compare and process the images;
step 3, the image processing module processes the human behavior image acquired by the image acquisition module into a human body edge image; when detecting edges with the Kirsch edge detection operator, the pixels of the image are traversed with 3 x 3 convolution templates: for each pixel, the gray values of its eight neighbors are examined, and the difference between the weighted sum of three adjacent pixels and the weighted sum of the remaining five pixels is calculated;
all pixels of the original image are processed in turn with the eight convolution templates, the edge strength of each pixel is calculated, a threshold is applied for detection, and the final edge points are extracted to complete edge detection;
the method for detecting the image edge by the Krisch operator comprises the following steps:
step 1, acquiring a data area pointer of an original image;
step 2, establishing two buffer areas, wherein the size of the buffer areas is the same as that of the original image, the buffer areas are mainly used for storing the original image and an original image copy, and the two buffer areas are initialized into the original image copy and are respectively marked as an image 1 and an image 2;
step 3, independently setting a Krisch template for convolution operation in each buffer area, respectively traversing pixels in the duplicate image in the two areas, performing convolution operation one by one, calculating results, comparing, storing a calculated comparative value into the image 1, and copying the image 1 into the cache image 2;
step 4, repeating the step 3, setting the remaining six templates at a time, performing calculation processing, and finally storing the larger gray values in the obtained image 1 and the image 2 in the buffer image 1;
step 5, copying the processed image 1 into original image data, and programming to realize edge processing of the image;
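As a concrete illustration, the eight-template procedure above can be sketched with NumPy as follows; the mask coefficients are the standard Kirsch ones and the threshold value is an illustrative assumption, since the patent text specifies neither:

```python
import numpy as np

# The eight 3x3 Kirsch direction templates (standard coefficients:
# three 5-weighted neighbors against five -3-weighted neighbors).
KIRSCH_TEMPLATES = [
    np.array([[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]]),
    np.array([[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]]),
    np.array([[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]]),
    np.array([[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]]),
    np.array([[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]]),
    np.array([[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]]),
    np.array([[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]]),
    np.array([[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]]),
]

def kirsch_edges(gray, threshold=255):
    """Edge strength = largest absolute response over the 8 templates,
    then thresholded to binary edge points."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.int32), 1, mode="edge")
    strength = np.zeros((h, w), dtype=np.int32)
    for t in KIRSCH_TEMPLATES:
        resp = np.zeros((h, w), dtype=np.int32)
        for dy in range(3):
            for dx in range(3):
                resp += t[dy, dx] * padded[dy:dy + h, dx:dx + w]
        # Keep the larger of the running maximum and the new response,
        # mirroring the image-1 / image-2 comparison in the steps above.
        strength = np.maximum(strength, np.abs(resp))
    return (strength >= threshold).astype(np.uint8) * 255
```

On a uniform region the template coefficients sum to zero, so only gray-level transitions survive the threshold.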
step 4, after the human behavior feature image has been processed, the extraction module extracts skeleton points from the image produced by the image processing module: the human body edge map is matched against pre-stored skeleton points according to the body type closest to the acquired image, and the matched skeleton points are then displayed on the human body edge map;
step 5, after skeleton point extraction is completed, the correction module corrects the skeleton point positions. Because people differ in body type, when people of different body types perform the same group of actions the three-dimensional coordinates of their skeleton points differ, so the skeletons must be normalized to the same size. First a person's skeleton is selected as the reference skeleton. For a given frame of skeleton data, the body center point is chosen as the root node, and for every point directly connected to the root node the vector from the root node to that point is computed. Each vector is divided by its module length to obtain a unit direction vector (module length 1), which is multiplied by the length of the corresponding bone in the reference skeleton; adding the resulting vector to the coordinates of the root node gives the corrected coordinate of that directly connected point, which is recorded as the normalized coordinate value of the corresponding skeleton point. The connected points then serve as new root nodes in the order of a breadth-first search, and the procedure is repeated until the values of all skeleton points have been corrected. This correction scales the estimated size while keeping the included angles between the limbs unchanged;
when the included angles between the limbs themselves change, the included angles between vectors are used to describe the skeleton points, so as to avoid skeleton point deviation as the limb angles change;
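The breadth-first normalization can be sketched as follows; the skeleton interface (a coordinate dictionary, an adjacency dictionary and per-bone reference lengths) is an assumed representation for illustration, not one defined by the patent:

```python
from collections import deque
import numpy as np

def normalize_skeleton(coords, adjacency, ref_lengths, root):
    """Rescale one frame of skeleton data to reference bone lengths while
    keeping every bone direction (hence every inter-limb angle) unchanged.

    coords      : {joint: np.array([x, y, z])} for one frame
    adjacency   : {joint: [directly connected joints]}
    ref_lengths : {(parent, child): bone length in the reference skeleton}
    root        : the body-centre joint chosen as the starting root node
    """
    fixed = {root: coords[root].copy()}
    queue = deque([root])                # breadth-first traversal order
    while queue:
        parent = queue.popleft()
        for child in adjacency[parent]:
            if child in fixed:           # already corrected (e.g. the parent)
                continue
            v = coords[child] - coords[parent]        # original bone vector
            direction = v / np.linalg.norm(v)         # unit direction (module length 1)
            length = ref_lengths[(parent, child)]     # reference bone length
            # Corrected coordinate: corrected parent position plus the
            # reference-length bone laid along the original direction.
            fixed[child] = fixed[parent] + direction * length
            queue.append(child)
    return fixed
```

For example, a two-bone chain of length 2 is shrunk onto unit reference bones without changing the 90-degree elbow angle between them.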
the steps for solving the included angle of the human joint vector are as follows:
obtaining the angle of a certain joint point, firstly obtaining three joint points used in angle calculation, capturing three-dimensional coordinate values of the joint points by using Kinect, constructing a structural vector between the three joint points, and then obtaining the size of a joint vector included angle by adopting an inverse cosine law;
determining the angle of the first joint
Figure 309219DEST_PATH_IMAGE027
For example;
selecting other two joint points connected with the first joint to obtain three-dimensional coordinate values of the joint points captured by the Kinect, wherein the other two joint points are expressed as
Figure 868376DEST_PATH_IMAGE028
Figure 456483DEST_PATH_IMAGE029
The first joint point is represented as
Figure 177315DEST_PATH_IMAGE030
Constructing an inter-joint structure vector, the first joint point to
Figure 749241DEST_PATH_IMAGE028
Point vector
Figure 846510DEST_PATH_IMAGE031
=
Figure 882600DEST_PATH_IMAGE032
First joint point to
Figure 649698DEST_PATH_IMAGE029
Point vector
Figure 567976DEST_PATH_IMAGE033
=
Figure 609881DEST_PATH_IMAGE034
Figure 31635DEST_PATH_IMAGE029
Point-to-point
Figure 235215DEST_PATH_IMAGE028
Vector of
Figure 578471DEST_PATH_IMAGE035
Computing vectors
Figure 283122DEST_PATH_IMAGE031
Sum vector
Figure 697398DEST_PATH_IMAGE033
Angle of (2)
Figure 462092DEST_PATH_IMAGE027
Size:
Figure 964749DEST_PATH_IMAGE038
wherein ,
Figure 941932DEST_PATH_IMAGE027
the range of the angle is between 0 degree and 180 degrees, in order to enable the representation based on the included angle of the joint vector to be more accurate, representative joint angles are selected for representation according to the importance ranking of the joint angles in the action process, and then the positions of the skeleton points are corrected through size normalization and angle correction;
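A minimal sketch of this law-of-cosines angle computation follows, assuming the three joint points are given as three-dimensional coordinate triples; the clipping of the cosine is added only to guard against floating-point round-off:

```python
import numpy as np

def joint_angle(p0, p1, p2):
    """Angle at joint p0 between the vectors toward p1 and p2,
    computed via the law of cosines, returned in degrees (0..180)."""
    v1 = np.asarray(p1, float) - np.asarray(p0, float)   # p0 -> p1
    v2 = np.asarray(p2, float) - np.asarray(p0, float)   # p0 -> p2
    v3 = np.asarray(p1, float) - np.asarray(p2, float)   # p2 -> p1
    a, b, c = np.linalg.norm(v1), np.linalg.norm(v2), np.linalg.norm(v3)
    # Law of cosines: c^2 = a^2 + b^2 - 2ab*cos(theta)
    cos_theta = (a**2 + b**2 - c**2) / (2 * a * b)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

For instance, a joint at the origin with neighbors on the x- and y-axes yields 90 degrees, and collinear opposite-side neighbors yield 180 degrees.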
step 6, after skeleton point correction is completed, the behavior recognition module performs skeleton point behavior recognition. The behavior features of neighboring nodes are shifted and spliced according to the adjacency relation of the graph; after the splicing, a single 1 x 1 convolution suffices to obtain the computed behavior features. For a graph of n skeleton point nodes with feature dimension c, the feature map has size n × c. Suppose node i has m adjacent nodes, with neighbor set N(i) = {j1, j2, ..., jm}. The shift-map module divides the feature of the i-th node equally into m + 1 parts: the first part retains the node's own feature, and the latter m parts are shifted in from the features of its neighbor nodes. Mathematically:

F'_i = F_i[0 : c/(m+1)] || F_{j1}[c/(m+1) : 2c/(m+1)] || ... || F_{jm}[m·c/(m+1) : c]

wherein the subscript ranges use Python-style indexing, and the double vertical lines denote splicing along the feature dimension; in this way the behavior features of the skeleton points are recognized.
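The per-node shift operation can be sketched as follows, assuming the node features are held in an (n, c) NumPy array and that c is divisible by m + 1 (the equal-split case written out above); the subsequent 1 x 1 convolution is simply a linear map shared across the shifted feature vectors:

```python
import numpy as np

def shift_node_features(F, neighbors, i):
    """Shift-splice for node i: keep the first 1/(m+1) slice of its own
    feature and take successive equal slices from its m neighbors,
    concatenated along the feature dimension (the || operator above).

    F         : (n, c) feature matrix; c must be divisible by m + 1 here
    neighbors : list [j1, ..., jm] of the nodes adjacent to node i
    """
    c = F.shape[1]
    m = len(neighbors)
    step = c // (m + 1)
    parts = [F[i, 0:step]]                      # first share: node i itself
    for k, j in enumerate(neighbors, start=1):  # remaining m shares: neighbors
        parts.append(F[j, k * step:(k + 1) * step])
    return np.concatenate(parts)                # Python-style slice indices
```

Stacking the shifted vectors of all n nodes yields an n × c matrix, to which the single 1 x 1 convolution is applied per node to produce the output behavior features.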
CN202010419839.4A 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof Active CN111582220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419839.4A CN111582220B (en) 2020-05-18 2020-05-18 Bone point behavior recognition system based on shift map convolution neural network and recognition method thereof


Publications (2)

Publication Number Publication Date
CN111582220A true CN111582220A (en) 2020-08-25
CN111582220B CN111582220B (en) 2023-05-26

Family

ID=72123047


Country Status (1)

Country Link
CN (1) CN111582220B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN111340011A (en) * 2020-05-18 2020-06-26 中国科学院自动化研究所南京人工智能芯片创新研究院 Self-adaptive time sequence shift neural network time sequence behavior identification method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN Minjie: "Multimodal action recognition based on a deep learning framework", Jisuanji Yu Xiandaihua (Computer and Modernization) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112009717A (en) * 2020-08-31 2020-12-01 南京迪沃航空技术有限公司 Airport bulk cargo loader, machine leaning anti-collision system for bulk cargo loader and anti-collision method of machine leaning anti-collision system
CN112009717B (en) * 2020-08-31 2022-08-02 南京迪沃航空技术有限公司 Airport bulk cargo loader, machine leaning anti-collision system for bulk cargo loader and anti-collision method of machine leaning anti-collision system
CN113158782A (en) * 2021-03-10 2021-07-23 浙江工业大学 Multi-person concurrent interaction behavior understanding method based on single-frame image
CN113158782B (en) * 2021-03-10 2024-03-26 浙江工业大学 Multi-person concurrent interaction behavior understanding method based on single-frame image
CN113627409A (en) * 2021-10-13 2021-11-09 南通力人健身器材有限公司 Body-building action recognition monitoring method and system
CN114463840A (en) * 2021-12-31 2022-05-10 北京工业大学 Skeleton-based method for recognizing human body behaviors through shift graph convolution network
JP7485154B1 (en) 2023-05-19 2024-05-16 トヨタ自動車株式会社 Video Processing System

Also Published As

Publication number Publication date
CN111582220B (en) 2023-05-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant