CN108681700B - Complex behavior identification method - Google Patents

Complex behavior identification method

Info

Publication number
CN108681700B
CN108681700B (application CN201810421670.9A)
Authority
CN
China
Prior art keywords
motion
joint point
dimensional
action
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810421670.9A
Other languages
Chinese (zh)
Other versions
CN108681700A (en)
Inventor
杨剑宇
朱晨
黄瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201810421670.9A
Publication of CN108681700A
Application granted
Publication of CN108681700B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a complex behavior identification method comprising the following steps: acquiring three-dimensional skeletal joint point information of the target motion with a sensor; preprocessing the joint point information and normalizing the coordinate system; extracting the motion trajectory of each joint point and projecting it onto three two-dimensional planes; extracting the motion vector between every two adjacent frames together with its length and direction angle, clustering these parameters with the k-means algorithm to obtain motion primitives, and counting them to obtain histograms; capturing time information with a time pyramid, and calculating the weight of each joint point from the values of all clusters of all the histograms to form a descriptor; and classifying with an SVM (support vector machine) to realize action recognition. The invention extracts and effectively expresses the characteristics of the skeletal joint point information of an action, thereby improving the accuracy of action recognition; all motion information is completely retained, so the action can be reconstructed; clustering is performed over all motion classes, capturing human motion characteristics globally; and the use of low-level features reduces the computational cost, improves the efficiency of action recognition, and meets the real-time requirement of a system.

Description

Complex behavior identification method
Technical Field
The invention relates to a complex behavior recognition method, and belongs to the technical field of image recognition.
Background
Action recognition is a research hotspot in the field of machine vision, and action recognition methods are widely applied in human-computer interaction, virtual reality, video retrieval, security monitoring and other areas. With the development of depth cameras, human skeletal joint point information can be acquired directly, and action recognition methods based on skeletal features have greatly improved the accuracy of action recognition. Despite many relevant studies and exciting results, the effective description of human actions remains a challenging task.
Many methods extract a variety of high-level features from the skeletal information and then combine them in some form into descriptors, but descriptors constructed in this combined fashion are not complete, and some motion information is always lost. On the other hand, many methods train on the different action classes separately, so the description of each individual action class is biased with respect to the global characteristics of human motion. Methods using high-level features also suffer from excessive computational cost. There is therefore a need for an algorithm that uses low-level features to reduce computational cost and improve efficiency, loses no motion information, and extracts the global characteristics of human motion from all action classes.
Therefore, in order to solve the above technical problems, it is necessary to provide a complex behavior recognition method.
Disclosure of Invention
The invention aims to provide a complex behavior identification method which extracts and effectively expresses the characteristics of the skeletal joint point information of an action, improving the accuracy of action recognition; completely retains all motion information, so that the action can be reconstructed; clusters over all motion classes, capturing human motion characteristics globally; and uses low-level features to reduce the computational cost, improve the efficiency of action recognition, and meet the real-time requirement of a system.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a complex behavior recognition method comprises the following steps:
(1) acquiring three-dimensional skeletal joint point information of the target motion with a depth sensor, obtaining the three-dimensional coordinates of each joint of the human body;
(2) preprocessing the skeletal joint point information and normalizing the coordinate system;
(3) extracting the motion trajectory of each skeletal joint point, the motion between adjacent frames being defined as a motion-let;
(4) projecting the three-dimensional trajectory of each skeletal joint point onto three two-dimensional planes to obtain the set of two-dimensional motion-lets of all skeletal joint points;
(5) calculating the length parameter and direction angle parameter of the vector representing each motion-let;
(6) collecting, for each single skeletal joint point, all vectors over all action classes, and clustering their length and direction angle parameters two-dimensionally with the k-means algorithm to obtain motion primitives;
(7) counting the number of motion-lets represented by each motion primitive to obtain a motion-primitive histogram;
(8) capturing the time information of the action with a time pyramid;
(9) calculating the weight of each skeletal joint point from the values of all clusters of all the histograms to form the final descriptor;
(10) training an SVM classifier on the final descriptors to obtain a good division of the descriptors of the multiple action categories and realize action recognition.
Preferably, step (2) comprises: normalizing the coordinate system by taking the vector from the left shoulder to the right shoulder in the first frame of the action sequence as the horizontal axis and the vector from the crotch bone to the midpoint of the two shoulders as the vertical axis, converting the X-Y-Z coordinate system into the X'-Y'-Z' coordinate system.
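As an illustration of the normalization in step (2), the following sketch rotates an (n, J, 3) array of joint coordinates into the X'-Y'-Z' frame of the first frame. It is written in Python with NumPy; the joint indices LEFT_SHOULDER, RIGHT_SHOULDER and HIP_CENTER are hypothetical placeholders for the depth sensor's joint layout, not values taken from the patent.

    import numpy as np

    # Hypothetical joint indices; the real layout depends on the depth sensor used.
    LEFT_SHOULDER, RIGHT_SHOULDER, HIP_CENTER = 4, 8, 0

    def normalize_skeleton(seq):
        """Rotate an (n, J, 3) joint sequence into the X'-Y'-Z' frame of frame 0:
        X' is the left-to-right shoulder vector, Y' points from the crotch bone
        (hip center) to the midpoint of the two shoulders."""
        first = seq[0]
        x_axis = first[RIGHT_SHOULDER] - first[LEFT_SHOULDER]
        x_axis /= np.linalg.norm(x_axis)
        mid_shoulders = (first[LEFT_SHOULDER] + first[RIGHT_SHOULDER]) / 2.0
        y_axis = mid_shoulders - first[HIP_CENTER]
        y_axis /= np.linalg.norm(y_axis)
        z_axis = np.cross(x_axis, y_axis)
        z_axis /= np.linalg.norm(z_axis)
        y_axis = np.cross(z_axis, x_axis)       # re-orthogonalize the vertical axis
        R = np.stack([x_axis, y_axis, z_axis])  # rows are the new axes
        return (seq - first[HIP_CENTER]) @ R.T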
Preferably, the motion trajectory of each skeletal joint point is extracted in step (3), specifically as follows:
the action sequence S of n frames is represented as:
S = {Γ_j | j ∈ [1, J]},
Γ_j = {p_j(t) | t ∈ [1, n], j ∈ [1, J]},
where Γ_j is the three-dimensional trajectory of joint point j, J is the total number of joint points, t is the frame index, and p_j(t) is the position of joint point j in frame t:
p_j(t) = (x_j(t), y_j(t), z_j(t)).
The movement of a skeletal joint point between two adjacent frames is defined as a motion-let, and the motion-let of skeletal joint point j between frame t and frame t+1 can be expressed as the vector v_j(t):
v_j(t) = p_j(t+1) − p_j(t).
The three-dimensional trajectory Γ_j of skeletal joint point j can thus be expressed as a sequence of vectors:
Γ_j = {v_j(t) | t ∈ [1, n−1]}.
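Since a motion-let is simply the displacement between consecutive frames, the trajectory-to-vector conversion of step (3) reduces to a frame difference; a minimal sketch under the same (n, J, 3) array convention as above:

    def motion_lets(seq):
        """v_j(t) = p_j(t+1) - p_j(t): returns the (n-1, J, 3) array of all motion-lets."""
        return seq[1:] - seq[:-1]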
Preferably, step (4) is specifically as follows:
each 3D motion-let is projected onto three two-dimensional planes, giving
v_j(t) → (v_j^xy(t), v_j^yz(t), v_j^xz(t)),
where v_j^xy(t), v_j^yz(t) and v_j^xz(t) are the 2D motion-lets on the x-y, y-z and x-z planes respectively, calculated as:
v_j^xy(t) = (x_j(t+1) − x_j(t), y_j(t+1) − y_j(t)),
v_j^yz(t) = (y_j(t+1) − y_j(t), z_j(t+1) − z_j(t)),
v_j^xz(t) = (x_j(t+1) − x_j(t), z_j(t+1) − z_j(t)).
Combining all motion-lets, the action sequence S is further represented as:
S = {v_j^xy(t), v_j^yz(t), v_j^xz(t) | t ∈ [1, n−1], j ∈ [1, J]}.
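Projecting a 3D motion-let onto the x-y, y-z and x-z planes amounts to selecting the corresponding coordinate pairs; a sketch continuing the array convention above:

    def project_motion_lets(v):
        """Split (n-1, J, 3) motion-lets into 2D motion-lets on the three planes."""
        v_xy = v[..., [0, 1]]  # (dx, dy)
        v_yz = v[..., [1, 2]]  # (dy, dz)
        v_xz = v[..., [0, 2]]  # (dx, dz)
        return v_xy, v_yz, v_xz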
Preferably, step (5) is specifically as follows:
the parameters of skeletal joint point j in the x-y coordinate system are calculated as:
l_j^xy(t) = √((x_j(t+1) − x_j(t))² + (y_j(t+1) − y_j(t))²),
θ_j^xy(t) = arctan2(y_j(t+1) − y_j(t), x_j(t+1) − x_j(t));
the parameters of skeletal joint point j in the y-z coordinate system are calculated as:
l_j^yz(t) = √((y_j(t+1) − y_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^yz(t) = arctan2(z_j(t+1) − z_j(t), y_j(t+1) − y_j(t));
the parameters of skeletal joint point j in the x-z coordinate system are calculated as:
l_j^xz(t) = √((x_j(t+1) − x_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^xz(t) = arctan2(z_j(t+1) − z_j(t), x_j(t+1) − x_j(t));
where θ_j^xy(t), θ_j^yz(t) and θ_j^xz(t) are the direction angle parameters of the vectors corresponding to the motion-let of skeletal joint point j between frame t and frame t+1 on the three two-dimensional planes, with values ranging from −180° to 180°, and l_j^xy(t), l_j^yz(t) and l_j^xz(t) are the corresponding length parameters.
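Both parameters follow directly from the components of each 2D motion-let; arctan2 yields exactly the −180° to 180° range specified above. A sketch:

    import numpy as np

    def vector_params(v2d):
        """Length and direction angle (degrees, -180 to 180) of an array of 2D motion-lets."""
        length = np.linalg.norm(v2d, axis=-1)
        angle = np.degrees(np.arctan2(v2d[..., 1], v2d[..., 0]))
        return length, angle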
Preferably, step (6) is specifically as follows:
under the x-y coordinate system, the number of the skeletal joint points j and the cluster centers is K, all the cluster centers can be represented as a set U:
Figure BDA00016509000600000317
wherein
Figure BDA0001650900060000041
Is a cluster of the kth clusterThe center of the class is the center of the class,
Figure BDA0001650900060000042
and
Figure BDA0001650900060000043
is the coordinate value of the center of the cluster,
each point is represented by the cluster center of the cluster, K cluster centers represent corresponding K motion elements, and the motion element of the kth cluster
Figure BDA0001650900060000044
Expressed as:
Figure BDA0001650900060000045
thus, all motion primitives P of the motion sequence S are:
Figure BDA0001650900060000046
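Step (6) pools the (direction angle, length) pairs of one joint point on one plane over all action classes and clusters them with k-means; a sketch using scikit-learn, which is an implementation choice rather than something mandated by the patent:

    import numpy as np
    from sklearn.cluster import KMeans

    def learn_motion_primitives(angles, lengths, K=8, seed=0):
        """Cluster pooled (angle, length) pairs of one joint point on one plane
        into K motion primitives; the cluster centers are the primitives."""
        feats = np.column_stack([np.ravel(angles), np.ravel(lengths)])
        return KMeans(n_clusters=K, random_state=seed).fit(feats)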
Preferably, in step (7), all motion-lets in each two-dimensional plane of each skeletal joint point of the action sequence S are represented by their corresponding motion primitives, and the number of motion-lets represented by each motion primitive is counted to form a histogram, so that the values of all clusters in the histograms of the three coordinate planes of skeletal joint point j can be represented as H_j:
H_j = {h_j^xy(k), h_j^yz(k), h_j^xz(k) | k ∈ [1, K]},
where h_j^xy(k), h_j^yz(k) and h_j^xz(k) are the values of the k-th bin of the motion-primitive histograms of the three two-dimensional trajectories of skeletal joint point j.
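Each motion-let is then assigned to its nearest motion primitive and the assignments are counted per bin; a sketch reusing the fitted k-means model from the previous snippet:

    import numpy as np

    def primitive_histogram(model, angles, lengths, K=8):
        """Histogram of motion-primitive assignments for one joint point on one plane."""
        feats = np.column_stack([np.ravel(angles), np.ravel(lengths)])
        labels = model.predict(feats)
        return np.bincount(labels, minlength=K)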
Preferably, the stepsStep (8) capturing time information by adopting three layers of time pyramids, wherein the first layer calculates a motion element histogram aiming at the whole complete track, the second layer divides the track into two parts according to time, the motion element histogram is respectively counted, the third layer divides the two parts into two parts, and finally the 3D track descriptor D of the bone joint point j is obtainedjComprises the following steps:
Figure BDA00016509000600000410
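The three-layer pyramid computes the same histogram over 1 + 2 + 4 = 7 temporal segments and concatenates the results; a sketch, where hist_fn is any function mapping a sub-sequence of motion-lets to its motion-primitive histogram (for example, the primitive_histogram sketch above with the model bound):

    import numpy as np

    def temporal_pyramid(lets, hist_fn, levels=3):
        """Concatenate motion-primitive histograms over 1, 2 and 4 temporal segments."""
        n = len(lets)
        parts = []
        for level in range(levels):              # 1, 2, 4 segments per layer
            k = 2 ** level
            bounds = np.linspace(0, n, k + 1, dtype=int)
            for a, b in zip(bounds[:-1], bounds[1:]):
                parts.append(hist_fn(lets[a:b]))
        return np.concatenate(parts)             # 7 histograms in total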
Preferably, step (9) is specifically as follows:
the lengths of the three-dimensional trajectories of all joint points over the whole action sequence are calculated, giving the total motion amount m_j of each joint point:
m_j = Σ_{t=1}^{n−1} ‖v_j(t)‖.
The total motion amounts of all joint points of a sample are accumulated to obtain the total motion amount F_S of the action sample S:
F_S = Σ_{j=1}^{J} m_j.
All training samples of each action class are collected and arranged by sample index, forming the sets:
M_j = {m_j^1, m_j^2, ..., m_j^e},
F = {F_1, F_2, ..., F_e},
where e is the number of samples in the training set of that action class.
For each action class, the covariance of M_j and F is calculated to obtain a covariance set, with the following specific steps:
the total motion amounts of all J joint points and the total motion amounts F of the action samples form a (J+1)-dimensional random variable G:
G = (M_1, M_2, ..., M_J, F)^T,
and the matrix
C = (cov_ij), i, j ∈ [1, J+1],
is the covariance matrix of the (J+1)-dimensional random variable G, where the covariance is calculated as:
cov_ij = E[(M_i − E(M_i)) × (M_j − E(M_j))],
in which F participates in the calculation as M_{J+1}, and E(M_j) is the expectation of M_j:
E(M_j) = (1/e) Σ_{i=1}^{e} m_j^i.
The last column of the matrix C is the covariance set of the M_j and F. If the covariance cov_{j(J+1)} of M_j and F is less than 0, the weight w_j of joint point j is 0; if cov_{j(J+1)} is greater than or equal to 0, the weight w_j of joint point j is:
w_j = cov_{j(J+1)} / max{cov_{i(J+1)} | i ∈ [1, J+1]},
where max{cov_{i(J+1)} | i ∈ [1, J+1]} is the maximum value of the (J+1)-th column of the matrix C.
Finally, the descriptor D of the action sequence S is represented as:
D = {D_j × w_j | j ∈ [1, J]}.
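The weighting of step (9) compares, across the e training samples of one action class, how each joint point's motion amount co-varies with the total motion of the sample. A sketch, assuming M is an (e, J) matrix whose rows hold the per-joint motion amounts m_j of each sample; NumPy's 1/(e−1) covariance normalization cancels in the ratio, so the weights match the formula above:

    import numpy as np

    def joint_weights(M):
        """Covariance-based joint weights w_j; M is (e, J), one row per sample."""
        F = M.sum(axis=1)                 # total motion amount F_S of each sample
        G = np.column_stack([M, F])       # (J+1)-dimensional random variable G
        C = np.cov(G, rowvar=False)       # (J+1, J+1) covariance matrix
        last = C[:, -1]                   # covariances of M_1..M_J and F with F
        return np.clip(last[:-1], 0.0, None) / last.max()   # w_j = 0 if cov < 0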
Due to the application of the above technical scheme, the invention has the following advantages over the prior art:
the method extracts and effectively expresses the characteristics of the skeletal joint point information of an action, thereby improving the accuracy of action recognition; it completely retains all motion information, so that the action can be reconstructed; it clusters over all motion classes, capturing human motion characteristics globally; and by using low-level features it reduces the computational cost, improves the efficiency of action recognition, and can meet the real-time requirement of a system.
Drawings
FIG. 1 is a schematic diagram of the normalization of the coordinate system in the present invention.
FIG. 2 is a schematic diagram of the projection of the three-dimensional trajectory of a skeletal joint point onto three two-dimensional planes.
FIG. 3 is a schematic diagram of the calculation of the length parameter and the direction angle parameter.
FIG. 4 is a schematic diagram of the clustering results, on the three planes, of the two-dimensional vector parameters of a skeletal joint point over all motion classes.
FIG. 5 shows the histograms of a skeletal joint point in the three coordinate systems x-y, y-z and x-z.
FIG. 6 is a schematic diagram of the time pyramid of the present invention.
FIG. 7 is a flow chart of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples:
the first embodiment is as follows: referring to fig. 7, a complex behavior recognition method includes the following steps:
(1) acquiring three-dimensional skeletal joint point information of the target motion with a depth sensor, obtaining the three-dimensional coordinates of each joint of the human body;
(2) preprocessing the skeletal joint point information and normalizing the coordinate system: as shown in FIG. 1, taking the vector from the left shoulder to the right shoulder as the horizontal axis and the vector from the crotch bone to the midpoint of the two shoulders as the vertical axis, the X-Y-Z coordinate system is converted into the X'-Y'-Z' coordinate system;
(3) connecting the three-dimensional coordinates of each skeletal joint point in the action sequence in time order to obtain the three-dimensional trajectories of all skeletal joint points;
this embodiment uses a 60-frame action sequence S (swinging both hands) with 20 skeletal joint points, expressed as:
S = {Γ_j | j ∈ [1, 20]},
Γ_j = {p_j(t) | t ∈ [1, 60], j ∈ [1, 20]},
where Γ_j is the three-dimensional trajectory of joint point j, t is the frame index, and p_j(t) is the position of joint point j in frame t:
p_j(t) = (x_j(t), y_j(t), z_j(t)).
The movement of a skeletal joint point between two adjacent frames is defined as a motion-let, and the motion-let of skeletal joint point j between frame t and frame t+1 can be expressed as the vector v_j(t):
v_j(t) = p_j(t+1) − p_j(t).
The three-dimensional trajectory Γ_j of skeletal joint point j can thus be expressed as a sequence of vectors:
Γ_j = {v_j(t) | t ∈ [1, n−1]};
(4) for each skeletal joint point, its three-dimensional trajectory is projected onto three two-dimensional planes, namely the x-y, y-z and x-z planes, resulting in three two-dimensional trajectories, as shown in FIG. 2.
Each 3D motion-let is projected onto the three two-dimensional planes, giving
v_j(t) → (v_j^xy(t), v_j^yz(t), v_j^xz(t)),
where v_j^xy(t), v_j^yz(t) and v_j^xz(t) are the 2D motion-lets on the three planes, calculated as:
v_j^xy(t) = (x_j(t+1) − x_j(t), y_j(t+1) − y_j(t)),
v_j^yz(t) = (y_j(t+1) − y_j(t), z_j(t+1) − z_j(t)),
v_j^xz(t) = (x_j(t+1) − x_j(t), z_j(t+1) − z_j(t)).
Combining all motion-lets, the action sequence S can be further expressed as:
S = {v_j^xy(t), v_j^yz(t), v_j^xz(t) | t ∈ [1, n−1], j ∈ [1, 20]}.
(5) the length and direction angle of the vector representing each motion-let, shown in FIG. 3, are calculated for skeletal joint point j in the x-y coordinate system as:
l_j^xy(t) = √((x_j(t+1) − x_j(t))² + (y_j(t+1) − y_j(t))²),
θ_j^xy(t) = arctan2(y_j(t+1) − y_j(t), x_j(t+1) − x_j(t));
in the y-z coordinate system as:
l_j^yz(t) = √((y_j(t+1) − y_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^yz(t) = arctan2(z_j(t+1) − z_j(t), y_j(t+1) − y_j(t));
and in the x-z coordinate system as:
l_j^xz(t) = √((x_j(t+1) − x_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^xz(t) = arctan2(z_j(t+1) − z_j(t), x_j(t+1) − x_j(t));
where θ_j^xy(t), θ_j^yz(t) and θ_j^xz(t) are the direction angle parameters of the vectors corresponding to the motion-let of skeletal joint point j between frame t and frame t+1 on the three two-dimensional planes, with values ranging from −180° to 180°, and l_j^xy(t), l_j^yz(t) and l_j^xz(t) are the corresponding length parameters.
(6) In order to extract human motion features globally, for each skeletal joint point, all vectors of that joint point over all action classes are collected, and their length and direction angle parameters are clustered two-dimensionally with the k-means algorithm. Taking one joint point as an example, the clustering result is shown in FIG. 4.
Taking the clustering result of skeletal joint point j in the x-y coordinate system as an example, with the number of cluster centers K = 8, all cluster centers can be represented as the set U:
U = {u_k^xy | k ∈ [1, 8]},
where u_k^xy = (θ_k^xy, l_k^xy) is the cluster center of the k-th cluster, and θ_k^xy and l_k^xy are the coordinate values of that cluster center.
Each point is represented by the cluster center of its cluster, and the 8 cluster centers represent the corresponding 8 motion primitives; for example, the motion primitive of the k-th cluster can be expressed as:
P_k^xy = u_k^xy = (θ_k^xy, l_k^xy).
Thus, all motion primitives P of the action sequence S are:
P = {P_{j,k}^xy, P_{j,k}^yz, P_{j,k}^xz | k ∈ [1, 8], j ∈ [1, 20]}.
(7) For the action sequence S, all motion-lets in each two-dimensional plane of each skeletal joint point are represented by their corresponding motion primitives, and the number of motion-lets represented by each motion primitive is counted to form a histogram, as shown in FIG. 5.
The centermost cluster and its corresponding motion primitive are excluded from the statistics and subsequent calculations, because the centermost cluster contains vectors in all directions, so its direction angles have no discriminative value; moreover, the vector length parameters of the centermost cluster are small and contribute little to the composition of the motion, so it can be ignored.
The values of all clusters in the three histograms of skeletal joint point j can be represented as H_j:
H_j = {h_j^xy(k), h_j^yz(k), h_j^xz(k) | k ∈ [1, K]},
where h_j^xy(k), h_j^yz(k) and h_j^xz(k) are the values of the k-th bin of the motion-primitive histograms of the three two-dimensional trajectories of skeletal joint point j.
(8) A time pyramid is applied to capture the time information of the action. Taking a three-layer time pyramid as an example, the first layer computes a motion-primitive histogram over the whole trajectory; the second layer divides the trajectory into two parts in time and computes a histogram for each part; the third layer further divides each of those two parts in two. The description of skeletal joint point j is thus divided into 7 parts, as shown in FIG. 6: H_j^{1,1} of the first layer; H_j^{2,1} and H_j^{2,2} of the second layer; and H_j^{3,1}, H_j^{3,2}, H_j^{3,3} and H_j^{3,4} of the third layer. The 3D trajectory descriptor D_j of skeletal joint point j is the combination of these 7 parts, namely:
D_j = {H_j^{1,1}, H_j^{2,1}, H_j^{2,2}, H_j^{3,1}, H_j^{3,2}, H_j^{3,3}, H_j^{3,4}}.
(9) Considering that different skeletal joint points contribute differently to the recognition of an action, the importance of the descriptors of key skeletal joint points needs to be increased. Taking the descriptor of skeletal joint point j as an example, the corresponding weight w_j is calculated as follows.
The lengths of the three-dimensional trajectories of all joint points over the whole action sequence are calculated, giving the total motion amount m_j of each joint point:
m_j = Σ_{t=1}^{n−1} ‖v_j(t)‖.
The total motion amounts of all joint points of a sample are accumulated to obtain the total motion amount F_S of the action sample S:
F_S = Σ_{j=1}^{20} m_j.
All training samples of each action class are collected and arranged by sample index; in this embodiment, with 80 samples in the training set of an action class, the sets are:
M_j = {m_j^1, m_j^2, ..., m_j^80},
F = {F_1, F_2, ..., F_80}.
For each action class, the covariance of M_j and F is calculated to obtain a covariance set. The specific steps are as follows: the total motion amounts of all 20 joint points and the total motion amounts F of the action samples form a 21-dimensional random variable G:
G = (M_1, M_2, ..., M_20, F)^T,
and the matrix
C = (cov_ij), i, j ∈ [1, 21],
is the covariance matrix of the 21-dimensional random variable G, where the covariance is calculated as:
cov_ij = E[(M_i − E(M_i)) × (M_j − E(M_j))],
in which F participates in the calculation as M_21, and E(M_j) is the expectation of M_j:
E(M_j) = (1/80) Σ_{i=1}^{80} m_j^i.
The last column of the matrix C is the covariance set of the M_j and F. If the covariance cov_{j(21)} of M_j and F is less than 0, the weight w_j of joint point j is 0; if cov_{j(21)} is greater than or equal to 0, the weight w_j of joint point j is:
w_j = cov_{j(21)} / max{cov_{i(21)} | i ∈ [1, 21]},
where max{cov_{i(21)} | i ∈ [1, 21]} is the maximum value of the 21st column of the matrix C.
Thus, the descriptor D of the action sequence S can be expressed as:
D = {D_j × w_j | j ∈ [1, 20]}.
(10) training an SVM classifier on the final descriptors to obtain a good division of the descriptors of the multiple action categories and realize action recognition.
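For step (10), any off-the-shelf multi-class SVM can be trained on the weighted descriptors; a minimal sketch with scikit-learn, in which the linear kernel and C value are assumptions rather than parameters specified by the patent:

    from sklearn.svm import SVC

    def train_action_classifier(descriptors, labels):
        """Train a multi-class SVM on the weighted time-pyramid descriptors."""
        return SVC(kernel="linear", C=1.0).fit(descriptors, labels)

    # Usage: predictions = train_action_classifier(train_X, train_y).predict(test_X)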

Claims (9)

1. A complex behavior recognition method, characterized by comprising the following steps:
(1) acquiring three-dimensional skeletal joint point information of the target motion with a depth sensor, obtaining the three-dimensional coordinates of each joint of the human body;
(2) preprocessing the skeletal joint point information and normalizing the coordinate system;
(3) extracting the motion trajectory of each skeletal joint point, the motion between adjacent frames being defined as a motion-let;
(4) projecting the three-dimensional trajectory of each skeletal joint point onto three two-dimensional planes to obtain the set of two-dimensional motion-lets of all skeletal joint points;
(5) calculating the length parameter and direction angle parameter of the vector representing each motion-let;
(6) collecting, for each single skeletal joint point, all vectors over all action classes, and clustering their length and direction angle parameters two-dimensionally with the k-means algorithm to obtain motion primitives;
(7) counting the number of motion-lets represented by each motion primitive to obtain a motion-primitive histogram;
(8) capturing the time information of the action with a time pyramid;
(9) calculating the weight of each skeletal joint point from the values of all clusters of all the histograms to form the final descriptor;
(10) training an SVM classifier on the final descriptors to obtain a good division of the descriptors of the multiple action categories and realize action recognition.
2. The complex behavior recognition method according to claim 1, wherein step (2) comprises: normalizing the coordinate system by taking the vector from the left shoulder to the right shoulder in the first frame of the action sequence as the horizontal axis and the vector from the crotch bone to the midpoint of the two shoulders as the vertical axis, converting the X-Y-Z coordinate system into the X'-Y'-Z' coordinate system.
3. The complex behavior recognition method according to claim 1, wherein the motion trajectory of each skeletal joint point is extracted in step (3), specifically as follows:
the action sequence S of n frames is represented as:
S = {Γ_j | j ∈ [1, J]},
Γ_j = {p_j(t) | t ∈ [1, n], j ∈ [1, J]},
where Γ_j is the three-dimensional trajectory of joint point j, J is the total number of joint points, t is the frame index, and p_j(t) is the position of joint point j in frame t:
p_j(t) = (x_j(t), y_j(t), z_j(t));
the movement of a skeletal joint point between two adjacent frames is defined as a motion-let, and the motion-let of skeletal joint point j between frame t and frame t+1 can be represented as the vector v_j(t):
v_j(t) = p_j(t+1) − p_j(t);
the three-dimensional trajectory Γ_j of skeletal joint point j can be expressed as a sequence of vectors:
Γ_j = {v_j(t) | t ∈ [1, n−1]}.
4. The complex behavior recognition method according to claim 3, wherein step (4) is specifically as follows:
each 3D motion-let is projected onto three two-dimensional planes, giving
v_j(t) → (v_j^xy(t), v_j^yz(t), v_j^xz(t)),
where v_j^xy(t), v_j^yz(t) and v_j^xz(t) are the 2D motion-lets on the three two-dimensional planes, calculated as:
v_j^xy(t) = (x_j(t+1) − x_j(t), y_j(t+1) − y_j(t)),
v_j^yz(t) = (y_j(t+1) − y_j(t), z_j(t+1) − z_j(t)),
v_j^xz(t) = (x_j(t+1) − x_j(t), z_j(t+1) − z_j(t));
combining all motion-lets, the action sequence S is further represented as:
S = {v_j^xy(t), v_j^yz(t), v_j^xz(t) | t ∈ [1, n−1], j ∈ [1, J]}.
5. The complex behavior recognition method according to claim 3, wherein step (5) is specifically as follows:
the parameters of skeletal joint point j in the x-y coordinate system are calculated as:
l_j^xy(t) = √((x_j(t+1) − x_j(t))² + (y_j(t+1) − y_j(t))²),
θ_j^xy(t) = arctan2(y_j(t+1) − y_j(t), x_j(t+1) − x_j(t));
the parameters of skeletal joint point j in the y-z coordinate system are calculated as:
l_j^yz(t) = √((y_j(t+1) − y_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^yz(t) = arctan2(z_j(t+1) − z_j(t), y_j(t+1) − y_j(t));
the parameters of skeletal joint point j in the x-z coordinate system are calculated as:
l_j^xz(t) = √((x_j(t+1) − x_j(t))² + (z_j(t+1) − z_j(t))²),
θ_j^xz(t) = arctan2(z_j(t+1) − z_j(t), x_j(t+1) − x_j(t));
where θ_j^xy(t), θ_j^yz(t) and θ_j^xz(t) are the direction angle parameters of the vectors corresponding to the motion-let of skeletal joint point j between frame t and frame t+1 on the three two-dimensional planes, with values ranging from −180° to 180°, and l_j^xy(t), l_j^yz(t) and l_j^xz(t) are the corresponding length parameters.
6. The complex behavior recognition method according to claim 1, wherein the step (6) is specifically as follows:
under the x-y coordinate system, the number of the skeletal joint points j and the cluster centers is K, all the cluster centers can be represented as a set U:
Figure 227295DEST_PATH_IMAGE030
wherein
Figure DEST_PATH_IMAGE031
Is the cluster center of the k-th cluster,
Figure 958491DEST_PATH_IMAGE032
and
Figure DEST_PATH_IMAGE033
is the coordinate value of the center of the cluster,
each point is represented by the cluster center of the cluster, K cluster centers represent corresponding K motion elements, and the motion element of the kth cluster
Figure 279751DEST_PATH_IMAGE034
Expressed as:
Figure DEST_PATH_IMAGE035
thus, all motion primitives P of the motion sequence S are:
Figure 361976DEST_PATH_IMAGE036
7. The complex behavior recognition method according to claim 6, wherein in step (7), all motion-lets in each two-dimensional plane of each skeletal joint point of the action sequence S are represented by their corresponding motion primitives, and the number of motion-lets represented by each motion primitive is counted to form a histogram, so that the values of all clusters in the histograms of the three coordinate planes of skeletal joint point j can be represented as H_j:
H_j = {h_j^xy(k), h_j^yz(k), h_j^xz(k) | k ∈ [1, K]},
where h_j^xy(k), h_j^yz(k) and h_j^xz(k) are the values of the k-th bin of the motion-primitive histograms of the three two-dimensional trajectories of skeletal joint point j.
8. The complex behavior recognition method according to claim 1, wherein step (8) captures the time information with a three-layer time pyramid: the first layer computes a motion-primitive histogram over the whole trajectory; the second layer divides the trajectory into two parts in time and computes a histogram for each part; the third layer further divides each of those two parts in two; finally, the 3D trajectory descriptor D_j of skeletal joint point j is obtained as:
D_j = {H_j^{1,1}, H_j^{2,1}, H_j^{2,2}, H_j^{3,1}, H_j^{3,2}, H_j^{3,3}, H_j^{3,4}},
where the first layer H_j^{1,1} is the values of all clusters in the three histograms of skeletal joint point j, the second layer is H_j^{2,1} and H_j^{2,2}, and the third layer is H_j^{3,1}, H_j^{3,2}, H_j^{3,3} and H_j^{3,4}.
9. The complex behavior recognition method according to claim 3, wherein step (9) is specifically as follows:
the lengths of the three-dimensional trajectories of all joint points over the whole action sequence are calculated, giving the total motion amount m_j of each joint point:
m_j = Σ_{t=1}^{n−1} ‖v_j(t)‖;
the total motion amounts of all joint points of the action sequence are accumulated to obtain the total motion amount F_S of the action sequence S:
F_S = Σ_{j=1}^{J} m_j;
all training samples of each action class are collected and arranged by sample index, forming the sets:
M_j = {m_j^1, m_j^2, ..., m_j^e},
F = {F_1, F_2, ..., F_e},
where e is the number of samples in the training set of that action class;
for each action class, the covariance of M_j and F is calculated to obtain a covariance set, with the following specific steps:
the total motion amounts of all J joint points and the total motion amounts F of the action samples form a (J+1)-dimensional random variable G:
G = (M_1, M_2, ..., M_J, F)^T,
and the matrix
C = (cov_ij), i, j ∈ [1, J+1],
is the covariance matrix of the (J+1)-dimensional random variable G, where the covariance is calculated as:
cov_ij = E[(M_i − E(M_i)) × (M_j − E(M_j))],
in which F participates in the calculation as M_{J+1}, and E(M_j) is the expectation of M_j:
E(M_j) = (1/e) Σ_{i=1}^{e} m_j^i;
the last column of the matrix C is the covariance set of the M_j and F; if the covariance cov_{j(J+1)} of M_j and F is less than 0, the weight w_j of joint point j is 0; if cov_{j(J+1)} is greater than or equal to 0, the weight w_j of joint point j is:
w_j = cov_{j(J+1)} / max{cov_{i(J+1)} | i ∈ [1, J+1]},
where max{cov_{i(J+1)} | i ∈ [1, J+1]} is the maximum value of the (J+1)-th column of the matrix C;
finally, the descriptor D of the action sequence S is represented as:
D = {D_j × w_j | j ∈ [1, J]}.
CN201810421670.9A 2018-05-04 2018-05-04 Complex behavior identification method Active CN108681700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810421670.9A CN108681700B (en) 2018-05-04 2018-05-04 Complex behavior identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810421670.9A CN108681700B (en) 2018-05-04 2018-05-04 Complex behavior identification method

Publications (2)

Publication Number Publication Date
CN108681700A CN108681700A (en) 2018-10-19
CN108681700B true CN108681700B (en) 2021-09-28

Family

ID=63801510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810421670.9A Active CN108681700B (en) 2018-05-04 2018-05-04 Complex behavior identification method

Country Status (1)

Country Link
CN (1) CN108681700B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670401B (en) * 2018-11-15 2022-09-20 天津大学 Action recognition method based on skeletal motion diagram
CN110070070B (en) * 2019-04-30 2021-03-02 苏州大学 Action recognition method
CN110084211B (en) * 2019-04-30 2020-12-18 苏州大学 Action recognition method
CN110119707B (en) * 2019-05-10 2021-02-02 苏州大学 Human body action recognition method
CN110414316A (en) * 2019-06-11 2019-11-05 中国科学院自动化研究所 Data de-noising method, apparatus, computer equipment and storage medium
CN111028339B (en) * 2019-12-06 2024-03-29 国网浙江省电力有限公司培训中心 Behavior modeling method and device, electronic equipment and storage medium
CN111310590B (en) * 2020-01-20 2023-07-11 北京西米兄弟未来科技有限公司 Action recognition method and electronic equipment
CN111914798B (en) * 2020-08-17 2022-06-07 四川大学 Human body behavior identification method based on skeletal joint point data
US11625938B2 (en) 2020-12-29 2023-04-11 Industrial Technology Research Institute Method and device for detecting human skeletons
CN113011381B (en) * 2021-04-09 2022-09-02 中国科学技术大学 Double-person motion recognition method based on skeleton joint data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298974A (en) * 2014-10-10 2015-01-21 北京工业大学 Human body behavior recognition method based on depth video sequence
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN107194366A (en) * 2017-06-06 2017-09-22 西安电子科技大学 The Activity recognition method of son is described based on dense track covariance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102026382B1 (en) * 2014-03-31 2019-09-30 Electronics and Telecommunications Research Institute (ETRI) System and method for motion estimation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298974A (en) * 2014-10-10 2015-01-21 北京工业大学 Human body behavior recognition method based on depth video sequence
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN107194366A (en) * 2017-06-06 2017-09-22 西安电子科技大学 The Activity recognition method of son is described based on dense track covariance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Spatio-Temporal Naive-Bayes Nearest-Neighbor (ST-NBNN) for Skeleton-Based Action Recognition; J.W. Weng et al.; CVPR; 2017; pp. 445-454 *
Research on a gesture recognition algorithm based on BOF-Gist features; Ding Yi et al.; Computer Engineering and Applications; 2017; pp. 170-174 *

Also Published As

Publication number Publication date
CN108681700A (en) 2018-10-19

Similar Documents

Publication Publication Date Title
CN108681700B (en) Complex behavior identification method
Xu et al. Learning cross-modal deep representations for robust pedestrian detection
CN107301370B (en) Kinect three-dimensional skeleton model-based limb action identification method
CN110555412B (en) End-to-end human body gesture recognition method based on combination of RGB and point cloud
Xiong et al. A good practice towards top performance of face recognition: Transferred deep feature fusion
Zheng et al. Deep learning for event-based vision: A comprehensive survey and benchmarks
CN107292246A (en) Infrared human body target identification method based on HOG PCA and transfer learning
Ding et al. STFC: Spatio-temporal feature chain for skeleton-based human action recognition
CN111639580B (en) Gait recognition method combining feature separation model and visual angle conversion model
Baby et al. Dynamic vision sensors for human activity recognition
CN108875586B (en) Functional limb rehabilitation training detection method based on depth image and skeleton data multi-feature fusion
CN105930770A (en) Human motion identification method based on Gaussian process latent variable model
CN108280421A (en) Human bodys' response method based on multiple features Depth Motion figure
CN101826155B (en) Method for identifying act of shooting based on Haar characteristic and dynamic time sequence matching
CN111914643A (en) Human body action recognition method based on skeleton key point detection
Jadhav et al. Aerial multi-object tracking by detection using deep association networks
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
CN108921062A (en) A kind of gait recognition method for combining more gait feature collaboration dictionaries
CN106529441B (en) Depth motion figure Human bodys' response method based on smeared out boundary fragment
CN104794446A (en) Human body action recognition method and system based on synthetic descriptors
Liu et al. Viewpoint invariant action recognition using rgb-d videos
Yan et al. Human-object interaction recognition using multitask neural network
Li et al. Dynamic long short-term memory network for skeleton-based gait recognition
CN105844204A (en) Method and device for recognizing behavior of human body
El Madany et al. Human action recognition using temporal hierarchical pyramid of depth motion map and keca

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant