CN106022310A - HTG-HOG (histograms of temporal gradient and histograms of oriented gradient) and STG (scale of temporal gradient) feature-based human body behavior recognition method - Google Patents
- Publication number: CN106022310A (application number CN201610420591.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- htg
- video
- stg
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a novel human body behavior recognition method, in particular a human body behavior recognition method based on HTG-HOG (Histograms of Temporal Gradient and Histograms of Oriented Gradient) features and STG (Scale of Temporal Gradient) features. According to the method, HTG features and STG features are extracted from depth maps. The HTG features are spatio-temporal local features of the video sequence: an HTG feature is extracted from every frame of the sequence, the per-frame features are fused into a 2-dimensional matrix, and HOG features are extracted from that matrix. The STG features are global features of the whole video sequence: the first K frames of each input video with the largest weighted dynamic energy values are selected as key frames, and the STG features of the sequence are extracted from these key frames. The HTG-HOG and STG features are fused into one very large vector, which is finally classified with a random decision forest. The recognition mechanism of the method is simple in structure, easy to implement, and suitable for real-time processing in elderly monitoring and intelligent video surveillance.
Description
Technical field
The invention belongs to the fields of artificial intelligence and pattern recognition, and specifically relates to a human body behavior recognition technique based on HTG-HOG (Histograms of Temporal Gradient and Histograms of Oriented Gradient) features and STG (Scale of Temporal Gradient) features.
Background technology
In the big-data era, as people's demand for high-speed, high-quality video information keeps growing, intelligent video analysis technology becomes more and more important. Human body behavior recognition is one of the key technologies of intelligent video analysis and one of the major problems in pattern recognition research; it has great research value and significance, and is widely applied in fields such as intelligent video surveillance, elderly care and monitoring, virtual reality, and motion analysis. With the appearance of cheap Kinect devices, behavior recognition on depth data has become an emerging research hotspot in artificial intelligence and pattern recognition.
A video is composed of a sequence of images, so analyzing human behavior in video is a process of processing the image sequence, extracting features, and performing discriminative classification. According to the research approach, features can be divided into global features and local features. Global features study the object of interest as a whole and represent a top-down research approach. Although such methods can capture more information about the human body, they depend heavily on low-level vision processing and are easily affected by factors such as noise and occlusion. Common global features in recent years include shape features and color features. Local features, in contrast, treat relatively independent image patches of the human body as the objects of study, representing a bottom-up research approach. Such methods are more robust to noise and occlusion, but are susceptible to changes in the number of feature points. Common local features include HOG (Histograms of Oriented Gradient) and STIP (Spatio-Temporal Interest Points).
In summary, global features and local features each have their advantages and disadvantages. The present invention therefore combines the characteristics of global and local features and defines a behavior recognition mechanism based on a global feature (the STG feature) and a local feature (the HTG-HOG feature). At present, there is no published literature or patent application combining both of these features.
Summary of the invention
The present invention is directed to a human body behavior recognition method performed on video information. It can effectively save labor and reduce work intensity, while also improving work efficiency and recognition accuracy.
To achieve the above object, the technical solution adopted by the present invention is a human body behavior recognition mechanism based on HTG-HOG and STG features, comprising the following steps:
(1) STG feature extraction:
(1.1) extract the key frames of the video according to the dynamic energy values of the weighted difference maps;
(1.2) compute the length and width of the non-zero region of each key frame extracted in (1.1);
(1.3) compute the length and width of the non-zero region of the original input video;
(1.4) compute, for each key frame, the ratios of the lengths and widths obtained in (1.2) and (1.3), and concatenate the ratios of all key frames into a row vector;
(2) HTG-HOG feature extraction:
(2.1) extract the HTG feature from every frame;
(2.2) combine, in temporal order, the HTG column vectors extracted from every frame of the video into a 2-dimensional matrix;
(2.3) extract the HOG feature from the 2-dimensional matrix produced in (2.2) to generate the HTG-HOG row vector;
(3) fuse the two features into one super vector: concatenate the row vectors generated in steps (1) and (2) into one very large row vector, then transpose it into a very large column vector;
(4) use a random decision forest to judge the class of human behavior in the input video.
Owing to the above technical scheme, the present invention has the following advantage over the prior art: it fuses global and local features and can automatically detect the class of human behavior in the input video, using a random decision forest to classify the actions. Test results show that the present invention achieves very high action recognition accuracy.
Description of the drawings
Fig. 1 is the overall framework of the recognition system of the present invention.
Fig. 2 is the confusion matrix of the recognition system of the present invention on the MSRAction3D data set.
Fig. 3 is the confusion matrix of the recognition system of the present invention on the MSRDailyActivity3D data set.
Fig. 4 is the confusion matrix of the recognition system of the present invention on the MSRActionPair3D data set.
Detailed description of the invention
The present invention is further described below with reference to the accompanying drawings and an implementation case.
Implementation case 1: in this case, behavior classification is performed on video samples from three different data sets. As shown in Fig. 1, the human body behavior recognition method comprises the following steps:
(1) STG feature extraction:
(1.1) Suppose the input video has dimensions N*M*L (N*M means each frame of the video consists of N rows and M columns of pixels, and L means the video contains L frames). To extract key frames, first compute the difference between each pair of consecutive frames, which yields a difference-map sequence of dimensions N*M*(L-1). For each difference map, weight the value of each pixel according to its magnitude: larger pixel values receive larger weights, and smaller pixel values receive smaller weights. Processing the difference-map sequence in this way generates a new weighted difference-map sequence. Finally, compute the dynamic energy of every frame of the weighted difference-map sequence; the K frames with the highest dynamic energy are selected as key frames.
(1.2) As shown in Fig. 1, after the key frames are selected, STG features are extracted from the K key frames. First compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimensions N*M*K).
(1.3) Compute the length and width of the non-zero region of the first frame of the original input video sequence.
(1.4) The ratios of the values obtained in (1.2) and (1.3) form the STG feature of each key frame. Finally, the STG features of the K frames are concatenated into a row vector of length 2K.
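The STG extraction above (inter-frame difference maps, magnitude-dependent weighting, dynamic-energy key-frame selection, and non-zero-region length/width ratios) can be sketched as follows. This is a minimal illustration, not the patented implementation: the squared-difference weighting standing in for the patent's weighting function, and the helper names `nonzero_extent` and `stg_feature`, are assumptions.

```python
import numpy as np

def nonzero_extent(frame):
    """Height and width of the bounding box of the non-zero region of a frame."""
    rows = np.flatnonzero(np.any(frame != 0, axis=1))
    cols = np.flatnonzero(np.any(frame != 0, axis=0))
    if rows.size == 0:
        return 0, 0
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1

def stg_feature(video, K=3):
    """video: (L, N, M) depth frames. Returns the length-2K STG row vector."""
    v = video.astype(float)
    diffs = np.abs(v[1:] - v[:-1])        # inter-frame difference maps
    weighted = diffs ** 2                 # larger differences get larger weights (illustrative choice)
    energy = weighted.sum(axis=(1, 2))    # dynamic energy of each weighted difference map
    key_idx = np.sort(np.argsort(energy)[::-1][:K]) + 1   # K highest-energy frames, in time order
    ref_h, ref_w = nonzero_extent(v[0])   # non-zero extent of the first input frame
    feats = []
    for idx in key_idx:
        h, w = nonzero_extent(v[idx])
        feats.extend([h / max(ref_h, 1), w / max(ref_w, 1)])
    return np.array(feats)
```

Each key frame contributes two ratios (length and width), so K key frames yield the length-2K row vector described above.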
(2) HTG-HOG feature extraction:
(2.1) HTG features are first extracted from the original input video. The HTG feature of the t-th image in the video is computed from the gradients G_t = f(i, j, t+1) − f(i, j, t), G_x = f(i+1, j, t) − f(i−1, j, t), and G_y = f(i, j+1, t) − f(i, j−1, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y are the gradient values of the pixel in the temporal, x, and y directions, respectively. After the gradient value of each pixel is computed, the gradient direction θ = arctan(G_y / G_x) and the gradient magnitude G = sqrt(G_x² + G_y²) at the pixel are computed; the gradient direction and magnitude in the (t, y) directions are computed by the same method. Finally, the gradient magnitude at each pixel is accumulated into a histogram according to the gradient-direction value at that point, which yields the row vector named HTG.
(2.2) HTG features are extracted from every frame of the video in temporal order, and the row vectors produced by the frames are assembled, according to the time order in which the action occurs, into a 2-dimensional matrix.
(2.3) The HOG feature is then extracted from the matrix obtained in (2.2). The computation is similar to that in (2.1), except that only the gradient magnitudes and directions in the x and y directions are computed; the resulting histogram is the final HTG-HOG row-vector feature of the video.
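The two-stage HTG-HOG extraction above can be sketched as follows. This is a simplified sketch, not the patented implementation: centred differences stand in for the gradient templates, the 18 orientation bins of 20 degrees each follow claim 3, and the function names are assumptions.

```python
import numpy as np

def orientation_histogram(gu, gv, nbins=18):
    """Accumulate gradient magnitudes into nbins orientation bins (20 degrees each)."""
    mag = np.sqrt(gu ** 2 + gv ** 2)
    ang = np.degrees(np.arctan2(gv, gu)) % 360.0
    bins = np.minimum((ang // (360.0 / nbins)).astype(int), nbins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)

def htg_hog(video, nbins=18):
    """video: (L, N, M). Per-frame HTG columns fused into a 2-D matrix, then a HOG pass."""
    v = video.astype(float)
    gt = v[1:] - v[:-1]                  # temporal gradient G_t
    gx = np.gradient(v[:-1], axis=2)     # spatial gradient G_x (centred differences)
    gy = np.gradient(v[:-1], axis=1)     # spatial gradient G_y
    cols = []
    for t in range(gt.shape[0]):
        # (x, y) and (t, y) orientation histograms together form the frame's HTG column
        cols.append(np.concatenate([orientation_histogram(gx[t], gy[t], nbins),
                                    orientation_histogram(gt[t], gy[t], nbins)]))
    matrix = np.stack(cols, axis=1)      # 2-D matrix: one HTG column per frame, in time order
    mx = np.gradient(matrix, axis=1)     # HOG pass over the fused matrix: x/y gradients only
    my = np.gradient(matrix, axis=0)
    return orientation_histogram(mx, my, nbins)
```

The returned histogram plays the role of the final HTG-HOG row vector of the video.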
(3) The two features extracted in steps (1) and (2) are concatenated: the two row-vector features produced for each video are linked into one very large row vector, which is then transposed into a very large column vector.
(4) For the recognition accuracy test, a random decision forest is used to classify the human behavior in each video segment. Using the trained classifier, the experimental results are shown in Table 1. As Table 1 shows, the present invention recognizes the action classes of the input videos with accuracies of 97.09%, 98.75%, and 98.33% on the three data sets, i.e., it makes correct class judgments for the vast majority of actions in the input videos. Figs. 2-4 show the confusion matrices on the three data sets.
Table 1
Data set | MSRAction3D | MSRDailyActivity3D | MSRActionPair3D |
---|---|---|---|
Recognition accuracy | 97.09% | 98.75% | 98.33% |
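The fusion and random-decision-forest classification stages can be sketched with scikit-learn's `RandomForestClassifier`. The feature dimensions, class labels, and forest size below are synthetic stand-ins for illustration, not the experimental setup behind Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse(stg_vec, htg_hog_vec):
    """Step (3): link the STG and HTG-HOG row vectors into one super row vector."""
    return np.concatenate([stg_vec, htg_hog_vec])   # transpose if a column vector is needed

# Synthetic stand-in: 40 "videos", each with a 6-dim STG part and an 18-dim HTG-HOG part.
rng = np.random.default_rng(0)
stg = rng.normal(size=(40, 6))
hog = rng.normal(size=(40, 18))
X = np.array([fuse(s, h) for s, h in zip(stg, hog)])
y = np.repeat([0, 1], 20)        # two hypothetical action classes
X[y == 1] += 2.0                 # shift class 1 so the classes are separable

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
```

On real data, each row of X would be the fused STG and HTG-HOG vector of one training video, and y its action label.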
Claims (4)
1. A human body behavior recognition method performed on video information, which can effectively save labor, reduce work intensity, and improve work efficiency and recognition accuracy, the technical solution being a human body behavior recognition mechanism based on HTG-HOG and STG features, comprising the following steps:
(1) STG feature extraction:
(1.1) extracting the key frames of the video according to the dynamic energy values of the weighted difference maps;
(1.2) computing the length and width of the non-zero region of each key frame extracted in (1.1);
(1.3) computing the length and width of the non-zero region of the original input video;
(1.4) computing, for each key frame, the ratios of the lengths and widths obtained in (1.2) and (1.3), and concatenating the ratios of all key frames into a row vector;
(2) HTG-HOG feature extraction:
(2.1) extracting the HTG feature from every frame;
(2.2) combining, in temporal order, the HTG column vectors extracted from every frame of the video into a 2-dimensional matrix;
(2.3) extracting the HOG feature from the 2-dimensional matrix produced in (2.2) to generate the HTG-HOG row vector;
(3) fusing the two features into one super vector: concatenating the row vectors generated in steps (1) and (2) into one very large row vector, then transposing it into a very large column vector;
(4) using a random decision forest to judge the class of human behavior in the input video.
2. The human body behavior recognition method for video information according to claim 1, characterized in that the STG feature extraction in step (1) proceeds as follows:
(1) the key-frame extraction computes the weighted dynamic energy value between consecutive frames and selects the K frames with the largest values as the key frames of the video. The weighted dynamic energy value is computed as follows: first compute the difference between consecutive frames, F(t) = f(i, j, t+1) − f(i, j, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video at point (i, j); then weight F(t) to obtain F_w(t) = F(t)·w(t), and compute the sum of all pixel values of the weighted t-th frame F_w(t); finally, choose the K frames with the largest sums as the key frames of the video;
(2) after the key frames are selected, STG features are extracted from the K key frames: first compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimensions N*M*K); then divide these values by the length and width of the non-zero region of the first frame of the original input video sequence, the resulting ratios being the STG feature of that key frame; finally, concatenate the STG features of the K frames into a row vector of length 2K.
3. The human body behavior recognition method for video information according to claim 1, characterized in that the HTG-HOG feature extraction proceeds as follows:
(1) HTG features are first extracted from the original input video. The HTG feature of the t-th image is computed from the gradients G_t = f(i, j, t+1) − f(i, j, t), G_x = f(i+1, j, t) − f(i−1, j, t), and G_y = f(i, j+1, t) − f(i, j−1, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y are the gradient values of the pixel in the temporal, x, and y directions, respectively. In practice, each image is divided into 8*8 cells, and the gradient value of every pixel in each cell is obtained by convolving the image with the template [-1, 0, 1]; this method efficiently yields the gradient value of every pixel in the image. After the gradient values in each cell are computed, the gradient direction θ = arctan(G_y / G_x) and the gradient magnitude G = sqrt(G_x² + G_y²) at each pixel of the cell are computed; the gradient direction and magnitude in the (t, y) directions are computed by the same method. Finally, the gradient magnitudes of all pixels in each cell are accumulated into a histogram according to their gradient-direction values, with one direction bin every 20 degrees, so the full 360 degrees are divided into 18 gradient-direction types; counting the directional gradient values according to these types yields the histogram. The histograms of the 8*8 cells are then linked together to obtain the row vector named HTG;
(2) HTG features are extracted from every frame of the video in temporal order, and the row vectors produced by the frames are assembled, according to the time order in which the action occurs, into a 2-dimensional matrix;
(3) the HOG feature is then extracted from the matrix obtained in (2); the computation is similar to that in (1), except that only the gradient magnitudes and directions in the x and y directions are computed, and the resulting histogram is the final HTG-HOG row-vector feature of the video.
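The per-frame HTG computation described in this claim — split the image into 8*8 cells, take [-1, 0, 1] centred-difference gradients, and accumulate a magnitude-weighted 18-bin (20 degrees per bin) orientation histogram per cell, linking the cell histograms into one row vector — can be sketched as follows. The function name and the zero-padded image borders are assumptions of this sketch.

```python
import numpy as np

def cell_htg(frame, cell=8, nbins=18):
    """Per-cell 18-bin orientation histograms of an image, linked into one row vector."""
    img = frame.astype(float)
    # [-1, 0, 1] template applied as centred differences (borders left at zero)
    gx = np.zeros_like(img); gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy = np.zeros_like(img); gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = np.minimum((ang // (360.0 / nbins)).astype(int), nbins - 1)
    H, W = img.shape
    hists = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            b = bins[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hists.append(np.bincount(b, weights=m, minlength=nbins))
    return np.concatenate(hists)      # the row vector named HTG in the claim
```

A 16*16 image, for example, yields 4 cells and therefore a 72-element row vector.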
4. The human body behavior recognition method for video information according to claim 1, characterized in that: the two features extracted in steps (1) and (2) are concatenated, the two row-vector features produced for each video being linked into one very large row vector, which is then transposed into a very large column vector; and a random decision forest is used to classify the features extracted in the above steps.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610420591.7A CN106022310B (en) | 2016-06-14 | 2016-06-14 | Human body behavior identification method based on HTG-HOG and STG characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106022310A true CN106022310A (en) | 2016-10-12 |
CN106022310B CN106022310B (en) | 2021-08-17 |
Family
ID=57087844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610420591.7A Active CN106022310B (en) | 2016-06-14 | 2016-06-14 | Human body behavior identification method based on HTG-HOG and STG characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106022310B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101732055A (en) * | 2009-02-11 | 2010-06-16 | 北京智安邦科技有限公司 | Method and system for testing fatigue of driver |
CN102136066A (en) * | 2011-04-29 | 2011-07-27 | 电子科技大学 | Method for recognizing human motion in video sequence |
US20120027263A1 (en) * | 2010-08-02 | 2012-02-02 | Sony Corporation | Hand gesture detection |
CN105095866A (en) * | 2015-07-17 | 2015-11-25 | 重庆邮电大学 | Rapid behavior identification method and system |
CN105631462A (en) * | 2014-10-28 | 2016-06-01 | 北京交通大学 | Behavior identification method through combination of confidence and contribution degree on the basis of space-time context |
Non-Patent Citations (2)
Title |
---|
GURUPRASAD SOMASUNDARAM ET AL.: "Action recognition using global spatio-temporal features derived from sparse representations", Computer Vision and Image Understanding * |
CAI Jiaxin et al.: "Human action recognition based on local contour and random forest", Acta Optica Sinica (光学学报) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106815600A (en) * | 2016-12-27 | 2017-06-09 | Zhejiang University of Technology | Deep cooperative structure and structured learning method for human behavior recognition |
CN106815600B (en) * | 2016-12-27 | 2019-07-30 | Zhejiang University of Technology | Deep cooperative structure and structured learning method for human behavior recognition |
WO2020244279A1 (en) * | 2019-06-05 | 2020-12-10 | 北京京东尚科信息技术有限公司 | Method and device for identifying video |
US11967134B2 (en) | 2019-06-05 | 2024-04-23 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and device for identifying video |
CN110610145A (en) * | 2019-08-28 | 2019-12-24 | 电子科技大学 | Behavior identification method combined with global motion parameters |
Also Published As
Publication number | Publication date |
---|---|
CN106022310B (en) | 2021-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||