CN103886293B - Human body behavior recognition method based on history motion graph and R transformation - Google Patents
- Publication number: CN103886293B (application CN201410106957.4A)
- Authority: CN (China)
- Legal status: Active
Abstract
The invention discloses a human body behavior recognition method based on motion history images and the R transform. The method uses depth video as the basis for recognition. First, the minimum enclosing rectangle of the human motion is computed with a foreground segmentation technique; motion history images are then extracted within the depth video region delimited by that rectangle; a motion-intensity constraint is applied to the extracted motion history images to obtain a motion energy map; and the R transform of the energy map yields the feature vector used for behavior recognition. A support vector machine is used for both the training and the recognition process. Preprocessing with the minimum enclosing rectangle of the human motion accelerates behavior feature extraction; the motion history image sequences reduce the influence of noise in the depth maps; and extracting features by applying the R transform to the energy map keeps the computation fast.
Description
Technical field
The present invention relates to the fields of computer vision and image processing, and more particularly to a human body behavior recognition method based on motion history images and the R transform.
Background technology
Video surveillance is a focus and an important problem in current computer vision research. Fields such as security and human-computer interaction continuously produce enormous volumes of video data, easily measured in gigabytes, and judging them manually would undoubtedly consume huge manpower. Video content is rich, yet most of the time we care only about certain parts of it, such as human behavior; if such behavior could be recognized automatically and efficiently, a great deal of manpower would be freed. Current behavior recognition research focuses mainly on RGB video.
RGB video is the most common form of video; its sources are plentiful, and many years of research have produced numerous results. Behavior recognition methods based on RGB video currently fall into three broad classes: space-time approaches, sequential approaches and hierarchical approaches. After many years of development, the research bottleneck of human behavior recognition based on RGB video has become increasingly apparent: when RGB video serves as the data source, background interference is difficult to remove. More importantly, RGB video carries only two-dimensional plane information, and describing three-dimensional human behavior with two-dimensional information obviously loses many key cues.
With the progress of technology, a cheap camera equipped with a depth sensor, the Kinect, has appeared in recent years. Microsoft's Kinect camera can acquire depth information of acceptable quality while capturing ordinary RGB images. With a skeleton-learning algorithm integrated into the camera, the skeleton of a human body in a three-dimensional scene can be obtained. At present, feature extraction on depth maps still mainly borrows from past experience in extracting features from RGB. Meanwhile, several public data sets have been released, which greatly facilitates research on depth-map feature extraction. Zicheng Liu et al. proposed a method based on three-dimensional contours (a bag of 3D words): the depth map is treated as three-dimensional data and projected onto three orthogonal directions in Cartesian space to obtain projection silhouettes, from which a fixed number of points is down-sampled as features; the resulting features are fed into an Action Graph model for recognition. Bingbing Ni independently collected a depth data set called RGBD-HuDaAct and was the first to apply the idea of 3D-MHIs to feature extraction from depth-map sequences. These methods have their respective limitations: the bag-of-3D-words method achieves high recognition accuracy, but because it requires uniform sampling on the human silhouette, the acquired depth data must be very clean, so it cannot be used for human behavior recognition in real scenes; directly using 3D-MHIs is fast enough, but its recognition accuracy is insufficient; DMM-HOG maintains recognition accuracy and is comparatively effective against complex backgrounds, but it is too time-consuming to achieve real-time human behavior recognition.
Content of the invention
In view of the deficiencies of the prior art, the present invention proposes a human body behavior recognition method based on motion history images and the R transform. The method uses depth video as the basis of recognition, applies the concepts of the motion history image and the R transform to the behavior feature extraction process, and uses a support vector machine for the training and recognition stages of behavior recognition.
The method comprises an offline training stage and an online recognition stage, with the following concrete steps:
Step (1). Offline training stage
The purpose of the offline training stage is to obtain a human behavior recognition model. Its steps are as follows:
Step 1-1. Cut the depth video S to be trained into multiple depth video clips of identical time span, then assign a different behavior label to each depth video clip according to its behavior class, thereby obtaining the training set T for human behavior recognition.
The training set T is the set of depth video clips with their different behavior labels;
The time span is the time span of the video clips to be recognized, as defined in the online recognition stage;
Step 1-2. Obtain the minimum enclosing rectangle of the human motion in each depth video clip with a "foreground segmentation technique", and scale the video content delimited by the minimum enclosing rectangle in the depth video clip to a unified size.
The "foreground segmentation technique" operates as follows:
a) A depth video clip V given by the training set T consists of a number of depth frames {P1, P2, ..., Pi}, where i denotes the i-th depth frame. For any depth frame Pi, apply binary k-means clustering to the pixels of Pi according to the depth value at each pixel position, obtaining a foreground pixel set and a background pixel set; the foreground pixels are those with the smaller average depth value.
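For illustration only, a minimal sketch of this per-frame binary clustering in Python with NumPy might look as follows (the function name, the fixed iteration count and the treatment of zero depth values as missing readings are assumptions of the sketch, not part of the invention):

```python
import numpy as np

def segment_foreground(depth_frame, iters=10):
    """Binary k-means on the depth values of one frame, as in step a).
    Returns a boolean mask, True for the cluster with the smaller mean
    depth (the foreground). Zero depth is treated as a missing reading,
    which is an assumption of this sketch."""
    valid = depth_frame > 0
    vals = depth_frame[valid].astype(np.float64)
    c_fg, c_bg = vals.min(), vals.max()            # initial cluster centres
    for _ in range(iters):                         # plain 1-D two-means iterations
        to_fg = np.abs(vals - c_fg) <= np.abs(vals - c_bg)
        if to_fg.all() or not to_fg.any():         # degenerate frame: stop early
            break
        c_fg, c_bg = vals[to_fg].mean(), vals[~to_fg].mean()
    mask = np.zeros(depth_frame.shape, dtype=bool)
    mask[valid] = np.abs(depth_frame[valid] - c_fg) <= np.abs(depth_frame[valid] - c_bg)
    return mask
```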
b) On depth frame Pi, find a rectangle Ri such that all the foreground pixels obtained in step a) lie inside Ri. Ri is given by R_left^i, R_right^i, R_up^i and R_down^i, which denote the pixel coordinates of the left, right, top and bottom boundaries of Ri respectively. Then split Ri horizontally into two halves of equal width. If the left half of Ri contains more foreground pixels than the right half, and moving R_right^i left by K pixels (K is a constant that can be tuned to the application scene) would still leave more than η% (50 < η < 100, tunable to the application scene) of the pixel count of the most original rectangle Ri inside the new rectangle, then move R_right^i left by K pixels; once the pixel count inside the new rectangle after moving the boundary falls below η% of the pixel count of the most original rectangle Ri, the right-boundary adjustment is complete. Likewise, if the right half of Ri contains more foreground pixels than the left half, and moving R_left^i right by K pixels would still leave more than η% of the pixel count of the most original rectangle Ri inside the new rectangle, then move R_left^i right by K pixels; once the count falls below η%, the left-boundary adjustment is complete. If the pixel counts of the left and right halves of Ri differ by less than ε (ε is a threshold parameter), judge whether pulling the left and right boundaries toward the center by K/2 pixels each would still leave more than η% of all the pixels of the original rectangle Ri inside the new rectangle; if so, pull both boundaries of Ri in by K/2 pixels each and repeat step b), until the remaining pixel count inside the new rectangle is less than η% of all the pixels of the original rectangle Ri. The top and bottom boundaries of Ri are adjusted in the same way.
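A sketch of this boundary-tightening rule for the horizontal direction is given below; the function name is hypothetical and the default values for K, η and ε are illustrative only, not values prescribed by the invention:

```python
import numpy as np

def tighten_horizontal(mask, box, K=4, eta=0.9, eps=50):
    """Step-b) boundary tightening in the horizontal direction.
    mask: boolean foreground mask of one frame;
    box: [left, right, top, bottom] in inclusive pixel coordinates.
    K, eta and eps play the roles of the patent's K, eta% and epsilon."""
    left, right, top, bottom = box

    def count(l, r):                       # foreground pixels between columns l..r
        if r < l:
            return 0
        return int(mask[top:bottom + 1, l:r + 1].sum())

    total = count(left, right)             # pixel count of the most original box
    while True:
        mid = (left + right) // 2
        lhalf, rhalf = count(left, mid), count(mid + 1, right)
        if abs(lhalf - rhalf) < eps:
            # halves balanced: pull both borders in by K/2 while the new box
            # still keeps more than eta of the original foreground pixels
            if count(left + K // 2, right - K // 2) > eta * total:
                left, right = left + K // 2, right - K // 2
            else:
                break
        elif lhalf > rhalf:
            # right side sparse: move the right border left by K if enough remains
            if count(left, right - K) > eta * total:
                right -= K
            else:
                break                      # right-boundary adjustment complete
        else:
            # left side sparse: move the left border right by K if enough remains
            if count(left + K, right) > eta * total:
                left += K
            else:
                break                      # left-boundary adjustment complete
    return [left, right, top, bottom]      # the same rule is applied vertically
```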
c) Depth video clip V is a three-dimensional volume described by the three dimensions abscissa x, ordinate y and time coordinate t. After the adjustment of step b), the foreground pixels of every frame Pi of depth video clip V have been segmented out, their extent being described by Ri. The four boundaries of the minimum enclosing rectangle R of the human behavior in depth video S, namely top boundary R_up, bottom boundary R_down, left boundary R_left and right boundary R_right, can then be calculated according to formula (1):
R_up = min_i R_up^i,   R_down = max_i R_down^i,   R_left = min_i R_left^i,   R_right = max_i R_right^i    (1)
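Under the reading of formula (1) above, the clip-level rectangle is simply the union of the per-frame rectangles; a minimal sketch (the function name is hypothetical):

```python
def clip_bounding_box(frame_boxes):
    """Formula (1): the minimum rectangle enclosing the motion over the
    whole clip is the union of the per-frame rectangles R_i (image
    coordinates, with y growing downward)."""
    ups, downs, lefts, rights = zip(*frame_boxes)  # (R_up^i, R_down^i, R_left^i, R_right^i)
    return min(ups), max(downs), min(lefts), max(rights)
```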
Step 1-3. Starting from moment j in depth video clip V, a subsequence S_j of arbitrary time window length τ yields one motion history image H_τ^j, computed as follows:
H_τ^j(x, y, t) = τ,   if |I(x, y, t) − I(x, y, t − 1)| ≥ δ_Ith
H_τ^j(x, y, t) = max(0, H_τ^j(x, y, t − 1) − 1),   otherwise    (2)
where I(x, y, t) denotes the depth value captured at pixel position (x, y) at moment t of the depth video; t ranges over [j, j + τ − 1]; δ_Ith is a constant threshold; j and τ are natural numbers.
The present invention takes three time window lengths τ_s, τ_m and τ_l and obtains the corresponding motion history images H_τs^j, H_τm^j and H_τl^j, where s, m and l are natural numbers, m = 2s, l = 4s, and s is proportional to the time span of depth video clip V.
Through the processing of step 1-3 the depth video clip is converted into motion history image sequences: the motion history images of the three window lengths, extended along the time dimension over depth video clip V, form the motion history image sequences denoted MHIs^o, where o = s, m, l.
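A sketch of the motion history image of formula (2), assuming the classic recursive MHI update (set a pixel to τ on change, decay by one per frame); the threshold default standing in for δ_Ith is illustrative:

```python
import numpy as np

def motion_history(frames, tau, delta=10):
    """Motion history image H_tau^j over a window of tau consecutive
    depth frames: a pixel whose depth changes by more than delta is set
    to tau, otherwise its history value decays by one per frame."""
    H = np.zeros(frames[0].shape, dtype=np.float64)
    for prev, cur in zip(frames[:-1], frames[1:]):
        moving = np.abs(cur.astype(np.int64) - prev.astype(np.int64)) > delta
        H = np.where(moving, float(tau), np.maximum(H - 1.0, 0.0))
    return H
```

Sliding the window start j over the clip for each of τ_s, τ_m and τ_l yields the three sequences MHIs^s, MHIs^m and MHIs^l.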
Step 1-4. For any motion history image sequence MHIs^o obtained in step 1-3, let H^o(x, y, t) denote the intensity of MHIs^o at pixel position (x, y) in frame t. To exclude the interference of noise in the depth maps, a further intensity constraint is applied to MHIs^o: from MHIs^o the energy map D^o is computed, where the value D^o(x, y) at each position (x, y) is given by formula (3):
D^o(x, y) = Σ_{t=1..N} μ(H^o(x, y, t) − ε)    (3)
where μ(θ) is the unit step function: μ(θ) = 1 when θ ≥ 0 and μ(θ) = 0 when θ < 0; ε is a threshold constant that can be tuned to the application scene; N is the time span of depth video clip V.
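A sketch of the energy map of formula (3); the default standing in for the threshold ε is illustrative:

```python
import numpy as np

def energy_map(mhi_sequence, eps=1.0):
    """Energy map D^o of formula (3): per pixel, count how many frames of
    the MHI sequence exceed the intensity threshold eps (the unit step mu)."""
    stack = np.stack(mhi_sequence)                 # shape (N, height, width)
    return (stack > eps).sum(axis=0).astype(np.float64)
```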
Step 1-5. For each obtained energy map D^o, compute its R transform to obtain the behavior feature F_V of depth video clip V, as follows:
First compute the Radon transform of energy map D^o according to formula (4):
Rad^o(ρ, θ) = ∫∫ D^o(x, y) δ(x·cos θ + y·sin θ − ρ) dx dy    (4)
Then, for every direction θ, integrate over the whole range of ρ to obtain the R transform, as in formula (5):
R^o(θ) = ∫ [Rad^o(ρ, θ)]² dρ    (5)
To prevent scale effects, R^o is normalized as R'^o(θ) = R^o(θ) / max_θ R^o(θ), θ ∈ [0°, 180°). The normalized transforms R'^s, R'^m and R'^l are spliced together to form the behavior feature F_V of depth video clip V.
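A sketch of formulas (4) and (5) using scikit-image's radon for the Radon transform; sampling θ at one-degree steps over [0°, 180°) and the function name are assumptions of the sketch:

```python
import numpy as np
from skimage.transform import radon

def r_transform_feature(energy_maps):
    """Behavior feature F_V: for each energy map D^o (o = s, m, l), compute
    the Radon transform (formula 4), integrate its square over rho
    (formula 5), normalise by the maximum, and concatenate the three
    180-dimensional vectors."""
    feats = []
    for D in energy_maps:
        sino = radon(D, theta=np.arange(180.0), circle=False)  # rows: rho, cols: theta
        R = (sino ** 2).sum(axis=0)                # R(theta) = sum_rho Rad^2(rho, theta)
        feats.append(R / R.max())                  # normalisation against scale effects
    return np.concatenate(feats)                   # e.g. 3 x 180 = 540 dimensions
```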
Step 1-6. According to the behavior features F_V of the depth video clips and the behavior labels obtained in step 1-1, train the recognition model M using a support vector machine.
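Training step 1-6 reduces to fitting an off-the-shelf SVM on the concatenated features; a sketch with scikit-learn (the RBF kernel is an assumption of the sketch, since the invention only specifies a support vector machine):

```python
from sklearn.svm import SVC

def train_model(features, labels):
    """Step 1-6: fit a support vector machine on the behavior features.
    features: array of shape (n_clips, dim); labels: array of shape (n_clips,)."""
    model = SVC(kernel="rbf")   # kernel choice is illustrative
    model.fit(features, labels)
    return model
```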
Step (2). Online recognition stage
The purpose of the online recognition stage is to carry out behavior recognition using the recognition model M obtained in the offline training stage. Its steps are as follows:
Step 2-1. Extract the behavior feature of the video to be recognized with the same method as steps 1-1 to 1-5 of the offline training stage.
The recognition granularity of the online recognition stage is consistent with that used during offline training.
Step 2-2. Based on the behavior feature of the video to be recognized, carry out behavior recognition on it with the support vector machine according to the trained model M.
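The online stage then reuses the same feature pipeline; a sketch, where extract_energy_maps is a hypothetical helper standing for steps 1-2 to 1-4 and r_transform_feature is the function sketched above:

```python
def recognize(model, clip):
    """Online stage: extract the same feature as steps 1-1 to 1-5 and
    classify it with the trained model M."""
    feature = r_transform_feature(extract_energy_maps(clip))
    return model.predict(feature.reshape(1, -1))[0]
```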
Compared with traditional human behavior recognition methods, the method proposed by the invention has the following advantages:
1. The minimum enclosing rectangle of the human motion is used as a preprocessing step in the offline training stage and in the feature extraction of the online recognition stage, which accelerates behavior feature extraction and at the same time eliminates the interference of complex backgrounds.
2. The motion history image sequences preserve the key information of the human motion. Because depth maps naturally carry three-dimensional motion information, they describe the human body better than RGB-video-based behavior recognition does, so the retained key information has stronger descriptive power for human behavior; the subsequent intensity constraint along the time dimension of the motion history images reduces the effect of noise in the depth maps.
3. The final step of behavior feature extraction computes the R transform on the energy map. While fully exploiting the intensity and silhouette information of the energy map, it retains the advantage of fast computation, so the method can carry out behavior recognition in real time while maintaining recognition accuracy. Note that the energy map retains the silhouette and intensity information of the motion and is thus a well-refined description of the original motion behavior.
Based on the above three features, the invention provides a fast and effective human behavior feature and a human behavior recognition method based on this feature.
Description of the drawings
Fig. 1 is the flow chart of the behavior feature extraction process of the method of the invention, where figure (a) shows the concrete flow and figure (b) is the image preview corresponding to figure (a);
Fig. 2 is the outline flow chart of the method of the invention.
Specific embodiment
The present invention is further illustrated below with reference to the accompanying drawings and the specific embodiment.
As shown in Fig. 1 and Fig. 2, the present invention comprises an offline training stage and an online recognition stage.
Step (1). Offline training stage
The purpose of the offline training stage is to obtain a human behavior recognition model. Its steps are as follows:
Step 1-1. Cut the depth video S to be trained into multiple depth video clips of identical time span, then assign a different behavior label to each depth video clip according to its behavior class, thereby obtaining the training set T for human behavior recognition.
The time span is the time span of the video clips to be recognized, as defined in the online recognition stage;
Step 1-2. Obtain the minimum enclosing rectangle of the human motion in each depth video clip with a "foreground segmentation technique", and scale the video content delimited by the minimum enclosing rectangle in the depth video clip to a unified size of 320*240.
The "foreground segmentation technique" is described as follows:
a) A depth video clip V given by the training set T consists of a number of depth frames {P1, P2, ..., Pi}, where i is a natural number. For any depth frame Pi, apply binary k-means clustering to the pixels of Pi according to the depth value at each pixel position, obtaining two sets containing the foreground pixels and the background pixels respectively; the foreground pixels are those with the smaller average depth value.
b) On depth frame Pi, find a rectangle Ri such that all the foreground pixels obtained in step a) lie inside Ri. Ri is given by R_left^i, R_right^i, R_up^i and R_down^i, which denote the pixel coordinates of the left, right, top and bottom boundaries of Ri respectively. Then split Ri horizontally into two halves of equal width. If the left half of Ri contains more foreground pixels than the right half, and moving R_right^i left by K pixels (K is a constant that can be tuned to the application scene) would still leave more than 90% (90% is a recommended value, tunable to the application scene) of the pixel count of the most original rectangle Ri inside the new rectangle, then move R_right^i left by K pixels; once the pixel count inside the new rectangle after moving the boundary falls below 90% of the pixel count of the most original rectangle Ri, the right-boundary adjustment is complete. Likewise, if the right half of Ri contains more foreground pixels than the left half, and moving R_left^i right by K pixels would still leave more than 90% of the pixel count of the most original rectangle Ri inside the new rectangle, then move R_left^i right by K pixels; once the count falls below 90%, the left-boundary adjustment is complete. If the pixel counts of the left and right halves of Ri differ by less than ε (ε is a threshold parameter), judge whether pulling the left and right boundaries toward the center by K/2 pixels each would still leave more than 90% of all the pixels of the original rectangle Ri inside the new rectangle; if so, pull both boundaries of Ri in by K/2 pixels each and repeat step b), until the remaining pixel count inside the new rectangle is less than 90% of all the pixels of the original rectangle Ri. The top and bottom boundaries of Ri are adjusted in the same way.
c) Depth video clip V is a three-dimensional volume described by the three dimensions abscissa x, ordinate y and time coordinate t. After step b), the foreground pixels of every frame Pi of depth video clip V have been segmented out, their extent being described by Ri. The four boundaries of the minimum enclosing rectangle R of the human behavior in depth video S, namely top boundary R_up, bottom boundary R_down, left boundary R_left and right boundary R_right, can then be calculated according to formula (1):
R_up = min_i R_up^i,   R_down = max_i R_down^i,   R_left = min_i R_left^i,   R_right = max_i R_right^i    (1)
Step 1-3. Starting from moment j in depth video clip V, a subsequence S_j of arbitrary time window length τ yields one motion history image, computed as follows:
H_τ^j(x, y, t) = τ,   if |I(x, y, t) − I(x, y, t − 1)| ≥ δ_Ith
H_τ^j(x, y, t) = max(0, H_τ^j(x, y, t − 1) − 1),   otherwise    (2)
where I(x, y, t) denotes the depth value captured at pixel position (x, y) at moment t of the depth video; t ranges over [j, j + τ − 1]; δ_Ith is a constant threshold; j and τ are natural numbers.
Starting from any moment t, the present invention takes the consecutive time window lengths τ_s = 4, τ_m = 8 and τ_l = 16 and obtains the corresponding motion history image sequences, where s, m and l are natural numbers, m = 2s, l = 4s, and s is proportional to the time span of depth video clip V.
Through the processing of step 1-3 the depth video clip is converted into motion history image sequences: the motion history images of the three window lengths, extended along the time dimension over depth video clip V, form the motion history image sequences denoted MHIs^o, where o = s, m, l.
Step 1-4. For any motion history image sequence MHIs^o obtained in step 1-3, where o = s, m, l, let H^o(x, y, t) denote the intensity of MHIs^o at pixel position (x, y) in frame t. To exclude the interference of noise in the depth maps, a further intensity constraint is applied to MHIs^o: from MHIs^o the energy map D^o is computed, where the value D^o(x, y) at each position (x, y) is given by formula (3):
D^o(x, y) = Σ_{t=1..N} μ(H^o(x, y, t) − ε)    (3)
where μ(θ) is the unit step function: μ(θ) = 1 when θ ≥ 0 and μ(θ) = 0 when θ < 0; ε is a threshold constant that can be tuned to the application scene; N is the time span of depth video clip V.
Step 1-5. For each obtained energy map D^o, compute its R transform to obtain the behavior feature F_V of depth video clip V, as follows:
First compute the Radon transform of energy map D^o according to formula (4):
Rad^o(ρ, θ) = ∫∫ D^o(x, y) δ(x·cos θ + y·sin θ − ρ) dx dy    (4)
Then, for every direction θ, integrate over the whole range of ρ to obtain the R transform, as in formula (5):
R^o(θ) = ∫ [Rad^o(ρ, θ)]² dρ    (5)
To prevent scale effects, R^o is normalized as R'^o(θ) = R^o(θ) / max_θ R^o(θ), θ ∈ [0°, 180°). The normalized transforms R'^s, R'^m and R'^l are spliced together to form the behavior feature F_V of depth video clip V.
Step 1-6. According to the behavior features F_V of the depth video clips and the behavior labels obtained in step 1-1, train the recognition model M using a support vector machine.
Step (2). Online recognition stage
The purpose of the online recognition stage is to carry out behavior recognition using the recognition model M obtained in the offline training stage. Its steps are as follows:
Step 2-1. Extract the behavior feature of the video to be recognized with the same method as steps 1-1 to 1-5 of the offline training stage.
The recognition granularity of the online recognition stage is consistent with that used during offline training.
Step 2-2. Based on the behavior feature of the video to be recognized, carry out behavior recognition on it with the support vector machine according to the trained model M.
The present invention is not limited to the above-described embodiment; any implementation that satisfies the claims of the present invention falls within its protection scope.
Claims (1)
1. A human body behavior recognition method based on motion history images and the R transform, characterized in that the method comprises an offline training stage and an online recognition stage, with the following concrete steps:
Step (1). Offline training stage:
Step 1-1. Cut the depth video S to be trained into multiple depth video clips of identical time span, then assign a different behavior label to each depth video clip according to its behavior class, thereby obtaining the training set T for human behavior recognition;
The training set T is the set of depth video clips with their different behavior labels;
Step 1-2. Obtain the minimum enclosing rectangle of the human motion in each depth video clip with a "foreground segmentation technique", and scale the video content delimited by the minimum enclosing rectangle in the depth video clip to a unified size;
The "foreground segmentation technique" operates as follows:
a) A depth video clip V given by the training set T consists of a number of depth frames {P1, P2, ..., Pi}, where i denotes the i-th depth frame; for any depth frame Pi, apply binary k-means clustering to the pixels of Pi according to the depth value at each pixel position, obtaining a foreground pixel set and a background pixel set; the foreground pixels are those with the smaller average depth value;
b) On depth frame Pi, find a rectangle Ri such that all the foreground pixels obtained in step a) lie inside Ri; Ri is given by R_left^i, R_right^i, R_up^i and R_down^i, which denote the pixel coordinates of the left, right, top and bottom boundaries of Ri respectively; then split Ri horizontally into two halves of equal width; if the left half of Ri contains more foreground pixels than the right half, and moving R_right^i left by K pixels would still leave more than η% of the pixel count of the most original rectangle Ri inside the new rectangle, where K is an even constant and 50 < η < 100, then move R_right^i left by K pixels; if the pixel count inside the new rectangle after moving the boundary is less than η% of the pixel count of the most original rectangle Ri, the right-boundary adjustment is complete; if the right half of Ri contains more foreground pixels than the left half, and moving R_left^i right by K pixels would still leave more than η% of the pixel count of the most original rectangle Ri inside the new rectangle, then move R_left^i right by K pixels; if the pixel count inside the new rectangle after moving the boundary is less than η% of the pixel count of the most original rectangle Ri, the left-boundary adjustment is complete; if the pixel counts of the left and right halves of Ri differ by less than ε, where ε is a threshold parameter, judge whether pulling the left and right boundaries toward the center by K/2 pixels each would still leave more than η% of all the pixels of the original rectangle Ri inside the new rectangle; if so, pull both boundaries of Ri in by K/2 pixels each, and repeat step b) until the remaining pixel count inside the new rectangle is less than η% of all the pixels of the original rectangle Ri; the top and bottom boundaries of Ri are adjusted in the same way;
c) Depth video clip V is a three-dimensional volume described by the three dimensions abscissa x, ordinate y and time coordinate t; after the adjustment of step b), the foreground pixels of every frame Pi of depth video clip V have been segmented out, their extent being described by Ri; the four boundaries of the minimum enclosing rectangle R of the human behavior in depth video S, namely top boundary R_up, bottom boundary R_down, left boundary R_left and right boundary R_right, are respectively calculated according to formula (1):
R_up = min_i R_up^i,   R_down = max_i R_down^i,   R_left = min_i R_left^i,   R_right = max_i R_right^i    (1)
Step 1-3. Starting from moment j in depth video clip V, a subsequence S_j of arbitrary time window length τ yields one motion history image H_τ^j, computed as follows:
H_τ^j(x, y, t) = τ,   if |I(x, y, t) − I(x, y, t − 1)| ≥ δ_Ith
H_τ^j(x, y, t) = max(0, H_τ^j(x, y, t − 1) − 1),   otherwise    (2)
where I(x, y, t) denotes the depth value captured at pixel position (x, y) at moment t of the depth video; t ranges over [j, j + τ − 1]; δ_Ith is a constant threshold; j and τ are natural numbers;
Take three time window lengths τ_s, τ_m and τ_l and obtain the corresponding motion history images H_τs^j, H_τm^j and H_τl^j, where s, m and l are natural numbers, m = 2s, l = 4s, and s is proportional to the time span of depth video clip V;
Through the processing of step 1-3 the depth video clip is converted into motion history image sequences: the motion history images of the three window lengths, extended along the time dimension over depth video clip V, form the motion history image sequences denoted MHIs^o, where o = s, m, l;
Step 1-4. For any motion history image sequence MHIs^o obtained in step 1-3, let H^o(x, y, t) denote the intensity of MHIs^o at pixel position (x, y) in frame t; from MHIs^o compute the energy map D^o as follows, where the value D^o(x, y) at each position (x, y) is given by formula (3):
D^o(x, y) = Σ_{t=1..N} μ(H^o(x, y, t) − ε)    (3)
where μ(θ) is the unit step function: μ(θ) = 1 when θ ≥ 0 and μ(θ) = 0 when θ < 0; ε is a threshold constant; N is the time span of depth video clip V;
Step 1-5. For each obtained energy map D^o, compute its R transform to obtain the behavior feature F_V of depth video clip V, as follows:
First compute the Radon transform of energy map D^o according to formula (4):
Rad^o(ρ, θ) = ∫∫ D^o(x, y) δ(x·cos θ + y·sin θ − ρ) dx dy    (4)
Then, for every direction θ, integrate over the whole range of ρ to obtain the R transform, as in formula (5):
R^o(θ) = ∫ [Rad^o(ρ, θ)]² dρ    (5)
Normalize R^o as R'^o(θ) = R^o(θ) / max_θ R^o(θ), θ ∈ [0°, 180°); the normalized transforms R'^s, R'^m and R'^l are spliced together to form the behavior feature F_V of depth video clip V;
Step 1-6. According to the behavior features F_V of the depth video clips and the behavior labels obtained in step 1-1, train the recognition model M using a support vector machine;
Step (2). Online recognition stage:
Step 2-1. Extract the behavior feature of the video to be recognized with the same method as steps 1-1 to 1-5 of the offline training stage;
The recognition granularity of the online recognition stage is consistent with that used during offline training;
Step 2-2. Based on the behavior feature of the video to be recognized, carry out behavior recognition on it with the support vector machine according to the trained model M.