CN106022310A - HTG-HOG (histograms of temporal gradient and histograms of oriented gradient) and STG (scale of temporal gradient) feature-based human body behavior recognition method - Google Patents
- Publication number: CN106022310A (application number CN201610420591.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- htg
- video
- stg
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a novel human body behavior recognition method, in particular a human body behavior recognition method based on HTG-HOG (Histograms of Temporal Gradient and Histograms of Oriented Gradient) features and STG (Scale of Temporal Gradient) features. According to the method, HTG features and STG features are extracted from depth maps. The HTG features are spatio-temporal local features of the video sequence: an HTG feature is extracted from every frame of the sequence, the per-frame features are fused into a 2-dimensional matrix, and HOG features are extracted from that matrix. The STG features are global features of the whole video sequence: the first K frames of each input video with the largest weighted dynamic energy values are selected as key frames, and the STG features of the sequence are extracted from these key frames. The HTG-HOG and STG features are fused into one very large vector, which is finally classified with a random decision forest. The recognition mechanism of the method is simple in structure, easy to implement, and suitable for real-time processing in elderly monitoring and intelligent video surveillance.
Description
Technical field
The invention belongs to the fields of artificial intelligence and pattern recognition, and specifically relates to a human body behavior recognition technique based on HTG-HOG (Histograms of Temporal Gradient and Histograms of Oriented Gradient) features and STG (Scale of Temporal Gradient) features.
Background technology
In the big-data era, as people's demand for high-speed, high-quality video information keeps growing, intelligent video analysis technology becomes more and more important. Human body behavior recognition is one of the key technologies of intelligent video analysis and one of the major problems in pattern recognition research; it has great research value and significance, and is widely applied in fields such as intelligent video surveillance, elderly care and monitoring, virtual reality, and motion analysis. With the appearance of cheap Kinect devices, behavior recognition on depth data has become an emerging research hotspot in artificial intelligence and pattern recognition.
A video is composed of a sequence of images, so analyzing human behavior in video is a process of processing the image sequence, extracting features, and performing discriminative classification. According to the research approach, features can be divided into global features and local features. Global features study the object of interest as a whole and represent a top-down research approach. Although such methods can capture more information about the human body, they depend heavily on low-level vision processing and are easily affected by factors such as noise and occlusion. Common global features in recent years include shape features and color features. Local features, in contrast, treat relatively independent image patches of the human body as the objects of study, representing a bottom-up research approach. Such methods are more robust to noise and occlusion, but are susceptible to changes in the number of feature points. Common local features include HOG (Histograms of Oriented Gradient) and STIP (Spatio-Temporal Interest Points).
In summary, global features and local features each have their advantages and disadvantages. The present invention therefore combines the characteristics of global and local features and defines a behavior recognition mechanism based on a global feature (the STG feature) and a local feature (the HTG-HOG feature). At present, there is no published literature or patent application combining both of these features.
Summary of the invention
The present invention is directed to a human body behavior recognition method performed on video information. It can effectively save labor and reduce work intensity, while also improving work efficiency and recognition accuracy.
To achieve the above object, the technical solution adopted by the present invention is a human body behavior recognition mechanism based on HTG-HOG and STG features, comprising the following steps:
(1) STG feature extraction:
(1.1) extract the key frames of the video according to the dynamic energy values of the weighted difference maps;
(1.2) compute the length and width of the non-zero region of each key frame extracted in (1.1);
(1.3) compute the length and width of the non-zero region of the original input video;
(1.4) compute, for each key frame, the ratios of the lengths and widths obtained in (1.2) and (1.3), and concatenate the ratios of all key frames into a row vector;
(2) HTG-HOG feature extraction:
(2.1) extract the HTG feature from every frame;
(2.2) combine, in temporal order, the HTG column vectors extracted from every frame of the video into a 2-dimensional matrix;
(2.3) extract the HOG feature from the 2-dimensional matrix produced in (2.2) to generate the HTG-HOG row vector;
(3) fuse the two features into one super vector: concatenate the row vectors generated in steps (1) and (2) into one very large row vector, then transpose it into a very large column vector;
(4) use a random decision forest to judge the class of human behavior in the input video.
Owing to the above technical scheme, the present invention has the following advantage over the prior art: it fuses global and local features and can automatically detect the class of human behavior in the input video, using a random decision forest to classify the actions. Test results show that the present invention achieves very high action recognition accuracy.
Description of the drawings
Fig. 1 is the overall framework of the recognition system of the present invention.
Fig. 2 is the confusion matrix of the recognition system of the present invention on the MSRAction3D data set.
Fig. 3 is the confusion matrix of the recognition system of the present invention on the MSRDailyActivity3D data set.
Fig. 4 is the confusion matrix of the recognition system of the present invention on the MSRActionPair3D data set.
Detailed description of the invention
The present invention is further described below with reference to the accompanying drawings and an implementation case.
Implementation case 1: in this case, behavior classification is performed on video samples from three different data sets. As shown in Fig. 1, the human body behavior recognition method comprises the following steps:
(1) STG feature extraction:
(1.1) Suppose the input video has dimensions N*M*L (N*M means each frame of the video consists of N rows and M columns of pixels, and L means the video contains L frames). To extract key frames, first compute the difference between each pair of consecutive frames, which yields a difference-map sequence of dimensions N*M*(L-1). For each difference map, weight the value of each pixel according to its magnitude: larger pixel values receive larger weights, and smaller pixel values receive smaller weights. Processing the difference-map sequence in this way generates a new weighted difference-map sequence. Finally, compute the dynamic energy of every frame of the weighted difference-map sequence; the K frames with the highest dynamic energy are selected as key frames.
(1.2) As shown in Fig. 1, after the key frames are selected, STG features are extracted from the K key frames. First compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimensions N*M*K).
(1.3) Compute the length and width of the non-zero region of the first frame of the original input video sequence.
(1.4) The ratios of the values obtained in (1.2) and (1.3) form the STG feature of each key frame. Finally, the STG features of the K frames are concatenated into a row vector of length 2K.
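The STG extraction above (inter-frame difference maps, magnitude-dependent weighting, dynamic-energy key-frame selection, and non-zero-region length/width ratios) can be sketched as follows. This is a minimal illustration, not the patented implementation: the squared-difference weighting standing in for the patent's weighting function, and the helper names `nonzero_extent` and `stg_feature`, are assumptions.

```python
import numpy as np

def nonzero_extent(frame):
    """Height and width of the bounding box of the non-zero region of a frame."""
    rows = np.flatnonzero(np.any(frame != 0, axis=1))
    cols = np.flatnonzero(np.any(frame != 0, axis=0))
    if rows.size == 0:
        return 0, 0
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1

def stg_feature(video, K=3):
    """video: (L, N, M) depth frames. Returns the length-2K STG row vector."""
    v = video.astype(float)
    diffs = np.abs(v[1:] - v[:-1])        # inter-frame difference maps
    weighted = diffs ** 2                 # larger differences get larger weights (illustrative choice)
    energy = weighted.sum(axis=(1, 2))    # dynamic energy of each weighted difference map
    key_idx = np.sort(np.argsort(energy)[::-1][:K]) + 1   # K highest-energy frames, in time order
    ref_h, ref_w = nonzero_extent(v[0])   # non-zero extent of the first input frame
    feats = []
    for idx in key_idx:
        h, w = nonzero_extent(v[idx])
        feats.extend([h / max(ref_h, 1), w / max(ref_w, 1)])
    return np.array(feats)
```

Each key frame contributes two ratios (length and width), so K key frames yield the length-2K row vector described above.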
(2) HTG-HOG feature extraction:
(2.1) HTG features are first extracted from the original input video. The HTG feature of the t-th image in the video is computed from the gradients G_t = f(i, j, t+1) − f(i, j, t), G_x = f(i+1, j, t) − f(i−1, j, t), and G_y = f(i, j+1, t) − f(i, j−1, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y are the gradient values of the pixel in the temporal, x, and y directions, respectively. After the gradient value of each pixel is computed, the gradient direction θ = arctan(G_y / G_x) and the gradient magnitude G = sqrt(G_x² + G_y²) at the pixel are computed; the gradient direction and magnitude in the (t, y) directions are computed by the same method. Finally, the gradient magnitude at each pixel is accumulated into a histogram according to the gradient-direction value at that point, which yields the row vector named HTG.
(2.2) HTG features are extracted from every frame of the video in temporal order, and the row vectors produced by the frames are assembled, according to the time order in which the action occurs, into a 2-dimensional matrix.
(2.3) The HOG feature is then extracted from the matrix obtained in (2.2). The computation is similar to that in (2.1), except that only the gradient magnitudes and directions in the x and y directions are computed; the resulting histogram is the final HTG-HOG row-vector feature of the video.
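The two-stage HTG-HOG extraction above can be sketched as follows. This is a simplified sketch, not the patented implementation: centred differences stand in for the gradient templates, the 18 orientation bins of 20 degrees each follow claim 3, and the function names are assumptions.

```python
import numpy as np

def orientation_histogram(gu, gv, nbins=18):
    """Accumulate gradient magnitudes into nbins orientation bins (20 degrees each)."""
    mag = np.sqrt(gu ** 2 + gv ** 2)
    ang = np.degrees(np.arctan2(gv, gu)) % 360.0
    bins = np.minimum((ang // (360.0 / nbins)).astype(int), nbins - 1)
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=nbins)

def htg_hog(video, nbins=18):
    """video: (L, N, M). Per-frame HTG columns fused into a 2-D matrix, then a HOG pass."""
    v = video.astype(float)
    gt = v[1:] - v[:-1]                  # temporal gradient G_t
    gx = np.gradient(v[:-1], axis=2)     # spatial gradient G_x (centred differences)
    gy = np.gradient(v[:-1], axis=1)     # spatial gradient G_y
    cols = []
    for t in range(gt.shape[0]):
        # (x, y) and (t, y) orientation histograms together form the frame's HTG column
        cols.append(np.concatenate([orientation_histogram(gx[t], gy[t], nbins),
                                    orientation_histogram(gt[t], gy[t], nbins)]))
    matrix = np.stack(cols, axis=1)      # 2-D matrix: one HTG column per frame, in time order
    mx = np.gradient(matrix, axis=1)     # HOG pass over the fused matrix: x/y gradients only
    my = np.gradient(matrix, axis=0)
    return orientation_histogram(mx, my, nbins)
```

The returned histogram plays the role of the final HTG-HOG row vector of the video.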
(3) The two features extracted in steps (1) and (2) are concatenated: the two row-vector features produced for each video are linked into one very large row vector, which is then transposed into a very large column vector.
(4) For the recognition accuracy test, a random decision forest is used to classify the human behavior in each video segment. Using the trained classifier, the experimental results are shown in Table 1. As Table 1 shows, the present invention recognizes the action classes of the input videos with accuracies of 97.09%, 98.75%, and 98.33% on the three data sets, i.e., it makes correct class judgments for the vast majority of actions in the input videos. Figs. 2-4 show the confusion matrices on the three data sets.
Table 1
Data set | MSRAction3D | MSRDailyActivity3D | MSRActionPair3D |
---|---|---|---|
Recognition accuracy | 97.09% | 98.75% | 98.33% |
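The fusion and random-decision-forest classification stages can be sketched with scikit-learn's `RandomForestClassifier`. The feature dimensions, class labels, and forest size below are synthetic stand-ins for illustration, not the experimental setup behind Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse(stg_vec, htg_hog_vec):
    """Step (3): link the STG and HTG-HOG row vectors into one super row vector."""
    return np.concatenate([stg_vec, htg_hog_vec])   # transpose if a column vector is needed

# Synthetic stand-in: 40 "videos", each with a 6-dim STG part and an 18-dim HTG-HOG part.
rng = np.random.default_rng(0)
stg = rng.normal(size=(40, 6))
hog = rng.normal(size=(40, 18))
X = np.array([fuse(s, h) for s, h in zip(stg, hog)])
y = np.repeat([0, 1], 20)        # two hypothetical action classes
X[y == 1] += 2.0                 # shift class 1 so the classes are separable

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
```

On real data, each row of X would be the fused STG and HTG-HOG vector of one training video, and y its action label.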
Claims (4)
1. A human body behavior recognition method performed on video information, which can effectively save labor, reduce work intensity, and improve work efficiency and recognition accuracy, the technical solution being a human body behavior recognition mechanism based on HTG-HOG and STG features, comprising the following steps:
(1) STG feature extraction:
(1.1) extracting the key frames of the video according to the dynamic energy values of the weighted difference maps;
(1.2) computing the length and width of the non-zero region of each key frame extracted in (1.1);
(1.3) computing the length and width of the non-zero region of the original input video;
(1.4) computing, for each key frame, the ratios of the lengths and widths obtained in (1.2) and (1.3), and concatenating the ratios of all key frames into a row vector;
(2) HTG-HOG feature extraction:
(2.1) extracting the HTG feature from every frame;
(2.2) combining, in temporal order, the HTG column vectors extracted from every frame of the video into a 2-dimensional matrix;
(2.3) extracting the HOG feature from the 2-dimensional matrix produced in (2.2) to generate the HTG-HOG row vector;
(3) fusing the two features into one super vector: concatenating the row vectors generated in steps (1) and (2) into one very large row vector, then transposing it into a very large column vector;
(4) using a random decision forest to judge the class of human behavior in the input video.
2. The human body behavior recognition method for video information according to claim 1, characterized in that the STG feature extraction in step (1) proceeds as follows:
(1) the key-frame extraction computes the weighted dynamic energy value between consecutive frames and selects the K frames with the largest values as the key frames of the video. The weighted dynamic energy value is computed as follows: first compute the difference between consecutive frames, F(t) = f(i, j, t+1) − f(i, j, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video at point (i, j); then weight F(t) to obtain F_w(t) = F(t)·w(t), and compute the sum of all pixel values of the weighted t-th frame F_w(t); finally, choose the K frames with the largest sums as the key frames of the video;
(2) after the key frames are selected, STG features are extracted from the K key frames: first compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimensions N*M*K); then divide these values by the length and width of the non-zero region of the first frame of the original input video sequence, the resulting ratios being the STG feature of that key frame; finally, concatenate the STG features of the K frames into a row vector of length 2K.
3. The human body behavior recognition method for video information according to claim 1, characterized in that the HTG-HOG feature extraction proceeds as follows:
(1) HTG features are first extracted from the original input video. The HTG feature of the t-th image is computed from the gradients G_t = f(i, j, t+1) − f(i, j, t), G_x = f(i+1, j, t) − f(i−1, j, t), and G_y = f(i, j+1, t) − f(i, j−1, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y are the gradient values of the pixel in the temporal, x, and y directions, respectively. In practice, each image is divided into 8*8 cells, and the gradient value of every pixel in each cell is obtained by convolving the image with the template [-1, 0, 1]; this method efficiently yields the gradient value of every pixel in the image. After the gradient values in each cell are computed, the gradient direction θ = arctan(G_y / G_x) and the gradient magnitude G = sqrt(G_x² + G_y²) at each pixel of the cell are computed; the gradient direction and magnitude in the (t, y) directions are computed by the same method. Finally, the gradient magnitudes of all pixels in each cell are accumulated into a histogram according to their gradient-direction values, with one direction bin every 20 degrees, so the full 360 degrees are divided into 18 gradient-direction types; counting the directional gradient values according to these types yields the histogram. The histograms of the 8*8 cells are then linked together to obtain the row vector named HTG;
(2) HTG features are extracted from every frame of the video in temporal order, and the row vectors produced by the frames are assembled, according to the time order in which the action occurs, into a 2-dimensional matrix;
(3) the HOG feature is then extracted from the matrix obtained in (2); the computation is similar to that in (1), except that only the gradient magnitudes and directions in the x and y directions are computed, and the resulting histogram is the final HTG-HOG row-vector feature of the video.
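The per-frame HTG computation described in this claim — split the image into 8*8 cells, take [-1, 0, 1] centred-difference gradients, and accumulate a magnitude-weighted 18-bin (20 degrees per bin) orientation histogram per cell, linking the cell histograms into one row vector — can be sketched as follows. The function name and the zero-padded image borders are assumptions of this sketch.

```python
import numpy as np

def cell_htg(frame, cell=8, nbins=18):
    """Per-cell 18-bin orientation histograms of an image, linked into one row vector."""
    img = frame.astype(float)
    # [-1, 0, 1] template applied as centred differences (borders left at zero)
    gx = np.zeros_like(img); gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy = np.zeros_like(img); gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    bins = np.minimum((ang // (360.0 / nbins)).astype(int), nbins - 1)
    H, W = img.shape
    hists = []
    for r in range(0, H - cell + 1, cell):
        for c in range(0, W - cell + 1, cell):
            b = bins[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hists.append(np.bincount(b, weights=m, minlength=nbins))
    return np.concatenate(hists)      # the row vector named HTG in the claim
```

A 16*16 image, for example, yields 4 cells and therefore a 72-element row vector.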
4. The human body behavior recognition method for video information according to claim 1, characterized in that: the two features extracted in steps (1) and (2) are concatenated, the two row-vector features produced for each video being linked into one very large row vector, which is then transposed into a very large column vector; and a random decision forest is used to classify the features extracted in the above steps.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610420591.7A CN106022310B (en) | 2016-06-14 | 2016-06-14 | Human body behavior identification method based on HTG-HOG and STG characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106022310A true CN106022310A (en) | 2016-10-12 |
CN106022310B CN106022310B (en) | 2021-08-17 |
Family
ID=57087844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610420591.7A Active CN106022310B (en) | 2016-06-14 | 2016-06-14 | Human body behavior identification method based on HTG-HOG and STG characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106022310B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101732055A (en) * | 2009-02-11 | 2010-06-16 | 北京智安邦科技有限公司 | Method and system for testing fatigue of driver |
CN102136066A (en) * | 2011-04-29 | 2011-07-27 | 电子科技大学 | Method for recognizing human motion in video sequence |
US20120027263A1 (en) * | 2010-08-02 | 2012-02-02 | Sony Corporation | Hand gesture detection |
CN105095866A (en) * | 2015-07-17 | 2015-11-25 | 重庆邮电大学 | Rapid behavior identification method and system |
CN105631462A (en) * | 2014-10-28 | 2016-06-01 | 北京交通大学 | Behavior identification method through combination of confidence and contribution degree on the basis of space-time context |
Non-Patent Citations (2)
Title |
---|
GURUPRASAD SOMASUNDARAM ET AL.: "Action recognition using global spatio-temporal features derived from sparse representations", Computer Vision and Image Understanding * |
CAI Jiaxin et al.: "Human action recognition based on local contour and random forest", Acta Optica Sinica (光学学报) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106815600A (en) * | 2016-12-27 | 2017-06-09 | Zhejiang University of Technology | Deep cooperative structure and structured learning method for human behavior recognition |
CN106815600B (en) * | 2016-12-27 | 2019-07-30 | Zhejiang University of Technology | Deep cooperative structure and structured learning method for human behavior recognition |
WO2020244279A1 (en) * | 2019-06-05 | 2020-12-10 | 北京京东尚科信息技术有限公司 | Method and device for identifying video |
US11967134B2 (en) | 2019-06-05 | 2024-04-23 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and device for identifying video |
CN110610145A (en) * | 2019-08-28 | 2019-12-24 | 电子科技大学 | Behavior identification method combined with global motion parameters |
Also Published As
Publication number | Publication date |
---|---|
CN106022310B (en) | 2021-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||