CN106022310A - HTG-HOG (histograms of temporal gradient and histograms of oriented gradient) and STG (scale of temporal gradient) feature-based human body behavior recognition method - Google Patents


Info

Publication number
CN106022310A
CN106022310A (application CN201610420591.7A)
Authority
CN
China
Prior art keywords
feature
htg
video
stg
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610420591.7A
Other languages
Chinese (zh)
Other versions
CN106022310B (en)
Inventor
张汗灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201610420591.7A priority Critical patent/CN106022310B/en
Publication of CN106022310A publication Critical patent/CN106022310A/en
Application granted granted Critical
Publication of CN106022310B publication Critical patent/CN106022310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition


Abstract

The invention relates to human body behavior recognition, in particular to a human body behavior recognition method based on HTG-HOG (histograms of temporal gradient and histograms of oriented gradient) and STG (scale of temporal gradient) features. In the method, HTG features and STG features are extracted from a depth map. The HTG features are spatio-temporal local features of the video sequence: an HTG feature is extracted from each frame of the video sequence, the HTG features are fused into a 2-dimensional matrix, and HOG features are extracted from that matrix. The STG features are global features of the whole video sequence: the first K frames of each input video sequence with the largest weighted dynamic energy values are selected as key frames, and the STG features of the video sequence are extracted from these key frames. The HTG features and STG features are fused into one very large vector, and finally a random decision forest is used to classify the vector. The recognition mechanism of the method of the invention is simple in structure, easy to implement, and suitable for real-time processing in elderly monitoring and intelligent video surveillance.

Description

Human body behavior recognition method based on HTG-HOG and STG features
Technical field
The invention belongs to the fields of artificial intelligence and pattern recognition, and specifically relates to a human body behavior recognition technique based on HTG-HOG (Histograms of Temporal Gradient and Histograms of Oriented Gradient) features and STG (Scale of Temporal Gradient) features.
Background art
In the era of big data, as people's demand for high-speed, high-quality video information keeps growing, intelligent video analysis technology is becoming more and more important. Human body behavior recognition is one of the key technologies of intelligent video analysis and one of the important problems in pattern recognition research. It has great research value and significance and is widely applied in fields such as intelligent video surveillance, elderly care and monitoring, virtual reality, and motion analysis. With the appearance of the inexpensive Kinect device, human behavior recognition from depth data has become an emerging research hotspot in artificial intelligence and pattern recognition.
A video is composed of a sequence of images, so analyzing human behavior in video amounts to processing an image sequence, extracting features, and performing discriminative classification. According to the research approach, features can be divided into global features and local features. Global features study the object of interest as a whole, a top-down line of thinking. Although this approach can capture more information about the human body, it depends too heavily on low-level vision processing and is easily affected by factors such as noise and occlusion. In recent years, common global features have included shape features and color features. Local features instead treat relatively independent image patches of the human body as the objects of study, a bottom-up line of thinking. This approach is more robust to noise and occlusion, but it is susceptible to changes in the number of feature points. Common local features include HOG (Histograms of Oriented Gradient) and STIP (Spatio-Temporal Interest Points).
In summary, global features and local features each have their advantages and disadvantages. The present invention therefore combines the characteristics of global and local features and defines a behavior recognition mechanism based on a global feature (the STG feature) and a local feature (the HTG-HOG feature). At present, there is no published literature or patent application that combines these two kinds of features.
Summary of the invention
The present invention is directed to a human body behavior recognition method for video information. It can effectively save labor and reduce work intensity, while also improving work efficiency and recognition accuracy.
To achieve the above object, the technical solution adopted by the present invention is a human body behavior recognition mechanism based on HTG-HOG and STG features, comprising the following steps:
(I) Extraction of STG features:
1) extract the key frames of the video according to the dynamic energy values of the weighted difference images;
2) for each key frame extracted in 1), compute the length and width of its non-zero region;
3) compute the length and width of the non-zero region of the original input video;
4) for each key frame, compute the ratios between the lengths and widths obtained in 2) and 3), and concatenate the ratios of all key frames into a row vector.
(II) Extraction of HTG-HOG features:
1) extract an HTG feature from every frame;
2) stack the HTG column vectors extracted from the frames of the video, in temporal order, into a 2-dimensional matrix;
3) extract the HOG feature from the 2-dimensional matrix produced in 2), generating the HTG-HOG row vector.
(III) Fusion of the two features into one very large vector: the row vectors generated in steps (I) and (II) are concatenated into a very large row vector, which is then transposed into a very large column vector.
(IV) A random decision forest is used to judge the category of the human behavior in the input video.
Owing to the above technical scheme, the present invention has the following advantages over the prior art:
The present invention fuses global and local features and can automatically detect the category of human behavior in an input video, using a random decision forest to test the accuracy of action recognition. Experimental results show that the present invention achieves very high action recognition accuracy.
Brief description of the drawings
Fig. 1 is the overall framework of the recognition system of the present invention.
Fig. 2 is the confusion matrix of the recognition system of the present invention on the MSRAction3D dataset.
Fig. 3 is the confusion matrix of the recognition system of the present invention on the MSRDailyActivity3D dataset.
Fig. 4 is the confusion matrix of the recognition system of the present invention on the MSRActionPair3D dataset.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and an implementation case:
Implementation case 1: in this case, the behaviors in video samples from three different datasets are classified. As shown in Fig. 1, the human body behavior recognition method comprises the following steps:
(I) Extraction of STG features:
1) Suppose the input video has dimensions N*M*L (each frame consists of N rows by M columns of pixels, and the video contains L frames). To extract the key frames, first compute the difference between each pair of consecutive frames, obtaining a difference-image sequence of dimension N*M*(L-1). For each difference image, weight the value of every pixel according to its magnitude: pixels with larger values receive larger weights and pixels with smaller values receive smaller weights. Processing the difference-image sequence in this way generates a new weighted-difference sequence. Finally, compute the dynamic energy of every frame of the weighted-difference sequence; the K frames with the highest dynamic energy are selected as key frames.
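For illustration, the key-frame selection just described can be sketched in Python with NumPy. The patent does not give the weight function explicitly, so weighting each difference pixel by its own magnitude (squaring) is an assumption of this sketch, as are the function name and the toy video.

```python
import numpy as np

def select_key_frames(video, k):
    """Select the K frames with the largest weighted dynamic energy.

    `video` is an (L, N, M) array of frames. Weighting each difference
    pixel by its own magnitude (i.e. squaring) is an assumption, since
    the patent does not specify the weight function w.
    """
    diffs = video[1:].astype(np.float64) - video[:-1]   # (L-1) difference images
    weighted = np.abs(diffs) ** 2                       # larger values get larger weights
    energy = weighted.sum(axis=(1, 2))                  # dynamic energy per difference image
    top = np.argsort(energy)[::-1][:k]                  # K most energetic difference frames
    return np.sort(top)                                 # restore temporal order

# Usage: a toy 6-frame video with the biggest changes around frames 3-5.
video = np.zeros((6, 4, 4))
video[3, :2, :2] = 10.0
video[5] = 8.0
print(select_key_frames(video, 2))                      # -> [3 4]
```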
2) As shown in Fig. 1, after the key frames are selected, the STG feature is extracted from the K key frames. First compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimension N*M*K).
3) Compute the length and width of the non-zero region of the first frame of the original input video sequence.
4) The ratios between the values obtained in 2) and 3) form the STG feature of each key frame. Finally, the STG features of the K key frames are concatenated into a row vector of length 2K.
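The STG computation above can be sketched as follows, assuming that the "non-zero region" of a frame means the bounding box of its non-zero pixels (the patent does not define it more precisely); the function names are illustrative.

```python
import numpy as np

def nonzero_extent(frame):
    """Height and width of the bounding box of the non-zero region of a frame."""
    rows, cols = np.nonzero(frame)
    if rows.size == 0:
        return 0, 0
    return rows.max() - rows.min() + 1, cols.max() - cols.min() + 1

def stg_feature(key_frames, reference_frame):
    """STG feature: per-key-frame (height, width) ratios against the first
    frame of the original video, concatenated into a length-2K row vector."""
    ref_h, ref_w = nonzero_extent(reference_frame)
    feats = []
    for frame in key_frames:
        h, w = nonzero_extent(frame)
        feats.extend([h / ref_h, w / ref_w])
    return np.array(feats)

# Usage
ref = np.zeros((8, 8)); ref[1:7, 1:5] = 1          # non-zero region 6 x 4
kf = np.zeros((8, 8)); kf[2:5, 2:4] = 1            # non-zero region 3 x 2
print(stg_feature([kf], ref))                      # -> [0.5 0.5]
```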
(II) Extraction of HTG-HOG features:
1) The HTG feature is first extracted from the original input video. The HTG feature of the t-th image of the video is computed from the temporal gradient G_t = f(i, j, t+1) - f(i, j, t) and the spatial gradients G_x and G_y, where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y denote the gradients of that pixel in the temporal, x, and y directions respectively. After the gradient value of every pixel is computed, the gradient direction theta = arctan(G_y / G_x) and gradient magnitude m = sqrt(G_x^2 + G_y^2) at each pixel are computed; the gradient directions and magnitudes in the t and y directions are computed by the same method. Finally, the gradient magnitude at each pixel is accumulated into a histogram according to the pixel's gradient-direction value, which yields the row vector named HTG.
2) The HTG feature is extracted from every frame of the video, and the row vectors produced by the frames are assembled, in the temporal order in which the action occurs, into a 2-dimensional matrix.
3) The HOG feature is then extracted from the matrix obtained in 2). Its computation is similar to the method in 1), except that only the gradient magnitudes and directions in the x and y directions are computed; their histogram gives the final HTG-HOG row-vector feature of the video.
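A minimal sketch of the HTG-HOG pipeline follows. The 18 direction bins of 20 degrees are taken from claim 3; using a single whole-matrix histogram instead of a blocked HOG, and histogramming the (x, y) and (t, y) gradient planes, are simplifying assumptions of this sketch.

```python
import numpy as np

N_BINS = 18                                   # 18 bins of 20 degrees, per the claims

def orientation_histogram(a, b):
    """Histogram of gradient magnitudes accumulated by gradient direction."""
    mag = np.hypot(a, b)
    ang = np.degrees(np.arctan2(b, a)) % 360.0
    hist = np.zeros(N_BINS)
    np.add.at(hist, ((ang // 20.0).astype(int) % N_BINS).ravel(), mag.ravel())
    return hist

def htg_frame(prev_f, cur_f, next_f):
    """HTG feature of frame t from the temporal gradient Gt and spatial Gx, Gy."""
    gt = next_f.astype(float) - cur_f                 # Gt = f(t+1) - f(t)
    gx = np.gradient(cur_f.astype(float), axis=1)     # Gx
    gy = np.gradient(cur_f.astype(float), axis=0)     # Gy
    # One histogram in the (x, y) plane and one in the (t, y) plane.
    return np.concatenate([orientation_histogram(gx, gy),
                           orientation_histogram(gt, gy)])

def htg_hog(video):
    """Stack per-frame HTG rows into a 2-D matrix, then histogram its gradients
    (a whole-matrix stand-in for the blocked HOG step)."""
    mat = np.stack([htg_frame(video[t - 1], video[t], video[t + 1])
                    for t in range(1, len(video) - 1)])
    return orientation_histogram(np.gradient(mat, axis=1),
                                 np.gradient(mat, axis=0))

# Usage: a toy 5-frame, 8x8 video.
video = np.zeros((5, 8, 8))
video[2, 2:5, 2:5] = 1.0
video[3, 3:6, 3:6] = 1.0
print(htg_hog(video).shape)   # -> (18,)
```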
(III) The two features extracted in steps (I) and (II) are concatenated. The two row-vector features produced for each video are joined into one very large row vector, which is then transposed into a very large column vector.
(IV) For the recognition-accuracy test, a random decision forest is used to classify the human actions in each video. Classifying the behaviors with the trained classifier gives the experimental results shown in Table 1. As Table 1 shows, the present invention classifies the action categories in the input videos with an accuracy of at least 97.09% and makes correct category judgments for the vast majority of actions. Figs. 2-4 show the confusion matrices on the three datasets.
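The final classification step can be sketched with scikit-learn's RandomForestClassifier standing in for the random decision forest of the patent. The fused feature vectors below are synthetic placeholders; the real vectors would come from the preceding STG and HTG-HOG steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the fused HTG-HOG + STG vectors of 60 videos,
# 3 well-separated action classes of 20 videos each.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 54)) for c in range(3)])
y = np.repeat([0, 1, 2], 20)

# 100 trees is an illustrative choice; the patent does not specify forest size.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))   # training accuracy on well-separated synthetic data
```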
Table 1
Dataset               MSRAction3D    MSRDailyActivity3D    MSRActionPair3D
Recognition accuracy  97.09%         98.75%                98.33%

Claims (4)

1. A human body behavior recognition method for video information, which can effectively save labor, reduce work intensity, and at the same time improve work efficiency and recognition accuracy, the method being a human body behavior recognition mechanism based on HTG-HOG and STG features and comprising the following steps:
(I) extraction of STG features:
1) extracting the key frames of the video according to the dynamic energy values of the weighted difference images;
2) for each key frame extracted in 1), computing the length and width of its non-zero region;
3) computing the length and width of the non-zero region of the original input video;
4) for each key frame, computing the ratios between the lengths and widths obtained in 2) and 3), and concatenating the ratios of all key frames into a row vector;
(II) extraction of HTG-HOG features:
1) extracting an HTG feature from every frame;
2) stacking the HTG column vectors extracted from the frames of the video, in temporal order, into a 2-dimensional matrix;
3) extracting the HOG feature from the 2-dimensional matrix produced in 2), generating the HTG-HOG row vector;
(III) fusing the two features into one very large vector: the row vectors generated in steps (I) and (II) are concatenated into a very large row vector, which is then transposed into a very large column vector;
(IV) using a random decision forest to judge the category of the human behavior in the input video.
2. The human body behavior recognition method for video information according to claim 1, characterized in that the STG features in step (I) are extracted as follows:
1) the method used for key-frame extraction is to compute the weighted dynamic energy value between consecutive images and to select the K frames with the largest values as the key frames of the video; the weighted dynamic energy value is computed as follows: first compute the difference between consecutive images, F(t) = f(i, j, t+1) - f(i, j, t), where f(i, j, t) denotes the pixel value of the t-th frame of the video at point (i, j); then weight F(t) to obtain F_w(t) = F(t)*w(t); finally compute the sum of all pixel values of F_w(t) for frame t, and choose the K frames with the largest values as the key frames of the video;
2) after the key frames are selected, the STG feature is extracted from the K key frames: first compute the length and width of the non-zero region of every frame in the key-frame sequence (of dimension N*M*K); then take the ratios of these values to the length and width of the non-zero region of the first frame of the original input video sequence as the STG feature of each key frame; finally concatenate the STG features of the K frames into a row vector of length 2K.
3. The human body behavior recognition method for video information according to claim 1, characterized in that the HTG-HOG features in step (II) are extracted as follows:
1) the HTG feature is first extracted from the original input video: the HTG feature of the t-th image of the video is computed from the temporal gradient G_t = f(i, j, t+1) - f(i, j, t) and the spatial gradients G_x and G_y, where f(i, j, t) denotes the pixel value of the t-th frame of the video sequence at point (i, j), and G_t, G_x, G_y denote the gradients of that pixel in the temporal, x, and y directions respectively; in practice, every image is divided into 8*8 cells and convolved with the template [-1, 0, 1] to obtain the gradient value of every pixel in each cell, a method that efficiently yields the gradient of every pixel in the image; after the gradients in each cell are computed, the gradient direction theta = arctan(G_y / G_x) and gradient magnitude m = sqrt(G_x^2 + G_y^2) at each pixel in the cell are computed, and the gradient directions and magnitudes in the t and y directions are computed by the same method; finally, the gradient magnitudes of all pixels in each cell are accumulated into a histogram according to their gradient-direction values, with one direction bin every 20 degrees, so that the full 360 degrees are divided into 18 direction bins, over which the directional gradient magnitudes are accumulated to give the histogram; the histograms of the 8*8 cells are then concatenated to give the row vector named HTG;
2) the HTG feature is extracted from every frame of the video, and the row vectors produced by the frames are assembled, in the temporal order in which the action occurs, into a 2-dimensional matrix;
3) the HOG feature is then extracted from the matrix obtained in 2); its computation is similar to the method in 1), except that only the gradient magnitudes and directions in the x and y directions are computed, and their histogram gives the final HTG-HOG row-vector feature of the video.
4. The human body behavior recognition method for video information according to claim 1, characterized in that: the two features extracted in steps (I) and (II) are concatenated, that is, the two row-vector features produced for each video are joined into one very large row vector, which is then transposed into a very large column vector; and a random decision forest is used to classify the features extracted in the above steps.
CN201610420591.7A 2016-06-14 2016-06-14 Human body behavior identification method based on HTG-HOG and STG characteristics Active CN106022310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610420591.7A CN106022310B (en) 2016-06-14 2016-06-14 Human body behavior identification method based on HTG-HOG and STG characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610420591.7A CN106022310B (en) 2016-06-14 2016-06-14 Human body behavior identification method based on HTG-HOG and STG characteristics

Publications (2)

Publication Number Publication Date
CN106022310A true CN106022310A (en) 2016-10-12
CN106022310B CN106022310B (en) 2021-08-17

Family

ID=57087844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610420591.7A Active CN106022310B (en) 2016-06-14 2016-06-14 Human body behavior identification method based on HTG-HOG and STG characteristics

Country Status (1)

Country Link
CN (1) CN106022310B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101732055A (en) * 2009-02-11 2010-06-16 Beijing Zhianbang Technology Co., Ltd. Method and system for testing driver fatigue
US20120027263A1 (en) * 2010-08-02 2012-02-02 Sony Corporation Hand gesture detection
CN102136066A (en) * 2011-04-29 2011-07-27 University of Electronic Science and Technology of China Method for recognizing human motion in a video sequence
CN105631462A (en) * 2014-10-28 2016-06-01 Beijing Jiaotong University Behavior recognition method combining confidence and contribution degree based on spatio-temporal context
CN105095866A (en) * 2015-07-17 2015-11-25 Chongqing University of Posts and Telecommunications Rapid behavior recognition method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Guruprasad Somasundaram et al.: "Action recognition using global spatio-temporal features derived from sparse representations", Computer Vision and Image Understanding *
Cai Jiaxin et al.: "Human behavior recognition based on local contour and random forest", Acta Optica Sinica *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815600A (en) * 2016-12-27 2017-06-09 Zhejiang University of Technology Deep collaborative structure and structured learning method for human behavior recognition
CN106815600B (en) * 2016-12-27 2019-07-30 Zhejiang University of Technology Deep collaborative structure and structured learning method for human behavior recognition
WO2020244279A1 (en) * 2019-06-05 2020-12-10 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for identifying video
US11967134B2 (en) 2019-06-05 2024-04-23 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for identifying video
CN110610145A (en) * 2019-08-28 2019-12-24 University of Electronic Science and Technology of China Behavior recognition method combined with global motion parameters

Also Published As

Publication number Publication date
CN106022310B (en) 2021-08-17


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant