CN104200203A - Human movement detection method based on movement dictionary learning - Google Patents


Info

Publication number
CN104200203A
CN104200203A (application CN201410437190.3A); granted as CN104200203B
Authority
CN
China
Prior art keywords
dictionary
action
matrix
human
anthropoid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410437190.3A
Other languages
Chinese (zh)
Other versions
CN104200203B (en)
Inventor
解梅
蔡勇
何磊
蔡家柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Houpu Clean Energy Group Co ltd
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201410437190.3A priority Critical patent/CN104200203B/en
Publication of CN104200203A publication Critical patent/CN104200203A/en
Application granted granted Critical
Publication of CN104200203B publication Critical patent/CN104200203B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a human action detection method based on action dictionary learning. In the training stage, a local feature representation is used to extract human action features from different video clips, and a human action dictionary with strong discriminative power is learned by training; when modeling the action dictionary, both the reconstruction error and new error terms are considered, giving a better model. In the testing stage, a space-time sliding window traverses the whole video, and whether the window contains a human action is judged from the response values of the window's sparse code on different dictionary items. The method needs no negative samples to train the human action dictionary, and the training process is simple and fast.

Description

Human action detection method based on action dictionary learning
Technical field
The invention belongs to the field of computer vision and relates to human action detection technology.
Background technology
Human activity analysis is one of the most active research topics in computer vision. Its core is to use computer vision techniques to detect, track, and identify people in image sequences and to understand and describe their behavior. Computer-vision-based human action detection is the core technology of human motion analysis: it detects the people in the field of view and extracts parameters reflecting their actions in order to understand those actions. It has broad application prospects and great economic and social value in intelligent surveillance, smart appliances, human-computer interaction, content-based video retrieval, and image compression. In practical applications, human action detection is made difficult by unfavorable factors such as illumination changes, occlusion, complex scenes, viewpoint changes, and especially individual differences in expression, posture, motion, and clothing. See: Aggarwal J K, Ryoo M S. Human activity analysis: A review[J]. ACM Computing Surveys (CSUR), 2011, 43(3): 16; and a doctoral dissertation on computer-vision-based human action detection and recognition methods, South China University of Technology, 2010.
Human action detection methods mainly comprise sequential methods, space-time volume methods, and methods based on the Bag-of-Features (BOF) model. Detection involves two main steps: action representation and action detection. Action representation encodes human action information; its main approaches are global feature representation, local feature representation, and representations based on human body models. Action detection methods mainly include direct classification, template matching, and three-dimensional branch-and-bound. See: Gaidon A, Harchaoui Z, Schmid C. Actom sequence models for efficient action detection[C]//Computer Vision and Pattern Recognition, IEEE Computer Society Conference on. IEEE, 2011: 3201-3208.
Global feature representation encodes the observed human action as a whole. It can be regarded as a top-down method: first the human body is located, then a region of interest is defined by the body's bounding rectangle, and finally the global information of this region is encoded to represent the action. Common global feature representations include silhouettes, optical flow, and space-time shapes. Global features make full use of body shape and motion information; in detection they are often used as templates and compared for similarity, via sequential or space-time volume methods, against global features extracted from the video sequence, the most similar match being the detection result. Their drawback is over-reliance on accurate localization, background subtraction, and tracking, and sensitivity to viewpoint changes, noise, and occlusion. See: Yilmaz A, Shah M. Actions sketch: A novel action representation[C]//Computer Vision and Pattern Recognition, IEEE Computer Society Conference on. IEEE, 2005, 1: 984-989.
Local feature representation expresses a human action as a set of independent image patches or image cuboids. It can be regarded as a bottom-up method: first an interest-point detector finds space-time interest points; then 2-D image patches or 3-D image cuboids centered on these points are extracted and described with local feature descriptors; finally the information extracted from the patches or cuboids yields the human action representation. The main space-time interest-point detectors are Harris3D, Cuboids, and Hessian; the main local descriptors around the interest points are HOG/HOF, HOG3D, and extended SURF. Compared with global features, local features offer good invariance to rotation, translation, and scaling, and effectively reduce the impact of unfavorable factors such as complex backgrounds, body posture, viewpoint changes, and occlusion; their drawbacks are the dependence on a large number of space-time interest points and the occasional need for preprocessing to compensate for camera motion. See: Wang L M, Qiao Y, Tang X. Motionlets: Mid-Level 3D Parts for Human Motion Recognition[C]//Computer Vision and Pattern Recognition, IEEE Computer Society Conference on. IEEE, 2013.
The main idea of representations based on human body models is that the body is supported by a skeleton, the skeletal structure can be regarded as a kinematic system formed by linked body parts, and the operation of this system produces the different human behaviors. Methods based on body models attempt an information-rich representation of human actions by learning the configuration of each body part in the video. Their latent defect is over-reliance on target and motion detection algorithms, so these methods are unsuited to video of natural scenes. Some work substitutes space-time patches for body parts, but the criteria for selecting the patches, and how many patches are needed to capture the possible variations of an action, remain unsolved. See: Tian Y, Sukthankar R, Shah M. Spatiotemporal Deformable Part Models for Action Detection[C]//Computer Vision and Pattern Recognition, IEEE Computer Society Conference on. IEEE, 2013.
Summary of the invention
The technical problem to be solved by this invention is to provide a human action detection method with a small reconstruction error and strong discriminative performance.
The technical scheme adopted by the present invention to solve the above technical problem is a human action detection method based on action dictionary learning, comprising the following steps:
Step 1) Collect training samples, convert the color images in the samples into grayscale images, and unify the spatial resolution and duration of the video segments;
Step 2) Compute the local trinary pattern (LTP) features of each video segment, obtaining a high-dimensional feature vector y_0 = (y_1, ..., y_n)^T ∈ R^n, where R^n denotes the n-dimensional feature space, n is the total dimension of the feature vector, and (·)^T denotes transposition;
Step 3) Left-multiply each segment's LTP feature by a random measurement matrix A ∈ R^{m×n} for dimensionality reduction, i.e. y = A y_0, reducing it from n dimensions to m; the reduced features form the feature matrix Y. R^{m×n} denotes the linear space of m×n real matrices; each element a_ij of the random measurement matrix A follows a Gaussian distribution with mean 0 and variance 1; m << n, where << means "much smaller than", i.e. m is at least an order of magnitude smaller than n;
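The random-projection reduction of step 3 can be sketched as follows (a minimal illustration: the 13824 → 1500 dimensions follow the embodiment, but the feature vector here is random stand-in data rather than a real LTP feature):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 13824   # LTP feature dimension in the embodiment
m = 1500    # reduced dimension, m << n

# Random measurement matrix with i.i.d. N(0, 1) entries, as in step 3.
A = rng.standard_normal((m, n))

y0 = rng.standard_normal(n)   # stand-in for one segment's LTP feature vector
y = A @ y0                    # reduced m-dimensional feature
print(y.shape)                # (1500,)
```

Stacking the reduced vectors of all segments column-wise then gives the feature matrix Y.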
Step 4) action dictionary model training:
4-1) The action dictionary D is expressed as:

D = [D_1, D_2, ..., D_M] = [d_{1,1}, d_{1,2}, ..., d_{1,L}, ..., d_{k,1}, d_{k,2}, ..., d_{k,L}, ..., d_{M,1}, d_{M,2}, ..., d_{M,L}]

where the action dictionary D is composed of M sub-dictionaries corresponding to the M action classes; D_k is the sub-dictionary for the k-th action class; K is the total number of dictionary items in D; L = K/M is the number of dictionary items in each sub-dictionary; d_{k,j} is a dictionary item; K >> M, where >> means "much greater than", i.e. K is at least an order of magnitude larger than M;
Set up the action dictionary learning model as follows:
<D, W, A, X> = argmin_{D,W,A,X} ||Y - DX||_2^2 + α||Q - AX||_2^2 + β||H - WX||_2^2   s.t. ∀i: ||x_i||_0 ≤ T
where argmin denotes the parameter values at which the objective function attains its minimum; Y is the feature matrix; D is the action dictionary to be learnt; W denotes the classifier parameters; A denotes the random measurement matrix; X ∈ R^{K×N} denotes the sparse matrix, whose columns x_i are the sparse codes of the sample features, i = 1, 2, ..., N, with R^{K×N} the K×N-dimensional linear space and N the total number of training samples; α and β are weight coefficients; H = [h_1, ..., h_N] ∈ R^{M×N} is the label matrix, each column h_i being the label vector of the action class of one sample; Q = [q_1, ..., q_N] ∈ R^{K×N} is the discriminative matrix, each column being the discriminant vector indicating which action class a training sample belongs to; ||·||_2 denotes the 2-norm; s.t. denotes the constraint condition; T denotes the sparsity threshold; ∀ means "for all"; ||·||_0 denotes the l_0 norm;
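The label matrix H and discriminative matrix Q of the model can be assembled as in this sketch (toy sizes M, L and hypothetical class labels, chosen only for illustration):

```python
import numpy as np

M, L = 6, 4            # action classes and items per sub-dictionary (toy sizes)
K = M * L
labels = [0, 2, 5, 2]  # hypothetical class labels of N = 4 training samples
N = len(labels)

H = np.zeros((M, N))   # label matrix: one-hot column h_i per sample
Q = np.zeros((K, N))   # discriminative matrix: ones on the sample's sub-dictionary block
for i, c in enumerate(labels):
    H[c, i] = 1.0
    Q[c * L:(c + 1) * L, i] = 1.0
```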
4-2) Solve iteratively using the K-SVD algorithm, which is based on the singular value decomposition:
<D', X> = argmin_{D',X} ||Y' - D'X||_2^2   s.t. ∀i: ||x_i||_0 ≤ T;
with known quantity Y' = (Y^T, √α Q^T, √β H^T)^T and unknown intermediate quantity D' = (D^T, √α A^T, √β W^T)^T;
After the intermediate quantity D' is obtained in a finite number of iterations, substitute it back into the action dictionary learning model to obtain the final optimized action dictionary D, random measurement matrix A, and classifier parameters W;
The initial values of the K-SVD iteration are determined as follows:
Randomly draw samples from each of the M action classes and use the K-SVD algorithm to obtain an initial dictionary for each class, thereby constructing the initial value of the action dictionary;
Determine the discriminative matrix Q and the label matrix H from the label of each dictionary item and the class labels of the training samples; then obtain the initial sparse matrix X of the training samples with the orthogonal matching pursuit (OMP) algorithm;
Initial value of the random measurement matrix: A_0 = (XX^T + λ_2 I)^{-1} X Q^T;
Initial value of the classifier parameters: W_0 = (XX^T + λ_1 I)^{-1} X H^T;
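The two closed-form initializations are ridge regressions. The sketch below (random stand-in data) computes maps A_0 and W_0 satisfying A_0·X ≈ Q and W_0·X ≈ H; the patent's expressions (XX^T + λI)^{-1}XQ^T and (XX^T + λI)^{-1}XH^T are the transposes of these maps, so only the convention differs:

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, N = 24, 6, 40
lam1 = lam2 = 1e-3          # hypothetical regularizers lambda_1, lambda_2

X = rng.standard_normal((K, N))                 # initial sparse codes (from OMP)
Q = rng.integers(0, 2, (K, N)).astype(float)    # discriminative matrix
H = rng.integers(0, 2, (M, N)).astype(float)    # label matrix

# Ridge-regression solutions of A X ~= Q and W X ~= H.
G = np.linalg.inv(X @ X.T + lam2 * np.eye(K))
A0 = Q @ X.T @ G                                          # K x K
W0 = H @ X.T @ np.linalg.inv(X @ X.T + lam1 * np.eye(K))  # M x K
```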
Step 5) human action detection:
Slide a space-time window over the video sequence under test; for each window, sum the responses of the window's sparse code on the dictionary items of each sub-dictionary of the action dictionary; judge whether the response of the highest-responding sub-dictionary reaches a threshold; if so, take the action class corresponding to that sub-dictionary as the current human action detection result; otherwise, judge that no human action is currently present.
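The per-window decision rule of step 5 can be sketched as below (a toy sparse code with K = M·L entries; the threshold value is an illustrative assumption):

```python
import numpy as np

def detect(x, M, L, thresh):
    """Sum the absolute responses of sparse code x (length K = M*L) on each
    sub-dictionary's L items; return the best class if its summed response
    reaches thresh, otherwise None (no human action)."""
    resp = np.abs(x).reshape(M, L).sum(axis=1)
    k = int(resp.argmax())
    return k if resp[k] >= thresh else None

x = np.zeros(24)
x[8:12] = [0.9, 0.7, 0.0, 0.4]          # strong response on sub-dictionary 2
print(detect(x, M=6, L=4, thresh=1.0))  # 2
```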
The invention provides a human action detection method based on action dictionary learning. In the training stage, a local feature representation is used to extract human action features from different video segments, and a human action dictionary with strong discriminative power is learned by training, different human actions corresponding to different dictionary items in the action dictionary. When modeling the action dictionary, not only the reconstruction error but also new error terms are considered, making the model better;
In the testing stage, given a video, a space-time sliding window traverses the whole video; based on the trained action dictionary, the sparse code of each window is computed, and whether the window contains a given human action is judged from the responses of the sparse code on the different dictionary items, completing the human action detection task. The method needs no negative samples to train the human action dictionary, the training process is simple and fast, detection remains good under illumination changes, occlusion, complex backgrounds, and viewpoint changes, and the method can approximately meet real-time requirements.
The beneficial effects of the invention are: there is a clear correspondence between dictionary items and human actions, so interpretability is good; the trained action dictionary can express and reconstruct the different classes of human actions well, with a small reconstruction error; and the sparse codes obtained from the action dictionary can determine with strong discriminative power whether a space-time sliding window contains an action of a given class.
Brief description of the drawings
Fig. 1: Correspondence among human actions, sub-dictionaries, and dictionary items.
Embodiment
For convenience of describing the embodiment, some terms are first defined.
Definition 1: Local Trinary Patterns (LTP). A local feature representation method, the extension of Local Binary Patterns (LBP) to the space-time domain. By encoding a motion image sequence across frames it effectively captures motion information, avoiding any complex optical-flow computation, and can be regarded as a space-time local texture description algorithm. Extracting the LTP features of a video segment containing a human action yields a one-dimensional feature vector. This local feature representation has advantages such as strong discriminative power and fast computation. See: Yeffet L, Wolf L. Local trinary patterns for human action recognition[C]//Computer Vision, International Conference on. IEEE, 2009: 492-497.
Definition 2: Random projection. A dimensionality reduction technique that uses a random measurement matrix to project sparse or compressible signals (such as images and video) from a high-dimensional space to a low-dimensional one. See: Baraniuk R G, Wakin M B. Random projections of smooth manifolds[J]. Foundations of Computational Mathematics, 2009, 9(1): 51-77.
Definition 3: Action dictionary learning. Denote the feature matrix by Y; any column y_j of Y is an m-dimensional LTP feature. Let D be an overcomplete dictionary, that is, the action dictionary, composed of a set of normalized basis vectors. In the training stage, the problem of designing the overcomplete dictionary D so that each y_j is well reconstructed by a linear combination of only a few dictionary items is the action dictionary learning problem, as in the formula below, where ||·||_0 is the l_0 norm counting the number of nonzero entries of the sparse vector x_i, and T is the sparsity threshold that x_i must satisfy.
min_{D,X} (1/2)||Y - DX||_2^2   s.t. ∀i: ||x_i||_0 ≤ T
Definition 4: K-SVD algorithm. A classical iterative method for solving overcomplete dictionaries; it iterates quickly and yields a small signal reconstruction error. See: Aharon M, Elad M, Bruckstein A. K-SVD: Design of dictionaries for sparse representation[J]. Proceedings of SPARS, 2005, 5: 9-12.
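A minimal sketch of the K-SVD dictionary-update step referred to in Definition 4 (a full K-SVD alternates this atom-wise update with a sparse-coding stage, omitted here; the toy data corrupts one atom of an exact factorization and repairs it):

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """One K-SVD atom update: refit atom d_k and the nonzero codes that use
    it via a rank-1 SVD of the residual restricted to those samples."""
    omega = np.nonzero(X[k, :])[0]   # samples whose codes use atom k
    if omega.size == 0:
        return D, X
    # residual on omega with atom k's own contribution added back
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]    # matching coefficients
    return D, X

rng = np.random.default_rng(3)
D = rng.standard_normal((5, 3))
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
X = np.zeros((3, 6))
X[0, :] = rng.standard_normal(6)
X[1, [0, 2, 4]] = [1.0, -2.0, 0.5]
Y = D @ X                            # exact factorization
D[:, 1] = rng.standard_normal(5)     # corrupt atom 1 ...
D, X = ksvd_atom_update(Y, D, X, k=1)
print(np.allclose(D @ X, Y))         # True: the update repairs it
```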
Definition 5: Sparse coding. With the action dictionary D fixed, solve for the sparse vector x_test corresponding to a test sample y_test such that y_test ≈ D x_test holds; x_test is called the sparse coding of y_test under dictionary D, as in the formula below.
min_{x_test} (1/2)||y_test - D x_test||_2^2   s.t. ||x_test||_0 ≤ T
Definition 6: OMP algorithm. Its full name is orthogonal matching pursuit. It is one of the typical methods for solving the sparse coding problem, with advantages such as low computational complexity, fast convergence, and a good estimate of the globally optimal solution. See: Tropp J A. Greed is good: Algorithmic results for sparse approximation[J]. Information Theory, IEEE Transactions on, 2004, 50(10): 2231-2242.
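A self-contained sketch of OMP as described in Definition 6 (greedy atom selection with a least-squares refit on the support; the toy dictionary and signal are illustrative):

```python
import numpy as np

def omp(D, y, T):
    """Orthogonal matching pursuit: greedily pick at most T atoms of D
    (columns, assumed unit-norm) and least-squares refit on the support."""
    x = np.zeros(D.shape[1])
    residual = y.astype(float)
    support, coef = [], np.array([])
    for _ in range(T):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                    # no new atom improves the fit
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

D = np.eye(5)[:, :4]                 # toy dictionary: 4 unit-norm atoms in R^5
y = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
x = omp(D, y, T=2)                   # recovers the 2-sparse code
```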
Definition 7: Space-time sliding window. In target detection, a sliding window generally refers to a rectangle of fixed size that traverses an image to locate targets. The space-time sliding window generalizes the sliding window from two dimensions to three. To detect human actions in continuous video, a fixed-size space-time sliding window must traverse the video sequence to locate the actions.
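The traversal in Definition 7 can be sketched as a generator of window origins (the window size matches the embodiment's 320 × 240 × 10-frame windows; the stride values are illustrative assumptions, since the patent does not state them):

```python
import numpy as np

def spacetime_windows(video, win=(240, 320, 10), stride=(120, 160, 5)):
    """Yield the (top, left, t0) origin of every fixed-size space-time
    window that fits in a video of shape (H, W, T)."""
    H, W, T = video.shape
    wh, ww, wt = win
    sh, sw, st = stride
    for t0 in range(0, T - wt + 1, st):
        for top in range(0, H - wh + 1, sh):
            for left in range(0, W - ww + 1, sw):
                yield top, left, t0

video = np.zeros((480, 640, 20))     # stand-in video, twice the window per axis
print(sum(1 for _ in spacetime_windows(video)))  # 27
```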
Step 1: Collect training samples. The positive training samples come from video segments from the internet and from TV programs; the selected samples consider only a single human action, not multiple simultaneous actions, while taking into account influence factors such as illumination changes, complex scenes, viewpoint changes, and individual differences.
Step 2: Image preprocessing. Preprocessing comprises two main steps: converting the color images to grayscale, and unifying the spatial resolution and duration of the video segments.
Step 3: Compute the LTP features of each video segment, obtaining a high-dimensional feature vector y_0 ∈ R^n.
Step 4: Feature dimensionality reduction. Using the random projection method, left-multiply the LTP feature by a random measurement matrix A for dimensionality reduction, i.e. y = A y_0, reducing it from n dimensions to m (m << n), where each element a_ij of the random measurement matrix follows a Gaussian distribution with mean 0 and variance 1, i.e. a_ij ~ N(0, 1); the reduced features form the feature matrix Y.
Step 5: Establishing and solving the action dictionary model.
Step 5-1: Establishing the action dictionary model. The action dictionary has the form:

D = [D_1, D_2, ..., D_M] = [d_{1,1}, d_{1,2}, ..., d_{1,L}, ..., d_{k,1}, d_{k,2}, ..., d_{k,L}, ..., d_{M,1}, d_{M,2}, ..., d_{M,L}]

where the action dictionary D is composed of M sub-dictionaries corresponding to the M action classes; D_k is the sub-dictionary for the k-th action class; L = K/M is the number of dictionary items in each sub-dictionary; d_{k,j} is a dictionary item. The correspondence among human actions, sub-dictionaries, and dictionary items is shown in Fig. 1. Action dictionary learning is modeled as the optimization problem:
<D, W, A, X> = argmin_{D,W,A,X} ||Y - DX||_2^2 + α||Q - AX||_2^2 + β||H - WX||_2^2   s.t. ∀i: ||x_i||_0 ≤ T
where the first term of the objective function is the reconstruction error, the second is the sparse-coding discrimination error, and the third is the classification error. D is the action dictionary to be learnt; the columns of the sparse matrix X ∈ R^{K×N} are the sparse codes of the sample features; W denotes the classifier parameters; α and β are scalars weighting the second and third terms of the objective; each column of the matrix H = [h_1, ..., h_N] ∈ R^{M×N} is the label vector h_i = [0, ..., 0, 1, 0, ..., 0]^T of one action class; Q = [q_1, ..., q_N] ∈ R^{K×N} is the discriminative matrix of the training samples' sparse codes: if the i-th training sample belongs to the k-th action class, its discriminant vector q_i = [q^i_{1,1}, q^i_{1,2}, ..., q^i_{1,L}, ..., q^i_{k,1}, q^i_{k,2}, ..., q^i_{k,L}, ..., q^i_{M,1}, q^i_{M,2}, ..., q^i_{M,L}]^T = [0, 0, ..., 0, ..., 1, 1, ..., 1, ..., 0, 0, ..., 0]^T ∈ R^K has ones exactly in the L positions of sub-dictionary k. The linear transformation matrix A maps the sparse matrix X to the discriminative matrix Q.
Step 5-2: Solving the action dictionary model. The above formula cannot be solved directly, so it is transformed into the following optimization problem:
<D', X> = argmin_{D',X} ||Y' - D'X||_2^2   s.t. ∀i: ||x_i||_0 ≤ T;
where Y' = (Y^T, √α Q^T, √β H^T)^T is a known quantity and D' = (D^T, √α A^T, √β W^T)^T is the unknown parameter to be trained. The K-SVD algorithm can solve this optimization problem and obtain the globally optimal solution of all parameters. Since K-SVD is an iterative algorithm, the iteration initial values D_0, A_0, W_0 must be determined. The concrete method is as follows: randomly draw samples from each of the M action classes and use K-SVD to obtain an initial dictionary for each class, thereby constructing the initial dictionary D_0; from the label of each dictionary item and the class labels of the training samples, the discriminative matrix Q can be determined; then the OMP algorithm yields the initial sparse matrix X of the training samples; the initial values are A_0 = (XX^T + λ_2 I)^{-1} X Q^T and W_0 = (XX^T + λ_1 I)^{-1} X H^T. After a finite number of iterations the dictionary D' is obtained, from which the optimized parameters D, A, W of the problem in step 5-1 follow.
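The transformation in step 5-2 simply stacks the three error terms into one K-SVD problem; a sketch with random stand-in matrices (toy sizes; α and β follow the embodiment's values):

```python
import numpy as np

rng = np.random.default_rng(2)
m, K, M, N = 8, 12, 3, 20        # toy sizes
alpha, beta = 0.3, 0.1           # weights used in the embodiment

Y = rng.standard_normal((m, N))               # feature matrix
Q = rng.integers(0, 2, (K, N)).astype(float)  # discriminative matrix
H = rng.integers(0, 2, (M, N)).astype(float)  # label matrix

# Y' = (Y^T, sqrt(alpha) Q^T, sqrt(beta) H^T)^T; D' is stacked the same way,
# so one K-SVD run on (Y', D') minimizes all three terms at once.
Yp = np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])
print(Yp.shape)                  # (23, 20)
```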
Step 6: Human action detection. Given a video, a space-time sliding window slides over the video sequence. If the window contains none of the training set's human action classes, the sparse code of the window's LTP feature has only weak responses on all dictionary items; if the window contains an action of some class, the sparse code responds strongly on the dictionary items of that class and weakly on the dictionary items of the other classes. During detection, the summed responses of the window's sparse code on the dictionary items of each action class are computed; if the response on some class's sub-dictionary is the largest and exceeds a threshold, that action class is judged to occur, completing the human action detection task.
To verify the effect of the invention, simulations were carried out in Matlab and C/C++ on the hardware platform Intel Core 2 E7400 + 4 GB DDR RAM, with the software platforms Matlab 2012a and Visual Studio 2010. The concrete implementation steps and parameter settings are as follows:
Step 1: Collect training samples. Considering influence factors such as illumination changes, complex scenes, viewpoint changes, and individual differences, the chosen training samples mainly comprise 6 different human actions: running, walking, clapping, jumping, standing up, and sitting down. A total of 300 video segments were captured, 50 per action class, each lasting from 5 s to 20 s.
Step 2: Image preprocessing. The duration of each short video is fixed at 400 milliseconds, comprising 10 frames in total; the 300 collected video segments are divided into 3000 short videos altogether. The color image of each frame is converted to grayscale and uniformly scaled to a spatial resolution of 320 pixels × 240 pixels, so the data size of each short video is 320 × 240 × 10.
Step 3: Compute the LTP features of each short video. The concrete parameters are set as follows: each frame is divided into 3 × 3 regions; the LTP features of frames 3, 5, and 7 are computed from frames 1, 3, 5, 7, and 9 of the short video; the LTP threshold is set to 800. This generates a 13824 × 1 feature vector, and the feature matrix before dimensionality reduction is of size 13824 × 3000.
Step 4: Feature dimensionality reduction. Using the random projection algorithm, left-multiply by a 1500 × 13824 random measurement matrix, reducing the feature matrix to size 1500 × 3000; this matrix is the feature matrix Y.
Step 5: Establishing and solving the action dictionary model. The weight coefficients of the two terms, the sparse-coding discrimination error and the classification error, are α = 0.3 and β = 0.1; the sparsity threshold is T = 10; the trained action dictionary is of size 1500 × 3300, where the training samples of the 6 action classes correspond to 6 sub-dictionaries, i.e. M = 6, and each sub-dictionary contains 550 dictionary items, i.e. L = 550.
Step 6: Human action detection. The test videos are two recorded sequences, one indoor and one outdoor, containing the actions of 5 people whose clothing, viewpoint, and scale differ; the action classes are the same 6 human actions (running, walking, clapping, jumping, standing up, sitting down), and the total duration is about 18 minutes. The space-time sliding window is of size 320 pixels × 240 pixels × 400 milliseconds. The evaluation index is the OV20 criterion: a detection is correct if its window overlaps the ground truth by at least 20%, otherwise it is an error. At a recall of 90%, the detection precision is 89.2%; the final average precision is 86.6%, which shows that the method has a good detection effect.
Using the method of the invention, the algorithm was first simulated on the Matlab platform and then ported to the C/C++ platform. On image sequences with a resolution of 320 pixels × 240 pixels, the processing speed was 7 frames/s on the Matlab platform and 15 frames/s on the C/C++ platform, approximately meeting the real-time requirement.

Claims (1)

1. A human action detection method based on action dictionary learning, characterized by comprising the following steps:
Step 1) Collect training samples, convert the color images in the samples into grayscale images, and unify the spatial resolution and duration of the video segments;
Step 2) Compute the local trinary pattern (LTP) features of each video segment, obtaining a high-dimensional feature vector y_0 = (y_1, ..., y_n)^T, where n is the total dimension of the feature vector and (·)^T denotes transposition;
Step 3) Left-multiply each segment's LTP feature by a random measurement matrix A for dimensionality reduction, i.e. y = A y_0, reducing it from n dimensions to m; the reduced features form the feature matrix Y. Each element a_ij of the random measurement matrix A follows a Gaussian distribution with mean 0 and variance 1; m << n, where << means "much smaller than", i.e. m is at least an order of magnitude smaller than n;
Step 4) action dictionary model training:
4-1) The action dictionary D is expressed as:
D = [D_1, D_2, ..., D_M] = [d_{1,1}, d_{1,2}, ..., d_{1,L}, ..., d_{k,1}, d_{k,2}, ..., d_{k,L}, ..., d_{M,1}, d_{M,2}, ..., d_{M,L}]
where the action dictionary D is composed of M sub-dictionaries corresponding to the M action classes; D_k is the sub-dictionary for the k-th action class; K is the total number of dictionary items in D; L = K/M is the number of dictionary items in each sub-dictionary; d_{k,j} is a dictionary item; K >> M, where >> means "much greater than", i.e. K is at least an order of magnitude larger than M;
Set up the action dictionary learning model as follows:
<D, W, A, X> = argmin_{D,W,A,X} ||Y - DX||_2^2 + α||Q - AX||_2^2 + β||H - WX||_2^2   s.t. ∀i: ||x_i||_0 ≤ T
where argmin denotes the parameter values at which the objective function attains its minimum; Y is the feature matrix; D is the action dictionary to be learnt; W denotes the classifier parameters; A denotes the random measurement matrix; X denotes the sparse matrix, whose columns x_i are the sparse codes of the sample features, i = 1, 2, ..., N, with N the total number of training samples; α and β are weight coefficients; H = [h_1, ..., h_N] ∈ R^{M×N} is the label matrix, each column h_i being the label vector of the action class of one sample; Q = [q_1, ..., q_N] ∈ R^{K×N} is the discriminative matrix, each column being the discriminant vector indicating which action class a training sample belongs to; ||·||_2 denotes the 2-norm; s.t. denotes the constraint condition; T denotes the sparsity threshold; ∀ means "for all"; ||·||_0 denotes the l_0 norm;
4-2) Solve iteratively using the K-SVD algorithm, which is based on the singular value decomposition:
<D', X> = argmin_{D',X} ||Y' - D'X||_2^2   s.t. ∀i: ||x_i||_0 ≤ T;
with known quantity Y' = (Y^T, √α Q^T, √β H^T)^T and unknown intermediate quantity D' = (D^T, √α A^T, √β W^T)^T;
After the intermediate quantity D' is obtained in a finite number of iterations, substitute it back into the action dictionary learning model to obtain the final optimized action dictionary D, random measurement matrix A, and classifier parameters W;
Wherein, the initial values for the K-SVD iteration are determined by the following method:
Samples are drawn at random from each of the M classes of human actions, and the K-SVD algorithm is applied to each class separately to obtain M initial class-specific dictionaries; concatenating them constructs the initial value of the action dictionary;
The discrimination matrix Q and the indicator matrix H are determined from the label of each dictionary item and the class labels of the training samples; the orthogonal matching pursuit (OMP) algorithm is then used to obtain the initial sparse matrix X of the training samples;
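For reference, orthogonal matching pursuit can be sketched in a few lines of numpy: a greedy solver for $\min \|y - Dx\|_2$ subject to $\|x\|_0 \le T$. This is an illustrative sketch, not the patent's implementation, and it assumes D has unit-norm columns:

```python
import numpy as np

def omp(D, y, T):
    """Greedy OMP: select at most T atoms of D (n x K, unit-norm columns),
    then least-squares fit y on the selected support."""
    K = D.shape[1]
    x = np.zeros(K)
    residual = y.astype(float).copy()
    support = []
    for _ in range(T):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # re-fit all selected coefficients jointly (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x
```

Running `omp` column-by-column over the training features against the initial dictionary yields the initial sparse matrix X.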
The initial value of the random measurement matrix: $A_0 = (XX^T + \lambda_2 I)^{-1} X Q^T$;
The initial value of the classifier parameters: $W_0 = (XX^T + \lambda_1 I)^{-1} X H^T$.
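The two ridge-regression formulas above can be computed directly; the sketch below uses `np.linalg.solve` instead of an explicit matrix inverse (the $\lambda_1$, $\lambda_2$ defaults are illustrative, not from the patent):

```python
import numpy as np

def init_A_W(X, Q, H, lam1=1e-3, lam2=1e-3):
    """A0 = (XX^T + lam2*I)^-1 X Q^T and W0 = (XX^T + lam1*I)^-1 X H^T,
    following the formulas as printed in the text."""
    K = X.shape[0]
    G = X @ X.T  # K x K Gram matrix of the sparse codes
    A0 = np.linalg.solve(G + lam2 * np.eye(K), X @ Q.T)
    W0 = np.linalg.solve(G + lam1 * np.eye(K), X @ H.T)
    return A0, W0
```

Note that if A and W are applied on the left, as in AX and WX, the ridge solutions are the transposes of these expressions; the sketch reproduces the formulas exactly as printed.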
Step 5) human action detection:
A spatio-temporal sliding window is slid over the video sequence under test. At each window position, the responses of the sparse codes of the images inside the window are summed over the dictionary items of each sub-dictionary of the action dictionary. If the highest per-class response is greater than or equal to a threshold, the action class corresponding to the sub-dictionary with that highest, above-threshold response is output as the current human action detection result; otherwise, the window is judged to contain no human action.
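The per-window decision of step 5) can be sketched as follows, assuming (as in this sketch, not stated verbatim in the text) that the K = M·L dictionary items are grouped consecutively by class, so sub-dictionary k owns items kL through (k+1)L − 1, and that a class's response is the sum of absolute sparse coefficients on its items:

```python
import numpy as np

def classify_window(x, M, L, threshold):
    """Sum sparse-code responses per sub-dictionary and return the winning
    action class index, or None if the best response is below the threshold."""
    responses = np.abs(np.asarray(x, dtype=float)).reshape(M, L).sum(axis=1)
    best = int(np.argmax(responses))
    return best if responses[best] >= threshold else None
```

In the full detector this is applied at every position of the spatio-temporal sliding window, with x the (accumulated) sparse code of the images inside the window.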
CN201410437190.3A 2014-08-30 2014-08-30 A kind of human action detection method based on action dictionary learning Active CN104200203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410437190.3A CN104200203B (en) 2014-08-30 2014-08-30 A kind of human action detection method based on action dictionary learning

Publications (2)

Publication Number Publication Date
CN104200203A true CN104200203A (en) 2014-12-10
CN104200203B CN104200203B (en) 2017-07-11

Family

ID=52085493

Country Status (1)

Country Link
CN (1) CN104200203B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136066A (en) * 2011-04-29 2011-07-27 电子科技大学 Method for recognizing human motion in video sequence
US20120251013A1 (en) * 2011-03-31 2012-10-04 Fatih Porikli Method for Compressing Textured Images
CN103440471A (en) * 2013-05-05 2013-12-11 西安电子科技大学 Human body action identifying method based on lower-rank representation
CN103902989A (en) * 2014-04-21 2014-07-02 西安电子科技大学 Human body motion video recognition method based on non-negative matrix factorization
CN103927517A (en) * 2014-04-14 2014-07-16 电子科技大学 Motion detection method based on human body global feature histogram entropies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Bin et al.: "Human action recognition based on discriminative sparse-coding video representation", Robot (《机器人》) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989266A (en) * 2015-02-11 2016-10-05 北京三星通信技术研究有限公司 Electrocardiosignal-based authentication method, apparatus and system
US10089451B2 (en) 2015-02-11 2018-10-02 Samsung Electronics Co., Ltd. Electrocardiogram (ECG)-based authentication apparatus and method thereof, and training apparatus and method thereof for ECG-based authentication
CN105989266B (en) * 2015-02-11 2020-04-03 北京三星通信技术研究有限公司 Authentication method, device and system based on electrocardiosignals
CN106033546A (en) * 2015-03-10 2016-10-19 中国科学院西安光学精密机械研究所 Behavior classification method based on top-down learning
CN105138997A (en) * 2015-09-06 2015-12-09 湖南大学 Method for identifying movement in wireless body area network based on compression classification
CN106067041A (en) * 2016-06-03 2016-11-02 河海大学 A kind of multi-target detection method of based on rarefaction representation of improvement
CN106067041B (en) * 2016-06-03 2019-05-31 河海大学 A kind of improved multi-target detection method based on rarefaction representation
CN107578425A (en) * 2016-06-26 2018-01-12 周尧 Motion estimate implementation method based on compression infrared perception
CN108875628A (en) * 2018-06-14 2018-11-23 攀枝花学院 pedestrian detection method
CN108921126A (en) * 2018-07-20 2018-11-30 北京开普云信息科技有限公司 A kind of automatic identification signature stamp or the method and device of handwritten signature
CN110991340A (en) * 2019-12-03 2020-04-10 郑州大学 Human body action analysis method based on image compression
CN110991340B (en) * 2019-12-03 2023-02-28 郑州大学 Human body action analysis method based on image compression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210513

Address after: No.3, 11th floor, building 6, no.599, shijicheng South Road, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610041

Patentee after: Houpu clean energy Co.,Ltd.

Address before: 611731, No. 2006, West Avenue, Chengdu hi tech Zone (West District, Sichuan)

Patentee before: University of Electronic Science and Technology of China

TR01 Transfer of patent right
CP01 Change in the name or title of a patent holder

Address after: No.3, 11th floor, building 6, no.599, shijicheng South Road, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610041

Patentee after: Houpu clean energy (Group) Co.,Ltd.

Address before: No.3, 11th floor, building 6, no.599, shijicheng South Road, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610041

Patentee before: Houpu clean energy Co.,Ltd.

CP01 Change in the name or title of a patent holder