CN105023000A - Human brain visual memory principle-based human body action identification method and system - Google Patents

Human brain visual memory principle-based human body action identification method and system

Info

Publication number
CN105023000A
CN105023000A (application CN201510407799.0A; granted as CN105023000B)
Authority
CN
China
Prior art keywords
video
identified
vector
training
coding
Prior art date
Legal status
Granted
Application number
CN201510407799.0A
Other languages
Chinese (zh)
Other versions
CN105023000B (en)
Inventor
谌先敢
刘海华
高智勇
李旭
Current Assignee
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN201510407799.0A
Publication of CN105023000A
Application granted
Publication of CN105023000B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body action recognition method and system based on the human brain visual memory principle, and relates to the fields of computer vision and video surveillance. Inspired by the human brain visual memory principle, the invention proposes the following technical scheme for the first time: in a training stage, feature codings of local features are used to train a classifier model and to build a visual memory bank; in a recognition stage, the feature coding of the local features of a video to be recognized is used to search the visual memory bank, part of the local features of a retrieved video are used to replace the occluded information in the video to be recognized, feature encoding is performed on the local features of the replaced video, and the coding is fed into the trained model for testing, thereby obtaining the category of the human body action in the video. The method and system can effectively solve the occlusion problem in human body action recognition.

Description

Human body action recognition method and system based on the human brain visual memory principle
Technical field
The present invention relates to the fields of computer vision and video surveillance, and in particular to a human body action recognition method and system based on the human brain visual memory principle.
Background art
Video-based human action recognition is an important problem with applications in video surveillance, video retrieval, and human-computer interaction. Human action recognition refers to using a computer to distinguish the category of a human action from a video sequence.
Video-based human action recognition can be divided into two parts: action representation and action classification. The videos are divided into a training set and a test set. Action representation refers to extracting suitable feature data from the video sequences containing human actions to describe the motion of the human body. Action classification refers to learning a classifier model from the feature data of the training set and using it to classify the feature data of the test set.
Many real-world videos contain some degree of occlusion, including self-occlusion and occlusion by other targets. As a result, the acting subject is not fully visible and effective motion features are difficult to extract, which poses a great challenge to human action recognition.
Among current action recognition methods, the following perform acceptably under occlusion: part-based methods, probability-based methods, and pose-based methods, but each has limitations. The interest-point detectors used by part-based methods may mistakenly select local patches that do not lie on the foreground target. Probability-based methods such as Bayesian networks and hidden Markov models are flat models; they represent simple actions effectively but cannot describe the hierarchy and shared structure in compound actions. Pose-based methods require detectors trained for each body part on manually annotated training images, which limits their applicability to action recognition. Therefore, an effective method for the occlusion problem in human action recognition is still needed.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the above background art by providing a human body action recognition method and system, based on the human brain visual memory principle, that can effectively solve the occlusion problem in human action recognition.
The invention provides a human body action recognition method based on the human brain visual memory principle, comprising the following steps:
A. Training stage:
A1. Collect multiple training videos and densely sample each of them, taking the Histogram of Oriented Gradients (HOG) feature of each sampling block as a local feature, to obtain the HOG feature set of the training videos;
A2. Using the expectation-maximization algorithm, learn a set of "overcomplete" basis vectors from the HOG feature set of the training videos obtained in step A1;
A3. Using the "overcomplete" basis vectors obtained in step A2, perform sparse-coding feature encoding on the HOG feature set of a training video obtained in step A1 to obtain a first set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the first set and normalize the result, obtaining a single vector of the same dimension as the "overcomplete" basis vectors, which serves as the coding result of the training video and expresses the human action in it;
A4. Feed the coding results of all training videos obtained in step A3 into a support vector machine (SVM) classifier for training, generating a trained model;
A5. Build the visual memory bank from the coding results of all training videos obtained in step A3;
B. Recognition stage:
B1. Input the video to be recognized and densely sample it, taking the HOG feature of each sampling block as a local feature, to obtain the HOG feature set of the video to be recognized;
B2. Using the "overcomplete" basis vectors obtained in step A2, perform sparse-coding feature encoding on the HOG feature set of the video to be recognized obtained in step B1 to obtain a second set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the second set and normalize the result, obtaining a single sparse vector of the same dimension as the "overcomplete" basis vectors;
B3. Determine the occluded regions in the video to be recognized and replace them using the retrieval result from the visual memory bank, obtaining the coding result of the video to be recognized:
Using the sparse vector obtained in step B2 as the index, search the visual memory bank built in step A5 and take the retrieved video as the retrieval result; replace the features of the occluded regions in the video to be recognized with the local features of the retrieved video, obtaining the HOG feature set of the replaced video as the new local features; perform feature encoding on these new local features with the "overcomplete" basis vectors obtained in step A2 to obtain a new sparse vector, which serves as the coding result of the video to be recognized and expresses the human action in it;
B4. Feed the coding result obtained in step B3 into the trained model generated in step A4 for testing, obtaining the category of the human action in the video to be recognized.
On the basis of the above technical scheme, the dense sampling of each training video in step A1 proceeds as follows: for a single training video, find multiple local sampling blocks of that video, each centered on a dense sampling point.
On the basis of the above technical scheme, the local sampling blocks may be of any size smaller than the training video.
On the basis of the above technical scheme, the local sampling blocks have a size of 16 × 16 × 4 pixels.
On the basis of the above technical scheme, a content-based video retrieval (CBVR) system is used in step A5 to simulate the visual memory bank.
On the basis of the above technical scheme, the occluded regions in step B3 are determined as follows: compute the image entropy of each local sampling block in the video to be recognized; the regions whose sampling blocks have an entropy below a predetermined threshold are the occluded regions, the threshold being determined experimentally.
The present invention also provides a human body action recognition system based on the human brain visual memory principle, comprising a first HOG feature set acquisition unit, an "overcomplete" basis vector acquisition unit, a first coding unit, a trained model generation unit, a visual memory bank construction unit, a second HOG feature set acquisition unit, a sparse vector acquisition unit, a second coding unit, and a human action category acquisition unit, wherein:
The first HOG feature set acquisition unit is configured to: collect multiple training videos and densely sample each of them, taking the Histogram of Oriented Gradients feature of each sampling block as a local feature, to obtain the HOG feature set of the training videos;
The "overcomplete" basis vector acquisition unit is configured to: use the expectation-maximization algorithm to learn a set of "overcomplete" basis vectors from the HOG feature set obtained by the first HOG feature set acquisition unit;
The first coding unit is configured to: using the "overcomplete" basis vectors, perform sparse-coding feature encoding on the HOG feature set of a training video obtained by the first HOG feature set acquisition unit to obtain a first set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the first set and normalize the result, obtaining a single vector of the same dimension as the "overcomplete" basis vectors, which serves as the coding result of the training video and expresses the human action in it;
The trained model generation unit is configured to: feed the coding results of all training videos obtained by the first coding unit into an SVM classifier for training, generating a trained model;
The visual memory bank construction unit is configured to: build the visual memory bank from the coding results of all training videos obtained by the first coding unit;
The second HOG feature set acquisition unit is configured to: densely sample the input video to be recognized, taking the HOG feature of each sampling block as a local feature, to obtain the HOG feature set of the video to be recognized;
The sparse vector acquisition unit is configured to: using the "overcomplete" basis vectors obtained by the "overcomplete" basis vector acquisition unit, perform sparse-coding feature encoding on the HOG feature set obtained by the second HOG feature set acquisition unit to obtain a second set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the second set and normalize the result, obtaining a single sparse vector of the same dimension as the "overcomplete" basis vectors;
The second coding unit is configured to: determine the occluded regions in the video to be recognized and replace them using the retrieval result from the visual memory bank, obtaining the coding result of the video to be recognized; that is, using the sparse vector obtained by the sparse vector acquisition unit as the index, search the visual memory bank and take the retrieved video as the retrieval result; replace the features of the occluded regions in the video to be recognized with the local features of the retrieved video, obtaining the HOG feature set of the replaced video as the new local features; perform feature encoding on these new local features with the "overcomplete" basis vectors to obtain a new sparse vector, which serves as the coding result of the video to be recognized and expresses the human action in it;
The human action category acquisition unit is configured to: feed the coding result of the video to be recognized obtained by the second coding unit into the trained model for testing, obtaining the category of the human action in the video to be recognized.
On the basis of the above technical scheme, the first HOG feature set acquisition unit densely samples each training video as follows: for a single training video, find multiple local sampling blocks of that video, each centered on a dense sampling point.
On the basis of the above technical scheme, the visual memory bank construction unit uses a content-based video retrieval (CBVR) system to simulate the visual memory bank.
On the basis of the above technical scheme, the second coding unit determines the occluded regions in the video to be recognized as follows: compute the image entropy of each local sampling block in the video to be recognized; the regions whose sampling blocks have an entropy below a predetermined threshold are the occluded regions, the threshold being determined experimentally.
Compared with the prior art, the advantages of the present invention are as follows:
Inspired by the human brain visual memory principle, the present invention proposes for the first time the following technical scheme: in the training stage, the feature codings of local features are used to train a classifier model and to build a visual memory bank; in the recognition stage, the feature coding of the local features of the video to be recognized is used to search the visual memory bank, part of the local features of a retrieved video replace the occluded information in the video to be recognized, feature encoding is performed on the local features of the replaced video, and the coding is fed into the trained model for testing, obtaining the category of the human action in the video. The invention can distinguish the category of a human action from video and effectively solves the occlusion problem in human action recognition.
Brief description of the drawings
Fig. 1 is a flowchart of the human body action recognition method based on the human brain visual memory principle in an embodiment of the present invention.
Fig. 2 shows the video retrieval process simulating visual memory bank retrieval in an embodiment of the present invention.
Fig. 3 shows the process of replacing occluded information with the retrieval result from the visual memory bank to obtain a new sparse vector in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, an embodiment of the present invention provides a human body action recognition method based on the human brain visual memory principle, comprising the following steps:
A. Training stage:
A1. Collect multiple training videos and densely sample each of them, taking the HOG (Histogram of Oriented Gradients) feature of each sampling block as a local feature, to obtain the HOG feature set of the training videos.
Each training video is densely sampled as follows: for a single training video, find multiple local sampling blocks of that video, each centered on a dense sampling point. The local sampling blocks may be of any size smaller than the training video, for example 16 × 16 × 4 pixels.
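The dense sampling and per-block HOG extraction described above can be sketched as follows. This is a minimal illustrative sketch only: the stride, the single-histogram HOG simplification (a full HOG descriptor uses cells and block normalization), and all array sizes are assumptions, not the patent's actual implementation.

```python
import numpy as np

def dense_sample_blocks(video, block=(16, 16, 4), stride=(8, 8, 2)):
    """Slide a 16x16x4 (h, w, t) window over a video of shape (T, H, W)
    with a fixed stride, yielding each local sampling block."""
    T, H, W = video.shape
    bh, bw, bt = block
    sh, sw, st = stride
    for t in range(0, T - bt + 1, st):
        for y in range(0, H - bh + 1, sh):
            for x in range(0, W - bw + 1, sw):
                yield video[t:t + bt, y:y + bh, x:x + bw]

def hog_feature(block, n_bins=9):
    """Orientation histogram of spatial gradients, pooled over the whole
    block (a simplified stand-in for a full cell/block HOG descriptor)."""
    gy, gx = np.gradient(block.astype(float), axis=(1, 2))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

video = np.random.rand(8, 64, 48)                      # toy (T, H, W) video
features = np.array([hog_feature(b) for b in dense_sample_blocks(video)])
```

The collection `features` plays the role of the HOG feature set of one video in the steps that follow.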
A2. Using the EM (Expectation Maximization) algorithm, which is well known to those skilled in the art, learn a set of "overcomplete" basis vectors from the HOG feature set of the training videos obtained in step A1. "Overcomplete" basis vectors are prior art and are not described further here.
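The patent learns the "overcomplete" basis with EM but gives no algorithmic details. As an EM-like stand-in, the sketch below alternates an inference step (keep only each sample's strongest basis responses) with a least-squares basis re-fit; the atom count, sparsity level, and thresholding rule are all illustrative assumptions.

```python
import numpy as np

def learn_overcomplete_basis(X, n_atoms=32, n_iter=20, k=3, seed=0):
    """Alternating scheme: infer sparse codes for a fixed basis (E-like
    step), then re-fit the basis by least squares (M-like step).
    X has shape (n, d); n_atoms > d makes the basis overcomplete."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_atoms, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-like step: keep only the k strongest correlations per sample.
        C = X @ D.T                                    # (n, n_atoms)
        thresh = -np.sort(-np.abs(C), axis=1)[:, k - 1:k]
        C[np.abs(C) < thresh] = 0.0
        # M-like step: least-squares basis update, then renormalize rows.
        D = np.linalg.lstsq(C, X, rcond=None)[0]
        norms = np.linalg.norm(D, axis=1, keepdims=True)
        D /= np.where(norms > 0, norms, 1.0)
    return D

X = np.random.default_rng(1).standard_normal((200, 9))  # toy HOG features
basis = learn_overcomplete_basis(X, n_atoms=32)         # 32 > 9: overcomplete
```

Here 32 atoms over 9-dimensional features stand in for whatever basis size the method actually uses.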
A3. Using the "overcomplete" basis vectors obtained in step A2, perform sparse-coding feature encoding on the HOG feature set of a training video obtained in step A1 to obtain a first set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the first set and normalize the result, obtaining a single vector of the same dimension as the "overcomplete" basis vectors, which serves as the coding result of the training video and expresses the human action in it.
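Step A3's sparse encoding, sum pooling, and normalization might look like the sketch below. The ISTA solver, its step counts, and the L1 weight are illustrative choices; the patent does not specify which sparse-coding algorithm is used.

```python
import numpy as np

def sparse_code(x, D, n_steps=50, lam=0.1):
    """ISTA: minimize 0.5*||x - c @ D||^2 + lam*||c||_1 over the code c,
    giving a sparse vector whose dimension equals the number of basis
    vectors (rows of D)."""
    L = np.linalg.norm(D @ D.T, 2)                     # Lipschitz constant
    c = np.zeros(D.shape[0])
    for _ in range(n_steps):
        g = (c @ D - x) @ D.T                          # gradient of fit term
        c = c - g / L
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)  # soft-threshold
    return c

def encode_video(features, D):
    """Sparse-code every local HOG feature, sum the codes, and L2-normalize:
    the result is one coding vector for the whole video."""
    codes = np.array([sparse_code(f, D) for f in features])
    pooled = codes.sum(axis=0)
    return pooled / max(np.linalg.norm(pooled), 1e-12)

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 9))                       # toy overcomplete basis
D /= np.linalg.norm(D, axis=1, keepdims=True)
features = np.abs(rng.standard_normal((50, 9)))        # toy HOG features
coding = encode_video(features, D)                     # 32-dim video coding
```

The same `encode_video` routine would serve both the training stage (step A3) and the recognition stage (step B2).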
A4. Feed the coding results of all training videos obtained in step A3 into an SVM (Support Vector Machine) classifier for training, generating a trained model.
A5. Build the visual memory bank from the coding results of all training videos obtained in step A3.
Referring to Fig. 2, a CBVR (Content-Based Video Retrieval) system can be used in step A5 to simulate the visual memory bank. CBVR systems are prior art and are not described further here.
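A toy stand-in for the CBVR-simulated memory bank can be sketched as storing one coding vector per training video and retrieving the most similar stored video for a query coding. Cosine similarity is an assumption made for illustration; the patent does not specify the similarity measure of the retrieval system.

```python
import numpy as np

class VisualMemoryBank:
    """Toy memory bank: store() plays the role of the memory function,
    retrieve() the association function, with the video coding as index."""
    def __init__(self):
        self.ids, self.codings = [], []

    def store(self, video_id, coding):
        self.ids.append(video_id)
        self.codings.append(coding / np.linalg.norm(coding))

    def retrieve(self, query):
        q = query / np.linalg.norm(query)
        sims = np.array(self.codings) @ q              # cosine similarities
        return self.ids[int(np.argmax(sims))]

bank = VisualMemoryBank()
bank.store("walk_01", np.array([1.0, 0.0, 0.0]))       # hypothetical codings
bank.store("run_01", np.array([0.0, 1.0, 0.0]))
best = bank.retrieve(np.array([0.9, 0.1, 0.0]))        # -> "walk_01"
```

The video IDs and three-dimensional codings are made up; in the method the codings would be the sum-pooled sparse vectors of step A3.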
First, the human brain visual memory principle is briefly introduced:
The main visual memory functions of the human brain include storage and association. Storage means that seen information is stored in the brain; association means that currently seen information recalls previously seen information stored in the brain.
Inspired by this principle, in the task of human action recognition the visual memory bank can be thought of as storing the full information of the videos, the association function can be realized by searching this bank, and the feature coding of a video can serve as the retrieval index. The videos stored in the visual memory bank are assumed to be generally unoccluded.
When a CBVR system is used to simulate the visual memory bank, the memory and association functions of human vision correspond respectively to the two stages of the CBVR system: construction of the feature database and video retrieval.
B. Recognition stage:
B1. Input the video to be recognized and densely sample it, taking the HOG feature of each sampling block as a local feature, to obtain the HOG feature set of the video to be recognized.
B2. Using the "overcomplete" basis vectors obtained in step A2, perform sparse-coding feature encoding on the HOG feature set of the video to be recognized obtained in step B1 to obtain a second set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the second set and normalize the result, obtaining a single sparse vector of the same dimension as the "overcomplete" basis vectors.
B3. Determine the occluded regions in the video to be recognized and replace them using the retrieval result from the visual memory bank, obtaining the coding result of the video to be recognized:
The occluded regions are determined as follows: compute the image entropy of each local sampling block in the video to be recognized; the regions whose sampling blocks have an entropy below a predetermined threshold are the occluded regions, the threshold being determined experimentally.
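The entropy test for occluded blocks can be sketched as follows. The histogram bin count and the threshold value are illustrative assumptions, since the patent only states that the threshold is chosen experimentally; the intuition is that a block covered by a uniform occluder has a nearly flat intensity distribution and hence low entropy.

```python
import numpy as np

def block_entropy(block, n_levels=32):
    """Shannon entropy (bits) of the gray-level histogram of one sampling
    block; flat, occluder-covered blocks score low."""
    hist, _ = np.histogram(block, bins=n_levels, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def occluded_mask(blocks, threshold):
    """Mark blocks whose entropy falls below the threshold as occluded."""
    return np.array([block_entropy(b) < threshold for b in blocks])

rng = np.random.default_rng(0)
textured = rng.random((16, 16, 4))                     # varied gray levels
flat = np.full((16, 16, 4), 0.5)                       # uniform occluder
mask = occluded_mask([textured, flat], threshold=1.0)  # -> [False, True]
```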
Referring to Fig. 3, using the sparse vector obtained in step B2 as the index, search the visual memory bank built in step A5 and take the retrieved video as the retrieval result; replace the features of the occluded regions in the video to be recognized with the local features of the retrieved video, obtaining the HOG feature set of the replaced video as the new local features; perform feature encoding on these new local features with the "overcomplete" basis vectors obtained in step A2 to obtain a new sparse vector, which serves as the coding result of the video to be recognized and expresses the human action in it.
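The replacement of occluded local features by those of the retrieved video can be sketched as below, assuming (purely for illustration) that both videos are densely sampled on the same grid so that blocks correspond by index. The repaired feature set would then be re-encoded with the sparse-coding pipeline of step A3.

```python
import numpy as np

def repair_features(query_feats, retrieved_feats, occluded):
    """Replace the local features at occluded positions in the video to be
    recognized with the corresponding local features of the retrieved
    video; other positions keep their original features."""
    repaired = query_feats.copy()
    repaired[occluded] = retrieved_feats[occluded]
    return repaired

rng = np.random.default_rng(3)
query_feats = rng.random((6, 9))                       # 6 blocks, 9-dim HOG
retrieved_feats = rng.random((6, 9))                   # retrieved video
occluded = np.array([False, True, False, False, True, False])
repaired = repair_features(query_feats, retrieved_feats, occluded)
```

The block count, feature dimension, and occlusion mask are toy values standing in for the outputs of the sampling and entropy steps.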
B4. Feed the coding result obtained in step B3 into the trained model generated in step A4 for testing, obtaining the category of the human action in the video to be recognized.
An embodiment of the present invention also provides a human body action recognition system based on the human brain visual memory principle, comprising a first HOG feature set acquisition unit, an "overcomplete" basis vector acquisition unit, a first coding unit, a trained model generation unit, a visual memory bank construction unit, a second HOG feature set acquisition unit, a sparse vector acquisition unit, a second coding unit, and a human action category acquisition unit, wherein:
The first HOG feature set acquisition unit is configured to: collect multiple training videos and densely sample each of them, taking the HOG (Histogram of Oriented Gradients) feature of each sampling block as a local feature, to obtain the HOG feature set of the training videos; for a single training video, multiple local sampling blocks are found, each centered on a dense sampling point; the blocks may be of any size smaller than the training video, for example 16 × 16 × 4 pixels.
The "overcomplete" basis vector acquisition unit is configured to: use the EM (Expectation Maximization) algorithm, well known to those skilled in the art, to learn a set of "overcomplete" basis vectors from the HOG feature set obtained by the first HOG feature set acquisition unit; "overcomplete" basis vectors are prior art and are not described further here.
The first coding unit is configured to: using the "overcomplete" basis vectors, perform sparse-coding feature encoding on the HOG feature set of a training video obtained by the first HOG feature set acquisition unit to obtain a first set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the first set and normalize the result, obtaining a single vector of the same dimension as the "overcomplete" basis vectors, which serves as the coding result of the training video and expresses the human action in it.
The trained model generation unit is configured to: feed the coding results of all training videos obtained by the first coding unit into an SVM (Support Vector Machine) classifier for training, generating a trained model.
The visual memory bank construction unit is configured to: build the visual memory bank from the coding results of all training videos obtained by the first coding unit.
Referring to Fig. 2, the visual memory bank construction unit can use a CBVR (Content-Based Video Retrieval) system to simulate the visual memory bank; CBVR systems are prior art and are not described further here.
The second HOG feature set acquisition unit is configured to: densely sample the input video to be recognized, taking the HOG feature of each sampling block as a local feature, to obtain the HOG feature set of the video to be recognized.
The sparse vector acquisition unit is configured to: using the "overcomplete" basis vectors obtained by the "overcomplete" basis vector acquisition unit, perform sparse-coding feature encoding on the HOG feature set obtained by the second HOG feature set acquisition unit to obtain a second set of sparse vectors, each with the same dimension as the "overcomplete" basis vectors; sum all sparse vectors in the second set and normalize the result, obtaining a single sparse vector of the same dimension as the "overcomplete" basis vectors.
The second coding unit is configured to: determine the occluded regions in the video to be recognized and replace them using the retrieval result from the visual memory bank, obtaining the coding result of the video to be recognized.
The occluded regions are determined as follows: compute the image entropy of each local sampling block in the video to be recognized; the regions whose sampling blocks have an entropy below a predetermined threshold are the occluded regions, the threshold being determined experimentally.
Referring to Fig. 3, using the sparse vector obtained by the sparse vector acquisition unit as the index, the second coding unit searches the visual memory bank and takes the retrieved video as the retrieval result; it replaces the features of the occluded regions in the video to be recognized with the local features of the retrieved video, obtaining the HOG feature set of the replaced video as the new local features; it then performs feature encoding on these new local features with the "overcomplete" basis vectors to obtain a new sparse vector, which serves as the coding result of the video to be recognized and expresses the human action in it.
The human action category acquisition unit is configured to: feed the coding result of the video to be recognized obtained by the second coding unit into the trained model for testing, obtaining the category of the human action in the video to be recognized.
Those skilled in the art can make various modifications and variations to the embodiments of the present invention; if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, they also fall within the protection scope of the present invention.
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (10)

1. A human action recognition method based on the human brain visual memory principle, characterized by comprising the following steps:
A. Training stage:
A1. Collect multiple training videos, perform dense sampling on each training video, and take the histogram of oriented gradients (HOG) feature on each sampling block as a local feature, to obtain the HOG feature set of the training videos;
A2. Use the expectation-maximization algorithm to learn from the HOG feature set of the training videos obtained in step A1, and obtain a group of "over-complete" basis vectors;
A3. Combined with the "over-complete" basis vectors obtained in step A2, perform feature coding on the HOG feature set of the training video obtained in step A1 by means of sparse coding, to obtain a first sparse vector set in which the dimension of each vector is identical to the dimension of the "over-complete" basis vectors; sum all the sparse vectors in the first sparse vector set and then normalize the result, to obtain a vector whose dimension is identical to the dimension of the "over-complete" basis vectors, as the coding result of the training video; the human action in the training video is expressed by the coding result of the training video;
A4. Send the coding results of all the training videos obtained in step A3 into a support vector machine (SVM) classifier for training, and generate a training model;
A5. Build a visual memory library using the coding results of all the training videos obtained in step A3;
B. Recognition stage:
B1. Input a video to be identified, perform dense sampling on the video to be identified, and take the HOG feature on each sampling block as a local feature, to obtain the HOG feature set of the video to be identified;
B2. Combined with the "over-complete" basis vectors obtained in step A2, perform feature coding on the HOG feature set of the video to be identified obtained in step B1 by means of sparse coding, to obtain a second sparse vector set in which the dimension of each vector is identical to the dimension of the "over-complete" basis vectors; sum all the sparse vectors in the second sparse vector set and then normalize the result, to obtain a sparse vector whose dimension is identical to the dimension of the "over-complete" basis vectors;
B3. Determine the occluded region in the video to be identified, replace the occluded region in the video to be identified with the retrieval result from the visual memory library, and obtain the coding result of the video to be identified:
Using the sparse vector obtained in step B2 as an index, retrieve from the visual memory library built in step A5, and take the retrieved video as the retrieval result; replace the feature of the occluded region in the video to be identified with the local feature of the video in the retrieval result, to obtain the HOG feature set of the replaced video as a new local feature; perform feature coding on this new local feature with the "over-complete" basis vectors obtained in step A2, to obtain a new sparse vector as the coding result of the video to be identified; the human action in the video to be identified is expressed by the coding result of the video to be identified;
B4. Send the coding result of the video to be identified obtained in step B3 into the training model generated in step A4 for testing, and obtain the human action category in the video to be identified.
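The pooling described in steps A3 and B2 (sum all per-block sparse codes, then normalize, to obtain one video-level vector of the same dimension as the basis) can be sketched in a few lines. This is an illustrative sketch only, not the patented implementation; the input vectors and L2 normalization are assumptions.

```python
import math

def pool_sparse_codes(sparse_vectors):
    """Sum a set of per-block sparse codes and L2-normalize the result,
    yielding one video-level vector whose dimension equals that of the
    over-complete basis (as in steps A3/B2)."""
    dim = len(sparse_vectors[0])
    pooled = [0.0] * dim
    for vec in sparse_vectors:
        for i, v in enumerate(vec):
            pooled[i] += v
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]

# Three hypothetical 4-dimensional sparse codes from one video
codes = [[0.0, 1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 2.0],
         [0.0, 2.0, 0.0, 2.0]]
video_code = pool_sparse_codes(codes)  # unit-length, same dimension as basis
```

Sum pooling keeps the video-level code the same length regardless of how many sampling blocks a video yields, which is what lets one SVM model handle videos of different durations.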
2. The human action recognition method based on the human brain visual memory principle according to claim 1, characterized in that: in step A1, the process of performing dense sampling on each training video is: for a single training video, finding multiple local sampling blocks of the training video centered on dense sampling points.
3. The human action recognition method based on the human brain visual memory principle according to claim 2, characterized in that: the size of each local sampling block is any size smaller than the size of the training video.
4. The human action recognition method based on the human brain visual memory principle according to claim 3, characterized in that: the size of each local sampling block is 16 × 16 × 4 pixels.
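Dense sampling with the 16 × 16 × 4-pixel blocks of claims 2–4 amounts to tiling the video volume with fixed-size spatio-temporal blocks. The sketch below only enumerates block origins; the stride values are assumptions not given in the claims.

```python
def dense_sampling_blocks(width, height, frames,
                          block=(16, 16, 4), stride=(8, 8, 2)):
    """Enumerate the (x, y, t) origins of local sampling blocks placed
    densely over a video volume; each block lies fully inside the video.
    The stride is a hypothetical choice, not specified in the claims."""
    bw, bh, bt = block
    sx, sy, st = stride
    origins = []
    for t in range(0, frames - bt + 1, st):
        for y in range(0, height - bh + 1, sy):
            for x in range(0, width - bw + 1, sx):
                origins.append((x, y, t))
    return origins

# A tiny 32x32-pixel, 8-frame video volume yields a 3x3x3 grid of blocks
blocks = dense_sampling_blocks(32, 32, 8)
```

Each origin would then index one 16 × 16 × 4 sub-volume from which a HOG descriptor is computed, forming the HOG feature set of the video.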
5. The human action recognition method based on the human brain visual memory principle according to claim 1, characterized in that: in step A5, a content-based video retrieval system is adopted to simulate the visual memory library.
6. The human action recognition method based on the human brain visual memory principle according to any one of claims 1 to 5, characterized in that: the specific process of determining the occluded region in the video to be identified in step B3 is: calculating the image entropy of each local sampling block in the video to be identified, where a region whose local sampling block has an entropy lower than a predetermined threshold is an occluded region, the predetermined threshold being determined experimentally.
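The image-entropy test of claim 6 can be illustrated as the Shannon entropy of a block's grey-level histogram: a block covered by a uniform occluder has a near-degenerate histogram and hence low entropy. This is a minimal sketch; the threshold value is a hypothetical placeholder for the experimentally determined one.

```python
import math
from collections import Counter

def block_entropy(pixels):
    """Shannon entropy (bits) of a block's grey-level histogram; a
    uniform block (e.g. a fully occluded one) gives low entropy."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def occluded(pixels, threshold=1.0):
    """Flag a block as occluded when its entropy falls below the
    threshold; 1.0 is an illustrative value, not the patent's."""
    return block_entropy(pixels) < threshold

flat  = [128] * 256      # one grey level only: entropy 0 bits
mixed = [0, 255] * 128   # two equally likely levels: entropy 1 bit
```

Under this rule `flat` is flagged as occluded while `mixed` is not, matching the intuition that occluders tend to be texture-poor.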
7. A human action recognition system based on the human brain visual memory principle, characterized by comprising a first HOG feature set acquiring unit, an "over-complete" basis vector acquiring unit, a first coding unit, a training model generation unit, a visual memory library construction unit, a second HOG feature set acquiring unit, a sparse vector acquiring unit, a second coding unit, and a human action category acquiring unit, wherein:
The first HOG feature set acquiring unit is used to: collect multiple training videos, perform dense sampling on each training video, and take the histogram of oriented gradients (HOG) feature on each sampling block as a local feature, to obtain the HOG feature set of the training videos;
The "over-complete" basis vector acquiring unit is used to: use the expectation-maximization algorithm to learn from the HOG feature set of the training videos obtained by the first HOG feature set acquiring unit, and obtain a group of "over-complete" basis vectors;
The first coding unit is used to: combined with the "over-complete" basis vectors, perform feature coding on the HOG feature set of the training video obtained by the first HOG feature set acquiring unit by means of sparse coding, to obtain a first sparse vector set in which the dimension of each vector is identical to the dimension of the "over-complete" basis vectors; sum all the sparse vectors in the first sparse vector set and then normalize the result, to obtain a vector whose dimension is identical to the dimension of the "over-complete" basis vectors, as the coding result of the training video; the human action in the training video is expressed by the coding result of the training video;
The training model generation unit is used to: send the coding results of all the training videos obtained by the first coding unit into a support vector machine (SVM) classifier for training, and generate a training model;
The visual memory library construction unit is used to: build a visual memory library using the coding results of all the training videos obtained by the first coding unit;
The second HOG feature set acquiring unit is used to: perform dense sampling on an input video to be identified, and take the HOG feature on each sampling block as a local feature, to obtain the HOG feature set of the video to be identified;
The sparse vector acquiring unit is used to: combined with the "over-complete" basis vectors obtained by the "over-complete" basis vector acquiring unit, perform feature coding on the HOG feature set of the video to be identified obtained by the second HOG feature set acquiring unit by means of sparse coding, to obtain a second sparse vector set in which the dimension of each vector is identical to the dimension of the "over-complete" basis vectors; sum all the sparse vectors in the second sparse vector set and then normalize the result, to obtain a sparse vector whose dimension is identical to the dimension of the "over-complete" basis vectors;
The second coding unit is used to: determine the occluded region in the video to be identified, replace the occluded region in the video to be identified with the retrieval result from the visual memory library, and obtain the coding result of the video to be identified: using the sparse vector obtained by the sparse vector acquiring unit as an index, retrieve from the visual memory library, and take the retrieved video as the retrieval result; replace the feature of the occluded region in the video to be identified with the local feature of the video in the retrieval result, to obtain the HOG feature set of the replaced video as a new local feature; perform feature coding on this new local feature with the "over-complete" basis vectors, to obtain a new sparse vector as the coding result of the video to be identified; the human action in the video to be identified is expressed by the coding result of the video to be identified;
The human action category acquiring unit is used to: send the coding result of the video to be identified, obtained by the second coding unit, into the training model for testing, and obtain the human action category in the video to be identified.
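The retrieval step performed by the second coding unit (use the query's pooled sparse vector as an index into the visual memory library and take the most similar stored video as the retrieval result) can be sketched as a nearest-neighbour lookup. Cosine similarity, the dictionary layout, and the toy vectors are all assumptions for illustration; the patent leaves the retrieval metric to the content-based video retrieval system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a)) *
           math.sqrt(sum(y * y for y in b))) or 1.0
    return num / den

def retrieve(memory, query):
    """Return the key of the stored coding result most similar to the
    query sparse vector (the index used when repairing occlusions)."""
    return max(memory, key=lambda k: cosine(memory[k], query))

# Hypothetical visual memory library: video label -> pooled coding result
memory = {"walk": [0.9, 0.1, 0.0],
          "run":  [0.1, 0.9, 0.0],
          "wave": [0.0, 0.1, 0.9]}
best = retrieve(memory, [0.8, 0.2, 0.0])  # nearest stored video
```

The retrieved video's local features then stand in for the occluded blocks before the video is re-encoded and classified.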
8. The human action recognition system based on the human brain visual memory principle according to claim 7, characterized in that: the process by which the first HOG feature set acquiring unit performs dense sampling on each training video is: for a single training video, finding multiple local sampling blocks of the training video centered on dense sampling points.
9. The human action recognition system based on the human brain visual memory principle according to claim 7, characterized in that: the visual memory library construction unit adopts a content-based video retrieval system to simulate the visual memory library.
10. The human action recognition system based on the human brain visual memory principle according to any one of claims 7 to 9, characterized in that: the specific process by which the second coding unit determines the occluded region in the video to be identified is: calculating the image entropy of each local sampling block in the video to be identified, where a region whose local sampling block has an entropy lower than a predetermined threshold is an occluded region, the predetermined threshold being determined experimentally.
CN201510407799.0A 2015-07-13 2015-07-13 Human motion recognition method and system based on human brain visual memory principle Expired - Fee Related CN105023000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510407799.0A CN105023000B (en) 2015-07-13 2015-07-13 Human motion recognition method and system based on human brain visual memory principle


Publications (2)

Publication Number Publication Date
CN105023000A true CN105023000A (en) 2015-11-04
CN105023000B CN105023000B (en) 2018-05-01

Family

ID=54412955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510407799.0A Expired - Fee Related CN105023000B (en) 2015-07-13 2015-07-13 Human motion recognition method and system based on human brain visual memory principle

Country Status (1)

Country Link
CN (1) CN105023000B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609686A (en) * 2012-01-19 2012-07-25 宁波大学 Pedestrian detection method
CN102945375A (en) * 2012-11-20 2013-02-27 天津理工大学 Multi-view monitoring video behavior detection and recognition method under multiple constraints
CN103605986A (en) * 2013-11-27 2014-02-26 天津大学 Human motion recognition method based on local features
CN103793054A (en) * 2014-01-17 2014-05-14 中南民族大学 Motion recognition method for simulating declarative memory process
CN103955951A (en) * 2014-05-09 2014-07-30 合肥工业大学 Fast target tracking method based on regularization templates and reconstruction error decomposition
CN104268568A (en) * 2014-09-17 2015-01-07 电子科技大学 Behavior recognition method based on intelligent sub-space networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PHILIP GEISMANN et al.: "A Two-staged Approach to Vision-based Pedestrian Recognition Using Haar and HOG Features", 2008 IEEE Intelligent Vehicles Symposium *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491751A (en) * 2018-01-11 2018-09-04 华南理工大学 Complex action recognition method exploring privileged information based on simple actions
CN108491751B (en) * 2018-01-11 2021-08-10 华南理工大学 Complex action identification method for exploring privilege information based on simple action

Also Published As

Publication number Publication date
CN105023000B (en) 2018-05-01


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180501

Termination date: 20200713