CN108629326A - Action behavior recognition method and device for a target object
- Publication number
- CN108629326A, CN201810455262.5A
- Authority
- CN
- China
- Prior art keywords
- video
- objective body
- action behavior
- video features
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of computer vision and specifically relates to an action behavior recognition method and device for a target object, aiming to solve the technical problem of how to accurately recognize action behaviors in videos with similar backgrounds. To this end, the action behavior recognition method of the invention comprises: obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object; and determining the action behavior class of the target object according to the prediction result. The method captures the features of the video as a whole and can therefore recognize action behaviors in videos that have similar backgrounds and are easily confused.
Description
Technical field
The present invention relates to the technical field of computer vision, and in particular to an action behavior recognition method and device for a target object.
Background art
Human action behavior recognition technology is widely used in fields such as human-computer interaction, virtual reality and video surveillance, where it distinguishes and judges the action behaviors of people in different scenes. Traditional action behavior recognition methods, such as those based on two-stream convolutional neural networks, recognize action behaviors mainly by extracting and analyzing video features.
An action behavior recognition method based on a two-stream convolutional neural network mainly comprises the following steps. First, the video is split into two modalities, spatial domain and temporal domain, and the data of the two modalities are processed separately. Second, feature fusion is performed on the processed data of the two modalities. Finally, the action behavior class label of the current video is determined from the feature fusion result. Although this method can recognize the action behavior class of a video, it usually trains the two-stream convolutional neural network on single-frame information (so only local information of the video is learned), and therefore can only extract local features of the video. When action recognition is performed on videos with similar backgrounds (for example, playing ball and slam dunking), the action behavior class cannot be recognized accurately.
Summary of the invention
To solve the above problem in the prior art, namely the technical problem of how to accurately recognize action behaviors in videos with similar backgrounds, a first aspect of the present invention provides an action behavior recognition method for a target object, the method comprising:
obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object;
determining the action behavior class of the target object according to the prediction result;
wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target object video samples using a machine learning algorithm.
Further, a preferred technical solution provided by the present invention is as follows.
The step of "obtaining temporal video features of the target object" comprises:
obtaining spatial-domain video information and temporal-domain video information of the target object video;
obtaining, based on a preset feature acquisition method and according to the spatial-domain video information, the temporal video features of the target object video in the spatial-domain modality;
obtaining, based on the feature acquisition method and according to the temporal-domain video information, the temporal video features of the target object video in the temporal-domain modality.
Further, a preferred technical solution provided by the present invention is as follows.
The feature acquisition method comprises:
performing frame sampling on specific video information to obtain multiple pieces of video segment information, the specific video information being the spatial-domain video information or the temporal-domain video information;
encoding the multiple pieces of video segment information separately to obtain a feature code for each piece, and merging the feature codes of all the video segment information to obtain a first global video feature;
encoding the specific video information as a whole to obtain a second global video feature corresponding to the specific video information;
merging the first global video feature and the second global video feature to obtain the temporal video features corresponding to the specific video information.
Further, a preferred technical solution provided by the present invention is as follows.
The step of "predicting, from the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object" comprises:
predicting a first probability value for each action class according to the temporal video features of the target object video in the spatial-domain modality;
predicting a second probability value for each action class according to the temporal video features of the target object video in the temporal-domain modality;
merging the first probability value and the second probability value to obtain the class probability corresponding to each action class.
Further, a preferred technical solution provided by the present invention is as follows.
The step of "merging the first probability value and the second probability value to obtain the class probability corresponding to each action class" comprises:
computing a weighted sum of the first probability value and the second probability value to obtain the class probability.
Further, a preferred technical solution provided by the present invention is as follows.
The behavior recognition model comprises a spatial-domain neural network and a temporal-domain neural network. Before the step of "obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object", the method further comprises:
initializing the parameter weights of the spatial-domain neural network and the temporal-domain neural network separately;
obtaining the temporal video features of the target object video samples;
training the behavior recognition model using a machine learning algorithm according to the obtained temporal video features.
Further, a preferred technical solution provided by the present invention is as follows.
The step of "initializing the parameter weights of the spatial-domain neural network and the temporal-domain neural network separately" comprises:
obtaining the parameter weights of a first neural network that has completed network training in advance, and initializing the parameter weights of the spatial-domain neural network from the obtained parameter weights;
obtaining the parameter weights of a second neural network that has completed network training in advance, and initializing the parameter weights of the temporal-domain neural network from the obtained parameter weights;
wherein the first neural network is a neural network obtained by network training on the ImageNet dataset using the machine learning algorithm, and the second neural network is the optical-flow modality neural network of a TSN network that has completed network training using the machine learning algorithm.
Further, a preferred technical solution provided by the present invention is as follows.
The step of "training the behavior recognition model using a machine learning algorithm according to the obtained temporal video features" comprises training the behavior recognition model using a machine learning algorithm according to the temporal video features and the objective function E shown in the following formula:
where z_j is the true class label corresponding to the j-th action behavior class and takes values from 0 to n-1, P_j is the class probability corresponding to the j-th action behavior class, and f_{j-1}(x) is the node value corresponding to the j-th action behavior class.
A second aspect of the present invention further provides a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above action behavior recognition method for a target object.
A third aspect of the present invention further provides a control device, comprising:
a processor adapted to execute programs;
a storage device adapted to store a plurality of programs;
the programs being adapted to be loaded and executed by the processor to implement the above action behavior recognition method for a target object.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
In the technical solution of the present invention, the temporal video features of the target object are obtained by the behavior recognition model, and the action behavior class of the target object is predicted from those features. The method captures the features of a video as a whole and can therefore recognize action behavior classes with similar backgrounds and classes that are easily confused. The temporal video feature acquisition method of the invention extracts temporal video features that reflect video information at different scales, on the basis of which action behavior classes with similar backgrounds can be better distinguished.
Description of the drawings
Fig. 1 is a schematic diagram of the main steps of an action behavior recognition method for a target object in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main structure of a behavior recognition model in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main steps of a temporal video feature encoding method in an embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principle of the invention and are not intended to limit its scope of protection.
The current mainstream methods for action behavior recognition based on deep neural networks, such as the two-stream networks two-stream and TSN, first split the video into two modalities, spatial domain and temporal domain, process them separately, and perform probability-level feature fusion at the output of the network to determine the class label of the video. However, the vast majority of two-stream methods are built on frame-level features. For example, two-stream takes a single frame as input during training and also tests on single frames. TSN takes a video segment as input during training and performs feature fusion while the network is trained, but that fusion only merges single-frame features and does not consider the temporal information, let alone the global information, contained in the video. Such networks effectively perform only appearance-based scene classification, so they distinguish actions such as swimming and playing football well. When they encounter classes with similar backgrounds, such as shooting and slam dunking, most two-stream methods cannot tell them apart.
To solve the above problem, the invention discloses an action behavior recognition method for a target object that can be widely applied to behavior classification in natural scenes. The method uses a deep neural network to discriminate between video samples of different behavior classes and maintains high recognition accuracy on large-scale video datasets.
The action behavior recognition method for a target object provided by the invention is described below with reference to the drawings.
Referring to Fig. 1, which illustrates the main steps of the action behavior recognition method for a target object in this embodiment, the method may include the following:
Step S101: obtain the temporal video features of the target object based on the pre-built behavior recognition model.
In this embodiment, the pre-built behavior recognition model is a two-stream convolutional neural network model built from preset target object video samples using a machine learning algorithm, and the video of the target object is decomposed into two modalities, spatial domain and temporal domain.
Referring to Fig. 2, which illustrates the main structure of the behavior recognition model in this embodiment, the behavior recognition model is a two-stream convolutional neural network whose base network is BN-Inception. The model comprises a spatial-domain neural network and a temporal-domain neural network. The spatial-domain neural network first extracts the spatial-domain video information of the video and then applies temporal feature encoding to it through a deep feature coding layer, obtaining the temporal video features of the target object video in the spatial-domain modality. Similarly, the temporal-domain neural network first extracts the temporal-domain video information of the video and then applies temporal feature encoding to it through a deep feature coding layer, obtaining the temporal video features of the target object video in the temporal-domain modality. Finally, the temporal video features in the spatial-domain modality and those in the temporal-domain modality are combined by weighted fusion to obtain the class probability of each action class; the class probabilities of all action classes are compared, and the action class with the highest probability is selected as the action behavior recognition result for the target object. It should be noted that the spatial-domain video information is the feature of the spatial-domain modality of the video, namely each frame image (RGB information); the temporal-domain video information is the feature of the temporal-domain modality, namely the optical-flow images of the video, which capture the motion information of the video. In this embodiment, the spatial-domain and temporal-domain video information are extracted using the dense-flow method.
Specifically, in this embodiment the step of obtaining the temporal video features of the target object with the behavior recognition model comprises:
Obtaining the spatial-domain video information and temporal-domain video information of the target object video. The target object video includes two modalities, spatial domain and temporal domain; in this embodiment, the spatial-domain video information is extracted by the spatial-domain neural network and the temporal-domain video information by the temporal-domain neural network.
Obtaining, based on a preset feature acquisition method and according to the extracted spatial-domain video information, the temporal video features of the target object video in the spatial-domain modality.
Obtaining, based on the feature acquisition method and according to the extracted temporal-domain video information, the temporal video features of the target object video in the temporal-domain modality.
Specifically, in this embodiment the feature acquisition method is implemented by the deep feature coding layer of the behavior recognition model described above, as follows:
Frame sampling is performed on the video information of the target object to obtain multiple pieces of video segment information. The pieces of video segment information are encoded separately to obtain a feature code for each piece, and the feature codes of all the video segment information are merged to obtain the first global video feature. The video information of the entire target object is encoded to obtain the second global video feature corresponding to that video information. The first global video feature and the second global video feature are merged to obtain the temporal video features corresponding to the video information of the target object. It should be noted that the video information of the target object here may be spatial-domain video information or temporal-domain video information.
Referring to Fig. 3, which illustrates the main steps of the temporal video feature encoding method in this embodiment, feature encoding is performed on the last convolutional feature map of BN-Inception; this feature map is the spatial-domain video information or the temporal-domain video information. Because the embodiment performs frame sampling on the video, a coding layer based on local features and a coding layer based on global features are constructed in order to capture video information at different scales. The encoding performed by the local-feature coding layer is given by formulas (1)-(4):
$$s_{(i \to j/4)} = \max\{x_i, x_{i+1}, \ldots, x_{j/4}\} \quad (1)$$
$$s_{(j/4+1 \to j/2)} = \max\{x_{j/4+1}, x_{j/4+2}, \ldots, x_{j/2}\} \quad (2)$$
$$s_{(j/2+1 \to 3j/4)} = \max\{x_{j/2+1}, x_{j/2+2}, \ldots, x_{3j/4}\} \quad (3)$$
$$s_{(3j/4+1 \to j)} = \max\{x_{3j/4+1}, x_{3j/4+2}, \ldots, x_{j}\} \quad (4)$$
where $x_i$ denotes the feature of a video frame, and $s_{(i \to j/4)}$, $s_{(j/4+1 \to j/2)}$, $s_{(j/2+1 \to 3j/4)}$ and $s_{(3j/4+1 \to j)}$ denote the encoded features of each video segment. Each segment of video information is encoded by max pooling, and once encoding is complete the segment codes are concatenated into a single whole, namely the first global video feature.
The global-feature coding layer encodes the entire video information; its encoding is given by formula (5):
$$s_{(i \to j)} = \max\{x_i, x_{i+1}, \ldots, x_j\} \quad (5)$$
The global-feature coding layer applies max pooling over all video frames to obtain a global feature representation, namely the second global video feature. With the representations of the first and second global video features complete, the two are concatenated to obtain the temporal video features of the target object. This feature representation reflects the temporal information of the video and can therefore distinguish classes with similar backgrounds.
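The two coding layers can be sketched in plain Python as follows. This is a minimal illustration under assumptions: each frame feature is a flat list of floats, the frames are split into four contiguous segments of near-equal length, and the names `max_pool`, `encode_temporal_features` and `num_segments` are hypothetical rather than taken from the patent.

```python
from typing import List, Sequence

Feature = List[float]


def max_pool(frames: Sequence[Feature]) -> Feature:
    """Element-wise maximum over a non-empty sequence of frame features."""
    return [max(values) for values in zip(*frames)]


def encode_temporal_features(frames: Sequence[Feature], num_segments: int = 4) -> Feature:
    """Concatenate per-segment max-pooled codes with one global max-pooled code."""
    n = len(frames)
    if n < num_segments:
        raise ValueError("need at least one frame per segment")
    # Segment boundaries at 0, n/4, n/2, 3n/4, n (integer division).
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    first_global: Feature = []
    for start, end in zip(bounds, bounds[1:]):
        # Local coding layer: max pooling within one segment, as in formulas (1)-(4).
        first_global.extend(max_pool(frames[start:end]))
    # Global coding layer: max pooling over all frames, as in formula (5).
    second_global = max_pool(frames)
    return first_global + second_global


# Eight toy frames, each with a two-dimensional feature.
frames = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.8], [0.7, 0.1],
          [0.5, 0.5], [0.2, 0.6], [0.9, 0.0], [0.6, 0.3]]
features = encode_temporal_features(frames)
print(features)
```

With a feature dimension of d per frame, the result has 5d entries: four segment codes followed by the global code.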
Specifically, in this embodiment the network training method for the behavior recognition model comprises:
Step Sa1: initialize the parameter weights of the spatial-domain neural network and the temporal-domain neural network separately. In this embodiment this is done as follows: the parameter weights of a first neural network that has completed network training in advance are obtained and used to initialize the parameter weights of the spatial-domain neural network; the parameter weights of a second neural network that has completed network training in advance are obtained and used to initialize the parameter weights of the temporal-domain neural network. The first neural network is a neural network obtained by network training on the ImageNet dataset using a machine learning algorithm; the second neural network is the optical-flow modality neural network of a TSN (Temporal Segment Network) that has completed network training using a machine learning algorithm. By loading the parameter weights of trained neural networks, the behavior recognition model in this embodiment converges quickly during training, which saves training time. It should be noted that step Sa1 may also be omitted when training the behavior recognition model, in which case the model is trained directly on the training set.
Step Sa2: obtain the temporal video features of the target object video samples. Specifically, using the temporal video feature encoding method described above, obtain the temporal video features of the target object video samples in the spatial-domain modality and in the temporal-domain modality respectively.
Step Sa3: train the behavior recognition model using a machine learning algorithm according to the temporal video features obtained in step Sa2. Specifically, according to the obtained temporal video features and the objective function E shown in formula (6), the objective function E is minimized by supervised back-propagation to complete the network training of the behavior recognition model:
where z_j is the true class label corresponding to the j-th action behavior class and takes values from 0 to n-1, P_j is the class probability corresponding to the j-th action behavior class, and f_{j-1}(x) is the node value corresponding to the j-th action behavior class.
Step S102: predict, according to the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object.
Specifically, in this embodiment a first probability value for each action class is predicted from the temporal video features of the target object video in the spatial-domain modality; a second probability value for each action class is predicted from the temporal video features of the target object video in the temporal-domain modality; and the first and second probability values are merged to obtain the class probability corresponding to each action class.
In this embodiment, the temporal video features in the spatial-domain modality and those in the temporal-domain modality each yield a probability value after softmax, and probability-level weighted feature fusion is then used to obtain the class probability corresponding to each action class. Feature fusion is divided into early fusion and late fusion. Early fusion merges at the feature map, so it takes part in both the training and the testing of the network model. Late fusion is applied at test time: because the features of each modality yield a probability value after softmax, the probability values only need to be weighted to determine the class of the current video sample. This kind of fusion is called probability-level weighted feature fusion. It should be noted that this weighted fusion can also be understood as a weighted sum: the first and second probability values are each weighted and then summed, giving the class probability of each action behavior.
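A weighted sum of this kind can be sketched as follows. The weights of 0.5 each are placeholders chosen for illustration, since no weight values are given here, and `late_fusion` is a hypothetical helper name.

```python
from typing import List


def late_fusion(spatial_probs: List[float], temporal_probs: List[float],
                spatial_weight: float = 0.5, temporal_weight: float = 0.5) -> List[float]:
    """Weighted sum of per-class probabilities from the two modalities."""
    if len(spatial_probs) != len(temporal_probs):
        raise ValueError("both modalities must score the same classes")
    return [spatial_weight * ps + temporal_weight * pt
            for ps, pt in zip(spatial_probs, temporal_probs)]


# First probability values (spatial) and second probability values (temporal).
spatial_probs = [0.6, 0.3, 0.1]
temporal_probs = [0.2, 0.7, 0.1]
class_probs = late_fusion(spatial_probs, temporal_probs)
print(class_probs)
```

In this toy case the spatial stream alone would favour the first class, while the fused scores favour the second, which shows how the motion stream can override an appearance-only decision.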
Step S103: determine the action behavior class of the target object according to the prediction result.
Specifically, the class probabilities corresponding to the action behavior classes are compared, and the class label with the highest probability is selected as the action behavior class of the target object.
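Selecting the label with the highest class probability amounts to an argmax, sketched below. The label names and probability values are made up for illustration, and `predict_class` is a hypothetical helper name.

```python
from typing import Sequence


def predict_class(class_probs: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose class probability is highest."""
    if len(class_probs) != len(labels) or not labels:
        raise ValueError("class_probs and labels must be non-empty and the same length")
    best_index = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return labels[best_index]


labels = ["walking", "running", "playing basketball"]
predicted = predict_class([0.2, 0.1, 0.7], labels)
print(predicted)  # playing basketball
```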
The action behavior recognition method for a target object of an embodiment of the invention is illustrated below using an action recognition dataset as an example. The dataset contains 13,000 video clips belonging to 101 classes in total, including walking, running and playing basketball. Each video belongs to exactly one fixed class. The resulting model can automatically label these videos with action classes.
Specifically, the action behavior recognition method for a target object of this embodiment includes the following steps:
Step Sb1: use 9,000 video samples from the dataset as the training set and the remaining 4,000 as the test set.
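A split of this size can be sketched as follows. How the 9,000 training samples are chosen is not stated, so the seeded random shuffle is an assumption, and `split_dataset` and the `video_…` identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_size: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and split it into training and test sets."""
    if not 0 <= train_size <= len(samples):
        raise ValueError("train_size must be between 0 and the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:train_size], shuffled[train_size:]


video_ids = [f"video_{i:05d}" for i in range(13000)]
train_set, test_set = split_dataset(video_ids, train_size=9000)
print(len(train_set), len(test_set))  # 9000 4000
```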
Step Sb2: Use a BN-Inception network as the base network of the whole framework. Specifically, a temporal video feature encoding neural network (the behavior recognition model) that can determine the behavior class of a video sample is established. The task of the network is defined as a multi-class classification problem, and the number of layers and the number of nodes in each layer are set. The number of output-layer nodes equals the number of behavior classes to be recognized, and each node corresponds to one behavior class. Given a video sample v, the output layer outputs the node values associated with the class labels: $f_1(v), f_2(v), \ldots, f_n(v)$, where f(v) is the mapping function defined by the temporal video feature encoding neural network. The mapping function of a single-layer neural network is shown in formula (8):
where $g(x) = 1/(1 + e^{-x})$ denotes an activation function with input x. A complex network is formed by stacking multiple simple layers, which gives the expression for the output layer of the temporal video feature encoding neural network shown in formula (9):
where the parameters appearing in formula (9) are the weights of the temporal video feature encoding neural network.
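The idea of a single layer with a sigmoid activation, and of a deeper network built by stacking such layers, can be sketched as below. Formulas (8) and (9) are not reproduced in this text, so the weighted-sum-plus-bias form of each layer is an assumption, and all names and numbers are hypothetical. This toy network is not BN-Inception.

```python
import math


def sigmoid(x):
    """g(x) = 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """One layer: each output node applies g to a weighted sum of the inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Stack simple layers: the output of one layer feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Toy network: 3 input features -> 2 hidden nodes -> 3 output nodes (one per class)
toy_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.2, 0.4], [-0.7, 0.9]], [0.0, 0.0, 0.0]),
]
node_values = forward([0.6, 0.1, 0.9], toy_layers)
```

The output layer has one node per class, matching the description that the number of output nodes equals the number of behavior classes.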
Step Sb3: Use the parameter weights of a neural network pre-trained on the ImageNet dataset as the initial parameter weights of the temporal video feature encoding neural network, loading the parameter weights of each layer one layer at a time from the bottom up.
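Layer-by-layer initialization from pre-trained weights might look like the sketch below. The data layout (a bottom-to-top list of named layers) and the rule of skipping layers whose shapes do not match, such as a classifier sized for a different number of classes, are assumptions for illustration and are not stated in the patent:

```python
def load_pretrained_bottom_up(model_layers, pretrained_weights):
    """Copy pre-trained weights into the model, visiting layers from bottom to top.

    model_layers: list of {"name": str, "weights": list}, ordered input side first.
    pretrained_weights: dict mapping layer name to a list of weights.
    Returns the names of the layers that were initialized.
    """
    loaded = []
    for layer in model_layers:  # bottom (input side) first
        source = pretrained_weights.get(layer["name"])
        # Assumption: only copy when the layer exists and the sizes agree
        if source is not None and len(source) == len(layer["weights"]):
            layer["weights"] = list(source)
            loaded.append(layer["name"])
    return loaded


# Hypothetical three-layer model and pre-trained checkpoint
model = [
    {"name": "conv1", "weights": [0.0, 0.0]},
    {"name": "conv2", "weights": [0.0, 0.0, 0.0]},
    {"name": "classifier", "weights": [0.0, 0.0, 0.0]},
]
pretrained = {
    "conv1": [0.1, -0.2],
    "conv2": [0.3, 0.0, 0.5],
    "classifier": [0.9, 0.9, 0.9, 0.9, 0.9],  # sized for a different label set
}
loaded_names = load_pretrained_bottom_up(model, pretrained)
```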
Step Sb4: Adjust the weights of the temporal video feature encoding neural network with the conventional neural network back-propagation algorithm by minimizing the objective function of the network. Each time the accuracy stops rising, the learning rate is adjusted, and this is repeated until the accuracy finally no longer rises. The objective function E of the behavior recognition model is the cross entropy between the true labels of the data and the labels predicted by the model, as shown in formula (6).
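The two ingredients of step Sb4, a cross-entropy objective and a learning rate that is adjusted when accuracy stops rising, can be sketched as follows. Formula (6) is not reproduced in this text; converting node values to probabilities with a softmax, and shrinking the learning rate by a factor of 0.1, are common choices assumed here rather than details taken from the patent:

```python
import math


def softmax(node_values):
    """Turn output-layer node values into class probabilities."""
    m = max(node_values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in node_values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(node_values, true_index):
    """Cross entropy between a one-hot true label and the predicted probabilities."""
    return -math.log(softmax(node_values)[true_index])


def next_learning_rate(lr, accuracy_history, factor=0.1):
    """Shrink the learning rate when the latest accuracy did not beat the previous best."""
    if len(accuracy_history) >= 2 and accuracy_history[-1] <= max(accuracy_history[:-1]):
        return lr * factor
    return lr


loss_confident = cross_entropy([5.0, 0.0, 0.0], true_index=0)
loss_wrong = cross_entropy([5.0, 0.0, 0.0], true_index=1)
lr_after_plateau = next_learning_rate(0.01, [0.50, 0.60, 0.60])
```

A confident correct prediction yields a smaller loss than a confident wrong one, which is what back-propagation exploits when it minimizes E.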
Step Sb5: Input a test video sample into the trained model, compute the node value of each class, and determine the class of the test video sample by comparing the node values. Note that the node values are the output values of the nodes in the output layer of the model.
Specifically, the trained model is obtained in step Sb4. When a test video v is input, the model computes, separately for the spatial modality and the temporal modality, the value of the node corresponding to each class label j; the spatial-modality network and the temporal-modality network each produce their own output-layer node value for label j. The node values of the two modalities are fused by weighting according to a fixed ratio to obtain the final class label node value $f_j(v)$. Comparing all node values, that is, comparing the probabilities that the sample belongs to the different classes, determines the class label j of the test video v.
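The weighted fusion of the two modalities in step Sb5 can be sketched as below. The patent only says the ratio is fixed, so the equal weights used here are placeholders, and the function names and sample values are hypothetical:

```python
def fuse_streams(spatial_values, temporal_values, spatial_weight=0.5, temporal_weight=0.5):
    """Weighted sum of per-class node values from the spatial and temporal networks."""
    if len(spatial_values) != len(temporal_values):
        raise ValueError("both streams must score the same set of classes")
    return [
        spatial_weight * s + temporal_weight * t
        for s, t in zip(spatial_values, temporal_values)
    ]


def predict_label(fused_values):
    """Index j of the class whose fused node value is largest."""
    return max(range(len(fused_values)), key=fused_values.__getitem__)


# Hypothetical node values for three classes
spatial = [0.2, 0.5, 0.3]
temporal = [0.1, 0.3, 0.6]
fused = fuse_streams(spatial, temporal)
label = predict_label(fused)
```

In this example the spatial stream alone would pick class 1, but the fused values favor class 2, which shows how the temporal stream can change the final decision.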
Further, based on the above embodiment of the action recognition method for a target object, the present invention also provides a storage device. A plurality of programs can be stored in the storage device, and the programs are adapted to be loaded and executed by a processor to carry out the action recognition method for a target object described above.
Still further, based on the above embodiment of the action recognition method for a target object, the present invention also provides a processing device, which may include a processor and a storage device. The processor is adapted to execute each program; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to carry out the action recognition method for a target object described above.
Those of ordinary skill in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the devices of the embodiments of the present invention may refer to the corresponding process in the foregoing method embodiments, and that the devices have the same beneficial effects as the above method; details are not repeated here.
Those skilled in the art should recognize that the method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in electronic hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second", and so on are used to distinguish similar objects, not to describe or indicate a specific order or sequence.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process or method comprising a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process or method.
So far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions fall within the scope of protection of the present invention.
Claims (10)
1. An action behavior recognition method for a target object, characterized in that the action behavior recognition method comprises:
obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object;
determining the action behavior class of the target object according to the prediction result;
wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target object video samples using a machine learning algorithm.
2. The action behavior recognition method for a target object according to claim 1, characterized in that the step of "obtaining temporal video features of the target object" comprises:
obtaining spatial-domain video information and temporal-domain video information of the target object video;
obtaining, based on a preset feature acquisition method and according to the spatial-domain video information, the temporal video features of the target object video in the spatial modality;
obtaining, based on the feature acquisition method and according to the temporal-domain video information, the temporal video features of the target object video in the temporal modality.
3. The action behavior recognition method for a target object according to claim 2, characterized in that the feature acquisition method comprises:
performing frame extraction on specific video information to obtain a plurality of pieces of video segment information, the specific video information being the spatial-domain video information or the temporal-domain video information;
encoding the pieces of video segment information separately to obtain a feature code corresponding to each piece of video segment information, and fusing the feature codes of all pieces of video segment information to obtain a first global video feature;
encoding the specific video information to obtain a second global video feature corresponding to the specific video information;
fusing the first global video feature and the second global video feature to obtain the temporal video features corresponding to the specific video information.
4. The action behavior recognition method for a target object according to claim 2, characterized in that the step of "predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object" comprises:
predicting, according to the temporal video features of the target object video in the spatial modality, a first probability value corresponding to each action class;
predicting, according to the temporal video features of the target object video in the temporal modality, a second probability value corresponding to each action class;
fusing the first probability value and the second probability value to obtain the class probability corresponding to each action class.
5. The action behavior recognition method for a target object according to claim 4, characterized in that the step of "fusing the first probability value and the second probability value to obtain the class probability corresponding to each action class" comprises:
computing a weighted sum of the first probability value and the second probability value to obtain the class probability.
6. The action behavior recognition method for a target object according to any one of claims 1-5, characterized in that the behavior recognition model comprises a spatial-domain neural network and a temporal-domain neural network; before the step of "obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior class of the target object", the method further comprises:
initializing the parameter weights of the spatial-domain neural network and of the temporal-domain neural network respectively;
obtaining the temporal video features of the target object video samples;
training the behavior recognition model according to the obtained temporal video features, using a machine learning algorithm.
7. The action behavior recognition method for a target object according to claim 6, characterized in that the step of "initializing the parameter weights of the spatial-domain neural network and of the temporal-domain neural network respectively" comprises:
obtaining the parameter weights of a first neural network whose network training has been completed in advance, and initializing the parameter weights of the spatial-domain neural network according to the obtained parameter weights;
obtaining the parameter weights of a second neural network whose network training has been completed in advance, and initializing the parameter weights of the temporal-domain neural network according to the obtained parameter weights;
wherein the first neural network is a neural network obtained by network training on the ImageNet dataset using the machine learning algorithm; the second neural network is the optical-flow modality neural network in a TSN network whose network training was completed using the machine learning algorithm.
8. The action behavior recognition method for a target object according to claim 6, characterized in that
the step of "training the behavior recognition model according to the obtained temporal video features, using a machine learning algorithm" comprises training the behavior recognition model according to the temporal video features and the objective function E shown in the following formula, using the machine learning algorithm:
wherein $z_j$ is the true class label corresponding to the j-th action behavior class, the value of $z_j$ ranges from 0 to n-1, $P_j$ is the class probability corresponding to the j-th action behavior class, and $f_{j-1}(x)$ is the node value corresponding to the j-th action behavior class.
9. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to implement the action behavior recognition method for a target object according to any one of claims 1-8.
10. A control device, comprising:
a processor adapted to execute each program;
a storage device adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to implement the action behavior recognition method for a target object according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810455262.5A CN108629326A (en) | 2018-05-14 | 2018-05-14 | The action behavior recognition methods of objective body and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108629326A true CN108629326A (en) | 2018-10-09 |
Family
ID=63693105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810455262.5A Pending CN108629326A (en) | 2018-05-14 | 2018-05-14 | The action behavior recognition methods of objective body and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108629326A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670446A (en) * | 2018-12-20 | 2019-04-23 | 泉州装备制造研究所 | Anomaly detection method based on linear dynamic system and depth network |
CN109685801A (en) * | 2018-12-10 | 2019-04-26 | 杭州帝视科技有限公司 | In conjunction with the skin lens image processing method of textural characteristics and deep neural network information |
CN110046278A (en) * | 2019-03-11 | 2019-07-23 | 北京奇艺世纪科技有限公司 | Video classification methods, device, terminal device and storage medium |
CN110110651A (en) * | 2019-04-29 | 2019-08-09 | 齐鲁工业大学 | Activity recognition method in video based on space-time importance and 3D CNN |
CN110210454A (en) * | 2019-06-17 | 2019-09-06 | 合肥工业大学 | A kind of human action pre-judging method based on data fusion |
CN110232327A (en) * | 2019-05-21 | 2019-09-13 | 浙江师范大学 | A kind of driving fatigue detection method based on trapezoidal concatenated convolutional neural network |
CN110287789A (en) * | 2019-05-23 | 2019-09-27 | 北京百度网讯科技有限公司 | Game video classification method and system based on internet data |
CN110443182A (en) * | 2019-07-30 | 2019-11-12 | 深圳市博铭维智能科技有限公司 | A kind of urban discharging pipeline video abnormality detection method based on more case-based learnings |
CN110458038A (en) * | 2019-07-19 | 2019-11-15 | 天津理工大学 | The cross-domain action identification method of small data based on double-strand depth binary-flow network |
CN110738129A (en) * | 2019-09-20 | 2020-01-31 | 华中科技大学 | end-to-end video time sequence behavior detection method based on R-C3D network |
CN110751034A (en) * | 2019-09-16 | 2020-02-04 | 平安科技(深圳)有限公司 | Pedestrian behavior identification method and terminal equipment |
CN111242007A (en) * | 2020-01-10 | 2020-06-05 | 上海市崇明区生态农业科创中心 | Farming behavior supervision method |
CN111382616A (en) * | 2018-12-28 | 2020-07-07 | 广州市百果园信息技术有限公司 | Video classification method and device, storage medium and computer equipment |
CN112489092A (en) * | 2020-12-09 | 2021-03-12 | 浙江中控技术股份有限公司 | Fine-grained industrial motion mode classification method, storage medium, equipment and device |
CN112651330A (en) * | 2020-12-23 | 2021-04-13 | 平安银行股份有限公司 | Target object behavior detection method and device and computer equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709461A (en) * | 2016-12-28 | 2017-05-24 | 中国科学院深圳先进技术研究院 | Video based behavior recognition method and device |
US20170294091A1 (en) * | 2016-04-06 | 2017-10-12 | Nec Laboratories America, Inc. | Video-based action recognition security system |
CN107862376A (en) * | 2017-10-30 | 2018-03-30 | 中山大学 | A kind of human body image action identification method based on double-current neutral net |
Non-Patent Citations (2)
Title |
---|
JIAGANG ZHU等: ""End-to-end Video-level Representation Learning for Action Recognition"", 《ARXIV》 * |
LIMIN WANG等: ""Temporal Segment Networks: Towards Good Practices for Deep Action Recognition"", 《ARXIV》 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685801A (en) * | 2018-12-10 | 2019-04-26 | 杭州帝视科技有限公司 | In conjunction with the skin lens image processing method of textural characteristics and deep neural network information |
CN109685801B (en) * | 2018-12-10 | 2021-03-26 | 杭州帝视科技有限公司 | Skin mirror image processing method combining texture features and deep neural network information |
CN109670446A (en) * | 2018-12-20 | 2019-04-23 | 泉州装备制造研究所 | Anomaly detection method based on linear dynamic system and depth network |
CN109670446B (en) * | 2018-12-20 | 2022-09-13 | 泉州装备制造研究所 | Abnormal behavior detection method based on linear dynamic system and deep network |
CN111382616A (en) * | 2018-12-28 | 2020-07-07 | 广州市百果园信息技术有限公司 | Video classification method and device, storage medium and computer equipment |
CN111382616B (en) * | 2018-12-28 | 2023-08-18 | 广州市百果园信息技术有限公司 | Video classification method and device, storage medium and computer equipment |
CN110046278A (en) * | 2019-03-11 | 2019-07-23 | 北京奇艺世纪科技有限公司 | Video classification methods, device, terminal device and storage medium |
CN110110651A (en) * | 2019-04-29 | 2019-08-09 | 齐鲁工业大学 | Activity recognition method in video based on space-time importance and 3D CNN |
CN110110651B (en) * | 2019-04-29 | 2023-06-13 | 齐鲁工业大学 | Method for identifying behaviors in video based on space-time importance and 3D CNN |
CN110232327A (en) * | 2019-05-21 | 2019-09-13 | 浙江师范大学 | A kind of driving fatigue detection method based on trapezoidal concatenated convolutional neural network |
CN110287789A (en) * | 2019-05-23 | 2019-09-27 | 北京百度网讯科技有限公司 | Game video classification method and system based on internet data |
CN110210454A (en) * | 2019-06-17 | 2019-09-06 | 合肥工业大学 | A kind of human action pre-judging method based on data fusion |
CN110210454B (en) * | 2019-06-17 | 2020-12-29 | 合肥工业大学 | Human body action pre-judging method based on data fusion |
CN110458038A (en) * | 2019-07-19 | 2019-11-15 | 天津理工大学 | The cross-domain action identification method of small data based on double-strand depth binary-flow network |
CN110443182A (en) * | 2019-07-30 | 2019-11-12 | 深圳市博铭维智能科技有限公司 | A kind of urban discharging pipeline video abnormality detection method based on more case-based learnings |
CN110751034A (en) * | 2019-09-16 | 2020-02-04 | 平安科技(深圳)有限公司 | Pedestrian behavior identification method and terminal equipment |
CN110751034B (en) * | 2019-09-16 | 2023-09-01 | 平安科技(深圳)有限公司 | Pedestrian behavior recognition method and terminal equipment |
CN110738129A (en) * | 2019-09-20 | 2020-01-31 | 华中科技大学 | end-to-end video time sequence behavior detection method based on R-C3D network |
CN110738129B (en) * | 2019-09-20 | 2022-08-05 | 华中科技大学 | End-to-end video time sequence behavior detection method based on R-C3D network |
CN111242007A (en) * | 2020-01-10 | 2020-06-05 | 上海市崇明区生态农业科创中心 | Farming behavior supervision method |
CN112489092A (en) * | 2020-12-09 | 2021-03-12 | 浙江中控技术股份有限公司 | Fine-grained industrial motion mode classification method, storage medium, equipment and device |
CN112489092B (en) * | 2020-12-09 | 2023-10-31 | 浙江中控技术股份有限公司 | Fine-grained industrial motion modality classification method, storage medium, device and apparatus |
CN112651330A (en) * | 2020-12-23 | 2021-04-13 | 平安银行股份有限公司 | Target object behavior detection method and device and computer equipment |
CN112651330B (en) * | 2020-12-23 | 2023-11-24 | 平安银行股份有限公司 | Target object behavior detection method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181009 |