CN108629326A - Method and device for recognizing the action behavior of a target object - Google Patents


Info

Publication number
CN108629326A
Authority
CN
China
Prior art keywords
video
target object
action behavior
video features
network
Legal status
Pending
Application number
CN201810455262.5A
Other languages
Chinese (zh)
Inventor
王亮
张兆翔
黄岩
李林
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201810455262.5A
Publication of CN108629326A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision and specifically relates to a method and device for recognizing the action behavior of a target object. It aims to solve the technical problem of how to accurately recognize action behaviors in videos that have similar backgrounds. To this end, the method of the present invention includes: obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability of each preset action behavior class of the target object; and determining the action behavior class of the target object according to the prediction result. Because the method captures features of the video as a whole, it can recognize action behaviors in videos that have similar backgrounds and are therefore easily confused.

Description

Method and device for recognizing the action behavior of a target object
Technical field
The present invention relates to the technical field of computer vision, and in particular to a method and device for recognizing the action behavior of a target object.
Background
Human action and behavior recognition technology is widely used in fields such as intelligent human-computer interaction, virtual reality and video surveillance, where it can distinguish and judge the actions and behaviors of people in different scenes. Traditional action behavior recognition methods, such as those based on two-stream convolutional neural networks, recognize action behaviors mainly by extracting and analyzing video features.
An action behavior recognition method based on a two-stream convolutional neural network mainly includes the following steps. First, the video is split into two modalities, spatial and temporal, and the data of each modality are processed separately. Second, feature fusion is performed on the processed data of the two modalities. Finally, the action behavior class label of the current video is determined from the fusion result. Although such methods can recognize the action behavior class of a video accurately in many cases, they usually train the two-stream network on single-frame information of the video, so the network learns only local information and can extract only local features of the video. As a result, when action recognition is performed on videos with similar backgrounds (for example, shooting a basketball versus slam dunking), the action behavior class cannot be recognized accurately.
Summary of the invention
To solve the above problem in the prior art, namely how to accurately recognize action behaviors in videos with similar backgrounds, a first aspect of the present invention provides a method for recognizing the action behavior of a target object, the method comprising:
obtaining temporal video features of the target object based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability of each preset action behavior class of the target object;
determining the action behavior class of the target object according to the prediction result;
wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target-object video samples using a machine learning algorithm.
Further, in a preferred technical solution provided by the invention:
the step of "obtaining the temporal video features of the target object" comprises:
obtaining spatial-domain video information and temporal-domain video information of the target-object video;
obtaining, based on a preset feature acquisition method and from the spatial-domain video information, temporal video features of the target-object video under the spatial modality;
obtaining, based on the same feature acquisition method and from the temporal-domain video information, temporal video features of the target-object video under the temporal modality.
Further, in a preferred technical solution provided by the invention:
the feature acquisition method comprises:
performing frame sampling on specific video information to obtain a plurality of video segments, the specific video information being either the spatial-domain video information or the temporal-domain video information;
encoding each of the video segments to obtain a feature code for each segment, and fusing the feature codes of all segments to obtain a first global video feature;
encoding the specific video information as a whole to obtain a second global video feature corresponding to it;
fusing the first global video feature and the second global video feature to obtain the temporal video features corresponding to the specific video information.
Further, in a preferred technical solution provided by the invention:
the step of "predicting, from the obtained temporal video features, the class probability of each preset action behavior class of the target object" comprises:
predicting a first probability value for each action class from the temporal video features of the target-object video under the spatial modality;
predicting a second probability value for each action class from the temporal video features of the target-object video under the temporal modality;
fusing the first probability value and the second probability value to obtain the class probability of each action class.
Further, in a preferred technical solution provided by the invention:
the step of "fusing the first probability value and the second probability value to obtain the class probability of each action class" comprises:
computing a weighted sum of the first probability value and the second probability value to obtain the class probability.
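The weighted-sum fusion and the final decision can be written compactly as follows. This is a sketch under assumptions: the symbols $p^{s}_j$ (first probability value, spatial modality), $p^{t}_j$ (second probability value, temporal modality), the weights $w_s$, $w_t$ and the predicted class $\hat{c}$ are introduced here for illustration, and the text does not give values for the weights.

```latex
P_j = w_s \, p^{s}_j + w_t \, p^{t}_j, \qquad j = 1, \ldots, n,
\qquad
\hat{c} = \arg\max_{j} P_j
```

Choosing the class with the largest $P_j$ corresponds to the comparison of class probabilities described for step S103.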
Further, in a preferred technical solution provided by the invention:
the behavior recognition model comprises a spatial-domain neural network and a temporal-domain neural network; before the step of "obtaining temporal video features of the target object based on the pre-built behavior recognition model, and predicting, from the obtained temporal video features, the class probability of each preset action behavior class of the target object", the method further comprises:
initializing the parameter weights of the spatial-domain neural network and of the temporal-domain neural network, respectively;
obtaining temporal video features of the target-object video samples;
training the behavior recognition model on the obtained temporal video features using a machine learning algorithm.
Further, in a preferred technical solution provided by the invention:
the step of "initializing the parameter weights of the spatial-domain neural network and the temporal-domain neural network, respectively" comprises:
obtaining the parameter weights of a first neural network that has already been trained, and initializing the parameter weights of the spatial-domain neural network with them;
obtaining the parameter weights of a second neural network that has already been trained, and initializing the parameter weights of the temporal-domain neural network with them;
wherein the first neural network is a neural network trained on the ImageNet data set using the machine learning algorithm, and the second neural network is the optical-flow-modality network of a TSN network trained using the machine learning algorithm.
Further, in a preferred technical solution provided by the invention:
the step of "training the behavior recognition model on the obtained temporal video features using a machine learning algorithm" comprises training the behavior recognition model with a machine learning algorithm according to the temporal video features and the objective function E given by the following formula:
where $z_j$ is the ground-truth class label corresponding to the j-th action behavior class, $z_j$ takes values from 0 to n-1, $P_j$ is the class probability of the j-th action behavior class, and $f_{j-1}(x)$ is the node value corresponding to the j-th action behavior class.
A second aspect of the present invention provides a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above method for recognizing the action behavior of a target object.
A third aspect of the present invention provides a control device, comprising:
a processor adapted to execute programs;
a storage device adapted to store a plurality of programs;
the programs being adapted to be loaded and executed by the processor to implement the above method for recognizing the action behavior of a target object.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
In the technical solution of the present invention, the behavior recognition model obtains temporal video features of the target object, and the action behavior class of the target object is predicted from these features. Because the method captures features of the video as a whole, it can recognize action behavior classes that have similar backgrounds and are easily confused. Moreover, the temporal video feature acquisition method of the invention extracts features that reflect video information at different scales, which makes it easier to distinguish action behavior classes with similar backgrounds.
Description of the drawings
Fig. 1 is a schematic diagram of the main steps of a method for recognizing the action behavior of a target object according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main structure of a behavior recognition model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main steps of a method for encoding temporal video features according to an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope.
Current mainstream deep-neural-network methods for action behavior recognition, such as two-stream network methods like two-stream and TSN, first split the video into two modalities, spatial and temporal, process each separately, and finally fuse them at the probability level at the network output to determine the class label of the video. However, the vast majority of two-stream methods are built on frame-level features. For example, two-stream takes single frames as input both in training and in testing. Even TSN, which takes a segment of video as input and fuses features during training, only fuses single-frame features; it does not consider the temporal information contained in the video, let alone information about the video as a whole. Such a network essentially performs scene classification, so it distinguishes actions such as swimming and playing football well, but most two-stream methods fail on classes with similar backgrounds, such as shooting and slam dunking.
To solve the above problem, the present invention discloses a method for recognizing the action behavior of a target object that can be widely applied to behavior classification in natural scenes. The method uses a deep neural network to discriminate between video samples of different behavior classes and maintains high recognition accuracy on large-scale video data sets.
The method provided by the invention is described below with reference to the accompanying drawings.
Referring to Fig. 1, which shows the main steps of the method for recognizing the action behavior of a target object in this embodiment, the method may include the following steps:
Step S101: obtain temporal video features of the target object based on a pre-built behavior recognition model.
In this embodiment, the pre-built behavior recognition model is a two-stream convolutional neural network model built from preset target-object video samples using a machine learning algorithm, and the video of the target object is decomposed into two modalities, spatial and temporal.
Referring to Fig. 2, which shows the main structure of the behavior recognition model in this embodiment, the model is a two-stream convolutional neural network whose base network is BN-Inception. It comprises a spatial-domain neural network and a temporal-domain neural network. The spatial-domain network first extracts the spatial-domain video information of the video and then applies temporal feature coding to it through a deep feature coding layer, yielding the temporal video features of the target-object video under the spatial modality. Likewise, the temporal-domain network first extracts the temporal-domain video information and applies temporal feature coding to it through a deep feature coding layer, yielding the temporal video features under the temporal modality. Finally, the features of the two modalities are combined by weighted fusion to obtain the class probability of each action class; the class probabilities of all action classes are compared, and the action class with the highest probability is taken as the action behavior recognition result for the target object. Note that the spatial-domain video information is the feature of the spatial modality of the video, i.e. the RGB information of each frame, while the temporal-domain video information is the feature of the temporal modality, i.e. the optical-flow images of the video; the temporal-modality feature represents the motion information of the video. In this embodiment, the spatial-domain and temporal-domain video information are extracted with the dense-flow method.
Specifically, the step in which the behavior recognition model of this embodiment obtains the temporal video features of the target object comprises:
Obtaining the spatial-domain and temporal-domain video information of the target-object video. The target-object video contains both a spatial modality and a temporal modality; in this embodiment, the spatial-domain neural network extracts the spatial-domain video information and the temporal-domain neural network extracts the temporal-domain video information.
Obtaining, based on a preset feature acquisition method and from the extracted spatial-domain video information, the temporal video features of the target-object video under the spatial modality.
Obtaining, based on the same feature acquisition method and from the extracted temporal-domain video information, the temporal video features of the target-object video under the temporal modality.
Specifically, in this embodiment the feature acquisition method is implemented by the deep feature coding layer of the behavior recognition model described above, as follows:
Frame sampling is performed on the video information of the target object to obtain a plurality of video segments. Each segment is encoded to obtain its feature code, and the feature codes of all segments are fused to obtain a first global video feature. The video information of the target object is also encoded as a whole to obtain a second global video feature. The first and second global video features are then fused to obtain the temporal video features corresponding to the video information of the target object. Note that the video information here may be either the spatial-domain or the temporal-domain video information.
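The text does not state how frames are sampled. A minimal sketch, assuming evenly spaced sampling in which the video is divided into `num_samples` equal-length bins and the centre frame of each bin is kept (a common choice, not necessarily the one used in the patent); the function name `sample_frame_indices` is hypothetical:

```python
from typing import List


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return `num_samples` evenly spaced frame indices for a video of `num_frames` frames.

    Assumption (not stated in the patent): split the video into `num_samples`
    equal-length bins and keep the centre frame of each bin. Videos shorter
    than `num_samples` frames simply repeat frames.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    bin_length = num_frames / num_samples
    # Clamp guards against a floating-point result landing on num_frames.
    return [min(int((k + 0.5) * bin_length), num_frames - 1) for k in range(num_samples)]
```

For a 100-frame video and 4 samples this keeps frames 12, 37, 62 and 87; the sampled frames (RGB images or optical-flow images) would then be passed through the backbone to obtain per-frame features.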
Referring to Fig. 3, which shows the main steps of the temporal video feature coding method in this embodiment, feature coding is performed on the last convolutional feature map of BN-Inception, computed from either the spatial-domain or the temporal-domain video information. Because the video has been frame-sampled, a coding layer based on local features and a coding layer based on global features are constructed so as to reflect information about the video at different scales. The local-feature coding layer is given by formulas (1)-(4):
$$s_{i \to j/4} = \max\{x_i, x_{i+1}, \ldots, x_{j/4}\} \quad (1)$$
$$s_{j/4+1 \to j/2} = \max\{x_{j/4+1}, x_{j/4+2}, \ldots, x_{j/2}\} \quad (2)$$
$$s_{j/2+1 \to 3j/4} = \max\{x_{j/2+1}, x_{j/2+2}, \ldots, x_{3j/4}\} \quad (3)$$
$$s_{3j/4+1 \to j} = \max\{x_{3j/4+1}, x_{3j/4+2}, \ldots, x_{j}\} \quad (4)$$
where $x_i$ denotes the feature of the i-th video frame, and $s_{i \to j/4}$, $s_{j/4+1 \to j/2}$, $s_{j/2+1 \to 3j/4}$ and $s_{3j/4+1 \to j}$ denote the encoded features of the four segments. Each segment is encoded by max pooling; after encoding, the codes of all segments are concatenated into a single vector, the first global video feature.
The global-feature coding layer encodes the entire video information, as given by formula (5):
$$s_{i \to j} = \max\{x_i, x_{i+1}, \ldots, x_j\} \quad (5)$$
That is, the global-feature coding layer max-pools over all video frames to obtain a global feature representation, the second global video feature. With the first and second global video features computed, we concatenate them to obtain the temporal video features of the target object. Because this representation reflects the temporal information of the video, it can distinguish classes whose backgrounds are similar.
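A minimal sketch of the coding layers in formulas (1)-(5), assuming each frame feature $x_i$ has already been reduced to a flat vector (for example by spatially pooling the last BN-Inception feature map, which the text does not specify) and that max pooling is taken channel by channel across frames. The function names are hypothetical:

```python
from typing import List, Sequence

Vector = List[float]


def temporal_max_pool(frames: Sequence[Vector]) -> Vector:
    """Element-wise maximum over a run of per-frame feature vectors."""
    if not frames:
        raise ValueError("cannot pool an empty run of frames")
    return [max(channel) for channel in zip(*frames)]


def encode_temporal_features(frames: Sequence[Vector], num_segments: int = 4) -> Vector:
    """Concatenate local (per-segment) and global max-pooled features.

    Local coding layer, formulas (1)-(4): split the frame sequence into
    `num_segments` contiguous segments and max-pool each one.
    Global coding layer, formula (5): max-pool over all frames.
    """
    n = len(frames)
    if n < num_segments:
        raise ValueError("need at least one frame per segment")
    # Segment boundaries 0, n/4, n/2, 3n/4, n (integer division handles uneven n).
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    first_global: Vector = []
    for start, end in zip(bounds, bounds[1:]):
        first_global.extend(temporal_max_pool(frames[start:end]))
    second_global = temporal_max_pool(frames)
    # Temporal video feature = [first global feature ; second global feature].
    return first_global + second_global
```

With D-dimensional frame features and four segments, the output has 5D values: 4D from the local coding layer and D from the global coding layer.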
Specifically, the network training method of the behavior recognition model in this embodiment comprises:
Step Sa1: initialize the parameter weights of the spatial-domain and temporal-domain neural networks, respectively. In this embodiment, the parameter weights of a previously trained first neural network are obtained and used to initialize the spatial-domain network, and the parameter weights of a previously trained second neural network are obtained and used to initialize the temporal-domain network. The first neural network is a network trained on the ImageNet data set using a machine learning algorithm; the second neural network is the optical-flow-modality network of a TSN (Temporal Segment Network) trained using a machine learning algorithm. Loading the parameter weights of already-trained networks lets the behavior recognition model converge quickly during training, which saves training time. Note that step Sa1 may also be omitted, in which case the behavior recognition model is trained directly on the training set.
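A framework-neutral sketch of the initialization in step Sa1. Parameters are modelled as a plain dictionary of named, shaped value lists (an assumption; real frameworks use their own checkpoint formats), and any layer whose name or shape does not match, such as an output layer sized for 101 classes instead of a 1000-class ImageNet classifier, keeps its own initialization. The names `Params` and `init_from_pretrained` are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical parameter store: layer name -> (shape, flattened values).
Params = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def init_from_pretrained(target: Params, pretrained: Params) -> List[str]:
    """Copy pretrained weights into `target` wherever name and shape match.

    Layers are visited in the insertion order of `target`; listing them from
    input to output mirrors a bottom-up, layer-by-layer loading order.
    Returns the names of layers that keep their original initialization.
    """
    kept_original: List[str] = []
    for name, (shape, _) in list(target.items()):
        source = pretrained.get(name)
        if source is not None and source[0] == shape:
            target[name] = (shape, list(source[1]))  # copy, do not alias
        else:
            kept_original.append(name)
    return kept_original
```

Returning the unmatched names makes it easy to check that only the layers expected to differ (typically the classifier) were left untouched.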
Step Sa2: obtain the temporal video features of the target-object video samples. Specifically, the temporal video features of each sample under the spatial modality and under the temporal modality are obtained with the temporal video feature coding method described above.
Step Sa3: train the behavior recognition model on the temporal video features obtained in step Sa2 using a machine learning algorithm. Specifically, the objective function E of formula (6) is minimized in a supervised manner by back-propagation on the obtained temporal video features, completing the training of the behavior recognition model:
where $z_j$ is the ground-truth class label corresponding to the j-th action behavior class, $z_j$ takes values from 0 to n-1, $P_j$ is the class probability of the j-th action behavior class, and $f_{j-1}(x)$ is the node value corresponding to the j-th action behavior class.
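Formula (6) is not reproduced in this text. Step Sb4 describes E as the cross entropy between the ground-truth labels and the model's predictions, so the sketch below assumes the standard softmax cross-entropy $E = -\sum_j z_j \log P_j$ with a one-hot target and $P_j$ the softmax of the output-node values; this is an assumption, not the patent's exact formula. Class indices are 0-based (0 to n-1), averaging over a mini-batch is also an assumption, and the function names are hypothetical:

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Stable log-softmax: log P_j = f_j - log(sum_k exp(f_k))."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_norm for v in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """E = -sum_j z_j * log P_j with one-hot z, which reduces to -log P_label."""
    if not 0 <= label < len(logits):
        raise ValueError("label must be a class index in [0, n-1]")
    return -log_softmax(logits)[label]


def batch_objective(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross entropy over a mini-batch, the quantity back-propagation minimizes."""
    if len(batch_logits) != len(labels) or not labels:
        raise ValueError("need one label per sample and a non-empty batch")
    return sum(cross_entropy(f, y) for f, y in zip(batch_logits, labels)) / len(labels)
```

Working in log space avoids evaluating `log(0)` when one node value dominates, which a naive softmax followed by a logarithm would do.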
Step S102: predict, from the obtained temporal video features, the class probability of each preset action behavior class of the target object.
Specifically, in this embodiment a first probability value for each action class is predicted from the temporal video features of the target-object video under the spatial modality, a second probability value for each action class is predicted from the features under the temporal modality, and the first and second probability values are fused to obtain the class probability of each action class.
In this embodiment, the temporal video features of the spatial modality and of the temporal modality each yield a probability value after a softmax layer, and the class probability of each action class is then obtained by probability-level weighted fusion. Feature fusion can be divided into early fusion and late fusion. Early fusion merges features at the level of the feature maps, so it takes part in both the training and the testing of the network model. Late fusion is applied at test time: because the features of each modality yield a probability value after softmax, the class of the current video sample is determined simply by weighting these probability values. This kind of fusion is called probability-level weighted fusion. It can also be understood as a weighted sum: the first and second probability values are each weighted and then summed, giving the class probability of each action behavior.
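A minimal sketch of this probability-level (late) fusion, followed by the highest-probability decision of step S103. The stream weights are left as parameters because the text gives no values; the names `softmax` and `late_fusion` are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's output-node values."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    spatial_logits: Sequence[float],
    temporal_logits: Sequence[float],
    w_spatial: float = 1.0,
    w_temporal: float = 1.0,
) -> Tuple[List[float], int]:
    """Weighted sum of per-stream softmax outputs.

    Returns the fused class scores and the index of the highest-scoring class.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same classes")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = [w_spatial * a + w_temporal * b for a, b in zip(p_spatial, p_temporal)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted
```

Because the two streams meet only after softmax, each can be trained on its own and combined at test time; changing `w_temporal` alone can flip the decision when the streams disagree, which is why the fusion ratio matters.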
Step S103: determine the action behavior class of the target object according to the prediction result.
Specifically, the class probabilities of all action behavior classes are compared, and the class label with the highest probability is taken as the action behavior class of the target object.
The method of an embodiment of the present invention is illustrated below using an action recognition data set as an example. The data set contains 13000 video clips belonging to 101 classes in total, including walking, running and playing basketball. Each video belongs to exactly one class. The trained model can automatically label these videos with action classes.
Specifically, the method of this embodiment includes the following steps:
Step Sb1: use 9000 video samples of the data set as the training set and the remaining 4000 as the test set.
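The text gives only the split sizes. A minimal sketch, assuming a random split with a fixed seed so that it can be reproduced; it is not stratified by class, so a per-class split may be preferable when every one of the 101 classes must appear in both subsets. The function name `split_dataset` is hypothetical:

```python
import random
from typing import Iterable, List, Tuple


def split_dataset(video_ids: Iterable[int], num_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle video IDs reproducibly and split them into train and test lists."""
    ids = list(video_ids)
    if not 0 <= num_train <= len(ids):
        raise ValueError("num_train out of range")
    rng = random.Random(seed)  # private generator: does not disturb global random state
    rng.shuffle(ids)
    return ids[:num_train], ids[num_train:]
```

For example, `split_dataset(range(13000), 9000)` returns 9000 training IDs and 4000 test IDs with no overlap.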
Step Sb2: use a BN-Inception network as the base network of the whole framework. Specifically, build a temporal video feature coding neural network (the behavior recognition model) that can determine the behavior class of a video sample, define the task of the network as a multi-class classification problem, and set the number of layers and the number of nodes per layer. The number of nodes in the output layer equals the number of behavior classes to be recognized, with each node corresponding to one behavior class. Given a video sample v, the output layer produces the node values $f_1(v), f_2(v), \ldots, f_n(v)$ associated with the class labels l, where f(v) is the mapping function defined by the temporal video feature coding neural network. The mapping function of a single-layer neural network is given by formula (8):
where $g(x) = 1/(1 + e^{-x})$ is an activation function with input x. A complex network is formed by stacking several simple layers, which gives the expression for the output layer of the temporal video feature coding neural network shown in formula (9):
where the weight parameters in formula (9) are the weights of the temporal video feature coding neural network.
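Formulas (8) and (9) are not reproduced in this text. A minimal sketch of the described structure, assuming each layer computes $g(Wv + b)$ with the logistic activation $g$ and that layers are stacked as in formula (9); the bias term, the list-based weight layout and the function names are assumptions:

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix rows, biases)


def g(x: float) -> float:
    """Logistic activation g(x) = 1 / (1 + e^(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layer_forward(v: Sequence[float], layer: Layer) -> List[float]:
    """One layer: out_k = g(sum_i W[k][i] * v[i] + b[k])."""
    weights, biases = layer
    return [g(sum(w * x for w, x in zip(row, v)) + b) for row, b in zip(weights, biases)]


def network_forward(v: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Apply the stacked layers in order; the last one has one node per behavior class."""
    out = list(v)
    for layer in layers:
        out = layer_forward(out, layer)
    return out
```

The final layer returns one value per behavior class, matching the output-layer description in step Sb2.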
Step Sb3: use the parameter weights of a neural network pre-trained on the ImageNet data set as the initial parameter weights of the temporal video feature coding neural network, loading the weights of each layer one by one in a bottom-up manner.
Step Sb4: adjust the weights of the temporal video feature coding neural network with the conventional neural-network back-propagation algorithm by minimizing its objective function. Each time the accuracy stops rising, the learning rate is adjusted, until the accuracy no longer rises at all. The objective function E of the behavior recognition model is the cross entropy between the ground-truth labels of the data and the labels predicted by the model, as shown in formula (6).
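The text does not say by how much the learning rate is changed or when training stops. A minimal sketch of this "lower the rate on a plateau" schedule, assuming a multiplicative drop by `factor` after `patience` evaluations without a new best accuracy, and stopping after `max_reductions` drops; all names and defaults are hypothetical:

```python
class PlateauSchedule:
    """Lower the learning rate whenever accuracy stops rising; stop when drops run out."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 1, max_reductions: int = 3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.max_reductions = max_reductions
        self.best = float("-inf")
        self.evals_without_gain = 0
        self.reductions = 0

    def step(self, accuracy: float) -> bool:
        """Record one validation accuracy; return False once training should stop."""
        if accuracy > self.best:
            self.best = accuracy
            self.evals_without_gain = 0
            return True
        self.evals_without_gain += 1
        if self.evals_without_gain < self.patience:
            return True
        if self.reductions >= self.max_reductions:
            return False  # accuracy still flat after the last allowed drop
        self.lr *= self.factor
        self.reductions += 1
        self.evals_without_gain = 0
        return True
```

In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.1, patience=...)` performs the learning-rate drop, while the decision to stop training remains with the training loop.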
Step Sb5: input a test video sample into the trained model, compute the node value for each class, and determine the class of the test video sample by comparing the relative magnitudes of the node values. Note that a node value is the output value of a node in the model's output layer.
Specifically, after the model has been trained in step Sb4, a test video v is input to the model. The model computes the value of the output node for each class label j separately for the spatial-domain modality and the temporal-domain modality: one value from the output layer of the spatial-domain network and one from the output layer of the temporal-domain network. The node values of the two modalities are fused by weighting them at a fixed ratio, giving the final value fj(v) of each class-label node. The class label j of test video v is then determined by comparing the relative magnitudes of all node values, that is, the relative probabilities that the sample belongs to each class.
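The fixed-ratio late fusion and the final comparison of node values can be sketched as follows. The description only says the ratio is fixed. The 1 : 1.5 (spatial : temporal) weighting used here is an assumption loosely following common two-stream practice, and the function name is hypothetical.

```python
import numpy as np


def fuse_and_classify(spatial_scores, temporal_scores, w_spatial=1.0, w_temporal=1.5):
    """Weighted fusion of the two streams' output-node values.
    Returns the fused node values f_j(v) and the index j of the largest one."""
    fused = w_spatial * np.asarray(spatial_scores) + w_temporal * np.asarray(temporal_scores)
    return fused, int(np.argmax(fused))


fused, label = fuse_and_classify([0.2, 0.5, 0.3], [0.6, 0.3, 0.1])
print(label)  # 0
```

In this toy input the spatial stream alone would favor class 1, but the temporal stream's stronger evidence for class 0 decides the fused prediction.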
Further, based on the above embodiment of the action recognition method for a target body, the present invention also provides a storage device. The storage device can store a plurality of programs, and the programs are adapted to be loaded and executed by a processor to implement the action recognition method for a target body described above.
Still further, based on the above embodiment of the action recognition method for a target body, the present invention also provides a processing device, which may include a processor and a storage device. The processor is adapted to execute programs, and the storage device is adapted to store a plurality of programs. The programs are adapted to be loaded and executed by the processor to implement the action recognition method for a target body described above.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the devices in the embodiments of the present invention can be found in the corresponding process of the foregoing method embodiment. The devices have the same beneficial effects as the method above, and the details are not repeated here.
Those skilled in the art should recognize that the method steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second", etc. are used to distinguish similar objects, not to describe or indicate a particular order or sequence.
The term "comprising" or any similar term is intended to cover non-exclusive inclusion. A process or method that comprises a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process or method.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions fall within the scope of protection of the present invention.

Claims (10)

1. An action behavior recognition method for a target body, characterized in that the action behavior recognition method comprises:
obtaining temporal video features of the target body based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the probability that the target body belongs to each preset action behavior class;
determining the action behavior class of the target body according to the prediction result;
wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target-body video samples using a machine learning algorithm.
2. The action behavior recognition method for a target body according to claim 1, characterized in that the step of "obtaining temporal video features of the target body" comprises:
obtaining spatial-domain video information and temporal-domain video information of the target-body video;
obtaining, based on a preset feature acquisition method and from the spatial-domain video information, temporal video features of the target-body video in the spatial-domain modality;
obtaining, based on the feature acquisition method and from the temporal-domain video information, temporal video features of the target-body video in the temporal-domain modality.
3. The action behavior recognition method for a target body according to claim 2, characterized in that the feature acquisition method comprises:
performing frame extraction on specific video information to obtain a plurality of video segments, the specific video information being the spatial-domain video information or the temporal-domain video information;
encoding each of the video segments to obtain a feature encoding for each segment, and fusing the feature encodings of all the segments to obtain a first global video feature;
encoding the specific video information to obtain a second global video feature corresponding to the specific video information;
fusing the first global video feature and the second global video feature to obtain the temporal video features corresponding to the specific video information.
4. The action behavior recognition method for a target body according to claim 2, characterized in that the step of "predicting, from the obtained temporal video features, the probability that the target body belongs to each preset action behavior class" comprises:
predicting a first probability value for each action class from the temporal video features of the target-body video in the spatial-domain modality;
predicting a second probability value for each action class from the temporal video features of the target-body video in the temporal-domain modality;
fusing the first probability value and the second probability value to obtain the class probability of each action class.
5. The action behavior recognition method for a target body according to claim 4, characterized in that the step of "fusing the first probability value and the second probability value to obtain the class probability of each action class" comprises:
computing a weighted sum of the first probability value and the second probability value to obtain the class probability.
6. The action behavior recognition method for a target body according to any one of claims 1-5, characterized in that the behavior recognition model comprises a spatial-domain neural network and a temporal-domain neural network, and that before the step of "obtaining temporal video features of the target body based on a pre-built behavior recognition model, and predicting, from the obtained temporal video features, the probability that the target body belongs to each preset action behavior class", the method further comprises:
initializing the parameter weights of the spatial-domain neural network and of the temporal-domain neural network respectively;
obtaining temporal video features of the target-body video samples;
training the behavior recognition model with a machine learning algorithm according to the obtained temporal video features.
7. The action behavior recognition method for a target body according to claim 6, characterized in that the step of "initializing the parameter weights of the spatial-domain neural network and of the temporal-domain neural network respectively" comprises:
obtaining the parameter weights of a first neural network whose training has been completed in advance, and initializing the parameter weights of the spatial-domain neural network with the obtained weights;
obtaining the parameter weights of a second neural network whose training has been completed in advance, and initializing the parameter weights of the temporal-domain neural network with the obtained weights;
wherein the first neural network is a neural network trained on the ImageNet dataset using the machine learning algorithm, and the second neural network is the optical-flow-modality neural network of a TSN network trained using the machine learning algorithm.
8. The action behavior recognition method for a target body according to claim 6, characterized in that
the step of "training the behavior recognition model with a machine learning algorithm according to the obtained temporal video features" comprises training the behavior recognition model with the machine learning algorithm according to the temporal video features and the objective function E shown in the following formula:
where zj is the ground-truth class label corresponding to the j-th action behavior class and takes values from 0 to n-1, Pj is the probability of the j-th action behavior class, and fj-1(x) is the node value corresponding to the j-th action behavior class.
9. A storage device storing a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the action behavior recognition method for a target body according to any one of claims 1-8.
10. A control device, comprising:
a processor, adapted to execute programs;
a storage device, adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to implement the action behavior recognition method for a target body according to any one of claims 1-8.
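The feature acquisition method of claim 3 can be sketched as follows. Frame extraction splits the video into segments. Each segment is encoded and the codes are fused into a first global feature, the whole video is encoded into a second global feature, and the two are fused. Several choices here are assumptions, not the patent's specification: middle-frame sampling per segment (TSN-style), averaging for fusion, concatenation for the final merge, and the random linear `encode_frame` standing in for the spatial or temporal CNN.

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((16, 4))   # hypothetical frame-encoder weights


def encode_frame(frame):
    """Hypothetical stand-in for the CNN that encodes one frame."""
    return np.tanh(frame @ PROJ)


def temporal_video_feature(frames, num_segments=3):
    # 1) Frame extraction: split into equal segments and take each middle frame.
    bounds = np.linspace(0, len(frames), num_segments + 1)
    idx = [int((a + b) / 2) for a, b in zip(bounds[:-1], bounds[1:])]
    # 2) Encode each segment and fuse (average) -> first global video feature.
    first_global = np.mean([encode_frame(frames[i]) for i in idx], axis=0)
    # 3) Encode the whole video (average over all frames) -> second global video feature.
    second_global = np.mean([encode_frame(f) for f in frames], axis=0)
    # 4) Fuse the two global features (concatenation is an assumption).
    return np.concatenate([first_global, second_global])


frames = rng.standard_normal((30, 16))   # 30 frames of 16-dim toy descriptors
feat = temporal_video_feature(frames)
print(feat.shape)  # (8,)
```

The same function would be applied once to the spatial-domain (RGB) information and once to the temporal-domain (optical-flow) information, as claim 2 describes.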

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810455262.5A CN108629326A (en) 2018-05-14 2018-05-14 The action behavior recognition methods of objective body and device

Publications (1)

Publication Number Publication Date
CN108629326A (en) 2018-10-09

Family

ID=63693105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810455262.5A Pending CN108629326A (en) 2018-05-14 2018-05-14 The action behavior recognition methods of objective body and device

Country Status (1)

Country Link
CN (1) CN108629326A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device
US20170294091A1 (en) * 2016-04-06 2017-10-12 Nec Laboratories America, Inc. Video-based action recognition security system
CN107862376A (en) * 2017-10-30 2018-03-30 中山大学 A kind of human body image action identification method based on double-current neutral net

Non-Patent Citations (2)

Title
JIAGANG ZHU et al.: "End-to-end Video-level Representation Learning for Action Recognition", arXiv *
LIMIN WANG et al.: "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition", arXiv *

Cited By (24)

Publication number Priority date Publication date Assignee Title
CN109685801A (en) * 2018-12-10 2019-04-26 杭州帝视科技有限公司 In conjunction with the skin lens image processing method of textural characteristics and deep neural network information
CN109685801B (en) * 2018-12-10 2021-03-26 杭州帝视科技有限公司 Skin mirror image processing method combining texture features and deep neural network information
CN109670446A (en) * 2018-12-20 2019-04-23 泉州装备制造研究所 Anomaly detection method based on linear dynamic system and depth network
CN109670446B (en) * 2018-12-20 2022-09-13 泉州装备制造研究所 Abnormal behavior detection method based on linear dynamic system and deep network
CN111382616A (en) * 2018-12-28 2020-07-07 广州市百果园信息技术有限公司 Video classification method and device, storage medium and computer equipment
CN111382616B (en) * 2018-12-28 2023-08-18 广州市百果园信息技术有限公司 Video classification method and device, storage medium and computer equipment
CN110046278A (en) * 2019-03-11 2019-07-23 北京奇艺世纪科技有限公司 Video classification methods, device, terminal device and storage medium
CN110110651A (en) * 2019-04-29 2019-08-09 齐鲁工业大学 Activity recognition method in video based on space-time importance and 3D CNN
CN110110651B (en) * 2019-04-29 2023-06-13 齐鲁工业大学 Method for identifying behaviors in video based on space-time importance and 3D CNN
CN110232327A (en) * 2019-05-21 2019-09-13 浙江师范大学 A kind of driving fatigue detection method based on trapezoidal concatenated convolutional neural network
CN110287789A (en) * 2019-05-23 2019-09-27 北京百度网讯科技有限公司 Game video classification method and system based on internet data
CN110210454A (en) * 2019-06-17 2019-09-06 合肥工业大学 A kind of human action pre-judging method based on data fusion
CN110210454B (en) * 2019-06-17 2020-12-29 合肥工业大学 Human body action pre-judging method based on data fusion
CN110458038A (en) * 2019-07-19 2019-11-15 天津理工大学 The cross-domain action identification method of small data based on double-strand depth binary-flow network
CN110443182A (en) * 2019-07-30 2019-11-12 深圳市博铭维智能科技有限公司 A kind of urban discharging pipeline video abnormality detection method based on more case-based learnings
CN110751034A (en) * 2019-09-16 2020-02-04 平安科技(深圳)有限公司 Pedestrian behavior identification method and terminal equipment
CN110751034B (en) * 2019-09-16 2023-09-01 平安科技(深圳)有限公司 Pedestrian behavior recognition method and terminal equipment
CN110738129A (en) * 2019-09-20 2020-01-31 华中科技大学 end-to-end video time sequence behavior detection method based on R-C3D network
CN110738129B (en) * 2019-09-20 2022-08-05 华中科技大学 End-to-end video time sequence behavior detection method based on R-C3D network
CN111242007A (en) * 2020-01-10 2020-06-05 上海市崇明区生态农业科创中心 Farming behavior supervision method
CN112489092A (en) * 2020-12-09 2021-03-12 浙江中控技术股份有限公司 Fine-grained industrial motion mode classification method, storage medium, equipment and device
CN112489092B (en) * 2020-12-09 2023-10-31 浙江中控技术股份有限公司 Fine-grained industrial motion modality classification method, storage medium, device and apparatus
CN112651330A (en) * 2020-12-23 2021-04-13 平安银行股份有限公司 Target object behavior detection method and device and computer equipment
CN112651330B (en) * 2020-12-23 2023-11-24 平安银行股份有限公司 Target object behavior detection method and device and computer equipment

Similar Documents

Publication Publication Date Title
CN108629326A (en) The action behavior recognition methods of objective body and device
CN111881705B (en) Data processing, training and identifying method, device and storage medium
CN107391703B (en) The method for building up and system of image library, image library and image classification method
CN109902798A (en) The training method and device of deep neural network
CN109299657B (en) Group behavior identification method and device based on semantic attention retention mechanism
CN108229268A (en) Expression Recognition and convolutional neural networks model training method, device and electronic equipment
CN109446927B (en) Double-person interaction behavior identification method based on priori knowledge
CN110490177A (en) A kind of human-face detector training method and device
CN110502988A (en) Group positioning and anomaly detection method in video
CN107818302A (en) Non-rigid multiple dimensioned object detecting method based on convolutional neural networks
CN109325547A (en) Non-motor vehicle image multi-tag classification method, system, equipment and storage medium
CN108229267A (en) Object properties detection, neural metwork training, method for detecting area and device
CN106951825A (en) A kind of quality of human face image assessment system and implementation method
CN108664893A (en) A kind of method for detecting human face and storage medium
CN108961245A (en) Picture quality classification method based on binary channels depth parallel-convolution network
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN110478883B (en) Body-building action teaching and correcting system and method
CN107506761A (en) Brain image dividing method and system based on notable inquiry learning convolutional neural networks
CN104933428B (en) A kind of face identification method and device based on tensor description
CN110413838A (en) A kind of unsupervised video frequency abstract model and its method for building up
CN107480642A (en) A kind of video actions recognition methods based on Time Domain Piecewise network
CN106326857A (en) Gender identification method and gender identification device based on face image
CN107122736A (en) A kind of human body based on deep learning is towards Forecasting Methodology and device
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN109829959A (en) Expression edition method and device based on face parsing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181009
