CN108629326A

CN108629326A - Method and device for recognizing the action behavior of a target body

Info

Publication number: CN108629326A
Application number: CN201810455262.5A
Authority: CN
Inventors: 王亮; 张兆翔; 黄岩; 李林
Original assignee: Institute of Automation, Chinese Academy of Sciences
Current assignee: Institute of Automation, Chinese Academy of Sciences
Priority date: 2018-05-14
Filing date: 2018-05-14
Publication date: 2018-10-09

Abstract

The invention belongs to the technical field of computer vision and relates in particular to a method and device for recognizing the action behavior of a target body, aiming to solve the technical problem of how to accurately recognize action behavior in videos with similar backgrounds. To this end, the method comprises: obtaining temporal video features of the target body based on a pre-built behavior recognition model and, according to the obtained temporal video features, predicting the class probability corresponding to each preset action behavior category of the target body; and determining the action behavior category of the target body according to the prediction result. The method captures the features of the video as a whole and can therefore recognize action behavior in videos that have similar backgrounds and are easily confused.

Description

Method and device for recognizing the action behavior of a target body

Technical field

The present invention relates to the technical field of computer vision, and in particular to a method and device for recognizing the action behavior of a target body.

Background art

Human action behavior recognition technology is widely used in fields such as intelligent human-computer interaction, virtual reality and video surveillance; it distinguishes and judges the action behavior of people in different scenes. Traditional action behavior recognition methods, such as those based on two-stream convolutional neural networks, recognize action behavior mainly by extracting and analyzing video features.

An action behavior recognition method based on a two-stream convolutional neural network mainly includes the following steps. First, the video is split into two modalities, the spatial domain and the time domain, and the data of each modality is processed separately. Second, the processed data of the two modalities undergoes feature fusion. Finally, the action behavior class label of the current video is judged from the feature fusion result. Although this method can recognize the action behavior category of a video, it usually trains the two-stream convolutional neural network on single-frame information (so the network learns only local information about the video), and therefore it can extract only local features of the video. When recognizing actions in videos with similar backgrounds (such as playing ball and slam dunking), it cannot accurately recognize the action behavior category.

Summary of the invention

To solve the above problem in the prior art, namely the technical problem of how to accurately recognize action behavior in videos with similar backgrounds, a first aspect of the invention provides a method for recognizing the action behavior of a target body, the method comprising:

obtaining temporal video features of the target body based on a pre-built behavior recognition model, and predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior category of the target body;

determining the action behavior category of the target body according to the prediction result;

wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target body video samples using a machine learning algorithm.

Further, a preferred technical solution provided by the invention is:

The step of "obtaining temporal video features of the target body" comprises:

obtaining spatial-domain video information and time-domain video information of the target body video;

obtaining, based on a preset feature acquisition method and according to the spatial-domain video information, the temporal video features of the target body video in the spatial-domain modality;

obtaining, based on the feature acquisition method and according to the time-domain video information, the temporal video features of the target body video in the time-domain modality.

Further, a preferred technical solution provided by the invention is:

The feature acquisition method comprises:

performing frame sampling on specific video information to obtain multiple pieces of video segment information, the specific video information being the spatial-domain video information or the time-domain video information;

encoding each piece of video segment information separately to obtain the feature code corresponding to each piece, and merging the feature codes of all pieces of video segment information to obtain a first global video feature;

encoding the specific video information to obtain a second global video feature corresponding to the specific video information;

merging the first global video feature and the second global video feature to obtain the temporal video features corresponding to the specific video information.

Further, a preferred technical solution provided by the invention is:

The step of "predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior category of the target body" comprises:

predicting a first probability value for each action category according to the temporal video features of the target body video in the spatial-domain modality;

predicting a second probability value for each action category according to the temporal video features of the target body video in the time-domain modality;

merging the first probability value and the second probability value to obtain the class probability corresponding to each action category.

Further, a preferred technical solution provided by the invention is:

The step of "merging the first probability value and the second probability value to obtain the class probability corresponding to each action category" comprises:

computing a weighted sum of the first probability value and the second probability value to obtain the class probability.

Further, a preferred technical solution provided by the invention is:

The behavior recognition model comprises a spatial-domain neural network and a time-domain neural network. Before the step of "obtaining temporal video features of the target body based on a pre-built behavior recognition model, and predicting, according to the obtained temporal video features, the class probability corresponding to each preset action behavior category of the target body", the method further comprises:

initializing the parameter weights of the spatial-domain neural network and of the time-domain neural network respectively;

obtaining the temporal video features of the target body video samples;

training the behavior recognition model according to the obtained temporal video features, using a machine learning algorithm.

Further, a preferred technical solution provided by the invention is:

The step of "initializing the parameter weights of the spatial-domain neural network and of the time-domain neural network respectively" comprises:

obtaining the parameter weights of a first neural network whose training has been completed in advance, and initializing the parameter weights of the spatial-domain neural network according to the obtained parameter weights;

obtaining the parameter weights of a second neural network whose training has been completed in advance, and initializing the parameter weights of the time-domain neural network according to the obtained parameter weights;

wherein the first neural network is a neural network trained on the Imagenet dataset using the machine learning algorithm, and the second neural network is the optical flow modality neural network of a TSN network trained using the machine learning algorithm.

Further, a preferred technical solution provided by the invention is:

The step of "training the behavior recognition model according to the obtained temporal video features, using a machine learning algorithm" comprises training the behavior recognition model with the machine learning algorithm according to the temporal video features and the objective function E shown in the following formula:

where z_j is the true class label corresponding to the j-th action behavior category, z_j taking values from 0 to n-1; P_j is the class probability corresponding to the j-th action behavior category; and f_{j-1}(x) is the node value corresponding to the j-th action behavior category.

A second aspect of the invention further provides a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above method for recognizing the action behavior of a target body.

A third aspect of the invention further provides a control device, comprising:

a processor adapted to execute programs;

a storage device adapted to store a plurality of programs;

the programs being adapted to be loaded and executed by the processor to implement the above method for recognizing the action behavior of a target body.

Compared with the closest prior art, the above technical solution has at least the following advantages:

In the technical solution of the invention, the temporal video features of the target body are obtained through the behavior recognition model, and the action behavior category of the target body is predicted from those features. The method captures the features of a video as a whole and can therefore recognize action behavior categories that have similar backgrounds or are easily confused. The temporal video feature acquisition method of the invention extracts features that reflect video information at different scales, on the basis of which action behavior categories with similar backgrounds can be better distinguished.

Description of the drawings

Fig. 1 is a schematic diagram of the main steps of a method for recognizing the action behavior of a target body in an embodiment of the invention;

Fig. 2 is a schematic diagram of the main structure of a behavior recognition model in an embodiment of the invention;

Fig. 3 is a schematic diagram of the main steps of a temporal video feature encoding method in an embodiment of the invention.

Detailed description of the embodiments

Preferred embodiments of the invention are described below with reference to the drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principle of the invention and are not intended to limit its scope of protection.

The current mainstream action behavior recognition methods based on deep neural networks, such as two-stream networks like two-stream and TSN, first split a video into two modalities, the spatial domain and the time domain, process each modality separately, and finally perform probability-level fusion at the network output to judge the class label of the video. However, the vast majority of two-stream methods are built on frame-level features. For example, two-stream takes a single frame as input during training and is also tested on single frames. TSN takes a video segment as input during training and performs feature fusion during training, but that fusion only merges single-frame features and does not consider the temporal information, or even the overall information, contained in the video. Such networks in effect perform only scene classification, so they distinguish actions such as swimming and playing football well. For categories with similar backgrounds, such as shooting and slam dunking, most two-stream methods cannot tell them apart.

To solve the above problem, the invention discloses a method for recognizing the action behavior of a target body that can be widely applied to behavior classification in natural scenes. The method uses a deep neural network to distinguish video samples of different behavior categories and still maintains high recognition accuracy on large-scale video datasets.

The method for recognizing the action behavior of a target body provided by the invention is described below with reference to the drawings.

Referring to Fig. 1, which illustrates the main steps of the method in this embodiment, the method for recognizing the action behavior of a target body may include the following:

Step S101: obtain the temporal video features of the target body based on a pre-built behavior recognition model.

The pre-built behavior recognition model in this embodiment is a two-stream convolutional neural network model built from preset target body video samples using a machine learning algorithm. The video of the target body is decomposed into two modalities, the spatial domain and the time domain.

Referring to Fig. 2, which illustrates the main structure of the behavior recognition model in this embodiment, the behavior recognition model is a two-stream convolutional neural network whose base network is BN-Inception. The model comprises a spatial-domain neural network and a time-domain neural network. The spatial-domain neural network first extracts the spatial-domain video information of the video and then applies temporal feature encoding to it through a deep feature encoding layer, yielding the temporal video features of the target body video in the spatial-domain modality. Similarly, the time-domain neural network first extracts the time-domain video information of the video and applies temporal feature encoding to it through a deep feature encoding layer, yielding the temporal video features of the target body video in the time-domain modality. Finally, the temporal video features of the two modalities are merged by weighted fusion to obtain the class probability of each action category; by comparing the class probabilities of all action categories, the action category with the highest probability is selected as the action behavior recognition result for the target body. It should be noted that the spatial-domain video information is the feature of the spatial-domain modality of the video, namely each frame picture, i.e. its RGB information; the time-domain video information is the feature of the time-domain modality, namely the optical flow feature, i.e. the optical flow pictures of the video, which reflect the motion information of the video. In this embodiment, the spatial-domain and time-domain video information are extracted using the dense-flow method.

Specifically, in this embodiment the step in which the behavior recognition model obtains the temporal video features of the target body comprises:

Obtaining the spatial-domain video information and time-domain video information of the target body video. The target body video comprises two modalities, the spatial domain and the time domain; this embodiment extracts the spatial-domain video information with the spatial-domain neural network and the time-domain video information with the time-domain neural network.

Obtaining, based on a preset feature acquisition method and according to the extracted spatial-domain video information, the temporal video features of the target body video in the spatial-domain modality.

Obtaining, based on the feature acquisition method and according to the extracted time-domain video information, the temporal video features of the target body video in the time-domain modality.

Specifically, the feature acquisition method in this embodiment is implemented by the deep feature encoding layer of the above behavior recognition model, and proceeds as follows:

Frame sampling is performed on the video information of the target body to obtain multiple pieces of video segment information. Each piece of video segment information is encoded separately to obtain its feature code, and the feature codes of all pieces are merged to obtain the first global video feature. The video information of the whole target body is encoded to obtain the corresponding second global video feature. The first global video feature and the second global video feature are merged to obtain the temporal video features corresponding to the video information of the target body. It should be noted that the video information of the target body here may be either spatial-domain video information or time-domain video information.

Referring to Fig. 3, which illustrates the main steps of the temporal video feature encoding method in this embodiment, feature encoding is performed on the last convolutional feature map of BN-Inception; this feature map is the spatial-domain or time-domain video information. Because the embodiment applies frame sampling to the video, a coding layer based on local features and a coding layer based on global features are constructed in order to reflect video information at different scales. The encoding method of the local-feature coding layer is shown in formulas (1)-(4):

$$s_{(i \to j/4)} = \max\{x_i, x_{i+1}, \ldots, x_{j/4}\} \qquad (1)$$

$$s_{(j/4+1 \to j/2)} = \max\{x_{j/4+1}, x_{j/4+2}, \ldots, x_{j/2}\} \qquad (2)$$

$$s_{(j/2+1 \to 3j/4)} = \max\{x_{j/2+1}, x_{j/2+2}, \ldots, x_{3j/4}\} \qquad (3)$$

$$s_{(3j/4+1 \to j)} = \max\{x_{3j/4+1}, x_{3j/4+2}, \ldots, x_j\} \qquad (4)$$

where $x_i$ denotes the feature of a video frame, and $s_{(i \to j/4)}$, $s_{(j/4+1 \to j/2)}$, $s_{(j/2+1 \to 3j/4)}$ and $s_{(3j/4+1 \to j)}$ denote the encoded features of the individual video segments. Each video segment is encoded by max pooling; after encoding, the codes of the segments are concatenated into a whole, namely the first global video feature.

The global-feature coding layer encodes the entire video information; its encoding method is shown in formula (5):

$$s_{(i \to j)} = \max\{x_i, x_{i+1}, \ldots, x_j\} \qquad (5)$$

The global-feature coding layer applies max pooling over all video frames to obtain a global feature representation, namely the second global video feature. At this point the representations of the first and second global video features are both complete. Finally, the first global video feature and the second global video feature are concatenated to obtain the temporal video features of the target body. This feature representation reflects the temporal information of the video and can therefore distinguish categories with similar backgrounds.
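The encoding in formulas (1)-(5) can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes each frame feature is a plain list of floats, uses 0-based indexing, and places the quarter boundaries by integer division; the names `max_pool` and `encode_temporal` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def max_pool(frames: Sequence[Vector]) -> Vector:
    """Element-wise maximum over a non-empty run of frame features."""
    return [max(values) for values in zip(*frames)]


def encode_temporal(frames: Sequence[Vector], num_segments: int = 4) -> Vector:
    """Concatenate per-segment max-pooled features with the global one."""
    n = len(frames)
    if n < num_segments:
        raise ValueError("need at least one frame per segment")
    # Segment boundaries at n/4, n/2, 3n/4 and n (integer division is an assumption).
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    local: Vector = []
    for start, end in zip(bounds, bounds[1:]):
        local.extend(max_pool(frames[start:end]))  # formulas (1)-(4)
    global_feature = max_pool(frames)              # formula (5)
    # First global feature (concatenated segments) followed by the second.
    return local + global_feature
```

With D-dimensional frame features the result has 5 x D entries: four segment codes followed by the whole-video code.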

Specifically, the network training method for the behavior recognition model in this embodiment comprises:

Step Sa1: initialize the parameter weights of the spatial-domain neural network and of the time-domain neural network respectively. In this embodiment this is done as follows: the parameter weights of a first neural network whose training has been completed in advance are obtained, and the spatial-domain neural network is initialized with them; the parameter weights of a second neural network whose training has been completed in advance are obtained, and the time-domain neural network is initialized with them. The first neural network is a neural network trained on the Imagenet dataset using a machine learning algorithm; the second neural network is the optical flow modality neural network of a TSN (Temporal Segment Network) network trained using a machine learning algorithm. By loading the parameter weights of already trained neural networks, the behavior recognition model in this embodiment converges quickly during training, which saves training time. It should be noted that step Sa1 may also be omitted, in which case the behavior recognition model is trained directly on the training set.
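The weight initialization in step Sa1 amounts to copying pretrained parameters into the matching layers of the new network. The sketch below illustrates that idea only; it assumes parameters are stored as a dictionary from layer name to a flat list of floats, which is a simplification, and `init_from_pretrained` is a hypothetical helper rather than any framework's API.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def init_from_pretrained(model_params: Params, pretrained: Params) -> List[str]:
    """Copy pretrained values into model_params where name and size match.

    Returns the names of the parameters that were loaded. Parameters without
    a match (for example a newly added encoding layer) keep their own values.
    """
    loaded = []
    for name, values in pretrained.items():
        if name in model_params and len(model_params[name]) == len(values):
            model_params[name] = list(values)  # copy, do not alias
            loaded.append(name)
    return loaded
```

Under this sketch the spatial-domain network would be initialized from the Imagenet-trained weights and the time-domain network from the TSN optical flow weights, each with its own call.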

Step Sa2: obtain the temporal video features of the target body video samples. Specifically, based on the above temporal video feature encoding method, the temporal video features of the target body video samples in the spatial-domain modality and in the time-domain modality are obtained respectively.

Step Sa3: train the behavior recognition model using a machine learning algorithm, according to the temporal video features obtained in step Sa2. Specifically, according to the obtained temporal video features and the objective function E shown in formula (6), E is minimized in a supervised manner by backpropagation, thereby completing the network training of the behavior recognition model.
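Formula (6) itself does not appear in this text. Since the embodiment describes E as the cross entropy between the true label and the predicted label, and the class probabilities as softmax outputs, a standard softmax cross entropy for a single sample might look as follows. The function names and the single-sample form are assumptions, not the patent's exact formula.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn output-layer node values into class probabilities."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores: Sequence[float], true_class: int) -> float:
    """Negative log-probability of the true class (labels are 0 to n-1)."""
    probabilities = softmax(scores)
    return -math.log(probabilities[true_class])
```

The loss is small when the node value of the true class dominates the others and grows as probability mass moves to other classes, which is the quantity backpropagation would minimize.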

Step S102: according to the obtained temporal video features, predict the class probability corresponding to each preset action behavior category of the target body.

Specifically, this embodiment predicts a first probability value for each action category according to the temporal video features of the target body video in the spatial-domain modality, predicts a second probability value for each action category according to the temporal video features in the time-domain modality, and merges the first and second probability values to obtain the class probability corresponding to each action category.

In this embodiment, the temporal video features of the spatial-domain modality and of the time-domain modality each yield a probability value after softmax; a probability-level weighted fusion is then used to obtain the class probability corresponding to each action category. Feature fusion is divided into early fusion and late fusion. Early fusion merges at the feature map level, so it takes part in both the training and the testing of the network model. Late fusion is applied at test time: because each modality outputs a probability value after softmax, the probability values only need to be weighted to judge the category of the current video sample. This kind of fusion is called probability-level weighted feature fusion. It can also be understood as a weighted sum: the first and second probability values are each multiplied by a weight and then added, giving the class probability of each action behavior.
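The probability-level (late) fusion described above can be illustrated with a short sketch. The weights are not given in the text, so `w_spatial` and `w_temporal` below are placeholder values, and both function names are hypothetical.

```python
from typing import List, Sequence


def fuse_probabilities(
    spatial: Sequence[float],
    temporal: Sequence[float],
    w_spatial: float = 0.5,   # assumed weight, not specified in the text
    w_temporal: float = 0.5,  # assumed weight, not specified in the text
) -> List[float]:
    """Weighted sum of per-class probabilities from the two modalities."""
    if len(spatial) != len(temporal):
        raise ValueError("both modalities must score the same classes")
    return [w_spatial * p + w_temporal * q for p, q in zip(spatial, temporal)]


def predict_class(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda j: fused[j])
```

Because only the softmax outputs are combined, this fusion needs no retraining; changing the weights changes the decision without touching either network.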

Step S103: determine the action behavior category of the target body according to the prediction result.

Specifically, the class probabilities corresponding to the action behavior categories are compared, and the class label with the highest probability value is selected as the action behavior category of the target body.

The method for recognizing the action behavior of a target body according to an embodiment of the invention is illustrated below using an action recognition dataset as an example. The dataset contains 13000 video clips belonging to 101 categories in total, including walking, running and playing basketball. Each video belongs to exactly one fixed category. The resulting model can label these videos with action categories automatically.

Specifically, the method of this embodiment includes the following steps:

Step Sb1: use 9000 video samples in the dataset as the training set and the remaining 4000 as the test set.
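A split like the one in step Sb1 can be sketched as below. The text does not say how the 9000 training samples are chosen, so the seeded random shuffle here is an assumption, and `split_dataset` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Tuple


def split_dataset(
    video_ids: Iterable[int], num_train: int = 9000, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the sample ids reproducibly and cut them into train and test."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be repeated
    return ids[:num_train], ids[num_train:]
```

For example, `split_dataset(range(13000))` returns 9000 training ids and 4000 test ids with no overlap.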

Step Sb2: use a BN-Inception network as the base network of the whole framework. Specifically, a temporal video feature encoding neural network (the behavior recognition model) capable of judging the behavior category of video samples is established; the task of the network is defined as a multi-class classification problem, and the number of layers and the number of nodes per layer are set. The number of output-layer nodes equals the number of behavior categories to be recognized, each node corresponding to one behavior category. Given a video sample v, the output-layer node values associated with the class labels are f_1(v), f_2(v), ..., f_n(v), where f(v) is the mapping function defined by the temporal video feature encoding neural network. The mapping function of a single-layer neural network is shown in formula (8):

where g(x) = 1/(1 + e^{-x}) denotes an activation function with input x. A complex network is formed by stacking multiple simple layers, so the expression for the output layer of the temporal video feature encoding neural network is as shown in formula (9):

Here, the weight term in formula (9) is the weight of the temporal video feature encoding neural network.
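Formulas (8) and (9) are not reproduced in this text. As an illustration of the idea they describe, a sigmoid layer and a stack of such layers can be sketched as follows. The affine form with a bias term is an assumption, as are all function names; only the activation g(x) = 1/(1 + e^{-x}) comes from the description.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def sigmoid(x: float) -> float:
    """The activation g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(layer: Layer, inputs: Sequence[float]) -> List[float]:
    """One layer: weighted sum of the inputs plus bias, then the activation."""
    weights, biases = layer
    return [
        sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(layers: Sequence[Layer], inputs: Sequence[float]) -> List[float]:
    """Stack simple layers: the output of each feeds the next."""
    values = list(inputs)
    for layer in layers:
        values = layer_forward(layer, values)
    return values
```

The last layer would have one row per behavior category, so the returned list plays the role of the node values f_1(v), ..., f_n(v).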

Step Sb3: use the parameter weights of a neural network pretrained on the Imagenet dataset as the initial parameter weights of the temporal video feature encoding neural network, loading the parameter weights of each layer one layer at a time from the bottom up.

Step Sb4: adjust the weights of the temporal video feature encoding neural network with the conventional neural network backpropagation algorithm by minimizing the network's objective function. Each time the accuracy stops rising, the learning rate is adjusted, until the accuracy finally no longer rises. The objective function E of the behavior recognition model is the cross entropy between the true labels of the data and the labels predicted by the model, as shown in formula (6).
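The learning-rate adjustment in step Sb4 resembles a reduce-on-plateau schedule. The text gives neither the adjustment factor nor the exact plateau test, so the factor of 0.1 and the "no improvement over the previous best" rule below are assumptions, and `adjust_learning_rate` is a hypothetical helper.

```python
from typing import Sequence


def adjust_learning_rate(
    learning_rate: float,
    accuracy_history: Sequence[float],
    factor: float = 0.1,  # assumed reduction factor, not given in the text
) -> float:
    """Shrink the learning rate when the latest accuracy did not improve."""
    if len(accuracy_history) < 2:
        return learning_rate  # nothing to compare against yet
    previous_best = max(accuracy_history[:-1])
    if accuracy_history[-1] <= previous_best:
        return learning_rate * factor
    return learning_rate
```

Training would stop once a further reduction no longer yields any gain in accuracy.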

Step Sb5: input the test video samples into the trained model, compute the node values of the classes, and compare the relative sizes of the node values to determine the category of each test video sample. It should be noted that a node value is the output value of a node in the model's output layer.

Specifically, the trained model is obtained in step Sb4. When a test video v is input, the model computes, for the spatial-domain modality and the time-domain modality separately, the output-layer node value corresponding to each class label j. The node values of the two modalities are fused by weighting at a fixed ratio to give the final node value f_j(v) of each class label. Comparing the relative sizes of all node values, i.e. the probabilities of the sample belonging to the different classes, determines the class label j of the test video v.

Further, the action identification method embodiment based on above-mentioned objective body, the present invention also provides a kind of storage dresses It sets, a plurality of program can be stored in the storage device, program is suitable for being loaded by processor and being executed such as above-mentioned objective body Action identification method.

Still further, the action identification method embodiment based on above-mentioned objective body, the present invention also provides a kind of processing Device, the processing unit may include processor, storage device；Processor is adapted for carrying out each program；Storage device is suitable for Store a plurality of program；Program is suitable for being loaded by processor and being executed the action identification method such as above-mentioned objective body.

Person of ordinary skill in the field can be understood that for convenience of description and succinctly, the present invention is real Apply the specific work process and related description of the device of example, can refer to previous embodiment method in corresponding process, and with Above method advantageous effect having the same, details are not described herein.

Those skilled in the art should be able to recognize that, side described in conjunction with the examples disclosed in the embodiments of the present disclosure Method step, can be realized with electronic hardware, computer software, or a combination of the two, in order to clearly demonstrate electronic hardware and The interchangeability of software generally describes each exemplary composition and step according to function in the above description.These Function is executed with electronic hardware or software mode actually, depends on the specific application and design constraint of technical solution. Those skilled in the art can use different methods to achieve the described function each specific application, but this reality Now it should not be considered as beyond the scope of the present invention.

Term " first ", " second " etc. are for distinguishing similar object, rather than for describing or indicating specific suitable Sequence or precedence.

Term " comprising " or any other like term are intended to cover non-exclusive inclusion, so that including a system Process, the method for row element include not only those elements, but also include the other elements being not explicitly listed, or further include The intrinsic element of these processes, method.

The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art may make equivalent changes or replacements to the relevant technical features, and the technical solutions after such changes or replacements fall within the scope of protection of the present invention.

Claims

1. A method for recognizing the action behavior of a target body, characterized in that the action behavior recognition method comprises:

based on a pre-built behavior recognition model, obtaining sequential video features of the target body, and predicting, according to the obtained sequential video features, the class probability corresponding to each preset action behavior class of the target body;

wherein the behavior recognition model is a two-stream convolutional neural network model built from preset target body video samples using a machine learning algorithm.

2. The action behavior recognition method for a target body according to claim 1, characterized in that the step of "obtaining sequential video features of the target body" comprises:

based on a preset feature acquisition method and according to the spatial-domain video information, obtaining the sequential video features of the target body video in the spatial-domain modality;

based on the feature acquisition method and according to the time-domain video information, obtaining the sequential video features of the target body video in the time-domain modality.

3. The action behavior recognition method for a target body according to claim 2, characterized in that the feature acquisition method comprises:

encoding a plurality of pieces of video segment information respectively to obtain the feature code corresponding to each piece of video segment information, and fusing the feature codes of all pieces of video segment information to obtain a first global video feature;

fusing the first global video feature and the second global video feature to obtain the sequential video features corresponding to the particular video information.
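The feature acquisition steps of claim 3 can be illustrated with a minimal sketch. It is not the patented implementation: the names `mean_pool` and `sequential_video_feature` are hypothetical, element-wise averaging and concatenation are assumed stand-ins for the two fusion operations (the claim does not fix them), and the second global video feature is simply taken as an input because this claim does not say how it is produced.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(codes: Sequence[Vector]) -> Vector:
    """Fuse per-segment feature codes by element-wise averaging (assumed rule)."""
    if not codes:
        raise ValueError("at least one segment code is required")
    dim = len(codes[0])
    return [sum(code[i] for code in codes) / len(codes) for i in range(dim)]


def sequential_video_feature(
    segments: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], Vector],
    second_global: Sequence[float],
) -> Vector:
    """Encode each segment, fuse the codes, then fuse with a second global feature."""
    # Step 1: encode every video segment separately.
    codes = [encode(segment) for segment in segments]
    # Step 2: fuse all segment codes into the first global video feature.
    first_global = mean_pool(codes)
    # Step 3: fuse first and second global features (concatenation is assumed).
    return first_global + list(second_global)


# Toy usage: each "segment" is a short list of numbers and the encoder
# returns its mean and maximum. A real system would use a CNN encoder.
toy_encoder = lambda seg: [sum(seg) / len(seg), max(seg)]
feature = sequential_video_feature([[1.0, 3.0], [5.0, 7.0]], toy_encoder, [0.5])
```

In this sketch the same function would be called once with spatial-domain inputs and once with time-domain inputs, matching the two modalities of claim 2.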

4. The action behavior recognition method for a target body according to claim 2, characterized in that the step of "predicting, according to the obtained sequential video features, the class probability corresponding to each preset action behavior class of the target body" comprises:

according to the sequential video features of the target body video in the spatial-domain modality, predicting a first probability value corresponding to each action class;

according to the sequential video features of the target body video in the time-domain modality, predicting a second probability value corresponding to each action class;

fusing the first probability value and the second probability value to obtain the class probability corresponding to each action class.
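The probability fusion of claim 4 can be sketched as follows. This is only an illustration under stated assumptions: a weighted average is used as the fusion rule, the weight `w_spatial` and the function names `fuse_probabilities` and `predict_class` are hypothetical, and the actual fusion formula of the invention is not reproduced in this text.

```python
from typing import List, Sequence


def fuse_probabilities(
    p_spatial: Sequence[float],
    p_temporal: Sequence[float],
    w_spatial: float = 0.5,
) -> List[float]:
    """Fuse first (spatial) and second (temporal) probability values per class.

    A weighted average is an assumed fusion rule, not the patent's formula.
    """
    if len(p_spatial) != len(p_temporal):
        raise ValueError("both streams must score the same number of classes")
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    w_temporal = 1.0 - w_spatial
    return [w_spatial * ps + w_temporal * pt for ps, pt in zip(p_spatial, p_temporal)]


def predict_class(class_probs: Sequence[float]) -> int:
    """Return the index of the action class with the highest fused probability."""
    return max(range(len(class_probs)), key=lambda j: class_probs[j])


# Toy usage with two action classes.
fused = fuse_probabilities([0.5, 0.5], [0.25, 0.75])
best = predict_class(fused)
```

Because both inputs are probability distributions and the weights sum to one, the fused values in this sketch also sum to one.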

5. The action behavior recognition method for a target body according to claim 4, characterized in that the step of "fusing the first probability value and the second probability value to obtain the class probability corresponding to each action class" comprises:

6. The action behavior recognition method for a target body according to any one of claims 1-5, characterized in that the behavior recognition model includes a spatial-domain neural network and a time-domain neural network; before the step of "based on a pre-built behavior recognition model, obtaining sequential video features of the target body, and predicting, according to the obtained sequential video features, the class probability corresponding to each preset action behavior class of the target body", the method further comprises:

obtaining the sequential video features of the target body video samples;

performing model training on the behavior recognition model according to the obtained sequential video features and using a machine learning algorithm.

7. The action behavior recognition method for a target body according to claim 6, characterized in that the step of "performing parameter weight initialization on the spatial-domain neural network and the time-domain neural network respectively" comprises:

obtaining the parameter weights of a first neural network whose network training has been completed in advance, and performing parameter weight initialization on the spatial-domain neural network according to the obtained parameter weights;

obtaining the parameter weights of a second neural network whose network training has been completed in advance, and performing parameter weight initialization on the time-domain neural network according to the obtained parameter weights;

wherein the first neural network is a neural network obtained by network training on the ImageNet data set using the machine learning algorithm, and the second neural network is the optical flow modality neural network in a TSN network whose network training has been completed using the machine learning algorithm.

8. The action behavior recognition method for a target body according to claim 6, characterized in that

the step of "performing model training on the behavior recognition model according to the obtained sequential video features and using a machine learning algorithm" comprises performing model training on the behavior recognition model using the machine learning algorithm, according to the sequential video features and the objective function E shown in the following formula:

wherein z_j is the true class label corresponding to the j-th action behavior class, z_j takes values from 0 to n-1, P_j is the class probability corresponding to the j-th action behavior class, and f_{j-1}(x) is the node value corresponding to the j-th action behavior class.
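The formula for the objective function E is not reproduced in this text. As a hedged illustration only, the sketch below assumes the common softmax cross-entropy form, in which the node values f(x) are turned into class probabilities P_j and E is the negative log-probability of the true class. The function names `softmax` and `cross_entropy` are hypothetical, and the true label is represented by a class index rather than by z_j, so the claimed formula may differ.

```python
import math
from typing import List, Sequence


def softmax(node_values: Sequence[float]) -> List[float]:
    """Map output node values f(x) to class probabilities P_j."""
    shift = max(node_values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in node_values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(node_values: Sequence[float], true_index: int) -> float:
    """Assumed objective: negative log-probability of the true action class."""
    probs = softmax(node_values)
    return -math.log(probs[true_index])


# Toy usage: three action classes, true class at index 2.
loss = cross_entropy([1.0, 2.0, 3.0], true_index=2)
```

Under this assumed form, the loss shrinks toward zero as the node value of the true class grows relative to the others, which is the behavior a training objective for class probabilities needs.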

9. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to implement the action behavior recognition method for a target body according to any one of claims 1-8.

10. A control device, comprising:

a processor adapted to execute programs;

a storage device adapted to store a plurality of programs;

characterized in that the programs are adapted to be loaded and executed by the processor to implement the action behavior recognition method for a target body according to any one of claims 1-8.