CN105740773A - Deep learning and multi-scale information based behavior identification method - Google Patents

Deep learning and multi-scale information based behavior identification method

Info

Publication number
CN105740773A
CN105740773A CN201610047682.0A
Authority
CN
China
Prior art keywords
video
seg
coarseness
frequency band
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610047682.0A
Other languages
Chinese (zh)
Other versions
CN105740773B (en)
Inventor
刘智
冯欣
张杰
张杰慧
张凌
黄智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN201610047682.0A priority Critical patent/CN105740773B/en
Publication of CN105740773A publication Critical patent/CN105740773A/en
Application granted granted Critical
Publication of CN105740773B publication Critical patent/CN105740773B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning and multi-scale information based behavior identification method. The method constructs multiple deep networks arranged in a parallel structure to study human behavior recognition in depth video: the depth video is first split into multiple video segments, the parallel branch neural networks then learn from the segments, the high-level representations learned by the branch networks are fused by concatenation, and finally the fused high-level representation is passed to a fully connected layer and a classification layer for classification and recognition. The deep learning method can effectively perform behavior recognition; in particular, when the behaviors differ greatly from one another, the recognition rate is significantly improved, and real-time performance is good.

Description

Behavior recognition method based on deep learning and multi-scale information
Technical field
The present invention relates to the field of human behavior recognition, and in particular to a behavior recognition method based on deep learning and multi-scale information.
Background technology
With the maturation of hardware such as computers and cameras, and the growing requirements of social management, research on human behavior recognition has drawn increasing attention from computer vision researchers and is widely applied in automatic surveillance, event detection, human-machine interfaces, video retrieval, and other fields. Traditional human behavior recognition methods first extract features from each video describing human behavior, such as Histograms of Oriented Gradients (HOG) and Motion History Images (MHI), and then classify the extracted features with classifiers such as support vector machines or random forests. Such computational approaches to human behavior recognition have achieved many excellent results, but some hard problems remain: the extracted features are task-specific and do not generalize easily to other data, and the computational cost is too high to achieve real-time performance.
Deep learning can automatically extract multi-layer feature representations hidden in data, and deep learning research based on convolutional neural networks (CNNs) has achieved great success in image classification, recognition, localization, segmentation, and so on. However, the convolution used in image processing is a two-dimensional operation and cannot be applied directly to the three-dimensional videos that describe human behavior.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a behavior recognition method based on deep learning and multi-scale information. The deep learning approach can effectively perform behavior recognition; in particular, when the behaviors differ greatly from one another, the recognition rate is significantly improved. The invention also generalizes well: it can be trained on a large data set and then applied to behavior recognition domains that lack training data, greatly reducing the time overhead of behavior recognition with good real-time performance.
The present invention takes depth video data as the object of study. By building a CNN-based deep neural network structure and fusing multi-scale information such as global human behavior information and local hand motion, it uses traditional two-dimensional CNNs to study three-dimensional human behavior recognition.
The present invention builds multiple deep networks arranged in a parallel structure to study human behavior recognition in depth video. The depth video is first split into multiple video segments, which are then learned by the parallel branch neural networks. The high-level representations learned by the branch networks are fused: the data vectors of all branch networks are concatenated into a one-dimensional vector for input to the subsequent fully connected layer. Finally, the fused high-level representation is passed to the fully connected layer and the classification layer for classification and recognition. Meanwhile, because most behaviors in the MSRDailyActivity3D data set differ only subtly in hand motion, such as reading, writing, using a laptop, and playing a game, the present invention proposes fusing multi-scale information: the coarse-grained global behavior information together with the fine-grained hand motion.
The object of the present invention is achieved as follows: a behavior recognition method based on deep learning and multi-scale information, comprising the steps of:
(1) Establishing a training data set; the coarse-grained global behavior videos in the training data set are selected from the MSRDailyActivity3D data set.
(2) Building a deep neural network model with several parallel deep convolutional neural networks;
(3) Segmenting the coarse-grained global behavior videos in the training data set with a set stride L_Stride, where each segment has length L_Seg, yielding N_Seg coarse-grained video segment matrices; the number of segments is N_Seg = 1 + (N_F - L_Seg) / L_Stride, where N_F is the frame count of the coarse-grained global behavior video;
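The segment count formula above can be sketched in a few lines of code. This is an illustrative sketch only: the function and variable names are not from the patent, and it assumes (N_F - L_Seg) divides evenly by the stride.

```python
# Sketch of the segmentation step: split an N_F-frame video into N_Seg
# overlapping segments of length L_Seg taken at stride L_Stride.

def segment_video(frames, seg_len, stride):
    """Split a sequence of frames into fixed-length segments.

    N_Seg = 1 + (N_F - L_Seg) / L_Stride, assuming exact division.
    """
    n_frames = len(frames)
    n_seg = 1 + (n_frames - seg_len) // stride
    return [frames[i * stride : i * stride + seg_len] for i in range(n_seg)]

# With the patent's experimental values N_F = 192, L_Seg = 16, L_Stride = 16:
segments = segment_video(list(range(192)), seg_len=16, stride=16)
assert len(segments) == 12              # N_Seg = 1 + (192 - 16) / 16 = 12
assert all(len(s) == 16 for s in segments)
```

With a stride smaller than the segment length, adjacent segments overlap, which the stride experiment later in the description exploits.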
(4) Obtaining a fine-grained local behavior video from the coarse-grained global behavior video of step (3), and segmenting the fine-grained local behavior video with the same method as step (3) to obtain N_Seg fine-grained video segment matrices. Each frame of a fine-grained video segment matrix has the same size as each frame of a coarse-grained video segment matrix. The fine-grained local behavior video is cropped from each frame of the coarse-grained global behavior video. The fine-grained local behavior may be hand motion, or a detail motion of another body part. The fine-grained video is obtained as follows: centered on the left-hand joint in each frame of the coarse-grained global behavior video, a frame of size W/4 × H/4 is cropped, forming a new video of size N_F × W/4 × H/4; this video is the fine-grained hand motion video, where W, H, and N_F are the width and height of the original depth video frames and the number of frames in the video, respectively. This size matches the size of the coarse-grained video after down-sampling.
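The hand-centered crop described in this step can be sketched as follows. Frames are plain nested lists for illustration; the function name, joint coordinates, and clamping behavior at the frame border are assumptions, not details from the patent.

```python
# Minimal sketch of the fine-grained crop: take a W/4 x H/4 window
# centred on the left-hand joint in every frame.

def crop_hand_video(video, joints, crop_w, crop_h):
    """video: list of frames (each a list of rows); joints: list of (x, y)."""
    out = []
    for frame, (cx, cy) in zip(video, joints):
        h, w = len(frame), len(frame[0])
        # Clamp the window so it stays fully inside the frame.
        x0 = min(max(cx - crop_w // 2, 0), w - crop_w)
        y0 = min(max(cy - crop_h // 2, 0), h - crop_h)
        out.append([row[x0:x0 + crop_w] for row in frame[y0:y0 + crop_h]])
    return out

# With W = H = 128 the crop is 32 x 32, matching a 1/4-down-sampled frame.
video = [[[0] * 128 for _ in range(128)] for _ in range(4)]   # 4 test frames
joints = [(64, 64)] * 4                                       # hand at centre
hand = crop_hand_video(video, joints, 128 // 4, 128 // 4)
assert len(hand) == 4 and len(hand[0]) == 32 and len(hand[0][0]) == 32
```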
(5) Feeding the N_Seg coarse-grained video segment matrices from step (3) and the N_Seg fine-grained video segment matrices from step (4) in parallel into the deep neural network model built in step (2), which has 2N_Seg parallel deep convolutional neural networks, for training;
(6) Applying steps (3) and (4) to a coarse-grained global behavior video to be recognized, obtaining N_Seg coarse-grained video segment matrices and N_Seg fine-grained video segment matrices, and feeding them in parallel into the trained deep neural network model from step (5) for behavior recognition. The coarse-grained global behavior video to be recognized is a preprocessed video.
The deep neural network of step (2) uses convolutional neural networks as building blocks, with one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer. Each parallel deep convolutional neural network comprises, connected in sequence: a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer, and a classification layer.
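The branch topology (three conv/pool stages followed by two fully connected layers and a classifier) can be checked with a shape walk-through. The patent's actual kernel and pooling sizes live in its Table 1, which is not reproduced here, so the sizes below are assumptions chosen only to show how a 32 × 32 input shrinks layer by layer.

```python
# Shape walk-through for one parallel branch, under assumed layer sizes.

def conv_out(size, kernel, stride=1, pad=0):
    # Standard "valid"-style convolution output size.
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window):
    # Non-overlapping pooling.
    return size // window

size = 32                        # one down-sampled 32 x 32 frame
for kernel in (5, 3, 3):         # assumed kernel sizes for conv layers 1-3
    size = pool_out(conv_out(size, kernel), 2)   # conv, then 2 x 2 pooling
assert size == 2                 # spatial size entering the first FC layer
```

The same arithmetic is what pins down how many units the first fully connected layer must accept.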
Each frame of the coarse-grained global behavior video in step (3) is down-sampled before segmentation, which serves to: 1. reduce the amount of computation; 2. make the frame size of the coarse-grained video segment matrices identical to that of the fine-grained video segment matrices, simplifying network input.
The coarse-grained global behavior video is a depth video.
The coarse-grained global behavior videos in the training data set are preprocessed videos, as is the coarse-grained global behavior video to be recognized. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, taken as the median of all video lengths. Second, the background is removed, keeping only the person-centered portion of the video, and the video is resized to a fixed size. Third, the min-max method is used to normalize the x, y, and z coordinate values of all videos to the range [0, 1]. Finally, all samples are horizontally flipped to form new samples, doubling the training samples in the data set.
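Two of the preprocessing operations, min-max normalization to [0, 1] and horizontal flipping for augmentation, can be sketched directly; length interpolation and background removal are omitted for brevity, and all names are illustrative.

```python
# Sketch of min-max normalization and flip augmentation from the
# preprocessing step above.

def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def flip_horizontal(frame):
    """frame: list of rows; mirror each row left-to-right."""
    return [row[::-1] for row in frame]

depths = [500, 750, 1000]                 # raw z values (e.g. millimetres)
assert min_max_normalize(depths) == [0.0, 0.5, 1.0]

frame = [[1, 2, 3],
         [4, 5, 6]]
assert flip_horizontal(frame) == [[3, 2, 1], [6, 5, 4]]

# Augmentation: originals plus their flips doubles the training set.
samples = [frame]
augmented = samples + [flip_horizontal(f) for f in samples]
assert len(augmented) == 2 * len(samples)
```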
A behavior recognition method based on deep learning and multi-scale information, comprising the steps of:
(1) Establishing a training data set; the depth videos in the training data set are selected from the MSRDailyActivity3D data set;
(2) Building a deep neural network model with several parallel deep convolutional neural networks;
(3) Segmenting the behavior videos in the training data set with a set stride L_Stride, where each segment has length L_Seg, yielding N_Seg video segment matrices; the number of segments is N_Seg = 1 + (N_F - L_Seg) / L_Stride, where N_F is the frame count of the depth video;
(4) Feeding the N_Seg video segment matrices from step (3) in parallel into the deep neural network model built in step (2), which has N_Seg parallel deep convolutional neural networks, for training;
(5) Applying step (3) to a behavior video to be recognized, obtaining N_Seg video segment matrices, and feeding them in parallel into the trained deep neural network model for behavior recognition. The behavior video to be recognized is a preprocessed video.
The deep neural network of step (2) uses convolutional neural networks as building blocks, with one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
The behavior video is a depth video.
The behavior videos in the training data set are preprocessed, as is the behavior video to be recognized. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, taken as the median of all video lengths. Second, the background is removed, keeping only the person-centered portion of the video, and the video is resized to a fixed size. Third, the min-max method is used to normalize the x, y, and z coordinate values of all videos to the range [0, 1]. Finally, all samples are horizontally flipped to form new samples, doubling the training samples in the data set.
The beneficial effects of the invention are as follows: the invention obtains coarse-grained and fine-grained video matrices, trains the designed parallel deep convolutional neural networks, and uses the trained deep neural network for behavior classification and recognition. This gives the invention good generalization: it can be trained on a large data set and then applied to behavior recognition domains that lack training data.
The invention designs a parallel deep convolutional neural network; by feeding the behavior video segments in parallel, the time overhead of behavior recognition is greatly reduced, and real-time performance is good.
The invention takes depth video as the object of study; depth video describes object geometry and is insensitive to lighting and color.
Experiments and results show that the CNN-based deep learning method proposed by the present invention can effectively recognize human behaviors represented by depth video. On the MSRDailyActivity3D data set, the average recognition rate for the five clearly distinct behaviors (lying on the sofa, walking, playing guitar, standing up, and sitting down) is 98%, and the recognition rate over all behaviors in the whole data set is 60.625%.
The invention is further described below in conjunction with the drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is a schematic block diagram of the behavior recognition method based on deep learning and multi-scale information of the present invention;
Fig. 2 shows behavior videos from MSRDailyActivity3D before preprocessing (top: drinking; bottom: writing);
Fig. 3 shows behavior videos from MSRDailyActivity3D after preprocessing (top: drinking; bottom: writing).
Detailed description of the invention
Embodiment one
Referring to Fig. 1, a behavior recognition method based on deep learning and multi-scale information comprises the steps of:
(1) Establishing a training data set. The coarse-grained global behavior videos in the training data set are selected from the MSRDailyActivity3D data set and are preprocessed videos, as is the coarse-grained global behavior video to be recognized. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, taken as the median of all video lengths; second, the background is removed, keeping only the person-centered portion of the video, and the video is resized to a fixed size; third, the min-max method is used to normalize the x, y, and z coordinate values of all videos to the range [0, 1]; finally, all samples are horizontally flipped to form new samples, doubling the training samples in the data set.
(2) Building a deep neural network model with several parallel deep convolutional neural networks. The deep neural network of step (2) uses convolutional neural networks as building blocks, with one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer. The classification layer of the present invention uses a softmax classifier. Each parallel deep convolutional neural network of this embodiment comprises, connected in sequence: a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully connected layer, a second fully connected layer, and a classification layer.
(3) Segmenting the coarse-grained global behavior videos in the training data set with a set stride L_Stride, where each segment has length L_Seg, yielding N_Seg coarse-grained video segment matrices; the number of segments is N_Seg = 1 + (N_F - L_Seg) / L_Stride, where N_F is the frame count of the coarse-grained global behavior video. Each frame of the coarse-grained global behavior video is down-sampled before segmentation, which serves to (a) reduce the amount of computation and (b) make the frame size of the coarse-grained video segment matrices identical to that of the fine-grained video segment matrices, simplifying network input. The object of study, the coarse-grained global behavior video, is a depth video.
(4) Obtaining a fine-grained local behavior video from the coarse-grained global behavior video of step (3), and segmenting it with the same method as step (3) to obtain N_Seg fine-grained video segment matrices. Each frame of a fine-grained video segment matrix has the same size as each frame of a coarse-grained video segment matrix. The fine-grained local behavior video is cropped from each frame of the coarse-grained global behavior video. The fine-grained local behavior may be hand motion, or a detail motion of another body part; it is determined by the concrete application. The detail motions in this data set are concentrated in the hands, but if the detail motion occurred at another body part, that part's detail motion could be chosen instead. In this embodiment, frames of the set size are cropped centered on the hand joint of each frame of the coarse-grained global behavior video, forming a fine-grained local behavior video of N_F frames.
(5) Feeding the N_Seg coarse-grained video segment matrices from step (3) and the N_Seg fine-grained video segment matrices from step (4) in parallel into the deep neural network model built in step (2), which has 2N_Seg parallel deep convolutional neural networks, for training;
(6) Applying steps (3) and (4) to a coarse-grained global behavior video to be recognized, obtaining N_Seg coarse-grained video segment matrices and N_Seg fine-grained video segment matrices, and feeding them in parallel into the trained deep neural network model for behavior recognition. In this embodiment, the first N_Seg networks process the coarse-grained video and the last N_Seg networks process the fine-grained video.
Embodiment two
This embodiment discloses a behavior recognition method based on deep learning and multi-scale information that uses only the coarse-grained global behavior information for recognition. It comprises the steps of:
(1) Establishing a training data set. The depth videos in the training data set are selected from the MSRDailyActivity3D data set. The behavior videos in the training data set are preprocessed, as is the behavior video to be recognized. The preprocessing is as follows: first, interpolation is used to normalize all videos in the data set to a unified length, taken as the median of all video lengths; second, the background is removed, keeping only the person-centered portion of the video, and the video is resized to a fixed size; third, the min-max method is used to normalize the x, y, and z coordinate values of all videos to the range [0, 1]; finally, all samples are horizontally flipped to form new samples, doubling the training samples in the data set.
(2) Referring to Fig. 1, building a deep neural network model with several parallel deep convolutional neural networks. The deep neural network uses convolutional neural networks as building blocks, with one classification layer, at least one convolutional layer, at least one pooling layer, and at least one fully connected layer.
The classification layer of the present invention uses a softmax classifier.
(3) Segmenting the depth videos in the training data set with a set stride L_Stride, where each segment has length L_Seg, yielding N_Seg video segment matrices; the number of segments is N_Seg = 1 + (N_F - L_Seg) / L_Stride, where N_F is the frame count of the depth video;
(4) Feeding the N_Seg video segment matrices from step (3) in parallel into the deep neural network model built in step (2), which has N_Seg parallel deep convolutional neural networks, for training;
(5) Applying step (3) to a depth video to be recognized, obtaining N_Seg video segment matrices, and feeding them in parallel into the trained deep neural network model for behavior recognition.
The experimental procedure of the present invention is as follows: assume the normalized video representing one behavior has size N_F × W × H (192 × 128 × 128 in the present invention), where W and H are the width and height of a video frame.
(1) Segment the behavior video of N_F frames with stride L_Stride, where each segment has length L_Seg; the number of segments is then N_Seg = 1 + (N_F - L_Seg) / L_Stride. Then down-sample each video frame by 1/4, forming after segmentation a video segment matrix of size N_Seg × L_Seg × W/4 × H/4;
(2) Centered on the left-hand joint of each frame of the depth video, crop frames of size W/4 × H/4 to form a new video of size N_F × W/4 × H/4, and apply the same method as step (1) to the new video to obtain a video segment matrix of size N_Seg × L_Seg × W/4 × H/4;
(3) Fuse the video segment matrices of steps (1) and (2) to obtain a video segment matrix of size 2N_Seg × L_Seg × W/4 × H/4. This matrix is the input of the deep network; that is, the network has 2N_Seg parallel deep convolutional neural networks, and the input of each deep neural network is a video of size L_Seg × W/4 × H/4.
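The shape arithmetic of steps (1)-(3) can be verified directly with the patent's stated values; the variable names are illustrative.

```python
# Shape arithmetic for the fused input, using the patent's experimental
# values: N_F = 192, W = H = 128, L_Seg = L_Stride = 16.

N_F, W, H = 192, 128, 128
L_SEG, L_STRIDE = 16, 16

n_seg = 1 + (N_F - L_SEG) // L_STRIDE          # 12 coarse-grained segments
coarse = (n_seg, L_SEG, W // 4, H // 4)        # after 1/4 down-sampling
fine = (n_seg, L_SEG, W // 4, H // 4)          # hand crop is already W/4 x H/4
fused = (2 * n_seg,) + coarse[1:]              # concatenate along branch axis

assert coarse == (12, 16, 32, 32)
assert fine == coarse
assert fused == (24, 16, 32, 32)               # 24 parallel branches, each fed
                                               # one 16 x 32 x 32 segment
```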
(4) Train the parallel deep convolutional neural networks on the training data set, then test human behavior recognition on the test data set; the two sets are completely disjoint. The data set was recorded from 10 subjects: the behavior videos performed by subjects {1, 3, 5, 7, 9} are used for training, and those performed by subjects {2, 4, 6, 8, 10} are used for testing.
Assuming L_Seg = 16 and L_Stride = 16, the deep neural network architecture requires 24 parallel networks, and the input of each network is a video segment sequence of 16 × 32 × 32; that is, each segment contains 16 frames of video, and each frame is 32 × 32.
Table 1: Deep networks used in the present invention and their parameters
Experiment and discussion
1. Data set and preprocessing
The present invention uses the MSRDailyActivity3D data set collected by Microsoft with a Kinect device. The data set covers 16 behaviors common in daily life: drinking, eating, reading, making a phone call, writing, using a laptop, using a vacuum cleaner, cheering, standing still, tearing paper, playing a game, lying on the sofa, walking, playing guitar, standing up, and sitting down. Each behavior is performed by each subject in two different ways: sitting on the sofa or standing. The whole data set contains 320 behavior videos. Fig. 2 shows some behavior samples from the data set. The data set records the human behavior together with the surrounding environment, the extracted depth information contains a great deal of noise, and most behaviors in the data set differ only in subtle local details, as shown in Figs. 2 and 3, which makes it very challenging.
For preprocessing, each video was processed simply. First, interpolation was used to normalize all videos in the data set to a unified length, taken as the median of all video lengths. Second, the background was removed, keeping only the person-centered portion, and the videos were resized to a fixed size, as shown in Fig. 3. Third, the min-max method was used to normalize the x, y, and z coordinate values of all videos to [0, 1]. Finally, all samples were horizontally flipped to form new samples, doubling the training samples in the data set. The experiments of the present invention were written on the Torch platform [20], with a learning rate of 1 × 10^-4 and the platform's built-in softmax loss function.
2. HAR based on multi-scale information fusion and deep learning
The present invention uses the 2CNN2F network of Table 1, taking multi-scale information, namely the coarse-grained global behavior video and the fine-grained hand motion sequence, as the input of the deep network. In this experiment the stride L_Stride and the segment length L_Seg are both set to 16; the 12 × 16 × 32 × 32 global behavior sequence extracted from the whole video and the 12 × 16 × 32 × 32 local hand motion sequence are fused into a 24 × 16 × 32 × 32 input video matrix. Table 2 compares the recognition performance of the proposed method with other methods on the MSRDailyActivity3D data set, where 2CNN2F uses only the coarse-grained global behavior information and 2CNN2F+Joint denotes the multi-scale information fusion method of the present invention. As the table shows, the recognition accuracy of the inventive method is 60.625%; using only the coarse-grained global behavior information, the recognition rate drops slightly to 56.875%, which is comparable to traditional hand-crafted feature extraction methods. Notably, if only behaviors 11-16 (playing a game, lying on the sofa, walking, playing guitar, standing up, and sitting down) are recognized, the recognition rate reaches 98%. This is likely because behaviors 11-16 differ greatly from one another, while the differences among many of the other behaviors in the data set are very subtle; for example, reading, writing, and using a laptop differ only slightly in hand motion. The experimental results show that the deep learning method can effectively perform behavior recognition, and that the recognition rate improves significantly when the behaviors differ greatly from one another.
Table 2: Recognition performance of the inventive method compared with other methods on the MSRDailyActivity3D data set
Algorithm Recognition rate
LOP features[8] 42.5%
Joint Position features[8] 68%
Dynamic Temporal Warping[21] 54%
2CNN2F 56.875%
2CNN2F+Joint 60.625%
3. Effect of network depth on recognition
To probe the effect of network depth on recognition, the present invention also builds neural networks containing 3 and 4 CNN layers, namely 3CNN2F_8 and 4CNN2F (see Table 3); the network parameters are as shown in Table 1. Because the network depth increases, to keep the network from overfitting this experiment uses video sequences of 24 × 8 × 128 × 128 as the neural network input: the normalized 192 × 128 × 128 video is split with a stride of 8 into 24 video segments of 8 × 128 × 128, which are fed simultaneously into the neural network with 24 parallel branches. As Table 3 shows, the recognition rate with the 3CNN2F_8 network is 52.5%, and that of 4CNN2F is 58.75%. The experimental results show that increasing network depth can effectively improve the behavior recognition rate.
Table 3: Parameter configurations and recognition rates of the different networks
4. Effect of the split stride on recognition
To examine the effect of the split stride on recognition, the present invention builds two networks with different inputs based on the 3CNN2F architecture: 3CNN2F_8 and 3CNN2F_4. The input of 3CNN2F_8 is a video sequence of 24 × 8 × 128 × 128, while the input of 3CNN2F_4 has size 47 × 8 × 128 × 128; that is, the normalized 192 × 128 × 128 video is split with a stride of 4 into 47 video segments of 8 × 128 × 128, with a 4-frame overlap between adjacent segments. The experimental results are shown in Table 3. With a stride of 8 the recognition accuracy is 52.5%; with a stride of 4 it is 56.875%. The recognition rate improves markedly, mainly because reducing the stride causes two changes. First, the smaller the stride, the more video segments are produced, so the deep network needs more parallel branches and becomes wider, with more parameters, and the generalization ability of the network improves. Second, the smaller stride and the larger number of split segments also increase the training data, so the network trains better.
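The segment counts behind the two configurations follow from the same formula used throughout; the helper name is illustrative.

```python
# Segment counts for the stride experiment: a 192-frame normalised video
# split into 8-frame segments at strides 8 and 4.

def n_segments(n_frames, seg_len, stride):
    return 1 + (n_frames - seg_len) // stride

assert n_segments(192, 8, 8) == 24    # 3CNN2F_8 input: 24 x 8 x 128 x 128
assert n_segments(192, 8, 4) == 47    # 3CNN2F_4 input: 47 x 8 x 128 x 128
# At stride 4, adjacent segments overlap by seg_len - stride = 4 frames.
```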
In view of deep video has the insensitive feature describing object geometry and light, color, the present invention is with deep video for object of study, adopt traditional two-dimentional CNN (convolutional neural networks) to build deep neural network model, the behavior in MSRDailyActivity3D data set is carried out Classification and Identification.Experiment and result show, the human body behavior represented with deep video can effectively be identified by the degree of depth learning method based on CNN that this article proposes, in MSRDailyActivity3D data set behavior difference comparatively significantly lie down sofa, five behaviors of walking, play guitar, stand and sit down average recognition rate be 98%, the discrimination to behaviors all on whole data set is 60.625%.The discrimination how improving degree of depth study has also been carried out certain explorative experiment by the present invention simultaneously.Research finds to split the reduction of video-frequency band step-length, merges coarseness and fine-grained video information, suitably increases network depth and all can be effectively improved the discrimination of degree of depth network.
The present invention is not limited to the above embodiment; technical solutions with minor modifications that do not depart from the spirit of the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An activity recognition method based on deep learning and multi-scale information, characterized in that it comprises the following steps:
(1) establishing a training data set;
(2) building a deep neural network model having several parallel deep convolutional neural networks;
(3) choosing a coarse-grained global behavior video from the training data set and segmenting it with a set step length LStride, wherein the length of each segment is set to LSeg; NSeg coarse-grained video segment matrices are formed after segmentation, the number of segments being NSeg=1+(NF-LSeg)/LStride, where NF is the frame number of the coarse-grained global behavior video;
(4) obtaining a fine-grained local behavior video from the coarse-grained global behavior video of step (3), and segmenting the fine-grained local behavior video by the same method as step (3) to obtain NSeg fine-grained video segment matrices;
(5) feeding the NSeg coarse-grained video segment matrices obtained in step (3) and the NSeg fine-grained video segment matrices obtained in step (4) in parallel into the deep neural network model built in step (2), which has 2NSeg parallel deep convolutional neural networks, for training;
(6) choosing a coarse-grained global behavior video to be identified, performing steps (3) and (4) on it to obtain NSeg coarse-grained video segment matrices and NSeg fine-grained video segment matrices respectively, and feeding the obtained NSeg coarse-grained video segment matrices and NSeg fine-grained video segment matrices in parallel into the trained deep neural network model obtained in step (5) to carry out activity recognition.
2. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the deep neural network in step (2) takes convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.
3. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: each frame of the coarse-grained global behavior video in step (3) is down-sampled before segmentation, so that the size of each frame of the coarse-grained video segment matrices is identical to the size of each frame of the fine-grained video segment matrices.
4. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the coarse-grained global behavior video is a depth video.
5. The activity recognition method based on deep learning and multi-scale information according to claim 1 or 4, characterized in that: the coarse-grained global behavior video in the training data set is a video that has undergone preprocessing, and the coarse-grained global behavior video to be identified is a video that has undergone preprocessing.
6. The activity recognition method based on deep learning and multi-scale information according to claim 1, characterized in that: the fine-grained local behavior video is obtained by intercepting, in each frame of the coarse-grained global behavior video, the local region where the behavior sequence is concentrated.
7. An activity recognition method based on deep learning and multi-scale information, characterized in that it comprises the following steps:
(1) establishing a training data set;
(2) building a deep neural network model having several parallel deep convolutional neural networks;
(3) choosing a behavior video from the training data set and segmenting it with a set step length LStride, wherein the length of each segment is set to LSeg; NSeg video segment matrices are formed after segmentation, the number of segments being NSeg=1+(NF-LSeg)/LStride, where NF is the frame number of the depth video;
(4) feeding the NSeg video segment matrices obtained in step (3) in parallel into the deep neural network model built in step (2), which has NSeg parallel deep convolutional neural networks, for training;
(5) choosing a behavior video to be identified, performing step (3) on it to obtain NSeg video segment matrices, and feeding the obtained NSeg video segment matrices in parallel into the trained deep neural network model to carry out activity recognition.
8. The activity recognition method based on deep learning and multi-scale information according to claim 7, characterized in that: the deep neural network in step (2) takes convolutional neural networks as building blocks and has one classification layer, at least one convolutional layer, at least one pooling layer and at least one fully connected layer.
9. The activity recognition method based on deep learning and multi-scale information according to claim 7, characterized in that: the behavior video is a depth video.
10. The activity recognition method based on deep learning and multi-scale information according to claim 7 or 9, characterized in that: the behavior video in the training data set is a video that has undergone preprocessing, and the behavior video to be identified is a video that has undergone preprocessing.
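The segmentation in step (3) of the claims can be sketched as follows. This is a minimal numpy illustration under the formula NSeg=1+(NF-LSeg)/LStride, assuming a video array of shape (NF, H, W); it is not the patented implementation:

```python
import numpy as np

def split_video(video, seg_len, stride):
    """Split a (n_frames, H, W) video into overlapping segment matrices of
    shape (seg_len, H, W); N_seg = 1 + (n_frames - seg_len) / stride."""
    n_frames = video.shape[0]
    n_seg = 1 + (n_frames - seg_len) // stride
    return [video[i * stride:i * stride + seg_len] for i in range(n_seg)]

# Example: a normalized 192-frame 128x128 video, 8-frame segments, step length 4.
video = np.zeros((192, 128, 128), dtype=np.float32)
segments = split_video(video, seg_len=8, stride=4)
print(len(segments))      # 47 segment matrices
print(segments[0].shape)  # (8, 128, 128)
```

Each of the resulting segment matrices would be fed to its own parallel deep convolutional branch of the model built in step (2).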
CN201610047682.0A 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information Expired - Fee Related CN105740773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610047682.0A CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610047682.0A CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Publications (2)

Publication Number Publication Date
CN105740773A true CN105740773A (en) 2016-07-06
CN105740773B CN105740773B (en) 2019-02-01

Family

ID=56247501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610047682.0A Expired - Fee Related CN105740773B (en) 2016-01-25 2016-01-25 Activity recognition method based on deep learning and multi-scale information

Country Status (1)

Country Link
CN (1) CN105740773B (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203503A (en) * 2016-07-08 2016-12-07 天津大学 A kind of action identification method based on skeleton sequence
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN106504266A (en) * 2016-09-29 2017-03-15 北京市商汤科技开发有限公司 The Forecasting Methodology of walking behavior and device, data processing equipment and electronic equipment
CN106778576A (en) * 2016-12-06 2017-05-31 中山大学 A kind of action identification method based on SEHM feature graphic sequences
CN106951872A (en) * 2017-03-24 2017-07-14 江苏大学 A kind of recognition methods again of the pedestrian based on unsupervised depth model and hierarchy attributes
CN107066979A (en) * 2017-04-18 2017-08-18 重庆邮电大学 A kind of human motion recognition method based on depth information and various dimensions convolutional neural networks
WO2018019126A1 (en) * 2016-07-29 2018-02-01 北京市商汤科技开发有限公司 Video category identification method and device, data processing device and electronic apparatus
CN107837087A (en) * 2017-12-08 2018-03-27 兰州理工大学 A kind of human motion state recognition methods based on smart mobile phone
CN107886117A (en) * 2017-10-30 2018-04-06 国家新闻出版广电总局广播科学研究院 The algorithm of target detection merged based on multi-feature extraction and multitask
CN108038107A (en) * 2017-12-22 2018-05-15 东软集团股份有限公司 Sentence sensibility classification method, device and its equipment based on convolutional neural networks
CN108182441A (en) * 2017-12-29 2018-06-19 华中科技大学 Parallel multichannel convolutive neural network, construction method and image characteristic extracting method
CN108182416A (en) * 2017-12-30 2018-06-19 广州海昇计算机科技有限公司 A kind of Human bodys' response method, system and device under monitoring unmanned scene
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108524209A (en) * 2018-03-30 2018-09-14 江西科技师范大学 Blind-guiding method, system, readable storage medium storing program for executing and mobile terminal
CN108664931A (en) * 2018-05-11 2018-10-16 中国科学技术大学 A kind of multistage video actions detection method
CN108805083A (en) * 2018-06-13 2018-11-13 中国科学技术大学 The video behavior detection method of single phase
CN109214375A (en) * 2018-11-07 2019-01-15 浙江大学 A kind of embryo's pregnancy outcome prediction meanss based on block sampling video features
CN109558805A (en) * 2018-11-06 2019-04-02 南京邮电大学 Human bodys' response method based on multilayer depth characteristic
CN109657546A (en) * 2018-11-12 2019-04-19 平安科技(深圳)有限公司 Video behavior recognition methods neural network based and terminal device
CN110119760A (en) * 2019-04-11 2019-08-13 华南理工大学 A kind of sequence classification method based on the multiple dimensioned Recognition with Recurrent Neural Network of stratification
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN110222587A (en) * 2019-05-13 2019-09-10 杭州电子科技大学 A kind of commodity attribute detection recognition methods again based on characteristic pattern
CN110222598A (en) * 2019-05-21 2019-09-10 平安科技(深圳)有限公司 A kind of video behavior recognition methods, device, storage medium and server
CN110321963A (en) * 2019-07-09 2019-10-11 西安电子科技大学 Based on the hyperspectral image classification method for merging multiple dimensioned multidimensional sky spectrum signature
CN111242110A (en) * 2020-04-28 2020-06-05 成都索贝数码科技股份有限公司 Training method of self-adaptive conditional random field algorithm for automatically breaking news items
WO2020244279A1 (en) * 2019-06-05 2020-12-10 北京京东尚科信息技术有限公司 Method and device for identifying video

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101866429A (en) * 2010-06-01 2010-10-20 中国科学院计算技术研究所 Training method of multi-moving object action identification and multi-moving object action identification method
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN104299012A (en) * 2014-10-28 2015-01-21 中国科学院自动化研究所 Gait recognition method based on deep learning


Non-Patent Citations (2)

Title
WANQING LI et al.: "Action Recognition Based on A Bag of 3D Points", 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops *
LI Ruifeng et al.: "A Survey of Human Action and Behavior Recognition Research", Pattern Recognition and Artificial Intelligence *

Cited By (41)

Publication number Priority date Publication date Assignee Title
CN106203503A (en) * 2016-07-08 2016-12-07 天津大学 A kind of action identification method based on skeleton sequence
CN106203503B (en) * 2016-07-08 2019-04-05 天津大学 A kind of action identification method based on bone sequence
WO2018019126A1 (en) * 2016-07-29 2018-02-01 北京市商汤科技开发有限公司 Video category identification method and device, data processing device and electronic apparatus
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN106228240B (en) * 2016-07-30 2020-09-01 复旦大学 Deep convolution neural network implementation method based on FPGA
CN106504266A (en) * 2016-09-29 2017-03-15 北京市商汤科技开发有限公司 The Forecasting Methodology of walking behavior and device, data processing equipment and electronic equipment
CN106504266B (en) * 2016-09-29 2019-06-14 北京市商汤科技开发有限公司 The prediction technique and device of walking behavior, data processing equipment and electronic equipment
US10817714B2 (en) 2016-09-29 2020-10-27 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for predicting walking behaviors, data processing apparatus, and electronic device
CN106778576A (en) * 2016-12-06 2017-05-31 中山大学 A kind of action identification method based on SEHM feature graphic sequences
CN106778576B (en) * 2016-12-06 2020-05-26 中山大学 Motion recognition method based on SEHM characteristic diagram sequence
CN106951872B (en) * 2017-03-24 2020-11-06 江苏大学 Pedestrian re-identification method based on unsupervised depth model and hierarchical attributes
CN106951872A (en) * 2017-03-24 2017-07-14 江苏大学 A kind of recognition methods again of the pedestrian based on unsupervised depth model and hierarchy attributes
CN107066979A (en) * 2017-04-18 2017-08-18 重庆邮电大学 A kind of human motion recognition method based on depth information and various dimensions convolutional neural networks
CN107886117A (en) * 2017-10-30 2018-04-06 国家新闻出版广电总局广播科学研究院 The algorithm of target detection merged based on multi-feature extraction and multitask
CN107837087A (en) * 2017-12-08 2018-03-27 兰州理工大学 A kind of human motion state recognition methods based on smart mobile phone
CN108038107B (en) * 2017-12-22 2021-06-25 东软集团股份有限公司 Sentence emotion classification method, device and equipment based on convolutional neural network
CN108038107A (en) * 2017-12-22 2018-05-15 东软集团股份有限公司 Sentence sensibility classification method, device and its equipment based on convolutional neural networks
CN108182441A (en) * 2017-12-29 2018-06-19 华中科技大学 Parallel multichannel convolutive neural network, construction method and image characteristic extracting method
CN108182441B (en) * 2017-12-29 2020-09-18 华中科技大学 Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108182416A (en) * 2017-12-30 2018-06-19 广州海昇计算机科技有限公司 A kind of Human bodys' response method, system and device under monitoring unmanned scene
CN108524209A (en) * 2018-03-30 2018-09-14 江西科技师范大学 Blind-guiding method, system, readable storage medium storing program for executing and mobile terminal
CN108664931B (en) * 2018-05-11 2022-03-01 中国科学技术大学 Multi-stage video motion detection method
CN108664931A (en) * 2018-05-11 2018-10-16 中国科学技术大学 A kind of multistage video actions detection method
CN108805083A (en) * 2018-06-13 2018-11-13 中国科学技术大学 The video behavior detection method of single phase
CN109558805A (en) * 2018-11-06 2019-04-02 南京邮电大学 Human bodys' response method based on multilayer depth characteristic
CN109214375B (en) * 2018-11-07 2020-11-24 浙江大学 Embryo pregnancy result prediction device based on segmented sampling video characteristics
CN109214375A (en) * 2018-11-07 2019-01-15 浙江大学 A kind of embryo's pregnancy outcome prediction meanss based on block sampling video features
CN109657546A (en) * 2018-11-12 2019-04-19 平安科技(深圳)有限公司 Video behavior recognition methods neural network based and terminal device
CN110119760B (en) * 2019-04-11 2021-08-10 华南理工大学 Sequence classification method based on hierarchical multi-scale recurrent neural network
CN110119760A (en) * 2019-04-11 2019-08-13 华南理工大学 A kind of sequence classification method based on the multiple dimensioned Recognition with Recurrent Neural Network of stratification
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN110222587A (en) * 2019-05-13 2019-09-10 杭州电子科技大学 A kind of commodity attribute detection recognition methods again based on characteristic pattern
WO2020232886A1 (en) * 2019-05-21 2020-11-26 平安科技(深圳)有限公司 Video behavior identification method and apparatus, storage medium and server
CN110222598A (en) * 2019-05-21 2019-09-10 平安科技(深圳)有限公司 A kind of video behavior recognition methods, device, storage medium and server
WO2020244279A1 (en) * 2019-06-05 2020-12-10 北京京东尚科信息技术有限公司 Method and device for identifying video
US11967134B2 (en) 2019-06-05 2024-04-23 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for identifying video
CN110321963A (en) * 2019-07-09 2019-10-11 西安电子科技大学 Based on the hyperspectral image classification method for merging multiple dimensioned multidimensional sky spectrum signature
CN110321963B (en) * 2019-07-09 2022-03-04 西安电子科技大学 Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN111242110B (en) * 2020-04-28 2020-08-14 成都索贝数码科技股份有限公司 Training method of self-adaptive conditional random field algorithm for automatically breaking news items
CN111242110A (en) * 2020-04-28 2020-06-05 成都索贝数码科技股份有限公司 Training method of self-adaptive conditional random field algorithm for automatically breaking news items

Also Published As

Publication number Publication date
CN105740773B (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN105740773A (en) Deep learning and multi-scale information based behavior identification method
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
Parvathi et al. Detection of maturity stages of coconuts in complex background using Faster R-CNN model
Jia et al. Apple harvesting robot under information technology: A review
CN109829443A (en) Video behavior recognition methods based on image enhancement Yu 3D convolutional neural networks
CN109034210A (en) Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
CN105574510A (en) Gait identification method and device
CN110008842A (en) A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN107977671A (en) A kind of tongue picture sorting technique based on multitask convolutional neural networks
CN107316058A (en) Improve the method for target detection performance by improving target classification and positional accuracy
CN109271888A (en) Personal identification method, device, electronic equipment based on gait
CN107527351A (en) A kind of fusion FCN and Threshold segmentation milking sow image partition method
CN109241871A (en) A kind of public domain stream of people's tracking based on video data
CN108898620A (en) Method for tracking target based on multiple twin neural network and regional nerve network
Burie et al. ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts
CN103164694A (en) Method for recognizing human motion
CN109508675A (en) A kind of pedestrian detection method for complex scene
CN106650804B (en) A kind of face sample cleaning method and system based on deep learning feature
CN110135502A (en) A kind of image fine granularity recognition methods based on intensified learning strategy
CN113470076B (en) Multi-target tracking method for yellow feather chickens in flat raising chicken house
CN114387499A (en) Island coastal wetland waterfowl identification method, distribution query system and medium
CN107808376A (en) A kind of detection method of raising one's hand based on deep learning
Lv et al. A visual identification method for the apple growth forms in the orchard
CN107729363A (en) Based on GoogLeNet network model birds population identifying and analyzing methods
CN109871905A (en) A kind of plant leaf identification method based on attention mechanism depth model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190201

Termination date: 20220125

CF01 Termination of patent right due to non-payment of annual fee