CN111046766A - Behavior recognition method and device and computer storage medium - Google Patents

Behavior recognition method and device and computer storage medium

Info

Publication number
CN111046766A
Authority
CN
China
Prior art keywords
frame
data
accumulated
motion vector
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911215173.4A
Other languages
Chinese (zh)
Inventor
陈璐
陆辉
史海涛
丁静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Fiberhome Digtal Technology Co Ltd
Original Assignee
Wuhan Fiberhome Digtal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Fiberhome Digtal Technology Co Ltd filed Critical Wuhan Fiberhome Digtal Technology Co Ltd
Priority to CN201911215173.4A priority Critical patent/CN111046766A/en
Publication of CN111046766A publication Critical patent/CN111046766A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a behavior recognition method, applied in the technical field of behavior recognition, comprising the following steps: acquiring the corresponding motion vectors, residuals, and RGB frame data from a compressed video code stream file; obtaining accumulated residual data from the residuals; obtaining accumulated motion vectors from the motion vectors; using the RGB frame data, the accumulated motion vectors, and the accumulated residual data as the input of a deep learning model to obtain behavior feature vectors; inputting the behavior feature vectors into a classification model to obtain a classification result; and obtaining a behavior prediction classification result. A behavior recognition apparatus and a computer storage medium are also provided. By applying the embodiments of the invention, the time consumed by video decoding is avoided, the time bottleneck of the decoding link is eliminated, and the analysis efficiency of video files is effectively improved.

Description

Behavior recognition method and device and computer storage medium
Technical Field
The present invention relates to the field of behavior recognition processing technologies, and in particular, to a behavior recognition method, a behavior recognition device, and a computer storage medium.
Background
With the vigorous development of urban video surveillance projects, analysis of the video recording files generated by video surveillance systems is often a means of public security management.
In a conventional video analysis method, the video compression code stream is completely decoded and analyzed in the pixel domain: for example, streams in the common H.264 and H.265 formats are decoded to obtain the key frames and non-key frames of the video frame sequence, the key frames and non-key frames are analyzed to obtain accumulated motion vectors and accumulated residual data, and behavior recognition is then performed by a deep-learning-based human behavior recognition algorithm to obtain a recognition result.
Therefore, in the prior art, the compressed video code stream file needs to be decoded, which makes video analysis time-consuming and the analysis efficiency of video files low.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a behavior recognition method, a behavior recognition device, and a computer storage medium. The RGB frame data corresponding to the key frames, and the residual data and motion vectors of the non-key frames, are obtained directly from the compressed video code stream file, and learning and classification are then carried out with an existing deep learning model. The compressed video code stream file therefore does not need to be fully decoded, which avoids the time consumed by video decoding, eliminates the time bottleneck of the decoding link, and effectively improves the analysis efficiency of video files.
The invention is realized by the following steps:
the invention provides a behavior recognition method, which comprises the following steps:
acquiring key frame data and non-key frame data from a compressed video code stream file, wherein the non-key frame data comprises a motion vector and a residual, and the key frame data is RGB frame data;
obtaining accumulated residual data according to the residual;
obtaining an accumulated motion vector according to the motion vector;
taking the RGB frame data, the accumulated motion vector and the accumulated residual data as the input of a deep learning model, and obtaining a behavior characteristic vector of the deep learning model;
inputting the behavior feature vector output by the deep learning model into an SVM classifier for behavior prediction;
and obtaining a behavior prediction classification result.
Further, the step of obtaining key frame data and non-key frame data in the compressed video code stream file includes:
and decoding the compressed video code stream file by adopting a media file conversion tool to obtain key frame data, and extracting the motion vector and the residual error of the non-key frame.
Further, the obtaining of the accumulated motion vector according to the motion vector specifically comprises:

φ_i^(t,k) = i − η_i^(t,k)

or, alternatively,

φ_i^(t,k) = φ_i^(t,k+1) + τ^(k+1)_{η_i^(t,k+1)}

wherein:

η_i^(t,t) = i,  η_i^(t,k) = η_i^(t,k+1) − τ^(k+1)_{η_i^(t,k+1)}

wherein τ_j^(p) is the motion vector in the p-th frame of the pixel block at position j, p ≤ t; φ_i^(t,k) is the accumulated motion vector of the pixel block at position i of the t-th frame from the k-th frame to the t-th frame; and η_i^(t,k) is the reference position of the pixel block at position i of the t-th frame traced back from the t-th frame to the k-th frame.
Further, the specific expression adopted for obtaining the accumulated residual data is as follows:

R_i^(t) = Δ_i^(t) + R^(t−1)_{η_i^(t,t−1)}

wherein R_i^(t) is the accumulated residual of the i-th pixel block in the t-th frame, Δ_i^(t) is the residual of the i-th pixel block in the t-th frame, η_i^(t,t−1) is the backtracking position of the pixel block in frame t−1, and R^(t−1)_{η_i^(t,t−1)} is the corresponding accumulated residual.
Further, the step of using the RGB frame data, the accumulated motion vector, and the accumulated residual data as input of a deep learning model and obtaining a behavior feature vector of the deep learning model includes:
acquiring an accumulated motion vector corresponding to each non-key frame and residual error data corresponding to each non-key frame;
forming an input sequence by the RGB frame data, the accumulated motion vector corresponding to each non-key frame and the residual error data corresponding to each non-key frame;
and taking the input sequence as the input of a deep learning model, and obtaining a behavior feature vector of the deep learning model.
Further, the step of classifying the feature vectors according to the classification model to obtain a classification result includes:
and classifying the feature vectors according to a Support Vector Machine (SVM) to obtain a classification result.
Further, the training process of the deep learning model comprises the following steps:
obtaining a test data set corresponding to multiple types of behaviors, wherein the test data set comprises: RGB frame data, accumulated motion vectors and accumulated residual data;
constructing an input layer: determining the number of input-layer neurons according to the test data set, the input layer receiving the test data set;
constructing a convolutional layer: determining the size and stride of the convolution kernel, where the kernel size is chosen according to the scale of the input data and the type of the data;
constructing a down-sampling layer: determining the pooling size, stride and pooling type;
constructing a fully connected layer;
the layers are connected in the order: input layer, convolutional layer, down-sampling layer, convolutional layer, fully connected layer;
and when the model precision is not less than the preset value, determining the current neural network as an available model.
In addition, the invention also discloses a behavior recognition device, which comprises a processor and a memory connected with the processor through a communication bus; wherein:
the memory is used for storing a behavior recognition program;
the processor is configured to execute the behavior recognition program to implement any of the behavior recognition steps.
Also, a computer storage medium is disclosed that stores one or more programs that are executable by one or more processors to cause the one or more processors to perform any of the behavior recognition steps.
The behavior recognition method, the behavior recognition device and the computer storage medium of the invention have the following advantages:
(1) obtaining RGB frame data, accumulated residual data of each frame and accumulated motion vectors by directly obtaining key frame data and non-key frame data from a compressed video code stream file; then, the RGB frame data, the accumulated motion vector and the accumulated residual data are used as the input of a deep learning model, and a behavior characteristic vector of the deep learning model is obtained; and inputting the behavior feature vector into a classification model to obtain a classification result, namely a behavior recognition result. According to the method, only RGB frame data corresponding to the key frames, residual error data of non-key frames and motion vectors need to be obtained through the compressed video code stream file, then learning and classification are carried out according to the existing deep learning model, the compressed video code stream file does not need to be decoded, time consumption caused by video decoding is avoided, time bottleneck caused by a decoding link is eliminated, and analysis efficiency of the video file is effectively improved.
(2) The human behaviors in the video are recognized in combination with a convolutional neural network, achieving fast, efficient and accurate behavior recognition.
(3) The dependency of non-key frames on the decoding order is removed by the decoupling model, so all frame data can be processed in parallel on hardware such as GPUs and multi-core processors, shortening the processing time of non-key frame data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a behavior recognition method according to an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario of the behavior recognition method according to the embodiment of the present invention;
fig. 3 is a schematic view of an application scenario of the behavior recognition apparatus according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a behavior identification method, including the following steps:
s101, obtaining key frame data and non-key frame data in a compressed video code stream file, wherein the non-key frame data comprises: the motion vector and the residual error, and the key frame data are RGB frame data.
It should be noted that an I frame is a key frame: the frame is retained in full and contains a complete picture, so it can be decoded from its own data alone. A P frame encodes the difference between the current frame and a preceding key frame (or P frame); at decoding time this difference is superimposed on the previously buffered picture to produce the final picture. A P frame is therefore a difference frame: it carries no complete picture data, only the changes relative to the previous frame's picture.
It will be appreciated that a video sequence S is expressed as S = {I, P, …, P, I, P, …, P, …, I}, where I is a key frame and P is a non-key frame. In the compressed-domain video code stream, the frame data of the code stream is acquired through the open-source tool ffmpeg, so the set of key frame data extracted from the video sequence is expressed as S_I = {I, I, …, I}, and the set of non-key frame data is expressed as S_P = {P, P, …, P}.
In the embodiment of the invention, the compressed video code stream file is not fully decoded; instead, the open-source tool ffmpeg is used directly to extract the relevant motion vectors from the compressed video code stream file, which form a motion vector map I_motion, and to obtain the residuals, whose data form a residual map I_residual.
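The patent names only the open-source tool ffmpeg; as a non-authoritative sketch of this extraction step, the following assumes PyAV (Python bindings for FFmpeg) and the decoder's "+export_mvs" flag, which exposes motion vectors as frame side data. The library choice, flag, and field names are assumptions, not part of the patent:

```python
import av  # PyAV: Python bindings for FFmpeg (assumed installed)

def extract_compressed_domain_data(path):
    """Collect RGB key frames and per-frame motion vectors from a
    compressed stream without pixel-domain analysis of every frame."""
    container = av.open(path)
    stream = container.streams.video[0]
    # Ask the decoder to export motion vectors as side data on each frame.
    stream.codec_context.options = {"flags2": "+export_mvs"}

    key_frames, motion_vectors = [], []
    for frame in container.decode(stream):
        if frame.pict_type == "I":           # key frame: keep the full picture
            key_frames.append(frame.to_ndarray(format="rgb24"))
        else:                                # non-key frame: keep motion vectors
            mvs = frame.side_data.get("MOTION_VECTORS")
            motion_vectors.append(mvs.to_ndarray() if mvs is not None else None)
    # Residual maps are not exposed through this API; extracting them
    # generally requires a modified decoder, which is beyond this sketch.
    return key_frames, motion_vectors
```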
It can be understood that different key frames and non-key frames are obtained from the compressed video code stream file over time; as shown in fig. 2, the key frame I, and the motion vector and residual corresponding to each non-key frame P, are given at the different time instants.
It is understood that in inter coding, the relative displacement between the current coding block and the best matching block in its reference picture is represented by a Motion Vector (MV). Each divided block has corresponding motion information to be transmitted to the decoding end. If the MV of each block were coded and transmitted independently, a considerable number of bits would be consumed, especially for small block sizes. To code the MV of the current macroblock, H.264/AVC first uses the MVs of adjacent coded blocks to predict it, and then codes the difference (the Motion Vector Difference, MVD) between the predicted value (the Motion Vector Prediction, MVP) and the actually estimated MV, effectively reducing the number of bits used to code the MV.
For example, to estimate a motion vector for each 4 × 4 sub-block in an image, the median of all motion vectors in region A1 is first calculated, where each 4 × 4 sub-block in region A1 has one motion vector extracted from the video stream during decoding. In total 16 motion vectors are involved, denoted mv_i, i = 0, 1, …, 15, so the motion vector results for the sub-blocks are mv_0 to mv_15. Then, according to the current macroblock type, for example P16 × 16, which includes an initial motion vector mv16 × 16 and a refinement step re16 × 16, the motion vector of the block is calculated.
It should be noted that a B picture (frame) is a coded picture, also called a bidirectionally predicted frame, which compresses the amount of data to be transmitted by exploiting the temporal redundancy between the coded frames preceding and following it in the source picture sequence.
The encoding process for P frames and B frames is as follows. Motion estimation is performed and the rate-distortion cost of each inter coding mode is calculated; P frames reference only preceding frames, while B frames may also reference following frames. Intra prediction is then performed, the intra mode with the minimum rate-distortion cost is compared with the best inter mode, and the coding mode to be adopted is determined; the difference between the actual value and the predicted value is calculated. This residual is transformed, quantized and then encoded, so a residual can be recovered from the encoded data. As shown in fig. 2, the residual corresponding to the P frame at each time instant is obtained after decoding.
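To make the residual concrete, the following toy numpy sketch (an illustration of the idea only, not the codec's actual pipeline; all names are hypothetical) shows how an inter-coded block's residual arises as the difference between the current block and its motion-compensated prediction:

```python
import numpy as np

def inter_residual(current_block, reference_frame, mv, top_left):
    """Residual of one inter-coded block: actual minus motion-compensated
    prediction. `mv` is the (dy, dx) displacement into the reference frame;
    the displaced block is assumed to stay inside the reference frame."""
    y, x = top_left
    dy, dx = mv
    h, w = current_block.shape
    prediction = reference_frame[y + dy : y + dy + h, x + dx : x + dx + w]
    # Signed arithmetic avoids uint8 wraparound when subtracting pixels.
    return current_block.astype(np.int16) - prediction.astype(np.int16)

# The encoder transforms, quantizes, and entropy-codes this residual,
# which is why residual data can be read back from the compressed stream.
```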
S102, obtaining accumulated residual data according to the residual.

Accordingly, the accumulated residual can be expressed as:

R_i^(t) = Δ_i^(t) + R^(t−1)_{η_i^(t,t−1)}

where R_i^(t) is the accumulated residual of the i-th pixel block in the t-th frame, Δ_i^(t) is the residual of the i-th pixel block in the t-th frame relative to the previous frame, η_i^(t,t−1) is the backtracking position of the pixel block in the previous frame (namely frame t−1), and R^(t−1)_{η_i^(t,t−1)} is the corresponding accumulated residual at that position. Applying this relation frame by frame yields the accumulated residual calculation formula.
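A minimal numpy sketch of this accumulation follows. It assumes the per-frame residual maps and the per-frame single-step backtracking index maps (the η_i^(t,t−1) of the formula above) have already been extracted; the variable names and data layout are illustrative assumptions:

```python
import numpy as np

def accumulate_residuals(residuals, backtrace):
    """residuals[t]: (H, W) residual map of non-key frame t (t = 1..T).
    backtrace[t]:  (H, W, 2) integer map giving, for each position i in
                   frame t, its backtracking position in frame t-1.
    Frame 0 is the key frame, whose accumulated residual is zero.
    Returns a dict of accumulated residual maps R^(t)."""
    H, W = residuals[1].shape
    accumulated = {0: np.zeros((H, W), dtype=np.float32)}
    for t in sorted(residuals):
        ys = backtrace[t][..., 0]
        xs = backtrace[t][..., 1]
        # R^(t)_i = Delta^(t)_i + R^(t-1) at the backtracked position of i
        accumulated[t] = residuals[t] + accumulated[t - 1][ys, xs]
    return accumulated
```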
S103, acquiring an accumulated motion vector according to the motion vector.

It can be understood that the obtained motion vector map is decoupled to obtain an accumulated motion vector, as follows.

For any P frame t, let τ^(t) represent the spatial displacement of the pixel blocks in the t-th frame. The reference position in the previous frame of a pixel block that appears at spatial position i in the t-th frame can then be expressed as:

η_i^(t,t−1) = i − τ_i^(t)

Further, let η_i^(t,p) denote the position in the p-th frame (p < t) of the pixel block at position i of the t-th frame; the position traced backward to the k-th frame (k ≤ t) can then be expressed recursively as:

η_i^(t,t) = i,  η_i^(t,k) = η_i^(t,k+1) − τ^(k+1)_{η_i^(t,k+1)}

where i is the position of the pixel block and η_i^(t,k) denotes the reference position obtained by tracing the pixel block back from the t-th frame to the k-th frame.

The accumulated motion vector map can then be represented as:

φ_i^(t,k) = i − η_i^(t,k)

where φ_i^(t,k), the accumulated motion vector of the pixel block at position i from the k-th frame to the t-th frame, is calculated by subtracting the backtracking position in the k-th frame from the current position i.
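The backtracking recursion and the accumulated motion vector map can be sketched as follows, assuming the motion vector maps τ^(p) are available as integer displacement maps at per-position granularity; the names and conventions are illustrative, not from the patent:

```python
import numpy as np

def accumulate_motion_vectors(motion_vectors, k, t):
    """motion_vectors[p]: (H, W, 2) integer map tau^(p); the entry at
    position i is the displacement of the pixel block at i in frame p.
    Returns (eta, phi): the positions traced back from frame t to frame k,
    and the accumulated motion vector map phi^(t,k) = i - eta^(t,k)."""
    H, W, _ = motion_vectors[t].shape
    grid = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
                    axis=-1)
    eta = grid.copy()                 # eta^(t,t) = i
    for p in range(t, k, -1):         # p = t, t-1, ..., k+1
        disp = motion_vectors[p][eta[..., 0], eta[..., 1]]
        # eta^(t,p-1) = eta^(t,p) - tau^(p) at eta^(t,p), kept in bounds
        eta = np.clip(eta - disp, 0, [H - 1, W - 1])
    phi = grid - eta                  # accumulated motion vector map
    return eta, phi
```

Because each map depends only on the already-extracted per-frame motion vectors, the maps for different frames t can be computed independently, which is what enables the parallel processing of non-key frames noted among the advantages above.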
It will be appreciated that, by processing each non-key frame in this way, an accumulated motion vector and an accumulated residual are obtained for every non-key frame; as shown in fig. 2, each non-key frame corresponds to a motion vector, a residual, an accumulated motion vector and accumulated residual data.
S104, taking the RGB frame data, the accumulated motion vector and the accumulated residual data as the input of a deep learning model, and obtaining a behavior feature vector of the deep learning model.
It should be noted that the deep learning model is trained in advance, and is used for training according to RGB frame data, the accumulated motion vectors, and the accumulated residual data, and obtaining a model corresponding to the behavior feature vectors.
The deep learning model in the embodiment of the present invention is a Convolutional Neural Network (CNN), a class of feedforward neural networks that contain convolution computations and have a deep structure, and one of the representative algorithms of deep learning.
Further, the step of using the RGB frame data, the accumulated motion vector, and the accumulated residual data as input of a deep learning model and obtaining a behavior feature vector of the deep learning model includes: acquiring an accumulated motion vector corresponding to each non-key frame and residual error data corresponding to each non-key frame; forming an input sequence by the RGB frame data, the accumulated motion vector corresponding to each non-key frame and the residual error data corresponding to each non-key frame; and taking the input sequence as the input of a deep learning model, and obtaining a behavior feature vector of the deep learning model.
It should be noted that, based on the above formulas, the accumulated motion vector φ and the accumulated residual data R corresponding to each non-key frame can be obtained. Assuming t frames in total, the resulting input image sequence is {I^(0), φ^(1), R^(1), …, φ^(t), R^(t)}, where I is the RGB frame data; this sequence is the input to the convolutional neural network.
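As a small illustration (names assumed, not from the patent), the interleaved input sequence can be assembled as:

```python
def build_input_sequence(rgb_key_frame, phi, R):
    """Interleave the key frame with each non-key frame's accumulated
    motion vector map and accumulated residual map, giving
    {I^(0), phi^(1), R^(1), ..., phi^(t), R^(t)}."""
    sequence = [rgb_key_frame]
    for t in sorted(phi):             # one (phi, R) pair per non-key frame
        sequence.append(phi[t])
        sequence.append(R[t])
    return sequence
```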
Further, the training process of the deep learning model comprises the following steps:
obtaining a test data set corresponding to multiple types of behaviors, wherein the test data set comprises: RGB frame data, accumulated motion vectors and accumulated residual data;
constructing an input layer: determining the number of input-layer neurons according to the test data set, the input layer receiving the test data set;
constructing a convolutional layer: determining the size and stride of the convolution kernel, where the kernel size is chosen according to the scale of the input data and the type of the data;
constructing a down-sampling layer: determining the pooling size, stride and pooling type;
constructing a fully connected layer;
specifically, the layers are connected in the order: input layer, convolutional layer, down-sampling layer, convolutional layer, fully connected layer; when the model accuracy is not less than a preset value, the current neural network is considered an available model. Each cube convolved by the 3D convolution kernels consists of 9 consecutive frames of the input image, with a patch of size 60 × 40 per frame.
After multiple layers of convolution and down-sampling, every 9 consecutive frames of the input image are converted into a 128-dimensional feature vector that captures the motion information of the input frames. The number of output-layer nodes equals the number of behavior classes, and each node is fully connected to the 128 nodes in C6. Finally, a linear classifier classifies the 128-dimensional feature vectors to achieve behavior recognition.
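A minimal PyTorch sketch of a network with this layer pattern (input, convolution, down-sampling, convolution, fully connected; 9-frame cubes of 60 × 40 patches; 128-dimensional output) is given below. The channel counts and kernel sizes are assumptions, since the description fixes only the layer order and the input and output dimensions:

```python
import torch
import torch.nn as nn

class BehaviorFeatureNet(nn.Module):
    """Input:  (N, C, 9, 60, 40) cubes of 9 consecutive 60x40 frame patches.
    Output: (N, 128) behavior feature vectors (the 'C6' features)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=(3, 5, 5)),  # -> (16, 7, 56, 36)
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),                # -> (16, 7, 28, 18)
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5)),           # -> (32, 5, 24, 14)
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(32 * 5 * 24 * 14, 128)              # 128-d feature

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, start_dim=1))

# Example: a batch of two 9-frame cubes -> two 128-d feature vectors
net = BehaviorFeatureNet()
features = net(torch.randn(2, 3, 9, 60, 40))   # torch.Size([2, 128])
```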
S105, inputting the behavior feature vector output by the deep learning model into an SVM classifier for behavior prediction.
It should be noted that the classification model is a Support Vector Machine (SVM), a generalized linear classifier that performs binary classification of data in a supervised learning manner; its decision boundary is the maximum-margin hyperplane solved from the learning samples. The SVM computes the empirical risk with the hinge loss function and adds a regularization term to the solving system to optimize the structural risk, making it a classifier with sparsity and robustness.
The embodiment of the invention adopts a linear SVM. Given input data and learning targets, the hard-margin SVM is an algorithm for solving the maximum-margin hyperplane of a linearly separable problem, under the constraint that the distance between each sample point and the decision boundary is greater than or equal to 1. The hard-margin SVM can be converted into an equivalent quadratic convex optimization problem to be solved, and the resulting decision boundary can classify any sample.
For behavior recognition and classification, a Support Vector Machine (SVM) classifier is adopted: using the kernel-function idea, the SVM maps nonlinear samples to a high-dimensional space so that they become linearly separable, and then finds the optimal separating hyperplane by maximizing the classification margin between the data sets.
The SVM classifier computes the optimal separating hyperplane, whose equation is:

w^T x + b = 0

where x is the input vector, w is the weight vector, and b is the bias term.

Each data point (x_i, y_i) in the sample space satisfies the following inequality:

y_i (w^T x_i + b) ≥ 1
the problem of calculating the optimal classification surface can be converted into a dual problem by adopting a Lagrange optimization method, and when the optimal classification surface is searched, a kernel function K (x) can be selectedi,xj) And solving the linear classification problem after nonlinear transformation. The kernel function is defined as follows:
Let X be the input space and H the feature space. If there exists a mapping from X to H,

φ(x): X → H

such that for all x_i, x_j ∈ X the function K(x_i, x_j) satisfies the condition

K(x_i, x_j) = φ(x_i) · φ(x_j)

then K(x_i, x_j) is a kernel function, φ(x) is the mapping function, and φ(x_i) · φ(x_j) is the inner product of φ(x_i) and φ(x_j).
The classification decision function is formulated as follows:

f(x) = sgn( Σ_{i=1}^{N} α_i* y_i K(x_i, x) + b* )

where α_i* are the Lagrange multipliers, b* is the classification threshold, K(x_i, x) is the inner-product (kernel) function, and x_i and y_i are the sample-space vectors and their labels. When f(x) > 0, the data point x belongs to the class; otherwise, the data point x does not belong to the class.
Commonly used inner-product kernel functions include the polynomial kernel, the radial basis function and the Sigmoid function; a Gaussian radial basis kernel is used in the present invention. The classification process of the SVM classifier is as follows:
inputting the behavior recognition sample data set into an SVM classifier for training; optimizing the parameters, and constructing a training model by using the parameters after obtaining the optimal parameters; inputting the behavior feature vector output by the convolutional neural network into an SVM classifier for behavior prediction; and acquiring a behavior prediction classification result and an identification rate.
S106, acquiring a behavior prediction classification result.
It should be noted that the categories of the behaviors include running, jumping, walking, and the like, and whether the current behavior is running, jumping, or walking can be obtained according to the classification result of the SVM, so the classification result of the SVM is used as the recognition result.
In addition, as shown in fig. 3, the present invention also discloses a behavior recognition device 300, wherein the device 300 comprises a processor 310 and a memory 320 connected with the processor 310 through a communication bus 330; wherein:
the memory 320 is used for storing a behavior recognition program;
the processor 310 is configured to execute the behavior recognition program to implement any one of the behavior recognition steps.
A computer storage medium is also disclosed, storing one or more programs executable by one or more processors 310 as shown in fig. 3, to cause the one or more processors 310 to perform any of the behavior recognition steps.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A method of behavior recognition, the method comprising:
acquiring key frame data and non-key frame data from a compressed video code stream file, wherein the non-key frame data comprises a motion vector and a residual, and the key frame data is RGB frame data;
obtaining accumulated residual data according to the residual;
obtaining an accumulated motion vector according to the motion vector;
taking the RGB frame data, the accumulated motion vector and the accumulated residual data as the input of a deep learning model to obtain a behavior characteristic vector of the deep learning model;
inputting the behavior feature vector output by the deep learning model into an SVM classifier for behavior prediction;
and obtaining a behavior prediction classification result.
2. The behavior recognition method according to claim 1, wherein the step of obtaining key frame data and non-key frame data in the compressed video code stream file comprises:
and decoding the compressed video code stream file by adopting a media file conversion tool to obtain key frame data, and extracting the motion vector and the residual error of the non-key frame.
3. The behavior recognition method according to claim 1 or 2, wherein said obtaining an accumulated motion vector from said motion vector comprises:

φ_i^(t,k) = i − η_i^(t,k)

or, alternatively,

φ_i^(t,k) = φ_i^(t,k+1) + τ^(k+1)_{η_i^(t,k+1)}

wherein:

η_i^(t,t) = i,  η_i^(t,k) = η_i^(t,k+1) − τ^(k+1)_{η_i^(t,k+1)}

wherein τ_j^(p) is the motion vector in the p-th frame of the pixel block at position j, p ≤ t; φ_i^(t,k) is the accumulated motion vector of the pixel block at position i of the t-th frame from the k-th frame to the t-th frame; and η_i^(t,k) is the reference position of the pixel block at position i of the t-th frame traced back from the t-th frame to the k-th frame.
4. The behavior recognition method according to claim 3, wherein the specific expression used to obtain the accumulated residual data from the residual is as follows:

R_i^(t) = Δ_i^(t) + R^(t−1)_{η_i^(t,t−1)}

wherein R_i^(t) is the accumulated residual of the i-th pixel block in the t-th frame, Δ_i^(t) is the residual of the i-th pixel block in the t-th frame, η_i^(t,t−1) is the backtracking position of the pixel block in frame t−1, and R^(t−1)_{η_i^(t,t−1)} is the corresponding accumulated residual.
5. The behavior recognition method according to claim 1, wherein the step of obtaining the behavior feature vector of the deep learning model by using the RGB frame data, the accumulated motion vector, and the accumulated residual data as the input of the deep learning model comprises:
acquiring an accumulated motion vector corresponding to each non-key frame and residual error data corresponding to each non-key frame;
forming an input sequence by the RGB frame data, the accumulated motion vector corresponding to each non-key frame and the residual error data corresponding to each non-key frame;
and taking the input sequence as the input of a deep learning model, and obtaining a behavior feature vector of the deep learning model.
6. The behavior recognition method according to any one of claims 1-2 and 4-5, wherein the step of classifying the feature vectors according to a classification model to obtain a classification result comprises:
and classifying the feature vectors according to a Support Vector Machine (SVM) to obtain a classification result.
7. The behavior recognition method according to claim 6, wherein the training process of the deep learning model includes:
obtaining a test data set corresponding to multiple types of behaviors, wherein the test data set comprises: RGB frame data, accumulated motion vectors and accumulated residual data;
constructing an input layer: determining the number of input-layer neurons according to the test data set, the input layer receiving the test data set;
constructing a convolutional layer: determining the size and stride of the convolution kernel, the kernel size being determined according to the scale of the input data and the type of the data;
constructing a down-sampling layer: determining the pooling size, stride and pooling type;
constructing a fully connected layer;
the layers being connected in the order: input layer, convolutional layer, down-sampling layer, convolutional layer, fully connected layer;
and when the model precision is not less than the preset value, determining the current neural network as an available model.
8. A behavior recognition apparatus, comprising a processor, and a memory connected to the processor via a communication bus; wherein the content of the first and second substances,
the memory is used for storing a behavior recognition program;
the processor configured to execute the behavior recognition program to implement the behavior recognition step according to any one of claims 1 to 7.
9. A computer storage medium, characterized in that the computer storage medium stores one or more programs executable by one or more processors to cause the one or more processors to perform the behavior recognition steps of any of claims 1 to 7.
CN201911215173.4A 2019-12-02 2019-12-02 Behavior recognition method and device and computer storage medium Pending CN111046766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911215173.4A CN111046766A (en) 2019-12-02 2019-12-02 Behavior recognition method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911215173.4A CN111046766A (en) 2019-12-02 2019-12-02 Behavior recognition method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN111046766A true CN111046766A (en) 2020-04-21

Family

ID=70234394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911215173.4A Pending CN111046766A (en) 2019-12-02 2019-12-02 Behavior recognition method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111046766A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626178A (en) * 2020-05-24 2020-09-04 中南民族大学 Compressed domain video motion recognition method and system based on new spatio-temporal feature stream
CN112057830A (en) * 2020-09-10 2020-12-11 成都拟合未来科技有限公司 Training method, system, terminal and medium based on multi-dimensional motion capability recognition
CN112215908A (en) * 2020-10-12 2021-01-12 国家计算机网络与信息安全管理中心 Compressed domain-oriented video content comparison system, optimization method and comparison method
CN112637200A (en) * 2020-12-22 2021-04-09 武汉烽火众智数字技术有限责任公司 Loosely-coupled video target tracking implementation method
WO2022053080A3 (en) * 2020-09-10 2022-04-28 成都拟合未来科技有限公司 Training method and system based on multi-dimensional movement ability recognition, terminal, and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618907A (en) * 2013-11-08 2014-03-05 天津大学 Multi-viewpoint distributed type video encoding and frame arranging device and method based on compressed sensing
CN105338357A (en) * 2015-09-29 2016-02-17 湖北工业大学 Distributed video compressed sensing coding technical method
CN109743575A (en) * 2018-12-05 2019-05-10 四川大学 A kind of DVC-HEVC video transcoding method based on naive Bayesian

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103618907A (en) * 2013-11-08 2014-03-05 天津大学 Multi-viewpoint distributed type video encoding and frame arranging device and method based on compressed sensing
CN105338357A (en) * 2015-09-29 2016-02-17 湖北工业大学 Distributed video compressed sensing coding technical method
CN109743575A (en) * 2018-12-05 2019-05-10 四川大学 A kind of DVC-HEVC video transcoding method based on naive Bayesian

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAO-YUAN WU et al.: "Compressed Video Action Recognition", arXiv *
VADIM KANTOROV et al.: "Efficient feature extraction, encoding and classification for action recognition", IEEE *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626178A (en) * 2020-05-24 2020-09-04 中南民族大学 Compressed domain video motion recognition method and system based on new spatio-temporal feature stream
CN112057830A (en) * 2020-09-10 2020-12-11 成都拟合未来科技有限公司 Training method, system, terminal and medium based on multi-dimensional motion capability recognition
CN112057830B (en) * 2020-09-10 2021-07-27 成都拟合未来科技有限公司 Training method, system, terminal and medium based on multi-dimensional motion capability recognition
WO2022053080A3 (en) * 2020-09-10 2022-04-28 成都拟合未来科技有限公司 Training method and system based on multi-dimensional movement ability recognition, terminal, and medium
CN112215908A (en) * 2020-10-12 2021-01-12 国家计算机网络与信息安全管理中心 Compressed domain-oriented video content comparison system, optimization method and comparison method
CN112637200A (en) * 2020-12-22 2021-04-09 武汉烽火众智数字技术有限责任公司 Loosely-coupled video target tracking implementation method

Similar Documents

Publication Publication Date Title
CN111046766A (en) Behavior recognition method and device and computer storage medium
CN110796662B (en) Real-time semantic video segmentation method
CN113328755B (en) Compressed data transmission method facing edge calculation
CN113132727B (en) Scalable machine vision coding method and training method of motion-guided image generation network
CN112887712B (en) HEVC intra-frame CTU partitioning method based on convolutional neural network
CN114286093A (en) Rapid video coding method based on deep neural network
EP4173292A1 (en) Method and system for image compressing and coding with deep learning
KR20230046310A (en) Signaling of feature map data
Mital et al. Neural distributed image compression using common information
CN116962708A (en) Intelligent service cloud terminal data optimization transmission method and system
US6594375B1 (en) Image processing apparatus, image processing method, and storage medium
TW202337211A (en) Conditional image compression
CN116824694A (en) Action recognition system and method based on time sequence aggregation and gate control transducer
CN111310594A (en) Video semantic segmentation method based on residual error correction
CN113780129B (en) Action recognition method based on unsupervised graph sequence predictive coding and storage medium
Aliouat et al. An efficient low complexity region-of-interest detection for video coding in wireless visual surveillance
Chen et al. Point cloud compression with sibling context and surface priors
Ndubuaku et al. Edge-enhanced analytics via latent space dimensionality reduction
CN112399177A (en) Video coding method and device, computer equipment and storage medium
CN114501031B (en) Compression coding and decompression method and device
Antonio et al. Learning-based compression of visual objects for smart surveillance
CN113902000A (en) Model training, synthetic frame generation, video recognition method and device and medium
JP2022078735A (en) Image processing device, image processing program, image recognition device, image recognition program, and image recognition system
CN113556546A (en) Two-stage multi-hypothesis prediction video compressed sensing reconstruction method
Bondarchuk et al. Motion Vector Search Algorithm for Motion Compensation in Video Encoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200421