CN111444861A

CN111444861A - Vehicle theft behavior identification method based on monitoring video

Info

Publication number: CN111444861A
Application number: CN202010238317.4A
Authority: CN
Inventors: 李凡; 文帅; 贺丽君
Original assignee: GUANGDONG XI'AN JIAOTONG UNIVERSITY ACADEMY; Xian Jiaotong University
Current assignee: GUANGDONG XI'AN JIAOTONG UNIVERSITY ACADEMY; Xian Jiaotong University
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2020-07-24

Abstract

The invention discloses a vehicle theft behavior identification method based on a monitoring video, which comprises the following steps: firstly, extracting features by using a 3-dimensional convolution pre-training model, adding a space-time joint attention mechanism before classification to focus space-time positions where vehicle stealing behaviors occur, and finally realizing the recognition of the stealing behaviors through classification. Aiming at the problem that criminal target criminal behaviors are hidden in a monitoring video and the detection is difficult due to small action amplitude, a time attention mechanism is added into a network to capture the change of motion information in time; aiming at the problems that the criminal target in the monitoring video is small and the occupied position in space is small, a space attention mechanism is added into the network; then, time-space information of the vehicle stealing behavior is modeled by using a time-space joint mechanism of time-space cascade, and 3-dimensional space-time characteristics with better discrimination are obtained, so that the identification accuracy is improved, and the identification accuracy on a vehicle stealing behavior data set reaches 97.8%.

Description

Vehicle theft behavior identification method based on monitoring video

Technical Field

The invention belongs to the field of behavior identification, and particularly relates to a vehicle theft behavior identification method based on a monitoring video.

Background

The vehicle stealing behavior is a common criminal behavior, not only affects the security and stability of the society, but also causes great loss to public property and personal property, and the accurate recognition of the vehicle stealing behavior has great practical significance for guaranteeing the property safety of people and maintaining the harmony and stability of the society. In recent years, the deployment of a large amount of video monitoring equipment plays a great role in preventing criminal behaviors, but the problems of resource idling, monitoring and uncontrollable exist, and the mode of manually checking the stealing behaviors for massive monitoring video data is high in cost and low in efficiency. Therefore, it would be valuable to devise a method for accurately identifying vehicle theft using a computer.

With the rapid development of deep learning technology, researchers provide various behavior recognition methods based on convolutional neural networks, and the accuracy rate of behavior recognition is greatly improved. Compared with 2-dimensional convolution, the identification method based on the 3-dimensional convolution network can simultaneously carry out modeling of time and space dimensions, and has the characteristics of rapidness and accuracy in a behavior identification task. Vehicle theft is distinguished from a standard behavioural identification data set by the following differences: the proportion of criminals in the monitoring video is small, and the space position information is difficult to capture due to the fact that the moving target is small; the theft behavior in the monitoring video is more concealed, and the action amplitude is small, the action is fast.

The visual attention mechanism is a brain signal processing mechanism unique to human vision. Human vision obtains a target area needing important attention, namely a focus of attention in general, by rapidly scanning a global image, and then puts more attention resources into the area to obtain more detailed information of the target needing attention, and suppresses other useless information. The attention mechanism in deep learning is similar to the selective visual attention mechanism of human beings in nature, and the core target is to select information which is more critical to the current task target from a plurality of information.

Disclosure of Invention

The invention provides a vehicle theft behavior identification method based on a monitoring video, which aims at solving the problem that the existing behavior identification method based on deep learning is directly applied to vehicle theft behavior identification.

The invention is realized by adopting the following technical scheme:

a vehicle theft behavior identification method based on monitoring videos comprises the following steps:

1) the feature extraction module inputs continuous RGB frames with fixed length, and adopts a 3-dimensional convolutional neural network to extract the space-time features of vehicle theft;

2) the time attention module uses an attention mechanism in the time dimension of the space-time characteristics and assigns different weights to the characteristics of each time point, so that the recognition network can capture the action change information on time more easily;

3) the spatial attention module uses an attention mechanism in the spatial dimension of the space-time characteristics and assigns different weights to local characteristics of different spatial positions, so that the identification network is easier to focus on the spatial position of a crime target, and the interference of background factors is reduced;

4) the time-space combination strategy adopts a time-space combination strategy cascaded in time and space, so that a 3-dimensional space-time weight distribution mechanism is realized, and space-time characteristics with higher discrimination are obtained.

The further improvement of the invention is that in the step 1), a pre-trained 3D-inclusion model is adopted to perform transfer learning on vehicle theft behavior data in a training stage for a feature extraction module, network input is continuous RGB image frames with fixed length, extracted 3-dimensional features are sent to a subsequent attention module to perform feature weight distribution, the whole network is trained jointly, and in a testing stage, the trained 3-dimensional neural network is used for feature extraction.

The further improvement of the invention is that in the step 2), a strategy of processing space-time characteristics by using 3-dimensional space pooling and preserving time dimension for weight learning is provided for a time attention module, 3-dimensional space average pooling operation is firstly carried out on the characteristics of a 3-dimensional convolution network, characteristic time dimension information is preserved, one-dimensional time weight fraction is obtained through a layer of convolution and Sigmoid activation function, space dimension expansion is carried out on the time weight fraction to obtain a time weight matrix, then dot product operation is carried out on the time weight fraction and the space-time characteristics, so that different weights are distributed to the characteristics of different time points, the characteristics with obvious action information change are promoted according to the weights, and the characteristics without obvious motion information change are inhibited.

The further improvement of the invention is that in the step 3), a strategy of using 3-dimensional time pooling to process space-time characteristics and reserving space dimensions for weight learning is provided for the space attention module; the method comprises the steps of firstly using 3-dimensional time average pooling operation on 3-dimensional convolution characteristics, reserving characteristic space dimension information, obtaining two-dimensional space weight scores through a layer of convolution and Sigmoid activation function, carrying out time dimension expansion on the space weight scores to obtain a space weight matrix, and then carrying out dot product operation on the space weight scores and the space weight characteristics, so that local characteristics of a target area are highlighted, background interference is reduced, and the problem that criminal targets are small and difficult to identify in vehicle stealing behaviors is solved.

The further improvement of the invention is that in the step 4), a time-space cascade strategy is proposed for the time-space combination strategy; the time attention mechanism is firstly operated on the characteristics of the 3-dimensional convolution network, and then the space attention mechanism is cascaded, so that the self-adaptive weight learning of the 3-dimensional space-time convolution characteristics is realized, and the accuracy of vehicle theft behavior identification is improved.

The invention has at least the following beneficial effects:

the invention provides a vehicle theft behavior identification method based on a monitoring video. The method aims at the problems that criminal targets are small, criminal behaviors are hidden and action amplitude is small in vehicle stealing behaviors under the condition of monitoring videos, and a time-space combined attention mechanism is added into a basic behavior recognition network, so that the problems are effectively solved, and the accuracy of vehicle stealing behavior recognition is improved.

In the training stage, a feature extraction module of the method performs fine tuning training on stealing behavior data by adopting a 3D-inclusion model pre-trained on a large behavior data set kenetics, network input is a continuous RGB image frame with a fixed length, extracted 3-dimensional features are sent to a subsequent attention module for feature modification, and all modules of the whole network are subjected to combined training. In the testing phase, feature extraction uses a trained 3-dimensional neural network.

Further, a time attention module in the method firstly performs 3-dimensional space average pooling operation on the 3-dimensional space-time characteristics, retains time dimension information of the characteristics, obtains one-dimensional time weight fraction through a layer of convolution and Sigmoid activation function, performs space dimension expansion on the time weight fraction and then performs point multiplication operation on the time weight fraction and the space-time characteristics, promotes useful characteristics according to the time weight and inhibits characteristics with little classification effect. By the weight distribution method of the time dimension, the tiny change of the time dimension can be captured, and the problems of hidden action and small amplitude of vehicle stealing behavior are effectively solved.

Further, a space attention module in the method firstly uses 3-dimensional time average pooling operation on 3-dimensional space-time characteristics, retains space dimension information of the characteristics, obtains a two-dimensional space weight score through a layer of convolution and Sigmoid activation function, performs time dimension expansion on the space weight score and then performs multiplication operation on the space weight score and the space-time characteristics, highlights local characteristics of a small target area according to the space weight, can focus on the space position of an action target, reduces background interference, and effectively solves the problem that the crime target is small and difficult to identify in vehicle stealing behaviors.

Furthermore, according to the time-space combination strategy in the method, the time attention mechanism is firstly operated on the 3-dimensional space-time characteristics, then the space attention mechanism is cascaded, and the time-space combination is carried out, so that the self-adaptive weight learning of the 3-dimensional space-time convolution characteristics is realized, the important distinguishing characteristics are emphasized, and the vehicle theft behavior identification precision is improved.

Further, to verify the validity of the method, experimental verification was performed on the established vehicle theft behavior data set. Experiments prove that the time-space joint attention mechanism in the method greatly improves the accuracy of vehicle theft behavior identification.

In summary, the invention provides a vehicle theft behavior identification method based on a monitoring video. Based on the characteristics of vehicle theft behavior under the monitoring video, aiming at the problem that the criminal target criminal behavior in the monitoring video is hidden and the detection is difficult due to small action amplitude, a time attention mechanism is added in the network to capture the change of motion information in time; aiming at the problems that the criminal target in the monitoring video is small and the occupied position in space is small, a space attention mechanism is added into the network; and then, time-space information of the vehicle stealing behavior is modeled by using a time-space combination mechanism of first time and second space, and 3-dimensional space-time characteristics with better discrimination are obtained, so that the accuracy of vehicle stealing behavior identification is improved. The identification accuracy rate on the vehicle theft behavior data set reaches 97.8%, and the effectiveness of the invention in improving the performance of monitoring video vehicle theft behavior identification is verified.

Drawings

Fig. 1 is a flowchart of a vehicle theft behavior identification method based on surveillance video according to the present invention.

FIG. 2 is a flow chart of the temporal attention module of the present invention.

FIG. 3 is a flow chart of the spatial attention module of the present invention.

FIG. 4 is a schematic diagram of the spatio-temporal union strategy of the present invention.

Detailed Description

The invention is explained in detail below with reference to the drawings:

as shown in fig. 1, the vehicle theft identification method based on the spatio-temporal joint attention mechanism provided by the invention comprises the following steps:

1) the feature extraction module adopts a pre-trained 3-dimensional neural network model as a feature extraction network, the network input is a continuous video image with a fixed length, and 3-dimensional space-time convolution features are obtained through the feature extraction network;

2) the time attention module uses an attention mechanism in the time dimension of the 3-dimensional space-time convolution characteristics, and different weights are distributed to the characteristics of each time point, so that the identification network can capture small-amplitude action information more easily;

3) the spatial attention module uses an attention mechanism in the spatial dimension of the 3-dimensional space-time convolution characteristics, and different weights are distributed to the spatial positions of the characteristics, so that the identification network can focus on the spatial positions of the smaller criminal targets more easily;

4) the time-space combined strategy adopts a time-space combined strategy of firstly time and secondly space cascading, so that a 3-dimensional space-time weight distribution mechanism is realized, and the characteristic of more discrimination is obtained.

Specifically, in the feature extraction module in the step 1), in a training stage, a 3D-inclusion model pre-trained on a large motion data set kenetics is adopted to perform fine tuning training on stealing behavior data, a network input is a frame continuous RGB video image, the input size is adjusted to 224 × 3, the extracted 3-dimensional features come from a Mixed _5c layer of the 3D-inclusion network and have the size of 8 × 7, and then the extracted 3-dimensional features are sent to a subsequent attention module to perform self-adaptive weight learning, and all modules of the whole network perform joint training. In the testing phase, feature extraction uses a trained 3-dimensional neural network.

In 2) the time attention module, as shown in fig. 2, firstly, the 8 × 7 features extracted in 1) are reduced in dimension in space, information of time dimension is retained, 8 × 1 features are obtained, and a 3-dimensional average pooling method is adopted, as shown in formula (1):

F'＝Avgpool3D(F,[1,7,71) (1)

wherein F' is the features subjected to pooling operation, F is the network-extracted 3-dimensional features of feature extraction, [. cndot..

The feature F' is then passed through a convolutional layer to increase the non-linearity, and then through a sigmiod operation to obtain [0, 1 ]]Fraction w between_TAs shown in formula (2):

W_T＝sigmoid(Convs(F′)) (2)

and finally, performing space dimension expansion on the obtained time weight fraction of 8 × 1 to obtain a time weight matrix of 8 × 7, and performing dot product operation on the time weight matrix and the feature F to finish distribution of time adaptive weights, wherein the formula is shown in formula (3):

wherein F_TFor the feature after the time weight assignment, S _ Broadcast () is a space dimension expansion operation,

is a matrix dot product operation.

In 3) the spatial attention module, as shown in fig. 3, firstly, the features 8 × 7 obtained in 2) are subjected to dimensionality reduction in the time dimension, information of the spatial dimension is retained, the features 1 × 7 are obtained, and a 3-dimensional average pooling method is adopted, as shown in formula (4):

F″＝Avgpool3D(F_T，[8，1，1]) (4)

wherein F' is a characteristic of being subjected to a pooling operation, F_TFeatures for adaptive weight assignment over time [ ·]Is a parameter of the pooling operation.

The feature F' is then passed through a convolutional layer to increase the non-linearity, and then through a sigmiod operation to obtain [0, 1 ]]Fraction w between_SAs shown in formula (5):

w_S＝sigmoid(Convs(F″)) (5)

finally, the obtained 1 × 7 spatial weight fraction is subjected to time dimension expansion to obtain 8 × 7 spatial weight matrix, and then the spatial weight matrix is combined with the feature F_TAnd performing dot product operation to complete the distribution of the time adaptive weight, as shown in formula (6):

wherein F_TSFor the feature after spatial weight assignment, T _ Broadcast () is a time dimension expansion operation,

is a matrix dot product operation.

In the 4) time-space combination strategy, the 3-dimensional space-time characteristics are firstly subjected to the operation of a time attention mechanism, then the space attention mechanism is cascaded, and the time-space combination is carried out, so that the self-adaptive weight learning of the 3-dimensional space-time convolution characteristics is realized, the important characteristics with discrimination are emphasized, the interference factors are suppressed, and the accuracy of vehicle theft behavior identification is improved.

In order to test the effectiveness of the vehicle theft behavior recognition method in the invention, training and testing are carried out on the established vehicle theft behavior data set, the training set and the testing set are divided according to the ratio of 8:2 in the experiment, and the result of the experiment is shown in table 1. According to the test result, the identification method of the space-time joint attention mechanism can effectively improve the identification accuracy of the vehicle stealing behavior, and the identification accuracy of the behavior cannot be improved after the time or space attention is used independently, because the behavior features are 3-dimensional, the behavior can be better represented by performing time and space joint modeling, the distinguishing features are obtained, and the effectiveness of the time-space cascade joint strategy is verified.

TABLE 1 evaluation of the Algorithm on a vehicle theft data set

Claims

1. A vehicle theft behavior identification method based on a monitoring video is characterized by comprising the following steps:

2. The method for recognizing the vehicle theft behavior based on the surveillance video as claimed in claim 1, wherein in the step 1), for the feature extraction module, in a training stage, a pre-trained 3D-inclusion model is adopted to perform transfer learning on vehicle theft behavior data, network input is a fixed-length continuous RGB image frame, the extracted 3-dimensional features are sent to a subsequent attention module to perform feature weight distribution, the whole network is trained jointly, and in a testing stage, the trained 3-dimensional neural network is used for feature extraction.

3. The vehicle theft behavior identification method based on the monitoring video is characterized in that in the step 2), a strategy of processing space-time characteristics by using 3-dimensional space pooling and reserving time dimensions for weight learning is provided for a time attention module, the characteristics of a 3-dimensional convolution network are firstly subjected to 3-dimensional space average pooling operation, characteristic time dimension information is reserved, one-dimensional time weight scores are obtained through a layer of convolution and Sigmoid activation function, the time weight scores are subjected to space dimension expansion to obtain a time weight matrix, and then point multiplication operation is performed on the time weight matrix and the space-time characteristics, so that different weights are distributed to the characteristics at different time points, the characteristics with obvious action information change are improved according to the weights, and the characteristics without obvious motion information change are inhibited.

4. The vehicle theft behavior identification method based on the surveillance video according to claim 1, wherein in step 3), a strategy of processing spatiotemporal features by using 3-dimensional time pooling and reserving spatial dimensions for weight learning is proposed for a spatial attention module; the method comprises the steps of firstly using 3-dimensional time average pooling operation on 3-dimensional convolution characteristics, reserving characteristic space dimension information, obtaining two-dimensional space weight scores through a layer of convolution and Sigmoid activation function, carrying out time dimension expansion on the space weight scores to obtain a space weight matrix, and then carrying out dot product operation on the space weight scores and the space weight characteristics, so that local characteristics of a target area are highlighted, background interference is reduced, and the problem that criminal targets are small and difficult to identify in vehicle stealing behaviors is solved.

5. The surveillance video-based vehicle theft identification method according to claim 1, wherein in step 4), a time-space cascade strategy is proposed for the time-space combination strategy; the time attention mechanism is firstly operated on the characteristics of the 3-dimensional convolution network, and then the space attention mechanism is cascaded, so that the self-adaptive weight learning of the 3-dimensional space-time convolution characteristics is realized, and the accuracy of vehicle theft behavior identification is improved.