CN113705490A - Anomaly detection method based on reconstruction and prediction - Google Patents

Anomaly detection method based on reconstruction and prediction

Info

Publication number
CN113705490A
Authority
CN
China
Prior art keywords
frame
anomaly detection
reconstruction
video sequence
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111016334.4A
Other languages
Chinese (zh)
Other versions
CN113705490B (en)
Inventor
Yuanhong Zhong
Xia Chen
Dong Zhu
Jian Zhang
Yi Yang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202111016334.4A priority Critical patent/CN113705490B/en
Publication of CN113705490A publication Critical patent/CN113705490A/en
Application granted granted Critical
Publication of CN113705490B publication Critical patent/CN113705490B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of video and image processing, and in particular to an anomaly detection method based on reconstruction and prediction, comprising the following steps: acquiring a test video sequence to be detected; inputting the test video sequence into a pre-trained anomaly detection model, which first extracts the spatial appearance features and temporal motion features of the test video sequence, fuses them into the corresponding spatio-temporal features, obtains the corresponding reconstructed frames from those features, and finally calculates the corresponding anomaly scores from the reconstructed frames; and taking the anomaly score of the test video sequence as the anomaly detection result. The method achieves both anomaly detection performance and accuracy, improving the effect and efficiency of anomaly detection.

Description

Anomaly detection method based on reconstruction and prediction
Technical Field
The invention relates to the technical field of video and image processing, in particular to an anomaly detection method based on reconstruction and prediction.
Background
Video anomaly detection is an important research task in computer vision with many applications, such as traffic accident detection, violence detection and abnormal crowd behavior detection. Owing to the uncertainty and diversity of anomalies, accurately distinguishing abnormal events from normal ones remains challenging despite years of research. Moreover, in the real world it is difficult to enumerate all abnormal events in order to learn the various abnormal patterns. Many studies therefore detect anomalies with one-class classification methods rather than with supervised binary classification. One-class anomaly detection learns the distribution of normal patterns from normal data and computes the probability that a test sample obeys that distribution to reflect abnormality.
Addressing the sensitivity of existing anomaly detection methods to noise and time intervals, Chinese patent publication No. CN111680614A discloses "an abnormal behavior detection method based on video monitoring", which extracts features from each target object in a video frame image, clusters the features, and inputs them into an SVM classifier; the highest abnormal score is taken as the score of the target object, and the highest abnormal score over all target objects in the frame image is taken as the abnormal score of that frame image.
The anomaly (behavior) detection method in the above existing scheme uses target detection to find foreground objects in each video frame, inputs them into a convolutional autoencoder network for reconstruction, and judges anomalies by classifying the reconstruction error. However, conventional anomaly detection methods treat all pixels in a frame equally, so the model loses focus and cannot preferentially learn to reconstruct the complex regions that are hard to reconstruct during training; consequently the model cannot effectively obtain reconstructions with a high-quality foreground (simple background pixels dominate the optimization of the model), which degrades detection performance, since the foreground matters more than the static background in anomaly detection. Meanwhile, existing reconstruction methods try to minimize the difference between a reconstructed frame and its ground-truth label; although this guarantees similarity in pixel space and even in latent space, it is a one-to-one constraint that ignores the similarity among different normal frames in the same scene, so the accuracy of anomaly detection is limited. How to design an anomaly detection method that achieves both detection performance and accuracy is therefore a pressing technical problem.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the technical problem to be solved by the invention is: how to provide an anomaly detection method that achieves both anomaly detection performance and accuracy, thereby improving the effect and efficiency of anomaly detection.
In order to solve the technical problems, the invention adopts the following technical scheme:
an anomaly detection method based on reconstruction and prediction, comprising the steps of:
s1: acquiring a test video sequence to be detected;
s2: inputting the test video sequence into a pre-trained anomaly detection model; the anomaly detection model firstly extracts spatial appearance characteristics and temporal motion characteristics of a test video sequence respectively, then fuses the spatial appearance characteristics and the temporal motion characteristics to obtain corresponding spatio-temporal characteristics, then obtains corresponding reconstructed frames based on the spatio-temporal characteristics, and finally calculates corresponding anomaly scores according to the reconstructed frames;
s3: and taking the abnormal score of the test video sequence as an abnormal detection result.
Preferably, the anomaly detection model includes a reconstruction encoder for extracting spatial appearance features, a prediction encoder for extracting temporal motion features, a fusion module connected to outputs of the reconstruction encoder and the prediction encoder and used for fusing to obtain spatio-temporal features, and a decoder connected to an output of the fusion module and used for obtaining a reconstructed frame.
Preferably, in step S2, the current frame of the test video sequence is input to the reconstruction encoder to extract the corresponding spatial appearance feature; a number of frames preceding a current frame of the test video sequence are input to the predictive encoder to extract corresponding temporal motion features.
Preferably, when the anomaly detection model is trained, the video sequence input in the current round is reversely erased based on the reconstruction error of the previous round of the anomaly detection model, so as to remove pixels with reconstruction errors smaller than a preset threshold value in the video sequence, and obtain a corresponding erased frame.
Preferably, I_t denotes the t-th frame in a video sequence and I_{t−Δ} denotes the frame Δ steps before I_t;
the reverse erasure refers to: after each training round except the first, first computing the pixel-level error between the original frame I_t and the reconstructed frame Î_t; then setting the corresponding pixel value in the mask to 1 or 0 according to whether the pixel-level error is greater than a preset threshold, to obtain the corresponding mask; and finally, before the current round of training, multiplying the original frames from I_{t−Δ} to I_t pixel-by-pixel with the mask to obtain the erased frames of the current round of the anomaly detection model, denoted I′_{t−Δ} to I′_t.
Preferably, when the anomaly detection model is trained, a deep SVDD module is connected to the output of the decoder; the deep SVDD module is used to search for the hypersphere of smallest volume containing all or most high-level features of reconstructed frames of normal events, and to use this compactness constraint on the high-level features of the reconstructed frames to make reconstructed normal frames similar, so as to increase the reconstruction gap between normal and abnormal frames.
Preferably, the deep SVDD module comprises a mapping encoder connected to the output of the decoder and a hypersphere connected to the output of the mapping encoder; the mapping encoder first maps the reconstructed frame Î_t into a low-dimensional latent representation, and the low-dimensional representations are then fitted into a hypersphere of minimal volume to force the anomaly detection model to learn and extract the common factors of normal events;
the target function of the depth SVDD module is defined as:
Figure BDA0003240312900000023
in the formula: c and R represent the center and radius of the hyper-sphere, respectively, n represents the number of frames,
Figure BDA0003240312900000031
representing reconstructed frames output by a network with parameter W
Figure BDA0003240312900000032
Is represented by argmax {. cndot.) represents a function taking the maximum value.
Preferably, the anomaly detection model is optimized with a training loss function; the reconstructed frame Î_t is constrained both in pixel space and in the latent space of the deep SVDD module: in pixel space the model is optimized with an intensity loss and a weighted RGB loss, and in the latent space with a feature compactness loss.
Preferably, the training loss function is given by:

L = λ_int L_int + λ_rgb L_rgb + λ_compact L_compact

where L_int denotes the intensity loss, L_rgb the weighted RGB loss, L_compact the feature compactness loss, and λ_int, λ_rgb, λ_compact the hyperparameters of the respective losses, which determine their contribution to the total training loss;

the intensity loss L_int is computed as:

L_int = ‖Î_t − I_t‖₂²

where t denotes the t-th frame of the video sequence and ‖·‖₂ denotes the ℓ2 norm;

the weighted RGB loss L_rgb is computed as:

L_rgb = Σ_{i=1}^{N} ((N−i+1)/N) ‖ |Î_t − I_{t−i}| − |I_t − I_{t−i}| ‖₁

where ‖·‖₁ denotes the ℓ1 norm, N denotes the number of previous frames, and frame I_{t−i} carries the weight (N−i+1)/N;

the feature compactness loss is computed as:

L_compact = R² + (1/(νn)) Σ_{t=1}^{n} max{0, ‖φ(Î_t; W) − c‖² − R²}

where c and R denote the center and radius of the hypersphere, respectively, n denotes the number of frames, and φ(Î_t; W) denotes the low-dimensional representation of the reconstructed frame Î_t output by the network with parameters W.
Preferably, the anomaly detection model calculates the corresponding anomaly score through the following steps:

S201: the partial score of each image block in the test video sequence is defined as:

S(P) = (1/|P|) Σ_{(i,j)∈P} (Î_t(i,j) − I_t(i,j))²

where P denotes an image block in frame I_t, i and j denote the spatial position of a pixel within the block, |P| denotes the number of pixels in the block, and the blocks are obtained by sliding a window with stride 4;

S202: the anomaly score of a frame in the test video sequence is calculated as:

Score = max{S(P₁), S(P₂), ..., S(P_m)}

where the size of P is set to 16 × 16 and m denotes the number of image blocks;

S203: after the score of every frame in the test video sequence is obtained, the scores of all frames are normalized to the range [0, 1] to obtain the frame-level anomaly score:

s_t = (Score_t − min_Score) / (max_Score − min_Score)

where min_Score and max_Score denote the minimum and maximum scores in the test video sequence, respectively;

S204: the frame-level anomaly scores are smoothed in the time dimension with a Gaussian filter to obtain the anomaly scores corresponding to the test video sequence.
Compared with the prior art, the anomaly detection method has the following beneficial effects:
in the invention, the spatial features and the temporal features of the video sequence are respectively extracted through a reconstruction method and a prediction method, and the corresponding spatio-temporal features are obtained through fusion to calculate the reconstruction frame, so that the model does not lose focus, complex regions which are difficult to reconstruct during the prior learning and reconstruction training can be preferentially learned, the reconstructed image with high quality prospect can be effectively obtained, and the anomaly detection performance of the anomaly detection model is further improved; meanwhile, the spatial features and the temporal features are extracted, and the similarity of different normal frames in the same scene is considered, so that the anomaly detection accuracy of the anomaly detection model can be improved. Therefore, the abnormality detection method in the invention has both the performance and the accuracy of abnormality detection, thereby improving the effect and the efficiency of abnormality detection.
In the invention, the input data of the model (the erased frames) are created by erasing some pixels from the original frames through reverse erasure. This retains the pixels with larger reconstruction errors from the previous training round and removes those with smaller errors, forcing the model to focus on the pixels that were not well reconstructed in the previous round, so that both the simple background and the complex foreground are reconstructed with high quality. Most foreground pixels are retained in the erased frames while most background pixels are discarded, which helps the model automatically form a focusing mechanism on the foreground and allows detection performance and accuracy to be achieved together.
In the invention, the deep SVDD module acts directly on the reconstructed frames: it searches for the hypersphere of smallest volume containing all or most high-level features of reconstructed frames of normal events, and guarantees the similarity between the reconstructed images of normal frames through similar low-dimensional features in the latent space, which effectively increases the reconstruction gap between normal and abnormal frames and further improves the accuracy of anomaly detection.
Drawings
For purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made in detail to the present invention as illustrated in the accompanying drawings, in which:
FIG. 1 is a logic block diagram of an anomaly detection method;
FIG. 2 is a diagram of a network architecture during an anomaly detection model test;
FIG. 3 is a network architecture diagram during anomaly detection model training;
FIG. 4 is a partial example graph of three data sets (exceptional events are marked by bounding boxes);
FIG. 5 is a network architecture diagram of an encoder and decoder;
FIG. 6 is a graph showing qualitative results of frame reconstruction for three data sets (lighter colors represent greater error);
FIG. 7 is a graph showing a comparison of anomaly scores;
FIG. 8 is a diagram illustrating a comparison of average scores for normal and abnormal frames;
FIG. 9 is a diagram illustrating the visualization of reverse erasure during different training sessions;
FIG. 10 is a graph showing a comparison of model training loss with and without reverse erase on Ped 2;
FIG. 11 is a diagram of the model visualization with and without reverse erase on Avenue and Ped 2;
FIG. 12 is a diagram of a t-SNE visualization of a low-dimensional representation of reconstructed frames in Avenue and Ped 2.
Detailed Description
The following is further detailed by the specific embodiments:
Embodiment:
the embodiment discloses an anomaly detection method based on reconstruction and prediction.
As shown in fig. 1, the anomaly detection method based on reconstruction and prediction includes the following steps:
s1: acquiring a test video sequence to be detected;
s2: inputting a test video sequence into a pre-trained anomaly detection model; the anomaly detection model firstly extracts spatial appearance characteristics and temporal motion characteristics of a test video sequence respectively, then fuses the spatial appearance characteristics and the temporal motion characteristics to obtain corresponding spatio-temporal characteristics, then obtains corresponding reconstructed frames based on the spatio-temporal characteristics, and finally calculates corresponding anomaly scores according to the reconstructed frames;
s3: and taking the abnormal score of the test video sequence as an abnormal detection result.
In a specific implementation process, as shown in fig. 2, the anomaly detection model (Dual-Encoder Single-Decoder network, DESDnet) comprises a reconstruction encoder for extracting spatial appearance features, a prediction encoder for extracting temporal motion features, a fusion module connected to the outputs of the two encoders to fuse them into spatio-temporal features, and a decoder connected to the output of the fusion module to produce the reconstructed frame. Specifically, the fusion module consists of a two-dimensional convolution layer and a Tanh activation layer; the convolution kernel is 1 × 1 in size with 512 channels. The current frame of the test video sequence is input to the reconstruction encoder to extract the corresponding spatial appearance features, and several frames preceding the current frame are input to the prediction encoder to extract the corresponding temporal motion features. In the testing phase, the frames from I_{t−Δ} to I_t are input to the reconstruction encoder and the prediction encoder to extract the spatial and temporal features of the video sequence, respectively. The appearance feature a_t and the motion feature m_t are concatenated and input to the fusion module to obtain the corresponding spatio-temporal features; compared with fusion by simple feature concatenation, this reduces computation and improves the expressive capability of the model. The spatio-temporal features are then input to the decoder, and the reconstructed frame Î_t is obtained by deconvolution.
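The following is a minimal PyTorch sketch of this dual-encoder single-decoder layout. Only the fusion module (a 1 × 1 convolution with 512 channels followed by Tanh) is specified above; the encoder and decoder stacks are illustrative placeholders, since the exact layer configuration of FIG. 5 is not reproduced in this text.

```python
import torch
import torch.nn as nn

class DESDNet(nn.Module):
    """Sketch of the dual-encoder single-decoder network: a reconstruction
    encoder for the current frame, a prediction encoder for the Delta
    preceding frames, a 1x1 fusion convolution with Tanh, and a decoder."""

    def __init__(self, delta=4, feat_ch=256):
        super().__init__()
        self.rec_enc = self._encoder(3, feat_ch)           # takes I_t
        self.pred_enc = self._encoder(3 * delta, feat_ch)  # takes I_{t-delta}..I_{t-1}
        # Fusion module as described: 1x1 convolution, 512 channels, Tanh.
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * feat_ch, 512, kernel_size=1), nn.Tanh())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 128, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())

    @staticmethod
    def _encoder(in_ch, out_ch):
        # Placeholder convolutional stack standing in for the encoders of FIG. 5.
        return nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(128, out_ch, 3, stride=2, padding=1), nn.ReLU(True))

    def forward(self, cur_frame, prev_frames):
        a_t = self.rec_enc(cur_frame)               # spatial appearance feature
        m_t = self.pred_enc(prev_frames)            # temporal motion feature
        st = self.fusion(torch.cat([a_t, m_t], 1))  # spatio-temporal feature
        return self.decoder(st)                     # reconstructed frame
```

With 256 × 256 inputs, the placeholder encoders downsample to 32 × 32 feature maps and the decoder deconvolves back to 256 × 256.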
In the invention, the spatial features and temporal features of a video sequence are extracted by reconstruction and prediction, respectively, and fused into spatio-temporal features from which the reconstructed frame is computed, so the model does not lose focus and can preferentially learn to reconstruct the complex regions that are hard to reconstruct during training; it can thus effectively produce reconstructions with a high-quality foreground, improving the detection performance of the anomaly detection model. Meanwhile, because both spatial and temporal features are extracted and the similarity of different normal frames in the same scene is taken into account, the detection accuracy of the model is also improved. The anomaly detection method of the invention therefore achieves both detection performance and accuracy, improving the effect and efficiency of anomaly detection.
In a specific implementation process, as shown in fig. 3, when the anomaly detection model is trained, the video sequence input in the current round is reversely erased based on the reconstruction error in the previous round of the anomaly detection model, so as to remove pixels in the video sequence whose reconstruction error is smaller than the preset threshold value, and obtain a corresponding erased frame.
In particular, I_t denotes the t-th frame in the video sequence and I_{t−Δ} denotes the frame Δ steps before I_t.

Reverse erasure refers to the following: after each training round except the first, the pixel-level error between the original frame I_t and the reconstructed frame Î_t is computed first; a mask is then obtained by setting each of its pixel values to 1 or 0 according to whether the corresponding pixel-level error exceeds a preset threshold; finally, before the current round of training, the original frames from I_{t−Δ} to I_t are multiplied pixel-by-pixel with the mask to obtain the erased frames of the current round of the anomaly detection model, denoted I′_{t−Δ} to I′_t. In the training phase, given the erased frames from I′_{t−Δ} to I′_t, I′_t is input to the reconstruction encoder to extract the appearance features in the spatial domain, denoted a_t, and the frames from I′_{t−Δ} to I′_{t−1} are input to the prediction encoder to extract the motion features in the temporal domain, denoted m_t; compared with capturing motion patterns with optical flow, this avoids the inaccuracy and high computational cost of optical-flow estimation.
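A minimal sketch of this reverse erasure step, assuming an absolute pixel error and a placeholder threshold value (the patent specifies only that pixels whose previous-round reconstruction error is below a preset threshold are erased):

```python
import torch

def reverse_erase(frames, recon_prev, threshold=0.1):
    """frames:     (delta+1, C, H, W) original frames I_{t-delta}..I_t
    recon_prev:    (C, H, W) previous round's reconstruction of I_t
    threshold:     preset error threshold (the value 0.1 is a placeholder)
    Returns the erased frames I'_{t-delta}..I'_t."""
    # Pixel-level error between the original frame I_t and its reconstruction.
    err = (frames[-1] - recon_prev).abs()
    # Mask pixel = 1 where the error exceeds the threshold, 0 elsewhere.
    mask = (err > threshold).float()
    # Multiply each frame pixel-by-pixel with the mask.
    return frames * mask.unsqueeze(0)
```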
In the invention, the input data of the model (the erased frames) are created by erasing some pixels from the original frames through reverse erasure. This retains the pixels with larger reconstruction errors from the previous training round and removes those with smaller errors, forcing the model to focus on the pixels that were not well reconstructed in the previous round, so that both the simple background and the complex foreground are reconstructed with high quality. Most foreground pixels are retained in the erased frames while most background pixels are discarded, which helps the model automatically form a focusing mechanism on the foreground and allows detection performance and accuracy to be achieved together. Meanwhile, the uncertainty introduced into the input makes the anomaly detection model more robust to noise: the model does not lose focus, preferentially learns to reconstruct the complex regions that are hard to reconstruct during training, and can effectively produce reconstructions with a high-quality foreground, improving its detection performance.
In a specific implementation process, as shown in fig. 3, when the anomaly detection model is trained, a deep SVDD module is connected to the output of the decoder; the deep SVDD module is used to search for the hypersphere of smallest volume containing all or most high-level features of reconstructed frames of normal events, and to use this compactness constraint on the high-level features of the reconstructed frames to make reconstructed normal frames similar, so as to increase the reconstruction gap between normal and abnormal frames.
Specifically, the deep SVDD module comprises a mapping encoder connected to the output of the decoder and a hypersphere connected to the output of the mapping encoder; the mapping encoder first maps the reconstructed frame Î_t into a low-dimensional latent representation, and the low-dimensional representations are then fitted into a hypersphere of minimal volume to force the anomaly detection model to learn and extract the common factors of normal events.

The objective function of the deep SVDD module is defined as:

min_{W,R} R² + (1/(νn)) Σ_{t=1}^{n} max{0, ‖φ(Î_t; W) − c‖² − R²}

where c and R denote the center and radius of the hypersphere, respectively, n denotes the number of frames, φ(Î_t; W) denotes the low-dimensional representation of the reconstructed frame Î_t output by the network with parameters W, and max{·} takes the maximum value. In the objective function, the first term minimizes the volume of the hypersphere and the second term penalizes samples lying outside it; the hyperparameter ν ∈ (0, 1] trades off the volume of the hypersphere against boundary violations: a large ν allows some samples to fall outside the hypersphere, while a small ν penalizes such samples heavily. The network parameters W and the radius R are optimized by block coordinate descent with alternating minimization: with R fixed, the network is iterated k times to optimize the parameters W; after k iterations, R is optimized again using the latest W.
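A sketch of this objective and of the alternating update, assuming the quantile-based radius step commonly used in deep SVDD implementations (the description above states only that W and R are minimized alternately by block coordinate descent):

```python
import torch

def svdd_loss(z, c, R, nu=0.1):
    """Soft-boundary deep SVDD objective for a batch of latent codes z:
    R^2 + (1/(nu*n)) * sum_t max{0, ||z_t - c||^2 - R^2}."""
    dist_sq = ((z - c) ** 2).sum(dim=1)              # squared distance to the centre c
    penalty = torch.clamp(dist_sq - R ** 2, min=0)   # only samples outside the sphere
    return R ** 2 + penalty.mean() / nu

def update_radius(z, c, nu=0.1):
    """With W fixed, a common closed-form step sets R to the (1 - nu)-quantile
    of the distances between the latent codes and the centre."""
    dist = ((z - c) ** 2).sum(dim=1).sqrt()
    return torch.quantile(dist, 1.0 - nu)
```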
In the invention, the deep SVDD module acts directly on the reconstructed frames: it searches for the hypersphere of smallest volume containing all or most high-level features of reconstructed frames of normal events, and guarantees the similarity between the reconstructed images of normal frames through similar low-dimensional features in the latent space, which effectively increases the reconstruction gap between normal and abnormal frames and further improves the accuracy of anomaly detection.
In the specific implementation process, the anomaly detection model is optimized with a training loss function; the reconstructed frame Î_t is constrained both in pixel space and in the latent space of the deep SVDD module: in pixel space the model is optimized with an intensity loss and a weighted RGB loss, and in the latent space with a feature compactness loss.
Specifically, the training loss function is given by:

L = λ_int L_int + λ_rgb L_rgb + λ_compact L_compact

where L_int denotes the intensity loss, L_rgb the weighted RGB loss, L_compact the feature compactness loss, and λ_int, λ_rgb, λ_compact the hyperparameters of the respective losses, which determine their contribution to the total training loss.

The intensity loss L_int is computed as:

L_int = ‖Î_t − I_t‖₂²

where t denotes the t-th frame of the video sequence and ‖·‖₂ denotes the ℓ2 norm.

The weighted RGB loss L_rgb is computed as:

L_rgb = Σ_{i=1}^{N} ((N−i+1)/N) ‖ |Î_t − I_{t−i}| − |I_t − I_{t−i}| ‖₁

where ‖·‖₁ denotes the ℓ1 norm, N denotes the number of previous frames, and frame I_{t−i} carries the weight (N−i+1)/N.

The feature compactness loss is computed as:

L_compact = R² + (1/(νn)) Σ_{t=1}^{n} max{0, ‖φ(Î_t; W) − c‖² − R²}

where c and R denote the center and radius of the hypersphere, respectively, n denotes the number of frames, and φ(Î_t; W) denotes the low-dimensional representation of the reconstructed frame Î_t output by the network with parameters W.
To constrain the reconstructions of all normal frames within a reachable range, the mean of the feature vectors of the reconstructed frames extracted by the first-round training model is taken as the center c. In subsequent training, the Euclidean distance between the feature representation of each reconstructed frame and the center c is calculated, and the feature compactness loss is obtained from this distance.
In the invention, by minimizing the feature compactness loss, the model automatically maps the reconstructions of normal frames close to the center of the hypersphere, obtaining a compact description of normal events. The features of reconstructed frames containing normal events thus lie near the center of the hypersphere, while the features of abnormal events lie far from the center or even outside the hypersphere. This means that the reconstructed images of all normal frames are more similar to one another in pixel space, while the reconstructed image of an abnormal frame differs more from those of normal frames, which increases the distinguishability of anomalies and improves both the detection performance and the accuracy of the anomaly detection model.
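A sketch of the total training loss with the hyperparameter values used in the embodiment (λ_int = 1, λ_rgb = 0.2, λ_compact = 0.01, ν = 0.1); the algebraic form of the weighted RGB loss is an assumption reconstructed from the description above:

```python
import torch
import torch.nn.functional as F

def training_loss(recon, target, prev_frames, z, c, R,
                  lam_int=1.0, lam_rgb=0.2, lam_compact=0.01, nu=0.1):
    """Total loss L = lam_int*L_int + lam_rgb*L_rgb + lam_compact*L_compact.
    recon/target: (C, H, W); prev_frames: (N, C, H, W) frames I_{t-N}..I_{t-1};
    z: (n, d) latent codes of reconstructed frames; c: (d,) centre; R: radius."""
    # Intensity loss: squared l2 error between reconstruction and ground truth.
    l_int = F.mse_loss(recon, target)

    # Weighted RGB loss: l1 distance between the absolute RGB-difference maps
    # of the reconstructed and real frame w.r.t. each previous frame, with
    # frame I_{t-i} weighted by (N - i + 1) / N (form assumed, see lead-in).
    n_prev = prev_frames.shape[0]
    l_rgb = recon.new_zeros(())
    for i in range(1, n_prev + 1):
        w = (n_prev - i + 1) / n_prev
        prev = prev_frames[-i]  # I_{t-i}
        l_rgb = l_rgb + w * F.l1_loss((recon - prev).abs(), (target - prev).abs())

    # Feature compactness loss: soft-boundary deep SVDD term.
    dist_sq = ((z - c) ** 2).sum(dim=1)
    l_compact = R ** 2 + torch.clamp(dist_sq - R ** 2, min=0).mean() / nu

    return lam_int * l_int + lam_rgb * l_rgb + lam_compact * l_compact
```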
In a specific implementation process, the anomaly detection model calculates the corresponding anomaly score through the following steps:

S201: the partial score of each image block in the test video sequence is defined as:

S(P) = (1/|P|) Σ_{(i,j)∈P} (Î_t(i,j) − I_t(i,j))²

where P denotes an image block in frame I_t, i and j denote the spatial position of a pixel within the block, |P| denotes the number of pixels in the block, and the blocks are obtained by sliding a window with stride 4;

S202: the anomaly score of a frame in the test video sequence is calculated as:

Score = max{S(P₁), S(P₂), ..., S(P_m)}

where the size of P is set to 16 × 16 and m denotes the number of image blocks;

S203: after the score of every frame in the test video sequence is obtained, the scores of all frames are normalized to the range [0, 1] to obtain the frame-level anomaly score:

s_t = (Score_t − min_Score) / (max_Score − min_Score)

where min_Score and max_Score denote the minimum and maximum scores in the test video sequence, respectively;

S204: the frame-level anomaly scores are smoothed in the time dimension with a Gaussian filter to obtain the anomaly scores corresponding to the test video sequence.
Through the above steps, the invention can effectively calculate the anomaly score of a test video sequence, and abnormal behaviors or events in the sequence can then be detected based on that score, assisting and improving the effect of anomaly detection.
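A sketch of this scoring procedure, assuming a per-frame mean-squared-error map; the Gaussian smoothing width is a placeholder, as the patent does not state it:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def frame_scores(error_maps, patch=16, stride=4):
    """Per-frame score: the maximum mean error over 16x16 blocks obtained by
    sliding a window with stride 4 (error_maps: list of (H, W) arrays)."""
    scores = []
    for err in error_maps:
        h, w = err.shape
        best = 0.0
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                best = max(best, err[i:i + patch, j:j + patch].mean())
        scores.append(best)
    return np.asarray(scores)

def sequence_scores(scores, sigma=3):
    """Normalize the frame scores of a test sequence to [0, 1], then smooth
    them along the time dimension with a Gaussian filter."""
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
    return gaussian_filter1d(s, sigma=sigma)
```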
In order to better illustrate the advantages of the anomaly detection method of the present invention, the present embodiment also discloses the following experiments:
the experiment was performed on three publicly available data sets, shown in fig. 4, which are a CUHK Avenue data set, a UCSD pedistrin data set, and a university campus anomaly detection data set, respectively.
According to the network structure parameters in fig. 5, the model of the invention is implemented in PyTorch.
To train the model, the Adam optimizer is used with an initial learning rate of 0.0002, and the learning rate is decayed with cosine annealing. The batch size is set to 4, and the numbers of training rounds on CUHK Avenue, UCSD Ped2 and the university campus anomaly detection dataset are 60, 60 and 10, respectively. For all datasets, frames are resized to 256 × 256 pixels, with pixel intensities normalized to the range [−1, 1]. The total length of the input sequence is set to 5 frames, i.e., Δ = 4.
In the training loss function, the hyperparameters λ_int, λ_rgb and λ_compact are set to 1, 0.2 and 0.01, respectively. The ν of the deep SVDD module is set to 0.1 to ensure the model's tolerance to various normal modes. To reduce the memory required for computation, this embodiment does not compute a dedicated mask for every frame in the training set; instead, an OR operation is performed over the masks to generate a common mask for erasure in the next training round. All experiments are conducted on a computer running Ubuntu 16.04 with an Intel(R) Core(TM) i7-7800X CPU @ 3.50 GHz and a GeForce GTX 1080 graphics card with 8 GB of memory.
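As a rough illustration, the stated optimizer settings map onto PyTorch as follows (DESDNet refers to the architecture sketch given earlier; scheduler arguments beyond the 60 training rounds are defaults):

```python
import torch

model = DESDNet()  # the architecture sketch given earlier
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # initial LR 0.0002
# Cosine annealing of the learning rate over the 60 training rounds
# used for CUHK Avenue and UCSD Ped2 (10 rounds for the campus dataset).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
```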
The CUHK Avenue dataset comprises 37 videos: 16 videos with 15328 frames are used to train the model, and the remaining 21 videos with 15324 frames are used to evaluate its anomaly detection performance. The dataset contains 47 abnormal events, including loitering, throwing objects and running, at a resolution of 640 × 360 per frame.
The UCSD Pedestrian dataset comprises the Ped1 (UCSD Pedestrian 1) and Ped2 (UCSD Pedestrian 2) datasets. Experiments are performed on Ped2 but not on Ped1, since the 158 × 238 frame resolution of Ped1 is rather low. Ped2 contains 16 training videos and 12 test videos, each no longer than 200 frames, at a resolution of 360 × 240. The Ped2 dataset contains 12 abnormal events, mainly objects with an abnormal appearance, such as bicycles and trucks on sidewalks.
The university campus anomaly detection dataset is a very challenging video anomaly detection dataset consisting of 13 scenes and more than 270,000 training frames; it contains 330 training videos and 107 test videos, at a resolution of 856 × 480 per frame. It includes 130 abnormal events, such as the appearance of bicycles and skateboards.
This embodiment evaluates anomaly detection performance by the AUC (Area Under the ROC Curve).
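Frame-level AUC can be computed from the anomaly scores and ground-truth frame labels, for example with scikit-learn (the arrays below are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 1, 1, 0])            # 1 = abnormal frame, 0 = normal
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.3])  # frame-level anomaly scores
print("frame-level AUC:", roc_auc_score(labels, scores))
```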
First, the anomaly detection model of the present invention
The anomaly detection model of the invention is compared with typical conventional methods and recent deep-learning-based methods, including DeepOC, Stacked RNN, Liu et al., Lu et al., MESDnet, MemAE, STAE, ST-CaAE, Kim et al., and the like. The AUC performance of each model is shown in Table 1.
TABLE 1AUC Performance comparison results
As can be seen from Table 1, the model of the invention achieves good AUC performance on three different datasets, showing strong competitiveness against state-of-the-art methods. On the CUHK Avenue and UCSD Ped2 datasets, the AUC of the model reaches 89.9% and 97.5%, respectively, outperforming the other methods. The university campus anomaly detection dataset is a new dataset in video anomaly detection, so only a few studies provide test results on it.
On the university campus anomaly detection dataset, the model of the invention does not reach the best AUC, but is only 1.1% below the highest value. Furthermore, to observe the detection performance visually, fig. 6 provides qualitative frame-reconstruction results of the model on the three datasets; as fig. 6 shows, normal regions are reconstructed well while abnormal regions are not.
Second, reconstruction and prediction models relating to the present invention
To evaluate the effect of fusing reconstruction and prediction in the invention, the reconstruction encoder, prediction encoder and decoder are combined into three different models: 1) a reconstruction model consisting of the reconstruction encoder and the decoder, with frame I_t as input; 2) a prediction model consisting of the prediction encoder and the decoder, with the frames from I_{t−Δ} to I_{t−1} as input; 3) the proposed model consisting of the reconstruction encoder, the prediction encoder and the decoder, with the frames from I_{t−Δ} to I_t as input. To remain consistent with the proposed model, a skip connection is used between the encoder and decoder of the prediction model. The training of each model is supervised with the pixel intensity loss, the weighted RGB loss and the feature compactness loss. With these models, the performance of the reconstruction model and the prediction model in detecting anomalies independently can be obtained.
FIG. 7 shows the anomaly scores of video sequences from the Avenue and Ped2 datasets under the three models above. The results show that the model of the invention consistently produces larger reconstruction errors for abnormal frames and smaller errors for normal frames; the averages of the normal and abnormal scores and the gaps between them are shown in fig. 8. Overall, the score gap of the model of the invention is the largest on each dataset, indicating better detection performance. In addition, the AUCs listed in Table 2 also demonstrate that neither the reconstruction model nor the prediction model alone reaches the AUC performance achieved by their combination in the model of the invention.
TABLE 2 AUC comparison of different models
Third, the reconstruction error reverse erasure related to the present invention
Fig. 9 shows the masks used for erasure in different training periods, along with the frame images before and after erasure. As can be seen from fig. 9, the pixels erased in each round are mainly background pixels, which helps the model focus more on the complex foreground; and as the number of training rounds increases, more background pixels remain in the erased frames, indicating that the reconstruction-error gap between foreground and background is shrinking. This reflects that reverse erasure effectively guides the model to reduce the reconstruction error of foreground pixels, which is also verified in the reconstruction error maps provided in fig. 9.
To better demonstrate the advantages of reverse erasure, this embodiment performs an ablation experiment on it. The training losses of the models with and without reverse erasure on Ped2 are shown in fig. 10. Although fig. 10 shows that the model with reverse erasure does not reduce the training loss dramatically, comparison with fig. 9 reveals that its loss reduction is dominated by foreground pixels rather than background pixels; conversely, the model without reverse erasure loses this guidance and treats all regions alike, so model convergence is dominated by the simple background. Finally, the AUC performance of the models with and without reverse erasure is listed in Table 3, with a visual comparison in fig. 11. The results show that the reverse erasure model of the invention has better detection performance.
TABLE 3 AUC comparison of models without and with reverse erase
Fourth, the depth SVDD module related to the invention
Based on t-distributed Stochastic Neighbor Embedding (t-SNE), fig. 12 provides a t-SNE visualization of the low-dimensional representations of reconstructed frames on the Avenue and Ped2 datasets. It can be observed that, in three dimensions, most normal data cluster in the form of a compact sphere, especially on the Ped2 dataset, while abnormal data scatter outside the sphere. This result is attributed to the feature compactness loss based on deep SVDD, which aims to find a minimal-volume hypersphere that contains the normal data but not the abnormal data.
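A sketch of how such a visualization can be produced with scikit-learn's t-SNE; the random array stands in for the mapping encoder's low-dimensional representations, and its dimensionality is an assumption:

```python
import numpy as np
from sklearn.manifold import TSNE

z = np.random.rand(500, 128).astype(np.float32)  # stand-in for latent codes
emb = TSNE(n_components=3, init="pca", random_state=0).fit_transform(z)
print(emb.shape)  # (500, 3): coordinates for a 3-D scatter as in FIG. 12
```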
To verify the advantage of applying deep SVDD after the decoder, three configurations are explored in this experiment: 1) the mapping encoder after the decoder is removed, leaving no constraint on the features; this is a plain dual-encoder single-decoder structure, denoted DESD; 2) deep SVDD is applied at the bottleneck between the encoders and the decoder, i.e., the spatio-temporal representation of the input frames is mapped into a compact hypersphere, denoted DE-SVDD-SD; 3) deep SVDD is applied after the decoder, denoted DESD-SVDD.
The AUC performance of the different configurations is summarized in Table 4. In the table, the feature-based AUC is calculated from the distance between the low-dimensional feature of a frame and the center of the hypersphere. This distance is defined as:

D(Î_t) = ‖φ(Î_t; W*) − c‖²

where W* denotes the parameters of the pre-trained network; a large distance means the low-dimensional feature of the frame deviates strongly from the normal mode. The feature-based anomaly score is then derived from this distance.
From Table 4, it can be observed that DESD-SVDD achieves the highest AUC on both datasets, whether frame-based or feature-based. The frame-based AUC of DE-SVDD-SD is lower than that of DESD-SVDD, confirming that, owing to the strong representation capability of CNNs, abnormal frames reconstructed by the decoder may still not approach normal frames even when the high-level features at the bottleneck are constrained.
TABLE 4 AUC comparison of potential feature spaces under different constraints
Fifth, weighted RGB loss with respect to the invention
The effect of the weighted RGB loss is studied by comparison with a motion loss that computes the RGB difference between two adjacent frames. Table 5 shows that the weighted RGB loss gives a higher AUC on both the Ped2 and Avenue datasets.
TABLE 5 AUC Performance under different motion constraints
Furthermore, the experiments found that fixing the weighted RGB loss weight λ_rgb at 0.2 achieves good detection performance across different datasets. Taking the Ped2 dataset as an example, a sensitivity experiment on λ_rgb was performed; the results are summarized in Table 6.
TABLE 6 AUC comparison of weighted RGB loss for different weights on Ped2 dataset
Sixth, conclusion
This experiment investigates the problems that, in conventional deep-learning-based video anomaly detection, network optimization lacks a focus and the similarity between different normal frames is ignored. In the invention, each frame of a video is reconstructed by the dual-encoder single-decoder network of the anomaly detection module, and a training strategy comprising reconstruction-error-based reverse erasure and deep SVDD is proposed to regularize the training of the network. During training, pixels with smaller errors are removed from the original frames according to the reconstruction error of the previous training round before the frames are input to the model, so that the model concentrates on the pixels with larger errors and the reconstruction quality improves; in addition, deep SVDD maps the reconstructions of normal frames into a minimal-volume hypersphere, making the reconstructions of abnormal frames easier to identify. Experimental results on three datasets show that the method of the invention is competitive with existing methods.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that, while the invention has been described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Meanwhile, the detailed structures, characteristics and the like of the common general knowledge in the embodiments are not described too much. Finally, the scope of the claims should be determined by the content of the claims, and the description of the embodiments and the like in the specification should be used for interpreting the content of the claims.

Claims (10)

1. An anomaly detection method based on reconstruction and prediction, characterized by comprising the steps of:
s1: acquiring a test video sequence to be detected;
s2: inputting the test video sequence into a pre-trained anomaly detection model; the anomaly detection model firstly extracts spatial appearance characteristics and temporal motion characteristics of a test video sequence respectively, then fuses the spatial appearance characteristics and the temporal motion characteristics to obtain corresponding spatio-temporal characteristics, then obtains corresponding reconstructed frames based on the spatio-temporal characteristics, and finally calculates corresponding anomaly scores according to the reconstructed frames;
s3: and taking the abnormal score of the test video sequence as an abnormal detection result.
2. The reconstruction and prediction based anomaly detection method according to claim 1, characterized by: the anomaly detection model comprises a reconstruction encoder used for extracting spatial appearance characteristics, a prediction encoder used for extracting temporal motion characteristics, a fusion module which is connected with the outputs of the reconstruction encoder and the prediction encoder and is used for obtaining space-time characteristics through fusion, and a decoder which is connected with the output of the fusion module and is used for obtaining a reconstruction frame.
3. The reconstruction and prediction based anomaly detection method according to claim 2, characterized by: in step S2, inputting the current frame of the test video sequence to the reconstruction encoder to extract the corresponding spatial appearance feature; a number of frames preceding a current frame of the test video sequence are input to the predictive encoder to extract corresponding temporal motion features.
4. The reconstruction and prediction based anomaly detection method according to claim 2, characterized by: and when the anomaly detection model is trained, reversely erasing the video sequence input in the current round based on the reconstruction error of the previous round of the anomaly detection model so as to remove pixels with the reconstruction error smaller than a preset threshold value in the video sequence and obtain a corresponding erased frame.
5. The reconstruction and prediction based anomaly detection method according to claim 4, characterized by:
I_t denotes the t-th frame in a video sequence and I_{t−Δ} denotes the frame Δ steps before I_t;
the reverse erasure refers to: after each training round except the first, first computing the pixel-level error between the original frame I_t and the reconstructed frame Î_t; then setting the corresponding pixel value in the mask to 1 or 0 according to whether the value of the pixel-level error is greater than a preset threshold, to obtain the corresponding mask; and finally, before the current round of training, multiplying the original frames from I_{t−Δ} to I_t pixel-by-pixel with the mask to obtain the erased frames of the current round of the anomaly detection model, denoted I′_{t−Δ} to I′_t.
6. The reconstruction and prediction based anomaly detection method according to claim 4, characterized by: when the anomaly detection model is trained, a deep SVDD module is connected to the output of the decoder; the deep SVDD module is used to search for the hypersphere of smallest volume containing all or most high-level features of reconstructed frames of normal events, and to use the compactness constraint on the high-level features of the reconstructed frames to make reconstructed normal frames similar, so as to increase the reconstruction gap between normal and abnormal frames.
7. The reconstruction and prediction based anomaly detection method according to claim 6, characterized by: the deep SVDD module comprises a mapping encoder connected to an output of the decoder, and a hypersphere connected to an output of the mapping encoder; the mapping encoder first maps the reconstructed frame Î_t into a low-dimensional latent representation, and the low-dimensional representations are then fitted into a hypersphere of minimal volume to force the anomaly detection model to learn to extract common factors of normal events;

the objective function of the deep SVDD module is defined as:

min_{W,R} R² + (1/(νn)) Σ_{t=1}^{n} max{0, ‖φ(Î_t; W) − c‖² − R²}

where c and R denote the center and radius of the hypersphere, respectively, n denotes the number of frames, φ(Î_t; W) denotes the low-dimensional representation of the reconstructed frame Î_t output by the network with parameters W, and max{·} takes the maximum value.
8. The reconstruction and prediction based anomaly detection method according to claim 6, characterized by: the anomaly detection model is optimized by a training loss function; the reconstructed frame Î_t is constrained in pixel space and in the latent space of the deep SVDD module; in pixel space, the anomaly detection model is optimized based on an intensity loss and a weighted RGB loss; in the latent space, the anomaly detection model is optimized based on a feature compactness loss.
9. The reconstruction and prediction based anomaly detection method according to claim 8, characterized by: the training loss function is given by:

L = λ_int L_int + λ_rgb L_rgb + λ_compact L_compact

where L_int denotes the intensity loss, L_rgb the weighted RGB loss, L_compact the feature compactness loss, and λ_int, λ_rgb, λ_compact the hyperparameters of the respective losses, which determine their contribution to the total training loss;

the intensity loss L_int is computed as:

L_int = ‖Î_t − I_t‖₂²

where t denotes the t-th frame of the video sequence and ‖·‖₂ denotes the ℓ2 norm;

the weighted RGB loss L_rgb is computed as:

L_rgb = Σ_{i=1}^{N} ((N−i+1)/N) ‖ |Î_t − I_{t−i}| − |I_t − I_{t−i}| ‖₁

where ‖·‖₁ denotes the ℓ1 norm, N denotes the number of previous frames, and frame I_{t−i} carries the weight (N−i+1)/N;

the feature compactness loss is computed as:

L_compact = R² + (1/(νn)) Σ_{t=1}^{n} max{0, ‖φ(Î_t; W) − c‖² − R²}

where c and R denote the center and radius of the hypersphere, respectively, n denotes the number of frames, and φ(Î_t; W) denotes the low-dimensional representation of the reconstructed frame Î_t output by the network with parameters W.
10. The reconstruction and prediction based anomaly detection method according to claim 1, characterized by: the anomaly detection model calculates the corresponding anomaly score through the following steps:

S201: the partial score of each image block in the test video sequence is defined as:

S(P) = (1/|P|) Σ_{(i,j)∈P} (Î_t(i,j) − I_t(i,j))²

where P denotes an image block in frame I_t, i and j denote the spatial position of a pixel within the block, |P| denotes the number of pixels in the block, and the blocks are obtained by sliding a window with stride 4;

S202: the anomaly score of a frame in the test video sequence is calculated as:

Score = max{S(P₁), S(P₂), ..., S(P_m)}

where the size of P is set to 16 × 16 and m denotes the number of image blocks;

S203: after the score of every frame in the test video sequence is obtained, the scores of all frames are normalized to the range [0, 1] to obtain the frame-level anomaly score:

s_t = (Score_t − min_Score) / (max_Score − min_Score)

where min_Score and max_Score denote the minimum and maximum scores in the test video sequence, respectively;

S204: the frame-level anomaly scores are smoothed in the time dimension with a Gaussian filter to obtain the anomaly scores corresponding to the test video sequence.
CN202111016334.4A 2021-08-31 2021-08-31 Anomaly detection method based on reconstruction and prediction Active CN113705490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111016334.4A CN113705490B (en) 2021-08-31 2021-08-31 Anomaly detection method based on reconstruction and prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111016334.4A CN113705490B (en) 2021-08-31 2021-08-31 Anomaly detection method based on reconstruction and prediction

Publications (2)

Publication Number Publication Date
CN113705490A true CN113705490A (en) 2021-11-26
CN113705490B CN113705490B (en) 2023-09-12

Family

ID=78658335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111016334.4A Active CN113705490B (en) 2021-08-31 2021-08-31 Anomaly detection method based on reconstruction and prediction

Country Status (1)

Country Link
CN (1) CN113705490B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896307A (en) * 2022-06-30 2022-08-12 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment
CN115527151A (en) * 2022-11-04 2022-12-27 南京理工大学 Video anomaly detection method and system, electronic equipment and storage medium
CN116450880A (en) * 2023-05-11 2023-07-18 湖南承希科技有限公司 Intelligent processing method for vehicle-mounted video of semantic detection

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109116834A (en) * 2018-09-04 2019-01-01 湖州师范学院 A kind of batch process fault detection method based on deep learning
CN109359519A (en) * 2018-09-04 2019-02-19 杭州电子科技大学 A kind of video anomaly detection method based on deep learning
US20190080210A1 (en) * 2017-09-13 2019-03-14 Hrl Laboratories, Llc Independent component analysis of tensors for sensor data fusion and reconstruction
CN109615019A (en) * 2018-12-25 2019-04-12 吉林大学 Anomaly detection method based on space-time autocoder
US20190135300A1 (en) * 2018-12-28 2019-05-09 Intel Corporation Methods and apparatus for unsupervised multimodal anomaly detection for autonomous vehicles
CN111402237A (en) * 2020-03-17 2020-07-10 山东大学 Video image anomaly detection method and system based on space-time cascade self-encoder
US20200292608A1 (en) * 2019-03-13 2020-09-17 General Electric Company Residual-based substation condition monitoring and fault diagnosis
CN112990279A (en) * 2021-02-26 2021-06-18 西安电子科技大学 Radar high-resolution range profile library outside target rejection method based on automatic encoder
CN113052831A (en) * 2021-04-14 2021-06-29 清华大学 Brain medical image anomaly detection method, device, equipment and storage medium
CN113240011A (en) * 2021-05-14 2021-08-10 烟台海颐软件股份有限公司 Deep learning driven abnormity identification and repair method and intelligent system
CN113255518A (en) * 2021-05-25 2021-08-13 神威超算(北京)科技有限公司 Video abnormal event detection method and chip

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080210A1 (en) * 2017-09-13 2019-03-14 Hrl Laboratories, Llc Independent component analysis of tensors for sensor data fusion and reconstruction
CN109116834A (en) * 2018-09-04 2019-01-01 湖州师范学院 A kind of batch process fault detection method based on deep learning
CN109359519A (en) * 2018-09-04 2019-02-19 杭州电子科技大学 A kind of video anomaly detection method based on deep learning
CN109615019A (en) * 2018-12-25 2019-04-12 吉林大学 Anomaly detection method based on space-time autocoder
US20190135300A1 (en) * 2018-12-28 2019-05-09 Intel Corporation Methods and apparatus for unsupervised multimodal anomaly detection for autonomous vehicles
US20200292608A1 (en) * 2019-03-13 2020-09-17 General Electric Company Residual-based substation condition monitoring and fault diagnosis
CN111402237A (en) * 2020-03-17 2020-07-10 山东大学 Video image anomaly detection method and system based on space-time cascade self-encoder
CN112990279A (en) * 2021-02-26 2021-06-18 西安电子科技大学 Radar high-resolution range profile library outside target rejection method based on automatic encoder
CN113052831A (en) * 2021-04-14 2021-06-29 清华大学 Brain medical image anomaly detection method, device, equipment and storage medium
CN113240011A (en) * 2021-05-14 2021-08-10 烟台海颐软件股份有限公司 Deep learning driven abnormity identification and repair method and intelligent system
CN113255518A (en) * 2021-05-25 2021-08-13 神威超算(北京)科技有限公司 Video abnormal event detection method and chip

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
NANJUN LI et al.: "Spatial-Temporal Cascade Autoencoder for Video Anomaly Detection in Crowded Scenes", IEEE Transactions on Multimedia, vol. 23, pages 203-215, XP011826794, DOI: 10.1109/TMM.2020.2984093 *
PENG WU et al.: "A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes", IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 07, pages 2609-2622, XP011797552, DOI: 10.1109/TNNLS.2019.2933554 *
YUANHONG ZHONG et al.: "Reverse erasure guided spatio-temporal autoencoder with compact feature representation for video anomaly detection", Science China Information Sciences, vol. 65, no. 09, pages 1-3 *
XIA Huosong et al.: "Semi-supervised anomaly detection algorithm based on autoencoder and ensemble learning", Computer Engineering & Science, vol. 42, no. 08, pages 1440-1447 *
ZHANG Li: "Research on video abnormal event detection in surveillance scenes", China Masters' Theses Full-text Database, Information Science and Technology, no. 2021, pages 136-431 *
WANG Fenghua et al.: "Mechanical fault diagnosis of on-load tap changers based on Bayes estimation phase-space fusion and CM-SVDD", Proceedings of the CSEE, vol. 40, no. 01, pages 358-368 *
DENG Miao et al.: "An anomaly detection method based on feature regularization constraints", Journal of Sichuan University (Natural Science Edition), vol. 57, no. 06, pages 1077-1083 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896307A (en) * 2022-06-30 2022-08-12 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment
CN114896307B (en) * 2022-06-30 2022-09-27 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment
CN115527151A (en) * 2022-11-04 2022-12-27 南京理工大学 Video anomaly detection method and system, electronic equipment and storage medium
CN115527151B (en) * 2022-11-04 2023-07-11 南京理工大学 Video anomaly detection method, system, electronic equipment and storage medium
CN116450880A (en) * 2023-05-11 2023-07-18 湖南承希科技有限公司 Intelligent processing method for vehicle-mounted video of semantic detection
CN116450880B (en) * 2023-05-11 2023-09-01 湖南承希科技有限公司 Intelligent processing method for vehicle-mounted video of semantic detection

Also Published As

Publication number Publication date
CN113705490B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
Li et al. Flow-grounded spatial-temporal video prediction from still images
Zhao et al. Spatio-temporal autoencoder for video anomaly detection
CN109101896B (en) Video behavior identification method based on space-time fusion characteristics and attention mechanism
Abu Farha et al. When will you do what?-anticipating temporal occurrences of activities
CN109146921B (en) Pedestrian target tracking method based on deep learning
Sun et al. Lattice long short-term memory for human action recognition
CN108805015B (en) Crowd abnormity detection method for weighted convolution self-coding long-short term memory network
Ahuja et al. Probabilistic modeling of deep features for out-of-distribution and adversarial detection
CN108875624B (en) Face detection method based on multi-scale cascade dense connection neural network
CN113705490B (en) Anomaly detection method based on reconstruction and prediction
CN106127804B (en) The method for tracking target of RGB-D data cross-module formula feature learnings based on sparse depth denoising self-encoding encoder
CN112329685A (en) Method for detecting crowd abnormal behaviors through fusion type convolutional neural network
CN112115769A (en) Unsupervised sparse population abnormal behavior detection algorithm based on video
Feng et al. Online learning with self-organizing maps for anomaly detection in crowd scenes
CN107590427B (en) Method for detecting abnormal events of surveillance video based on space-time interest point noise reduction
CN110580472A (en) video foreground detection method based on full convolution network and conditional countermeasure network
CN112906631B (en) Dangerous driving behavior detection method and detection system based on video
CN110188668B (en) Small sample video action classification method
CN112801019B (en) Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data
CN113312973A (en) Method and system for extracting features of gesture recognition key points
Guo et al. Exposing deepfake face forgeries with guided residuals
CN117237994B (en) Method, device and system for counting personnel and detecting behaviors in oil and gas operation area
Katircioglu et al. Self-supervised human detection and segmentation via background inpainting
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN113989709A (en) Target detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant