CN111709411B - Video anomaly detection method and device based on semi-supervised learning - Google Patents
- Publication number
- CN111709411B (publication) · CN202010842914.8A / CN202010842914A (application)
- Authority
- CN
- China
- Prior art keywords
- video
- features
- vector
- feature
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a video anomaly detection method and device based on semi-supervised learning. Video data is sequentially divided into u x v frames of video images; features are extracted from each packet to obtain the corresponding video features; an average vector and an importance vector of the video features are obtained from the video features, a mask for filtering out highly distinctive features is obtained from the average vector of the video features, and a dropout layer of a neural network is obtained from the average vector, the mask and the importance vector; modified features are obtained from the dropout layer and the video feature vectors, and training parameters are obtained from the modified features; during testing, the modified features are obtained from adjacent packets and input into the fully connected network, the score of each packet is calculated, and whether the relevant position is abnormal is judged from the scores. The invention can hide the most distinctive part of the video features to capture global information, and can highlight highly distinctive information areas to enhance the recognition capability of the neural network.
Description
Technical Field
The invention relates to the technical field of video detection, and in particular to a video anomaly detection method based on semi-supervised learning, a video anomaly detection device based on semi-supervised learning, a computer device and a computer program product.
Background
In modern society, video surveillance has become the most important means of security monitoring. However, common surveillance-video workflows require an operator to watch the monitoring feed; when the volume of monitoring data is large, a dedicated viewer tires easily and missed detections readily occur. Determining whether a video contains an anomaly, and locating the anomalous part within it, is therefore an urgent requirement for surveillance management.
In the related art, a portion of the video frames is input into a C3D network (3D convolutional neural network) or similar to obtain the video features of that portion; the features are then input into a fully connected network to compute an anomaly score; finally, the maximum over the per-portion anomaly scores is used to predict whether an anomalous event occurs in the video, and the per-portion scores are used to locate where the anomalous event occurs in the video.
However, in the above scheme the anomaly score is driven mainly by a few highly distinctive local features. Some videos require the neural network to understand the video globally to decide whether an anomaly occurs, so a judgment based only on a few highly distinctive local features may be inaccurate.
Disclosure of Invention
To solve the above technical problem, the invention provides a video anomaly detection method based on semi-supervised learning, which can hide the most distinctive part of the video features to capture global information and can highlight highly distinctive information areas to enhance the recognition capability of a neural network.
The technical scheme adopted by the invention is as follows:
a video anomaly detection method based on semi-supervised learning comprises the following steps: sequentially cutting video data into u x v frames of video images, where each group of v temporally adjacent frames, taken from the beginning, is called a packet, each video can be divided into u packets, and u and v are positive integers; extracting features from each packet to obtain the corresponding video features, each video having u video feature vectors; obtaining an average vector and an importance vector of the video features from the video features, obtaining a mask for filtering out highly distinctive features from the average vector of the video features, and obtaining a dropout layer of the neural network from the average vector, the mask and the importance vector; obtaining modified features from the dropout layer and the video feature vectors, and obtaining training parameters from the modified features; and during testing, obtaining the modified features from adjacent packets, inputting them into a fully connected network, calculating the score of each packet, and judging from the scores whether the relevant position is abnormal.
According to one embodiment of the invention, the importance vector F_s is calculated as F_s = Sigmoid(F_m), where the Sigmoid function has the expression Sigmoid(x) = 1 / (1 + e^(-x)) and F_m denotes the average vector of the video features.
According to one embodiment of the invention, the dropout layer F_d of the neural network is obtained by calculation from the average vector F_m, the mask F_mask and the importance vector F_s, where α is a preset coefficient, s is a random number in [0, 1], and F_mask serves as the mask.
According to one embodiment of the invention, the training parameters are obtained by training the model, wherein a1 and a2 are hyperparameters, F^a denotes a feature of an abnormal video, F^n denotes a feature of a normal video, and the final score is the maximum taken after the corresponding u features have passed through the fully connected network G.
According to one embodiment of the invention, in the expression of the fully connected network G, the input is a modified feature, W and b are the parameters to be trained, and Relu has the expression Relu(x) = max(0, x); when x is a vector, the Relu operation is applied to each element of the vector.
According to an embodiment of the present invention, obtaining the mask for filtering out highly distinctive features from the average vector of the video features comprises: when an element of the average vector of the video features is less than or equal to the product of the maximum element of the average vector and a preset coefficient, the element at the corresponding position in the mask is 1; when an element of the average vector is greater than that product, the element at the corresponding position in the mask is 0.
According to one embodiment of the invention, judging from the score whether the relevant position is abnormal comprises the following steps: judging whether the video score is greater than a preset threshold; if it is, the corresponding picture is judged to be abnormal, so as to determine the position of the anomaly in the video.
The invention also provides a video anomaly detection device based on semi-supervised learning, comprising: a video segmentation module for sequentially dividing video data into u x v frames of video images, where each group of v temporally adjacent frames from the beginning is called a packet, each video can be divided into u packets, and u and v are positive integers; a video feature extraction module for extracting features from each packet to obtain the corresponding video features, each video having u video feature vectors; a neural network training module for obtaining an average vector and an importance vector of the video features from the video features, obtaining a mask for filtering out highly distinctive features from the average vector, obtaining a dropout layer of the neural network from the average vector, the mask and the importance vector, obtaining modified features from the dropout layer and the video feature vectors, and obtaining training parameters from the modified features; and a model testing module for, during testing, obtaining the modified features from adjacent packets, inputting them into the fully connected network, calculating the score of each packet, and judging from the scores whether the relevant position is abnormal.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when the processor executes the program, the above video anomaly detection method based on semi-supervised learning is implemented.
The invention also provides a computer program product; when instructions in the computer program product are executed by a processor, the above video anomaly detection method based on semi-supervised learning is performed.
The invention has the beneficial effects that:
the invention can hide the most distinctive part of the video features to capture global information, and can highlight highly distinctive information areas to enhance the recognition capability of the neural network.
Drawings
Fig. 1 is a flowchart of a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention;
fig. 2 is a block diagram illustrating a video anomaly detection device based on semi-supervised learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention. As shown in fig. 1, a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention may include the following steps:
s1, sequentially dividing the video data into u × v frame video images, wherein v frame video of each adjacent time sequence from the beginning is called a packet, each video can be divided into u packets, and u and v are positive integers.
In an embodiment of the present invention, the method further comprises judging the relationship between the video length and u x v frames: when the video is shorter than u x v frames, the number of head and tail frames to copy is determined from the video length; when the video is longer than u x v frames, the frame-skipping frequency is determined from the video length. For example, when the video is 6 frames short of u x v frames, 3 frames can be copied at each of the beginning and the end of the video.
It should be noted that when the number of frames in the video is not an integer multiple of v, the last frame is copied until the remaining frames reach v. For example, if the video has 16 frames and every 5 frames form a packet, one frame remains; the last frame is copied 4 times and, together with the remaining frame, forms a packet. For another example, if the video has 18 frames and every 5 frames form a packet, 3 frames remain; the last frame is copied 2 times and, together with the remaining 3 frames, forms a packet.
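The packetization and last-frame padding rule described above can be sketched as follows (a minimal numpy sketch; the function name and the toy frame shape are illustrative, not from the patent):

```python
import numpy as np

def split_into_packets(frames, v):
    """Split a (T, H, W, C) frame array into packets of v frames.

    If the last packet would be short, the final frame is copied to
    fill it, mirroring the padding rule described above.
    """
    frames = np.asarray(frames)
    t = frames.shape[0]
    remainder = t % v
    if remainder:
        # copy the last frame (v - remainder) times
        pad = np.repeat(frames[-1:], v - remainder, axis=0)
        frames = np.concatenate([frames, pad], axis=0)
    return frames.reshape(-1, v, *frames.shape[1:])

# Example from the text: 16 frames, v = 5 -> 4 packets; the last
# packet is the single remaining frame plus 4 copies of frame 15.
video = np.arange(16)[:, None, None, None] * np.ones((1, 2, 2, 3))
packets = split_into_packets(video, 5)
```

With 18 frames the same function would copy the last frame twice, matching the second example above.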
During model training, the videos labeled in the training set are divided into normal videos and abnormal videos; for the abnormal videos, because the setting is semi-supervised, the temporal position of the abnormal pictures is unknown.
S2: extract features from each packet to obtain the corresponding video features, each video having u video feature vectors.
In one embodiment of the invention, the corresponding video features may be obtained by a C3D or I3D feature extractor. Each video has u video feature vectors, denoted F_1, F_2, …, F_u, where each F_i is an n-dimensional vector; C3D and I3D are convolutional neural networks of different architectures.
S3: obtain the average vector and the importance vector of the video features from the video features, obtain a mask for filtering out highly distinctive features from the average vector, and obtain the dropout layer of the neural network from the average vector, the mask and the importance vector.
According to one embodiment of the invention, the average vector F_m of the video features is calculated by the following formula (1): F_m = (1/u)(F_1 + F_2 + … + F_u).
Further, the importance vector F_s is calculated by the following formula (2): F_s = Sigmoid(F_m), where Sigmoid(x) = 1 / (1 + e^(-x)).
In an embodiment of the present invention, obtaining the mask for filtering out highly distinctive features from the average vector of the video features comprises: when an element of the average vector is less than or equal to the product of the maximum element of the average vector and a preset coefficient, the element at the corresponding position in the mask is 1; when an element of the average vector is greater than that product, the element at the corresponding position in the mask is 0.
That is, each element of F_m is compared with α · max(F_m), where α is a preset coefficient and max(F_m) denotes the maximum element of the vector F_m; when an element of F_m is less than or equal to α · max(F_m), the element of F_mask at the corresponding position is 1, otherwise it is 0.
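The average vector, importance vector and mask can be sketched together in numpy. Two points are assumptions, hedged here: the importance vector is taken as the Sigmoid of F_m (as the surrounding text suggests), and α = 0.7 is only an illustrative value — the patent does not fix it:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def average_importance_mask(features, alpha):
    """features: (u, n) array stacking the packet features F_1..F_u.

    Returns F_m (average vector), F_s (importance vector, taken here
    as sigmoid of F_m) and F_mask (1 where F_m <= alpha * max(F_m),
    else 0), following the comparison rule described above.
    """
    f_m = features.mean(axis=0)
    f_s = sigmoid(f_m)
    f_mask = (f_m <= alpha * f_m.max()).astype(features.dtype)
    return f_m, f_s, f_mask

# Toy example: the third dimension dominates F_m (10 > 0.7 * 10 = 7),
# so the mask zeroes it out and keeps the less distinctive dimensions.
feats = np.array([[1.0, 2.0, 10.0],
                  [3.0, 2.0, 10.0]])
f_m, f_s, f_mask = average_importance_mask(feats, alpha=0.7)
```

This is the mechanism by which the most distinctive feature dimensions can be hidden, forcing the network to use global information.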
Further, the dropout layer F_d of the neural network is calculated by the following formula (3) from F_m, F_mask and F_s, where α is the preset coefficient, s is a random number in [0, 1], and F_mask serves as the mask.
S4: obtain the modified features from the dropout layer and the video feature vectors, and obtain the training parameters from the modified features.
In one embodiment of the invention, the dropout layer F_d is elementwise multiplied with each of the video feature vectors F_1, F_2, …, F_u to obtain the modified features F~_1, F~_2, …, F~_u. The modified features are input into the fully connected network G to obtain the output value of the final layer of G, and the training parameters are obtained by training the model with the following formula (4), wherein a1 and a2 are hyperparameters, F^a denotes a feature of an abnormal video, F^n denotes a feature of a normal video, and the final score is the maximum taken after the corresponding u features have passed through the fully connected network G.
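The elementwise (Hadamard) product step can be sketched directly; the toy dropout-layer vector below is illustrative only, since formula (3) for F_d is not reproduced in the text:

```python
import numpy as np

def modified_features(features, f_d):
    """Elementwise product of the dropout-layer vector F_d with each
    packet feature F_i, giving the modified features F~_1..F~_u.
    features: (u, n); f_d: (n,), broadcast over the u rows.
    """
    return features * f_d

feats = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
f_d = np.array([0.0, 1.0])   # toy dropout-layer vector (assumed values)
f_mod = modified_features(feats, f_d)
```

A zero in F_d suppresses the corresponding feature dimension in every packet, which is how the dropout layer hides or reweights dimensions before scoring.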
S5: during testing, input the modified features obtained from adjacent packets into the fully connected network, calculate the score of each packet, and judge from the scores whether the relevant position is abnormal.
In one embodiment of the invention, in the expression for the fully connected network G, the input is a modified feature, W and b are the parameters to be trained, and Relu has the expression Relu(x) = max(0, x); when x is a vector, the Relu operation is applied to each element of the vector, and a Sigmoid activation function produces the final score.
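A minimal sketch of such a scoring network G, using only the properties stated in the text (Relu hidden activation, Sigmoid-activated scalar score). The number and width of layers, and the random initialization, are assumptions — the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FullyConnectedG:
    """Sketch of the scoring network G: one Relu hidden layer and a
    Sigmoid on the final scalar score. Layer sizes are assumed."""

    def __init__(self, n_in, n_hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, f):
        # f: (u, n_in) modified features -> (u,) scores in (0, 1)
        h = relu(f @ self.w1 + self.b1)
        return sigmoid(h @ self.w2 + self.b2)[..., 0]

g = FullyConnectedG(4)
scores = g(np.ones((3, 4)))
```

Because of the Sigmoid, every packet score lies strictly between 0 and 1, which is what makes a fixed threshold on the score meaningful at test time.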
When testing for video anomalies, the dropout layer used during training is no longer applied; the judgment is made directly from the function max_i(G(F_i)).
To summarize: first the videos are labeled, the label types being normal video and abnormal video, and then the model is trained. Feature extraction is performed on the video; the average vector and the importance vector of the video features are obtained; the mask is obtained from the average vector of the video features; the dropout layer is then obtained by the corresponding formula; the modified feature vectors are obtained from the dropout layer and the feature vectors and input into the fully connected network; the maximum over the corresponding u features after passing through the fully connected network G is obtained; and this value is used in the training model to obtain the training parameters. When the score of a video is calculated, the modified feature vectors are obtained by the steps above and substituted into the expression of the fully connected network to obtain the final score. In the testing stage, the dropout layer used in training is not needed; the modified feature vectors are substituted directly into the fully connected network to obtain the final score.
According to one embodiment of the invention, judging from the score whether the relevant position is abnormal comprises the following steps: judging whether the score is greater than a preset threshold; if it is, the corresponding picture is judged to be abnormal, so as to determine the position of the anomaly in the video.
That is, the position of the anomaly in the video may be located from the scores G(F_i): if a score is greater than a certain threshold, the corresponding position is determined to be abnormal.
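The test-time decision rule above can be sketched in a few lines; the threshold value 0.5 is illustrative only, since the patent leaves the threshold as a preset value:

```python
import numpy as np

def locate_anomalies(packet_scores, threshold=0.5):
    """Given per-packet scores G(F_i), return whether the video is
    anomalous (any score above the threshold) and the indices of the
    anomalous packets, which localize the anomaly in time.
    """
    packet_scores = np.asarray(packet_scores)
    flags = packet_scores > threshold
    return bool(flags.any()), np.flatnonzero(flags)

is_abnormal, where = locate_anomalies([0.1, 0.9, 0.3], threshold=0.5)
```

Note that `flags.any()` is equivalent to thresholding max_i G(F_i), matching the video-level decision described earlier.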
In summary, the present invention can hide the most distinctive part of the video features to capture global information, and can highlight highly distinctive information areas to enhance the recognition capability of the neural network.
Fig. 2 is a block diagram illustrating a video anomaly detection device based on semi-supervised learning according to an embodiment of the present invention. As shown in fig. 2, the video anomaly detection device based on semi-supervised learning according to the embodiment of the present invention may include: a video segmentation module 10, a video feature extraction module 20, a neural network training module 30 and a model testing module 40.
The video segmentation module 10 is configured to sequentially divide video data into u x v frames of video images, where each group of v temporally adjacent frames from the beginning is called a packet, each video may be divided into u packets, and u and v are positive integers. The video feature extraction module 20 is configured to extract features from each packet to obtain the corresponding video features, each video having u video feature vectors. The neural network training module 30 is configured to obtain an average vector and an importance vector of the video features from the video features, obtain a mask for filtering out highly distinctive features from the average vector, obtain a dropout layer of the neural network from the average vector, the mask and the importance vector, obtain modified features from the dropout layer and the video feature vectors, and obtain training parameters from the modified features. The model testing module 40 is configured to, during testing, input the modified features obtained from adjacent packets into the fully connected network, calculate the score of each packet, and judge from the scores whether the relevant position is abnormal.
According to an embodiment of the present invention, the neural network training module 30 calculates the importance vector F_s as F_s = Sigmoid(F_m), where F_m is the average vector of the video features.
According to one embodiment of the present invention, the neural network training module 30 calculates the dropout layer F_d of the neural network from F_m, F_mask and F_s, where α is a preset coefficient, s is a random number in [0, 1], and F_mask serves as the mask.
According to one embodiment of the present invention, the neural network training module 30 obtains the training parameters through the training model, wherein a1 and a2 are hyperparameters, F^a denotes a feature of an abnormal video, F^n denotes a feature of a normal video, and the final score is the maximum taken after the corresponding u features have passed through the fully connected network G.
According to one embodiment of the invention, in the expression of the fully connected network G of the model testing module 40, the input is a modified feature, W and b are the parameters to be trained, and Relu has the expression Relu(x) = max(0, x); when x is a vector, the Relu operation is applied to each element of the vector.
According to an embodiment of the present invention, the neural network training module 30 obtains the mask for filtering out highly distinctive features from the average vector of the video features. Specifically, when an element of the average vector is less than or equal to the product of the maximum element of the average vector and a preset coefficient, the element at the corresponding position in the mask is 1; when an element of the average vector is greater than that product, the element at the corresponding position in the mask is 0.
According to an embodiment of the invention, the model testing module 40 is further configured to: judge whether the score is greater than a preset threshold; if it is, judge the corresponding picture to be abnormal, so as to determine the position of the anomaly in the video.
It should be noted that details that are not disclosed in the video anomaly detection apparatus based on semi-supervised learning according to the embodiment of the present invention refer to details disclosed in the video anomaly detection method based on semi-supervised learning according to the embodiment of the present invention, and details are not described here again.
The invention can hide the most distinctive part of the video features to capture global information, and can highlight highly distinctive information areas to enhance the recognition capability of the neural network.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and when the processor executes the computer program, the video anomaly detection method based on semi-supervised learning can be realized.
According to the computer device of the embodiment of the invention, when the processor executes the computer program stored on the memory, the most distinctive part of the video features can be hidden to capture global information, and highly distinctive information areas can be highlighted to enhance the recognition capability of the neural network.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer-readable storage medium of an embodiment of the present invention stores thereon a computer program, which, when executed by a processor, can implement the video anomaly detection method based on semi-supervised learning according to the above-described embodiment of the present invention.
According to the non-transitory computer-readable storage medium of the embodiment of the invention, when the processor executes the computer program stored on the medium, the most distinctive part of the video features can be hidden to capture global information, and highly distinctive information areas can be highlighted to enhance the recognition capability of the neural network.
The present invention also provides a computer program product corresponding to the above embodiments.
When the instructions in the computer program product of the embodiment of the present invention are executed by the processor, the video anomaly detection method based on semi-supervised learning according to the above-mentioned embodiment of the present invention can be executed.
According to the computer program product of the embodiment of the invention, when the processor executes the instructions, the most distinctive part of the video features can be hidden to capture global information, and highly distinctive information areas can be highlighted to enhance the recognition capability of the neural network.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, a first feature "on" or "under" a second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediary. Also, a first feature "on," "over," or "above" a second feature may be directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature "under," "below," or "beneath" a second feature may be directly or obliquely under the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (9)
1. A video anomaly detection method based on semi-supervised learning is characterized by comprising the following steps:
sequentially cutting video data into u × v frames of video images, wherein every v consecutive frames from the beginning are called a packet, so that each video can be divided into u packets, u and v being positive integers;
respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors;
obtaining an average vector and an importance vector of the video features from the video features, obtaining, from the average vector of the video features, a mask for filtering features with strong discriminability, and obtaining a conjugate layer of a neural network from the average vector of the video features, the mask, and the importance vector;
obtaining modified features from the conjugate layer and the video feature vectors, and obtaining training parameters from the modified features, wherein the conjugate layer and each video feature vector are point-multiplied (element-wise multiplied) to obtain the modified features;
and during testing, obtaining the modified characteristics according to adjacent packets, inputting the modified characteristics into a full-connection network, calculating to obtain the score of each packet, and judging whether the relevant position is abnormal or not according to the scores.
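The pipeline of claim 1 can be sketched in a few lines of NumPy. Everything here is an illustrative stand-in, not the patented model: the feature extractor is a trivial per-pixel mean rather than a real network, and the conjugate-layer weights are random placeholders.

```python
import numpy as np

def split_into_packets(frames, u, v):
    """Cut the first u*v frames into u packets of v consecutive frames each."""
    frames = frames[: u * v]
    return frames.reshape(u, v, *frames.shape[1:])

# toy video: 40 "frames" of 8x8 grayscale, split into u=4 packets of v=8 frames
rng = np.random.default_rng(0)
video = rng.random((40, 8, 8))
packets = split_into_packets(video, u=4, v=8)

def extract_features(packet):
    # stand-in for a real feature extractor (e.g. a 3D CNN): per-pixel mean
    return packet.mean(axis=0).ravel()

features = np.stack([extract_features(p) for p in packets])  # shape (u, d)

# the "conjugate layer" modeled as a per-dimension weight vector; the modified
# features are its point (element-wise) product with each packet's feature vector
conjugate_layer = rng.random(features.shape[1])
modified = features * conjugate_layer  # shape (u, d)
print(modified.shape)
```

In the claimed method the conjugate layer would be computed from the average vector, mask, and importance vector rather than drawn at random; the sketch only shows the data flow from raw frames to modified per-packet features.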
3. The semi-supervised learning based video anomaly detection method of claim 2, wherein the conjugate layer F of the neural network is obtained by the following formula:
4. The video anomaly detection method based on semi-supervised learning according to claim 3, wherein the training parameters are obtained through a training model,
wherein a1 and a2 are hyper-parameters, one term denotes the features of an abnormal video, another term denotes the features of a normal video, and the remaining term, with 1 ≤ i ≤ u, denotes taking the maximum of the final scores of the corresponding u features after passing through the fully-connected network G.
5. The video anomaly detection method based on semi-supervised learning according to claim 1, wherein the expression of the fully connected network G is as follows:
6. The video anomaly detection method based on semi-supervised learning according to claim 1, wherein obtaining, from the average vector of the video features, the mask for filtering features with strong discriminability comprises:
when an element of the average vector of the video features is less than or equal to the product of the maximum element of the average vector and a preset coefficient, the element at the corresponding position in the mask is 1; and
when an element of the average vector of the video features is greater than the product of the maximum element and the preset coefficient, the element at the corresponding position in the mask is 0.
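The mask rule of claim 6 reduces to one vectorized comparison. The values of the average vector and the coefficient below are made up for illustration; only the thresholding logic follows the claim.

```python
import numpy as np

avg = np.array([0.9, 0.1, 0.5, 0.05])  # average vector of the video features (toy values)
coeff = 0.6                            # preset coefficient (assumed value)

threshold = coeff * avg.max()          # product of the maximum element and the coefficient
mask = (avg <= threshold).astype(int)  # 1 where element <= threshold, 0 where it exceeds it
print(mask)  # -> [0 1 1 1]
```

With these toy numbers the threshold is 0.54, so only the 0.9 entry (the dominant, strongly discriminative dimension) is zeroed out by the mask.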
7. The video anomaly detection method based on semi-supervised learning as recited in claim 1, wherein determining whether an anomaly occurs in a relevant position according to the score comprises:
judging whether the score is larger than a preset threshold value or not;
and if the score is greater than the preset threshold value, determining that the current frame is an abnormal frame, so as to locate the position of the anomaly in the video.
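The decision step of claim 7 is a simple per-packet threshold test; the scores and threshold below are illustrative values, not outputs of the patented network.

```python
import numpy as np

scores = np.array([0.12, 0.85, 0.30, 0.91])  # one score per packet (toy values)
tau = 0.5                                    # preset threshold (assumed value)

abnormal = scores > tau  # boolean flag per packet
for i, flag in enumerate(abnormal):
    if flag:
        print(f"packet {i}: abnormal")
```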
8. A video anomaly detection device based on semi-supervised learning is characterized by comprising:
the video segmentation module is used for sequentially cutting the video data into u × v frames of video images, wherein every v consecutive frames from the beginning are called a packet, so that each video can be divided into u packets, u and v being positive integers;
the video feature extraction module is used for respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors;
a neural network training module, configured to obtain an average vector and an importance vector of the video features from the video features, obtain, from the average vector of the video features, a mask for filtering features with strong discriminability, obtain a conjugate layer of a neural network from the average vector of the video features, the mask, and the importance vector, obtain modified features from the conjugate layer and the video feature vectors, and obtain training parameters from the modified features, wherein the conjugate layer and each video feature vector are point-multiplied to obtain the modified features;
and the model testing module is used for obtaining the modified characteristics according to adjacent packets and inputting the modified characteristics into the full-connection network during testing, calculating the score of each packet and judging whether the relevant position is abnormal or not according to the scores.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the video anomaly detection method based on semi-supervised learning according to any one of claims 1 to 7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010842914.8A CN111709411B (en) | 2020-08-20 | 2020-08-20 | Video anomaly detection method and device based on semi-supervised learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010842914.8A CN111709411B (en) | 2020-08-20 | 2020-08-20 | Video anomaly detection method and device based on semi-supervised learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111709411A CN111709411A (en) | 2020-09-25 |
CN111709411B true CN111709411B (en) | 2020-11-10 |
Family
ID=72547386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010842914.8A Active CN111709411B (en) | 2020-08-20 | 2020-08-20 | Video anomaly detection method and device based on semi-supervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709411B (en) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018227105A1 (en) * | 2017-06-08 | 2018-12-13 | The United States Of America, As Represented By The Secretary, Department Of Health And Human Services | Progressive and multi-path holistically nested networks for segmentation |
EP3625727A1 (en) * | 2017-11-14 | 2020-03-25 | Google LLC | Weakly-supervised action localization by sparse temporal pooling network |
CN110516536B (en) * | 2019-07-12 | 2022-03-18 | 杭州电子科技大学 | Weak supervision video behavior detection method based on time sequence class activation graph complementation |
CN110502988A (en) * | 2019-07-15 | 2019-11-26 | 武汉大学 | Group positioning and anomaly detection method in video |
CN111291699B (en) * | 2020-02-19 | 2022-06-03 | 山东大学 | Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection |
- 2020-08-20 CN CN202010842914.8A patent/CN111709411B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111709411A (en) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Khodabakhsh et al. | Fake face detection methods: Can they be generalized? | |
CN108810620B (en) | Method, device, equipment and storage medium for identifying key time points in video | |
CN107122806B (en) | Sensitive image identification method and device | |
CN110047095B (en) | Tracking method and device based on target detection and terminal equipment | |
JP7006702B2 (en) | Image processing equipment, image processing methods and programs | |
CN109543760A (en) | Confrontation sample testing method based on image filters algorithm | |
CN112508950B (en) | Anomaly detection method and device | |
CN111179295B (en) | Improved two-dimensional Otsu threshold image segmentation method and system | |
CN114120127A (en) | Target detection method, device and related equipment | |
CN112001401A (en) | Training model and training method of example segmentation network, and example segmentation network | |
CN113420745A (en) | Image-based target identification method, system, storage medium and terminal equipment | |
CN112597928A (en) | Event detection method and related device | |
CN113781483B (en) | Industrial product appearance defect detection method and device | |
CN113743378B (en) | Fire monitoring method and device based on video | |
JP6874864B2 (en) | Image processing equipment, image processing methods and programs | |
CN116452966A (en) | Target detection method, device and equipment for underwater image and storage medium | |
CN113706837B (en) | Engine abnormal state detection method and device | |
CN111709411B (en) | Video anomaly detection method and device based on semi-supervised learning | |
CN112052823A (en) | Target detection method and device | |
CN112287905A (en) | Vehicle damage identification method, device, equipment and storage medium | |
CN114187292B (en) | Abnormality detection method, apparatus, device and storage medium for cotton spinning paper tube | |
CN115861315A (en) | Defect detection method and device | |
CN115249316A (en) | Industrial defect detection method and device | |
CN114612710A (en) | Image detection method, image detection device, computer equipment and storage medium | |
CN112668451A (en) | Crowd density real-time monitoring method based on YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20220107
Address after: 315000 No. 138-1, Zhongshan West Road, Fenghua District, Ningbo City, Zhejiang Province (self declaration)
Patentee after: Shenlan industrial intelligent Innovation Research Institute (Ningbo) Co.,Ltd.
Address before: 213000 No.103, building 4, Chuangyan port, Changzhou science and Education City, No.18, middle Changwu Road, Wujin District, Changzhou City, Jiangsu Province
Patentee before: SHENLAN ARTIFICIAL INTELLIGENCE CHIP RESEARCH INSTITUTE (JIANGSU) Co.,Ltd.