CN111709411B - Video anomaly detection method and device based on semi-supervised learning - Google Patents

Video anomaly detection method and device based on semi-supervised learning Download PDF

Info

Publication number
CN111709411B
CN111709411B CN202010842914.8A CN202010842914A CN111709411B CN 111709411 B CN111709411 B CN 111709411B CN 202010842914 A CN202010842914 A CN 202010842914A CN 111709411 B CN111709411 B CN 111709411B
Authority
CN
China
Prior art keywords
video
features
vector
feature
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010842914.8A
Other languages
Chinese (zh)
Other versions
CN111709411A (en
Inventor
陈海波
张雷武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenlan industrial intelligent Innovation Research Institute (Ningbo) Co.,Ltd.
Original Assignee
DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DeepBlue AI Chips Research Institute Jiangsu Co Ltd filed Critical DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority to CN202010842914.8A priority Critical patent/CN111709411B/en
Publication of CN111709411A publication Critical patent/CN111709411A/en
Application granted granted Critical
Publication of CN111709411B publication Critical patent/CN111709411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video anomaly detection method and device based on semi-supervised learning, which sequentially divides video data into u x v frame video images; respectively extracting the characteristics of each packet to obtain corresponding video characteristics; obtaining an average vector and an importance vector of video features according to the video features, obtaining a mask of the features with strong filtering distinguishability according to the average vector of the video features, and obtaining a layer of a neural network from the average vector, the mask and the importance vector of the video features; obtaining modified characteristics according to the layer of the conjugate and the video characteristic vector, and obtaining training parameters according to the modified characteristics; and during testing, obtaining the modified characteristics according to adjacent packets, inputting the modified characteristics into the full-connection network, calculating to obtain the score of each packet, and judging whether the related position is abnormal or not according to the scores. The invention can hide the most differentiated part in the video characteristics to capture the whole information and can highlight the information area with strong distinctiveness to enhance the recognition capability of the neural network.

Description

Video anomaly detection method and device based on semi-supervised learning
Technical Field
The invention relates to the technical field of video detection, in particular to a video abnormity detection method based on semi-supervised learning, a video abnormity detection device based on semi-supervised learning, computer equipment and a computer program product.
Background
In modern society, video monitoring technology becomes the most important security monitoring means at present. However, in the common monitoring video processing method, a manager is required to check a monitoring picture, and when monitoring data is large, a specially-assigned person is very easy to fatigue when watching the monitoring picture, and the phenomenon of missing detection is easy to occur. Therefore, determining whether the video is abnormal or not and locating the abnormal part in the video are urgent requirements for monitoring management.
In the related technology, a part of pictures of a video is input into a C3D network (3D convolutional neural network) or the like to obtain the video characteristics of the part, then the part of characteristics are input into a full link network to calculate an abnormal score, finally, the maximum value is taken according to the abnormal score of each part to predict whether an abnormal event occurs in the video, and meanwhile, the position where the abnormal event occurs in the video is positioned according to the score of each part.
However, the contribution of the anomaly score in the above scheme is mainly determined by some significant local features, and in some videos, the neural network is required to understand the global video to determine whether an anomaly occurs, so that the determination may not be accurate only by considering some local features with strong distinctiveness.
Disclosure of Invention
The invention provides a video anomaly detection method based on semi-supervised learning to solve the technical problems, which can hide the most distinguished part in video features to capture overall information and highlight the information area with strong distinctiveness to enhance the recognition capability of a neural network.
The technical scheme adopted by the invention is as follows:
a video anomaly detection method based on semi-supervised learning comprises the following steps: sequentially cutting video data into u x v frame video images, wherein v frame video of each adjacent time sequence from the beginning is called a packet, each video can be divided into u packets, and u and v are positive integers; respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors; obtaining an average vector and an importance vector of video features according to the video features, obtaining a mask of the features with strong filtering distinguishability according to the average vector of the video features, and obtaining a conjugate layer of a neural network according to the average vector of the video features, the mask and the importance vector; obtaining modified features from the layer of the conjugate and the video feature vector, and obtaining training parameters from the modified features; and during testing, obtaining the modified characteristics according to adjacent packets, inputting the modified characteristics into a full-connection network, calculating to obtain the score of each packet, and judging whether the relevant position is abnormal or not according to the scores.
According to one embodiment of the invention, the importance vector F is obtained by calculation using the following formulas
Figure 380874DEST_PATH_IMAGE001
Wherein, Sigmoid function expressionIs composed of
Figure 991240DEST_PATH_IMAGE002
,FmAn average vector representing the video features.
According to one embodiment of the invention, the conjugated layer F of the neural network is obtained by calculation using the following formulad
Figure 572525DEST_PATH_IMAGE003
Wherein the content of the first and second substances,
Figure 205501DEST_PATH_IMAGE004
wherein α is a predetermined coefficient, and
Figure 555842DEST_PATH_IMAGE005
s is [0,1 ]]A random number in between, and a random number,
Figure 325391DEST_PATH_IMAGE006
is used as a mask.
According to one embodiment of the invention, the training parameters are obtained by a training model,
Figure 393972DEST_PATH_IMAGE007
wherein the content of the first and second substances,
Figure 50212DEST_PATH_IMAGE008
Figure 520639DEST_PATH_IMAGE009
Figure 518813DEST_PATH_IMAGE010
,a1、a2in order to be a hyper-parameter,
Figure 543532DEST_PATH_IMAGE011
in order to be a feature of an abnormal video,
Figure 19775DEST_PATH_IMAGE012
is a feature of a normal video,
Figure 46506DEST_PATH_IMAGE013
indicating that the maximum value is taken for the final score after the corresponding u features have passed through the fully connected network G.
According to one embodiment of the invention, the expression of the fully connected network G is:
Figure 289617DEST_PATH_IMAGE014
wherein the content of the first and second substances,
Figure 270474DEST_PATH_IMAGE015
for the purpose of the characteristics after the modification,
Figure 815987DEST_PATH_IMAGE016
and
Figure 431645DEST_PATH_IMAGE017
Figure 771621DEST_PATH_IMAGE018
) For the parameter to be trained, Relu has the expression
Figure 505353DEST_PATH_IMAGE019
When x is a vector, a Relu operation is performed on each element in the vector.
According to an embodiment of the present invention, obtaining a mask of a feature with strong filtering distinguishability according to an average vector of the video features includes: when the average vector of the video features is smaller than or equal to the product of the maximum element corresponding to the average vector and a preset coefficient, the element value of the corresponding position in the mask is 1; and when the average vector of the video features is larger than the product of the corresponding maximum element and a preset coefficient, the element value of the corresponding position in the mask is 0.
According to one embodiment of the invention, the determining whether the relevant position is abnormal or not according to the score comprises the following steps: judging whether the video score is larger than a preset threshold value or not; if the current picture is larger than the abnormal picture, the current picture is judged to be the abnormal picture so as to determine the abnormal position in the video.
The invention also provides a video anomaly detection device based on semi-supervised learning, which comprises: the video segmentation module is used for sequentially segmenting the video data into u x v frame video images, wherein v frame videos of each adjacent time sequence from the beginning are called a packet, each video can be divided into u packets, and u and v are positive integers; the video feature extraction module is used for respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors; the neural network training module is used for obtaining an average vector and an importance vector of a video feature according to the video feature, obtaining a mask of a feature with strong filtering distinguishability according to the average vector of the video feature, obtaining a dropped layer of a neural network according to the average vector of the video feature, the mask and the importance vector, obtaining a modified feature according to the dropped layer and the video feature vector and obtaining a training parameter according to the modified feature; and the model testing module is used for obtaining the modified characteristics according to adjacent packets and inputting the modified characteristics into the full-connection network during testing, calculating the score of each packet and judging whether the relevant position is abnormal or not according to the scores.
The invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the video abnormity detection method based on semi-supervised learning is realized.
The invention also provides a computer program product, and when instructions in the computer program product are executed by a processor, the video anomaly detection method based on semi-supervised learning is executed.
The invention has the beneficial effects that:
the invention can hide the most differentiated part in the video characteristics to capture the whole information and can highlight the information area with strong distinctiveness to enhance the recognition capability of the neural network.
Drawings
Fig. 1 is a flowchart of a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention;
fig. 2 is a block diagram illustrating video anomaly detection based on semi-supervised learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention. As shown in fig. 1, a video anomaly detection method based on semi-supervised learning according to an embodiment of the present invention may include the following steps:
s1, sequentially dividing the video data into u × v frame video images, wherein v frame video of each adjacent time sequence from the beginning is called a packet, each video can be divided into u packets, and u and v are positive integers.
In an embodiment of the present invention, the method further includes: judging the relationship between the video size and the uxv frame, wherein when the video size is less than the uxv frame, the frame number of the head frame and the tail frame of the copied video is determined according to the video size; when the video is larger than the u × v frame, the frequency of frame skipping is determined according to the video size. For example, when the video size is still different by 6 frames to reach u × v frames, 3 frames from the beginning to the end of the video can be copied.
It should be noted that, when the frame number of the video is not an integer multiple of v, the last frame of the remaining frame number is copied to reach v frames, for example, the frame number of the video is 16 frames, each 5 frames of the video is a packet, one frame of the video remains, and 4 frames of the video can form a packet, so that the last frame of the video is copied 4 times, and forms a packet together with the remaining frame; for another example, if the number of frames of the video is 18, each 5 frames of the video is a packet, and the remaining 3 frames of the video are 2 frames, then the last frame is copied 2 times and the remaining 3 frames together form a packet.
During model training, videos marked by a training set are divided into normal videos and abnormal videos, and for the abnormal videos, due to the fact that the videos are in a semi-supervised mode, the time sequence position of an abnormal picture is unknown.
And S2, respectively extracting the features of each packet to obtain corresponding video features, wherein each video has u video feature vectors.
In one embodiment of the invention, the corresponding video features may be obtained by a C3D or I3D feature extractor. Each video has u video feature vectors, which are respectively recorded as: f1, F2,…,FuWherein
Figure 854557DEST_PATH_IMAGE020
For n-dimensional vectors, C3D and I3D are convolutional neural networks of different architectures.
And S3, obtaining the average vector and the importance vector of the video features according to the video features, obtaining a mask of the features with strong filtering distinguishability according to the average vector of the video features, and obtaining a dropped layer of the neural network according to the average vector, the mask and the importance vector of the video features.
According to one embodiment of the invention, the average vector F of the video features is obtained by calculation according to the following formula (1)m
Figure 91766DEST_PATH_IMAGE021
, (1)
Further, the importance vector F is obtained by calculation of the following formula (2)s
Figure 632337DEST_PATH_IMAGE022
(2)
Wherein the Sigmoid function expression is
Figure 587786DEST_PATH_IMAGE023
,FmAn average vector representing the video features.
In an embodiment of the present invention, obtaining a mask with very distinctive filtering features according to an average vector of video features includes: when the average vector of the video features is smaller than or equal to the product of the maximum element corresponding to the average vector and a preset coefficient, the element value of the corresponding position in the mask is 1; when the average vector of the video features is larger than the product of the maximum element corresponding to the average vector and the preset coefficient, the element value of the corresponding position in the mask is 0.
That is, F is judgedmAnd
Figure 463383DEST_PATH_IMAGE024
wherein α is a predetermined coefficient, and
Figure 86257DEST_PATH_IMAGE025
,max(Fm) Represents a pair vector FmTaking the maximum element of the strain, when FmIs less than or equal to
Figure 486145DEST_PATH_IMAGE024
When F is presentmaskThe element value of the corresponding position is 1, otherwise, 0.
Further, a conjugated layer F of the neural network is obtained by calculation using the following formula (3)d
Figure 725628DEST_PATH_IMAGE026
(3)
Wherein the content of the first and second substances,
Figure 668832DEST_PATH_IMAGE027
wherein α is a predetermined coefficient, and
Figure 954668DEST_PATH_IMAGE025
s is [0,1 ]]A random number in between, and a random number,
Figure 807349DEST_PATH_IMAGE028
is used as a mask.
And S4, obtaining modified characteristics according to the layer of the conjugate and the video characteristic vector, and obtaining training parameters according to the modified characteristics.
In one embodiment of the invention, the conjugate layer FdAnd video feature vector F1, F2, …,FuRespectively dot-multiplied to obtain modified features
Figure 737390DEST_PATH_IMAGE029
Figure 465043DEST_PATH_IMAGE030
,…,
Figure 531350DEST_PATH_IMAGE031
. And the modified features
Figure 820511DEST_PATH_IMAGE029
Figure 300165DEST_PATH_IMAGE030
,…,
Figure 67395DEST_PATH_IMAGE032
Inputting the full-connection network G to obtain the final layer output value of G, and obtaining the training parameter by the following formula (4) training model,
Figure 581685DEST_PATH_IMAGE033
(4)
wherein the content of the first and second substances,
Figure 540282DEST_PATH_IMAGE034
Figure 121612DEST_PATH_IMAGE035
Figure 489271DEST_PATH_IMAGE036
,a1、a2in order to be a hyper-parameter,
Figure 326908DEST_PATH_IMAGE037
in order to be a feature of an abnormal video,
Figure 754609DEST_PATH_IMAGE038
is a feature of a normal video,
Figure 146539DEST_PATH_IMAGE039
indicating that the maximum value is taken for the final score after the corresponding u features have passed through the fully connected network G.
And S5, inputting the modified characteristics obtained by the adjacent packets into the full-connection network during testing, calculating the score of each packet, and judging whether the related position is abnormal or not according to the scores.
In one embodiment of the invention, the expression for the fully connected network G is:
Figure 19686DEST_PATH_IMAGE040
wherein the content of the first and second substances,
Figure 915091DEST_PATH_IMAGE041
for the purpose of the characteristics after the modification,
Figure 44853DEST_PATH_IMAGE042
and
Figure 624212DEST_PATH_IMAGE043
Figure 255045DEST_PATH_IMAGE044
) For the parameter to be trained, Relu has the expression
Figure 598433DEST_PATH_IMAGE045
When x is a vector, Relu operation is performed on each element in the vector, and a Sigmoid activation function is used to express the final score.
Training-time dropping is no longer used when testing for video anomaliesLayer, directly according to function maxi(G(Fi) ) is performed.
To sum up, firstly, marking the video, the marking type is divided into normal video and abnormal video, then carrying out model training, firstly carrying out feature extraction on the video, obtaining an average vector and an importance vector of video features, obtaining a mask according to the average vector of the video features, then obtaining a dropping layer according to a corresponding formula, obtaining a modified feature vector according to the dropping layer and the feature vector, inputting the modified feature vector into a fully-connected network, obtaining the final obtained maximum value after corresponding u features pass through the fully-connected network G, then inputting the value into a training model to obtain training parameters, obtaining the modified vector features according to the steps when calculating the score of the video, and substituting the modified vector features into the expression of the fully-connected network to obtain the final score, wherein in the testing stage, the dropping layer during training is not needed, and directly substituting the modified vector characteristics into the full-connection network to obtain the final score.
According to one embodiment of the invention, the judging whether the relevant position is abnormal or not according to the score comprises the following steps: judging whether the score is greater than a preset threshold value; if the current picture is larger than the abnormal picture, the current picture is judged to be the abnormal picture so as to determine the abnormal position in the video.
That is, the location of the anomaly location in the video may be based on the score G (F)i) And if the number of the detected images is larger than a certain threshold, determining that the images are abnormal.
In summary, the present invention can hide the most differentiated part in the video features to capture the whole information, and can highlight the information area with strong distinctiveness to enhance the recognition capability of the neural network.
Fig. 2 is a block diagram illustrating video anomaly detection based on semi-supervised learning according to an embodiment of the present invention. As shown in fig. 2, the video anomaly detection based on semi-supervised learning according to the embodiment of the present invention may include: the video segmentation module 10, the video feature extraction module 20, the neural network training module 30 and the model testing module 40.
The video segmentation module 10 is configured to segment video data into u × v frame video images in sequence, where v frame video of each adjacent time sequence from the beginning is called a packet, each video may be divided into u packets, and u and v are positive integers. The video feature extraction module 20 is configured to perform feature extraction on each packet to obtain corresponding video features, where each video has u video feature vectors. The neural network training module 30 is configured to obtain an average vector and an importance vector of a video feature according to the video feature, obtain a mask of a feature with a high filtering distinction property according to the average vector of the video feature, obtain a layer of a neural network according to the average vector, the mask and the importance vector of the video feature, obtain a modified feature according to the layer of the neural network and the video feature vector, and obtain a training parameter according to the modified feature. The model testing module 40 is configured to, during testing, input the modified features into the fully-connected network according to the adjacent packets, calculate a score of each packet, and determine whether an abnormality occurs in a relevant position according to the scores.
According to an embodiment of the present invention, the neural network training module 30 obtains the importance vector F by calculating according to the following formulas
Figure 790773DEST_PATH_IMAGE046
Wherein the Sigmoid function expression is
Figure 157295DEST_PATH_IMAGE047
,FmAn average vector representing the video features.
According to one embodiment of the present invention, the neural network training module 30 calculates the layer of the neural network for the dropped layer F by the following formulad
Figure 467184DEST_PATH_IMAGE048
Wherein the content of the first and second substances,
Figure 981256DEST_PATH_IMAGE049
wherein, in the step (A),alpha is a predetermined coefficient, and
Figure 124924DEST_PATH_IMAGE050
s is [0,1 ]]A random number in between, and a random number,
Figure 244320DEST_PATH_IMAGE051
is used as a mask.
According to one embodiment of the present invention, the neural network training module 30 obtains the training parameters through the following training model,
Figure 233267DEST_PATH_IMAGE052
wherein the content of the first and second substances,
Figure 20089DEST_PATH_IMAGE053
Figure 662554DEST_PATH_IMAGE054
Figure 738088DEST_PATH_IMAGE055
,a1、a2in order to be a hyper-parameter,
Figure 825999DEST_PATH_IMAGE056
in order to be a feature of an abnormal video,
Figure 839565DEST_PATH_IMAGE057
is a feature of a normal video,
Figure 59456DEST_PATH_IMAGE058
indicating that the maximum value is taken for the final score after the corresponding u features have passed through the fully connected network G.
According to one embodiment of the invention, the expression of the fully connected network G of the model test module 40 is:
Figure 746921DEST_PATH_IMAGE059
wherein the content of the first and second substances,
Figure 77670DEST_PATH_IMAGE060
for the purpose of the characteristics after the modification,
Figure 275302DEST_PATH_IMAGE061
and
Figure 931674DEST_PATH_IMAGE062
Figure 935795DEST_PATH_IMAGE063
) For the parameter to be trained, Relu has the expression
Figure 726028DEST_PATH_IMAGE064
When x is a vector, a Relu operation is performed on each element in the vector.
According to an embodiment of the present invention, the neural network training module 30 obtains a mask with very strong filtering distinguishing characteristics according to the average vector of the video characteristics, specifically, when the average vector of the video characteristics is less than or equal to the product of the maximum element corresponding to the average vector of the video characteristics and the preset coefficient, the element value of the corresponding position in the mask is 1; when the average vector of the video features is larger than the product of the maximum element corresponding to the average vector and the preset coefficient, the element value of the corresponding position in the mask is 0.
According to an embodiment of the invention, the model test module 40 is further configured to: judging whether the score is greater than a preset threshold value; if the current picture is larger than the abnormal picture, the current picture is judged to be the abnormal picture so as to determine the abnormal position in the video.
It should be noted that details that are not disclosed in the video anomaly detection apparatus based on semi-supervised learning according to the embodiment of the present invention refer to details disclosed in the video anomaly detection method based on semi-supervised learning according to the embodiment of the present invention, and details are not described here again.
The invention can hide the most differentiated part in the video characteristics to capture the whole information and can highlight the information area with strong distinctiveness to enhance the recognition capability of the neural network.
The invention further provides a computer device corresponding to the embodiment.
The computer device of the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and when the processor executes the computer program, the video anomaly detection method based on semi-supervised learning can be realized.
According to the computer device of the embodiment of the invention, when the processor executes the computer program stored on the memory, the most differentiated part in the video characteristics can be hidden to capture the whole information, and the information area with strong distinctiveness can be highlighted to enhance the recognition capability of the neural network.
The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.
A non-transitory computer-readable storage medium of an embodiment of the present invention stores thereon a computer program, which, when executed by a processor, can implement the video anomaly detection method based on semi-supervised learning according to the above-described embodiment of the present invention.
According to the non-transitory computer readable storage medium of the embodiment of the invention, when the processor executes the computer program stored on the processor, the most differentiated part in the video features can be hidden to capture the whole information, and the information area with strong distinctiveness can be highlighted to enhance the recognition capability of the neural network.
The present invention also provides a computer program product corresponding to the above embodiments.
When the instructions in the computer program product of the embodiment of the present invention are executed by the processor, the video anomaly detection method based on semi-supervised learning according to the above-mentioned embodiment of the present invention can be executed.
According to the computer program product of the embodiment of the invention, when the processor executes the instructions, the most differentiated part in the video characteristics can be hidden to capture the whole information, and the information area with strong distinctiveness can be highlighted to enhance the recognition capability of the neural network.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. The meaning of "plurality" is two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. A video anomaly detection method based on semi-supervised learning is characterized by comprising the following steps:
sequentially cutting video data into u x v frame video images, wherein v frame video of each adjacent time sequence from the beginning is called a packet, each video can be divided into u packets, and u and v are positive integers;
respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors;
obtaining an average vector and an importance vector of video features according to the video features, obtaining a mask of the features with strong filtering distinguishability according to the average vector of the video features, and obtaining a conjugate layer of a neural network according to the average vector of the video features, the mask and the importance vector;
obtaining a modified feature from the layer of the conjugate and the video feature vector, and obtaining a training parameter from the modified feature, wherein the layer of the conjugate and the video feature vector are point multiplied to obtain the modified feature;
and during testing, obtaining the modified characteristics according to adjacent packets, inputting the modified characteristics into a full-connection network, calculating to obtain the score of each packet, and judging whether the relevant position is abnormal or not according to the scores.
2. The video anomaly detection method based on semi-supervised learning of claim 1, wherein the importance vector F is obtained by calculation through the following formulas
Figure 179272DEST_PATH_IMAGE001
Wherein the Sigmoid function expression is
Figure 129036DEST_PATH_IMAGE002
Figure 33407DEST_PATH_IMAGE003
An average vector representing the video features.
3. The semi-supervised learning based video anomaly detection method of claim 2, wherein the calculation of the Nephroler layer F of the neural network is obtained by the following formulad
Figure 507244DEST_PATH_IMAGE004
Wherein the content of the first and second substances,
Figure 682136DEST_PATH_IMAGE005
α is a predetermined coefficient, and
Figure 832495DEST_PATH_IMAGE006
s is [0,1 ]]A random number in between, and a random number,
Figure 448329DEST_PATH_IMAGE007
is used as a mask.
4. The video anomaly detection method based on semi-supervised learning according to claim 3, wherein the training parameters are obtained through a training model,
Figure 568600DEST_PATH_IMAGE008
wherein the content of the first and second substances,
Figure 597998DEST_PATH_IMAGE009
Figure 138832DEST_PATH_IMAGE010
Figure 17795DEST_PATH_IMAGE011
,a1、a2in order to be a hyper-parameter,
Figure 912064DEST_PATH_IMAGE012
in order to be a feature of an abnormal video,
Figure 294504DEST_PATH_IMAGE013
is a feature of a normal video,
Figure DEST_PATH_IMAGE014
and i is more than or equal to 1 and less than or equal to u, which means that the maximum value is taken for the final score of the corresponding u features after passing through the fully-connected network G.
5. The video anomaly detection method based on semi-supervised learning according to claim 1, wherein the expression of the fully connected network G is as follows:
Figure 881606DEST_PATH_IMAGE015
wherein the content of the first and second substances,
Figure 546067DEST_PATH_IMAGE016
for the purpose of the characteristics after the modification,
Figure 680245DEST_PATH_IMAGE017
and
Figure 871186DEST_PATH_IMAGE018
Figure 566872DEST_PATH_IMAGE019
the parameters to be trained, are,
Figure 420428DEST_PATH_IMAGE020
is expressed as
Figure 933797DEST_PATH_IMAGE021
When x is a vector, pairEach element in the vector proceeds
Figure 556408DEST_PATH_IMAGE020
And (5) operating.
6. The video anomaly detection method based on semi-supervised learning according to claim 1, wherein obtaining a mask with very filtering discriminative features according to the average vector of the video features comprises:
when the average vector of the video features is smaller than or equal to the product of the maximum element corresponding to the average vector and a preset coefficient, the element value of the corresponding position in the mask is 1;
and when the average vector of the video features is larger than the product of the corresponding maximum element and a preset coefficient, the element value of the corresponding position in the mask is 0.
7. The video anomaly detection method based on semi-supervised learning as recited in claim 1, wherein determining whether an anomaly occurs in a relevant position according to the score comprises:
judging whether the score is larger than a preset threshold value or not;
if the current picture is larger than the abnormal picture, the current picture is judged to be the abnormal picture so as to determine the abnormal position in the video.
8. A video anomaly detection device based on semi-supervised learning is characterized by comprising:
the video segmentation module is used for sequentially segmenting the video data into u x v frame video images, wherein v frame videos of each adjacent time sequence from the beginning are called a packet, each video can be divided into u packets, and u and v are positive integers;
the video feature extraction module is used for respectively extracting features of each packet to obtain corresponding video features, wherein each video has u video feature vectors;
a neural network training module, configured to obtain an average vector and an importance vector of a video feature according to the video feature, obtain a mask of a feature with a high filtering distinction property according to the average vector of the video feature, obtain a layer of a neural network according to the average vector of the video feature, the mask, and the importance vector, obtain a modified feature according to the layer of the video feature, and obtain a training parameter according to the modified feature, wherein the layer of;
and the model testing module is used for obtaining the modified characteristics according to adjacent packets and inputting the modified characteristics into the full-connection network during testing, calculating the score of each packet and judging whether the relevant position is abnormal or not according to the scores.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the video anomaly detection method based on semi-supervised learning according to any one of claims 1 to 7 when executing the program.
CN202010842914.8A 2020-08-20 2020-08-20 Video anomaly detection method and device based on semi-supervised learning Active CN111709411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010842914.8A CN111709411B (en) 2020-08-20 2020-08-20 Video anomaly detection method and device based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010842914.8A CN111709411B (en) 2020-08-20 2020-08-20 Video anomaly detection method and device based on semi-supervised learning

Publications (2)

Publication Number Publication Date
CN111709411A CN111709411A (en) 2020-09-25
CN111709411B true CN111709411B (en) 2020-11-10

Family

ID=72547386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010842914.8A Active CN111709411B (en) 2020-08-20 2020-08-20 Video anomaly detection method and device based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN111709411B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018227105A1 (en) * 2017-06-08 2018-12-13 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Progressive and multi-path holistically nested networks for segmentation
EP3625727A1 (en) * 2017-11-14 2020-03-25 Google LLC Weakly-supervised action localization by sparse temporal pooling network
CN110516536B (en) * 2019-07-12 2022-03-18 杭州电子科技大学 Weak supervision video behavior detection method based on time sequence class activation graph complementation
CN110502988A (en) * 2019-07-15 2019-11-26 武汉大学 Group positioning and anomaly detection method in video
CN111291699B (en) * 2020-02-19 2022-06-03 山东大学 Substation personnel behavior identification method based on monitoring video time sequence action positioning and abnormity detection

Also Published As

Publication number Publication date
CN111709411A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
Khodabakhsh et al. Fake face detection methods: Can they be generalized?
CN108810620B (en) Method, device, equipment and storage medium for identifying key time points in video
CN107122806B (en) Sensitive image identification method and device
CN110047095B (en) Tracking method and device based on target detection and terminal equipment
JP7006702B2 (en) Image processing equipment, image processing methods and programs
CN109543760A (en) Confrontation sample testing method based on image filters algorithm
CN112508950B (en) Anomaly detection method and device
CN111179295B (en) Improved two-dimensional Otsu threshold image segmentation method and system
CN114120127A (en) Target detection method, device and related equipment
CN112001401A (en) Training model and training method of example segmentation network, and example segmentation network
CN113420745A (en) Image-based target identification method, system, storage medium and terminal equipment
CN112597928A (en) Event detection method and related device
CN113781483B (en) Industrial product appearance defect detection method and device
CN113743378B (en) Fire monitoring method and device based on video
JP6874864B2 (en) Image processing equipment, image processing methods and programs
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN113706837B (en) Engine abnormal state detection method and device
CN111709411B (en) Video anomaly detection method and device based on semi-supervised learning
CN112052823A (en) Target detection method and device
CN112287905A (en) Vehicle damage identification method, device, equipment and storage medium
CN114187292B (en) Abnormality detection method, apparatus, device and storage medium for cotton spinning paper tube
CN115861315A (en) Defect detection method and device
CN115249316A (en) Industrial defect detection method and device
CN114612710A (en) Image detection method, image detection device, computer equipment and storage medium
CN112668451A (en) Crowd density real-time monitoring method based on YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220107

Address after: 315000 No. 138-1, Zhongshan West Road, Fenghua District, Ningbo City, Zhejiang Province (self declaration)

Patentee after: Shenlan industrial intelligent Innovation Research Institute (Ningbo) Co.,Ltd.

Address before: 213000 No.103, building 4, Chuangyan port, Changzhou science and Education City, No.18, middle Changwu Road, Wujin District, Changzhou City, Jiangsu Province

Patentee before: SHENLAN ARTIFICIAL INTELLIGENCE CHIP RESEARCH INSTITUTE (JIANGSU) Co.,Ltd.