CN108734106B - Rapid riot and terrorist video identification method based on comparison - Google Patents

Rapid riot and terrorist video identification method based on comparison

Info

Publication number
CN108734106B
CN201810366397.4A · CN108734106B
Authority
CN
China
Prior art keywords
video
riot
terrorist
layer
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810366397.4A
Other languages
Chinese (zh)
Other versions
CN108734106A (en)
Inventor
李兵
胡卫明
原春锋
王博
赵永帅
刘琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201810366397.4A
Publication of CN108734106A
Application granted
Publication of CN108734106B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the field of video classification, and provides a rapid riot and terrorist video identification method based on comparison, aiming at solving the problem that the precision and recall of riot and terrorist video identification based on visual features are relatively low due to the limited descriptive power of feature descriptors. The method comprises the following steps: performing shot segmentation on a video to be detected for riot and terrorist identification so as to select key frames of the video to be detected; performing a hash-code operation on each key frame of the video to be detected by using a pre-constructed riot and terrorist video identification model to obtain the hash code of each key frame; comparing the hash code of each key frame with the hash codes of the video frames of pre-stored riot and terrorist videos to determine the video frames similar to each key frame; and, if the number of key frames for which similar video frames are found exceeds a set threshold, determining that the video to be detected is a riot and terrorist video. The method can quickly and accurately identify riot and terrorist videos from a large number of videos.

Description

Rapid riot and terrorist video identification method based on comparison
Technical Field
The invention relates to the technical field of computer vision, in particular to the field of video classification, and specifically relates to a rapid riot and terrorist video identification method based on comparison.
Background
A riot and terrorist video is a video containing content such as propaganda for violent terrorism, religious extremism, and ethnic separatism. With the rapid development of network technology and the arrival of the mobile internet era, more and more multimedia data appear before people, and riot and terrorist videos are likewise spread and diffused in large quantities. At present, detection of riot and terrorist videos mainly relies on manual review and labeling, which consumes a great amount of financial and material resources. Therefore, facing an internet of ever-increasing data volume, a new technology is needed to automatically filter riot and terrorist video and image content and to deploy early-warning controls in important public places.
The visual features currently applied to riot and terrorist video detection fall into two main categories: static features and dynamic features. Static features describe characteristics within a video frame, including color, texture, and structure. Such features effectively reflect information such as background, environment, and the appearance of main characters; MPEG-7 is a typical set of static features, comprising visual descriptors such as CLD, CSD, SC, and EH. Dynamic features describe characteristics between video frames, including motion amplitude, direction, and frequency, and effectively reflect the motion of the main subjects in the video. Dynamic features are mostly tracked and extracted using corner detection algorithms, for example HOG, HOF, and MoSIFT. Among these, the MoSIFT algorithm detects local features, and its descriptor can only extract features in regions with sufficient motion. However, the above feature descriptors have limited descriptive power, and it is difficult for them to fully and accurately describe the content of a video image; in riot and terrorist videos in particular, detection must target specific objects, so the precision and recall of the detection work are relatively low.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that, because two videos may contain multiple copied segments, the copy relationship of some edited videos cannot be accurately detected and the positions of the copied video segments cannot be accurately located, the present application provides a rapid riot and terrorist video identification method based on comparison.
The application provides a rapid riot and terrorist video identification method based on comparison, which comprises the following steps: performing shot segmentation on a video to be detected for riot and terrorist identification so as to select key frames of the video to be detected; performing a hash-code operation on each key frame of the video to be detected by using a pre-constructed riot and terrorist video identification model to obtain the hash code of each key frame, wherein the riot and terrorist video identification model is constructed based on a hash network, its input is a video frame, and its output is the hash code of the input video frame; comparing the hash code of each key frame with the hash codes of the video frames of pre-stored riot and terrorist videos to determine the video frames similar to each key frame; and counting the number of similar frames and, if the number of key frames for which similar video frames are found exceeds a set threshold, determining that the video to be detected is a riot and terrorist video.
In some examples, "shot segmentation is performed on a video to be detected for riot and terrorist identification to select a key frame of the video to be detected", including: extracting a histogram of each frame of the video to be detected, and performing difference comparison on the histograms of adjacent video frames to determine a shot boundary of the video to be detected; and selecting the starting frame and/or the ending frame of each shot of the video to be detected as a key frame according to the determined shot boundary.
In some examples, "comparing the hash code of each of the key frames with the hash codes of the video frames of the pre-stored riot video respectively to determine video frames similar to each of the key frames" includes: comparing the hash code of each key frame with the hash codes of the video frames of the violence and terrorism videos in the video library respectively; calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame; and confirming the key frame and the video frame with the Hamming distance radius within the set value range as similar frames.
In some examples, the training method of the riot and terrorist video recognition model is as follows: classifying preset training sample pictures into positive sample data and negative sample data, wherein the positive sample data are pairs of riot and terrorist pictures and the negative sample data are pairs consisting of one riot and terrorist picture and one non-riot picture; adjusting the size of the training sample pictures, randomly cropping a region of set size from each adjusted picture, and performing sample-mean processing; and training on the processed pictures with the initial riot and terrorist video recognition model to obtain the hash-network-based riot and terrorist video recognition model.
In some examples, the network structure of the initial riot video recognition model includes an input layer, a convolutional layer, and a fully-connected layer, where the first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully-connected layers.
In some examples, in training the riot video recognition model, the input layer inputs the training sample picture after sample mean processing.
In some examples, the convolutional layer receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by the activation function of that layer; the fully connected layer likewise receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by its activation function.
In some examples, the activation functions of the second layer to the eighth layer of the network structure of the initial riot and terrorist video recognition model are:

ReLU(x) = max(0, x)

where ReLU(x) is the activation function and x is the convolved output of the layer.
In some examples, the activation function of the ninth layer of the network structure of the initial riot and terrorist video recognition model is given by a formula not reproduced in this text (original formula image GDA0002614924490000032); it is the result obtained by taking the partial derivative with respect to b_{i,j}.
In some examples, the loss function for training the above-described riot and terrorist video recognition model is:

L_r = Σ_{i=1}^{N} [ (1/2) y_i ||b_{i,1} - b_{i,2}||_2^2 + (1/2)(1 - y_i) max(m - ||b_{i,1} - b_{i,2}||_2^2, 0) + α ( || |b_{i,1}| - 1 ||_1 + || |b_{i,2}| - 1 ||_1 ) ]

s.t. b_{i,j} ∈ {-1, +1}^k, i ∈ {1, ..., N}, j ∈ {1, 2}

where y_i indicates whether the sample pair is similar, i.e., y_i = 1 indicates that the two samples are similar and y_i = 0 that they are not; ||b_{i,1} - b_{i,2}||_2^2 is the Euclidean distance between the two sample binary codes in a sample pair; || |b_{i,j}| - 1 ||_1 is the Manhattan distance between a sample binary code and the all-ones vector; L_r is the loss function; m (m > 0) is a margin threshold parameter; α is a scaling factor; b_{i,1} and b_{i,2} are the hash codes of sample 1 and sample 2; N is the total number of training sample pairs; and k is the dimension of the hash code.
According to the rapid riot and terrorist video identification method based on comparison, key frames are extracted by performing structural analysis on the video under riot and terrorist detection; next, the hash code of each key frame of the video segment is determined with a hash-network-based riot and terrorist video identification model; then, the hash codes of the key frames of the video to be detected are matched against the pre-stored hash codes of riot and terrorist video key frames to determine whether the video to be detected is a riot and terrorist video. By performing structured analysis on the video to be detected and extracting key frames, the method achieves a good balance between the precision and the speed of shot detection; by comparing the hash codes of the key frames with the pre-stored hash codes, it can quickly be judged whether the video to be detected contains riot and terrorist content; and the pre-stored hash codes occupy little space and are fast to retrieve, so the method can quickly and accurately identify riot and terrorist videos.
Drawings
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a contrast-based fast riot and terrorist video identification method according to the present application;
FIG. 3 is a schematic diagram of a network structure of a hash network model in an embodiment of a contrast-based fast riot and terrorist video identification method according to the present application;
fig. 4 is a flowchart illustrating an application example of the contrast-based fast riot and terrorist video identification method according to the present application.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a schematic diagram illustrating an exemplary system architecture to which an embodiment of the contrast-based fast riot video identification method of the present application may be applied.
As shown in fig. 1, the system architecture may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as a web browser application, a video browsing application, a video uploading application, social platform software, etc., may be installed on the terminal device 101.
The terminal device 101 may be various electronic devices having a display screen and supporting video browsing or video uploading, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 103 may be a server that provides various services, such as a video processing server or application platform that performs riot and terrorist recognition on videos uploaded by the terminal device 101. The video processing server can analyze and process the video data uploaded by each terminal device connected to it through the network, and feed the processing result (such as the riot and terrorist identification result of the video) back to the terminal device or to a third party for use.
It should be noted that the contrast-based fast riot video identification method provided in the embodiment of the present application is generally executed by the server 103, and accordingly, an apparatus to which the method shown in the present application can be applied is generally disposed in the server 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow diagram of one embodiment of a contrast-based fast riot-terrorist video identification method according to the present application is shown. The rapid riot and terrorist video identification method based on comparison comprises the following steps:
step 201, performing shot segmentation on a video to be detected for riot and terrorist identification to select a key frame of the video to be detected.
In this embodiment, an electronic device (such as the server in fig. 1) or an application platform that uses a rapid riot and terrorist video identification method based on comparison may be used to obtain a video to be detected that is to be subjected to the riot and terrorist detection. The electronic equipment or the application platform respectively performs shot segmentation on the obtained video to be detected so as to extract key frames of the video to be detected. For example, the video to be detected may be obtained from a terminal device connected to the electronic device or the application platform, for example, after a user using the terminal device connected to the server or the application platform via a network uploads the video, the server or the application platform obtains the video as the video to be detected.
Specifically, the "performing shot segmentation on a video to be detected for riot and terrorist identification to select a key frame of the video to be detected" includes: extracting a histogram of each frame of video of a video to be detected, and performing difference comparison on the histograms of adjacent video frames to determine a shot boundary of the video to be detected; and selecting the starting frame and/or the ending frame of each shot of the video to be detected as a key frame according to the determined shot boundary. The histogram for extracting each frame of video may be a gray level histogram or a color histogram. After a video to be detected is divided into a series of shots, the first frame or the last frame of each shot can be used as a key frame of the shot; the first and last frames may also be used as key frames.
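By way of illustration only, the following is a minimal Python sketch of such histogram-difference shot segmentation using OpenCV; the 64-bin gray-level histogram and the 0.5 difference threshold are illustrative assumptions, not values fixed by this disclosure.

```python
# Illustrative sketch (not part of the original disclosure): histogram-based
# shot segmentation. Assumed parameters: 64-bin gray histogram, L1-difference
# threshold 0.5; the start frame of each detected shot is taken as a key frame.
import cv2
import numpy as np

def select_key_frames(video_path, diff_threshold=0.5):
    """Return (frame_index, frame) pairs: the start frame of each detected shot."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256]).ravel()
        hist /= max(hist.sum(), 1.0)                      # normalize to a distribution
        if prev_hist is None or np.abs(hist - prev_hist).sum() > diff_threshold:
            key_frames.append((idx, frame))               # large jump => shot boundary
        prev_hist = hist
        idx += 1
    cap.release()
    return key_frames
```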
Step 202, performing hash code operation on each key frame of the video to be detected by using a pre-constructed riot and terrorist video identification model to obtain the hash code of each key frame.
In this embodiment, based on the plurality of key frames of the video to be detected selected in step 201, the electronic device or the application platform performs an operation by using a pre-established hash network model to determine the hash code of each key frame. Here, the riot-terrorist video recognition model may be a deep convolutional neural network model, for example, a Siamese network model, and the hash operation of the video keyframe to be detected is completed by adding a designed hash loss using the Siamese network model. The riot and terrorist video identification model is constructed based on a hash network, the input of the model is a video frame, and the output of the model is a hash code of the input video frame.
The riot and terrorist video identification model determines the hash code of a key frame by taking an input frame (picture) and completing the hash operation through the optimized computation of the deep convolutional neural network. The model may operate on the features of key frames: static features, which include color, texture, structure, and the like and reflect information such as background, environment, and the appearance of main characters; and dynamic features, which include motion amplitude, direction, frequency, and the like and reflect the motion of the main subjects in the video. The hash code of a key frame is determined using these features.
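As a sketch of this step, assuming the trained hash network outputs a k-dimensional near-binary vector, the code below binarizes it by thresholding at zero; the thresholding rule and the input shape are assumptions, since the ninth-layer activation formula is not reproduced in this text.

```python
# Illustrative sketch: computing a key frame's hash code with a trained model.
# Binarizing the near-binary output by thresholding at zero is an assumption.
import torch

@torch.no_grad()
def compute_hash(model, frame_tensor):
    """frame_tensor: (3, 227, 227) mean-subtracted RGB tensor; returns (k,) bools."""
    model.eval()
    out = model(frame_tensor.unsqueeze(0))  # (1, k) real-valued, near-binary output
    return (out.squeeze(0) >= 0)            # True ~ +1, False ~ -1
```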
Step 203, comparing the hash code of each key frame with the hash codes of the video frames of the pre-stored riot video respectively, and determining the video frames similar to each key frame.
In this embodiment, based on the hash code of the key frame of the video to be detected obtained by the computation using the riot and terrorist video recognition model in step 202, the electronic device or the application platform compares with the pre-stored hash code to determine whether the key frame of the video to be detected is similar to the video frame of the riot and terrorist video. The pre-stored hash code may be a hash code of a video frame of the riot video.
Here, the above-mentioned pre-stored hash codes are obtained as follows: first, riot and terrorist videos are extracted from a video library; then, key video frames of all the extracted riot and terrorist videos are extracted offline or online; finally, the extracted key video frames are input into the hash-network-based riot and terrorist video identification model for computation, and the resulting hash codes of the riot and terrorist videos are stored.
The hash code comparison may be comparing the hash code of the key frame with a hamming distance of a pre-stored hash code, and determining whether the key frame is similar to the video frame of the riot video according to the hamming distance.
In some optional implementation manners of this embodiment, comparing the hash code of each of the key frames with the hash codes of the video frames of the pre-stored riot and terrorist videos to determine video frames similar to each of the key frames includes: comparing the hash code of each key frame with the hash codes of the video frames of the riot and terrorist videos in the video library; calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame; and confirming a key frame and a video frame whose Hamming distance is within a set radius as similar frames. Specifically, two pictures with a Hamming distance of 2 or less may be confirmed as similar frames.
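A minimal sketch of this comparison, assuming the hash codes are stored as equal-length boolean (or 0/1) arrays:

```python
# Illustrative sketch: Hamming-distance similarity test with radius 2,
# as in the example above.
import numpy as np

def is_similar(code_a, code_b, radius=2):
    """True if the two codes differ in at most `radius` bit positions."""
    return np.count_nonzero(code_a != code_b) <= radius
```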
And 204, counting the number of similar frames, and if the number of the video frames similar to each key frame exceeds a set threshold, determining that the video to be detected is a riot video.
In this embodiment, key frames similar to video frames of the riot and terrorist videos in the riot and terrorist video database were determined in step 203; the number of key frames of the video to be detected that are similar to such video frames is counted, and if this number is greater than a set threshold, the video to be detected can be determined to be a riot and terrorist video. Specifically, if the video to be detected has 3 or more key frames similar to video frames of riot and terrorist videos in the library, it is determined to be a riot and terrorist video.
In some optional implementation manners of this embodiment, the training method of the above hash-network-based riot and terrorist video recognition model is as follows: classifying preset training sample pictures into positive sample data and negative sample data, wherein the positive sample data are pairs of riot and terrorist pictures and the negative sample data are pairs consisting of one riot and terrorist picture and one non-riot picture; adjusting the size of the training sample pictures, randomly cropping a region of set size from each adjusted picture, and performing sample-mean processing; and training on the processed pictures with the initial riot and terrorist video recognition model to obtain the hash-network-based riot and terrorist video recognition model. Specifically, the training data may be divided into two groups: positive sample data (a pair of riot and terrorist pictures, labelled 1) and negative sample data (a pair of one riot and terrorist picture and one non-riot picture, labelled 0), so that the hash codes of riot and terrorist videos are as similar as possible while the hash codes of non-riot videos stay as far as possible from those of riot and terrorist videos.
The training sample pictures are resized to 256 × 256; a 227 × 227 region is then randomly cropped from each, and the sample mean is subtracted to form the processed sample picture, which is input directly into the initial hash network model for training. The sample mean is the mean over all pixel points of the sample pictures; subtracting it before training and testing improves training speed and testing accuracy.
A picture pair from the positive sample data (two riot and terrorist pictures) or from the negative sample data (one riot and terrorist picture and one non-riot picture) is input into the initial hash network model for training.
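A minimal sketch of this sample preparation, assuming a precomputed per-pixel mean image over all training samples (a scalar mean would be handled analogously):

```python
# Illustrative sketch: resize to 256x256, random 227x227 crop, mean subtraction,
# and pair labelling (1 = similar/positive pair, 0 = dissimilar/negative pair).
import cv2
import numpy as np

def preprocess(image, mean_image):
    """image: HxWx3 uint8; mean_image: 227x227x3 float32 (assumed precomputed)."""
    img = cv2.resize(image, (256, 256)).astype(np.float32)
    y = np.random.randint(0, 256 - 227 + 1)
    x = np.random.randint(0, 256 - 227 + 1)
    return img[y:y + 227, x:x + 227] - mean_image

def make_pair(img1, img2, label, mean_image):
    """label: 1 for two riot pictures, 0 for one riot and one non-riot picture."""
    return preprocess(img1, mean_image), preprocess(img2, mean_image), label
```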
In some optional implementations of this embodiment, the network structure of the initial riot and terrorist video recognition model includes an input layer, convolutional layers, and fully connected layers, as shown in fig. 3, a schematic diagram of the hash network structure. The first layer is the input layer, the second to sixth layers are convolutional layers, and the seventh to ninth layers are fully connected layers. The processed training sample pictures are input into the input layer as a pair of RGB three-channel pictures. The convolutional layers of the second to sixth layers are denoted conv1-conv5 in fig. 3; the fully connected layers of the seventh to ninth layers are denoted fc1-fc3 in fig. 3. The loss function (loss) on the fully connected layers is designed for two characteristics: discriminative power ("Discriming" in fig. 3) and near-binary codes ("Binary-like" in fig. 3).
The convolutional layer receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by the activation function of that layer; the fully connected layer likewise receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by its activation function. Specifically:

The second layer is a convolutional layer with 64 convolution kernels, each of size 11 × 11, convolution stride 4, and padding 0; the output feature map is followed by an activation layer, a downsampling layer, and a normalization layer. The activation layer uses the ReLU function. The downsampling layer uses maximum-value sampling with a 3 × 3 sampling kernel and stride 2. The normalization layer uses LRN with kernel size 5, alpha = 0.00001, and beta = 0.75, where alpha is the scaling factor and beta is the exponent term. The second layer takes the output of the first layer; after convolution the output is C_1; C_1 is input to the downsampling layer to obtain P_1; P_1 is input to the activation layer to obtain A_1; A_1 is input to the normalization layer to obtain L_1; and L_1 is finally output to the third layer.

The third layer is a convolutional layer with 256 convolution kernels, each of size 5 × 5, convolution stride 1, and padding 2; the output feature map is followed by an activation layer, a downsampling layer, and a normalization layer. The activation layer uses the ReLU function. The downsampling layer uses maximum-value sampling with a 3 × 3 sampling kernel and stride 2. The normalization layer uses LRN with kernel size 5, alpha = 0.00001, and beta = 0.75. The third layer takes the output of the second layer; after convolution the output is C_2; C_2 is input to the downsampling layer to obtain P_2; P_2 is input to the activation layer to obtain A_2; A_2 is input to the normalization layer to obtain L_2; and L_2 is finally output to the fourth layer.

The fourth layer is a convolutional layer with 256 convolution kernels, each of size 3 × 3, stride 1, and padding 1; the output feature map is followed by an activation layer. The activation layer uses the ReLU function. The fourth layer takes the output of the third layer; after convolution the output is C_3; C_3 is input to the activation layer to obtain A_3; and A_3 is finally output to the fifth layer.

The fifth layer is a convolutional layer with 256 convolution kernels, each of size 3 × 3, convolution stride 1, and padding 1; the output feature map is followed by an activation layer. The activation layer uses the ReLU function. The fifth layer takes the output of the fourth layer; after convolution the output is C_4; C_4 is input to the activation layer to obtain A_4; and A_4 is finally output to the sixth layer.

The sixth layer is a convolutional layer with 256 convolution kernels, each of size 3 × 3, convolution stride 1, and padding 1; the output feature map is followed by an activation layer and a downsampling layer. The activation layer uses the ReLU function. The downsampling layer uses maximum-value sampling with a 3 × 3 sampling kernel and stride 2. The sixth layer takes the output of the fifth layer; after convolution the output is C_5; C_5 is input to the downsampling layer to obtain P_5; P_5 is input to the activation layer to obtain A_5; and A_5 is finally output to the seventh layer.

The seventh layer is a fully connected layer with 4096 convolution kernels, each of size 1 × 1 and stride 1; the output feature map is followed by an activation layer. The activation layer uses the ReLU function. The seventh layer takes the output of the sixth layer; after convolution the output is C_6; C_6 is input to the activation layer to obtain A_6; and A_6 is finally output to the eighth layer.

The eighth layer is a fully connected layer with 4096 convolution kernels, each of size 1 × 1 and stride 1; the output feature map is followed by an activation layer. The activation layer uses the ReLU function. The eighth layer takes the output of the seventh layer; after convolution the output is C_7; C_7 is input to the activation layer to obtain A_7; and A_7 is finally output to the last layer.

The ninth layer is a fully connected layer whose number of convolution kernels is determined by the required hash code length; each convolution kernel is of size 1 × 1 with stride 1, and the output feature map is followed by the hash loss layer, which applies the hash function. The ninth layer takes the output of the eighth layer; after convolution the output is C_8, and C_8 is input to the hash loss layer, which outputs the pair of hash binary codes (b_{i,1}, b_{i,2}).
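The following PyTorch sketch mirrors the nine-layer structure just described, including the stated order convolution, downsampling, activation, normalization; implementing the 1 × 1-kernel fully connected layers as nn.Linear modules and the default hash length of 48 bits are assumptions, not values fixed by this disclosure.

```python
# Illustrative PyTorch sketch of the nine-layer hash network described above.
# Layer 1 is the input itself; layers 2-6 are convolutional; layers 7-9 are
# fully connected. hash_bits=48 is an assumed default.
import torch
import torch.nn as nn

class HashNet(nn.Module):
    def __init__(self, hash_bits=48):
        super().__init__()
        def lrn():  return nn.LocalResponseNorm(size=5, alpha=1e-5, beta=0.75)
        def pool(): return nn.MaxPool2d(kernel_size=3, stride=2)
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=0),    # layer 2: conv1
            pool(), nn.ReLU(inplace=True), lrn(),
            nn.Conv2d(64, 256, 5, stride=1, padding=2),   # layer 3: conv2
            pool(), nn.ReLU(inplace=True), lrn(),
            nn.Conv2d(256, 256, 3, stride=1, padding=1),  # layer 4: conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, stride=1, padding=1),  # layer 5: conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, stride=1, padding=1),  # layer 6: conv5
            pool(), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),  # layer 7: fc1
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # layer 8: fc2
            nn.Linear(4096, hash_bits),                           # layer 9: fc3
        )

    def forward(self, x):               # x: (B, 3, 227, 227)
        return self.classifier(self.features(x).flatten(1))

# Siamese usage: the same weights encode both pictures of a pair.
# net = HashNet(); b1, b2 = net(x1), net(x2)
```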
Each of the above layers includes an activation function, wherein the activation functions of the second layer to the eighth layer are:

ReLU(x) = max(0, x)

where ReLU(x) is the activation function and x is the convolved output of the layer.
The activation function of the ninth layer of the network structure of the initial riot and terrorist video identification model is given by a formula not reproduced in this text (original formula image GDA0002614924490000092); it is the result obtained by taking the partial derivative with respect to b_{i,j}.
The loss function for training the riot and terrorist video recognition model is:

L_r = Σ_{i=1}^{N} [ (1/2) y_i ||b_{i,1} - b_{i,2}||_2^2 + (1/2)(1 - y_i) max(m - ||b_{i,1} - b_{i,2}||_2^2, 0) + α ( || |b_{i,1}| - 1 ||_1 + || |b_{i,2}| - 1 ||_1 ) ]

s.t. b_{i,j} ∈ {-1, +1}^k, i ∈ {1, ..., N}, j ∈ {1, 2}

where y_i indicates whether the sample pair is similar, i.e., y_i = 1 means that the two samples are similar, and vice versa; ||b_{i,1} - b_{i,2}||_2^2 is the Euclidean distance between the two sample binary codes in a sample pair; || |b_{i,j}| - 1 ||_1 is the Manhattan distance between a sample binary code and the all-ones vector; L_r is the loss function; m (m > 0) is a margin threshold parameter; α is a scaling factor; b_{i,1} and b_{i,2} are the hash codes of sample 1 and sample 2; N is the total number of training sample pairs; and k is the dimension of the hash code.
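A minimal sketch of this pairwise loss as reconstructed above: similar pairs (y_i = 1) are pulled together, dissimilar pairs are pushed beyond the margin m, and the α-weighted term drives each code component toward ±1. The default values of m and α and the averaging over the batch are assumptions.

```python
# Illustrative sketch of the pairwise hash loss. m=2.0 and alpha=0.01 are
# assumed defaults; b1, b2 are (N, k) network outputs, y is (N,) with 1 for
# similar pairs and 0 for dissimilar pairs.
import torch

def hash_loss(b1, b2, y, m=2.0, alpha=0.01):
    d2 = (b1 - b2).pow(2).sum(dim=1)                      # squared Euclidean distance
    contrastive = 0.5 * y * d2 + 0.5 * (1 - y) * torch.clamp(m - d2, min=0)
    reg = (b1.abs() - 1).abs().sum(dim=1) + (b2.abs() - 1).abs().sum(dim=1)
    return (contrastive + alpha * reg).mean()             # regularizer pushes codes to +-1
```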
As an example, referring to fig. 4, fig. 4 shows a schematic diagram of contrast-based fast riot and terrorist video recognition. As shown in fig. 4, on one hand, key frames of riot and terrorist videos are extracted from the video database in advance, and the hash code of each key frame is generated using the riot and terrorist video identification model. On the other hand, key frames of the video to be detected are extracted, and their hash codes are generated using the hash network model. The Hamming distance between the hash codes of the key frames of the video to be detected and those of the riot and terrorist key frames is then compared: two pictures with a Hamming distance within 2 are confirmed as similar frames. Finally, if the video to be detected has 3 or more key frames similar to key frames of riot and terrorist videos in the video library, it is considered a riot and terrorist video.
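Tying the pieces together, the following sketch implements the decision rule of fig. 4 using the helpers sketched earlier (select_key_frames, compute_hash, is_similar); frame_to_tensor is a hypothetical preprocessing helper, and the radius-2 and 3-frame thresholds follow the values stated above.

```python
# Illustrative end-to-end sketch of the fig. 4 pipeline. frame_to_tensor is a
# hypothetical helper; library_codes is an iterable of boolean hash codes
# precomputed from riot and terrorist video key frames.
import cv2
import numpy as np
import torch

def frame_to_tensor(frame_bgr, mean_image):
    """Assumed preprocessing: resize to 256x256, central 227x227 crop, mean subtraction."""
    rgb = cv2.cvtColor(cv2.resize(frame_bgr, (256, 256)),
                       cv2.COLOR_BGR2RGB).astype(np.float32)
    crop = rgb[14:241, 14:241] - mean_image               # mean_image: 227x227x3 float32
    return torch.from_numpy(crop).permute(2, 0, 1)        # (3, 227, 227)

def is_riot_video(model, video_path, library_codes, mean_image,
                  radius=2, min_similar_frames=3):
    similar = 0
    for _, frame in select_key_frames(video_path):
        code = compute_hash(model, frame_to_tensor(frame, mean_image))
        if any(is_similar(code.numpy(), lib, radius) for lib in library_codes):
            similar += 1
            if similar >= min_similar_frames:             # 3 or more similar key frames
                return True
    return False
```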
The method provided by the embodiment of the application determines the frames similar to the key frames of the video to be detected by matching the hash codes of those key frames against the hash codes of the key frames of riot and terrorist videos, and determines whether the video to be detected is a riot and terrorist video according to the number of its key frames that are similar to key frames in the video database. Extracting video key frames by shot segmentation achieves a good balance between the precision and the speed of shot detection; comparing the hash codes of the key frames with the pre-stored hash codes makes it possible to quickly judge whether the video to be detected contains riot and terrorist content; the pre-stored hash codes occupy little space and are fast to retrieve; and the hash network model obtains the hash code of a key frame accurately and quickly. Therefore, the method provided by the invention can quickly and accurately identify riot and terrorist videos.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (7)

1. A rapid riot and terrorist video identification method based on comparison is characterized by comprising the following steps:
carrying out shot segmentation on a video to be detected for carrying out riot and terrorist identification so as to select a key frame of the video to be detected;
carrying out hash code operation on each key frame of the video to be detected by utilizing a pre-constructed riot and terrorist video identification model to obtain the hash code of each key frame; the riot and terrorist video identification model is constructed based on a Hash network, the input of the model is a video frame, and the output of the model is a Hash code of the input video frame;
comparing the hash code of each key frame with the hash code of the video frame of the prestored violence-terrorist video respectively to determine the video frame similar to each key frame;
counting the number of similar frames, and if the number of video frames similar to each key frame exceeds a set threshold, determining that the video to be detected is a riot video;
the step of comparing the hash code of each key frame with the hash codes of the video frames of the pre-stored riot video respectively to determine the video frames similar to each key frame includes:
comparing the hash code of each key frame with the hash codes of the video frames of the violence and terrorism videos in the video library respectively;
calculating the Hamming distance between the hash code of the key frame and the hash code of the video frame;
confirming a key frame and a video frame whose Hamming distance is within a set radius as similar frames;
the training method of the riot and terrorist video identification model comprises the following steps:
classifying preset training sample pictures into positive sample data and negative sample data; wherein the positive sample data are pairs of riot and terrorist pictures, and the negative sample data are pairs consisting of one riot and terrorist picture and one non-riot picture;
adjusting the size of the training sample picture, randomly intercepting a region with a set size from the adjusted training sample picture, and carrying out sample mean processing;
training the processed picture by using the initial riot and terrorist video identification model to obtain a riot and terrorist video identification model based on a Hash network;
the loss function for training the riot and terrorist video recognition model is:

L_r = Σ_{i=1}^{N} [ (1/2) y_i ||b_{i,1} - b_{i,2}||_2^2 + (1/2)(1 - y_i) max(m - ||b_{i,1} - b_{i,2}||_2^2, 0) + α ( || |b_{i,1}| - 1 ||_1 + || |b_{i,2}| - 1 ||_1 ) ]

s.t. b_{i,j} ∈ {-1, +1}^k, i ∈ {1, ..., N}, j ∈ {1, 2}

wherein y_i indicates whether the i-th sample pair is similar, i.e., y_i = 1 means that the two samples in the i-th sample pair are similar, and vice versa; ||b_{i,1} - b_{i,2}||_2^2 is the Euclidean distance between the two sample binary codes in the i-th sample pair; || |b_{i,1}| - 1 ||_1 and || |b_{i,2}| - 1 ||_1 are the Manhattan distances between the two sample binary codes in the i-th sample pair and the all-ones vector; L_r is the loss function; m is a margin threshold parameter, where m > 0; α is a scaling factor; b_{i,1} and b_{i,2} are the hash codes of sample 1 and sample 2 in the i-th sample pair; N is the total number of training sample pairs; k is the dimension of the hash code; and b_{i,j} is the hash code of sample j in the i-th sample pair.
2. The contrast-based rapid riot-terrorist video identification method according to claim 1, wherein "shot segmentation is performed on a video to be detected for riot-terrorist identification to select key frames of the video to be detected" comprises:
extracting a histogram of each frame of the video to be detected, and performing difference comparison on the histograms of adjacent video frames to determine a shot boundary of the video to be detected;
and selecting a start frame and/or an end frame of each shot of the video to be detected as a key frame according to the determined shot boundary.
3. The contrast-based fast riot-terrorist video recognition method according to claim 1, wherein the network structure of the initial riot-terrorist video recognition model comprises an input layer, a convolutional layer and a fully connected layer, wherein the first layer is the input layer, the second layer to the sixth layer are convolutional layers, and the seventh layer to the ninth layer are fully connected layers.
4. The contrast-based rapid riot-terrorist video recognition method according to claim 3, wherein, in training the riot-terrorist video recognition model, the training sample pictures processed by sample-mean subtraction are input into the input layer.
5. The method as claimed in claim 3, wherein the convolutional layer receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by the activation function of that layer; and the fully connected layer receives the output of the previous layer, applies its convolution processing, and outputs the result after activation by its activation function.
6. The contrast-based fast riot-terrorist video recognition method according to claim 5, wherein the activation functions of the second layer to the eighth layer of the network structure of the initial riot-terrorist video recognition model are:

ReLU(x) = max(0, x)

where ReLU(x) is the activation function and x is the convolved output of the layer.
7. The contrast-based fast riot-terrorist video recognition method of claim 5, wherein the activation function of the ninth layer of the network structure of the initial riot-terrorist video recognition model is given by a formula not reproduced in this text (original formula image FDA0002714876420000032); it is the result obtained by taking the partial derivative with respect to b_{i,j}, where b_{i,j} is the hash code of sample j in the i-th sample pair, i ∈ {1, ..., N}, j ∈ {1, 2}.
CN201810366397.4A 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison Active CN108734106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810366397.4A CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810366397.4A CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Publications (2)

Publication Number Publication Date
CN108734106A CN108734106A (en) 2018-11-02
CN108734106B 2021-01-05

Family

ID=63939718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810366397.4A Active CN108734106B (en) 2018-04-23 2018-04-23 Rapid riot and terrorist video identification method based on comparison

Country Status (1)

Country Link
CN (1) CN108734106B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918537B (en) * 2019-01-18 2021-05-11 杭州电子科技大学 HBase-based rapid retrieval method for ship monitoring video content
CN109785214A (en) * 2019-03-01 2019-05-21 宝能汽车有限公司 Safety alarming method and device based on car networking
CN110796182A (en) * 2019-10-15 2020-02-14 西安网算数据科技有限公司 Bill classification method and system for small amount of samples
CN111078941B (en) * 2019-12-18 2022-10-28 福州大学 Similar video retrieval system based on frame correlation coefficient and perceptual hash
CN112395457B (en) * 2020-12-11 2021-06-22 中国搜索信息科技股份有限公司 Video to-be-retrieved positioning method applied to video copyright protection
CN112861976B (en) * 2021-02-11 2024-01-12 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN114724074B (en) * 2022-06-01 2022-09-09 共道网络科技有限公司 Method and device for detecting risk video

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744973A (en) * 2014-01-11 2014-04-23 西安电子科技大学 Video copy detection method based on multi-feature Hash
CN105718861A (en) * 2016-01-15 2016-06-29 北京市博汇科技股份有限公司 Method and device for identifying video streaming data category
WO2018017566A1 (en) * 2016-07-18 2018-01-25 The Regents Of The University Of Michigan Hash-chain based sender identification scheme

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744973A (en) * 2014-01-11 2014-04-23 西安电子科技大学 Video copy detection method based on multi-feature Hash
CN105718861A (en) * 2016-01-15 2016-06-29 北京市博汇科技股份有限公司 Method and device for identifying video streaming data category
WO2018017566A1 (en) * 2016-07-18 2018-01-25 The Regents Of The University Of Michigan Hash-chain based sender identification scheme

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Image Copy Detection Based on Convolutional Neural Networks";Jing Zhang等;《CCPR 2016: Pattern Recognition》;20161022;第111-121页 *
"基于卷积神经网络的鸟类视频图像检索研究";张惠凡等;《科研信息化技术与应用》;20171031;第8卷(第5期);第50-57页 *
"基于深度卷积神经网络和二进制哈希学习的图像检索方法";彭天强等;《电子与信息学报》;20160831;第88卷(第8期);第2068-2076页 *
"有害音视频一致性检测方法的研究与实现";王媛媛等;《中国人民公安大学学报(自然科学版)》;20160930(第3期);第83-88页 *

Also Published As

Publication number Publication date
CN108734106A (en) 2018-11-02

Similar Documents

Publication Publication Date Title
CN108734106B (en) Rapid riot and terrorist video identification method based on comparison
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
US9928407B2 (en) Method, system and computer program for identification and sharing of digital images with face signatures
WO2022134584A1 (en) Real estate picture verification method and apparatus, computer device and storage medium
CN108985190B (en) Target identification method and device, electronic equipment and storage medium
EP2659400A1 (en) Method, apparatus, and computer program product for image clustering
WO2021237570A1 (en) Image auditing method and apparatus, device, and storage medium
CN111651636A (en) Video similar segment searching method and device
CN112668320A (en) Model training method and device based on word embedding, electronic equipment and storage medium
CN114663871A (en) Image recognition method, training method, device, system and storage medium
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN110895811B (en) Image tampering detection method and device
CN111177450B (en) Image retrieval cloud identification method and system and computer readable storage medium
CN108446737B (en) Method and device for identifying objects
CN112686847B (en) Identification card image shooting quality evaluation method and device, computer equipment and medium
US11087121B2 (en) High accuracy and volume facial recognition on mobile platforms
CN113780239A (en) Iris recognition method, iris recognition device, electronic equipment and computer readable medium
CN112270305A (en) Card image recognition method and device and electronic equipment
CN111966851A (en) Image recognition method and system based on small number of samples
CN112036501A (en) Image similarity detection method based on convolutional neural network and related equipment thereof
CN111079704A (en) Face recognition method and device based on quantum computation
CN111985483B (en) Method and device for detecting screen shot file picture and storage medium
CN117333926B (en) Picture aggregation method and device, electronic equipment and readable storage medium
CN112565601B (en) Image processing method, image processing device, mobile terminal and storage medium
CN111611417B (en) Image de-duplication method, device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant