CN105844239A - Method for detecting riot and terror videos based on CNN and LSTM - Google Patents


Info

Publication number
CN105844239A
Authority
CN
China
Legal status
Granted
Application number
CN201610168334.9A
Other languages
Chinese (zh)
Other versions
CN105844239B (en)
Inventor
苏菲
宋凡
宋一凡
赵志诚
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201610168334.9A
Publication of CN105844239A
Application granted
Publication of CN105844239B
Legal status: Active
Anticipated expiration


Classifications

    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/44: Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting riot and terror videos based on CNN and LSTM, belonging to the technical fields of pattern recognition, video detection and deep learning. The detection method comprises the following steps: first, key frames are sampled from the video to be detected and key-frame features are extracted; second, video-level representation and discrimination are performed, comprising VLAD feature encoding and SVM discrimination in a CNN semantic module, VLAD feature encoding and SVM discrimination in a CNN scene module, and LSTM discrimination in an LSTM temporal module; finally, the results are fused. The method exploits the strengths of CNN in image feature extraction and of LSTM in time-series modelling, and takes full account of the scene characteristics of riot and terror videos. In real-world tests the mAP reaches 98.0%, approaching the level of manual review. In terms of speed, using only single-machine GPU acceleration, 76.4 seconds of online video can be processed per second. The method is suitable for blocking the spread of riot and terror videos on large video websites, and thereby helps maintain social stability and long-term national security.

Description

Method for detecting riot and terror videos based on CNN and LSTM
Technical field
The invention belongs to the technical fields of pattern recognition, video detection and deep learning, and specifically relates to a method for detecting riot and terror videos based on CNN and LSTM.
Background
In recent years, large numbers of violent terrorist videos of domestic and foreign origin have been illegally disseminated online and have become a serious threat to social stability. However, automated technology for detecting riot and terror videos is still under development, and most approaches reuse existing event-video detection methods. These fall roughly into three categories: detection based on local image features, detection based on semantic concepts, and detection based on convolutional neural networks (Convolutional Neural Network, CNN).
Reference [1] (Sun, Chen, and Ram Nevatia. "Large-scale web video event classification by use of Fisher vectors." In Applications of Computer Vision (WACV), 2013 IEEE Workshop on, pp. 15-22. IEEE, 2013.) discloses a detection method based on local image features. First, at the key-frame level, local image features such as Scale-Invariant Feature Transform (SIFT) features are extracted; then, at the video level, Fisher Vector encoding produces a global representation of the video; finally, a Support Vector Machine (SVM) classifier determines the class of the video, e.g. riot/terror or non-riot/terror. The method needs no additional manual annotation during training and is simple, but it has the following shortcomings: (1) detection accuracy is limited by the local features used; (2) detection is slow. Local features such as SIFT are computationally expensive, so the method is ill-suited to large-scale video detection and has limited practical value.
Reference [2] (Liu, J.; Yu, Qian; Javed, O.; Ali, S.; Tamrakar, A.; Divakaran, A.; Cheng, Hui; Sawhney, H. "Video event recognition using concept attributes." WACV, 2013.) discloses a detection method based on semantic concepts. First, at the key-frame level, local feature extraction combined with SVM classifiers determines the confidence of various predefined semantic concepts in the image (for riot and terror videos these include, but are not limited to, guns, explosions, masked persons and terrorist-organization insignia); then, at the video level, Fisher Vector encoding generates a global video feature; finally, another SVM classifier determines the video type. Because the predefined semantic concepts provide guidance, this method achieves higher precision on riot and terror videos, but it has the following shortcomings: (1) training requires a large number of annotated images, so the manual cost is high; (2) accuracy is not guaranteed when a riot and terror video contains none of the predefined concepts; (3) detection is slow.
Reference [3] (Xu, Zhongwen, Yi Yang, and Alexander G. Hauptmann. "A discriminative CNN video representation for event detection." arXiv preprint arXiv:1411.4006 (2014).) discloses a detection method based on CNN semantic features. In the training stage, a CNN semantic model is trained on a large number of annotated images. In the test stage, the trained model extracts CNN semantic features of the key frames (e.g. FC6, FC7 and SPP features); the Vector of Locally Aggregated Descriptors (VLAD) method is then applied at the video level to encode these features into a high-dimensional video representation. The method performed well on the Multimedia Event Detection (MED) dataset. It fully exploits the strength of CNNs in still-image feature extraction and can perform well in riot and terror video detection, but it leaves room for improvement in the following respects: (1) the VLAD encoding makes insufficient use of the temporal characteristics of the video; (2) the method extracts only CNN semantic features of key frames and ignores other distinctive characteristics of riot and terror videos. In summary, detection based on CNN semantic features still has room for performance improvement.
Summary of the invention
To solve these problems in the prior art, the present invention proposes a method for detecting riot and terror videos based on CNN and Long Short-Term Memory (LSTM) networks. The method exploits the strengths of CNN in image feature extraction and of LSTM in time-series modelling, and takes full account of the scene characteristics of riot and terror videos. In real-world tests its mAP reaches 98.0%, close to the level of manual review. In terms of speed, using only single-machine GPU acceleration it processes 76.4 seconds of online video (average bit rate 632 kbps) per second, making it suitable for blocking the spread of riot and terror videos on large video websites and helping to maintain social stability and long-term national security.
Analysis of a large number of riot and terror videos shows that they have distinctive characteristics in two respects: temporal structure and filming scene. Based on this finding, the present invention adds, to the existing video detection module based on CNN semantic features (the CNN semantic module), a video detection module based on CNN scene features (the CNN scene module) and a temporal detection module based on LSTM (the LSTM temporal module). For a video to be detected, the invention fuses the detection results on semantics, scene and temporal structure to decide more comprehensively whether the video involves terrorist content, reducing the false detection rate and improving the practical value of the method.
The method for detecting riot and terror videos based on CNN and LSTM provided by the present invention specifically comprises the following steps:
Step 1: sample key frames from the video to be detected and extract key-frame features;
Step 2: using the extracted key-frame features, perform video-level representation and discrimination, comprising semantic VLAD feature encoding and SVM discrimination in the CNN semantic module, scene VLAD feature encoding and SVM discrimination in the CNN scene module, and LSTM discrimination in the LSTM temporal module;
Step 3: fuse the results. A hierarchical fusion strategy based on validation-set mAP values is adopted: for a video to be identified, the decision scores of the three modules (CNN semantic module, CNN scene module and LSTM temporal module) are computed separately and then fused by a weighted average, using each module's mAP on the validation set as its weight.
The advantages and beneficial effects of the present invention are:
(1) The prior art relies solely on a CNN semantic module and ignores the temporal information of the video. To make full use of the temporal-structure characteristics of riot and terror videos, the present invention adds an LSTM temporal module to the original method. Test results show that introducing temporal information improves recognition accuracy significantly.
(2) Based on statistics and analysis of a large-scale sample of riot and terror videos, the present invention identifies the distinctive characteristics of such videos in terms of recording scene. It therefore adds a CNN scene module to riot and terror video detection on top of the original structure, ensuring recognition accuracy for videos shot in particular scenes.
The method is mainly intended for government internet regulators and large video websites, to detect whether videos uploaded by users involve violent or terrorist content. Once a video is suspected of containing such illegal content, a warning should be issued promptly and the video handed over for manual review:
(1) Government internet regulators can apply the present invention in campaigns to eradicate online riot and terror audio and video. In addition to relying on manual reports, they can use it to spot-check online videos on major video websites, issue rectification notices to websites where problems are found, and safeguard the security of the domestic internet environment.
(2) The present invention can be applied in the content-security systems of large video websites, both to filter out riot and terror content while users upload videos and to screen the existing video inventory, avoiding unnecessary losses to the website from crossing content-security red lines.
Brief description of the drawings
Fig. 1 is a flowchart of the video detection method provided by the present invention.
Fig. 2 is a schematic diagram of SPP feature extraction in the present invention.
Fig. 3 is a schematic diagram of the structure of an LSTM neural unit in the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and embodiments.
The present invention provides a method for detecting riot and terror videos based on CNN and LSTM. As shown in Fig. 1, the detection method specifically comprises the following steps:
Step 1: sample key frames from the video to be detected and extract key-frame features.
(1) Key frames are sampled from the video to be detected at equal intervals of 1 second to obtain key-frame images.
(2) Each key-frame image is down-sampled to 227 × 227 and fed into the CNN semantic model and the CNN scene model to extract its CNN semantic features and CNN scene features, respectively.
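Step (1) only fixes the sampling interval. A minimal sketch of equal-interval sampling, assuming the frame count and frame rate are known and that each key frame is the frame nearest to each whole-second timestamp (the rounding rule and the name `keyframe_indices` are illustrative assumptions, not from the patent). Decoding the frames and resizing them to 227 × 227 would be done with an image or video library and is not shown:

```python
from typing import List


def keyframe_indices(num_frames: int, fps: float, interval_s: float = 1.0) -> List[int]:
    """Indices of the frames nearest to t = 0, interval_s, 2*interval_s, ... seconds.

    Hypothetical helper: the patent only states a 1-second sampling interval.
    """
    if num_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("num_frames must be >= 0; fps and interval_s must be positive")
    indices = []
    k = 0
    while True:
        # Frame closest to the k-th sampling timestamp.
        idx = int(round(k * interval_s * fps))
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 10-second clip at 25 fps yields one key frame per second.
print(keyframe_indices(250, 25.0))  # [0, 25, 50, 75, 100, 125, 150, 175, 200, 225]
```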
The CNN semantic features and the CNN scene features each comprise three kinds of features: FC6, FC7 and SPP. FC6 and FC7 are the commonly used 4096-dimensional vectors, whereas the SPP extraction process is more particular and is described in detail below.
As shown in the SPP extraction diagram of Fig. 2, SPP features are extracted after the Conv5 layer (the fifth convolutional layer of the CNN model). Conv5 fully preserves the spatial position information of targets, but its feature dimension is too high to use directly. To avoid this, the Conv5 feature map is first partitioned spatially into 1 × 1, 2 × 2 and 3 × 3 grids, and max pooling is then applied within each region, yielding 1 + 4 + 9 = 14 vectors of 256 dimensions (256D). Each dimension of each vector corresponds to some explicit or implicit semantic concept; these vectors constitute the SPP feature.
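A minimal NumPy sketch of the 1 × 1, 2 × 2 and 3 × 3 pyramid max pooling described above. It assumes a Conv5 map laid out as (channels, height, width); for an AlexNet-style network with 227 × 227 input this is 256 × 13 × 13, which matches the 256-D vectors in the text, but the patent does not name the network. The floor/ceil bin edges (adjacent bins may overlap by one cell) are an assumption chosen so every bin is non-empty, and `spp_vectors` is a hypothetical name:

```python
import math

import numpy as np


def _bins(size: int, n: int):
    """Split [0, size) into n bins; floor/ceil edges keep every bin non-empty."""
    return [(math.floor(i * size / n), math.ceil((i + 1) * size / n)) for i in range(n)]


def spp_vectors(conv5: np.ndarray, levels=(1, 2, 3)) -> np.ndarray:
    """Max-pool a (C, H, W) feature map over 1x1, 2x2 and 3x3 grids.

    Returns an array of shape (sum(n*n for n in levels), C): one C-dim vector per region.
    """
    _, h, w = conv5.shape
    vectors = []
    for n in levels:
        for r0, r1 in _bins(h, n):
            for c0, c1 in _bins(w, n):
                # Channel-wise maximum over this spatial region.
                vectors.append(conv5[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.stack(vectors)


feat = spp_vectors(np.random.rand(256, 13, 13))
print(feat.shape)  # (14, 256)
```

The sketch keeps the 14 vectors separate, matching the text's description of 14 vectors of 256 dimensions each.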
For each key-frame image, the present invention thus extracts three kinds of CNN semantic features (SPP, FC6 and FC7) and three kinds of CNN scene features (SPP, FC6 and FC7), which are then routed as needed to the different video-level discrimination modules for further processing.
Step 2: using the extracted key-frame features, perform video-level representation and discrimination.
The video level comprises three independent feature-representation and discrimination modules: semantic VLAD encoding and SVM discrimination in the CNN semantic module, scene VLAD encoding and SVM discrimination in the CNN scene module, and LSTM discrimination in the LSTM temporal module.
The semantic VLAD encoding and SVM discrimination of the CNN semantic module take as input the three kinds of CNN semantic features (SPP, FC6, FC7). Principal Component Analysis (PCA) is first applied to reduce the three features to 128, 256 and 256 dimensions, respectively.
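The PCA step can be sketched with NumPy's SVD. This is generic PCA, not code from the patent; it assumes the projection is fitted once on a pool of training descriptors with at least as many rows as target dimensions (e.g. at least 256 FC6 vectors for 4096 → 256), and `fit_pca` / `apply_pca` are hypothetical names:

```python
import numpy as np


def fit_pca(X: np.ndarray, n_components: int):
    """Fit PCA on the rows of X (n_samples, n_features). Returns (mean, components)."""
    if n_components > min(X.shape):
        raise ValueError("n_components cannot exceed min(n_samples, n_features)")
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centred data, sorted by singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project rows of X onto the retained principal axes."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
fc6 = rng.normal(size=(300, 4096))          # stand-in for training FC6 vectors
mean, comps = fit_pca(fc6, 256)
print(apply_pca(fc6[:5], mean, comps).shape)  # (5, 256)
```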
Next, VLAD is applied: each D-dimensional reduced feature vector is assigned to the set of cluster centres $C = \{c_1, c_2, \ldots, c_K\}$, obtained beforehand by K-means clustering, and the residuals are accumulated. Let $V = \{v_1, v_2, \ldots, v_N\}$ denote a set of N reduced feature vectors; the accumulated residual vector $\mathrm{diff}_k$ associated with cluster centre $c_k$ is then:
$$\mathrm{diff}_k = \sum_{i:\,NN(v_i) = c_k} (v_i - c_k) \tag{1}$$
where $i = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, and $NN(v_i)$ denotes the nearest neighbour (in Euclidean distance) of the reduced feature vector $v_i$ in the cluster-centre set C. Each accumulated residual vector $\mathrm{diff}_j$ ($1 \le j \le K$) is $\ell_2$-normalized separately, and the K vectors are then concatenated into the final K × D-dimensional VLAD representation. With the number of cluster centres K set to 256, the VLAD representations of SPP, FC6 and FC7 have 32,768, 65,536 and 65,536 dimensions, respectively.
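Eq. (1) together with the per-centre ℓ2 normalization can be sketched as below. The cluster centres are assumed to come from a separate K-means run (not shown); a centre that receives no descriptors leaves a zero block, and the name `vlad_encode` is hypothetical:

```python
import numpy as np


def vlad_encode(V: np.ndarray, C: np.ndarray) -> np.ndarray:
    """VLAD encoding of N descriptors V (N, D) against K centres C (K, D); returns a K*D vector."""
    K, D = C.shape
    # Squared Euclidean distances via ||v||^2 - 2 v.c + ||c||^2, shape (N, K).
    d2 = (V ** 2).sum(axis=1)[:, None] - 2.0 * V @ C.T + (C ** 2).sum(axis=1)[None, :]
    nn = d2.argmin(axis=1)                           # NN(v_i): index of the nearest centre
    diffs = np.zeros((K, D))
    for k in range(K):
        members = V[nn == k]
        if len(members):
            diffs[k] = (members - C[k]).sum(axis=0)  # Eq. (1): accumulated residuals
    # l2-normalize each diff_k separately (zero blocks stay zero), then concatenate.
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    diffs = np.divide(diffs, norms, out=np.zeros_like(diffs), where=norms > 0)
    return diffs.reshape(K * D)


V = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
C = np.array([[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]])
print(vlad_encode(V, C))  # [1. 0. 1. 0. 0. 0.]
```

With K = 256 and the 128-D reduced SPP vectors, the output has 256 × 128 = 32,768 dimensions, matching the figure in the text.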
Finally, a linear SVM classifier is trained to judge the confidence that a video involves terrorist content. Let the sample set formed by the VLAD representations of the videos be $X = \{x_1, x_2, \ldots, x_N\}$, with corresponding class labels (riot/terror, non-riot/terror) $Y = \{y_1, y_2, \ldots, y_N\}$, where $y_i \in \{+1, -1\}$. Maximizing the geometric margin converts training into a convex quadratic programming problem, and the learned separating hyperplane is:
$$w \cdot x + b = 0 \tag{2}$$
where w and b are the weight (normal) vector and the bias of the separating hyperplane. Maximizing the geometric margin of the separating hyperplane can be expressed as the following optimization problem with inequality constraints:
$$\max_{w,\,b}\ \gamma \tag{3}$$
$$\text{s.t.}\quad y_i\left(\frac{w}{\lVert w \rVert}\cdot x_i + \frac{b}{\lVert w \rVert}\right) \ge \gamma,\quad i = 1, 2, \ldots, N \tag{4}$$
where γ is the geometric margin, i.e. the smallest geometric distance from the sample points $x_i$ to the separating hyperplane. This problem is converted via the min-max formulation into its Lagrange dual problem and solved with the Sequential Minimal Optimization (SMO) algorithm. Solving yields the optimal hyperplane parameters $w^*$ and $b^*$, and the riot/terror video classification decision function can be expressed as:
$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*}) \tag{5}$$
where sign(·) is the sign function. The confidence that the current VLAD representation is identified as riot/terror is:
$$P(y = +1) = \frac{1}{1 + e^{-(w^{*} \cdot x + b^{*})}} \tag{6}$$
The VLAD representations of SPP, FC6 and FC7 are each fed to a linear SVM classifier, which finally outputs the discrimination confidences $P_s^{(fc6)}$, $P_s^{(fc7)}$ and $P_s^{(spp)}$ corresponding to the three CNN semantic features FC6, FC7 and SPP.
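Given trained parameters $w^*$ and $b^*$, Eqs. (4) to (6) reduce to a few lines. A minimal sketch, assuming the SVM has already been trained (the SMO solver is not reproduced) and that a score of exactly 0 is mapped to +1, a tie-breaking choice the patent does not specify; the function names are hypothetical:

```python
import numpy as np


def svm_score(w: np.ndarray, b: float, x: np.ndarray) -> float:
    """Raw decision score w*.x + b*."""
    return float(np.dot(w, x) + b)


def svm_decide(w, b, x) -> int:
    """Eq. (5): +1 (riot/terror) or -1; a zero score is mapped to +1 here (assumption)."""
    return 1 if svm_score(w, b, x) >= 0 else -1


def svm_confidence(w, b, x) -> float:
    """Eq. (6): logistic function of the raw SVM score."""
    return float(1.0 / (1.0 + np.exp(-svm_score(w, b, x))))


def geometric_margins(w, b, X, y) -> np.ndarray:
    """Per-sample geometric margins y_i (w.x_i + b) / ||w|| from Eq. (4); gamma is their minimum."""
    return y * (X @ w + b) / np.linalg.norm(w)


w, b = np.array([3.0, 4.0]), -5.0
print(svm_decide(w, b, np.array([2.0, 1.0])), svm_confidence(w, b, np.array([1.0, 0.5])))  # 1 0.5
```

Eq. (6) applies the logistic function to the raw score with no fitted scale; Platt-style calibration, which fits a scale and offset on held-out data, is a common alternative but is not mentioned in the patent.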
The scene VLAD encoding and SVM discrimination of the CNN scene module take as input the three kinds of CNN scene features (SPP, FC6, FC7). The processing flow of this module is essentially the same as that of the semantic VLAD encoding and SVM discrimination module, and it finally outputs the discrimination confidences $P_p^{(fc6)}$, $P_p^{(fc7)}$ and $P_p^{(spp)}$ corresponding to the three CNN scene features FC6, FC7 and SPP.
The LSTM discrimination of the LSTM temporal module takes as input two kinds of CNN semantic features (FC6, FC7). Each of the two features is first fed into an LSTM discrimination model comprising two layers of LSTM units, with 1024 neurons in the first layer and 512 neurons in the second. The structure of each LSTM neuron is shown in Fig. 3. The forward pass of an LSTM unit can be expressed as:
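$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{7}$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{8}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{9}$$
$$c_t = f_t * c_{t-1} + i_t * \phi(W_c x_t + U_c h_{t-1} + b_c) \tag{10}$$
$$h_t = o_t * \phi(c_t) \tag{11}$$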
where the two nonlinear activation functions are $\sigma(x_t) = \frac{1}{1 + e^{-x_t}}$ and $\phi(x_t) = \tanh(x_t)$, and * denotes element-wise multiplication. $i_t$, $f_t$, $o_t$ and $c_t$ denote the states of the input gate, forget (memory) gate, output gate and cell (core gate) at time t. $W_i$, $W_f$, $W_o$ and $W_c$ are the weight matrices applied to the input $x_t$ for the input gate, forget gate, output gate and cell, respectively; $U_i$, $U_f$, $U_o$ and $U_c$ are the weight matrices applied to the hidden state $h_{t-1}$ at time t-1 for the same four components; and $b_i$, $b_f$, $b_o$, $b_c$ are the corresponding bias vectors.
First, the input feature $x_t$ at time t and the hidden variable $h_{t-1}$ at time t-1, acted on jointly by the weight matrices W and U and the bias vectors b, generate the gate states $i_t$, $f_t$ and $o_t$ at time t (Eqs. (7) to (9)). Next, with the help of the cell state $c_{t-1}$ at time t-1, the cell state $c_t$ at time t is generated (Eq. (10)). Finally, the cell state $c_t$ and the output-gate state $o_t$ at time t produce the hidden variable $h_t$, which in turn influences the internal state of the LSTM neuron at time t+1 (Eq. (11)).
The output of the second-layer LSTM neurons is connected to a fully connected classifier, which finally outputs the temporal discrimination confidences $P_t^{(fc6)}$ and $P_t^{(fc7)}$ corresponding to the two CNN semantic features FC6 and FC7.
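The forward pass of Eqs. (7) to (11), stacked into the two-layer model and followed by a fully connected output, can be sketched in NumPy as below. Assumptions not stated in the patent: the weights are randomly initialised rather than trained, the classifier reads only the last time step's hidden state, and it is a single logistic unit. `LSTMLayer` and `lstm_video_confidence` are hypothetical names. Smaller layer sizes are used so the example runs quickly; the patent's configuration would be `LSTMLayer(4096, 1024, rng)` followed by `LSTMLayer(1024, 512, rng)`:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMLayer:
    """One LSTM layer implementing Eqs. (7)-(11) (untrained, randomly initialised)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: np.random.Generator):
        s = 1.0 / np.sqrt(hidden_dim)
        gates = ("i", "f", "o", "c")            # input, forget, output gates and cell
        self.W = {g: rng.uniform(-s, s, (hidden_dim, input_dim)) for g in gates}
        self.U = {g: rng.uniform(-s, s, (hidden_dim, hidden_dim)) for g in gates}
        self.b = {g: np.zeros(hidden_dim) for g in gates}
        self.hidden_dim = hidden_dim

    def step(self, x_t, h_prev, c_prev):
        W, U, b = self.W, self.U, self.b
        i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # Eq. (7)
        f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # Eq. (8)
        o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # Eq. (9)
        c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # Eq. (10)
        h_t = o_t * np.tanh(c_t)                                    # Eq. (11)
        return h_t, c_t

    def run(self, xs: np.ndarray) -> np.ndarray:
        """Process a (T, input_dim) sequence from a zero initial state; return (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x_t in xs:
            h, c = self.step(x_t, h, c)
            outputs.append(h)
        return np.stack(outputs)


def lstm_video_confidence(frames, layer1, layer2, w_fc, b_fc) -> float:
    """Two stacked LSTM layers, then a logistic fully connected unit on the last hidden state."""
    h2 = layer2.run(layer1.run(frames))
    return float(sigmoid(w_fc @ h2[-1] + b_fc))


rng = np.random.default_rng(0)
layer1, layer2 = LSTMLayer(64, 32, rng), LSTMLayer(32, 16, rng)
frames = rng.normal(size=(10, 64))   # stand-in for 10 key frames' FC6 features
print(lstm_video_confidence(frames, layer1, layer2, rng.normal(size=16), 0.0))
```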
Step 3: fuse the results.
To ensure fusion efficiency, a hierarchical fusion strategy based on validation-set mAP values is adopted: for a video to be identified, the decision scores of the three modules (CNN semantic module, CNN scene module and LSTM temporal module) are computed separately and then fused by a weighted average, using each module's mAP value on the validation set as its weight. In practice, scores are first fused within each of the CNN semantic, CNN scene and LSTM temporal modules, and the module scores are then fused globally:
$$P_s = \frac{\omega_s^{(fc6)} P_s^{(fc6)} + \omega_s^{(fc7)} P_s^{(fc7)} + \omega_s^{(spp)} P_s^{(spp)}}{\omega_s^{(fc6)} + \omega_s^{(fc7)} + \omega_s^{(spp)}} \tag{12}$$
$$P_p = \frac{\omega_p^{(fc6)} P_p^{(fc6)} + \omega_p^{(fc7)} P_p^{(fc7)} + \omega_p^{(spp)} P_p^{(spp)}}{\omega_p^{(fc6)} + \omega_p^{(fc7)} + \omega_p^{(spp)}} \tag{13}$$
$$P_t = \frac{\omega_t^{(fc6)} P_t^{(fc6)} + \omega_t^{(fc7)} P_t^{(fc7)}}{\omega_t^{(fc6)} + \omega_t^{(fc7)}} \tag{14}$$
$$P_o = \frac{\omega_s P_s + \omega_p P_p + \omega_t P_t}{\omega_s + \omega_p + \omega_t} \tag{15}$$
where $P_s$, $P_p$ and $P_t$ are the decision scores of the CNN semantic module, the CNN scene module and the LSTM temporal module; $\omega_s$, $\omega_p$ and $\omega_t$ are the validation-set mAP values of these three modules; $P_s^{(fc6)}$, $P_s^{(fc7)}$ and $P_s^{(spp)}$ are the decision scores of the FC6, FC7 and SPP features in the CNN semantic module, and $\omega_s^{(fc6)}$, $\omega_s^{(fc7)}$ and $\omega_s^{(spp)}$ are their validation-set mAP values; $P_p^{(fc6)}$, $P_p^{(fc7)}$ and $P_p^{(spp)}$ are the decision scores of the FC6, FC7 and SPP features in the CNN scene module, and $\omega_p^{(fc6)}$, $\omega_p^{(fc7)}$ and $\omega_p^{(spp)}$ are their validation-set mAP values; $P_t^{(fc6)}$ and $P_t^{(fc7)}$ are the decision scores of the FC6 and FC7 features in the LSTM temporal module, and $\omega_t^{(fc6)}$ and $\omega_t^{(fc7)}$ are their validation-set mAP values. The final riot and terror video detection result (confidence) $P_o$ is obtained by weighting the three modules by their mAP values, as in Eq. (15).
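The two-level weighting of Eqs. (12) to (15) can be sketched as follows. The dictionary layout, the function names and all the scores and mAP weights in the example are made-up illustrative values, not results from the patent:

```python
from typing import Dict, Tuple


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum_k w_k * P_k / Sum_k w_k over the keys of `scores` (the form of Eqs. (12)-(15))."""
    total = sum(weights[k] for k in scores)
    if total <= 0:
        raise ValueError("weights (validation mAP values) must sum to a positive number")
    return sum(weights[k] * scores[k] for k in scores) / total


def hierarchical_fusion(
    scores: Dict[str, Dict[str, float]],
    feature_weights: Dict[str, Dict[str, float]],
    module_weights: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """First fuse features within each module (Eqs. (12)-(14)), then fuse modules (Eq. (15))."""
    module_scores = {m: weighted_average(s, feature_weights[m]) for m, s in scores.items()}
    return weighted_average(module_scores, module_weights), module_scores


scores = {
    "semantic": {"fc6": 0.9, "fc7": 0.8, "spp": 0.7},
    "scene": {"fc6": 0.6, "fc7": 0.6, "spp": 0.6},
    "temporal": {"fc6": 1.0, "fc7": 0.0},
}
feature_weights = {
    "semantic": {"fc6": 1.0, "fc7": 1.0, "spp": 1.0},
    "scene": {"fc6": 0.9, "fc7": 0.8, "spp": 0.7},
    "temporal": {"fc6": 0.5, "fc7": 0.5},
}
module_weights = {"semantic": 0.5, "scene": 0.25, "temporal": 0.25}
p_o, per_module = hierarchical_fusion(scores, feature_weights, module_weights)
print(round(p_o, 3))  # 0.675
```

Because every level divides by the sum of its weights, the mAP values need not sum to 1.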

Claims (7)

1. A method for detecting riot and terror videos based on CNN and LSTM, characterized in that it specifically comprises the following steps:
Step 1: sampling key frames from the video to be detected and extracting key-frame features;
Step 2: using the extracted key-frame features, performing video-level representation and discrimination, comprising semantic VLAD feature encoding and SVM discrimination in a CNN semantic module, scene VLAD feature encoding and SVM discrimination in a CNN scene module, and LSTM discrimination in an LSTM temporal module;
Step 3: fusing the results using a hierarchical fusion strategy based on validation-set mAP values, i.e. for a video to be identified, computing the decision scores of the CNN semantic module, the CNN scene module and the LSTM temporal module separately, and then fusing them by a weighted average using each module's mAP value on the validation set as its weight.
2. The method for detecting riot and terror videos based on CNN and LSTM according to claim 1, characterized in that: in step 1, the key-frame sampling interval is 1 second; the key-frame features comprise CNN semantic features and CNN scene features, each of which specifically comprises three kinds of features: FC6, FC7 and SPP.
3. The method for detecting riot and terror videos based on CNN and LSTM according to claim 1 or 2, characterized in that: SPP features are extracted from the Conv5 layer; the Conv5 feature map is first partitioned spatially into 1 × 1, 2 × 2 and 3 × 3 grids, and max pooling is then applied within each region to obtain 14 vectors of 256 dimensions, each dimension of each vector corresponding to some explicit or implicit semantic concept; these vectors constitute the SPP feature.
4. The method for detecting riot and terror videos based on CNN and LSTM according to claim 1, characterized in that: the semantic VLAD encoding and SVM discrimination of the CNN semantic module in step 2 take as input three kinds of CNN semantic features, SPP, FC6 and FC7; principal component analysis first reduces the three features to 128, 256 and 256 dimensions, respectively; VLAD is then applied: the D-dimensional reduced feature vectors are assigned to the set of cluster centres $C = \{c_1, c_2, \ldots, c_K\}$ obtained beforehand by K-means clustering, and the residuals are accumulated; letting $V = \{v_1, v_2, \ldots, v_N\}$ denote a set of N reduced feature vectors, the accumulated residual vector $\mathrm{diff}_k$ associated with cluster centre $c_k$ is expressed as:
$$\mathrm{diff}_k = \sum_{i:\,NN(v_i) = c_k} (v_i - c_k) \tag{1}$$
where $i = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$, and $NN(v_i)$ denotes the nearest neighbour (in Euclidean distance) of the reduced feature vector $v_i$ in the cluster-centre set C; each accumulated residual vector $\mathrm{diff}_j$ ($1 \le j \le K$) is $\ell_2$-normalized separately, and the K vectors are then concatenated into the final K × D-dimensional VLAD representation; with the number of cluster centres K set to 256, the VLAD representations of SPP, FC6 and FC7 have 32,768, 65,536 and 65,536 dimensions, respectively;
Finally, a linear SVM classifier is trained to judge the confidence that the video involves terrorist content.
5. The method for detecting riot and terror videos based on CNN and LSTM according to claim 4, characterized in that training the linear SVM classifier to judge the confidence that the video involves terrorist content specifically comprises: letting the sample set formed by the VLAD representations of the videos be $X = \{x_1, x_2, \ldots, x_N\}$ with corresponding class set $Y = \{y_1, y_2, \ldots, y_N\}$, where $y_i \in \{+1, -1\}$, maximizing the geometric margin converts training into a convex quadratic programming problem, and the learned separating hyperplane is:
$$w \cdot x + b = 0 \tag{2}$$
where w and b are the weight (normal) vector and the bias of the separating hyperplane; maximizing the geometric margin of the separating hyperplane is expressed as the following optimization problem with inequality constraints:
$$\max_{w,\,b}\ \gamma \tag{3}$$
$$\text{s.t.}\quad y_i\left(\frac{w}{\lVert w \rVert}\cdot x_i + \frac{b}{\lVert w \rVert}\right) \ge \gamma,\quad i = 1, 2, \ldots, N \tag{4}$$
where γ is the geometric margin, i.e. the smallest geometric distance from the sample points $x_i$ to the separating hyperplane; this problem is converted via the min-max formulation into its Lagrange dual problem and solved with the Sequential Minimal Optimization algorithm; solving yields the optimal hyperplane parameters $w^*$ and $b^*$, and the riot/terror video classification decision function is expressed as:
$$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*}) \tag{5}$$
where sign(·) is the sign function; the confidence that the current VLAD representation is identified as riot/terror is:
$$P(y = +1) = \frac{1}{1 + e^{-(w^{*} \cdot x + b^{*})}} \tag{6}$$
The VLAD representations of SPP, FC6 and FC7 are each fed to a linear SVM classifier, which finally outputs the discrimination confidences $P_s^{(fc6)}$, $P_s^{(fc7)}$ and $P_s^{(spp)}$ corresponding to the three CNN semantic features FC6, FC7 and SPP.
6. The method for detecting riot and terror videos based on CNN and LSTM according to claim 1, characterized in that: the LSTM discrimination of the LSTM temporal module in step 2 takes as input two kinds of CNN semantic features, FC6 and FC7; each of the two features is first fed into an LSTM discrimination model comprising two layers of LSTM units, with 1024 neurons in the first layer and 512 neurons in the second; the forward pass of each LSTM unit is expressed as:
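$$ i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{7} $$
$$ f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{8} $$
$$ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{9} $$
$$ c_t = f_t \odot c_{t-1} + i_t \odot \phi(W_c x_t + U_c h_{t-1} + b_c) \tag{10} $$
$$ h_t = o_t \odot \phi(c_t) \tag{11} $$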
where the two nonlinear activation functions are $\sigma(x_t) = 1/(1 + e^{-x_t})$ and $\phi(x_t) = \tanh(x_t)$, and the products in Eqs. (10) and (11) are element-wise. $i_t$, $f_t$, $o_t$ and $c_t$ denote the states of the input gate, the forget (memory) gate, the output gate and the cell (core) at time $t$. $W_i$, $W_f$, $W_o$ and $W_c$ are the weight matrices applied to the input $x_t$ for the input gate, forget gate, output gate and cell, respectively; $U_i$, $U_f$, $U_o$ and $U_c$ are the corresponding weight matrices applied to the hidden state $h_{t-1}$ of time $t-1$; and $b_i$, $b_f$, $b_o$ and $b_c$ are the corresponding bias vectors.
The output of the second LSTM layer is connected to a fully connected classifier layer, which finally outputs the temporal discrimination confidences corresponding to the two CNN semantic features FC6 and FC7.
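A minimal NumPy sketch of Eqs. (7) to (11), stacked into two layers and followed by a fully connected head as described above. Assumptions: the weights are random and untrained; the head is assumed to be a single sigmoid output unit (the claim only says "fully connected classifier layer"); the input and hidden sizes are toy values, whereas the claim specifies 1024 and 512 units; all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used as sigma in Eqs. (7)-(9)."""
    return 1.0 / (1.0 + np.exp(-z))


class LSTMLayer:
    """One LSTM layer implementing Eqs. (7)-(11) with untrained random weights."""

    GATES = "ifoc"  # input, forget (memory), output, cell (core)

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {g: rng.normal(0.0, 1.0 / np.sqrt(input_dim), (hidden_dim, input_dim)) for g in self.GATES}
        self.U = {g: rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, hidden_dim)) for g in self.GATES}
        self.b = {g: np.zeros(hidden_dim) for g in self.GATES}

    def step(self, x_t, h_prev, c_prev):
        # Pre-activation W_g x_t + U_g h_{t-1} + b_g for every gate g.
        z = {g: self.W[g] @ x_t + self.U[g] @ h_prev + self.b[g] for g in self.GATES}
        i_t = sigmoid(z["i"])                       # Eq. (7)
        f_t = sigmoid(z["f"])                       # Eq. (8)
        o_t = sigmoid(z["o"])                       # Eq. (9)
        c_t = f_t * c_prev + i_t * np.tanh(z["c"])  # Eq. (10), element-wise products
        h_t = o_t * np.tanh(c_t)                    # Eq. (11)
        return h_t, c_t

    def run(self, xs):
        """Feed a (T, input_dim) sequence; return all hidden states, shape (T, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        states = []
        for x_t in xs:
            h, c = self.step(x_t, h, c)
            states.append(h)
        return np.stack(states)


def temporal_confidence(frames, layer1, layer2, w_fc, b_fc):
    """Two stacked LSTM layers, then a single-logit fully connected head on the last step."""
    h2 = layer2.run(layer1.run(frames))
    return float(sigmoid(w_fc @ h2[-1] + b_fc))


# Usage with toy sizes; the claim uses 1024 and 512 hidden units.
rng = np.random.default_rng(0)
feature_dim, hidden1, hidden2 = 16, 8, 4           # hypothetical dimensions
layer1 = LSTMLayer(feature_dim, hidden1, rng)
layer2 = LSTMLayer(hidden1, hidden2, rng)
fc6_sequence = rng.normal(size=(10, feature_dim))  # stand-in for per-frame FC6 features
p = temporal_confidence(fc6_sequence, layer1, layer2, rng.normal(size=hidden2), 0.0)
```

Following the claim, one such model would be run on a video's FC6 feature sequence and another on its FC7 sequence, yielding the two temporal confidences. Keeping one `(W, U, b)` triple per gate mirrors the notation of Eqs. (7) to (10); practical implementations usually concatenate the four gates into a single matrix multiply instead.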
The method for detecting riot and terror videos based on CNN and LSTM according to claim 1, characterized in that the result fusion of the third step first fuses the scores within each of the CNN semantic module, the CNN scene module and the LSTM temporal module, and then performs a global score fusion:
$$ P_s = \frac{\omega_s^{(fc6)} P_s^{(fc6)} + \omega_s^{(fc7)} P_s^{(fc7)} + \omega_s^{(spp)} P_s^{(spp)}}{\omega_s^{(fc6)} + \omega_s^{(fc7)} + \omega_s^{(spp)}} \tag{12} $$
$$ P_p = \frac{\omega_p^{(fc6)} P_p^{(fc6)} + \omega_p^{(fc7)} P_p^{(fc7)} + \omega_p^{(spp)} P_p^{(spp)}}{\omega_p^{(fc6)} + \omega_p^{(fc7)} + \omega_p^{(spp)}} \tag{13} $$
$$ P_t = \frac{\omega_t^{(fc6)} P_t^{(fc6)} + \omega_t^{(fc7)} P_t^{(fc7)}}{\omega_t^{(fc6)} + \omega_t^{(fc7)}} \tag{14} $$
$$ P_o = \frac{\omega_s P_s + \omega_p P_p + \omega_t P_t}{\omega_s + \omega_p + \omega_t} \tag{15} $$
where $P_s$, $P_p$ and $P_t$ denote the decision scores of the CNN semantic module, the CNN scene module and the LSTM temporal module, respectively, and $\omega_s$, $\omega_p$ and $\omega_t$ are the validation-set mAP values of these three modules. $P_s^{(fc6)}$, $P_s^{(fc7)}$ and $P_s^{(spp)}$ are the decision scores of the FC6, FC7 and SPP features in the CNN semantic module, and $\omega_s^{(fc6)}$, $\omega_s^{(fc7)}$ and $\omega_s^{(spp)}$ are their validation-set mAP values. $P_p^{(fc6)}$, $P_p^{(fc7)}$ and $P_p^{(spp)}$ are the decision scores of the FC6, FC7 and SPP features in the CNN scene module, and $\omega_p^{(fc6)}$, $\omega_p^{(fc7)}$ and $\omega_p^{(spp)}$ are their validation-set mAP values. $P_t^{(fc6)}$ and $P_t^{(fc7)}$ are the decision scores of the FC6 and FC7 features in the LSTM temporal module, and $\omega_t^{(fc6)}$ and $\omega_t^{(fc7)}$ are their validation-set mAP values. The final riot and terror video detection result $P_o$ is obtained by weighting the three modules by their mAP values.
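Eqs. (12) to (15) all share one form, a weighted average whose weights are validation-set mAP values, so a single helper covers every fusion stage. The Python sketch below assumes scores and weights are passed as dictionaries keyed by feature or module name; the helper name and every number in the usage lines are hypothetical and are not results from the patent.

```python
from typing import Mapping


def weighted_fusion(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average sum_k w_k * P_k / sum_k w_k, the common form of Eqs. (12)-(15)."""
    total = sum(weights[k] for k in scores)
    if total <= 0:
        raise ValueError("the mAP weights must sum to a positive value")
    return sum(weights[k] * scores[k] for k in scores) / total


# Hypothetical per-feature scores and validation-set mAP weights.
p_s = weighted_fusion({"fc6": 0.8, "fc7": 0.6, "spp": 0.7}, {"fc6": 0.5, "fc7": 0.3, "spp": 0.2})  # Eq. (12), ≈ 0.72
p_p = weighted_fusion({"fc6": 0.4, "fc7": 0.5, "spp": 0.9}, {"fc6": 0.2, "fc7": 0.2, "spp": 0.6})  # Eq. (13), ≈ 0.72
p_t = weighted_fusion({"fc6": 0.9, "fc7": 0.7}, {"fc6": 0.4, "fc7": 0.4})                          # Eq. (14), ≈ 0.80
p_o = weighted_fusion({"s": p_s, "p": p_p, "t": p_t}, {"s": 0.5, "p": 0.25, "t": 0.25})          # Eq. (15), ≈ 0.74
```

Because each stage divides by the sum of its weights, the mAP values need not sum to 1, and a fused score stays within the range of its inputs.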
CN201610168334.9A 2016-03-23 2016-03-23 Method for detecting riot and terror videos based on CNN and LSTM Active CN105844239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610168334.9A CN105844239B (en) 2016-03-23 2016-03-23 Method for detecting riot and terror videos based on CNN and LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610168334.9A CN105844239B (en) 2016-03-23 2016-03-23 Method for detecting riot and terror videos based on CNN and LSTM

Publications (2)

Publication Number Publication Date
CN105844239A CN105844239A (en) 2016-08-10
CN105844239B CN105844239B (en) 2019-03-29

Family

ID=56584468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610168334.9A Active CN105844239B (en) 2016-03-23 2016-03-23 Method for detecting riot and terror videos based on CNN and LSTM

Country Status (1)

Country Link
CN (1) CN105844239B (en)

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411597A (en) * 2016-10-14 2017-02-15 广东工业大学 Network traffic abnormality detection method and system
CN106548208A (en) * 2016-10-28 2017-03-29 杭州慕锐科技有限公司 A kind of quick, intelligent stylizing method of photograph image
CN106599198A (en) * 2016-12-14 2017-04-26 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image description method for multi-stage connection recurrent neural network
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN106780491A (en) * 2017-01-23 2017-05-31 天津大学 The initial profile generation method used in GVF methods segmentation CT pelvis images
CN106846346A (en) * 2017-01-23 2017-06-13 天津大学 Sequence C T image pelvis profile rapid extracting methods based on key frame marker
CN106951783A (en) * 2017-03-31 2017-07-14 国家电网公司 A kind of Method for Masquerade Intrusion Detection and device based on deep neural network
CN106997371A (en) * 2016-10-28 2017-08-01 华数传媒网络有限公司 The construction method of single user wisdom collection of illustrative plates
CN107016356A (en) * 2017-03-21 2017-08-04 乐蜜科技有限公司 Certain content recognition methods, device and electronic equipment
CN107038221A (en) * 2017-03-22 2017-08-11 杭州电子科技大学 A kind of video content description method guided based on semantic information
CN107092254A (en) * 2017-04-27 2017-08-25 北京航空航天大学 A kind of design method for the Household floor-sweeping machine device people for strengthening study based on depth
CN107092894A (en) * 2017-04-28 2017-08-25 孙恩泽 A kind of motor behavior recognition methods based on LSTM models
CN107256221A (en) * 2017-04-26 2017-10-17 苏州大学 Video presentation method based on multi-feature fusion
CN107274378A (en) * 2017-07-25 2017-10-20 江西理工大学 A kind of image blurring type identification and parameter tuning method for merging memory CNN
CN107341462A (en) * 2017-06-28 2017-11-10 电子科技大学 A kind of video classification methods based on notice mechanism
CN107392097A (en) * 2017-06-15 2017-11-24 中山大学 A kind of 3 D human body intra-articular irrigation method of monocular color video
CN107480726A (en) * 2017-08-25 2017-12-15 电子科技大学 A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
CN107818084A (en) * 2017-10-11 2018-03-20 北京众荟信息技术股份有限公司 A kind of sentiment analysis method for merging comment figure
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN107895172A (en) * 2017-11-03 2018-04-10 北京奇虎科技有限公司 Utilize the method, apparatus and computing device of image information detection anomalous video file
CN108009539A (en) * 2017-12-26 2018-05-08 中山大学 A kind of new text recognition method based on counting focus model
WO2018086513A1 (en) * 2016-11-08 2018-05-17 杭州海康威视数字技术股份有限公司 Target detection method and device
CN108053410A (en) * 2017-12-11 2018-05-18 厦门美图之家科技有限公司 Moving Object Segmentation method and device
CN108228915A (en) * 2018-03-29 2018-06-29 华南理工大学 A kind of video retrieval method based on deep learning
CN108229522A (en) * 2017-03-07 2018-06-29 北京市商汤科技开发有限公司 Training method, attribute detection method, device and the electronic equipment of neural network
CN108289248A (en) * 2018-01-18 2018-07-17 福州瑞芯微电子股份有限公司 A kind of deep learning video encoding/decoding method and device based on content forecast
CN108419091A (en) * 2018-03-02 2018-08-17 北京未来媒体科技股份有限公司 A kind of verifying video content method and device based on machine learning
US10152627B2 (en) 2017-03-20 2018-12-11 Microsoft Technology Licensing, Llc Feature flow for video recognition
CN109460812A (en) * 2017-09-06 2019-03-12 富士通株式会社 Average information analytical equipment, the optimization device, feature visualization device of neural network
CN109817338A (en) * 2019-02-13 2019-05-28 北京大学第三医院(北京大学第三临床医学院) A kind of chronic disease aggravates risk assessment and warning system
CN109858540A (en) * 2019-01-24 2019-06-07 青岛中科智康医疗科技有限公司 A kind of medical image recognition system and method based on multi-modal fusion
CN109923559A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 Quasi- Recognition with Recurrent Neural Network
CN109961041A (en) * 2019-03-21 2019-07-02 腾讯科技(深圳)有限公司 A kind of video frequency identifying method, device and storage medium
CN110046226A (en) * 2019-04-17 2019-07-23 桂林电子科技大学 A kind of Image Description Methods based on distribution term vector CNN-RNN network
CN110166826A (en) * 2018-11-21 2019-08-23 腾讯科技(深圳)有限公司 Scene recognition method, device, storage medium and the computer equipment of video
CN110555488A (en) * 2018-06-04 2019-12-10 北京京东尚科信息技术有限公司 Image sequence auditing method and system, electronic equipment and storage medium
CN110647905A (en) * 2019-08-02 2020-01-03 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
CN110929762A (en) * 2019-10-30 2020-03-27 中国科学院自动化研究所南京人工智能芯片创新研究院 Method and system for detecting body language and analyzing behavior based on deep learning
CN111222320A (en) * 2019-12-17 2020-06-02 共道网络科技有限公司 Character prediction model training method and device
CN111291602A (en) * 2018-12-07 2020-06-16 北京奇虎科技有限公司 Video detection method and device, electronic equipment and computer readable storage medium
CN111368071A (en) * 2018-12-07 2020-07-03 北京奇虎科技有限公司 Video detection method and device based on video related text and electronic equipment
CN112115984A (en) * 2020-08-28 2020-12-22 安徽农业大学 Tea garden abnormal data correction method and system based on deep learning and storage medium
CN113010735A (en) * 2019-12-20 2021-06-22 北京金山云网络技术有限公司 Video classification method and device, electronic equipment and storage medium
CN113095183A (en) * 2021-03-31 2021-07-09 西北工业大学 Micro-expression detection method based on deep neural network
CN115089206A (en) * 2022-05-09 2022-09-23 吴先洪 Method for predicting heart sound signals and heart auscultation device using same

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218608A (en) * 2013-04-19 2013-07-24 中国科学院自动化研究所 Network violent video identification method
CN103473555A (en) * 2013-08-26 2013-12-25 中国科学院自动化研究所 Horrible video scene recognition method based on multi-view and multi-instance learning
CN105005772A (en) * 2015-07-20 2015-10-28 北京大学 Video scene detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YQWANG2006: "SPP_mask reading report", Baidu Wenku *
ZHANGWEN XU et al.: "A Discriminative CNN Video Representation for Event Detection", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
ZUXUAN WU et al.: "Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification", Proceedings of the 23rd ACM International Conference on Multimedia *

Cited By (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411597A (en) * 2016-10-14 2017-02-15 广东工业大学 Network traffic abnormality detection method and system
CN106997371A (en) * 2016-10-28 2017-08-01 华数传媒网络有限公司 The construction method of single user wisdom collection of illustrative plates
CN106548208A (en) * 2016-10-28 2017-03-29 杭州慕锐科技有限公司 A kind of quick, intelligent stylizing method of photograph image
CN106997371B (en) * 2016-10-28 2020-06-23 华数传媒网络有限公司 Method for constructing single-user intelligent map
CN106548208B (en) * 2016-10-28 2019-05-28 杭州米绘科技有限公司 A kind of quick, intelligent stylizing method of photograph image
CN109923559B (en) * 2016-11-04 2024-03-12 硕动力公司 Quasi-cyclic neural network
CN109923559A (en) * 2016-11-04 2019-06-21 易享信息技术有限公司 Quasi- Recognition with Recurrent Neural Network
WO2018086513A1 (en) * 2016-11-08 2018-05-17 杭州海康威视数字技术股份有限公司 Target detection method and device
US10949673B2 (en) 2016-11-08 2021-03-16 Hangzhou Hikvision Digital Technology Co., Ltd. Target detection method and device
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN106599198B (en) * 2016-12-14 2021-04-06 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image description method of multi-cascade junction cyclic neural network
CN106599198A (en) * 2016-12-14 2017-04-26 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image description method for multi-stage connection recurrent neural network
CN106846346B (en) * 2017-01-23 2019-12-20 天津大学 Method for rapidly extracting pelvis outline of sequence CT image based on key frame mark
CN106780491B (en) * 2017-01-23 2020-03-17 天津大学 Initial contour generation method adopted in segmentation of CT pelvic image by GVF method
CN106846346A (en) * 2017-01-23 2017-06-13 天津大学 Sequence C T image pelvis profile rapid extracting methods based on key frame marker
CN106780491A (en) * 2017-01-23 2017-05-31 天津大学 The initial profile generation method used in GVF methods segmentation CT pelvis images
CN108229522A (en) * 2017-03-07 2018-06-29 北京市商汤科技开发有限公司 Training method, attribute detection method, device and the electronic equipment of neural network
CN108229522B (en) * 2017-03-07 2020-07-17 北京市商汤科技开发有限公司 Neural network training method, attribute detection device and electronic equipment
US10152627B2 (en) 2017-03-20 2018-12-11 Microsoft Technology Licensing, Llc Feature flow for video recognition
CN107016356A (en) * 2017-03-21 2017-08-04 乐蜜科技有限公司 Certain content recognition methods, device and electronic equipment
CN107038221A (en) * 2017-03-22 2017-08-11 杭州电子科技大学 A kind of video content description method guided based on semantic information
CN107038221B (en) * 2017-03-22 2020-11-17 杭州电子科技大学 Video content description method based on semantic information guidance
CN106951783B (en) * 2017-03-31 2021-06-01 国家电网公司 Disguised intrusion detection method and device based on deep neural network
CN106951783A (en) * 2017-03-31 2017-07-14 国家电网公司 A kind of Method for Masquerade Intrusion Detection and device based on deep neural network
CN107256221A (en) * 2017-04-26 2017-10-17 苏州大学 Video presentation method based on multi-feature fusion
CN107256221B (en) * 2017-04-26 2020-11-03 苏州大学 Video description method based on multi-feature fusion
CN107092254B (en) * 2017-04-27 2019-11-29 北京航空航天大学 A kind of design method of the Household floor-sweeping machine device people based on depth enhancing study
CN107092254A (en) * 2017-04-27 2017-08-25 北京航空航天大学 A kind of design method for the Household floor-sweeping machine device people for strengthening study based on depth
CN107092894A (en) * 2017-04-28 2017-08-25 孙恩泽 A kind of motor behavior recognition methods based on LSTM models
CN107392097B (en) * 2017-06-15 2020-07-07 中山大学 Three-dimensional human body joint point positioning method of monocular color video
CN107392097A (en) * 2017-06-15 2017-11-24 中山大学 A kind of 3 D human body intra-articular irrigation method of monocular color video
CN107341462A (en) * 2017-06-28 2017-11-10 电子科技大学 A kind of video classification methods based on notice mechanism
CN107274378B (en) * 2017-07-25 2020-04-03 江西理工大学 Image fuzzy type identification and parameter setting method based on fusion memory CNN
CN107274378A (en) * 2017-07-25 2017-10-20 江西理工大学 A kind of image blurring type identification and parameter tuning method for merging memory CNN
CN107480726A (en) * 2017-08-25 2017-12-15 电子科技大学 A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
CN109460812A (en) * 2017-09-06 2019-03-12 富士通株式会社 Average information analytical equipment, the optimization device, feature visualization device of neural network
CN107818084A (en) * 2017-10-11 2018-03-20 北京众荟信息技术股份有限公司 A kind of sentiment analysis method for merging comment figure
CN107818084B (en) * 2017-10-11 2021-03-09 北京众荟信息技术股份有限公司 Emotion analysis method fused with comment matching diagram
CN107895172A (en) * 2017-11-03 2018-04-10 北京奇虎科技有限公司 Utilize the method, apparatus and computing device of image information detection anomalous video file
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 A kind of combined type file classification method based on deep learning
CN108053410A (en) * 2017-12-11 2018-05-18 厦门美图之家科技有限公司 Moving Object Segmentation method and device
CN108009539B (en) * 2017-12-26 2021-11-02 中山大学 Novel text recognition method based on counting focusing model
CN108009539A (en) * 2017-12-26 2018-05-08 中山大学 A kind of new text recognition method based on counting focus model
CN108289248B (en) * 2018-01-18 2020-05-15 福州瑞芯微电子股份有限公司 Deep learning video decoding method and device based on content prediction
CN108289248A (en) * 2018-01-18 2018-07-17 福州瑞芯微电子股份有限公司 A kind of deep learning video encoding/decoding method and device based on content forecast
CN108419091A (en) * 2018-03-02 2018-08-17 北京未来媒体科技股份有限公司 A kind of verifying video content method and device based on machine learning
CN108228915A (en) * 2018-03-29 2018-06-29 华南理工大学 A kind of video retrieval method based on deep learning
CN110555488A (en) * 2018-06-04 2019-12-10 北京京东尚科信息技术有限公司 Image sequence auditing method and system, electronic equipment and storage medium
CN110166826B (en) * 2018-11-21 2021-10-08 腾讯科技(深圳)有限公司 Video scene recognition method and device, storage medium and computer equipment
CN110166826A (en) * 2018-11-21 2019-08-23 腾讯科技(深圳)有限公司 Scene recognition method, device, storage medium and the computer equipment of video
CN111291602A (en) * 2018-12-07 2020-06-16 北京奇虎科技有限公司 Video detection method and device, electronic equipment and computer readable storage medium
CN111368071A (en) * 2018-12-07 2020-07-03 北京奇虎科技有限公司 Video detection method and device based on video related text and electronic equipment
CN109858540A (en) * 2019-01-24 2019-06-07 青岛中科智康医疗科技有限公司 A kind of medical image recognition system and method based on multi-modal fusion
CN109817338A (en) * 2019-02-13 2019-05-28 北京大学第三医院(北京大学第三临床医学院) A kind of chronic disease aggravates risk assessment and warning system
CN109961041A (en) * 2019-03-21 2019-07-02 腾讯科技(深圳)有限公司 A kind of video frequency identifying method, device and storage medium
CN110046226B (en) * 2019-04-17 2021-09-24 桂林电子科技大学 Image description method based on distributed word vector CNN-RNN network
CN110046226A (en) * 2019-04-17 2019-07-23 桂林电子科技大学 A kind of Image Description Methods based on distribution term vector CNN-RNN network
CN110647905A (en) * 2019-08-02 2020-01-03 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
CN110647905B (en) * 2019-08-02 2022-05-13 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
CN110929762A (en) * 2019-10-30 2020-03-27 中国科学院自动化研究所南京人工智能芯片创新研究院 Method and system for detecting body language and analyzing behavior based on deep learning
CN110929762B (en) * 2019-10-30 2023-05-12 中科南京人工智能创新研究院 Limb language detection and behavior analysis method and system based on deep learning
CN111222320A (en) * 2019-12-17 2020-06-02 共道网络科技有限公司 Character prediction model training method and device
CN113010735A (en) * 2019-12-20 2021-06-22 北京金山云网络技术有限公司 Video classification method and device, electronic equipment and storage medium
CN113010735B (en) * 2019-12-20 2024-03-08 北京金山云网络技术有限公司 Video classification method and device, electronic equipment and storage medium
CN112115984A (en) * 2020-08-28 2020-12-22 安徽农业大学 Tea garden abnormal data correction method and system based on deep learning and storage medium
CN113095183A (en) * 2021-03-31 2021-07-09 西北工业大学 Micro-expression detection method based on deep neural network
CN115089206A (en) * 2022-05-09 2022-09-23 吴先洪 Method for predicting heart sound signals and heart auscultation device using same

Also Published As

Publication number Publication date
CN105844239B (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN105844239A (en) Method for detecting riot and terror videos based on CNN and LSTM
Wan et al. An intelligent video analysis method for abnormal event detection in intelligent transportation systems
Zhao et al. 3DVG-Transformer: Relation modeling for visual grounding on point clouds
Shi et al. Key-word-aware network for referring expression image segmentation
CN106022300B (en) Traffic sign recognition method and system based on cascade deep study
CN104376105B (en) The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media
CN106909625A (en) A kind of image search method and system based on Siamese networks
CN105760488A (en) Image expressing method and device based on multi-level feature fusion
CN109635647B (en) Multi-picture multi-face clustering method based on constraint condition
Wang et al. One-shot learning for long-tail visual relation detection
CN110765285A (en) Multimedia information content control method and system based on visual characteristics
Yuan et al. Few-shot scene classification with multi-attention deepemd network in remote sensing
Balaji et al. Multi-level feature fusion for group-level emotion recognition
CN104715266A (en) Image characteristics extracting method based on combination of SRC-DP and LDA
Wu et al. Component-based metric learning for fully automatic kinship verification
Chen et al. Part alignment network for vehicle re-identification
Pan et al. Hybrid dilated faster RCNN for object detection
Mohammad et al. Searching surveillance video contents using convolutional neural network
Yao [Retracted] Application of Higher Education Management in Colleges and Universities by Deep Learning
CN116630726B (en) Multi-mode-based bird classification method and system
Chen et al. Intelligent teaching evaluation system integrating facial expression and behavior recognition in teaching video
Hao et al. Facial expression recognition based on regional adaptive correlation
CN110363164A (en) Unified method based on LSTM time consistency video analysis
Wang Improved facial expression recognition method based on gan
Min-qing et al. An automatic classification method of sports teaching video using support vector machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant