CN108647641A - Video behavior segmentation method and device based on two-way model fusion - Google Patents
- Publication number: CN108647641A (application CN201810443505.3A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06V20/49 — Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
- G06F18/2411 — Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
- G06F18/25 — Pattern recognition: fusion techniques
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
This application discloses a video behavior segmentation method and device based on two-way model fusion. The method includes: segmenting a video into segments based on the correlation coefficient between adjacent video frames; for each video frame in a segment, recognizing the scene of the frame to obtain a scene feature vector; for each video frame in the segment, recognizing the local behavior features of the frame to obtain a local behavior feature vector; recognizing the behavior class of the frame and a corresponding confidence based on the scene feature vector and the local behavior feature vector; determining the behavior class of the segment from the behavior classes and confidences of its frames; and merging adjacent segments with identical behavior classes to obtain the segmentation result of the video. The method fuses two model branches, exploiting both the scene and the local-behavior dimensions to extract global behavior information, so that the video can be segmented quickly.
Description
Technical field
This application relates to the field of automatic image processing, and in particular to a video behavior segmentation method and device based on two-way model fusion.
Background
The rapid development of video compression algorithms and applications has brought massive amounts of video data. Video contains rich information; however, because video data is huge and, unlike text, does not directly express abstract concepts, extracting and structuring video information is relatively complex. A common approach to video information extraction and structuring is to first segment the video and then assign a class label to each segment. Segmenting video with traditional computer vision generally requires hand-engineered image features, which cannot flexibly adapt to changes across diverse scenes. Most video segmentation methods in practical use today rely only on per-frame color information: they detect changes between adjacent frames through various traditional computer-vision transforms to determine segmentation points, and then use clustering algorithms from machine learning to aggregate the adjacent segments, grouping similar ones into one class. However, these methods only achieve superficial segmentation and cannot recognize the semantics of each segment on screen.
Summary of the invention
The application aims to overcome, or at least partly solve or mitigate, the above problems.
According to one aspect of the application, a video segmentation method is provided, including:
A segment-splitting step: segmenting the video into segments based on the correlation coefficient between adjacent video frames;
A scene recognition step: for each video frame in a segment, recognizing the scene of the frame to obtain a scene feature vector;
A local behavior feature recognition step: for each video frame in the segment, recognizing the local behavior features of the frame to obtain a local behavior feature vector;
A video-frame behavior class judgment step: recognizing the behavior class of the frame and a corresponding confidence based on the scene feature vector and the local behavior feature vector;
A segment behavior class determination step: determining the behavior class of the segment based on the behavior classes and confidences of its video frames; and
A segment merging step: merging adjacent segments with identical behavior classes to obtain the segmentation result of the video.
This method fuses two model branches, exploiting both the scene and the local-behavior dimensions to extract global behavior information, so that the video can be segmented quickly.
Optionally, the segment-splitting step includes:
A histogram calculation step: calculating the YCbCr histogram of each video frame of the video;
A correlation coefficient calculation step: calculating the correlation coefficient between the YCbCr histogram of the video frame and that of the previous video frame; and
A threshold comparison step: when the correlation coefficient is less than a predetermined first threshold, taking the video frame as the start frame of a new segment.
Optionally, the scene recognition step includes:
A resolution conversion step: converting each of the RGB channels of the video frame to a fixed resolution; and
A scene feature vector generation step: inputting the converted video frame into a first network model to obtain the scene feature vector of the frame, where the first network model is a VGG16 network model with the last fully connected layer and the Softmax classifier removed.
Optionally, the local behavior feature recognition step includes:
A shortest-side fixing step: converting each of the RGB channels of the video frame to a resolution with a fixed shortest-side length; and
A local behavior feature vector generation step: inputting the resized video frame into the first network model, feeding the output of the first network model into a region-based convolutional neural network (Faster R-CNN) model, computing the optimal detection class result from the output of the region-based convolutional neural network, and passing the optimal detection class result through a region-of-interest pooling layer to obtain the local behavior feature vector.
Optionally, the video-frame behavior class judgment step includes:
A video frame feature vector merging step: merging the scene feature vector and the local behavior feature vector into a video frame feature vector; and
A behavior class and confidence calculation step: inputting the video frame feature vector into a third network to obtain the behavior class of the frame and a corresponding confidence, where the third network is formed by four fully connected layers followed by a Softmax classifier.
Optionally, the segment behavior class determination step includes: when the ratio of the number of video frames with an identical behavior class to the total number of video frames of the segment exceeds a predetermined second threshold, taking that behavior class as the behavior class of the segment.
According to another aspect of the application, a video segmentation device is also provided, including:
A segment-splitting module, configured to segment the video into segments based on the correlation coefficient between adjacent video frames;
A scene recognition module, configured to recognize, for each video frame in a segment, the scene of the frame to obtain a scene feature vector;
A local behavior feature recognition module, configured to recognize, for each video frame in the segment, the local behavior features of the frame to obtain a local behavior feature vector;
A video-frame behavior class judgment module, configured to recognize the behavior class of the frame and a corresponding confidence based on the scene feature vector and the local behavior feature vector;
A segment behavior class determination module, configured to determine the behavior class of the segment based on the behavior classes and confidences of its video frames; and
A segment merging module, configured to merge adjacent segments with identical behavior classes to obtain the segmentation result of the video.
The device fuses two model branches, exploiting both the scene and the local-behavior dimensions to extract global behavior information, so that the video can be segmented quickly.
According to another aspect of the application, a computer device is also provided, including a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor implements the method described above when executing the computer program.
According to another aspect of the application, a computer-readable storage medium, preferably a non-volatile readable storage medium, is also provided, storing a computer program that implements the method described above when executed by a processor.
According to another aspect of the application, a computer program product is also provided, including computer-readable code that, when executed by a computer device, causes the computer device to perform the method described above.
The above and other objects, advantages, and features of the application will become clearer to those skilled in the art from the following detailed description of specific embodiments with reference to the accompanying drawings.
Description of the drawings
Some specific embodiments of the application are described in detail below, by way of example rather than limitation, with reference to the accompanying drawings. Identical reference numerals denote identical or similar components or parts throughout the drawings, which, as those skilled in the art will appreciate, are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic flow chart of an embodiment of the video segmentation method of the application;
Fig. 2 is a schematic block diagram of the behavior prediction network of the application;
Fig. 3 is a schematic block diagram of the behavior prediction network of the application during training;
Fig. 4 is a schematic block diagram of an embodiment of the video segmentation device of the application;
Fig. 5 is a block diagram of an embodiment of the computing device of the application;
Fig. 6 is a block diagram of an embodiment of the computer-readable storage medium of the application.
Detailed description of embodiments
The above and other objects, advantages, and features of the application will become clearer to those skilled in the art from the following detailed description of specific embodiments with reference to the accompanying drawings.
An embodiment of the application provides a video segmentation method; Fig. 1 is a schematic flow chart of an embodiment of the method. The method may include:
S100, segment-splitting step: segmenting the video into segments based on the correlation coefficient between adjacent video frames;
S200, scene recognition step: for each video frame in a segment, recognizing the scene of the frame to obtain a scene feature vector;
S300, local behavior feature recognition step: for each video frame in the segment, recognizing the local behavior features of the frame to obtain a local behavior feature vector;
S400, video-frame behavior class judgment step: recognizing the behavior class of the frame and a corresponding confidence based on the scene feature vector and the local behavior feature vector;
S500, segment behavior class determination step: determining the behavior class of the segment based on the behavior classes and confidences of its video frames;
S600, segment merging step: merging adjacent segments with identical behavior classes to obtain the segmentation result of the video.
The method provided by the application fuses two model branches, exploiting both the scene and the local-behavior dimensions to extract global behavior information, so that the video can be segmented quickly. The invention uses deep learning to segment video along the dimension of human behavior classes. On the one hand, deep learning can extract more abstract generic features; on the other hand, the dynamic information and causal events in a video are chiefly defined by human behavior, so segmenting video by human behavior class is also the most reasonable choice.
Optionally, the S100 segment-splitting step may include:
S101, histogram calculation step: calculating the YCbCr histogram of each video frame of the video;
S102, correlation coefficient calculation step: calculating the correlation coefficient between the YCbCr histogram of the video frame and that of the previous video frame; and
S103, threshold comparison step: when the correlation coefficient is less than a predetermined first threshold, taking the video frame as the start frame of a new segment.
Candidate color spaces include RGB, CMY (the three primary colors), HSV (hue, saturation, value), HSI (hue, saturation, intensity), and YCbCr, where Y denotes the luminance component, Cb the blue-difference chroma component, and Cr the red-difference chroma component. Taking YCbCr as an example, in an optional embodiment the video is split into segments as follows.
Based on the YCbCr color space, the YCbCr data of the frame is normalized and a normalized YCbCr histogram is built, whose horizontal axis is the normalized level and whose vertical axis is the number of pixels at that level. In the normalization, Y, Cb, and Cr may optionally be divided into 16, 9, and 9 levels respectively (a 16-9-9 pattern), in which case the total number of normalized levels is 16 + 9 + 9 = 34. The choice of levels takes account of human visual acuity and machine processing speed: the normalization uses unequal intervals according to the range of each component and subjective color perception, i.e. a quantization process.
The correlation coefficient between the frame and its previous frame is computed as

$$r = \frac{\sum_{l=1}^{bins1}\left(H_l - \bar{H}\right)\left(H'_l - \bar{H}'\right)}{\sqrt{\sum_{l=1}^{bins1}\left(H_l - \bar{H}\right)^2 \, \sum_{l=1}^{bins1}\left(H'_l - \bar{H}'\right)^2}}$$

where $l$ is the normalized level, $bins1$ is the total number of normalized levels, $H_l$ and $H'_l$ are the pixel counts at level $l$ of the frame and of its previous frame, and $\bar{H}$ and $\bar{H}'$ are the mean pixel counts of the two histograms. Note that $bins1$ is the number of bins (boxes) of the histogram; in the YCbCr histogram it is the total number of normalized levels. For each pixel, the Y channel is divided into 16 levels and the Cb and Cr channels into 9 levels each, so $bins1$ = 16 + 9 + 9 = 34; preferably, $bins1$ is 34. Compared with chroma, the human eye is more sensitive to luminance, so the YCbCr color model is preferred because it allows luminance and chroma information to be processed separately.
This correlation coefficient (the first similarity) is compared with the first threshold; if it is below the threshold, the frame is very likely the start frame of a new segment (clip), so it is taken as the start frame of a new segment. The first threshold can be determined by experiment and practical application; optionally, it is 0.85.
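The segmentation procedure above (histogram, correlation, threshold) can be sketched as follows. This is a minimal sketch: the BT.601 RGB-to-YCbCr conversion and the equal-width binning are assumptions, since the patent only fixes the 16-9-9 level counts and the 0.85 threshold.

```python
import numpy as np

def ycbcr_histogram(rgb, y_bins=16, c_bins=9):
    """34-bin YCbCr histogram of an (H, W, 3) uint8 RGB frame (16-9-9 pattern)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    # BT.601 full-range conversion (assumption: the patent fixes no matrix)
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    hy,  _ = np.histogram(y,  bins=y_bins, range=(0, 256))
    hcb, _ = np.histogram(cb, bins=c_bins, range=(0, 256))
    hcr, _ = np.histogram(cr, bins=c_bins, range=(0, 256))
    return np.concatenate([hy, hcb, hcr]).astype(float)  # 16 + 9 + 9 = 34 bins

def correlation(h1, h2):
    """Pearson correlation of two histograms, matching the formula above."""
    d1, d2 = h1 - h1.mean(), h2 - h2.mean()
    return float((d1 * d2).sum() / np.sqrt((d1 * d1).sum() * (d2 * d2).sum()))

def segment_starts(frames, threshold=0.85):
    """Indices of the start frames of new segments (frame 0 always starts one)."""
    hists = [ycbcr_histogram(f) for f in frames]
    starts = [0]
    for j in range(1, len(hists)):
        if correlation(hists[j - 1], hists[j]) < threshold:
            starts.append(j)
    return starts
```

A frame whose histogram correlates below 0.85 with its predecessor opens a new clip; a production version would also guard against constant histograms, where the denominator of the correlation is zero.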
For each segment clip(i) cut roughly in step S103, where i is the index of the segment, one frame image is captured per second and fed into the behavior prediction network. The network outputs the behavior identifier (id), denoted clip(i)_frame(j)_id, together with the corresponding confidence clip(i)_frame(j)_confidence. The behavior prediction network is a network dedicated to behavior prediction, with a one-to-one correspondence between behaviors and ids. It may include the first network model, the second network model, and the third network model. The flow by which a single frame image passes through the behavior prediction network to a final behavior class is described below.
Optionally, the S200 scene recognition step may include:
S201, resolution conversion step: converting each of the RGB channels of the video frame to a fixed resolution; and
S202, scene feature vector generation step: inputting the converted video frame into the first network model to obtain the scene feature vector of the frame, where the first network model is a VGG16 network model with the last fully connected layer and the Softmax classifier removed.
Fig. 2 is a schematic block diagram of the behavior prediction network of the application. The RGB channels of the image are converted to a fixed resolution, for example 224x224, and the converted frame is input into the first network model, also called the scene recognition sub-network. The first network model is a modified VGG16 network trained for scene recognition on several predefined scenes; the modification removes the last fully connected layer and the Softmax classifier. The output of the scene recognition sub-network is a vector of dimension 1x1x25088, denoted the scene feature vector place_feature_vector.
It should be noted that the Visual Geometry Group (VGG) is a group at the Department of Engineering Science of the University of Oxford; the deep-learning base models the group trained on image databases are the VGG models, whose features (for example, the FC6-layer features) are known as VGG features. VGG16Net is a deep neural network structure. The VGG16Net architecture comprises five stacked convolutional blocks (ConvNets); each block consists of several convolutional layers (Conv), each Conv layer followed by a non-linear mapping layer (ReLU), and each block followed by a pooling layer. The network ends with three fully connected layers and one softmax layer, where each fully connected layer has 4096 channels and the softmax layer has 1000 channels (a different output count can be chosen for a specific task). The network introduces smaller convolution kernels (3x3), adds ReLU layers directly after both convolutional and fully connected layers, and applies a regularization method (Dropout) in the fully connected layers fc6 and fc7. This structure greatly shortens the training time, increases the flexibility of the network, and prevents overfitting. Considering the learning and representation ability of the model, the flexibility of its structure, and the training time, the invention chooses VGG16Net as its feature extractor. The reshape function used in the model readjusts the number of rows, columns, and dimensions of a matrix.
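As a concrete sketch of the truncated VGG16 feature extractor described above, the block below builds the five stacked convolutional blocks (3x3 kernels, ReLU, 2x2 pooling) and flattens the final 7x7x512 output into the 1x1x25088 scene feature vector. PyTorch is used only as an illustration (the patent prescribes no framework), and the untrained weights stand in for the scene-pretrained ones.

```python
import torch
import torch.nn as nn

# VGG16 convolutional configuration: numbers are output channel counts,
# 'M' marks a 2x2 max-pooling layer; together these form the five blocks.
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_scene_feature_extractor():
    layers, in_ch = [], 3
    for v in CFG:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

extractor = make_scene_feature_extractor().eval()
with torch.no_grad():
    frame = torch.zeros(1, 3, 224, 224)          # fixed 224x224 input resolution
    place_feature_vector = extractor(frame).flatten(1)
print(place_feature_vector.shape)                # torch.Size([1, 25088])
```

The 224x224 input is halved by each of the five pooling layers down to 7x7, and 512 x 7 x 7 = 25088, matching the dimension of place_feature_vector in the patent.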
Optionally, the S300 local behavior feature recognition step may include:
S301, shortest-side fixing step: converting each of the RGB channels of the video frame to a resolution with a fixed shortest-side length; and
S302, local behavior feature vector generation step: inputting the resized video frame into the first network model, feeding the output of the first network model into a region-based convolutional neural network (Faster R-CNN) model, computing the optimal detection class result from the output of the region-based convolutional neural network, and passing the optimal detection class result through a region-of-interest pooling layer to obtain the local behavior feature vector.
Referring to Fig. 2, the RGB channels of the video frame are converted to a resolution with a fixed shortest-side length, for example 600 pixels, and the frame is input into the second network model, also called the local behavior detection sub-network. The second network model is a local behavior detection network trained for several predefined local behaviors. It may include the first network model, Faster R-CNN, an optimal detection module, and a pooling layer. Its data flow is as follows: the output of the first network model is input into the Faster R-CNN model; the optimal detection module computes the optimal detection class result from the output of the region-based convolutional neural network; and the optimal detection class result is passed through a region-of-interest (ROI) pooling layer to obtain the local behavior feature vector. The second network model is based on Faster R-CNN but uses only the optimal detection class.
The optimal detection class is determined by the following quantitative formula. For each detection target and rectangular box output by Faster R-CNN, let softmax_max be the maximum probability value of the softmax output for the detection target and S the area of the rectangular box; the optimal detection score opt_detection is computed as

opt_detection = SCALE * softmax_max + WEIGHT * S

where SCALE is a coefficient that prevents softmax_max from being swamped by the value range of S, and WEIGHT is a weighting value for the area. Optionally, SCALE = 1000 and WEIGHT = 0.7, meaning the weight of the local behavior is slightly higher than that of the area.
The optimal detection class result is converted by the region-of-interest pooling layer from a 7x7x512-dimensional output into a 1x1x25088 vector, denoted the local behavior feature vector local_action_feature_vector. In Fig. 2, after the local behavior feature vector is obtained, results are produced through FC1, FC2, FC M, and Softmax M; the result of FC2 is also input into FC M*4, and the result obtained with the bounding-box regression function Bbox_Pred can be used to evaluate the recognition quality of the local behavior feature vector, where M is the number of local behavior classes.
Optionally, the S400 video-frame behavior class judgment step may include:
S401, video frame feature vector merging step: merging the scene feature vector and the local behavior feature vector into a video frame feature vector; and
S402, behavior class and confidence calculation step: inputting the video frame feature vector into a third network to obtain the behavior class of the frame and a corresponding confidence, where the third network is formed by four fully connected layers followed by a Softmax classifier.
In S401, the scene feature vector place_feature_vector and the local behavior feature vector local_action_feature_vector are merged into one video frame feature vector of size 1x1x(25088+25088), i.e. a 50176-dimensional vector, denoted feature_vector; see Fig. 2.
Optionally, the S500 segment behavior class determination step may include: when the ratio of the number of video frames with an identical behavior class to the total number of video frames of the segment exceeds a predetermined second threshold, taking that behavior class as the behavior class of the segment.
In S402, the video frame feature vector feature_vector passes through four fully connected layers, FC1 to FC4, where FC1 outputs 4096 channels, FC2 outputs 4096 channels, FC3 outputs 1000 channels, and FC4 outputs scores for C classes; see Fig. 2. C, the number of behavior classes, can be chosen according to actual needs; typically 15 to 30 is preferred. The output of FC4 feeds a Softmax classifier, which finally outputs a predicted confidence for each behavior class. The class with the highest confidence is taken as the behavior class output of the frame, denoted clip(i)_frame(j)_id with confidence clip(i)_frame(j)_confidence.
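The fusion and classification head (S401-S402) can be sketched in PyTorch as follows. The default dimensions are the patent's (25088 + 25088 input; 4096, 4096, 1000, C layers), but the constructor takes them as parameters so the sketch can be exercised cheaply; the weights here are untrained placeholders, not the patent's trained model.

```python
import torch
import torch.nn as nn

class BehaviorHead(nn.Module):
    """Third network: four FC layers followed by Softmax over fused features."""
    def __init__(self, in_dim=25088 + 25088, hidden=(4096, 4096, 1000),
                 num_classes=20):
        super().__init__()
        dims = [in_dim, *hidden]
        fcs = []
        for d_in, d_out in zip(dims, dims[1:]):     # FC1..FC3 with ReLU
            fcs += [nn.Linear(d_in, d_out), nn.ReLU()]
        fcs.append(nn.Linear(dims[-1], num_classes))  # FC4: C class scores
        self.mlp = nn.Sequential(*fcs)

    def forward(self, place_vec, local_vec):
        fused = torch.cat([place_vec, local_vec], dim=1)   # feature_vector
        probs = torch.softmax(self.mlp(fused), dim=1)
        confidence, class_id = probs.max(dim=1)  # highest-confidence class
        return class_id, confidence

# Tiny dimensions just to exercise the sketch:
head = BehaviorHead(in_dim=8, hidden=(16, 16, 8), num_classes=5).eval()
with torch.no_grad():
    cls, conf = head(torch.zeros(2, 4), torch.zeros(2, 4))
```

At full size the concatenated vector is 50176-dimensional and FC1 alone holds 50176 x 4096 weights, which is why the patent trains only these four FC layers while freezing the two feature branches.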
In the S500 segment behavior class determination step, every frame captured per second in segment clip(i) is processed through steps S200 to S400 to predict its behavior class. The percentage of frames in clip(i) sharing an identical id among all predicted frames is denoted same_id_percent. If there exists an id such that same_id_percent > same_id_percent_thres, where same_id_percent_thres is a set threshold, and more than 80% of the frames with that id have a confidence above 65%, the id is output as the behavior class of segment clip(i).
In the S600 segment merging step, every segment roughly obtained in step S100 undergoes the above processing to obtain its behavior class. If the behavior classes of adjacent segments are identical, the two segments are merged into one. The final result is the set of short videos into which the video is divided according to behavior class.
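Steps S500 and S600 reduce to a majority vote per segment followed by collapsing runs of equal labels. A plain-Python sketch (the additional 65%/80% confidence check is omitted for brevity, and the vote threshold value is an assumption, since the patent leaves same_id_percent_thres to the implementer):

```python
from collections import Counter
from itertools import groupby

def segment_label(frame_ids, same_id_percent_thres=0.5):
    """S500: majority vote over per-frame behavior ids of one segment.

    Returns None when no id clears the threshold.
    """
    top_id, count = Counter(frame_ids).most_common(1)[0]
    return top_id if count / len(frame_ids) > same_id_percent_thres else None

def merge_adjacent(segment_labels):
    """S600: merge adjacent segments sharing an identical behavior class."""
    return [label for label, _group in groupby(segment_labels)]

labels = [segment_label(ids) for ids in ([3, 3, 3, 7], [3, 3, 5], [5, 5, 5])]
merged = merge_adjacent(labels)   # [3, 3, 5] -> [3, 5]
```

The first two segments both vote for id 3, so after merging the video splits into a "3" clip followed by a "5" clip.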
It should be understood that the S300 local behavior feature recognition step and the S400 video-frame behavior class judgment step need not follow a fixed order; they may be performed simultaneously or one after the other.
Fig. 3 is a schematic block diagram of the behavior prediction network of the application during training. Optionally, the method may also include a training step for the behavior prediction network.
For the first network model, i.e. the scene prediction network, VGG16 is used to classify N predefined scenes. The number of output scene classes N is chosen according to actual demand, typically 30 to 40; for example, scene classes may be restaurant, basketball court, concert hall, and so on. The training strategy is as follows. Weights w are initialized with the formula

w = np.random.randn(n) * sqrt(2.0/n)

where np.random.randn(n) is a function generating random numbers; that is, the n weights of the filter of each channel of each convolutional layer are initialized from a Gaussian distribution, which can be generated with numpy. The square-root factor sqrt(2.0/n) ensures that the variance of the input distribution of each neuron is consistent across layers. Dropout is used for regularization to prevent overfitting: during training of the deep network, neural units are temporarily dropped from the network with a certain probability, and the activation probability of each neuron is a hyperparameter p. The pooled result passes through two fully connected layers, FC 4096 and FC N, and Softmax N, and is then fed to the cost function, which is the cross-entropy loss (Softmax). Weights are updated with SGD + Momentum (stochastic gradient descent with momentum), and the learning rate is reduced over training time according to step decay.
For the second network model, i.e. the local behavior prediction network, the network uses Faster R-CNN and is trained with the standard Faster R-CNN training method. The number of output local behavior classes M is chosen according to actual demand, typically 15 to 30; for example, local behaviors may be eating, playing basketball, dating, and so on. After the local behavior feature vector is obtained, the prediction result is produced through two FC 4096 layers, FC M, and Softmax M; the result of the second FC 4096 is also input into FC M*4, and the result obtained with the bounding-box regression function Bbox_Pred can be used to evaluate the recognition quality of the local behavior feature vector, where M is the number of local behavior classes. The results of Softmax M and FC M*4 are fed into the cross-entropy loss defined by Faster R-CNN.
After the first network model and the second network model have been trained, the third network is trained. The scene network removes the Softmax classifier and the last several fully-connected layers; the parameters of the remaining layers stay unchanged, and the output of the last pooling layer is converted into a 1x1x25088-dimensional vector, denoted as the scene feature vector. For the local behavior recognition network: when the third network model is trained, each image yields multiple local behaviors and their bounding rectangles predicted by the local behavior recognition network; the optimal detection category is selected, and the 7x7x512-dimensional output of the corresponding region-of-interest pooling layer is obtained and further converted into a 1x1x25088-dimensional local behavior feature vector. The scene feature vector and the local behavior feature vector are combined into 1x1x(25088+25088) = 50176 dimensions, denoted as the video frame feature vector. The video frame feature vector passes through 4 fully-connected layers, FC1 to FC4. The output of FC4 is fed sequentially into Softmax C and the cross-entropy loss. For the third network model, all other parameters stay unchanged; only the parameters of the 4 FC layers are trained. The parameter training strategy follows that of the first network model.
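The fusion step above, two 1x1x25088 vectors concatenated into a 50176-dimensional video frame feature vector and passed through 4 FC layers and a Softmax over C classes, can be sketched as follows. Only the 25088 and 50176 dimensions come from the text; the FC widths, C, and the random weights are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scene branch: the truncated VGG16's last pooling layer is 7x7x512,
# flattened to 1x1x25088. Local branch: the RoI-pooling output, same shape.
scene_feat = rng.normal(size=(7, 7, 512)).reshape(-1)    # 25088 dims
local_feat = rng.normal(size=(7, 7, 512)).reshape(-1)    # 25088 dims
frame_feat = np.concatenate([scene_feat, local_feat])    # 50176 dims

def third_network(x, widths, C, rng):
    # FC1..FC4 with ReLU, then a Softmax over the C global behavior classes.
    for w in widths:
        W = rng.normal(scale=np.sqrt(2.0 / x.size), size=(x.size, w))
        x = np.maximum(x @ W, 0)
    logits = x @ rng.normal(scale=0.1, size=(x.size, C))
    z = logits - logits.max()
    return np.exp(z) / np.exp(z).sum()

probs = third_network(frame_feat, widths=[64, 64, 32, 32], C=10, rng=rng)
pred_class, confidence = int(probs.argmax()), float(probs.max())
```

The argmax of the Softmax output gives the frame's behavior category, and the corresponding probability serves as the confidence used later for segment-level voting.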
The C behavior categories predicted by the third network model, the M local behavior categories predicted by the second network model, and the N scene categories predicted by the first network model can be chosen as follows. First, the C global behavior categories are defined according to business demand, for example eating, playing basketball, dating. Then, according to these C global behaviors, the possible local behavior categories are defined; these can generally be kept consistent with the global behaviors, for example eating, playing basketball, dating. Finally, according to the global behavior categories, the N possible scenes are defined; for example, for eating, scenes such as restaurant and coffee shop can be defined.
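The three-level taxonomy described above (C global behaviors, M local behaviors, N scenes) might be encoded as a simple mapping. The entries below combine the examples from the text with invented scene names, so this is an illustrative assumption rather than a list from the patent.

```python
# C global behavior classes, the local behavior classes under them,
# and the scene classes derived from them (entries are illustrative).
TAXONOMY = {
    "eating": {"local": ["eating"], "scenes": ["restaurant", "coffee shop"]},
    "playing basketball": {"local": ["playing basketball"],
                           "scenes": ["basketball court", "gym"]},
    "dating": {"local": ["dating"], "scenes": ["park", "cinema"]},
}
C = len(TAXONOMY)                                             # global classes
M = sum(len(v["local"]) for v in TAXONOMY.values())           # local classes
N = len({s for v in TAXONOMY.values() for s in v["scenes"]})  # scene classes
```

Defining the local and scene classes from the global behaviors, as here, keeps the three classifiers' label spaces mutually consistent.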
According to another embodiment of the present application, a video segmentation device is additionally provided. Fig. 4 is a schematic block diagram of an embodiment of the video segmentation device of the present application. The device may include:
a fragment segmentation module 100, configured to segment the video into segments based on the correlation coefficient between adjacent video frames in the video;
a scene recognition module 200, configured to identify, for a video frame in a segment, the scene of the video frame to obtain a scene feature vector;
a local behavior feature identification module 300, configured to identify, for a video frame in a segment, the local behavior feature of the video frame to obtain a local behavior feature vector;
a video frame behavior category judgment module 400, configured to identify, based on the scene feature vector and the local behavior feature vector, the behavior category of the video frame and the confidence corresponding to the behavior category;
a segment behavior category determination module 500, configured to determine the behavior category of the segment based on the behavior categories and confidences of the video frames of the segment; and
a segment merging module 600, configured to merge adjacent segments with the same behavior category to obtain the segmentation result of the video.
The device provided by the present application can fuse the two-path models and comprehensively utilize the two dimensions of scene and local behavior to extract global behavior information, thereby segmenting the video rapidly.
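The segment behavior category determination performed by module 500 (detailed in claim 6 as a frame-majority test against a second threshold) can be sketched as a vote over frame-level predictions; the 0.5 threshold here is an illustrative choice, not a value from the patent.

```python
from collections import Counter

def segment_behavior_class(frame_classes, second_threshold=0.5):
    # The most frequent frame-level class becomes the segment's class when
    # its share of the segment's frames exceeds the second threshold
    # (claim 6); otherwise no class is assigned by this sketch.
    cls, count = Counter(frame_classes).most_common(1)[0]
    if count / len(frame_classes) > second_threshold:
        return cls
    return None
```

Adjacent segments that receive the same class this way are then candidates for the merging performed by module 600.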
Optionally, the fragment segmentation module 100 may include:
a histogram calculation module 101, configured to calculate the YCbCr histogram of each video frame of the video;
a correlation coefficient calculation module 102, configured to calculate the correlation coefficient between the YCbCr histogram of a video frame and the YCbCr histogram of the previous video frame; and
a threshold comparison module 103, configured to take a video frame as the start frame of a new segment when the correlation coefficient is less than a predetermined first threshold.
Optionally, the scene recognition module 200 may include:
a resolution conversion module 201, configured to convert the RGB channels of the video frame into a fixed-size resolution; and
a scene feature vector generation module 202, configured to input the resolution-converted video frame into the first network model to obtain the scene feature vector of the video frame, wherein the first network model is: a VGG16 network model with the last fully-connected layer and the Softmax classifier removed.
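The 1x1x25088 scene feature dimension used throughout follows from VGG16's geometry. A small sketch of the arithmetic, assuming the standard 224x224 VGG16 input size (the patent only says "fixed-size"):

```python
# VGG16 geometry behind the 1x1x25088 scene feature vector: five 2x2
# max-pool stages reduce a 224x224 input (the standard VGG16 size,
# assumed here) to 7x7, and the last conv stage has 512 channels.
def pool5_shape(input_size=224, pool_stages=5, channels=512):
    side = input_size // (2 ** pool_stages)
    return side, side, channels

side, _, channels = pool5_shape()
scene_dim = side * side * channels   # 7 * 7 * 512 = 25088
```

Flattening that final pooling output is exactly the 1x1x25088 conversion the description performs on the truncated network.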
Optionally, the local behavior feature identification module 300 may include:
a shortest-side-length fixing module 301, configured to convert the RGB channels of the video frame into a resolution with a fixed shortest side length; and
a local behavior feature vector generation module 302, configured to input the video frame with fixed shortest side length into the first network model, input the output of the first network model into a region-based convolutional neural network (Faster R-CNN) model, calculate the optimal detection category result using the output of the region-based convolutional neural network, and pass the optimal detection category result through a region-of-interest pooling layer to obtain the local behavior feature vector.
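The region-of-interest pooling step above, in which an arbitrary detection box is pooled to a fixed 7x7x512 grid and flattened to 25088 dimensions, can be sketched as a max-pooling variant. The feature-map and box sizes below are arbitrary examples.

```python
import numpy as np

def roi_max_pool(feature_map, box, out=7):
    # Max-pool the region `box` = (x0, y0, x1, y1) of an HxWxC feature map
    # into a fixed out x out grid, as the RoI pooling layer does before the
    # result is flattened into the local behavior feature vector.
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    h, w, c = region.shape
    ys = np.linspace(0, h, out + 1).astype(int)
    xs = np.linspace(0, w, out + 1).astype(int)
    pooled = np.empty((out, out, c))
    for i in range(out):
        for j in range(out):
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled

rng = np.random.default_rng(0)
fmap = rng.normal(size=(38, 50, 512))            # conv feature map (toy size)
pooled = roi_max_pool(fmap, box=(5, 3, 30, 20))  # one detected box
local_feat = pooled.reshape(-1)                  # 7 * 7 * 512 = 25088 dims
```

Whatever the box size, the output grid is fixed, which is what lets the downstream FC layers receive a constant 25088-dimensional local behavior feature vector.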
Optionally, the video frame behavior category judgment module 400 may include:
a video frame feature vector merging module 401, configured to merge the scene feature vector and the local behavior feature vector into a video frame feature vector; and
a behavior category and confidence calculation module 402, configured to input the video frame feature vector into the third network to obtain the behavior category of the video frame and the confidence corresponding to the behavior category, wherein the third network is formed by 4 fully-connected layers connected in sequence with a Softmax classifier.
Fig. 5 is a block diagram of an embodiment of the computing device of the present application. Another embodiment of the present application also provides a computing device comprising a memory 1120 and a processor 1110; the memory 1120 provides a space 1130 for program code and stores a computer program that can be run by the processor 1110; when executed by the processor 1110, the computer program carries out any one of the method steps 1131 according to the present invention.
Another embodiment of the present application further provides a computer-readable storage medium. Fig. 6 is a block diagram of an embodiment of the computer-readable storage medium of the present application; the computer-readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 1131' for executing the method steps according to the invention, which program is executed by a processor.
The embodiments of the present application also provide a computer program product comprising instructions; when the computer program product is run on a computer, the computer is caused to execute the method steps according to the embodiments.
The above-described embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized entirely or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When a computer loads and executes the computer program instructions, the flows or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one web site, computer, server, or data center to another web site, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (such as a Solid State Disk (SSD)), etc.
Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation shall not be considered to exceed the scope of the present application.
One of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described above can be completed by a program instructing a processor, and the program may be stored in a computer-readable storage medium, the storage medium being a non-transitory medium such as a random access memory, read-only memory, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disc, or any combination thereof.
The above are only preferred embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.
Claims (10)
1. A video segmentation method, comprising:
a fragment segmentation step: segmenting the video into segments based on the correlation coefficient between adjacent video frames in the video;
a scene recognition step: for a video frame in a segment, identifying the scene of the video frame to obtain a scene feature vector;
a local behavior feature identification step: for a video frame in a segment, identifying the local behavior feature of the video frame to obtain a local behavior feature vector;
a video frame behavior category judgment step: identifying, based on the scene feature vector and the local behavior feature vector, the behavior category of the video frame and the confidence corresponding to the behavior category;
a segment behavior category determination step: determining the behavior category of the segment based on the behavior categories and confidences of the video frames of the segment; and
a segment merging step: merging adjacent segments with the same behavior category to obtain the segmentation result of the video.
2. The method according to claim 1, characterized in that the fragment segmentation step comprises:
a histogram calculation step: calculating the YCbCr histogram of each video frame of the video;
a correlation coefficient calculation step: calculating the correlation coefficient between the YCbCr histogram of a video frame and the YCbCr histogram of the previous video frame; and
a threshold comparison step: when the correlation coefficient is less than a predetermined first threshold, taking the video frame as the start frame of a new segment.
3. The method according to claim 1 or 2, characterized in that the scene recognition step comprises:
a resolution conversion step: converting the RGB channels of the video frame into a fixed-size resolution; and
a scene feature vector generation step: inputting the resolution-converted video frame into a first network model to obtain the scene feature vector of the video frame, wherein the first network model is: a VGG16 network model with the last fully-connected layer and the Softmax classifier removed.
4. The method according to claim 1 or 2, characterized in that the local behavior feature identification step comprises:
a shortest-side-length fixing step: converting the RGB channels of the video frame into a resolution with a fixed shortest side length; and
a local behavior feature vector generation step: inputting the video frame with fixed shortest side length into the first network model, inputting the output of the first network model into a region-based convolutional neural network (Faster R-CNN) model, calculating the optimal detection category result using the output of the region-based convolutional neural network, and passing the optimal detection category result through a region-of-interest pooling layer to obtain the local behavior feature vector.
5. The method according to claim 4, characterized in that the video frame behavior category judgment step comprises:
a video frame feature vector merging step: merging the scene feature vector and the local behavior feature vector into a video frame feature vector; and
a behavior category and confidence calculation step: inputting the video frame feature vector into a third network to obtain the behavior category of the video frame and the confidence corresponding to the behavior category, wherein the third network is formed by 4 fully-connected layers connected in sequence with a Softmax classifier.
6. The method according to claim 1, characterized in that the segment behavior category determination step comprises: when the ratio of the number of video frames with the same behavior category to the total number of video frames of the segment exceeds a predetermined second threshold, taking that behavior category as the behavior category of the segment.
7. A video segmentation device, comprising:
a fragment segmentation module, configured to segment the video into segments based on the correlation coefficient between adjacent video frames in the video;
a scene recognition module, configured to identify, for a video frame in a segment, the scene of the video frame to obtain a scene feature vector;
a local behavior feature identification module, configured to identify, for a video frame in a segment, the local behavior feature of the video frame to obtain a local behavior feature vector;
a video frame behavior category judgment module, configured to identify, based on the scene feature vector and the local behavior feature vector, the behavior category of the video frame and the confidence corresponding to the behavior category;
a segment behavior category determination module, configured to determine the behavior category of the segment based on the behavior categories and confidences of the video frames of the segment; and
a segment merging module, configured to merge adjacent segments with the same behavior category to obtain the segmentation result of the video.
8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable by the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, preferably a non-volatile readable storage medium, storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.
10. A computer program product, comprising computer-readable code which, when executed by a computer device, causes the computer device to perform the method according to any one of claims 1 to 6.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110314627.4A CN112966646B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, equipment and medium based on two-way model fusion |
CN202110314575.0A CN112906649B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, computer device and medium |
CN201810443505.3A CN108647641B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method and device based on two-way model fusion |
CN202110313073.6A CN112836687B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method, device, computer equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810443505.3A CN108647641B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method and device based on two-way model fusion |
Related Child Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110313073.6A Division CN112836687B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method, device, computer equipment and medium |
CN202110314627.4A Division CN112966646B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, equipment and medium based on two-way model fusion |
CN202110314575.0A Division CN112906649B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, computer device and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647641A true CN108647641A (en) | 2018-10-12 |
CN108647641B CN108647641B (en) | 2021-04-27 |
Family
ID=63754392
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110314627.4A Active CN112966646B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, equipment and medium based on two-way model fusion |
CN202110313073.6A Active CN112836687B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method, device, computer equipment and medium |
CN201810443505.3A Active CN108647641B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method and device based on two-way model fusion |
CN202110314575.0A Active CN112906649B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, computer device and medium |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110314627.4A Active CN112966646B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, equipment and medium based on two-way model fusion |
CN202110313073.6A Active CN112836687B (en) | 2018-05-10 | 2018-05-10 | Video behavior segmentation method, device, computer equipment and medium |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110314575.0A Active CN112906649B (en) | 2018-05-10 | 2018-05-10 | Video segmentation method, device, computer device and medium |
Country Status (1)
Country | Link |
---|---|
CN (4) | CN112966646B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543590A (en) * | 2018-11-16 | 2019-03-29 | 中山大学 | A kind of video human Activity recognition algorithm of Behavior-based control degree of association fusion feature |
CN110516540A (en) * | 2019-07-17 | 2019-11-29 | 青岛科技大学 | Group Activity recognition method based on multithread framework and long memory network in short-term |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
WO2020119187A1 (en) * | 2018-12-14 | 2020-06-18 | 北京沃东天骏信息技术有限公司 | Method and device for segmenting video |
CN111541912A (en) * | 2020-04-30 | 2020-08-14 | 北京奇艺世纪科技有限公司 | Video splitting method and device, electronic equipment and storage medium |
CN111881818A (en) * | 2020-07-27 | 2020-11-03 | 复旦大学 | Medical action fine-grained recognition device and computer-readable storage medium |
CN113301430A (en) * | 2021-07-27 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Video clipping method, video clipping device, electronic equipment and storage medium |
CN113784227A (en) * | 2020-06-10 | 2021-12-10 | 北京金山云网络技术有限公司 | Video slicing method and device, electronic equipment and storage medium |
CN113784226A (en) * | 2020-06-10 | 2021-12-10 | 北京金山云网络技术有限公司 | Video slicing method and device, electronic equipment and storage medium |
EP4024879A4 (en) * | 2019-09-06 | 2022-11-09 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Video processing method and device, terminal and computer readable storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113569703B (en) * | 2021-07-23 | 2024-04-16 | 上海明略人工智能(集团)有限公司 | Real division point judging method, system, storage medium and electronic equipment |
CN117610105B (en) * | 2023-12-07 | 2024-06-07 | 上海烜翊科技有限公司 | Model view structure design method for automatically generating system design result |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833492A (en) * | 2012-08-01 | 2012-12-19 | 天津大学 | Color similarity-based video scene segmenting method |
US20160155024A1 (en) * | 2014-12-02 | 2016-06-02 | Canon Kabushiki Kaisha | Video segmentation method |
CN106529467A (en) * | 2016-11-07 | 2017-03-22 | 南京邮电大学 | Group behavior identification method based on multi-feature fusion |
CN107590420A (en) * | 2016-07-07 | 2018-01-16 | 北京新岸线网络技术有限公司 | Scene extraction method of key frame and device in video analysis |
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7296231B2 (en) * | 2001-08-09 | 2007-11-13 | Eastman Kodak Company | Video structuring by probabilistic merging of video segments |
CN102426705B (en) * | 2011-09-30 | 2013-10-30 | 北京航空航天大学 | Behavior splicing method of video scene |
US20140328570A1 (en) * | 2013-01-09 | 2014-11-06 | Sri International | Identifying, describing, and sharing salient events in images and videos |
US9244924B2 (en) * | 2012-04-23 | 2016-01-26 | Sri International | Classification, search, and retrieval of complex video events |
CN103366181A (en) * | 2013-06-28 | 2013-10-23 | 安科智慧城市技术(中国)有限公司 | Method and device for identifying scene integrated by multi-feature vision codebook |
EP3007082A1 (en) * | 2014-10-07 | 2016-04-13 | Thomson Licensing | Method for computing a similarity measure for video segments |
CN104331442A (en) * | 2014-10-24 | 2015-02-04 | 华为技术有限公司 | Video classification method and device |
CN105989358A (en) * | 2016-01-21 | 2016-10-05 | 中山大学 | Natural scene video identification method |
CN105893936B (en) * | 2016-03-28 | 2019-02-12 | 浙江工业大学 | A kind of Activity recognition method based on HOIRM and Local Feature Fusion |
CN107027051B (en) * | 2016-07-26 | 2019-11-08 | 中国科学院自动化研究所 | A kind of video key frame extracting method based on linear dynamic system |
CN107992836A (en) * | 2017-12-12 | 2018-05-04 | 中国矿业大学(北京) | A kind of recognition methods of miner's unsafe acts and system |
2018
- 2018-05-10 CN CN202110314627.4A patent/CN112966646B/en active Active
- 2018-05-10 CN CN202110313073.6A patent/CN112836687B/en active Active
- 2018-05-10 CN CN201810443505.3A patent/CN108647641B/en active Active
- 2018-05-10 CN CN202110314575.0A patent/CN112906649B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833492A (en) * | 2012-08-01 | 2012-12-19 | 天津大学 | Color similarity-based video scene segmenting method |
US20160155024A1 (en) * | 2014-12-02 | 2016-06-02 | Canon Kabushiki Kaisha | Video segmentation method |
CN107590420A (en) * | 2016-07-07 | 2018-01-16 | 北京新岸线网络技术有限公司 | Scene extraction method of key frame and device in video analysis |
CN106529467A (en) * | 2016-11-07 | 2017-03-22 | 南京邮电大学 | Group behavior identification method based on multi-feature fusion |
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
RUI YANG ET AL.: "Video Segmentation via Multiple Granularity Analysis", IEEE *
SHEN HAIYANG: "Research on Content-Based Surveillance Video Retrieval Algorithms", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543590A (en) * | 2018-11-16 | 2019-03-29 | 中山大学 | A kind of video human Activity recognition algorithm of Behavior-based control degree of association fusion feature |
CN111327945B (en) * | 2018-12-14 | 2021-03-30 | 北京沃东天骏信息技术有限公司 | Method and apparatus for segmenting video |
EP3896986A4 (en) * | 2018-12-14 | 2022-08-24 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Method and device for segmenting video |
WO2020119187A1 (en) * | 2018-12-14 | 2020-06-18 | 北京沃东天骏信息技术有限公司 | Method and device for segmenting video |
CN111327945A (en) * | 2018-12-14 | 2020-06-23 | 北京沃东天骏信息技术有限公司 | Method and apparatus for segmenting video |
US11275950B2 (en) | 2018-12-14 | 2022-03-15 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Method and apparatus for segmenting video |
CN110516540B (en) * | 2019-07-17 | 2022-04-29 | 青岛科技大学 | Group behavior identification method based on multi-stream architecture and long-term and short-term memory network |
CN110516540A (en) * | 2019-07-17 | 2019-11-29 | 青岛科技大学 | Group Activity recognition method based on multithread framework and long memory network in short-term |
EP4024879A4 (en) * | 2019-09-06 | 2022-11-09 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Video processing method and device, terminal and computer readable storage medium |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
CN111541912A (en) * | 2020-04-30 | 2020-08-14 | 北京奇艺世纪科技有限公司 | Video splitting method and device, electronic equipment and storage medium |
CN113784227A (en) * | 2020-06-10 | 2021-12-10 | 北京金山云网络技术有限公司 | Video slicing method and device, electronic equipment and storage medium |
CN113784226A (en) * | 2020-06-10 | 2021-12-10 | 北京金山云网络技术有限公司 | Video slicing method and device, electronic equipment and storage medium |
CN111881818A (en) * | 2020-07-27 | 2020-11-03 | 复旦大学 | Medical action fine-grained recognition device and computer-readable storage medium |
CN111881818B (en) * | 2020-07-27 | 2022-07-22 | 复旦大学 | Medical action fine-grained recognition device and computer-readable storage medium |
CN113301430A (en) * | 2021-07-27 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Video clipping method, video clipping device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112836687A (en) | 2021-05-25 |
CN112906649A (en) | 2021-06-04 |
CN112966646B (en) | 2024-01-09 |
CN112966646A (en) | 2021-06-15 |
CN108647641B (en) | 2021-04-27 |
CN112836687B (en) | 2024-05-10 |
CN112906649B (en) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647641A (en) | Video behavior dividing method and device based on two-way Model Fusion | |
Rao | Dynamic histogram equalization for contrast enhancement for digital images | |
CN104834933B (en) | A kind of detection method and device in saliency region | |
Zhao et al. | SCGAN: Saliency map-guided colorization with generative adversarial network | |
KR102449841B1 (en) | Method and apparatus for detecting target | |
Agrawal et al. | Grape leaf disease detection and classification using multi-class support vector machine | |
Bianco et al. | Predicting image aesthetics with deep learning | |
CN109657715B (en) | Semantic segmentation method, device, equipment and medium | |
US20130028470A1 (en) | Image processing apparatus, image processing method, and comupter readable recording device | |
CN112528058B (en) | Fine-grained image classification method based on image attribute active learning | |
Trivedi et al. | Automatic segmentation of plant leaves disease using min-max hue histogram and k-mean clustering | |
Chetouani et al. | On the use of a scanpath predictor and convolutional neural network for blind image quality assessment | |
Waldamichael et al. | Coffee disease detection using a robust HSV color‐based segmentation and transfer learning for use on smartphones | |
Mano et al. | Method of multi‐region tumour segmentation in brain MRI images using grid‐based segmentation and weighted bee swarm optimisation | |
CN111368911A (en) | Image classification method and device and computer readable storage medium | |
Li et al. | A novel feature fusion method for computing image aesthetic quality | |
Wang et al. | Distortion recognition for image quality assessment with convolutional neural network | |
KR101833943B1 (en) | Method and system for extracting and searching highlight image | |
US8131077B2 (en) | Systems and methods for segmenting an image based on perceptual information | |
Chang et al. | Semantic-relation transformer for visible and infrared fused image quality assessment | |
WO2024083152A1 (en) | Pathological image recognition method, pathological image recognition model training method and system therefor, and storage medium | |
CN108510483A (en) | A kind of calculating using VLAD codings and SVM generates color image tamper detection method | |
US20220280086A1 (en) | Management server, method of generating relative pattern information between pieces of imitation drawing data, and computer program | |
Hepburn et al. | Enforcing perceptual consistency on generative adversarial networks by using the normalised laplacian pyramid distance | |
CN110796650A (en) | Image quality evaluation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right |
Denomination of invention: A Method and Device for Video Behavior Segmentation Based on Dual Path Model Fusion Effective date of registration: 20230713 Granted publication date: 20210427 Pledgee: Bank of Jiangsu Limited by Share Ltd. Beijing branch Pledgor: BEIJING MOVIEBOOK SCIENCE AND TECHNOLOGY Co.,Ltd. Registration number: Y2023110000278 |