CN110347873A - Video classification method, apparatus, electronic device and storage medium - Google Patents
- Publication number: CN110347873A (application CN201910562350.XA)
- Authority
- CN
- China
- Prior art keywords
- feature
- network
- video
- trained
- key frame
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
Present disclose provides a kind of video classification methods, device, electronic equipment and computer readable storage mediums, are related to technical field of image processing, and the video classification methods include: to carry out sparse sampling to video to be processed to obtain multiple key frames;The multiple key frame is handled by the feature extraction network in preset model, to extract the feature of the multiple key frame;The feature of the multiple key frame is merged by attention network trained in the preset model, and fused feature is handled to obtain the classification results of the video to be processed.The disclosure can reduce calculation amount, improve visual classification speed and efficiency.
Description
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a video classification method, a video classification apparatus, an electronic device, and a computer-readable storage medium.
Background technique
With the development of video technology, users can obtain a wide variety of videos through multiple channels. Because the number of videos is extremely large, classifying them makes it easier for users to find and use the videos they need, improving the user experience.
In the related art, video classification methods include methods based on long short-term memory (LSTM) networks, methods based on 3D convolution, and methods based on two-stream networks.
In these approaches, the network structures are large and the number of parameters to compute is large, so processing is slow. In addition, when handling inter-frame information, these approaches perform global operations on individual frames, which wastes computing resources; and because inter-frame information cannot be exploited, the classification results may be inaccurate.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
An object of the present disclosure is to provide a video classification method, apparatus, electronic device, and computer-readable storage medium, thereby overcoming, at least to some extent, the problem of slow video classification caused by the limitations and defects of the related art.
Other features and advantages of the disclosure will become apparent from the following detailed description, or may be learned in part through practice of the disclosure.
According to one aspect of the present disclosure, a video classification method is provided, including: performing sparse sampling on a video to be processed to obtain a plurality of key frames; processing the plurality of key frames through a feature extraction network in a preset model to extract features of the plurality of key frames; and fusing the features of the plurality of key frames through a trained attention network in the preset model, and processing the fused features to obtain a classification result for the video to be processed.
In an exemplary embodiment of the present disclosure, the feature extraction network includes a residual network, and processing the plurality of key frames through the feature extraction network in the preset model to extract the features of the plurality of key frames includes: taking the plurality of key frames as one batch and inputting the batch into the residual network to extract the features of the plurality of key frames.
In an exemplary embodiment of the present disclosure, fusing the features of the plurality of key frames through the trained attention network in the preset model, and processing the fused features to obtain the classification result for the video to be processed, includes: inputting the features of the plurality of key frames into the trained attention network to obtain fused features; and determining, according to the fused features, the probability that the video to be processed belongs to each category, so as to determine the classification result according to the probabilities.
In an exemplary embodiment of the present disclosure, before inputting the features of the plurality of key frames into the trained attention network to obtain the fused features, the method further includes: fixing the residual network and training the attention network to obtain the trained attention network.
In an exemplary embodiment of the present disclosure, the method further includes: after obtaining the trained attention network, training the preset model to obtain a trained preset model.
In an exemplary embodiment of the present disclosure, training the preset model to obtain the trained preset model includes: training the preset model end to end to obtain the trained preset model.
In an exemplary embodiment of the present disclosure, the method further includes: compressing the trained preset model based on a regression loss; and/or adjusting the parameter types of the trained preset model.
According to one aspect of the present disclosure, a video classification apparatus is provided, including: a key frame acquisition module, configured to perform sparse sampling on a video to be processed to obtain a plurality of key frames; a feature extraction module, configured to process the plurality of key frames through a feature extraction network in a preset model to extract features of the plurality of key frames; and a classification result determination module, configured to fuse the features of the plurality of key frames through a trained attention network in the preset model, and process the fused features to obtain a classification result for the video to be processed.
According to one aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via the executable instructions, the video classification method described in any one of the above.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the video classification method described in any one of the above.
In the video classification method, apparatus, electronic device, and computer-readable storage medium provided by the present exemplary embodiments, the features of the key frames of the video to be processed are extracted and fused by an attention network in order to classify the video. On the one hand, extracting the features of the plurality of key frames of the video through the feature extraction network in the preset model reduces the number of parameters input to the feature extraction network, and because the network structure of the feature extraction network is small, the number of parameters to process is reduced; this avoids the time wasted in the related art by extracting features from all frames of the video, improves the speed of feature extraction, and improves processing efficiency. On the other hand, the features of the plurality of key frames are fused by the attention network to obtain the classification result, so that the information between different frames is processed jointly; this avoids the step, in the related art, of performing a global operation on each individual key frame, reducing the waste of computing resources and the resource consumption. Moreover, the inter-frame information can be used effectively, so the video to be processed can be classified accurately, improving the precision of the classification result.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Detailed description of the invention
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the disclosure; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 schematically shows a diagram of the video classification method in an exemplary embodiment of the disclosure.
Fig. 2 schematically shows a structural diagram of the preset model in an exemplary embodiment of the disclosure.
Fig. 3 schematically shows a flowchart for determining the classification result in an exemplary embodiment of the disclosure.
Fig. 4 schematically shows the overall flowchart of classifying a video in an exemplary embodiment of the disclosure.
Fig. 5 schematically shows a block diagram of the video classification apparatus in an exemplary embodiment of the disclosure.
Fig. 6 schematically shows a diagram of the electronic device in an exemplary embodiment of the disclosure.
Specific embodiment
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in a variety of forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be more thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the disclosure. Those skilled in the art will recognize, however, that the technical solutions of the disclosure may be practiced while omitting one or more of the specific details, or other methods, components, devices, steps, and so on may be employed. In other instances, well-known solutions are not shown or described in detail to avoid obscuring aspects of the disclosure.
In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, a video classification method is first provided, which can be applied to any scenario in which photos, videos, or pictures are classified. Next, with reference to Fig. 1, the video classification method in the present exemplary embodiment is described in detail.
In step S110, sparse sampling is performed on the video to be processed to obtain a plurality of key frames.
In the present exemplary embodiment, the video to be processed may include a large number of videos stored in a file on a terminal (for example, videos in a smart terminal's photo album) or a large number of videos uploaded to and stored on an information exchange platform. The specific type of the video to be processed can be determined according to the actual functional requirements; for example, when classification is needed, the video to be processed refers to the video to be classified.
Because the differences between consecutive frames of the video to be processed are small, it is not necessary in the present exemplary embodiment to use every frame of the video as input to the subsequent processing. To select a subset of frames for processing, the video to be processed can be sampled. Sampling refers to the process of sampling the video at intervals in the time domain. Different sampling rates yield sampling results of different sparsity. For example, given a video to be processed, such as a video sequence V of length T, the video sequence can be divided evenly into T+1 segments, each containing the same number of video frames, and then one frame is randomly selected from each segment as a sample. In this way, a plurality of key frames of the video to be processed is obtained from the T+1 segments. In the present exemplary embodiment, obtaining the plurality of key frames by sparse sampling reduces the number of samples while keeping the data within a fidelity range, and also reduces the number of parameters input to the feature extraction network, thereby reducing the amount of computation.
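The segment-and-sample strategy above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name and parameters are hypothetical, and one frame index is drawn uniformly at random from each equal-length segment.

```python
import random

def sparse_sample(num_frames, num_segments):
    """Divide a video's frame indices into equal segments and randomly
    pick one key-frame index per segment (segment-based sparse sampling)."""
    seg_len = num_frames // num_segments
    indices = []
    for s in range(num_segments):
        start = s * seg_len
        # sample uniformly inside this segment
        indices.append(start + random.randrange(seg_len))
    return indices

# e.g. a 300-frame video reduced to 8 key frames
key_frames = sparse_sample(num_frames=300, num_segments=8)
```

Because one index comes from each successive segment, the sampled indices are strictly increasing, and only `num_segments` frames (rather than all frames) are passed to the feature extraction network.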
With continued reference to Fig. 1, in step S120, the plurality of key frames is processed through the feature extraction network in the preset model to extract the features of the plurality of key frames.
In the present exemplary embodiment, the preset model refers to the entire model used to process the plurality of key frames and obtain the classification result for the video to be processed. The preset model mainly includes two parts: the first part is the feature extraction network, and the second part is the attention network. The features of the plurality of key frames can be represented concretely as feature vectors.
The feature extraction network is described first. The feature extraction network is mainly used to extract the features of the plurality of key frames of each video input to it. The feature extraction network may be any network model capable of extracting features, such as a suitable machine learning model, which may include but is not limited to a convolutional neural network, a recurrent neural network, a residual network model, and the like. If the feature extraction network is a convolutional neural network, it may include multiple convolutional layers and pooling layers: each convolutional layer extracts different features, and the pooling layers reduce dimensionality to extract the main features, so that subsequent processing is performed on the main features as the final features.
In the present exemplary embodiment, the feature extraction network uses a network backbone designed for the PC side, but performing feature extraction with a mobile-oriented network such as MobileNet or ThunderNet also falls within the scope of protection of this application.
In the present exemplary embodiment, if the feature extraction network is a residual network, the specific process of processing the plurality of key frames through the feature extraction network to extract their features includes: taking the plurality of key frames as one batch and inputting the batch into the residual network to extract the features of the plurality of key frames. The residual network can be any of a variety of residual networks, such as an 18-layer residual network or a 34-layer residual network; the 18-layer residual network ResNet18 is used here as an example.
A residual network is composed of residual blocks (the difference between output and input); it uses identity mappings to pass the output of an earlier layer directly to a later layer. Suppose the input of a section of the neural network is x and the desired output is H(x). In a residual network, the input x can be passed directly to the output as a starting point, so the target to be learned is the residual H(x) - x rather than the complete output.
Constructing a ResNet network means stacking many such residual blocks: an ordinary convolutional network is turned into a residual network by adding skip connections, with one shortcut added every two layers to form a residual block. For example, 5 residual blocks linked together constitute a residual network. Each of the sequentially connected residual blocks contains an identity mapping and at least two convolutional layers, with the identity mapping going from the input end of the block to its output end. The specific network structure, number of layers, and so on of the residual network can be configured according to requirements such as computing resource consumption and recognition performance, and are not particularly limited here. It should be noted that the ResNet18 used for the encoding part in this step is a pre-trained model, and therefore does not need to be trained or optimized in the present exemplary embodiment.
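The residual computation H(x) = F(x) + x described above can be illustrated with a toy block. This is a sketch under simplifying assumptions: dense matrix multiplications stand in for the block's convolutional layers, and the weights and dimensions are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: the layers learn only the residual
    F(x) = H(x) - x, and the identity shortcut adds x back.
    Dense layers stand in for the two convolutions of a real block."""
    f = relu(w1 @ x)      # first "convolution" + activation
    f = w2 @ f            # second "convolution"
    return relu(f + x)    # identity shortcut, then activation

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1
y = residual_block(x, w1, w2)
```

Note that with all-zero weights the block reduces to the identity (up to the final activation), which is why stacking many such blocks does not suffer the vanishing-gradient problem of equally deep plain networks.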
In the present exemplary embodiment, after the plurality of key frames of the videos to be processed is obtained in step S110, these key frames can be taken as one batch. The batch size is a hyperparameter that defines the number of samples to process before the internal model parameters are updated. Batch processing can be thought of as iterating over one or more samples and making predictions. At the end of a batch, the predictions are compared with the expected output variables and an error is computed. From this error, an update algorithm improves the model, for example by moving down the error gradient. When all samples are used to form a single batch, the learning algorithm is called batch gradient descent. Since all the key frames form one batch, the frequency and number of updates to the network can be reduced.
Specifically, the plurality of key frames is input as one batch into the first residual block of the residual network; each residual block receives the output of the previous residual block and performs feature extraction on it based on a first convolutional layer, a second convolutional layer, and a third convolutional layer; the output of the third convolutional layer is obtained and, together with the output of the previous residual block, is passed to the next residual block; finally, the output of the last residual block of the residual network is obtained as the features of the plurality of key frames.
In the present exemplary embodiment, because an 18-layer residual network is used for feature extraction, the network has a strong ability to extract features from images, while its small number of layers keeps the number of network parameters low. The residual structure solves the vanishing-gradient problem caused by overly deep networks, so feature extraction can be performed with a deeper network structure, ensuring the accuracy of feature extraction while reducing the amount of computation.
Through the methods in step S110 and step S120, a plurality of key frames is obtained by sparse sampling of the video to be processed, so that not every frame of the video is used as input to the next step, reducing the number of input parameters. Moreover, the residual network is capable of extracting features from images, and its small number of layers further reduces the number of parameters. In this way, extracting the features of the key frames through sparse sampling and a feature extraction network with few layers reduces the number of parameters that need to be transmitted and computed, saving computing resources.
With continued reference to Fig. 1, in step S130, the features of the plurality of key frames are fused through the trained attention network in the preset model, and the fused features are processed to obtain the classification result for the video to be processed.
In the present exemplary embodiment, the preset model refers to the trained preset model. Fig. 2 schematically shows the concrete structure of the preset model. With reference to Fig. 2, in addition to the feature extraction network and the attention network, the preset model may further include a BN layer, a fully connected layer, and a softmax layer, so that a multi-label classification result can be obtained from the vector output by the softmax layer. The feature extraction network is ResNet18 with its softmax removed; it takes one batch of frames as input and outputs the feature vectors of the plurality of key frames. The attention network is connected to the feature extraction network; its input is the feature vectors of the plurality of key frames, and its output is the fused vector. The BN layer is connected to the attention network and normalizes each neuron to accelerate training and improve model accuracy. The fully connected layer (fully connected layers, FC) is connected to the BN layer and plays the role of a classifier in the overall convolutional neural network. The softmax layer is connected to the fully connected layer and outputs the final prediction vector, where each dimension of the prediction vector represents the probability of the corresponding category.
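The head of the model described above (BN layer, fully connected layer, softmax) can be sketched in numpy. This is a simplified, hypothetical illustration: it takes an already-fused vector `c` as input, and the normalization is shown per-vector for brevity, whereas a real BN layer at inference uses running batch statistics.

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_head(c, gamma, beta, W, b, eps=1e-5):
    """Head of the preset model after attention fusion: BN-style
    normalization of the fused vector c, a fully connected layer
    acting as the classifier, then softmax per-class probabilities."""
    h = (c - c.mean()) / np.sqrt(c.var() + eps)  # normalization
    h = gamma * h + beta                         # learned scale/shift
    logits = W @ h + b                           # fully connected layer
    return softmax(logits)

rng = np.random.default_rng(1)
c = rng.standard_normal(128)                 # fused feature vector
gamma, beta = np.ones(128), np.zeros(128)    # BN parameters
W, b = rng.standard_normal((5, 128)) * 0.1, np.zeros(5)  # 5 categories
probs = classify_head(c, gamma, beta, W, b)
```

Each entry of `probs` is the model's probability for one category, matching the prediction vector Fig. 2 describes.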
Because a convolutional neural network by itself cannot fuse inter-frame information, an attention network can be used to fuse the extracted features of multiple different key frames in order to obtain the classification result for the video to be processed. The attention network can be an inter-frame attention network; its input can be the features of the batch formed by the plurality of key frames obtained in step S120, and its output is the fused vector.
Fig. 3 schematically shows the flowchart for determining the classification result. With reference to Fig. 3, it mainly includes step S310 and step S320, in which:
In step S310, the features of the plurality of key frames are input into the trained attention network to obtain the fused features.
In this step, the attention network refers to a network based on an attention mechanism. An attention mechanism allows a neural network to focus on only a subset of its input, that is, to select specific inputs. The attention mechanism can be applied to any type of input regardless of its shape, such as a matrix-form input like an image, or a vector.
To guarantee the accuracy of the fused features, the attention network can first be trained before the fused features are computed, so that the features of the plurality of key frames of the video to be processed are fused by the trained attention network. The specific process of training the attention network may include: fixing the residual network and training the attention network to obtain the trained attention network. That is, during the training of the entire model, because the ResNet18 used for the encoding part is a pre-trained model, its parameters are fixed first and only the subsequent attention network is trained; when the loss function of the attention network stabilizes, training of the attention network stops, yielding the trained attention network. Specifically, the attention network in the present exemplary embodiment can be expressed as in Formula (1):

c = Σ_i α_i · a_i    Formula (1)

where a_i is the i-th vector input to the attention network, i.e., the feature vector of the i-th key frame, and c is the computed fusion vector of the features of the plurality of key frames. The weights of the input vectors are computed as shown in Formula (2) and Formula (3):

e_i = w^T a_i    Formula (2)

α_i = exp(e_i) / Σ_j exp(e_j)    Formula (3)

where w is the parameter learned during training; with the learned parameter, the trained attention network can compute the fused vector c of the features of the plurality of key frames.
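The fusion defined by Formulas (1) to (3) amounts to a softmax-weighted average of the per-frame feature vectors. A minimal numpy sketch, with hypothetical dimensions (8 key frames, 128-dimensional features):

```python
import numpy as np

def attention_fuse(A, w):
    """Inter-frame attention fusion per Formulas (1)-(3):
    e_i = w^T a_i, alpha_i = softmax(e)_i, c = sum_i alpha_i * a_i.
    A has shape (T, D): one D-dim feature vector per key frame."""
    e = A @ w                             # Formula (2): one score per frame
    e = e - e.max()                       # numerical stability
    alpha = np.exp(e) / np.exp(e).sum()   # Formula (3): softmax weights
    c = alpha @ A                         # Formula (1): weighted fusion
    return c, alpha

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 128))   # features of 8 key frames
w = rng.standard_normal(128)        # learned attention parameter w
c, alpha = attention_fuse(A, w)
```

The fused vector `c` keeps the feature dimensionality of a single frame while weighting informative frames more heavily, which is what lets the model use inter-frame information without a global operation per frame.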
When training the attention network, first, the image data of the plurality of key frames can be obtained, and the category to which the video to be processed belongs can be labeled manually; then, the attention network is trained using the labels and the image data, continually adjusting the weight of each convolution kernel in the attention network until the predicted category matches the manually set category, thereby obtaining the trained attention network.
The specific steps of fusion by the attention network may include: taking the information of the entire convolutional layer as input, the initial point of focus is obtained to represent attention to different locations. After the attention vector is obtained, the product of the previous attention vector and the convolutional-layer vectors can be taken; the resulting vector represents the location information of the attended point. After the location information and timing information are combined and fed into the network, the new location vector and the predicted output probability information are computed at the current time step. The output is continually combined with the convolutional layer to generate new location-point information, thereby obtaining new attention, and the new attention combined with the input yields new output information. In the present exemplary embodiment, ResNet18 with its softmax removed serves as the feature extraction network; the batch formed from the plurality of key frames is input to it, the corresponding feature vectors of the plurality of key frames are output, and the inter-frame attention network is then connected to obtain the fusion vector corresponding to the feature vectors.
On this basis, inter-frame information can be used effectively by the attention network, avoiding the step in the related art of performing a global operation on each individual key frame, reducing the waste of computing resources and the resource consumption. The fused vector can represent the features of the video to be processed more accurately, allowing more accurate classification. In addition, because the attention network can make effective use of inter-frame information, the video to be processed can be classified precisely based on that information.
After the trained attention network is obtained, the entire preset model can be trained to obtain the trained preset model. For example, the feature extraction network and the attention network are fine-tuned until the predicted class of a video to be processed matches the manually set category, so as to obtain a well-performing trained preset model and thereby improve the precision of video classification through the preset model. When the preset model is trained, end-to-end training can be used. End-to-end training may include: a prediction result is obtained from the input end to the output end, and comparing it with the ground truth yields an error; this error is propagated back through each layer of the model (backpropagation), and the representation of each layer is adjusted according to the error until the model converges or achieves the desired effect. End-to-end training involves no extra separate processing: from raw data input to task result output, the entire training and prediction process is completed inside the model. For example, there are no separate sub-models in the overall model; instead, a single neural network connects the input end to the output end and takes over the functions of all the original modules. End-to-end training reduces the number of operating steps and improves training efficiency.
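The two-stage schedule described above (fix the residual network and train only the attention network, then fine-tune the whole model end to end) can be sketched with a parameter dictionary and a set of trainable names. This is purely illustrative: the "gradient" is a stand-in that decays parameters toward zero, so only the freezing/unfreezing logic is meaningful.

```python
import numpy as np

def train(params, trainable, steps, lr=0.1):
    """One training stage: update only parameters whose name is in
    `trainable`. The gradient here is a placeholder (the parameter
    itself), so each step just shrinks trainable parameters."""
    for _ in range(steps):
        for name in trainable:
            grad = params[name]                  # placeholder gradient
            params[name] = params[name] - lr * grad
    return params

params = {"backbone": np.ones(4), "attention": np.ones(4)}

# Stage 1: fix the residual network, train only the attention network.
train(params, trainable={"attention"}, steps=10)
backbone_after_stage1 = params["backbone"].copy()

# Stage 2: fine-tune the whole preset model end to end.
train(params, trainable={"backbone", "attention"}, steps=10)
```

In a real framework the same effect is usually achieved by disabling gradient computation on the frozen backbone during stage 1 and re-enabling it for the end-to-end fine-tuning stage.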
It should be added that the entire trained preset model can be adjusted further to optimize performance, specifically by the following adjustment methods. First, the trained preset model is compressed based on a regression loss; that is, model pruning can be applied to each layer of the preset model. A neural network has numerous parameters, but some of them contribute little to the final output and are redundant, so these redundant parameters need to be cut. The model pruning method can, for example, prune according to weight values. In the present exemplary embodiment, the number of channels of the preset model can be adjusted based on a LASSO regression loss, removing channels whose regression loss is small and which have little influence on the classification result, so as to reduce the amount of computation. Pruning the trained preset model improves the running speed and reduces the model file size.
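The idea of removing low-contribution channels can be sketched as follows. Note the simplification: the patent selects channels via a LASSO regression loss, whereas this sketch ranks channels by a simpler L1-magnitude criterion purely to illustrate channel removal; the function name and shapes are hypothetical.

```python
import numpy as np

def prune_channels(W, keep_ratio=0.5):
    """Magnitude-based channel pruning: rank the output channels of a
    weight matrix by L1 norm and keep only the strongest fraction.
    (A stand-in for the LASSO-based channel selection in the text.)"""
    norms = np.abs(W).sum(axis=1)              # per-channel L1 norm
    k = max(1, int(W.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])     # indices of kept channels
    return W[keep], keep

rng = np.random.default_rng(3)
W = rng.standard_normal((8, 16))               # 8 output channels
W_pruned, kept = prune_channels(W, keep_ratio=0.5)
```

After pruning, downstream layers must also drop the corresponding input channels so that shapes stay consistent, which is where the model-size and speed gains come from.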
Second, the parameter types of the trained preset model are adjusted. Specifically, the parameter type in the preset model is generally float32; in the present exemplary embodiment, the parameters can be truncated from float32 to float16, thereby reducing the model size and the consumption of computing resources without affecting the computation results.
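The float32-to-float16 truncation can be shown in a few lines of numpy; the weights here are random stand-ins for real model parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
weights32 = rng.standard_normal(1000).astype(np.float32)

# Truncate parameters from float32 to float16: half the storage,
# at the cost of roughly three decimal digits of precision.
weights16 = weights32.astype(np.float16)

storage_saved = weights32.nbytes - weights16.nbytes  # 2000 bytes here
```

For typical weight magnitudes the rounding error is on the order of 1e-3, which is usually negligible for inference, matching the claim that the computation results are not materially affected.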
It should be noted that, in the present exemplary embodiment, model compression alone may be performed, the parameter type adjustment alone may be performed, or model compression and parameter type adjustment may be performed together, in order to increase the running speed and reduce the consumption of computing resources.
Next, in step S320, the probability that the video to be processed belongs to each category is determined according to the fused feature, and the classification result is determined according to that probability.
In this step, the classification result can be expressed in terms of the probabilities that the video to be processed belongs to the respective categories. Specifically, a probability threshold may be set in advance; when a probability value is greater than or equal to the threshold, it can be determined that the video to be processed belongs to the corresponding category.
After the fused feature is obtained, it can be input into a BN (batch normalization) layer to be normalized, then into a fully connected layer for classification, and further into a softmax layer to obtain a prediction vector. Each dimension of the prediction vector gives the probability that the video to be processed belongs to one category, so that the classification result can be determined from the probability values.
For example, if the probability threshold is 0.7, and the probability that video 1 to be processed belongs to category 1 is 0.9 while its probability of belonging to category 2 is 0.1, the classification result of video 1 is determined to be category 1.
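The normalize → fully connected → softmax → threshold chain can be sketched as below. The feature dimension, weights, and category count are toy values chosen for illustration, not taken from the embodiment, and the per-vector normalization is a simplified stand-in for a trained BN layer (which would use learned scale and shift parameters at inference):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
fused = rng.normal(size=512)                  # fused feature vector (toy)
fused = (fused - fused.mean()) / fused.std()  # simplified BN-style normalization
W = rng.normal(size=(3, 512)) * 0.05          # hypothetical FC weights, 3 categories
b = np.zeros(3)

probs = softmax(W @ fused + b)                # probability per category

threshold = 0.7                               # the example threshold from the text
labels = [i for i, p in enumerate(probs) if p >= threshold]
print("probabilities:", np.round(probs, 3), "labels above threshold:", labels)
```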
In the present exemplary embodiment, the video to be processed is classified by a preset model made up of a residual network and an attention network. Compared with the related art, this reduces the number of parameters and the time consumed, without losing much precision. Meanwhile, the attention network makes effective use of the information shared among the multiple different key frames and saves computing resources.
Fig. 4 schematically shows the overall flow of video classification. Referring to Fig. 4, the flow mainly includes the following steps:
In step S401, the video to be processed is subjected to frame-cutting processing; specifically, sparse sampling may be used to extract multiple key frames of the video to be processed.
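One common way to realize such sparse sampling, shown here as a hypothetical sketch (the embodiment does not fix a particular sampling rule), is to split the frame index range into equal segments and take the center frame of each segment:

```python
def sparse_sample(num_frames: int, num_keyframes: int) -> list[int]:
    """Pick num_keyframes indices spread evenly over [0, num_frames)."""
    step = num_frames / num_keyframes
    # Take the middle of each of the num_keyframes equal segments.
    return [int(step * i + step / 2) for i in range(num_keyframes)]

# 8 key-frame indices out of a 300-frame video.
print(sparse_sample(300, 8))  # → [18, 56, 93, 131, 168, 206, 243, 281]
```

Sampling a fixed, small number of frames in this way is what keeps the input to the feature extraction network small regardless of the video's length.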
In step S402, the multiple key frames are input into a basic feature extraction network. The feature extraction network here may be the residual network ResNet18, which yields vectors representing the features.
In step S403, the vector representing the high-dimensional feature corresponding to each key frame is obtained.
In step S404, the high-dimensional features are input into the attention network to obtain a fused vector.
In step S405, the classification result is obtained according to the fused vector. Specifically, the fused vector is input into the BN layer, the fully connected layer, and the softmax layer to obtain the probabilities that the video to be processed belongs to the respective categories, and the classification result is then determined according to these probabilities.
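Steps S403 and S404 can be sketched with a minimal attention-style fusion. The scoring rule and the dimensions below are assumptions made for illustration; the embodiment does not disclose the attention network's exact formulation:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
num_keyframes, feat_dim = 8, 512
features = rng.normal(size=(num_keyframes, feat_dim))  # one row per key frame (S403)
w = rng.normal(size=feat_dim) * 0.1                    # hypothetical learned scoring vector

# S404: score each key-frame feature, normalize the scores with softmax,
# and take the weighted sum as the fused vector.
scores = features @ w          # one scalar score per key frame
alpha = softmax(scores)        # attention weights, sum to 1
fused = alpha @ features       # fused vector, shape (feat_dim,)
print(fused.shape, float(alpha.sum()))
```

The fused vector produced here is what step S405 then passes to the BN, fully connected, and softmax layers.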
In conclusion the technical solution in the present exemplary embodiment, first carries out sparse sampling to video to be processed, is closed
Key frame simultaneously carries out feature extraction by residual error network.For the feature extracted, carried out using attention network further
Fusion Features obtain the fusion feature between different key frames, final output prediction result.By this method, reduce defeated
Enter to the parameter of feature extraction network, and since the network structure of feature extraction network is smaller, reduces the parameter of processing
Quantity avoids the waste of time caused by the feature for extracting all frames of video to be processed in the related technology, it is special to improve extraction
The efficiency and speed of sign.In addition, can merge to the feature of multiple key frames, avoiding in the related technology can be to each list
Only key frame carries out the step of global operation, reduces the waste to computing resource, reduces resource consumption.In addition to this,
It further uses model pruning method to be handled, compact model parameter amount and speed can be promoted.
In the present exemplary embodiment, a video classification apparatus is further provided. Referring to Fig. 5, the apparatus 500 may include:

a key frame obtaining module 501, configured to perform sparse sampling on a video to be processed to obtain multiple key frames;

a feature extraction module 502, configured to process the multiple key frames through a feature extraction network in a preset model, so as to extract the features of the multiple key frames; and

a classification result determining module 503, configured to fuse the features of the multiple key frames through a trained attention network in the preset model, and to process the fused feature to obtain the classification result of the video to be processed.
In an exemplary embodiment of the present disclosure, the feature extraction network includes a residual network, and the feature extraction module is configured to: take the multiple key frames as one batch, and input the batch into the residual network to extract the features of the multiple key frames.
In an exemplary embodiment of the present disclosure, the classification result determining module includes: a feature fusion module, configured to input the features of the multiple key frames into the trained attention network to obtain the fused feature; and a probability calculation module, configured to determine, according to the fused feature, the probability that the video to be processed belongs to each category, so as to determine the classification result according to the probability.
In an exemplary embodiment of the present disclosure, before the features of the multiple key frames are input into the trained attention network to obtain the fused feature, the apparatus further includes: a network training module, configured to fix the residual network and train the attention network, so as to obtain the trained attention network.
In an exemplary embodiment of the present disclosure, the apparatus further includes: a preset model training module, configured to train the preset model after the trained attention network is obtained, so as to obtain a trained preset model.
In an exemplary embodiment of the present disclosure, the preset model training module includes: a training control module, configured to train the preset model end to end, so as to obtain the trained preset model.
In an exemplary embodiment of the present disclosure, the apparatus further includes: a model compression module, configured to compress the trained preset model based on a regression loss; and/or a parameter adjustment module, configured to adjust the parameter type of the trained preset model.
It should be noted that the details of each module in the above video classification apparatus have already been described in detail in the corresponding method, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In addition, although the steps of the method of the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that these steps must be executed in that particular order, or that all the illustrated steps must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those of ordinary skill in the art will appreciate that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may take the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", a "module", or a "system".
An electronic device 600 according to this embodiment of the present invention is described below with reference to Fig. 6. The electronic device 600 shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 6, the electronic device 600 takes the form of a general-purpose computing device. The components of the electronic device 600 may include, but are not limited to: the above-mentioned at least one processing unit 610, the above-mentioned at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform the steps shown in Fig. 1.
The storage unit 620 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 6201 and/or a cache storage unit 6202, and may further include a read-only storage unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The display unit 640 may be a display having a display function, which shows the processing results obtained by the processing unit 610 executing the method in the present exemplary embodiment. The display includes, but is not limited to, a liquid crystal display or other displays.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may be carried out through an input/output (I/O) interface 650. Moreover, the electronic device 600 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions causing a computing device (which may be a personal computer, a server, a terminal apparatus, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a program product capable of implementing the above method of this specification is stored. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification.
The program product for implementing the above method according to the embodiments of the present invention may adopt a portable compact disc read-only memory (CD-ROM) including program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above-mentioned figures are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the figures neither indicates nor limits the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of the present disclosure are indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. A video classification method, comprising:
performing sparse sampling on a video to be processed to obtain multiple key frames;
processing the multiple key frames through a feature extraction network in a preset model, so as to extract the features of the multiple key frames; and
fusing the features of the multiple key frames through a trained attention network in the preset model, and processing the fused feature to obtain a classification result of the video to be processed.
2. The video classification method according to claim 1, wherein the feature extraction network comprises a residual network, and processing the multiple key frames through the feature extraction network in the preset model to extract the features of the multiple key frames comprises:
taking the multiple key frames as one batch, and inputting the batch into the residual network, so as to extract the features of the multiple key frames.
3. The video classification method according to claim 1, wherein fusing the features of the multiple key frames through the trained attention network in the preset model and processing the fused feature to obtain the classification result of the video to be processed comprises:
inputting the features of the multiple key frames into the trained attention network to obtain a fused feature; and
determining, according to the fused feature, the probability that the video to be processed belongs to each category, so as to determine the classification result according to the probability.
4. The video classification method according to claim 1, wherein before the features of the multiple key frames are input into the trained attention network to obtain the fused feature, the method further comprises:
fixing the residual network and training the attention network, so as to obtain the trained attention network.
5. The video classification method according to claim 1, further comprising:
after the trained attention network is obtained, training the preset model to obtain a trained preset model.
6. The video classification method according to claim 5, wherein training the preset model to obtain the trained preset model comprises:
training the preset model end to end, so as to obtain the trained preset model.
7. The video classification method according to claim 5, further comprising:
compressing the trained preset model based on a regression loss; and/or
adjusting the parameter type of the trained preset model.
8. A video classification apparatus, comprising:
a key frame obtaining module, configured to perform sparse sampling on a video to be processed to obtain multiple key frames;
a feature extraction module, configured to process the multiple key frames through a feature extraction network in a preset model, so as to extract the features of the multiple key frames; and
a classification result determining module, configured to fuse the features of the multiple key frames through a trained attention network in the preset model, and to process the fused feature to obtain a classification result of the video to be processed.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the video classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the video classification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910562350.XA CN110347873B (en) | 2019-06-26 | 2019-06-26 | Video classification method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110347873A true CN110347873A (en) | 2019-10-18 |
CN110347873B CN110347873B (en) | 2023-04-07 |
Family
ID=68183260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910562350.XA Active CN110347873B (en) | 2019-06-26 | 2019-06-26 | Video classification method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347873B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190163981A1 (en) * | 2017-11-28 | 2019-05-30 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for extracting video preview, device and computer storage medium |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109862391A (en) * | 2019-03-18 | 2019-06-07 | 网易(杭州)网络有限公司 | Video classification methods, medium, device and calculating equipment |
Non-Patent Citations (1)
Title |
---|
吴昌等 (Wu Chang et al.): "一种新的基于RVM的视频关键帧语义提取算法" (A new RVM-based semantic extraction algorithm for video key frames), 《计算机应用研究》 (Application Research of Computers) * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026915A (en) * | 2019-11-25 | 2020-04-17 | Oppo广东移动通信有限公司 | Video classification method, video classification device, storage medium and electronic equipment |
CN111026915B (en) * | 2019-11-25 | 2023-09-15 | Oppo广东移动通信有限公司 | Video classification method, video classification device, storage medium and electronic equipment |
CN111177460A (en) * | 2019-12-20 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Method and device for extracting key frame |
CN111160191B (en) * | 2019-12-23 | 2024-05-14 | 腾讯科技(深圳)有限公司 | Video key frame extraction method, device and storage medium |
CN111160191A (en) * | 2019-12-23 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Video key frame extraction method and device and storage medium |
CN111246124A (en) * | 2020-03-09 | 2020-06-05 | 三亚至途科技有限公司 | Multimedia digital fusion method and device |
CN111611435A (en) * | 2020-04-01 | 2020-09-01 | 中国科学院深圳先进技术研究院 | Video classification method and device and storage medium |
CN111626251A (en) * | 2020-06-02 | 2020-09-04 | Oppo广东移动通信有限公司 | Video classification method, video classification device and electronic equipment |
CN111680624A (en) * | 2020-06-08 | 2020-09-18 | 上海眼控科技股份有限公司 | Behavior detection method, electronic device, and storage medium |
CN111737520A (en) * | 2020-06-22 | 2020-10-02 | Oppo广东移动通信有限公司 | Video classification method, video classification device, electronic equipment and storage medium |
CN111737520B (en) * | 2020-06-22 | 2023-07-25 | Oppo广东移动通信有限公司 | Video classification method, video classification device, electronic equipment and storage medium |
CN111553169B (en) * | 2020-06-25 | 2023-08-25 | 北京百度网讯科技有限公司 | Pruning method and device of semantic understanding model, electronic equipment and storage medium |
CN111553169A (en) * | 2020-06-25 | 2020-08-18 | 北京百度网讯科技有限公司 | Pruning method and device of semantic understanding model, electronic equipment and storage medium |
CN112000842A (en) * | 2020-08-31 | 2020-11-27 | 北京字节跳动网络技术有限公司 | Video processing method and device |
CN112232164A (en) * | 2020-10-10 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Video classification method and device |
CN112613308B (en) * | 2020-12-17 | 2023-07-25 | 中国平安人寿保险股份有限公司 | User intention recognition method, device, terminal equipment and storage medium |
CN112613308A (en) * | 2020-12-17 | 2021-04-06 | 中国平安人寿保险股份有限公司 | User intention identification method and device, terminal equipment and storage medium |
CN112863650A (en) * | 2021-01-06 | 2021-05-28 | 中国人民解放军陆军军医大学第二附属医院 | Cardiomyopathy identification system based on convolution and long-short term memory neural network |
CN113191401A (en) * | 2021-04-14 | 2021-07-30 | 中国海洋大学 | Method and device for three-dimensional model recognition based on visual saliency sharing |
CN113065533A (en) * | 2021-06-01 | 2021-07-02 | 北京达佳互联信息技术有限公司 | Feature extraction model generation method and device, electronic equipment and storage medium |
CN115311584A (en) * | 2022-08-15 | 2022-11-08 | 贵州电网有限责任公司 | Unmanned aerial vehicle high-voltage power grid video inspection floating hanging method based on deep learning |
CN115376052A (en) * | 2022-10-26 | 2022-11-22 | 山东百盟信息技术有限公司 | Long video classification method based on key frame sampling and multi-scale dense network |
CN116824641A (en) * | 2023-08-29 | 2023-09-29 | 卡奥斯工业智能研究院(青岛)有限公司 | Gesture classification method, device, equipment and computer storage medium |
CN116824641B (en) * | 2023-08-29 | 2024-01-09 | 卡奥斯工业智能研究院(青岛)有限公司 | Gesture classification method, device, equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110347873B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110347873A (en) | Video classification methods, device, electronic equipment and storage medium | |
EP4024232A1 (en) | Text processing model training method, and text processing method and apparatus | |
CN107423376B (en) | Supervised deep hash rapid picture retrieval method and system | |
JP2022058915A (en) | Method and device for training image recognition model, method and device for recognizing image, electronic device, storage medium, and computer program | |
CN109783655A (en) | A kind of cross-module state search method, device, computer equipment and storage medium | |
CN110599492A (en) | Training method and device for image segmentation model, electronic equipment and storage medium | |
CN110569359B (en) | Training and application method and device of recognition model, computing equipment and storage medium | |
US11423307B2 (en) | Taxonomy construction via graph-based cross-domain knowledge transfer | |
KR101828215B1 (en) | A method and apparatus for learning cyclic state transition model on long short term memory network | |
CN114743196B (en) | Text recognition method and device and neural network training method | |
US20220374678A1 (en) | Method for determining pre-training model, electronic device and storage medium | |
CN111178036B (en) | Text similarity matching model compression method and system for knowledge distillation | |
CN109697724A (en) | Video Image Segmentation method and device, storage medium, electronic equipment | |
US20230009547A1 (en) | Method and apparatus for detecting object based on video, electronic device and storage medium | |
CN113723378B (en) | Model training method and device, computer equipment and storage medium | |
CN116152833B (en) | Training method of form restoration model based on image and form restoration method | |
CN114715145B (en) | Trajectory prediction method, device and equipment and automatic driving vehicle | |
CN115455171A (en) | Method, device, equipment and medium for mutual retrieval and model training of text videos | |
CN112560499A (en) | Pre-training method and device of semantic representation model, electronic equipment and storage medium | |
CN110728359B (en) | Method, device, equipment and storage medium for searching model structure | |
JP7390442B2 (en) | Training method, device, device, storage medium and program for document processing model | |
CN114419327B (en) | Image detection method and training method and device of image detection model | |
US20240038223A1 (en) | Speech recognition method and apparatus | |
CN116310925A (en) | Video counting method, device and equipment for building materials and storage medium | |
CN113592074A (en) | Training method, generating method and device, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||