CN109002812A

CN109002812A - Method and device for intelligently identifying a video cover

Info

Publication number: CN109002812A
Application number: CN201810895066.XA
Authority: CN
Inventors: 陈长伟; 杨晓亮; 田丹
Original assignee: Beijing Future Media Polytron Technologies Inc
Current assignee: Beijing Future Media Polytron Technologies Inc
Priority date: 2018-08-08
Filing date: 2018-08-08
Publication date: 2018-12-14

Abstract

The invention discloses a method and device for intelligently identifying a video cover. The method comprises: inputting a video to be identified into a content recognition model and classifying each video frame of the video, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories; filtering out, from the classified video frames, at least one video frame that meets a preset highlight level; and filtering out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and using the target video frame as the cover of the video to be identified. In this way, a more exciting video frame with better picture quality can be extracted as the video cover, which solves the prior-art problem that a video cover cannot reflect the highlights of the video and is not targeted.

Description

Method and device for intelligently identifying a video cover

Technical field

The present invention relates to the field of machine learning, and in particular to a method and device for intelligently identifying a video cover.

Background

At present, internet video is attracting strong public attention. New UGC formats in particular, such as short video and live streaming, have firmly captured consumer psychology and become another major draw for internet traffic. To attract users and raise a video's popularity, a cover that represents the video content usually needs to be generated for each video.

In the prior art, a video frame is usually selected at random as the video cover, or the first frame is used as the cover by default. These methods cannot reflect the highlights of the video, and the user experience is poor.

Summary of the invention

In view of this, embodiments of the present invention provide a method and device for intelligently identifying a video cover, which can extract a more exciting video frame with better picture quality as the video cover.

An embodiment of the present invention discloses a method for intelligently identifying a video cover, comprising:

inputting a video to be identified into a content recognition model and classifying each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

filtering out, from the classified video frames, at least one video frame that meets a preset highlight level;

filtering out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and using the target video frame as the cover of the video to be identified.

Optionally, inputting the video to be identified into the content recognition model and classifying each video frame of the video to be identified comprises:

preprocessing the video to be identified in the data input layer of the content recognition model;

dividing, in the convolution computation layer of the content recognition model, each video frame image of the video to be identified into multiple regions;

determining the class label of each region;

gathering the video frames that have the same class label to obtain the classification result of each video frame in the video.

Optionally, filtering out, from the classified video frames, at least one video frame that meets the preset highlight level comprises:

determining, from the classification results, at least one video frame whose number of class labels is greater than a preset threshold.

Optionally, filtering out, from the at least one video frame that meets the preset highlight level, the target video frame whose picture quality meets the preset condition, and using the target video frame as the cover of the video to be identified, comprises:

selecting, from the at least one video frame, the target video frame with the best picture quality as the cover of the video to be identified.

Optionally, the method further comprises:

obtaining original video sample data;

classifying the original video sample data to obtain data models of different categories;

training the content recognition model according to the data models of different categories and the preset machine learning algorithm.

Optionally, the method further comprises:

training the content recognition model according to the data models of different categories and the preset machine learning algorithm, and outputting the labeling result of the video frames;

judging whether the content recognition model meets a preset training condition;

if the content recognition model does not meet the preset training condition, performing reconstruction training on the content recognition model according to the labeling result of the video frames and the preset machine learning algorithm, outputting the labeling result of the video frames, and returning to the step of judging whether the content recognition model meets the preset training condition;

if the content recognition model meets the preset training condition, outputting the content recognition model.

Optionally, the preset machine learning algorithm is the random forest algorithm.

An embodiment of the present invention also discloses a device for intelligently identifying a video cover, comprising:

a content classification module, configured to input a video to be identified into a content recognition model and classify each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

a highlight screening module, configured to filter out, from the classified video frames, at least one video frame that meets a preset highlight level;

a picture quality screening module, configured to filter out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and to use the target video frame as the cover of the video to be identified.

Optionally, the content classification module comprises:

a preprocessing submodule, configured to preprocess the video to be identified in the data input layer of the content recognition model;

a region division submodule, configured to divide, in the convolution computation layer of the content recognition model, each video frame image of the video to be identified into multiple regions;

a class label determination submodule, configured to determine the class label of each region;

a classification result output submodule, configured to gather the video frames that have the same class label to obtain the classification result of each video frame in the video.

Optionally, the device further comprises:

a sample data acquisition module, configured to obtain original video sample data;

a first classification module, configured to classify the original video sample data to obtain data models of different categories;

a training module, configured to train the content recognition model according to the data models of different categories and the preset machine learning algorithm.

Embodiments of the present invention disclose a method and device for intelligently identifying a video cover, comprising: inputting a video to be identified into a content recognition model and classifying each video frame of the video, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories; filtering out, from the classified video frames, at least one video frame that meets a preset highlight level; and filtering out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and using the target video frame as the cover of the video to be identified. Thus, the content of each video frame image in the video is first understood through the content recognition model to obtain the classification results of the video frame images; candidate covers with a higher highlight level are determined from the classification results; and the candidate with better picture quality is then selected as the cover of the video. In this way, a more exciting video frame with better picture quality can be extracted as the video cover, which solves the prior-art problem that a video cover cannot reflect the highlights of the video and is not targeted.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 shows a flow diagram of a method for intelligently identifying a video cover provided by an embodiment of the present invention;

Fig. 2 shows a flow diagram of the training process of a content recognition model in an embodiment of the present invention;

Fig. 3 shows another flow diagram of the training process of a content recognition model in an embodiment of the present invention;

Fig. 4 shows a structural diagram of a device for intelligently identifying a video cover provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, a flow diagram of a method for intelligently identifying a video cover provided by an embodiment of the present invention is shown. In this embodiment, the method comprises:

S101: inputting a video to be identified into a content recognition model and classifying each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

Before S101 is executed, the content recognition model needs to be trained. The trained content recognition model is used to classify each video frame in a video and determine the category to which each video frame belongs. The training process is described in detail later and is not repeated here.

The categories used for classification may include: scene (indoor or outdoor), person, expression, body movement, and so on. Users can set them according to actual needs.

In this embodiment, a single video frame image may contain many kinds of content, for example both a person and a scene. To classify the content of a video frame accurately, the video frame image can be divided into different regions and the category of each region determined separately. Specifically, S101 includes:

S11: preprocessing the video to be identified in the data input layer of the content recognition model;

S12: dividing, in the convolution computation layer of the content recognition model, each video frame image of the video to be identified into multiple regions;

S13: determining the class label of each region;

S14: gathering the video frames that have the same class label to obtain the classification result of each video frame in the video.

In this embodiment, the preprocessing in S11 may include:

converting the video into multiple video frames;

converting the multiple video frames of the video into binary code that a computer can recognize.

The multiple video frames can be converted into computer-recognizable binary code in several ways, for example mean subtraction, normalization, and PCA/whitening.
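As a rough illustration of the mean subtraction and normalization mentioned above, the following minimal sketch operates on one grayscale frame stored as a flat list of intensities. The function name `preprocess_frame`, the 0..255 input range, and the scaling into [-1, 1] are assumptions made for illustration; the patent does not specify them, and PCA/whitening is left out.

```python
from statistics import mean

def preprocess_frame(pixels):
    """Zero-center a frame and scale it into [-1, 1].

    `pixels` is a flat list of grayscale intensities (assumed 0..255).
    """
    mu = mean(pixels)
    centered = [p - mu for p in pixels]           # mean subtraction
    peak = max(abs(c) for c in centered) or 1.0   # avoid dividing by zero on flat frames
    return [c / peak for c in centered]           # normalization

# Hypothetical 4-pixel frame used only to exercise the function.
frame = [0, 64, 128, 255]
normalized = preprocess_frame(frame)
```

After this step the values have zero mean and a maximum magnitude of 1, which is one common way to put frames on a comparable scale before classification.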

In this embodiment, a video frame image can be divided into multiple regions in various ways, which are not limited here. For example, the following embodiments can be used:

Embodiment one: presetting the number of regions into which a video frame image is divided;

dividing the video frame image at random according to the preset number.

For example, if the preset number of regions is 5, the image can be randomly divided into 5 regions of equal or different size.
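One possible reading of embodiment one can be sketched as follows: the frame is cut into a preset number of vertical strips at random positions. The strip layout, the function name `random_regions`, and the `(left, top, right, bottom)` box format are assumptions for illustration only; the patent does not say how the random division is carried out.

```python
import random

def random_regions(width, height, count, seed=None):
    """Split a width x height frame into `count` vertical strips at random cuts.

    Returns a list of (left, top, right, bottom) boxes that tile the frame.
    """
    if not 1 <= count <= width:
        raise ValueError("count must be between 1 and the frame width")
    rng = random.Random(seed)
    # count - 1 distinct interior cut positions give count strips of varying width
    cuts = sorted(rng.sample(range(1, width), count - 1))
    edges = [0] + cuts + [width]
    return [(edges[i], 0, edges[i + 1], height) for i in range(count)]

# Hypothetical 1920x1080 frame divided into the 5 regions of the example above.
regions = random_regions(1920, 1080, 5, seed=0)
```

The strips cover the whole frame without overlap, and their widths generally differ, matching the "equal or different size" wording.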

Embodiment two: to improve the accuracy of region recognition, the video frame image can first be segmented into different regions;

For example, if a video frame image contains a person and a scene, the person and the scene can each be segmented out of the image according to person features and scene features, which amounts to dividing the image into different regions.

In this embodiment, a video frame is divided into different regions. The regions may belong to the same category or to different categories, so one video frame image may carry multiple class labels. In this embodiment, identical categories are associated, that is, the video frames with the same class label are gathered together. This classifies each video frame in the video.
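The gathering of frames by class label can be sketched with a small grouping function. The input format (a mapping from a frame id to the labels of its regions) and the name `group_frames_by_label` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

def group_frames_by_label(frame_region_labels):
    """Map each class label to the sorted list of frame ids that contain it.

    `frame_region_labels` maps a frame id to the labels of its regions.
    """
    frames_by_label = defaultdict(list)
    for frame_id, region_labels in frame_region_labels.items():
        for label in set(region_labels):   # count each label once per frame
            frames_by_label[label].append(frame_id)
    return {label: sorted(ids) for label, ids in frames_by_label.items()}

# Hypothetical region labels for three frames.
grouped = group_frames_by_label({
    "frame_1": ["person", "scene", "scene"],
    "frame_2": ["scene"],
    "frame_3": ["person"],
})
```

Here `grouped["scene"]` lists `frame_1` and `frame_2`, and `grouped["person"]` lists `frame_1` and `frame_3`.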

In this embodiment, assume the content recognition model is obtained using the random forest algorithm. Training the content recognition model then amounts to building decision trees. When a region of a video frame is classified, the decision trees built in the content recognition model determine the category of that region: each decision tree votes on the region, and the category with the most votes is taken as the predicted classification result.
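The voting step can be sketched as a plain majority vote. In this sketch each "tree" is simply a callable that maps region features to a label; these callables and the name `forest_vote` are stand-ins for trained decision trees, not part of the patent.

```python
from collections import Counter

def forest_vote(trees, region_features):
    """Return the label that receives the most votes from the trees.

    On a tie, the label that was voted for first wins (Counter ordering).
    """
    votes = Counter(tree(region_features) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical trees; two of them vote "person" for this region.
toy_trees = [
    lambda features: "person",
    lambda features: "scene",
    lambda features: "person",
]
predicted = forest_vote(toy_trees, region_features={"has_face": True})
```

With two of the three stand-in trees voting "person", the region is labeled "person".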

In this embodiment, the content recognition model in effect performs content understanding on the video frame images: it determines the content contained in each video frame image, and the highlight level of a video frame can then be determined from the content contained in the image.

S102: filtering out, from the classified video frames, at least one video frame that meets a preset highlight level;

In this embodiment, the highlight level can be a rule established in advance by a technician and is not limited here.

In general, the more exciting the content of a video frame image, the more easily it arouses the user's interest. In this embodiment, each class label represents a different kind of video content, so the more class labels a video frame image contains, the richer and the more exciting its content. Therefore, video frames that contain more class labels can be used as candidate video covers. Specifically, S102 includes:

determining, from the classification results, at least one video frame whose number of class labels is greater than a preset threshold.

For example, assume video frame A shows Big-Head Son playing happily outdoors, video frame B shows an outdoor scene, and video frame C shows Big-Head Son; the class labels of the content recognition model include person, expression, body movement, scene, and so on. After video frames A, B, and C pass through the content recognition model, the class labels of video frame A may include person, scene, expression (happy), and body movement; the class labels of video frame B may include scene; and the class labels of video frame C may include person. It follows that video frame A contains the most class labels, so its content is richer, better reflects the content of the video, and is more likely to arouse the viewer's interest.
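The label-count rule for choosing candidate covers can be sketched using the three frames of the example. The threshold value of 2 and the function name `select_candidates` are assumptions for illustration; the patent leaves the threshold to the implementer.

```python
def select_candidates(frame_labels, threshold):
    """Keep the frames whose number of distinct class labels exceeds `threshold`."""
    return [
        frame_id
        for frame_id, labels in frame_labels.items()
        if len(set(labels)) > threshold
    ]

# Class labels of frames A, B and C as listed in the example.
example_labels = {
    "A": ["person", "scene", "expression", "body movement"],
    "B": ["scene"],
    "C": ["person"],
}
candidates = select_candidates(example_labels, threshold=2)
```

Only frame A carries more than two labels, so it is the single candidate cover under this assumed threshold.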

S103: filtering out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and using the target video frame as the cover of the video to be identified.

When the at least one video frame filtered out in S102 comprises multiple video frames, the picture quality of the different frames varies. In this embodiment, a selection condition for picture quality can be preset, and a video frame that meets the condition is used as the cover.

The picture quality screening can be set according to the actual situation. In general, the higher the picture quality, the better the user experience, so the video frame with the best picture quality can be selected as the cover. Specifically, this includes:

selecting, from the at least one video frame, the target video frame with the best picture quality as the cover of the video to be identified.
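The patent does not define how picture quality is measured. Purely as an assumption, the sketch below scores each candidate by the variance of its pixel intensities (a crude contrast proxy) and picks the highest-scoring frame; `contrast_score` and `pick_cover` are hypothetical names, and any other quality metric could be substituted.

```python
from statistics import pvariance

def contrast_score(pixels):
    """Assumed quality proxy: population variance of grayscale intensities."""
    return pvariance(pixels)

def pick_cover(candidate_frames):
    """Return the id of the candidate frame with the highest quality score.

    `candidate_frames` maps a frame id to a flat list of pixel intensities.
    """
    return max(candidate_frames, key=lambda fid: contrast_score(candidate_frames[fid]))

# Two hypothetical candidates: "A" has strong contrast, "D" is nearly flat.
cover = pick_cover({
    "A": [0, 255, 0, 255],
    "D": [120, 130, 125, 128],
})
```

Under this assumed metric the high-contrast frame "A" is chosen as the cover.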

In this embodiment, the content of each video frame image in the video is first understood to obtain the classification results of the video frame images; candidate covers with a higher highlight level are determined from the classification results; and the candidate with better picture quality is then selected as the cover of the video. In this way, a more exciting video frame with better picture quality can be extracted as the video cover, which solves the prior-art problem that a video cover cannot reflect the highlights of the video and is not targeted.

Referring to Fig. 2, a flow diagram of the training process of the content recognition model in an embodiment of the present invention is shown. In this embodiment, the method comprises:

S201: obtaining original video sample data;

In this embodiment, the original video sample data contains a large amount of video information. The original video sample data can come from several sources, for example video data obtained from other website databases by web crawling, or video data obtained from a preset video material library.

S202: classifying the original video sample data according to preset classification information to obtain data models of different categories.

In this embodiment, classification information needs to be determined in advance in order to classify the video sample data. The classification information can be set by a technician according to the actual situation and may include, for example, scene, person, facial expression (happy, sad, crying), body movement, and so on.

Technicians can manually label the original video sample data, that is, label the category of each video frame of a video. Video frames of the same category are gathered to obtain a category model together with the attribute information of that category; the attribute information can be understood as a description of the category model.

S203: training the content recognition model according to the data models of different categories and the preset machine learning algorithm.

In this embodiment, the content recognition model can be trained with any machine learning algorithm or with a combination of several machine learning algorithms, which is not limited here.

Assuming the preset machine learning algorithm is the random forest algorithm, the content recognition model can be trained as follows:

First, for ease of understanding, random forests are introduced as follows:

A random forest contains many decision trees. To classify an input sample, the sample is fed into every tree for classification. As a vivid metaphor: a meeting is held in the forest to discuss whether some animal is a mouse or a squirrel. Each tree independently gives its own view on the question, that is, each tree votes. Whether the animal is a mouse or a squirrel is decided by the votes, and the category with the most votes is the forest's classification result. Every tree in the forest is independent. The predictions made by the 99.9% of uncorrelated trees cover all cases and cancel each other out, while the predictions of a few good trees stand out from the "noise" and yield a good prediction. Voting over the classification results of several weak classifiers forms a strong classifier; this is the bagging idea behind random forests.

The decision trees in the random forest are trained as in S21-S22 below:

It should be noted that the data models of different categories obtained above serve as the training samples for training the content recognition model.

S21: randomly drawing a bootstrap sample set from the training samples by sampling with replacement, and repeating K times to obtain K bootstrap sample sets (K is a positive integer);

S22: training one decision tree on each bootstrap sample set;

Specifically, the training process for each bootstrap sample set may include:

Assume the bootstrap sample set has M input features. At each node of the tree, mtry features are selected at random from the M features, and one of these mtry features is chosen, according to the principle of minimum node impurity, to grow the branch, until the tree can classify the training set accurately or all attributes have been used.

mtry is kept constant while the whole forest is grown, that is, the same mtry is used when training every decision tree.

In addition, so that each decision tree reaches low bias and high diversity, each tree is grown fully so that the impurity of every node is minimized, and no pruning is performed.
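The two ingredients of S21-S22, bootstrap sampling and choosing a split among mtry random features by minimum impurity, can be sketched as follows. The use of Gini impurity, binary (0/1) features, and the names `bootstrap_sets`, `gini`, and `best_split_feature` are assumptions for illustration; the patent only speaks of "node impurity" without naming a measure, and the sketch covers a single split rather than a full tree.

```python
import random
from collections import Counter

def bootstrap_sets(samples, k, rng):
    """S21: draw k bootstrap sets, each the size of `samples`, with replacement."""
    return [rng.choices(samples, k=len(samples)) for _ in range(k)]

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split_feature(rows, labels, mtry, rng):
    """Among mtry randomly chosen features, return the index of the one whose
    split gives the lowest weighted Gini impurity.

    `rows` is a list of tuples of binary (0/1) features.
    """
    candidates = rng.sample(range(len(rows[0])), mtry)

    def weighted_impurity(feature):
        left = [y for x, y in zip(rows, labels) if x[feature] == 0]
        right = [y for x, y in zip(rows, labels) if x[feature] == 1]
        return sum(len(part) / len(labels) * gini(part)
                   for part in (left, right) if part)

    return min(candidates, key=weighted_impurity)

# Hypothetical toy data: feature 0 separates the two classes perfectly.
rng = random.Random(0)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["scene", "scene", "person", "person"]
sets = bootstrap_sets(list(zip(rows, labels)), k=3, rng=rng)
chosen = best_split_feature(rows, labels, mtry=2, rng=rng)
```

A full tree would repeat the split selection recursively on each branch until the node is pure or no features remain, using the same mtry at every node.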

In this embodiment, each decision tree in the content recognition model can be trained by the method introduced above. However, because each video frame image contains many kinds of content and may include both a scene and a person, a video frame can be divided into different regions in this embodiment and each region classified, so that video frames are classified more accurately. Specifically, this includes:

preprocessing the training samples;

dividing each video frame in the training samples into different regions;

determining the class label of each region of the video frame;

gathering, according to the class label of each region, the video frame images with the same label to obtain the classification result of the video.

It should be noted that the above process is consistent with S11-S14 and is not repeated here.

In this embodiment, the above training process yields a content recognition model for classifying videos. The content of each video frame image in the video is understood through the content recognition model to obtain the classification results of the video frame images; candidate covers with a higher highlight level are determined from the classification results; and the candidate with better picture quality is then selected as the cover of the video. In this way, a more exciting video frame with better picture quality can be extracted as the video cover, which solves the prior-art problem that a video cover cannot reflect the highlights of the video and is not targeted.

In another embodiment, to improve the classification accuracy of the content recognition model, reconstruction training can be performed on the model repeatedly while it is trained, by building a learning network with multiple levels. Specifically, with reference to Fig. 3, the method further includes:

S301: training the content recognition model according to the data models of different categories and the preset machine learning algorithm, and outputting the labeling result of the video frames;

S302: judging whether the content recognition model meets a preset training condition;

S303: if the content recognition model does not meet the preset training condition, performing reconstruction training on the content recognition model according to the labeling result of the video frames and the preset machine learning algorithm, outputting the labeling result of the video frames, and returning to the step of judging whether the content recognition model meets the preset training condition;

S304: if the content recognition model meets the preset training condition, outputting the content recognition model.

In this embodiment, the preset training condition can be set by a technician according to the actual situation, for example the accuracy of the content recognition model or a preset number of training iterations.

It can be understood that the content recognition model obtained in S301 by training with the data models of different categories and the preset machine learning algorithm can be regarded as the first-layer learning network, and the labeling result output by that training can serve as the input sample data for the next round of training. In the next round, the next-layer learning network is built from the labeling result output by the current round and the preset learning algorithm, and so on. Deeper learning networks are built in this way, yielding a content recognition model that classifies more accurately.
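The control flow of S301-S304 can be sketched as a loop in which each round's labeling result feeds the next round. The callables `train_step` and `is_ready`, the `max_rounds` guard, and the toy accuracy figures are all hypothetical; the sketch shows only the loop, not an actual learning algorithm.

```python
def train_until_ready(train_step, is_ready, samples, max_rounds=10):
    """Retrain until `is_ready(model)` holds or `max_rounds` is reached.

    `train_step(data)` returns (model, labeling_result); the labeling result
    of one round is the input of the next round.
    """
    model, labeling = train_step(samples)                 # S301
    rounds = 1
    while not is_ready(model) and rounds < max_rounds:    # S302
        model, labeling = train_step(labeling)            # S303
        rounds += 1
    return model, rounds                                  # S304

def toy_train_step(data):
    # Hypothetical stand-in: "accuracy" rises by 10 points per round.
    accuracy = data["accuracy_pct"] + 10
    return {"accuracy_pct": accuracy}, {"accuracy_pct": accuracy}

model, rounds = train_until_ready(
    toy_train_step,
    is_ready=lambda m: m["accuracy_pct"] >= 90,   # assumed training condition
    samples={"accuracy_pct": 60},
)
```

In this toy run the model reaches the assumed 90% condition after three rounds; `max_rounds` plays the role of the preset iteration count mentioned as an alternative training condition.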

In this embodiment, by building deeper learning networks and continually training the content recognition model, a content recognition model with higher classification accuracy is obtained.

Referring to Fig. 4, a structural diagram of a device for intelligently identifying a video cover provided by an embodiment of the present invention is shown. In this embodiment, the device comprises:

a content classification module 401, configured to input a video to be identified into a content recognition model and classify each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

a highlight screening module 402, configured to filter out, from the classified video frames, at least one video frame that meets a preset highlight level;

a picture quality screening module 403, configured to filter out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and to use the target video frame as the cover of the video to be identified.

Optionally, the content classification module comprises:

Optionally, the highlight screening module comprises:

a highlight screening submodule, configured to determine, from the classification results, at least one video frame whose number of class labels is greater than a preset threshold.

Optionally, the picture quality screening module comprises:

a picture quality screening submodule, configured to select, from the at least one video frame, the target video frame with the best picture quality as the cover of the video to be identified.

Optionally, the device further comprises:

a sample data acquisition module, configured to obtain original video sample data;

a classification module, configured to classify the original video sample data to obtain data models of different categories;

Optionally, the device further comprises:

a second training module, configured to train the content recognition model according to the data models of different categories and the preset machine learning algorithm, and to output the labeling result of the video frames;

a judgment module, configured to judge whether the content recognition model meets a preset training condition;

a reconstruction training module, configured to, if the content recognition model does not meet the preset training condition, perform reconstruction training on the content recognition model according to the labeling result of the video frames and the preset machine learning algorithm, output the labeling result of the video frames, and return to the step of judging whether the content recognition model meets the preset training condition;

a content recognition model output module, configured to output the content recognition model if the content recognition model meets the preset training condition.

Optionally, the preset machine learning algorithm is the random forest algorithm.

With the device of this embodiment, the content of each video frame image in the video is first understood through the content recognition model to obtain the classification results of the video frame images; candidate covers with a higher highlight level are determined from the classification results; and the candidate with better picture quality is then selected as the cover of the video. In this way, a more exciting video frame with better picture quality can be extracted as the video cover, which solves the prior-art problem that a video cover cannot reflect the highlights of the video and is not targeted.

It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another.

The foregoing description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to a person skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for intelligently identifying a video cover, characterized by comprising:

inputting a video to be identified into a content recognition model and classifying each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

filtering out, from at least one video frame that meets a preset highlight level, a target video frame whose picture quality meets a preset condition, and using the target video frame as the cover of the video to be identified.

2. The method according to claim 1, characterized in that inputting the video to be identified into the content recognition model and classifying each video frame of the video to be identified comprises:

dividing, in the convolution computation layer of the content recognition model, each video frame image of the video to be identified into multiple regions;

determining the class label of each region;

3. The method according to claim 2, characterized in that filtering out, from the classified video frames, at least one video frame that meets the preset highlight level comprises:

4. The method according to claim 1, characterized in that filtering out, from the at least one video frame that meets the preset highlight level, the target video frame whose picture quality meets the preset condition, and using the target video frame as the cover of the video to be identified, comprises:

selecting, from the at least one video frame, the target video frame with the best picture quality as the cover of the video to be identified.

5. The method according to claim 1, characterized by further comprising:

obtaining original video sample data;

6. The method according to claim 1, characterized by further comprising:

training the content recognition model according to the data models of different categories and the preset machine learning algorithm, and outputting the labeling result of the video frames;

judging whether the content recognition model meets a preset training condition;

if the content recognition model does not meet the preset training condition, performing reconstruction training on the content recognition model according to the labeling result of the video frames and the preset machine learning algorithm, outputting the labeling result of the video frames, and returning to the step of judging whether the content recognition model meets the preset training condition;

7. The method according to any one of claims 1-6, characterized in that the preset machine learning algorithm is the random forest algorithm.

8. A device for intelligently identifying a video cover, characterized by comprising:

a content classification module, configured to input a video to be identified into a content recognition model and classify each video frame of the video to be identified, where the content recognition model is obtained by training with a preset machine learning algorithm and video data models of different categories;

a highlight screening module, configured to filter out, from the classified video frames, at least one video frame that meets a preset highlight level;

a picture quality screening module, configured to filter out, from the at least one video frame that meets the preset highlight level, a target video frame whose picture quality meets a preset condition, and to use the target video frame as the cover of the video to be identified.

9. The device according to claim 8, characterized in that the content classification module comprises:

a preprocessing submodule, configured to preprocess the video to be identified in the data input layer of the content recognition model;

a region division submodule, configured to divide, in the convolution computation layer of the content recognition model, each video frame image of the video to be identified into multiple regions;

a classification result output submodule, configured to gather the video frames that have the same class label to obtain the classification result of each video frame in the video.

10. The device according to claim 8, characterized by further comprising:

a sample data acquisition module, configured to obtain original video sample data;

a first classification module, configured to classify the original video sample data to obtain data models of different categories;