CN110505519A

CN110505519A - A kind of video clipping method, electronic equipment and storage medium

Info

Publication number: CN110505519A
Application number: CN201910750378.6A
Authority: CN
Inventors: 张进; 杜欧杰; 莫东松; 赵璐; 张健; 马丹; 钟宜峰; 马晓琳
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2019-08-14
Filing date: 2019-08-14
Publication date: 2019-11-26
Anticipated expiration: 2039-08-14
Also published as: CN110505519B

Abstract

The embodiment of the present invention provides a kind of video clipping method, electronic equipment and storage medium, and method includes: the extraction characteristic frame image from video to be clipped, wherein the characteristic frame image is the image for including pre-set categories picture；The characteristic frame image is input in video clipping model, the testing result of the video clipping model output is obtained, the testing result indicates the excellent degree of the characteristic frame image；According to the testing result, editing is carried out to the video to be clipped；Wherein, the video clipping model is to be obtained according to the video sample training obtained in advance, and the picture frame in the video sample is marked with the weight for indicating video highlight degree.The embodiment of the present invention improves the accuracy of video clipping, and then ensure that the higher excellent degree of institute's editing video.

Description

A kind of video clipping method, electronic equipment and storage medium

Technical field

The present invention relates to video technique field more particularly to a kind of video clipping methods, electronic equipment and storage medium.

Background technique

With the development of user demand and medium technique, the explosion of the quantity of video also exponentially increases, especially body The features such as educating timeliness, the user interactivity of live streaming match meets experience when user watches video, experience live streaming class video Quantity it is more.But often the duration is longer for a sports show race, usually dozens of minutes to several hours, but sight What crowd was concerned about is often the sub-fraction in match, this just needs to carry out editing to video more excellent in broadcasting, It allows users to watch part of concern by featured videos.

But in the prior art mostly by manually carrying out editing to part more excellent in video.And in the mistake of editing Cheng Zhong, which part is more excellent in usually manual judgement video, then to thinking personally that more excellent part cuts Volume.But since everyone is different to the understanding of excellent degree, when this causes to carry out editing to same video, different people institutes The excellent degree that editing obtains part is different, so the excellent degree accuracy for the featured videos for causing institute's editing to obtain compared with It is low.

Summary of the invention

The embodiment of the present invention provides a kind of video clipping method, electronic equipment and storage medium, to solve in the prior art In editing featured videos, the lower problem of the excellent degree accuracy for the featured videos being clipped to.

The embodiment of the present invention provides a kind of video clipping method, comprising:

Characteristic frame image is extracted from video to be clipped, wherein the characteristic frame image is to include pre-set categories picture Image；

The characteristic frame image is input in video clipping model, the detection knot of the video clipping model output is obtained Fruit, the testing result indicate the excellent degree of the characteristic frame image；

According to the testing result, editing is carried out to the video to be clipped；

Wherein, the video clipping model is to be obtained according to the video sample training obtained in advance, in the video sample Picture frame be marked with indicate video highlight degree weight.

The embodiment of the present invention provides a kind of ball live video clipping method, comprising:

From ball video to be clipped extract characteristic frame image, wherein the characteristic frame image be include preset it is ball Scene and the image for presetting ball movement；

The characteristic frame image is input in ball video clipping model, the ball video clipping model output is obtained Testing result, the testing result indicates the excellent degree of the characteristic frame image；

According to the testing result, editing is carried out to the ball video；

Wherein, the ball video clipping model is according to the ball scene video sample obtained in advance and ball movement view Frequency sample training obtains, and the picture frame in the ball scene video sample and ball action video sample is marked with expression respectively The weight of video highlight degree.

The embodiment of the present invention provides a kind of electronic equipment, including memory, processor and storage are on a memory and can be The program run on processor, the processor realize the video clipping method or ball live streaming view when executing described program The step of frequency clipping method.

The embodiment of the present invention provides a kind of non-transient computer readable storage medium, is stored thereon with computer program, should The step of video clipping method or ball live video clipping method is realized when computer program is executed by processor.

Video clipping method and device provided in an embodiment of the present invention, by extracting characteristic frame figure from video to be clipped Then characteristic frame image is input in video clipping model by picture, obtain the testing result of video clipping model output, then root According to testing result, editing is carried out to video to be clipped, is at this time according to the video sample obtained in advance based on video clipping model Training obtains, and the picture frame in video sample is marked with the weight for indicating video highlight degree, improves video clipping model Accuracy and then ensure that institute's editing obtains to ensure that the accuracy of the testing result of video clipping model output Video frame the excellent degree of height.

Detailed description of the invention

In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below There is attached drawing needed in technical description to be briefly described, it should be apparent that, the accompanying drawings in the following description is this hair Bright some embodiments for those of ordinary skill in the art without creative efforts, can be with root Other attached drawings are obtained according to these attached drawings.

Fig. 1 is the step flow chart of video clipping method in the embodiment of the present invention；

Fig. 2 is that training obtains the flow diagram of video clipping model in the embodiment of the present invention；

Fig. 3 is the structural schematic diagram of electronic equipment in the embodiment of the present invention.

Specific embodiment

In order to make the object, technical scheme and advantages of the embodiment of the invention clearer, below in conjunction with the embodiment of the present invention In attached drawing, technical scheme in the embodiment of the invention is clearly and completely described, it is clear that described embodiment is A part of the embodiment of the present invention, instead of all the embodiments.Based on the embodiments of the present invention, those of ordinary skill in the art Every other embodiment obtained without creative efforts, shall fall within the protection scope of the present invention.

As shown in Figure 1, for the step flow chart of video clipping method in the embodiment of the present invention, the video clipping method packet It includes:

Step 101: characteristic frame image is extracted from video to be clipped.

In this step, specifically, need to video carry out editing when, can first obtain video to be clipped, then from Characteristic frame image is extracted in video to be clipped.

Specifically, characteristic frame image is the image for including pre-set categories picture.

In addition, specifically, from video to be clipped extract characteristic frame image when, can be to each of video to be clipped Video frame carries out image recognition, and extracts the characteristic frame image for recognizing pre-set categories picture.

In addition, it should be noted that pre-set categories picture can be carried out according to the property difference of video to be clipped Different restrictions.For example, pre-set categories picture may include in following classifications when video to be clipped is table tennis live video At least one of: sportsman's erect-position, sportsman move footwork, sportsman's shot, sportsman's hitting distance, sportsman's hitting time, instructor in broadcasting Visual angle and subtitle.Certainly, the property of video to be clipped is not limited specifically herein.

In this way, being mentioned by the image that will include pre-set categories picture as characteristic frame image, and from video to be clipped It takes out, when avoiding directly to the video frame to be clipped progress excellent degree identification in video to be clipped, identifies heavy workload, And the situation that invalid identification amount is more.

Step 102: characteristic frame image being input in video clipping model, the detection knot of video clipping model output is obtained Fruit.

In this step, specifically, the present embodiment is trained in advance to obtain video clipping model.Then characteristic frame is being got When image, directly characteristic frame image can be input in the video clipping model, to obtain the output of video clipping model Testing result.

Specifically, testing result indicates the excellent degree of characteristic frame image.

The excellent degree can be indicated with score value, not limit the form of expression of excellent degree specifically herein.

In addition, specifically, video clipping model be according to obtain in advance video sample training obtain, the video sample In picture frame be marked with indicate video highlight degree weight.

Video sample in this way based on the weight for being marked with video highlight degree is trained video clipping model, guarantees The accuracy for the video clipping model trained.

Step 103: according to testing result, editing being carried out to video to be clipped.

In this step, it specifically, when getting the testing result of video clipping model output, can be tied according to detection Fruit carries out editing to video to be clipped.

Wherein,, can according to testing result when carrying out editing to video to be clipped according to testing result, judging characteristic Whether the excellent degree of frame image is greater than excellent degree of presetting, and works as according to the testing result, confirms the characteristic frame image Excellent degree be greater than when presetting excellent degree, video frame corresponding to the characteristic frame image is determined as video to be clipped Then frame carries out editing processing to the video frame to be clipped.

In this way, when the excellent degree of characteristic frame image is greater than and presets excellent degree, by view corresponding to characteristic frame image Frequency frame is determined as video frame to be clipped, then carries out editing processing to video frame to be clipped from video to be clipped, ensure that institute The excellent degree for the video frame being clipped to is higher.

In addition, editing video is obtained specifically, synthesis processing can also be carried out to the video frame being clipped to, so that It can obtain the higher video of excellent degree.

In this way, then characteristic frame image is input to by the present embodiment by extracting characteristic frame image from video to be clipped In video clipping model, obtain video clipping model output testing result, then according to testing result, to video to be clipped into Row editing, at this time based on video clipping model to be obtained according to the video sample training obtained in advance, and the figure in video sample As frame flag have indicate video highlight degree weight, ensure that the accuracy of video clipping model, to ensure that video is cut The accuracy of the testing result of model output is collected, and then ensure that the excellent degree of height for the video frame that institute's editing obtains.

Further, on the basis of the above embodiments, characteristic frame image is being input to video clipping mould by the present embodiment In type, obtain video clipping model output testing result before, it is also necessary to default neural network model is trained, is obtained Video clipping model.

Specifically, video clipping model can also be obtained according to audio sample corresponding with video sample training, audio Audio frame in sample be marked with indicate video highlight degree label, and label according to pre-set preset sound type into Row setting.

In this way, indicating that the label of video highlight degree is configured according to pre-set preset sound type, based on view Sound in frequency is actual response of the spectators to video highlight degree, so that when label is arranged according to preset sound type, energy Enough accuracys for guaranteeing set label, and ensure that high efficiency when label is set.

For example, it is assumed that video to be clipped is table tennis live video, the preset sound type includes in following types At least one of: table tennis hits sound, sportsman and coach's sound, spectators' sound, referee's sound and mute.At this point, One label for indicating excellent degree can be set for above-mentioned every class preset sound type, then in audio sample, to sound Each audio frame in frequency sample carries out the identification of sound type, and marks corresponding with the preset sound type recognized Label.

In addition, specifically, based on audio sample it is corresponding with video sample, therefore can using the label of audio sample come The excellent degree of reflecting video sample, and video clipping model is further obtained using audio sample training, at this time based on above-mentioned The accuracy of the label label of audio sample, so that passing through video sample and audio sample corresponding with video sample training When obtaining video clipping model, the accuracy of video clipping model ensure that, to ensure that the output of video clipping model The accuracy of characteristic frame image detection result.

Specifically, as shown in Fig. 2, being trained to default neural network model, when obtaining video clipping model, specifically It may include steps of:

Step 201: according to video sample and audio sample corresponding with video sample, obtain characteristic frame sample image with And audio sample training frames corresponding with characteristic frame sample image.

In this step, specifically, when training obtains video clipping model, it is necessary first to obtain training sample, that is, obtain Take the characteristic frame sample image and audio sample training frames corresponding with characteristic frame sample image in video sample.

Wherein, according to video sample and audio sample corresponding with video sample, obtain characteristic frame sample image with It, can be according to default framing mode, respectively to the view and when corresponding with characteristic frame sample image audio sample training frames Frequency sample and audio sample carry out sub-frame processing, obtain multiple video frame images and corresponding with the multiple video frame images Multiple audio frames；It is then based on pre-set categories picture, the multiple video frame images are identified, obtains the characteristic frame sample This image；Then audio sample training frames corresponding with the characteristic frame sample image are obtained from the multiple audio frame, And it is based on the preset sound type, label corresponding with affiliated preset sound type to audio sample training frame flag.

The above process is illustrated below.

Specifically, the present embodiment is extracted from video sample obtains audio sample.Then, according to default framing mode point It is other when carrying out sub-frame processing to video sample and audio sample, frame cutting can be carried out according to the time, such as unit of ms, often 125ms generates an audio frame, and correspondence generates a video frame images in video sample.Certainly, not specific herein to limit in advance If the concrete form of framing mode, such as framing can also be carried out for 130ms.

In addition, specifically, audio frame after obtaining audio frame, can also be sliced by the present embodiment, to improve To the voice recognition precision of audio frame.For example, can be 5ms being unit, it be sliced in a manner of 50% overlapping, thus one Multistage audio is obtained in a audio frame；It, can be at this time when carrying out voice recognition to audio frame by speech recognition module Multistage audio is identified, to improve the precision of voice recognition.

In addition, specifically, can be drawn based on pre-set pre-set categories after framing obtains multiple video frame images Face carries out the identification of pre-set categories picture to each video frame images that framing obtains, to obtain characteristic frame sample image, then Audio sample training frames corresponding with characteristic frame sample image are obtained from multiple audio frames again.Certainly, pre-set categories picture It can be set according to the property of video sample.Such as when video frame to be clipped is table tennis live video, pre-set categories Picture may include at least one in following scenes: sportsman's erect-position, sportsman move footwork, sportsman's shot, sportsman's batting Distance, sportsman's hitting time, instructor in broadcasting visual angle and subtitle.

Meanwhile the present embodiment is also previously provided with preset sound type, for example including table tennis hit sound, sportsman and Coach's sound, spectators' sound, referee's sound and the types such as mute.The present embodiment is being based on preset sound type at this time, When to audio sample training frame flag label, voice recognition can be carried out to audio sample training frames by speech recognition module, Then preset sound type belonging to the audio sample training frames is determined, finally by the corresponding label mark of the preset sound type It is denoted as the label of the audio sample training frames.The excellent degree of the corresponding video frame sample image of certain tag representation.This reality Applying example indicates the label of excellent degree based on the setting of preset sound type, since sound is spectators to the true of video highlight degree Reaction, therefore ensure that the accuracy of set label, and then ensure that accurately recognizing to the excellent degree of video frame images Know.

In addition, specifically, can also be by audio sample after obtaining audio sample carrying out audio extraction to video sample This generates pulse code modulation (Pulse Code Modulation, PCM) binary file by analog-to-digital conversion.Specifically, When generating PCM binary file, it can sample and quantify, i.e., with certain sample rate and sampling resolution by sound continuous wave It is converted into discrete data point, and is turned the audio file of MP3 format with 16KHz sample frequency using computer program ffmpeg 16 mono PCM files are changed into, to improve accuracy when audio sample identification.

Step 202: the corresponding initial weight of the excellent degree of different scenes in pre-set pre-set categories picture is substituted into In default neural network model, and characteristic frame sample image is input in default neural network model, obtains default nerve net The image result of network model output.

Specifically, the present embodiment is also previously provided in default neural network model and pre-set categories picture different scenes The corresponding initial weight of excellent degree.It in this step, can be corresponding by the excellent degree of different scenes in pre-set categories picture Initial weight substitutes into default neural network model, and characteristic frame sample image is input in default neural network model, obtains The image result exported to default neural network model, certain image result indicate the excellent degree of characteristic frame sample image.

Specifically, image result can be indicated with following formula:Wherein, n indicates pre- If total classification of classification picture, F (Xⁱ) indicate excellent degree corresponding to every class pre-set categories picture.In the present embodiment, n Value can be 7, i.e., altogether include 7 class pre-set categories pictures.

In this way by the corresponding initial weight of the excellent degree of different scenes in setting pre-set categories picture, enable differentiation between The excellent degree of different scenes in pre-set categories picture so that substitute into have the default neural network model of initial weight into When row characteristic frame sample image identifies, accuracy is higher.

In addition, specifically, the design parameter of default neural network model can be arranged as follows: default neural network model is total 13 layers of convolutional layer are provided with, and the convolution kernel of input layer multiplies 7 for 7, output channel 128；The convolution kernel of the second layer is 7 to multiply 7, defeated Channel is 128 out；The convolution kernel of third layer to eleventh floor multiplies 5 for 5, output channel 512；Floor 12 and output layer are complete Articulamentum adds softmax layers.

Step 203: default refreshing to being substituting to according to label corresponding to described image result and audio sample training frames It is trained through the initial weight in network model, obtains video clipping model.

It in this step, can be according to figure specifically, after the image result for obtaining default neural network model output The label as corresponding to result and audio sample training frames carries out the initial weight being substituting in default neural network model Training, to obtain video clipping model.

Specifically, audio sample can be used since audio sample training frames are corresponding with characteristic frame sample image Label corresponding to training frames reflects the excellent degree of characteristic frame sample image.At this point it is possible to by audio sample training frames institute Corresponding label regards the true tag for being characterized frame sample image as, so that in the default neural network model of training, energy Pair enough labels based on characteristic frame sample image and audio sample training frames, are trained default neural network model, i.e., The initial weight being substituting in default neural network model is trained, and then obtains video clipping model.

Wherein, in the label according to corresponding to image result and audio sample training frames, to being substituting to default nerve net Initial weight in network model is trained, can be based on corresponding to audio sample training frames when obtaining video clipping model Label detects the accuracy of image result, right then when detecting the accuracy of image result lower than preset threshold Initial weight is adjusted, and is finally based on the video sample and audio sample, default nerve net adjusted to initial weight Network model carries out accuracy validation, and when the accuracy that verifying obtains described image result is greater than the preset threshold, will generation Enter to have the default neural network model of weight adjusted to be determined as video clipping model.

Specifically, it is corresponding with characteristic frame sample image based on audio sample training frames, therefore audio sample can be instructed Practice label corresponding to frame and regard the true tag for being characterized frame sample image as, it can using the mark of audio sample training frames Label, detect the accuracy of image result.At this point, when detecting the accuracy of image result lower than preset threshold, i.e., When the label of image result and audio sample training frames is not obviously met, illustrate each field being substituting in default neural network model The initial weight setting in face is unreasonable, needs to be adjusted initial weight at this time, has weight adjusted to guarantee to substitute into The precision of default neural network model.It is determined finally, can will then substitute into the default neural network model for having weight adjusted For video clipping model.

Certainly, it should be noted that when being adjusted to initial weight, the adjustment ratio of available initial weight Example；Then when detect the initial weight adjustment ratio be greater than preset ratio threshold value when, forbid to the initial weight into Row adjustment, when the adjustment ratio for detecting the initial weight is less than or equal to the preset ratio threshold value, to described initial Weight is adjusted.

It should be noted that not limiting the specific value of adjustment ratio specifically herein, such as the adjustment ratio can Think 40%, i.e., when the adjustment ratio of initial weight is greater than 40%, forbids being adjusted initial weight, i.e., training at this time Sample does not use.

In this way, whether being adjusted according to adjustment ratio-dependent when being adjusted to initial weight, avoiding invalid instruction Practice interference of the sample to default network neural model training.

The present embodiment is incited somebody to action by presetting the corresponding initial weight of the excellent degree of different scenes in pre-set categories picture Initial weight substitutes into default neural network model, then by audio sample corresponding with video sample to initial weight into Row training, is actual response of the spectators for video highlight degree based on the sound in audio sample, ensure that and train to obtain Weight accuracy, and then ensure that the accuracy for the video clipping model trained.

In addition, specifically, herein by taking table tennis live video as an example, to scenes different in pre-set categories picture and difference The initial weight facilities of scene are illustrated.

Wherein, the pre-set categories picture include in following classifications at least one of: sportsman's erect-position, sportsman move footwork, Sportsman's shot, sportsman's hitting distance, sportsman's hitting time, instructor in broadcasting visual angle and subtitle.

Specifically, sportsman's erect-position includes at least one of following scenes: close-table erect-position, middle close-table erect-position, in it is remote Station position and far from the table erect-position, wherein the far from the table erect-position, close-table erect-position, middle far from the table erect-position and middle close-table erect-position initial weight according to Secondary reduction.Assuming that initial weight be fractional value, at this time close-table erect-position can for sportsman from 40-50 centimetres of billiard table, at this time due to Close-table is usually sportsman's service state, and more excellent, setting initial weight value is 30 points；Middle close-table erect-position can be sportsman From 50-70 centimetres of billiard table, at this time since middle close-table is usually that sportsman receives or toe lift, excellent degree is general, and initial power is arranged Weight values are 15 points；Middle far from the table erect-position can be sportsman from 70-100 centimetres of billiard table, since middle far from the table is usually that sportsman is locked In the stage, excellent degree is general, and setting initial weight value is 20 points；Far from the table erect-position can be sportsman from 70-100 centimetres of billiard table, by In far from the table be usually sportsman smash or follow the ball the stage, excellent degree highest, setting initial weight value be 35.

It includes at least one of following scenes that the sportsman, which moves footwork: single step footwork and step by step method, the footwork that strides, Leapfrog footwork pads method, cross steps footwork, sideway step footwork and random little step footwork step by step, wherein the sideway step footwork, leapfrog step Method, the footwork that strides, cross steps footwork, single step footwork, pad method, random little step footwork and simultaneously the initial weight of method successively drops step by step step by step It is low.For example, it is assumed that initial weight is fractional value, single step footwork is that this footwork is used when fighting back nearly net or body-hit, excellent degree compared with Height, setting initial weight value are 12 points；And method is to move left and right to generally use this footwork when being received step by step, excellent degree one As, setting initial weight value is 6 points；The footwork that strides is more with this footwork when being forehand position wide-angle ball, and excellent degree is higher, if Setting initial weight value is 15 points；Leapfrog footwork is to use when ball is very fast, angle is larger, and excellent degree is higher, and initial weight is arranged Value is 18 points；Method is uses this footwork to pad step by step when receiving or when simple mobile, and generally, setting initial weight value is excellent degree 8 points；Cross steps footwork is for tackling the ball distant from body, and excellent degree is higher, and setting initial weight value is 13 points；Side Method is to generally use this footwork when ball approaches body of sportsmen or ball to batter's backhand to body step by step, excellent degree compared with Height, setting initial weight value are 21 points；Random little step footwork be adjust centre of body weight, receive position and time when the footwork that uses, Excellent degree is general, and setting initial weight value is 7 points.

Sportsman's shot includes at least one of following scenes: swing arm racked swing meets ball racked swing, ball Impact movement, with gesture racked swing and loosening up, wherein meet ball racked swing, swing arm racked swing, ball impact movement, It is successively reduced with the initial weight of gesture racked swing and loosening up.For example, it is assumed that initial weight is fractional value, swing arm is swung the bat dynamic It makes decision shot and striking force, excellent degree is higher, and setting initial weight value is 25 points；Ball racked swing is met to determine back Rotation property, the flying arc of return of serve and the batting route of ball, excellent degree is very high, and setting initial weight value is 30 points；Racket touching Ball movement determines the angle of trajectory of this return of serve, goes out ball speed and rotation property, and excellent degree is higher, and setting initial weight value is 20 Point；Facilitate to guarantee integrality, harmony and the stability of shot, excellent journey in batting ending phase with gesture racked swing Degree is general, and setting initial weight value is 10 points；Loosening up is an of short duration loosening stage for terminating occur of swinging the bat, excellent journey Spend it is lower, setting initial weight value be 5 points.

Sportsman's hitting distance includes at least one of following scenes: short distance batting, middle distance batting and it is long away from From batting, wherein the initial weight of long range batting, short distance batting and middle distance batting successively reduces.For example, it is assumed that Initial weight is fractional value, short distance square stance, and sportsman is usually good at speed, drop point, and excellent degree is higher, and setting is initial Weighted value is 35 points；Middle apart from square stance, sportsman is usually good at push and block, and excellent degree is lower, and setting initial weight value is 25 Point；Long range square stance, sportsman are usually good at strength, rotation, and excellent degree highest, setting initial weight value is 40 points.

Sportsman's hitting time includes at least one of following scenes: rising batting early period, rises later period batting, is high The batting of point phase declines batting early period and the batting of decline later period, wherein decline early period batting, high point phase bat, rise the later period Batting, the initial weight for rising batting early period and declining later period batting successively reduce.For example, it is assumed that initial weight is fractional value, Rise early period when usually pushing away fastly, excellent degree is higher, and setting initial weight value is 20 points；Rising the later period is usually preshoot loop-drive Ball, excellent degree is higher, and setting initial weight value is 21 points；When the high point phase is usually accentuated push, excellent degree is higher, and setting is just Beginning weighted value is 23 points；Decline early period be usually in far from the table attack when, excellent degree highest, setting initial weight value be 25 points； When the decline later period is usually middle off-table chop, excellent degree is general, and setting initial weight value is 11 points.

The instructor in broadcasting visual angle includes at least one of following scenes: big panoramic viewing angle, small panoramic viewing angle, sportsman's feature Visual angle, sportsman's close shot visual angle, player motion feature visual angle and spectators' feature visual angle；Wherein, player motion feature visual angle, Small panoramic viewing angle, sportsman's close shot visual angle, sportsman's feature visual angle, spectators' feature visual angle and big panoramic viewing angle initial weight according to Secondary reduction.For example, it is assumed that initial weight is fractional value, big panoramic viewing angle is usually displayed on the club of upper both sides' match at this time and is poised for battle Name and both sides sportsman are poised for battle list, and excellent degree is lower, and setting initial weight value is 8 points；Sportsman's feature visual angle is to fortune When mobilizing feature, while including players caption strips, show that the sportsman is practising a ball game, excellent degree is lower, and setting is initial Weighted value is 12 points；Sportsman's close shot visual angle indicates that the sportsman is serving a ball, and excellent degree is higher, and setting initial weight value is 20 points；Small panoramic viewing angle indicates that both sides sportsman is refusing to budge, and excellent degree is higher, and setting initial weight value is 23 points；Movement Member's movement feature visual angle shows that sportsman's score or sportsman are lost points, and excellent degree highest, setting initial weight value is 25 points； Spectators' feature visual angle shows camera lens focus not in play, and excellent degree is minimum, and setting initial weight value is 12 points.

The subtitle includes at least one of following scenes: both sides are to system of battle formations subtitle, score subtitle, office point subtitle, skill Art counts subtitle and whole audience stroke analysis subtitle；Wherein office point subtitle, score subtitle both sides are to system of battle formations subtitle, stroke analysis subtitle It is successively reduced with the initial weight of whole audience stroke analysis subtitle.For example, it is assumed that initial weight is fractional value, both sides are to system of battle formations subtitle Show that match not yet starts, excellent degree is lower, and setting initial weight value is 15 points；Score subtitle shows that bout will start, Excellent degree is higher, and setting initial weight value is 30 points；Office point subtitle shows that the last bout of the office starts, excellent degree highest, It is 35 points that initial weight value, which is arranged,；Stroke analysis subtitle indicates to terminate for one innings, and excellent degree is lower, and setting initial weight value is 15 Point；Whole audience stroke analysis subtitle indicates end of match, and excellent degree is minimum, and setting initial weight value is 5 points.

In addition, being illustrated by taking table tennis live video as an example to preset sound type.Specifically, the preset sound Type includes at least one in following types: table tennis hits sound, sportsman and coach's sound, spectators' sound, judge Member's sound and mute；Wherein, it is marked while the table tennis hits sound, sportsman and coach's sound and spectators' sound To be excellent, referee's sound and it is mute be labeled as it is not excellent.

Video clipping method provided in this embodiment then will be special by extracting characteristic frame image from video to be clipped Sign frame image is input in video clipping model, obtains the testing result of video clipping model output, then according to testing result, Editing is carried out to video to be clipped, at this time based on video clipping model to be obtained according to the video sample training obtained in advance, and Picture frame in video sample is marked with the weight for indicating video highlight degree, improves the accuracy of video clipping model, from And it ensure that the accuracy of the testing result of video clipping model output, and then ensure that the height for the video frame that institute's editing obtains Excellent degree.

In addition, the present embodiment also provides a kind of ball live video clipping method, may include steps of:

Step A: extracting characteristic frame image from ball video to be clipped, wherein it includes pre- that the characteristic frame image, which is, If ball scene and the image for presetting ball movement；

Step B: the characteristic frame image is input in ball video clipping model, obtains the ball video clipping mould The testing result of type output, the testing result indicate the excellent degree of the characteristic frame image；

Step C: according to the testing result, editing is carried out to the ball video；

It is referred in above-mentioned video clipping method it should be noted that presetting ball scene and presetting ball movement Citing introduction to pre-set categories picture, is no longer repeated herein；In addition, the method in ball live video clipping method walks Suddenly identical as the method and step in above-mentioned video clipping method, it is no longer repeated herein.

In addition, as shown in figure 3, being the entity structure schematic diagram of electronic equipment provided in an embodiment of the present invention, which sets Standby may include: processor (processor) 310,320, memory communication interface (Communications Interface) (memory) 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 pass through communication bus 340 Complete mutual communication.Processor 310 can call the meter that is stored on memory 330 and can run on processor 310 Calculation machine program, the method to execute the various embodiments described above offer, for example, characteristic frame image is extracted from video to be clipped, Wherein the characteristic frame image is the image for including pre-set categories picture；The characteristic frame image is input to video clipping mould In type, the testing result of the video clipping model output is obtained, the testing result indicates the excellent of the characteristic frame image Degree；According to the testing result, editing is carried out to the video to be clipped；Wherein, the video clipping model is according to pre- The video sample training first obtained obtains, and the picture frame in the video sample is marked with the weight for indicating video highlight degree.

Also for example, characteristic frame image is extracted from ball video to be clipped, wherein the characteristic frame image is packet Include the image preset ball scene He preset ball movement；The characteristic frame image is input to ball video clipping model In, the testing result of the ball video clipping model output is obtained, the testing result indicates the essence of the characteristic frame image Color degree；According to the testing result, editing is carried out to the ball video；Wherein, the ball video clipping model is root Obtained according to the ball scene video sample and ball action video sample training that obtain in advance, the ball scene video sample and Picture frame in ball action video sample is marked with the weight for indicating video highlight degree respectively.

In addition, the logical order in above-mentioned memory 330 can be realized by way of SFU software functional unit and conduct Independent product when selling or using, can store in a computer readable storage medium.Based on this understanding, originally Substantially the part of the part that contributes to existing technology or the technical solution can be in other words for the technical solution of invention The form of software product embodies, which is stored in a storage medium, including some instructions to So that a computer equipment (can be personal computer, server or the network equipment etc.) executes each implementation of the present invention The all or part of the steps of example the method.And storage medium above-mentioned include: USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic or disk etc. it is various It can store the medium of program code.

The embodiment of the present invention also provides a kind of non-transient computer readable storage medium, is stored thereon with computer program, The computer program is implemented to carry out the various embodiments described above offer method when being executed by processor, for example, to be clipped Characteristic frame image is extracted in video, wherein the characteristic frame image is the image for including pre-set categories picture；By the feature Frame image is input in video clipping model, obtains the testing result of the video clipping model output, the testing result table Show the excellent degree of the characteristic frame image；According to the testing result, editing is carried out to the video to be clipped；Wherein, institute Stating video clipping model is to be obtained according to the video sample training obtained in advance, and the picture frame in the video sample is marked with table Show the weight of video highlight degree.

The apparatus embodiments described above are merely exemplary, wherein described, unit can as illustrated by the separation member It is physically separated with being or may not be, component shown as a unit may or may not be physics list Member, it can it is in one place, or may be distributed over multiple network units.It can be selected according to the actual needs In some or all of the modules achieve the purpose of the solution of this embodiment.Those of ordinary skill in the art are not paying creativeness Labour in the case where, it can understand and implement.

Through the above description of the embodiments, those skilled in the art can be understood that each embodiment can It realizes by means of software and necessary general hardware platform, naturally it is also possible to pass through hardware.Based on this understanding, on Stating technical solution, substantially the part that contributes to existing technology can be embodied in the form of software products in other words, should Computer software product may be stored in a computer readable storage medium, such as ROM/RAM, magnetic disk, CD, including several fingers It enables and using so that a computer equipment (can be personal computer, server or the network equipment etc.) executes each implementation Method described in certain parts of example or embodiment.

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, rather than its limitations；Although Present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that: it still may be used To modify the technical solutions described in the foregoing embodiments or equivalent replacement of some of the technical features； And these are modified or replaceed, technical solution of various embodiments of the present invention that it does not separate the essence of the corresponding technical solution spirit and Range.

Claims

1. a kind of video clipping method characterized by comprising

Characteristic frame image is extracted from video to be clipped, wherein the characteristic frame image is the figure for including pre-set categories picture Picture；

The characteristic frame image is input in video clipping model, the testing result of the video clipping model output is obtained, The testing result indicates the excellent degree of the characteristic frame image；

Wherein, the video clipping model is to be obtained according to the video sample training obtained in advance, the figure in the video sample As frame flag has the weight for indicating video highlight degree.

2. video clipping method according to claim 1, which is characterized in that the video clipping model also according to it is described The corresponding audio sample training of video sample obtains, and the audio frame in the audio sample, which is marked with, indicates video highlight degree Label, and the label is configured according to pre-set preset sound type.

3. video clipping method according to claim 2, which is characterized in that described that the characteristic frame image is input to view In frequency editing model, before obtaining the testing result of video clipping model output, further includes:

According to video sample and audio sample corresponding with the video sample, obtain characteristic frame sample image and with it is described The corresponding audio sample training frames of characteristic frame sample image；Wherein, the characteristic frame sample image be include pre-set categories The image of picture, the audio sample training frames are provided with label corresponding with affiliated preset sound type；

The corresponding initial weight of the excellent degree of different scenes in pre-set pre-set categories picture is substituted into default neural network In model, and the characteristic frame sample image is input in the default neural network model, obtains the default nerve net The image result of network model output；Wherein described image result indicates the excellent degree of the characteristic frame sample image；

According to label corresponding to described image result and the audio sample training frames, to being substituting to the default nerve net Initial weight in network model is trained, and obtains the video clipping model.

4. video clipping method according to claim 3, which is characterized in that it is described according to video sample and with the video The corresponding audio sample of sample obtains characteristic frame sample image and audio sample corresponding with the characteristic frame sample image This training frames, comprising:

According to default framing mode, sub-frame processing is carried out to the video sample and audio sample respectively, obtains multiple video frames Image and multiple audio frames corresponding with the multiple video frame images；

Based on pre-set categories picture, the multiple video frame images are identified, obtain the characteristic frame sample image；

Audio sample training frames corresponding with the characteristic frame sample image are obtained from the multiple audio frame, and are based on institute Preset sound type is stated, label corresponding with affiliated preset sound type to audio sample training frame flag.

5. video clipping method according to claim 3, which is characterized in that described according to described image result and described Label corresponding to audio sample training frames is trained the initial weight being substituting in the default neural network model, Obtain the video clipping model, comprising:

Based on label corresponding to the audio sample training frames, the accuracy of described image result is detected；

When detecting the accuracy of described image result lower than preset threshold, the initial weight is adjusted；

Based on the video sample and audio sample, default neural network model adjusted to initial weight carries out accuracy and tests Card, and when the accuracy that verifying obtains described image result is greater than the preset threshold, substitution there is into weight adjusted Default neural network model is determined as the video clipping model.

6. video clipping method according to claim 5, which is characterized in that it is described that the initial weight is adjusted, Include:

Obtain the adjustment ratio of initial weight；

When the adjustment ratio for detecting the initial weight is greater than preset ratio threshold value, forbid adjusting the initial weight It is whole；

When the adjustment ratio for detecting the initial weight is less than or equal to the preset ratio threshold value, to the initial weight It is adjusted.

7. video clipping method according to claim 1, which is characterized in that when the video to be clipped is table tennis live streaming When video, the pre-set categories picture includes at least one in following: sportsman's erect-position, sportsman move footwork, sportsman's batting is moved Work, sportsman's hitting distance, sportsman's hitting time, instructor in broadcasting visual angle and subtitle.

8. a kind of ball live video clipping method characterized by comprising

Characteristic frame image is extracted from ball video to be clipped, wherein it includes to preset ball scene that the characteristic frame image, which is, With the image for presetting ball movement；

The characteristic frame image is input in ball video clipping model, the inspection of the ball video clipping model output is obtained It surveys as a result, the testing result indicates the excellent degree of the characteristic frame image；

According to the testing result, editing is carried out to the ball video；

Wherein, the ball video clipping model is according to the ball scene video sample and ball action video sample obtained in advance This training obtains, and the picture frame in the ball scene video sample and ball action video sample is marked with expression video respectively The weight of excellent degree.

9. a kind of electronic equipment including memory, processor and stores the program that can be run on a memory and on a processor, It is characterized in that, the processor realizes video clipping method as described in any one of claim 1 to 7 when executing described program The step of or the step of ball live video clipping method as claimed in claim 8.

10. a kind of non-transient computer readable storage medium, is stored thereon with computer program, which is characterized in that the computer The step of video clipping method as described in any one of claim 1 to 7 is realized when program is executed by processor or as right is wanted Described in asking 8 the step of ball live video clipping method.