CN110516572A - Method for identifying sports event video clip, electronic equipment and storage medium - Google Patents
- Publication number
- CN110516572A CN110516572A CN201910759733.6A CN201910759733A CN110516572A CN 110516572 A CN110516572 A CN 110516572A CN 201910759733 A CN201910759733 A CN 201910759733A CN 110516572 A CN110516572 A CN 110516572A
- Authority
- CN
- China
- Prior art keywords
- action classification
- recognition result
- sample data
- video clip
- preset model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000009471 action Effects 0.000 claims abstract description 108
- 230000002860 competitive effect Effects 0.000 claims description 36
- 238000013527 convolutional neural network Methods 0.000 claims description 31
- 238000004590 computer program Methods 0.000 claims description 9
- 238000004422 calculation algorithm Methods 0.000 claims description 8
- 230000015654 memory Effects 0.000 claims description 8
- 230000001052 transient effect Effects 0.000 claims description 5
- 230000008901 benefit Effects 0.000 abstract description 6
- 238000001514 detection method Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000013016 learning Effects 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a method for identifying a video clip of a sports event, electronic equipment and a storage medium, wherein the method comprises the following steps: identifying the action category of the sports event video clip by adopting a first preset model; the training of the first preset model adopts first sample data; the first sample data is data related to action categories; if the accuracy of the recognition result is lower than a preset threshold, re-recognizing the action type by adopting a second preset model, and taking the re-recognition result as a final recognition result of the action type; training of the second preset model adopts second sample data; the second sample data is data related to the relative position of a target reference object in the video clip of the sports event; the relative position is a position between the target reference object and the trigger portion of the action type. The method for identifying the video clips of the sports events, the electronic device and the storage medium provided by the embodiment of the invention can accurately identify the video clips of the sports events and also have the advantages of high efficiency, simplicity and strong universality.
Description
Technical field
The present invention relates to the technical field of video processing, and more particularly to a method for identifying sports event video clips, an electronic device, and a storage medium.
Background art
When a video app needs to publish short highlight compilations of sports tournaments for different scenes (such as goals, penalty kicks, free throws, etc.), besides the traditional method of manually editing short videos, match videos can also be automatically edited by AI using deep learning algorithms. The first task of AI-automated editing is to recognize the scenes in the match video. Many deep learning methods can already identify certain video scenes: for example, a 3D convolutional neural network applied to the Kinetics dataset (human behavior classes) achieves an average accuracy of up to 83.6%, and an LSTM network applied to the UCF-101 dataset (101 action classes) achieves an average accuracy of up to 88.6%. It can be seen that existing technical solutions achieve good accuracy for recognizing simple, isolated human actions. However, for scene recognition in sports tournaments, especially basketball and football, the results are far less satisfactory: the recognition rate for such scenes is only around 60%. This is mainly because these scenes involve dense crowds, large motion spans in person-to-person interaction, and various environmental differences such as multiple viewing angles, illumination changes, and low resolution. These factors make the training samples highly complex, so the accuracy of the recognition and classification model is low.
Therefore, how to avoid the above drawbacks and accurately identify sports event video clips has become a problem to be solved.
Summary of the invention
In view of the problems in the prior art, embodiments of the present invention provide a method for identifying sports event video clips, an electronic device, and a storage medium.
An embodiment of the present invention provides a method for identifying sports event video clips, comprising:
identifying the action category of a sports event video clip using a first preset model; the first preset model is trained with first sample data; the first sample data are data related to action categories;
if the accuracy of the recognition result is lower than a preset threshold, re-identifying the action category using a second preset model, and taking the re-recognition result as the final recognition result of the action category; the second preset model is trained with second sample data; the second sample data are data related to the relative position of a target reference object in the sports event video clip; the relative position is the position between the target reference object and the body part that triggers the action category.
An embodiment of the present invention provides an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the following method steps:
identifying the action category of a sports event video clip using a first preset model; the first preset model is trained with first sample data; the first sample data are data related to action categories;
if the accuracy of the recognition result is lower than a preset threshold, re-identifying the action category using a second preset model, and taking the re-recognition result as the final recognition result of the action category; the second preset model is trained with second sample data; the second sample data are data related to the relative position of a target reference object in the sports event video clip; the relative position is the position between the target reference object and the body part that triggers the action category.
An embodiment of the present invention provides a non-transient computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the following method steps are implemented:
identifying the action category of a sports event video clip using a first preset model; the first preset model is trained with first sample data; the first sample data are data related to action categories;
if the accuracy of the recognition result is lower than a preset threshold, re-identifying the action category using a second preset model, and taking the re-recognition result as the final recognition result of the action category; the second preset model is trained with second sample data; the second sample data are data related to the relative position of a target reference object in the sports event video clip; the relative position is the position between the target reference object and the body part that triggers the action category.
With the method, electronic device, and storage medium for identifying sports event video clips provided by the embodiments of the present invention, the action category of a sports event video clip is first recognized, and the model used for the second recognition is trained with data related to the relative position of the target reference object in the sports event video clip as the second sample data. Sports event video clips can thus be identified accurately, with the additional advantages of being efficient, simple, and highly general.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of the method for identifying sports event video clips according to the present invention;
Fig. 2 is a flowchart of another embodiment of the method for identifying sports event video clips according to the present invention;
Fig. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the method for identifying sports event video clips according to the present invention. As shown in Fig. 1, the method provided by an embodiment of the present invention comprises the following steps:
S101: identifying the action category of a sports event video clip using a first preset model; the first preset model is trained with first sample data; the first sample data are data related to action categories.
Specifically, a device identifies the action category of the sports event video clip using the first preset model; the first preset model is trained with first sample data, which are data related to action categories. The device may be an electronic device. Taking basketball as an example, action categories may include layup, dunk, rebound, free throw, and so on. Since the technical motions of a layup and a dunk are highly similar, the first preset model cannot easily distinguish whether the action category is a layup or a dunk. Therefore, the accuracy of the recognition results for layups and dunks is relatively low, usually lower than that for rebounds and free throws.
The first preset model may be a convolutional neural network combining a non-local module (nonlocal) with two-stream inflated 3D convolution (I3D). The embodiment of the present invention uses a two-stream inflated 3D convolution (I3D) convolutional neural network as the base network and adds non-local modules to it to obtain better global context. An I3D convolutional neural network is a network structure in which the convolution kernels and pooling kernels are inflated into 3D form, i.e., a time dimension is added on top of the original length and width of all convolution and pooling kernels. The non-local module extracts spatio-temporal information beyond local neighborhoods of the video, providing the deep neural network with long-range memory and global information; it is efficient, simple, and general, and can be conveniently embedded into existing network architectures. Therefore, the embodiment of the present invention uses a convolutional neural network combining nonlocal and I3D, trains this network, and then uses the trained network to identify the action category of the sports event video clip. The training of the first preset model is described later.
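As a rough illustration only (the patent does not disclose source code, and all names below are assumptions), the core of a non-local module — a self-attention-style weighted sum over all spatio-temporal positions, which is what gives the network its "global information" — can be sketched in NumPy:

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Simplified embedded-Gaussian non-local operation (illustrative).

    x: (N, C) feature matrix, one row per spatio-temporal position
       (N = T*H*W after flattening a video feature map).
    w_theta, w_phi, w_g: (C, C') projection matrices.
    Returns an (N, C') response in which every position aggregates
    information from all other positions, i.e., the long-range
    context the description attributes to the module.
    """
    theta = x @ w_theta            # queries, (N, C')
    phi = x @ w_phi                # keys,    (N, C')
    g = x @ w_g                    # values,  (N, C')
    scores = theta @ phi.T         # pairwise similarity, (N, N)
    scores -= scores.max(axis=1, keepdims=True)   # numeric stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over positions
    return attn @ g                # (N, C')

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))                  # 6 positions, 4 channels
w = [rng.standard_normal((4, 4)) for _ in range(3)]
y = nonlocal_block(x, *w)
print(y.shape)                                   # (6, 4)
```

In a real I3D-nonlocal network this operation would be wrapped with a residual connection and inserted between convolutional stages; the sketch above only shows the attention computation itself.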
Fig. 2 is a flowchart of another embodiment of the method for identifying sports event video clips according to the present invention. As shown in Fig. 2, the video stream data may first be obtained and then cut into video segments, thereby obtaining the sports event video clips; prediction with the I3D-nonlocal model then corresponds to identifying the action category of the sports event video clip using the first preset model.
S102: if the accuracy of the recognition result is lower than a preset threshold, re-identifying the action category using a second preset model, and taking the re-recognition result as the final recognition result of the action category; the second preset model is trained with second sample data; the second sample data are data related to the relative position of a target reference object in the sports event video clip; the relative position is the position between the target reference object and the body part that triggers the action category.
Specifically, if the device determines that the accuracy of the recognition result is lower than the preset threshold, it re-identifies the action category using the second preset model and takes the re-recognition result as the final recognition result of the action category. The second preset model is trained with second sample data, which are data related to the relative position of the target reference object in the sports event video clip; the relative position is the position between the target reference object and the body part that triggers the action category. The preset threshold can be set independently according to the actual situation; for the basketball match scene described above, the preset threshold may be chosen as 70%, and the accuracy of the recognition result can be represented by the confidence of the action category.
Referring to the example above, if the accuracy of the recognition results for layups and dunks is lower than the preset threshold, the action category is re-identified with the second preset model, which may be a convolutional neural network (CNN) classifier. Taking basketball as an example, the target reference object is the backboard, and the relative position of the target reference object is the position between the backboard and the hand. It can be understood that if the backboard is relatively far from the hand, i.e., the distance between the backboard and the hand is greater than a preset distance, the action category is a layup; if the backboard is relatively close to the hand, i.e., the distance between the backboard and the hand is less than the preset distance, the action category is a dunk. The preset distance can be set independently according to the actual situation. The training of the second preset model is described later.
The relative position can be obtained by detecting every frame of the sports event video clip with the YOLO algorithm. The full name of YOLO is "You Only Look Once: Unified, Real-Time Object Detection": "You Only Look Once" means only a single CNN pass is needed, "Unified" means it is a unified framework providing end-to-end prediction, and "Real-Time" reflects the high speed of the YOLO algorithm.
With the method for identifying sports event video clips provided by the embodiment of the present invention, the action category of a sports event video clip is first recognized, and the model for the second recognition takes data related to the relative position of the target reference object in the sports event video clip as the second training sample data. Sports event video clips can thus be identified accurately, with the additional advantages of being efficient, simple, and highly general.
On the basis of the above embodiments, re-identifying the action category using the second preset model and taking the re-recognition result as the final recognition result of the action category comprises:
if the re-recognition result includes multiple action categories, obtaining the count of each action category, and taking the action category with the largest count as the final recognition result.
Specifically, if the device determines that the re-recognition result includes multiple action categories, it obtains the count of each action category and takes the action category with the largest count as the final recognition result. Referring to the example above, the multiple action categories may include the two categories layup and dunk, and the counts of the layup action category and the dunk action category are obtained respectively. If the count of the layup category is greater than that of the dunk category, layup is taken as the final recognition result; if the count of the dunk category is greater than that of the layup category, dunk is taken as the final recognition result.
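The majority-vote step can be expressed with the standard library. This is a sketch under the assumption that the re-recognition result is a list of per-frame (or per-sample) category labels:

```python
from collections import Counter

def majority_action(per_frame_labels):
    """Return the most frequent action category among re-recognition
    results, plus a flag saying whether the largest count is unique
    (a tie is resolved by confidence in a later embodiment)."""
    counts = Counter(per_frame_labels)
    ranked = counts.most_common()
    best_label, best_count = ranked[0]
    unique = len(ranked) == 1 or ranked[1][1] < best_count
    return best_label, unique

label, unique = majority_action(["layup", "dunk", "dunk", "dunk", "layup"])
print(label, unique)  # dunk True
```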
The method for identifying sports event video clips provided by the embodiment of the present invention determines the final recognition result by the counts of the action categories, further enabling accurate identification of sports event video clips.
On the basis of the above embodiments, re-identifying the action category using the second preset model and taking the re-recognition result as the final recognition result of the action category comprises:
if it is determined that the action category with the largest count is not unique, obtaining the confidence of each action category with the largest count, and taking the action category with the highest confidence value as the final recognition result.
Specifically, if the device determines that the action category with the largest count is not unique, it obtains the confidence of each action category with the largest count and takes the action category with the highest confidence value as the final recognition result. Referring to the example above: if the count of the layup action category equals the count of the dunk action category, the confidences of the layup action and the dunk action are obtained respectively. If the confidence value of the layup action is greater than that of the dunk action, layup is taken as the final recognition result; if the confidence value of the layup action is less than that of the dunk action, dunk is taken as the final recognition result.
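The tie-break can be sketched in one line; the confidences here are hypothetical values standing in for the CNN classifier's output scores:

```python
def resolve_tie(category_confidences):
    """When several action categories share the largest count, pick
    the one with the highest confidence value (illustrative sketch;
    in practice the confidences come from the classifier's output)."""
    return max(category_confidences, key=category_confidences.get)

tied = {"layup": 0.62, "dunk": 0.87}
print(resolve_tie(tied))  # dunk
```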
The method for identifying sports event video clips provided by the embodiment of the present invention determines the final recognition result by the confidence values of the action categories, further enabling accurate identification of sports event video clips.
On the basis of the above embodiments, the relative position is obtained by detecting every frame of the sports event video clip with the YOLO algorithm.
Specifically, the relative position used by the device is obtained by detecting every frame of the sports event video clip with the YOLO algorithm; see the description above, which is not repeated here.
In the method for identifying sports event video clips provided by the embodiment of the present invention, obtaining the relative position by YOLO-based detection of every frame of the clip guarantees the accuracy of the obtained relative position and further enables accurate identification of sports event video clips.
On the basis of the above embodiments, the first preset model is a convolutional neural network combining a non-local module (nonlocal) with two-stream inflated 3D convolution (I3D).
Specifically, the first preset model in the device is a convolutional neural network combining a non-local module with two-stream inflated 3D convolution; see the description above, which is not repeated here.
In the method for identifying sports event video clips provided by the embodiment of the present invention, choosing such a network as the first preset model further enables accurate identification of sports event video clips, with the additional advantages of being efficient, simple, and highly general.
On the basis of the above embodiments, the second preset model is a convolutional neural network (CNN) classifier.
Specifically, the second preset model in the device is a CNN classifier; see the description above, which is not repeated here.
In the method for identifying sports event video clips provided by the embodiment of the present invention, choosing a CNN classifier as the second preset model further enables accurate identification of sports event video clips.
On the basis of the above embodiments, the training of the first preset model comprises:
collecting action category data of each sports event video clip as the first sample data.
Specifically, the device collects the action category data of each sports event video clip as the first sample data. Referring to the example above, the first sample data may be video clips of the five categories layup, dunk, rebound, free throw, and background; each video clip may be 64 frames.
The first sample data are pre-processed, and the convolutional neural network combining nonlocal and I3D is trained with the pre-processed first sample data.
Specifically, the device pre-processes the first sample data and trains the network with the pre-processed first sample data. Pre-processing can be divided into the following steps. First, since the sample sizes of different categories differ greatly, the samples are augmented by image enhancement operations such as flipping, rotation, and noise addition, to increase the number of training samples and balance the sample sizes of the categories. Second, every frame of the training samples is uniformly scaled to the same size; in the embodiment of the present invention the length and width of each picture are adjusted to 256*320, to increase the model's loading speed during training. Third, each sampled picture is cropped into multiple new pictures by random cropping; in the embodiment of the present invention, three pictures of size 224*224 are cropped as input samples of the algorithm, to increase the generalization of the training samples. Finally, the processed samples are converted to the input format suitable for the model of the embodiment of the present invention, i.e., into LMDB form, in which the sample directory addresses and label data are stored.
A suitable learning rate, number of iterations, and other training parameters are then tuned. In the embodiment of the present invention, the sampling rate is set to 8; therefore, a 64-frame short video contributes 8 picture samples as input. A model is saved every 400 iterations; the error rate is lowest when the number of iterations is 12800. Therefore, the model at that number of iterations is selected as the trained first preset model.
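The temporal subsampling and random-cropping steps described above can be sketched as follows, assuming frames have already been scaled to 256*320 (function and parameter names are illustrative):

```python
import numpy as np

def preprocess_clip(clip, rng, sample_rate=8, crop=(224, 224), n_crops=3):
    """Sketch of the described pre-processing: temporally subsample a
    64-frame clip at a sampling rate of 8 (keeping 8 frames), then take
    n_crops random 224x224 spatial crops from the kept frames.

    clip: (T, H, W, C) uint8 array, frames assumed already 256x320.
    Returns (n_crops, T//sample_rate, 224, 224, C).
    """
    frames = clip[::sample_rate]                  # (8, 256, 320, C)
    t, h, w, c = frames.shape
    ch, cw = crop
    crops = []
    for _ in range(n_crops):
        y = rng.integers(0, h - ch + 1)           # random top-left corner
        x = rng.integers(0, w - cw + 1)
        crops.append(frames[:, y:y + ch, x:x + cw, :])
    return np.stack(crops)

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(64, 256, 320, 3), dtype=np.uint8)
batch = preprocess_clip(clip, rng)
print(batch.shape)  # (3, 8, 224, 224, 3)
```

The augmentation (flip, rotation, noise) and LMDB conversion steps are omitted here; they would run before and after this stage respectively.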
The convolutional neural network combining nonlocal and I3D at the time a first preset condition is reached is taken as the trained first preset model.
Specifically, the device takes the network at the time the first preset condition is reached as the trained first preset model. The first preset condition may be that the number of iterations reaches 12800, which is not specifically limited.
In the method for identifying sports event video clips provided by the embodiment of the present invention, training the first preset model guarantees its accuracy, further enabling accurate identification of sports event video clips, with the additional advantages of being efficient, simple, and highly general.
On the basis of the above embodiments, the training of the second preset model comprises:
collecting the relative-position data corresponding to the target reference object of each sports event video clip as the second sample data.
Specifically, the device collects the relative-position data corresponding to the target reference object of each sports event video clip as the second sample data. The second sample data may be the position data of the backboard relative to the hand in video clips of the two categories layup and dunk; each video clip may be around 64 frames.
The second sample data are pre-processed, and the CNN classifier is trained with the pre-processed second sample data.
Specifically, the device pre-processes the second sample data and trains the CNN classifier with the pre-processed second sample data. The steps of pre-processing the second sample data and training the CNN classifier can be the same as the steps described above for pre-processing the first sample data and training the network combining nonlocal and I3D, and are not repeated here.
The CNN classifier at the time a second preset condition is reached is taken as the trained second preset model.
Specifically, the device takes the CNN classifier at the time the second preset condition is reached as the trained second preset model. The second preset condition may include the number of iterations reaching a preset number or the model error falling below a preset error, which is not specifically limited; the preset number and the preset error can be set independently according to the actual situation.
In the method for identifying sports event video clips provided by the embodiment of the present invention, training the second preset model guarantees its accuracy, further enabling accurate identification of sports event video clips.
It should be understood that a single sports event video clip may contain different types of action classifications, for example layups, dunks, rebounds and free throws. Suppose that the accuracy of the recognition results obtained with the first preset model for rebounds and free throws is above the preset threshold, while the accuracy of its recognition results for layups and dunks is below the preset threshold. The method may then further include, after step S101, the following step:
S102': if there is a target action classification for which the accuracy of the recognition result is below the preset threshold, the target action classification is identified again using the second preset model, and the final recognition result of the action classification is determined from the re-identification result together with the recognition results above the preset threshold. The second preset model is trained with second sample data; the second sample data are data relating to the relative position of the target reference object in the sports event video clip; the relative position is the position between the target reference object and the triggering position of the action classification.
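A minimal sketch of how the relative position between the target reference object and the triggering position of an action might be encoded as a feature. The choice of the hoop as the target reference object, the box coordinates, and the centre-offset normalisation are illustrative assumptions; the patent itself only specifies that the relative position relates the target reference object to the action's triggering position:

```python
def relative_position(ref_box, trigger_box):
    """Relative position between the target reference object (e.g. the hoop)
    and the triggering position of the action, expressed as a (dx, dy)
    offset of box centres normalised by the reference-box size.
    Boxes are (x1, y1, x2, y2) in pixels."""
    rx = (ref_box[0] + ref_box[2]) / 2.0   # reference-box centre
    ry = (ref_box[1] + ref_box[3]) / 2.0
    tx = (trigger_box[0] + trigger_box[2]) / 2.0  # trigger-position centre
    ty = (trigger_box[1] + trigger_box[3]) / 2.0
    w = max(ref_box[2] - ref_box[0], 1e-6)  # guard against zero-size boxes
    h = max(ref_box[3] - ref_box[1], 1e-6)
    return ((tx - rx) / w, (ty - ry) / h)


# Illustrative boxes: a hoop and a release point below and left of it.
hoop = (100, 50, 140, 90)       # 40x40 reference box
release = (60, 130, 100, 170)   # triggering position of the action
dx, dy = relative_position(hoop, release)
```

Features of this kind, collected per frame, would form the second sample data on which the CNN classifier is trained.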
This is described with reference to Fig. 2: the recognition results above the preset threshold are rebounds and free throws; a short video is synthesized; layups and dunks are taken as the target action classifications and identified again using the second preset model. At this point the re-identification results, together with the recognition results above the preset threshold, may include layups, dunks, rebounds and free throws, and the final recognition result may further be determined from the number of each action classification and the confidence of each action classification, respectively. For how the final recognition result is determined from the numbers and confidences of the action classifications, refer to the above explanation for the two action classifications of layups and dunks, which is not repeated here.
Fig. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Fig. 3, the electronic device includes a processor (processor) 301, a memory (memory) 302 and a bus 303,
wherein the processor 301 and the memory 302 communicate with each other through the bus 303.
The processor 301 is configured to call program instructions in the memory 302 to execute the methods provided by the above method embodiments, which include, for example: identifying the action classification of a sports event video clip using a first preset model, the first preset model being trained with first sample data, and the first sample data being data relating to action classifications; and, if the accuracy of the recognition result is below a preset threshold, identifying the action classification again using a second preset model and taking the re-identification result as the final recognition result of the action classification, the second preset model being trained with second sample data, the second sample data being data relating to the relative position of the target reference object in the sports event video clip, and the relative position being the position between the target reference object and the triggering position of the action classification.
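Taken together, the two-stage flow the processor executes can be sketched as follows. The callable model objects, segment identifiers and accuracy values are placeholders for illustration (in the patent, the first preset model is the nonlocal+I3D network and the second is a CNN classifier over relative-position data):

```python
def identify_clip(clip, first_model, second_model, threshold=0.8):
    """Two-stage recognition: the first preset model labels every action
    in the clip; any action whose recognition accuracy falls below the
    preset threshold is re-identified by the second preset model."""
    final = {}
    for action, (label, accuracy) in first_model(clip).items():
        if accuracy < threshold:
            final[action] = second_model(clip, action)  # re-identification
        else:
            final[action] = label  # first-stage result kept as final
    return final


# Placeholder models: the first stage is unsure about segment 'a2'.
first = lambda clip: {"a1": ("rebound", 0.93), "a2": ("layup", 0.55)}
second = lambda clip, action: "dunk"
out = identify_clip("clip.mp4", first, second)
```

Only the low-confidence segment is re-examined, so the cost of the second model is paid solely where the first model is unreliable.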
The present embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the methods provided by the above method embodiments, which include, for example: identifying the action classification of a sports event video clip using a first preset model, the first preset model being trained with first sample data, and the first sample data being data relating to action classifications; and, if the accuracy of the recognition result is below a preset threshold, identifying the action classification again using a second preset model and taking the re-identification result as the final recognition result of the action classification, the second preset model being trained with second sample data, the second sample data being data relating to the relative position of the target reference object in the sports event video clip, and the relative position being the position between the target reference object and the triggering position of the action classification.
The present embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, which include, for example: identifying the action classification of a sports event video clip using a first preset model, the first preset model being trained with first sample data, and the first sample data being data relating to action classifications; and, if the accuracy of the recognition result is below a preset threshold, identifying the action classification again using a second preset model and taking the re-identification result as the final recognition result of the action classification, the second preset model being trained with second sample data, the second sample data being data relating to the relative position of the target reference object in the sports event video clip, and the relative position being the position between the target reference object and the triggering position of the action classification.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disk or optical disk.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; they may be located in one place, or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative labour.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solution, or the part of it that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk or optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for identifying competitive sports video clips, characterized by comprising:
identifying an action classification of a sports event video clip using a first preset model, wherein the first preset model is trained with first sample data, and the first sample data are data relating to action classifications;
if the accuracy of a recognition result is below a preset threshold, identifying the action classification again using a second preset model, and taking the re-identification result as the final recognition result of the action classification, wherein the second preset model is trained with second sample data, the second sample data are data relating to the relative position of a target reference object in the sports event video clip, and the relative position is the position between the target reference object and the triggering position of the action classification.
2. The method for identifying competitive sports video clips according to claim 1, wherein identifying the action classification again using the second preset model and taking the re-identification result as the final recognition result of the action classification comprises:
if the re-identification result includes multiple action classifications, obtaining the number of each action classification respectively, and taking the most numerous action classification as the final recognition result.
3. The method for identifying competitive sports video clips according to claim 2, wherein identifying the action classification again using the second preset model and taking the re-identification result as the final recognition result of the action classification comprises:
if it is determined that the most numerous action classification is not unique, obtaining the confidence of each of the most numerous action classifications respectively, and taking the action classification with the largest confidence value as the final recognition result.
4. The method for identifying competitive sports video clips according to any one of claims 1 to 3, wherein the relative position is obtained by detecting each frame picture of the sports event video clip using the yolo algorithm.
5. The method for identifying competitive sports video clips according to any one of claims 1 to 3, wherein the first preset model is a convolutional neural network combining a non-local module (nonlocal) and two-stream inflated 3D convolution (I3D).
6. The method for identifying competitive sports video clips according to any one of claims 1 to 3, wherein the second preset model is a convolutional neural network (CNN) classifier.
7. The method for identifying competitive sports video clips according to claim 5, wherein the training of the first preset model comprises:
collecting the action classification data of each sports event video clip as the first sample data;
pre-processing the first sample data, and training the convolutional neural network combining nonlocal and I3D with the pre-processed first sample data;
taking the convolutional neural network combining nonlocal and I3D at the time a first preset condition is reached as the trained first preset model.
8. The method for identifying competitive sports video clips according to claim 6, wherein the training of the second preset model comprises:
collecting the relative-position data corresponding to the target reference object of each sports event video clip as the second sample data;
pre-processing the second sample data, and training the CNN classifier with the pre-processed second sample data;
taking the CNN classifier at the time a second preset condition is reached as the trained second preset model.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910759733.6A CN110516572B (en) | 2019-08-16 | 2019-08-16 | Method for identifying sports event video clip, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110516572A true CN110516572A (en) | 2019-11-29 |
CN110516572B CN110516572B (en) | 2022-06-28 |
Family
ID=68625506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910759733.6A Active CN110516572B (en) | 2019-08-16 | 2019-08-16 | Method for identifying sports event video clip, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516572B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9330726B1 (en) * | 2015-10-09 | 2016-05-03 | Sports Logic Group, LLC | System capable of integrating user-entered game event data with corresponding video data |
CN105608690A (en) * | 2015-12-05 | 2016-05-25 | 陕西师范大学 | Graph theory and semi supervised learning combination-based image segmentation method |
CN107766839A (en) * | 2017-11-09 | 2018-03-06 | 清华大学 | Action identification method and device based on neutral net |
CN107967491A (en) * | 2017-12-14 | 2018-04-27 | 北京木业邦科技有限公司 | Machine learning method, device, electronic equipment and the storage medium again of plank identification |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209897A (en) * | 2020-03-09 | 2020-05-29 | 腾讯科技(深圳)有限公司 | Video processing method, device and storage medium |
CN111209897B (en) * | 2020-03-09 | 2023-06-20 | 深圳市雅阅科技有限公司 | Video processing method, device and storage medium |
CN111598035A (en) * | 2020-05-22 | 2020-08-28 | 北京爱宾果科技有限公司 | Video processing method and system |
CN111598035B (en) * | 2020-05-22 | 2023-05-23 | 北京爱宾果科技有限公司 | Video processing method and system |
CN113542774A (en) * | 2021-06-04 | 2021-10-22 | 北京格灵深瞳信息技术股份有限公司 | Video synchronization method and device, electronic equipment and storage medium |
CN113542774B (en) * | 2021-06-04 | 2023-10-20 | 北京格灵深瞳信息技术股份有限公司 | Video synchronization method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110516572B (en) | 2022-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10748376B2 (en) | Real-time game tracking with a mobile device using artificial intelligence | |
CN110533097B (en) | Image definition recognition method and device, electronic equipment and storage medium | |
CN110166827B (en) | Video clip determination method and device, storage medium and electronic device | |
CN109145784B (en) | Method and apparatus for processing video | |
US20190294871A1 (en) | Human action data set generation in a machine learning system | |
CN110837856B (en) | Neural network training and target detection method, device, equipment and storage medium | |
CN109543713A (en) | The modification method and device of training set | |
CN104679818B (en) | A kind of video key frame extracting method and system | |
CN103262119B (en) | For the method and system that image is split | |
CN110516572A (en) | Method for identifying sports event video clip, electronic equipment and storage medium | |
CN106599907A (en) | Multi-feature fusion-based dynamic scene classification method and apparatus | |
CN110298844A (en) | X-ray contrastographic picture blood vessel segmentation and recognition methods and device | |
CN109558892A (en) | A kind of target identification method neural network based and system | |
CN110503076A (en) | Video classification methods, device, equipment and medium based on artificial intelligence | |
CN109753884A (en) | A kind of video behavior recognition methods based on key-frame extraction | |
CN109544592A (en) | For the mobile moving object detection algorithm of camera | |
CN113822254B (en) | Model training method and related device | |
CN108647571A (en) | Video actions disaggregated model training method, device and video actions sorting technique | |
CN105430394A (en) | Video data compression processing method, apparatus and equipment | |
CN113435355A (en) | Multi-target cow identity identification method and system | |
CN109919296A (en) | A kind of deep neural network training method, device and computer equipment | |
Dai et al. | Tan: Temporal aggregation network for dense multi-label action recognition | |
CN109063790A (en) | Object identifying model optimization method, apparatus and electronic equipment | |
CN115187772A (en) | Training method, device and equipment of target detection network and target detection method, device and equipment | |
CN104978583B (en) | The recognition methods of figure action and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||