CN110490064B - Sports video data processing method and device, computer equipment and computer storage medium - Google Patents


Info

Publication number
CN110490064B
Authority
CN
China
Prior art keywords
sports video
video data
technical
sports
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910624923.7A
Other languages
Chinese (zh)
Other versions
CN110490064A
Inventor
张国辉
雷晨雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910624923.7A
Publication of CN110490064A
Application granted
Publication of CN110490064B
Legal status: Active

Classifications

    • G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques > G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G (Physics) > G06 (Computing; calculating or counting) > G06V (Image or video recognition or understanding) > G06V20/00 Scenes; scene-specific elements > G06V20/40 Scenes; scene-specific elements in video content > G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G (Physics) > G06 (Computing; calculating or counting) > G06V (Image or video recognition or understanding) > G06V20/00 Scenes; scene-specific elements > G06V20/40 Scenes; scene-specific elements in video content > G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G (Physics) > G06 (Computing; calculating or counting) > G06V (Image or video recognition or understanding) > G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data > G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a sports video data processing method and apparatus and a computer storage medium, relating to the technical field of data processing. The method enables more comprehensive analysis of sports video data and improves its processing precision, and comprises the following steps: acquiring sports video sample data, the sample data comprising sports video pictures carrying technical action labels; inputting the labeled sports video pictures into a deep learning model for training to construct an action recognition model; when a processing request for sports video data is received, inputting the sports video pictures of the data to be processed into the action recognition model to obtain the technical actions in each picture; and counting the technical actions in each sports video picture based on the request information carried in the processing request, to obtain a processing result for the sports video data.

Description

Sports video data processing method and device, computer equipment and computer storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for processing sports video data, a computer device, and a computer storage medium.
Background
Technology is changing people's lives, and nowhere more so than in today's big data industry, which has brought sweeping change to many sectors. The sports industry, for example, whether football, basketball, or racing, has been transformed by big data's capture, storage, and analysis capabilities.
Sports event companies generally focus on analyzing event data: a core technical team slices each game into seconds and performs multidimensional processing on the sports video data in each slice. For basketball games, for instance, the occurrence counts of different technical actions in the video data are tallied; for football games, the video data is classified by the different actions of the athletes. This allows more personalized sports video content to be pushed to users, improving their experience of watching sports videos.
In the prior art, sports video data is generally processed by manual statistics or manual editing, which consumes a great deal of manpower. Moreover, the results obtained by different human operators carry a degree of subjectivity, so the consistency of the processing results is poor.
Disclosure of Invention
In view of the above, the present invention provides a sports video data processing method, apparatus, computer device, and computer storage medium, with the main aim of solving the poor consistency of processing results caused by the subjectivity of current manual processing of sports video data.
According to an aspect of the present invention, there is provided a method of processing sports video data, the method comprising:
acquiring sports video sample data, wherein the sports video sample data comprises sports video pictures carrying technical action labels;
inputting the sports video pictures carrying the technical action labels in the sports video sample data into a deep learning model for training, to construct an action recognition model, wherein the action recognition model is used for recognizing technical actions in sports video pictures;
when a processing request for sports video data is received, inputting the sports video pictures in the sports video data to be processed into the action recognition model to obtain the technical actions in the sports video pictures; and
counting the technical actions in each sports video picture based on the request information carried in the processing request, to obtain a processing result for the sports video data.
Further, the deep learning model is a deep residual network comprising a multi-layer structure, and inputting the sports video pictures carrying the technical action labels into the deep learning model for training to construct the action recognition model comprises:
extracting features of the sports video picture through the convolution layers of the deep residual network, to obtain feature parameters of the sports video picture for different technical actions;
performing dimension reduction on those feature parameters through the pooling layers of the deep residual network, to obtain dimension-reduced feature parameters of the sports video picture for different technical actions;
aggregating the dimension-reduced feature parameters through the fully connected layer of the deep residual network, to obtain weight values representing the different technical actions in the sports video picture; and
generating a recognition result for the technical actions in the sports video picture from those weight values through the classification layer of the deep residual network.
Further, acquiring the sports video sample data specifically comprises:
collecting sports video data captured by an acquisition device, and dividing the sports video data at preset time intervals to obtain multiple segments of sports video frame data; and
selecting a plurality of sports video pictures from each segment of sports video frame data, and labeling the sports video pictures based on the technical actions they contain, to obtain the sports video sample data.
Further, when the processing request is to count the occurrences of a first technical action in the sports video data, the request information carries the first technical action to be counted, and counting the technical actions in each sports video picture based on the request information to obtain the processing result comprises:
performing a continuity judgment on the technical actions recognized in the sports video data, and counting the occurrences of each technical action in the sports video data; and
determining the number of occurrences of the first technical action from the occurrence counts of each technical action, to obtain the processing result for the sports video data;
wherein performing the continuity judgment and counting the occurrences specifically comprises:
based on the technical action recognized in each video picture, if a number of consecutive sports video pictures greater than or equal to a preset number yield the same recognized technical action, judging that the technical action has occurred once in the sports video data, and counting the occurrences of each technical action accordingly.
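The continuity rule above amounts to a run-length count over per-frame recognition results. A minimal sketch follows; the label names, the `still` class, and the threshold of 3 consecutive frames are illustrative assumptions, not values taken from the patent.

```python
from itertools import groupby

def count_actions(frame_labels, min_consecutive=3, still_label="still"):
    """Count how many times each technical action occurs in a sequence of
    per-frame recognition labels. A run of identical labels counts as one
    occurrence, and only if the run spans at least `min_consecutive`
    frames (shorter runs are treated as feints or misrecognitions)."""
    counts = {}
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label != still_label and length >= min_consecutive:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

For example, `count_actions(["still", "dunk", "dunk", "dunk", "still", "assist"], 3)` counts one dunk and ignores the single-frame assist.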
Further, after the processing request for the sports video data has been received and the sports video pictures input into the action recognition model to obtain the technical actions, the method further comprises:
dividing the sports video data captured by the acquisition devices at different angles and different distances, to obtain sports video data grouped by acquisition angle and acquisition distance.
Further, dividing the sports video data captured at different angles and different distances specifically comprises:
identifying a human body target in the sports video picture through a human detection algorithm, and determining the coordinates and size of the human body target in the picture;
calculating the proportion of the sports video picture occupied by the human body target from those coordinates and size; and
dividing the sports video data based on that proportion, to obtain sports video data grouped by acquisition angle and acquisition distance.
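One hedged way to realize the proportion computation and the distance-based division is sketched below; the bounding-box format, the thresholds, and the near/medium/far buckets are illustrative assumptions rather than values from the patent.

```python
def frame_ratio(bbox, frame_w, frame_h):
    """Proportion of the frame occupied by a detected human target.
    bbox is (x, y, w, h), as a human-detection step might return it."""
    x, y, w, h = bbox
    return (w * h) / (frame_w * frame_h)

def bucket_by_distance(ratio, near_threshold=0.25, far_threshold=0.05):
    """Map the target's frame proportion to a coarse acquisition distance:
    a large target implies a close camera, a small one a distant camera."""
    if ratio >= near_threshold:
        return "near"
    if ratio <= far_threshold:
        return "far"
    return "medium"
```

A 480x270 target in a 1920x1080 frame, for instance, occupies 6.25% of the picture and would fall in the middle bucket under these thresholds.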
Further, when the processing request is to screen highlight segments from the sports video data, the request information carries a second technical action corresponding to the highlight, and counting the technical actions in each sports video picture based on the request information to obtain the processing result comprises:
screening out, based on the acquisition angle and acquisition distance corresponding to the sports video data, the sports video data whose acquisition angle is frontal and whose acquisition distance is within a preset range, as candidate highlight segments;
performing a continuity judgment on the technical actions recognized in the candidate highlight segments, and counting the technical actions appearing in them; and
determining the candidate highlight segments in which the second technical action appears, to obtain the processing result for the sports video data;
wherein performing the continuity judgment and counting specifically comprises:
based on the technical action recognized in each video picture of a candidate highlight segment, if a number of consecutive sports video pictures greater than or equal to a preset number yield the same recognized technical action, determining that the technical action has occurred once in the candidate highlight segment, and counting the technical actions appearing in it accordingly.
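The highlight-screening flow above can be sketched as a two-stage filter: first restrict to frontal, sufficiently close segments, then keep those in which the requested action appears in enough consecutive frames. The segment field names, distance buckets, and threshold below are illustrative assumptions.

```python
from itertools import groupby

def screen_highlights(segments, target_action, min_consecutive=3):
    """Keep candidate highlight segments in which the requested action is
    recognized in at least `min_consecutive` consecutive frames. Each
    segment is a dict with 'angle' ('front'/'side'), 'distance'
    ('near'/'medium'/'far'), and 'frame_labels' (per-frame results)."""
    order = {"near": 0, "medium": 1, "far": 2}
    picked = []
    for seg in segments:
        # candidate highlights: frontal acquisition angle, camera close enough
        if seg["angle"] != "front" or order[seg["distance"]] > order["medium"]:
            continue
        for label, run in groupby(seg["frame_labels"]):
            if label == target_action and sum(1 for _ in run) >= min_consecutive:
                picked.append(seg)
                break
    return picked
```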
According to another aspect of the present invention, there is provided a sports video data processing apparatus, the apparatus comprising:
an acquisition unit, configured to acquire sports video sample data, wherein the sports video sample data comprises sports video pictures carrying technical action labels;
a construction unit, configured to input the sports video pictures carrying the technical action labels into the deep learning model for training and construct an action recognition model used for recognizing technical actions in sports video pictures;
a recognition unit, configured to, when a processing request for sports video data is received, input the sports video pictures in the sports video data to be processed into the action recognition model, to obtain the technical actions in the sports video pictures; and
a statistics unit, configured to count the technical actions in each sports video picture based on the request information carried in the processing request, to obtain a processing result for the sports video data.
Further, the deep learning model is a deep residual network comprising a multi-layer structure, and the construction unit comprises:
an extraction module, configured to extract features of the sports video picture through the convolution layers of the deep residual network, to obtain feature parameters of the picture for different technical actions;
a dimension reduction module, configured to perform dimension reduction on those feature parameters through the pooling layers of the deep residual network, to obtain dimension-reduced feature parameters of the picture for different technical actions;
an aggregation module, configured to aggregate the dimension-reduced feature parameters through the fully connected layer of the deep residual network, to obtain weight values representing the different technical actions in the picture; and
a generation module, configured to generate a recognition result for the technical actions in the picture from those weight values through the classification layer of the deep residual network.
Further, the acquisition unit comprises:
a collection module, configured to collect sports video data captured by the acquisition device and divide it at preset time intervals, to obtain multiple segments of sports video frame data; and
a labeling module, configured to select a plurality of sports video pictures from each segment of sports video frame data and label them based on the technical actions they contain, to obtain the sports video sample data.
Further, when the processing request is to count the occurrences of a first technical action in the sports video data, the request information carries the first technical action to be counted, and the statistics unit comprises:
a first statistics module, configured to perform a continuity judgment on the technical actions recognized in the sports video data and count the occurrences of each technical action;
a first determination module, configured to determine the number of occurrences of the first technical action from those counts, to obtain the processing result for the sports video data;
wherein the first statistics module is specifically configured to judge that a technical action has occurred once in the sports video data if a number of consecutive sports video pictures greater than or equal to a preset number yield the same recognized technical action, and to count the occurrences of each technical action accordingly.
Further, the apparatus further comprises:
a division unit, configured to, after the sports video pictures in the sports video data to be processed have been input into the action recognition model and the technical actions obtained, divide the sports video data captured by the acquisition devices at different angles and different distances, to obtain sports video data grouped by acquisition angle and acquisition distance.
Further, the division unit comprises:
an identification module, configured to identify a human body target in the sports video picture through a human detection algorithm and determine the coordinates and size of the human body target in the picture;
a calculation module, configured to calculate the proportion of the picture occupied by the human body target from those coordinates and size; and
a division module, configured to divide the sports video data based on that proportion, to obtain sports video data grouped by acquisition angle and acquisition distance.
Further, when the processing request is to screen highlight segments from the sports video data, the request information carries a second technical action corresponding to the highlight, and the statistics unit comprises:
a screening module, configured to screen out, based on the acquisition angle and acquisition distance corresponding to the sports video data, the sports video data whose acquisition angle is frontal and whose acquisition distance is within a preset range, as candidate highlight segments;
a second statistics module, configured to perform a continuity judgment on the technical actions recognized in the candidate highlight segments and count the technical actions appearing in them;
a second determination module, configured to determine the candidate highlight segments in which the second technical action appears, to obtain the processing result for the sports video data;
wherein the second statistics module is specifically configured to determine that a technical action has occurred once in a candidate highlight segment if a number of consecutive sports video pictures greater than or equal to a preset number yield the same recognized technical action, and to count the technical actions appearing in the segment accordingly.
According to a further aspect of the present invention, there is provided a computer device comprising a memory storing a computer program and a processor that implements the steps of the above sports video data processing method when executing the computer program.
According to still another aspect of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above sports video data processing method.
By means of the above technical scheme, the present invention provides a sports video data processing method and apparatus. Sports video sample data is acquired, and the sports video pictures carrying technical action labels contained in it are input into a deep learning model for training to construct an action recognition model, so that sports video data can be processed by the model: technical actions appearing in the data are recognized automatically, no manual counting is needed, and processing precision is improved. Compared with the prior-art approach of manual statistics or manual editing, the method recognizes technical actions through the constructed action recognition model, requiring no manual identification or editing and saving processing time; and because manual processing is subjective, the processing results generated by the action recognition model are more precise, improving both the accuracy and the flexibility of sports video data processing.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
Fig. 1 is a schematic flow chart of a sports video data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another sports video data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a sports video data processing apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another sports video data processing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another sports video data processing apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a sports video data processing method which can analyze sports video data more comprehensively and improve its processing precision. As shown in Fig. 1, the method comprises the following steps:
101. Acquire sports video sample data.
The sports video sample data includes sports video pictures carrying technical action labels. The sample data may include, but is not limited to, video data for various sports such as basketball, football, and badminton, and different technical actions may be defined for different sports. For basketball, for example, 13 categories may be set, specifically 12 technical actions and 1 static action, the technical actions including the rebound, assist, steal, shot, and so on; for football, 11 categories may be set, specifically 10 technical actions and 1 static action, including the corner kick, free kick, throw-in, and so on.
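As a concrete sketch, the label scheme could be encoded as a mapping from class ids to action names. The 13-way basketball split follows the text above, but the specific English action names here are illustrative placeholders, not an official list from the patent.

```python
# 13 basketball categories: 12 technical actions + 1 static class.
# Action names are illustrative placeholders, not taken from the patent.
BASKETBALL_LABELS = {
    0: "static", 1: "rebound", 2: "assist", 3: "steal", 4: "shot",
    5: "dunk", 6: "layup", 7: "block", 8: "pass", 9: "dribble",
    10: "screen", 11: "free_throw", 12: "crossover",
}
```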
In this embodiment, the sports video sample data takes the form of sports video pictures. Specifically, after the sports video data is collected, it is divided into multiple segments of video frame data, each segment containing multiple sports video pictures, from which a number of pictures are selected as the sports video sample data.
102. Input the sports video pictures carrying the technical action labels in the sports video sample data into a deep learning model for training, to construct an action recognition model.
The action recognition model is used for recognizing technical actions in sports video pictures. Because the deep learning model learns the mapping between sports video pictures and technical actions, the constructed action recognition model can recognize the technical action in any sports video picture input to it. Pictures in which no technical action occurs are recognized as the static action, and new technical actions can be added adaptively, so the set of technical action types is not limited.
In this embodiment of the invention, the deep learning model may be a deep residual network model. Specifically, the network structure of the action recognition model can be built by repeatedly training on the sports video sample data; the trained network provides the correct input-output relation, that is, for an input sports video picture it outputs a recognition result over the different technical actions, namely the probability of each technical action appearing in the picture.
The specific deep residual network model can be realized with convolution layers, pooling layers, fully connected layers, and a classification layer. The convolution layers correspond to the hidden layers of the deep residual network and may form a multi-layer structure that extracts deeper feature parameters of the sports video picture for the different technical actions. To reduce the number of parameters and the amount of computation, a pooling layer is commonly inserted between successive convolution layers. The fully connected layer is similar to a convolution layer in that its neurons connect to a local region of the previous layer's output; two fully connected layers may be arranged to keep the output feature vectors from growing too large, integrating the data after it has passed through the several convolution layers.
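The layer-by-layer flow just described can be illustrated with a toy forward pass in NumPy. This sketches only the convolution, pooling, fully connected, and classification stages; the residual shortcut connections that give a residual network its name, and all real training machinery, are omitted, and the sizes and random weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) over one channel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling (dimension reduction)."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy forward pass: 8x8 frame -> conv -> ReLU -> pool -> fc -> class probs.
frame = rng.random((8, 8))
kernel = rng.random((3, 3))
features = np.maximum(conv2d(frame, kernel), 0)   # convolution layer + ReLU
pooled = max_pool(features)                        # pooling layer
flat = pooled.ravel()                              # input to fully connected layer
W = rng.random((13, flat.size))                    # 13 action classes (basketball example)
probs = softmax(W @ flat)                          # classification layer output
```

The final `probs` vector is the "probability of each technical action appearing in the picture" that the text describes.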
103. When a processing request for sports video data is received, input the sports video pictures in the sports video data to be processed into the action recognition model, to obtain the technical actions in the sports video pictures.
The sports video data to be processed is usually recently collected data containing many technical actions, so the technical actions it contains can be identified by the action recognition model; further information, such as the number of occurrences of each technical action and the times at which they occur, can then be counted.
It should be noted that before the sports video data to be processed is input into the action recognition model, it must also be preprocessed, and the resulting sports video pictures are then input into the model for recognition.
104. Count the technical actions in each sports video picture based on the request information carried in the processing request, to obtain the processing result for the sports video data.
In this embodiment of the invention, the processing request may specifically be a request to count the occurrences of a certain technical action in the sports video data, or a request to screen highlight segments from it; the specific content of the processing request is not limited.
Specifically, the technical actions in each sports video picture can be aggregated according to the request information carried in the processing request, and the recognition results for the requested technical action screened out as the processing result. For example, if the processing request asks how many times the rebound action occurs, counting the rebound actions recognized by the action recognition model yields the number of times the rebound appears in the sports video data.
It should be noted that if a technical action recognized in a sports video picture appears in only one of the adjacent pictures, the action may not actually have occurred in the sports video data; it may be a similar action or a feint made by the athlete. A technical action is therefore confirmed only after the same action has been recognized in several consecutive sports video pictures, which ensures the accuracy of the statistics and further improves the processing results for the sports video data.
This embodiment of the invention provides a sports video data processing method: sports video sample data is acquired, and the sports video pictures carrying technical action labels contained in it are input into a deep learning model for training to construct an action recognition model, so that sports video data is processed by the model. Technical actions appearing in the data are recognized automatically, no manual counting is needed, and processing precision is improved. Compared with the prior-art manual statistics or manual editing, the method recognizes technical actions through the constructed action recognition model, requiring no manual identification or editing and saving processing time; and because manual processing is subjective, the processing results generated by the action recognition model are more precise, improving the flexibility of sports video data processing.
The embodiment of the invention provides another method for processing sports video data, which can analyze the sports video data more comprehensively and improve the processing precision. As shown in fig. 2, the method comprises the following steps:
201. Collect the sports video data captured by the acquisition device, and divide the sports video data according to a preset time interval to obtain a plurality of segments of sports video frame data.
The acquisition device is a video camera capable of capturing video; of course, other devices with a video capture function may also be used, which is not limited herein. In general, a plurality of acquisition devices are arranged at different positions and distances around the sports venue, so that spectators can watch the event from different angles and distances, and so that the course of the event is preserved and its highlights can be played back.
It should be noted that an interval of frames for dividing the sports video data may be set here. For example, if one piece of sports video data has 300 frames, it may be divided into 3 segments of 100 frames each, where each frame is a still sports video picture, and a plurality of sports video pictures are selected as sports video sample data. The number of segments is not limited and can be determined according to the accuracy requirement of the actual model training process.
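The division into segments and the selection of sample pictures can be sketched as follows; this is a minimal illustration of the 300-frames-into-3-segments example, assuming frames are simply held in a list, with evenly spaced sampling as one possible selection strategy.

```python
def split_into_segments(frames, num_segments):
    """Divide a frame sequence into equal segments (e.g. 300 frames -> 3 x 100)."""
    seg_len = len(frames) // num_segments
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]

def sample_pictures(segment, num_samples):
    """Pick evenly spaced still pictures from a segment as labeling candidates."""
    step = max(len(segment) // num_samples, 1)
    return segment[::step][:num_samples]
```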
202. Select a plurality of sports video pictures from each segment of sports video frame data, and mark the sports video pictures based on the technical actions in them, to obtain sports video sample data.
In the embodiment of the invention, because sports video data contain a large number of sports video pictures, the technical actions appearing in each picture may differ; for example, the technical action in the 1st to 5th sports video pictures is standing still, the technical action in the 6th to 8th pictures is an assist, the technical action in the 9th to 10th pictures is a shot, and so on. The sports video pictures therefore need to be marked based on the technical actions they contain before training, so as to ensure the accuracy of the trained action recognition model.
203. Input the sports video pictures carrying technical action labels in the sports video sample data into a deep learning model for training, and construct an action recognition model.
To ensure normalization of the input pictures, the marked sports video pictures need to be preprocessed before training the deep learning model: each picture is scaled to a size of 224×224, its pixel values are normalized to [0, 1], and the mean (denoted m) is subtracted and the result divided by the standard deviation (denoted u), i.e. x = (x - m)/u, with m = 0.5 and u = 0.23 for each sports video picture.
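The standardization step can be sketched on raw 8-bit intensities as below. This is an illustration of the x = (x - m)/u formula only; representing a picture as a flat list of intensities is an assumption, and the 224×224 resizing is omitted since it depends on the image library used.

```python
def standardize_pixels(pixels, mean=0.5, std=0.23):
    """Scale 8-bit pixel intensities to [0, 1], then apply x = (x - m) / u
    with the mean m and standard deviation u from the text."""
    return [((p / 255.0) - mean) / std for p in pixels]
```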
The deep residual network model may be a ResNet model, and a TSN (Temporal Segment Network) framework is adopted. To improve the recognition performance of the deep residual network model, a ResNet classifier can first be trained on the ImageNet dataset as a baseline of training performance, which is equivalent to pre-training the ResNet model.
Specifically, the deep residual network can be composed of a four-layer structure. The first layer is a convolution structure, which may consist of a plurality of convolution layers; features of the sports video picture are extracted through the convolution layers to obtain the feature parameters of the picture for different technical actions. The second layer is a pooling layer, through which the feature parameters of the picture for the different technical actions are reduced in dimension, yielding dimension-reduced feature parameters. The third layer is a fully connected layer, through which the dimension-reduced feature parameters for the different technical actions are aggregated to obtain weight values characterizing the different technical actions in the picture. The fourth layer is a classification layer, which generates the recognition result of the technical actions in the sports video picture from these weight values, so as to construct the action recognition model.
For example, the convolutional neural network includes 13 convolution layers and 3 fully connected layers. The number of convolution kernels in each convolution layer may be set to 64, 128, 256 or 512, and a pooling layer is connected between the 2nd and 3rd, the 4th and 5th, the 6th and 7th, the 8th and 9th, and the 10th and 11th convolution layers, as well as between the 13th convolution layer and the 1st fully connected layer. The outputs of the 13 convolution layers and the 3 fully connected layers are processed by nonlinear activation functions. The numbers of convolution, fully connected and pooling layers are not limited here and may be chosen according to the actual situation, nor are the activation functions used in each layer limited.
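In a VGG-style stack like the one described, 3×3 convolutions with padding 1 preserve spatial size, so only the pooling layers shrink the feature map. The sketch below computes the resulting spatial size; the assumption of five 2×2 max-pools halving the resolution follows the standard VGG-16 layout, while the exact pool placement listed in the paragraph may differ.

```python
def feature_map_size(input_size=224, num_pools=5):
    """Spatial size after a VGG-style stack: 3x3 convolutions with padding 1
    preserve the size, so only the 2x2 max-pool layers halve it."""
    size = input_size
    for _ in range(num_pools):
        size //= 2  # each 2x2 max-pool halves the spatial resolution
    return size
```

Starting from the 224×224 input of step 203's preprocessing, five pools give a 7×7 feature map before the fully connected layers.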
204. When a processing request for sports video data is received, input the sports video pictures in the sports video data to be processed into the action recognition model, to obtain the technical actions in the sports video pictures.
In the embodiment of the invention, since the action recognition model is constructed by training on sports video pictures pre-marked with technical action labels, when a sports video picture extracted from the sports video data to be processed is input into the action recognition model, the model outputs the weight values of the picture for the different technical actions. The higher the weight value, the higher the probability that the corresponding technical action is recognized in the picture, and the technical action with the highest weight value is taken as the technical action recognized in the sports video picture.
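Selecting the technical action with the highest weight value is a simple argmax over the model outputs, sketched below; representing the outputs as a label-to-weight dictionary and the label names are assumptions for the example.

```python
def predict_action(weights):
    """Return the technical action with the highest weight value output by
    the recognition model for one sports video picture."""
    return max(weights, key=weights.get)
```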
205. Divide the sports video data captured by the acquisition devices at different angles and different distances, so as to obtain sports video data of different acquisition angles and different acquisition distances.
In general, a fixed camera may be used to capture sports video data at a fixed angle and a fixed distance; of course, sports video data of different angles and distances may also be collected by combining long-range, medium-range and close-range lenses, which is not limited herein.
It can be appreciated that since sports video data collected at different angles and distances affect the subsequent data processing results, dividing the sports video data by distance and angle after marking the sports video pictures makes the data easier to distinguish intuitively and helps spectators screen for more exciting video pictures.
Specifically, the human targets in the sports video pictures can be identified by a human detection algorithm to determine their coordinates and sizes; YOLO target detection can be applied to each sports video picture to obtain the human targets in the picture together with their coordinates and sizes. The proportion of the human target in the sports video picture is then calculated from these coordinates and sizes, for example by checking whether the length and width of the target fall within threshold ranges, and the sports video data are divided based on this proportion, so as to obtain sports video data of different acquisition angles and different acquisition distances.
In general, the collected sports video data may be divided into far, middle and near shot video data based on the collection distance, and the collected sports video data may be divided into front, side and back video data based on the collection angle.
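The distance-based division can be sketched from the detected bounding box: a target occupying a larger fraction of the picture suggests a closer shot. The frame size and the ratio thresholds below are illustrative assumptions; the patent does not fix concrete values.

```python
def target_ratio(box_w, box_h, frame_w=1920, frame_h=1080):
    """Fraction of the picture area occupied by the detected human target."""
    return (box_w * box_h) / (frame_w * frame_h)

def classify_shot(ratio, near=0.20, far=0.05):
    """Map the target's area ratio to an acquisition-distance class.
    The 0.20 / 0.05 thresholds are assumed for illustration."""
    if ratio >= near:
        return "close-up"
    if ratio >= far:
        return "medium"
    return "long"
```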
206. Count the technical actions in each sports video picture based on the request information carried in the processing request, to obtain the processing result of the sports video data.
In the embodiment of the invention, when the processing request is to count the number of occurrences of a first technical action in the sports video data, the request information carries the first technical action to be counted. The number of occurrences of each technical action in the sports video data is counted by making a continuity judgment on the technical actions identified in the data; for example, the dunk action occurs 7 times, the assist action 10 times, the rebound action 4 times, and so on. The number of occurrences of the first technical action is then determined from these per-action counts, yielding the processing result of the sports video data.
For example, to count the number of occurrences of a certain technical action in the sports video data: since the action recognition model can recognize the technical action in each sports video picture, if the same technical action is recognized in three or more consecutive sports video pictures, it is determined that the technical action occurred once in the sports video data, and accumulating such occurrences yields the total number of times the technical action appears in the data.
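The counting rule above can be sketched as below: every run of at least three identical consecutive per-picture predictions counts as one occurrence of that action. The label names are assumptions for the example.

```python
from collections import Counter

def count_action_occurrences(frame_labels, min_consecutive=3):
    """Count one occurrence of an action for every run of at least
    `min_consecutive` identical consecutive per-picture predictions."""
    counts = Counter()
    run_label, run_len = None, 0
    for label in list(frame_labels) + [None]:  # sentinel flushes the last run
        if label == run_label:
            run_len += 1
        else:
            if run_label is not None and run_len >= min_consecutive:
                counts[run_label] += 1
            run_label, run_len = label, 1
    return counts
```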
In the embodiment of the invention, when the processing request is to screen highlights in the sports video data, the request information carries a second technical action corresponding to the highlight. Based on the acquisition angle and acquisition distance of the sports video data, the data whose acquisition angle is frontal and whose acquisition distance is within a preset range are screened out as candidate highlights. The technical actions appearing in the candidate highlights are counted through a continuity judgment on the technical actions identified in them, and the candidate highlights in which the second technical action occurs are determined accordingly, yielding the processing result of the sports video data.
For example, to screen highlights in the sports video data, sports video data with a frontal acquisition angle and an acquisition distance within 100 meters are preferentially selected as candidate highlights. Similarly, since the action recognition model can recognize the technical action in each sports video picture of a candidate highlight, if the same technical action is recognized in three or more consecutive pictures, it is determined that the technical action occurred in the candidate highlight, and the candidate highlights in which the second technical action occurs are determined to be the highlights of the sports video data.
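The whole screening step can be sketched as a filter over clips: keep frontal clips within the distance limit in which the requested action is confirmed by a long enough run of identical predictions. The clip fields ("name", "angle", "distance", "frame_labels") are illustrative assumptions about how the data might be organized, not structures defined by the patent.

```python
def screen_highlights(clips, target_action, max_distance=100, min_consecutive=3):
    """Return names of clips shot from the front within the distance limit
    in which `target_action` is recognized in enough consecutive pictures."""
    highlights = []
    for clip in clips:
        if clip["angle"] != "front" or clip["distance"] > max_distance:
            continue  # not a candidate highlight
        run = 0
        for label in clip["frame_labels"]:
            run = run + 1 if label == target_action else 0
            if run >= min_consecutive:
                highlights.append(clip["name"])
                break
    return highlights
```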
In addition, the recognition result obtained by inputting the sports video data into the recognition model is the probability of each sports video picture for the different technical actions, which is equivalent to classifying the technical actions. If the classification output is not a hard 0/1 decision but a real value, i.e. the probability of belonging to each category, the recognition result can be evaluated with a confidence, the output probability representing the confidence that the sports video data correspond to each technical action. For example, if the classifier outputs a confidence of 0.51 for the dunk action in a piece of video data, the classification result is of low reliability, and recognition results whose confidence is below a preset value can be preferentially deleted when screening highlights in the sports video data.
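Filtering out low-confidence recognition results can be sketched as below; the 0.6 cut-off and the result fields ("picture", "probs") are assumptions for the example.

```python
def drop_low_confidence(results, threshold=0.6):
    """Discard recognition results whose highest class probability (the
    confidence) falls below the threshold; 0.6 is an assumed cut-off."""
    return [r for r in results if max(r["probs"].values()) >= threshold]
```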
In addition, when screening out sports video data with a frontal acquisition angle and an acquisition distance within the preset range as candidate highlights, a duration limit may further be imposed on the candidates in order to guarantee the length of the highlight, and candidate highlights whose duration is not within a preset numerical range may be deleted.
It can be appreciated that in the continuity judgment of the technical actions identified in the sports video data or the candidate highlights, the technical actions obtained by recognizing each video picture may be examined: if the same technical action is identified in a number of consecutive sports video pictures greater than or equal to a preset number, it is determined that the technical action occurred once, and the number of occurrences of each technical action in the sports video data or candidate highlight is counted accordingly.
Further, as a specific implementation of the method shown in fig. 1, an embodiment of the present invention provides a processing apparatus for sports video data. As shown in fig. 3, the apparatus includes: an acquisition unit 31, a construction unit 32, an identification unit 33 and a statistics unit 34.
The acquisition unit 31 may be configured to acquire sports video sample data, where the sports video sample data include sports video pictures carrying technical action labels;
The construction unit 32 may be configured to input the sports video picture carrying the technical action tag in the sports video sample data into a deep learning model for training, and construct an action recognition model, where the action recognition model is used for recognizing the technical action in the sports video picture;
the identifying unit 33 may be configured to, when receiving a processing request of sports video data, input a sports video picture in the sports video data to be processed to the action identifying model, and obtain a technical action in the sports video picture;
The statistics unit 34 may be configured to perform statistics on technical actions in each sports video picture based on the request information carried in the processing request, so as to obtain a processing result of the sports video data.
With the sports video data processing apparatus provided by the embodiment of the invention, sports video sample data are obtained, the sports video pictures carrying technical action labels contained in the sample data are input into a deep learning model for training, and an action recognition model is constructed, so that the sports video data are processed according to the action recognition model. The sports video data do not need to be counted manually, the technical actions appearing in the data can be recognized automatically, and the processing precision of the sports video data is improved. Compared with the manual statistics or manual editing used to process sports video data in the prior art, the apparatus identifies the technical actions in the sports video data through the constructed action recognition model, so that no manual identification or editing is needed and processing time is saved; moreover, because manual processing is subjective, the processing result generated based on the action recognition model has higher precision, and the flexibility of sports video data processing is improved.
As a further explanation of the processing apparatus for sports video data shown in fig. 3, fig. 4 is a schematic structural diagram of another processing apparatus for sports video data provided according to an embodiment of the present invention, as shown in fig. 4, the apparatus further includes:
The dividing unit 35 may be configured to, after a processing request for sports video data has been received and the sports video pictures in the data to be processed have been input into the action recognition model to obtain the technical actions in them, divide the sports video data captured by the acquisition devices at different angles and distances, so as to obtain sports video data of different acquisition angles and different acquisition distances.
Further, the dividing unit 35 includes:
The identification module 351 may be configured to identify a human body target in the sports video picture by using a human body detection algorithm, and determine coordinates and a size of the human body target in the sports video picture;
The calculating module 352 may be configured to calculate a ratio of the human target in the sports video picture according to the coordinate and the size of the human target in the sports video picture;
The dividing module 353 may be configured to divide the sports video data based on a ratio of the human target in the sports video picture, so as to obtain the sports video data with different acquisition angles and different acquisition distances.
Further, the deep learning model is a deep residual network comprising a multi-layer structure, and the construction unit 32 includes:
The extracting module 321 may be configured to extract features of the sports video picture through the convolution layer of the deep residual network, so as to obtain the feature parameters of the picture for different technical actions;
The dimension reduction module 322 may be configured to perform dimension reduction on the feature parameters of the sports video picture for different technical actions through the pooling layer of the deep residual network, so as to obtain the dimension-reduced feature parameters;
The summarizing module 323 may be configured to aggregate the dimension-reduced feature parameters for the different technical actions through the fully connected layer of the deep residual network, so as to obtain weight values characterizing the different technical actions in the sports video picture;
The generating module 324 may be configured to generate, through the classification layer of the deep residual network, the recognition result of the technical actions in the sports video picture according to the weight values characterizing the different technical actions.
Further, the acquisition unit 31 includes:
The collecting module 311 may be configured to collect the sports video data captured by the acquisition device, and divide the sports video data according to a preset time interval to obtain a plurality of segments of sports video frame data;
The marking module 312 may be configured to select a plurality of sports video pictures from each piece of sports video frame data, and mark the sports video pictures based on technical actions in the sports video pictures, so as to obtain sports video sample data.
Further, when the processing request is to count the occurrence number of the first technical action in the sports video data, the request information carries the first technical action requested to be counted, and the counting unit 34 includes:
The first statistics module 341 may be configured to count the occurrence number of each technical action in the sports video data by performing continuity determination on the technical action identified in the sports video data;
the first determining module 342 may be configured to determine the number of occurrences of the first technical action according to the number of occurrences of each technical action in the sports video data, so as to obtain a processing result of the sports video data;
The first statistics module 341 may be specifically configured to determine, based on the technical actions obtained by recognizing each video picture in the sports video data, that a technical action occurred once in the data if the same technical action is identified in a number of consecutive sports video pictures greater than or equal to a preset number, and to count the number of occurrences of each technical action in the sports video data.
As a further explanation of the processing device for sports video data shown in fig. 3, fig. 5 is a schematic structural diagram of another processing device for sports video data according to an embodiment of the present invention, as shown in fig. 5, when the processing request is to screen a highlight in sports video data, the request information carries a second technical action corresponding to the highlight, and the statistics unit 34 includes:
The screening module 343 may be configured to screen out, based on the collection angle and the collection distance corresponding to the sports video data, the sports video data with the collection angle being forward and the collection distance being within a preset range as the candidate highlight;
a second statistics module 344, configured to count technical actions occurring in the candidate highlight through continuity determination of the technical actions identified in the candidate highlight;
a second determining module 345, configured to determine, according to the technical actions occurring in the candidate highlight, a candidate highlight in which a second technical action occurs, so as to obtain a processing result of sports video data;
The second statistics module 344 may be specifically configured to determine, based on the technical actions obtained by recognizing each video picture in a candidate highlight, that a technical action occurred in the candidate highlight if the same technical action is identified in a number of consecutive sports video pictures greater than or equal to a preset number, and to count the technical actions occurring in the candidate highlight.
It should be noted that, for other corresponding descriptions of each functional unit related to the processing apparatus for sports video data provided in this embodiment, reference may be made to corresponding descriptions in fig. 1 and fig. 2, and details are not repeated here.
Based on the above-described methods shown in fig. 1 and 2, correspondingly, the present embodiment further provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for processing sports video data shown in fig. 1 and 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective implementation scenario of the present application.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 3, fig. 4 and fig. 5, in order to achieve the above objects, the embodiments of the present application further provide a computer device, which may specifically be a personal computer, a server, a network device, etc., where the entity device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the above-described processing method of sports video data as shown in fig. 1 and 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface or a wireless interface (e.g., a Bluetooth or Wi-Fi interface), etc.
It will be appreciated by those skilled in the art that the physical device structure of the sports video data processing apparatus provided in this embodiment is not limiting; the device may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may also include an operating system, a network communication module. An operating system is a program that manages the computer device hardware and software resources described above, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among all components in the storage medium and communication with other hardware and software in the entity equipment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general hardware platform, or by hardware. Compared with the prior art, in the technical solution provided by the application the technical actions in the sports video data are identified through the constructed action recognition model, so that no manual identification or editing is needed and the processing time of the sports data is saved; moreover, because manual processing has a certain subjectivity, the processing result of the sports video data generated based on the action recognition model has higher precision, and the flexibility of sports video data processing is improved.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the application. Those skilled in the art will appreciate that modules in an apparatus in an implementation scenario may be distributed in an apparatus in an implementation scenario according to an implementation scenario description, or that corresponding changes may be located in one or more apparatuses different from the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above-mentioned inventive sequence numbers are merely for description and do not represent advantages or disadvantages of the implementation scenario. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the application.

Claims (8)

1. A method of processing sports video data, the method comprising:
Acquiring sports video sample data, wherein the sports video sample data comprises sports video pictures carrying technical action labels;
Inputting a sports video picture carrying a technical action tag in the sports video sample data into a deep learning model for training, and constructing an action recognition model, wherein the action recognition model is used for recognizing technical actions in the sports video picture, the deep learning model is a deep residual network, and the deep residual network comprises a multi-layer structure, and wherein inputting the sports video picture carrying the technical action tag in the sports video sample data into the deep learning model for training and constructing the action recognition model comprises: extracting features of the sports video picture through a convolution layer of the deep residual network to obtain feature parameters of the sports video picture for different technical actions; performing dimension reduction on the feature parameters of the sports video picture for different technical actions through a pooling layer of the deep residual network to obtain dimension-reduced feature parameters; aggregating the dimension-reduced feature parameters for the different technical actions through a fully connected layer of the deep residual network to obtain weight values characterizing the different technical actions in the sports video picture; and generating, through a classification layer of the deep residual network, a recognition result of the technical actions in the sports video picture according to the weight values characterizing the different technical actions in the sports video picture;
when a processing request of sports video data is received, inputting a sports video picture in the sports video data to be processed into the action recognition model to obtain a technical action in the sports video picture;
Counting technical actions in each sports video picture based on the request information carried in the processing request to obtain a processing result of the sports video data, wherein when the processing request is to count the number of occurrences of a first technical action in the sports video data, based on the technical actions obtained by identifying each video picture in the sports video data, if the same technical action is identified in a number of consecutive sports video pictures greater than or equal to a preset number, it is determined that a technical action occurred once in the sports video data and the number of occurrences of each technical action in the sports video data is counted; and determining the number of occurrences of the first technical action according to the number of occurrences of each technical action in the sports video data, to obtain the processing result of the sports video data.
2. The method according to claim 1, wherein the acquiring sports video sample data specifically comprises:
collecting sports video data captured by an acquisition device, and dividing the sports video data according to a preset time interval to obtain a plurality of segments of sports video frame data;
And selecting a plurality of sports video pictures from each piece of sports video frame data, and marking the sports video pictures based on technical actions in the sports video pictures to obtain sports video sample data.
3. The method according to claim 1, wherein after said inputting, upon receiving a processing request of sports video data, a sports video picture in the sports video data to be processed into the action recognition model to obtain a technical action in the sports video picture, the method further comprises:
dividing the sports video data captured by the acquisition device at different angles and different distances, so as to obtain sports video data of different acquisition angles and different acquisition distances.
4. The method according to claim 3, wherein dividing the sports video data captured by the acquisition device at different angles and different distances comprises:
identifying a human body target in the sports video picture through a human body detection algorithm, and determining the coordinates and size of the human body target in the sports video picture;
calculating the proportion of the sports video picture occupied by the human body target according to the coordinates and size of the human body target in the sports video picture;
and dividing the sports video data based on the proportion of the sports video picture occupied by the human body target, to obtain sports video data of different acquisition angles and different acquisition distances.
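The occupancy-ratio computation in claim 4 reduces to dividing the bounding-box area of the detected human target by the frame area; a small ratio suggests a distant capture. A minimal sketch — the distance-bucket thresholds are illustrative, the patent does not specify them:

```python
def target_ratio(bbox, frame_size):
    """Return the fraction of the frame occupied by the human target
    (bbox as (x, y, w, h), frame as (W, H)), plus a capture-distance
    bucket derived from it. Thresholds are illustrative assumptions."""
    x, y, w, h = bbox
    W, H = frame_size
    ratio = (w * h) / (W * H)
    if ratio >= 0.25:
        bucket = "near"
    elif ratio >= 0.05:
        bucket = "medium"
    else:
        bucket = "far"
    return ratio, bucket

# A detected player occupying a 320x540 box in a 1080p frame
ratio, bucket = target_ratio((100, 50, 320, 540), (1920, 1080))
print(round(ratio, 3), bucket)  # 0.083 medium
```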
5. The method according to any one of claims 1-4, wherein, when the processing request is to screen highlight segments in the sports video data, the request information carries a second technical action corresponding to the highlight segments, and counting the technical actions in each sports video picture based on the request information carried in the processing request to obtain the processing result of the sports video data comprises:
screening out, based on the acquisition angle and acquisition distance corresponding to the sports video data, the sports video data whose acquisition angle is frontal and whose acquisition distance is within a preset range as candidate highlight segments;
performing continuity judgment on the technical actions recognized in the candidate highlight segments, and counting the technical actions appearing in the candidate highlight segments;
and determining, according to the technical actions appearing in the candidate highlight segments, the candidate highlight segments in which the second technical action appears, to obtain the processing result of the sports video data;
wherein performing continuity judgment on the technical actions recognized in the candidate highlight segments and counting the technical actions appearing in the candidate highlight segments specifically comprises:
based on the technical actions recognized in each video picture of a candidate highlight segment, if the technical actions recognized in a number of consecutive sports video pictures greater than or equal to a preset number are the same, determining that the technical action appears once in the candidate highlight segment, and counting the technical actions appearing in the candidate highlight segments.
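Claim 5's screening can be sketched end to end: filter candidate segments by capture angle and occupancy ratio, then keep those in which the second technical action is recognized in enough consecutive frames. Field names, thresholds, and action names below are illustrative, not from the patent:

```python
def screen_highlights(segments, second_action, min_consecutive):
    """Sketch of claim 5: `segments` is a list of dicts with a capture
    'angle', a human-target occupancy 'ratio', and per-frame 'actions'.
    Return the ids of frontal, in-range segments in which
    `second_action` is recognized in >= min_consecutive consecutive
    frames. All field names and thresholds are hypothetical."""
    def occurs(actions):
        run = 0
        for a in actions:
            run = run + 1 if a == second_action else 0
            if run >= min_consecutive:
                return True
        return False

    candidates = [s for s in segments
                  if s["angle"] == "frontal" and 0.05 <= s["ratio"] <= 0.5]
    return [s["id"] for s in candidates if occurs(s["actions"])]

clips = [
    {"id": 0, "angle": "frontal", "ratio": 0.1,
     "actions": ["smash", "smash", "smash", "serve"]},
    {"id": 1, "angle": "side", "ratio": 0.1,      # wrong angle: excluded
     "actions": ["smash", "smash", "smash"]},
    {"id": 2, "angle": "frontal", "ratio": 0.2,   # no sustained smash
     "actions": ["serve", "smash", "lob"]},
]
print(screen_highlights(clips, "smash", min_consecutive=3))  # [0]
```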
6. An apparatus for processing sports video data, the apparatus comprising:
an acquisition unit, configured to acquire sports video sample data, wherein the sports video sample data comprises sports video pictures carrying technical action labels;
a construction unit, configured to input the sports video pictures carrying technical action labels in the sports video sample data into a deep learning model for training and construct an action recognition model, wherein the action recognition model is used for recognizing technical actions in sports video pictures, the deep learning model is a deep residual network, and the deep residual network comprises a multilayer structure; inputting the sports video pictures carrying technical action labels in the sports video sample data into the deep learning model for training and constructing the action recognition model comprises the following steps: extracting features from a sports video picture through a convolution layer of the deep residual network to obtain feature parameters of the sports video picture for different technical actions; performing dimension reduction on the feature parameters of the sports video picture for different technical actions through a pooling layer of the deep residual network to obtain dimension-reduced feature parameters of the sports video picture for different technical actions; aggregating the dimension-reduced feature parameters for different technical actions through a fully connected layer of the deep residual network to obtain weight values representing different technical actions in the sports video picture; and generating a recognition result of the technical action in the sports video picture from the weight values representing different technical actions in the sports video picture through a classification layer of the deep residual network;
a recognition unit, configured to input, when a processing request for sports video data is received, the sports video pictures in the sports video data to be processed into the action recognition model to obtain the technical actions in the sports video pictures;
and a statistics unit, configured to count the technical actions in each sports video picture based on the request information carried in the processing request to obtain a processing result of the sports video data, wherein, when the processing request is to count the number of occurrences of a first technical action in the sports video data, the statistics unit is configured to: based on the technical actions recognized in each video picture of the sports video data, if the technical actions recognized in a number of consecutive sports video pictures greater than or equal to a preset number are the same, determine that the technical action occurs once in the sports video data, and count the number of occurrences of each technical action in the sports video data; and determine the number of occurrences of the first technical action from the number of occurrences of each technical action in the sports video data to obtain the processing result of the sports video data.
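The last two layers described for the construction unit — a fully connected layer producing per-action weight values, and a classification layer turning them into a recognition result — behave like a softmax over class scores followed by an argmax. A minimal sketch; the action names and score values are hypothetical:

```python
import math

def classify_action(class_scores):
    """Sketch of the classification layer: `class_scores` maps each
    technical action to the weight value produced by the fully
    connected layer; softmax converts the scores to probabilities and
    the highest-probability action is the recognition result."""
    exps = {a: math.exp(s) for a, s in class_scores.items()}
    total = sum(exps.values())
    probs = {a: e / total for a, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical weight values for one sports video picture
scores = {"serve": 1.2, "smash": 3.4, "lob": 0.5}
action, probs = classify_action(scores)
print(action)  # smash
```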
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
8. A computer storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201910624923.7A 2019-07-11 2019-07-11 Sports video data processing method and device, computer equipment and computer storage medium Active CN110490064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910624923.7A CN110490064B (en) 2019-07-11 2019-07-11 Sports video data processing method and device, computer equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN110490064A CN110490064A (en) 2019-11-22
CN110490064B true CN110490064B (en) 2024-05-14

Family

ID=68547005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910624923.7A Active CN110490064B (en) 2019-07-11 2019-07-11 Sports video data processing method and device, computer equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN110490064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245206B (en) * 2022-02-23 2022-07-15 阿里巴巴达摩院(杭州)科技有限公司 Video processing method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
US9600717B1 (en) * 2016-02-25 2017-03-21 Zepp Labs, Inc. Real-time single-view action recognition based on key pose analysis for sports videos
CN109886165A (en) * 2019-01-23 2019-06-14 中国科学院重庆绿色智能技术研究院 A kind of action video extraction and classification method based on moving object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant