Background Art
With the development of multimedia, digital television, and Internet technology, a large number of video documents have been produced, and how to manage these documents effectively and browse them quickly has become an urgent problem. Because football attracts wide public attention, the management and retrieval of football video has become an important topic in sports video research. Owing to the particular characteristics of football video, its processing also differs from that of general news video and film video. In brief, football video processing techniques mainly include shot boundary detection, shot classification, slow-motion detection, highlight shot extraction, field scene reconstruction, event detection, and video summary generation.
To better understand football video processing, several related concepts are introduced first:
Shot boundary detection: also called video segmentation; it divides a video stream into a series of meaningful and manageable shots.
Shot boundary coefficient: a feature value that measures the degree of change at a shot transition and can be used for shot boundary detection.
Shot classification: classifying consecutive football video shots according to defined shot types.
Football video shot types: in general, football video shots can be divided into four types according to the field of view (FOV) shown in the video image (the type names vary slightly): main shot, medium shot, close-up shot, and other shots. As shown in Fig. 1, a and b are main shots, c and d are medium shots, e and f are close-up shots, and g and h are other shots.
Video summary: a sequence of video images (with or without audio) extracted from the original video to make browsing easier. The sequence keeps the main content of the original video and omits a large amount of detail, so it gives a concise account of the substance of the original video while being much shorter than the original.
Video summarization is an effective tool for content-based video retrieval and has received wide attention in video research in recent years. The abstract of an article is a high-level overview of its content: it is much shorter than the full article but reflects its main content, so a reader can grasp the essentials of the whole article by reading the abstract. We therefore usually use abstracts for a first pass over documents and then select the articles of interest for close reading. In video retrieval, people likewise want a corresponding summary for each long video. The difference is that a video summary need not be built from text alone; it should make full use of the video and audio information in the video so that it is easier and more intuitive to understand.
Video summaries have considerable practical value and can be applied in many fields such as video surveillance, video on demand, home entertainment, advertising, education, and television production. A good video summarization system can greatly improve the utilization of video data.
Fig. 2 is a schematic diagram of a prior-art method for generating a content-based football video summary. As can be seen from Fig. 2, shot boundary detection, shot classification, and video summary generation are the key techniques for successfully generating a video summary. The prior-art implementations of these three techniques are described in more detail below.
Shot boundary detection:
Because they are simple and easy to implement, frame-difference-based algorithms are currently the most commonly used shot boundary detection algorithms, and researchers have made many improvements to them. For example, to address the characteristics of abrupt and gradual transitions, a second-order frame difference method suited to abrupt transition detection and a window maximum method suited to gradual transition detection have been proposed. The second-order frame difference method computes the difference between adjacent frame differences, which effectively highlights the features of abrupt transitions. The window maximum method computes frame differences between non-adjacent frames, which effectively highlights the features of gradual transitions, and uses the window maximum to locate the center of a gradual transition accurately. Combining the two methods can effectively detect nearly all abrupt and gradual transitions.
Although frame-difference-based shot boundary detection algorithms are simple, most of them do not consider the influence of camera motion and large object motion, and their thresholds must be set manually, so their results are often unsatisfactory.
Another widely studied direction is shot boundary detection based on machine learning theory. These methods do not require manually set thresholds; the thresholds can change dynamically with the given video. Among them, the most studied algorithms with relatively good results are based on the support vector machine (SVM). There are also algorithms that integrate SVM into other algorithms, for example by incorporating SVM into a hidden Markov model (HMM) algorithm, or by using multiple features to segment shots roughly first and then refine them step by step in a coarse-to-fine hierarchy.
Although methods based on machine learning theory can achieve good precision and recall, most of them require a large amount of computation, and selecting the training set is also quite complicated. As a result, these methods tend to reduce detection efficiency while meeting accuracy requirements, and sometimes detection even fails.
Besides the two kinds of methods above, model-based shot boundary detection algorithms are also frequently used. Such an algorithm has two parts, modeling and shot detection. First, a shot transition model is built by computing the color and luminance distributions before and after a shot transition. Shot detection is then performed by treating a video segment as a continuous stream of frames and using the Reynolds transport theorem to analyze the flow changes within a predefined control volume. For example, a fade-in/fade-out shot detection algorithm has been proposed that models the gradual transition principle. Because it considers only a single color component, the amount of data to process is small, its efficiency is high, and its decisions are accurate. It can also be combined effectively with histogram-based abrupt transition detection to identify abrupt and gradual transitions at the same time, which makes it quite practical.
However, the detection quality of model-based shot boundary detection algorithms depends directly on the model that is built, which is a limitation. In addition, building a suitable shot transition model requires domain knowledge, and the model must be analyzed and tested over a long period.
Shot classification:
A commonly used prior-art shot classification method takes the proportion of the whole image occupied by the grass field as the feature, compares the computed proportion with preset thresholds, and distinguishes long shots, medium shots, and other shot types according to the comparison result.
Because this method does not provide a suitable way of choosing the thresholds, the whole scheme is difficult to implement. When a medium shot and a long shot have similar color proportions, the classification result is also unsatisfactory. Moreover, the method processes the entire image, so the computational cost is high and real-time processing is difficult.
The prior art has also proposed a football video shot classification method that divides the video frame by the golden section. This method divides the whole video frame into nine regions of different sizes, as shown in Fig. 3, and, based on the color differences between regions, uses a Bayes classifier to classify long shots, medium shots, close-up shots, and out-of-field shots. However, this method also suffers from excessive computation and cannot achieve real-time processing.
Other methods include combining the proportion occupied by the grass field with a search for the largest non-grass rectangular region, and using SVM and HMM theory to classify football video semantically. Fig. 4 is a schematic diagram of a football video classification method using SVM. Although this method improves accuracy, the algorithm is complicated to implement, and a training set must be obtained before shot classification, which increases the computational cost.
Video summarization:
One existing method for generating a football video summary first detects the field region, then uses camera motion to locate the beginning and end of particular sports events, determines the specific type of event, and finally forms a classified summary in combination with the audio signals of the different event types.
However, locating special events by camera motion is complicated in practice, and even when the audio signal is also used, misjudgments are unavoidable.
Embodiment
The idea behind the embodiments of the invention is as follows: receive an input football video stream and perform shot boundary detection on it using a shot boundary detection method based on the moving-average window frame difference to obtain a shot set; classify the shots in the obtained shot set using a shot classification method based on sub-window regions; and perform highlight shot detection on the classified shot set and output the detected highlight shots as the video summary.
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 5 is a schematic structural diagram of the device of the invention. As shown in Fig. 5, the device mainly comprises a shot boundary detection module 501, a shot classification module 502, and a highlight shot detection module 503.
The shot boundary detection module 501 is configured to receive the football video stream, perform shot boundary detection on it using the shot boundary detection method based on the moving-average window frame difference, and send the resulting shot set to the shot classification module 502.
The shot classification module 502 is configured to receive the shot set from the shot boundary detection module 501, classify the shots in the set using the shot classification method based on sub-window regions, and send the classified shot set to the highlight shot detection module 503.
The highlight shot detection module 503 is configured to receive the classified shot set from the shot classification module 502, perform highlight shot detection on it, and output the detected highlight shots as the video summary.
Fig. 6 is a schematic structural diagram of the shot boundary detection module 501 of the invention. As shown in Fig. 6, the shot boundary detection module 501 comprises a thumbnail generation module 601, a frame difference calculation module 602, a feature value calculation module 603, and a shot transition type detection module 604, and may further comprise a shot screening module 605.
The thumbnail generation module 601 is configured to receive the football video stream, compute a thumbnail of each frame in the video stream, and send the generated thumbnails to the frame difference calculation module 602.
The frame difference calculation module 602 is configured to receive the thumbnails from the thumbnail generation module 601, calculate the frame differences of the thumbnails, and send the results to the feature value calculation module 603.
The feature value calculation module 603 is configured to receive the frame differences from the frame difference calculation module 602, calculate the moving-average window frame difference and the shot boundary coefficient from the frame differences, calculate the feature values from the frame difference, the moving-average window frame difference, and the shot boundary coefficient, and send the calculated feature values to the shot transition type detection module 604.
The shot transition type detection module 604 is configured to receive the feature values from the feature value calculation module 603, compare the feature values with a threshold calculated from the feature values, generate an abrupt boundary set and a candidate gradual boundary set according to the comparison result, and output them to the shot screening module 605.
The shot transition type detection module 604 may be further configured to receive the feature values and the shot boundary coefficient from the feature value calculation module 603, use them as the input vector, detect abrupt boundaries by the self-organizing map method, and output the generated abrupt boundary set to the shot screening module 605.
The shot screening module 605 is configured to receive the abrupt boundary set and the candidate gradual boundary set from the shot transition type detection module 604, screen the candidate gradual boundary set to remove falsely detected abrupt boundaries, further localize the candidate gradual boundaries to obtain the gradual boundary set, and output the final abrupt boundary set and gradual boundary set to the shot classification module 502.
Fig. 7 is a schematic structural diagram of the shot classification module 502 of the invention. As shown in Fig. 7, the shot classification module 502 comprises a key frame reading module 701, a sub-window locating module 702, a sub-window pixel ratio calculation module 703, and a shot type determination module 704.
The key frame reading module 701 is configured to receive the shot set produced by shot boundary detection, calculate the position of the key frame image from the start and end frame numbers of each shot, and send the key frame image to the sub-window locating module 702.
The sub-window locating module 702 is configured to receive the key frame image from the key frame reading module 701, locate sub-window 1, sub-window 2, and sub-window 3 according to predefined sub-window locating rules, and send the located image to the sub-window pixel ratio calculation module 703.
The sub-window pixel ratio calculation module 703 is configured to receive the located image from the sub-window locating module 702, calculate the proportion of field-color pixels in sub-windows 1, 2, and 3 and the proportion of edge pixels in sub-window 1, and send the results to the shot type determination module 704.
The shot type determination module 704 is configured to receive the results from the sub-window pixel ratio calculation module 703, determine the type of each shot from those results, label the shot with the corresponding type, and output it.
The sub-window pixel ratio calculation module 703 is further configured to convert the sub-window regions from the red, green, blue (RGB) space to the hue, saturation, value (HSV) space and calculate the proportion of field-color pixels in sub-windows 1, 2, and 3 from the HSV components.
Fig. 8 is a schematic structural diagram of the highlight shot detection module 503 of the invention. As shown in Fig. 8, the highlight shot detection module 503 comprises a position detection module 801, a distance calculation module 802, an audio extraction module 803, and a highlight shot judgment module 804.
The position detection module 801 is configured to receive the classified football video shots, detect the goal area position and the ball position in each frame, and send the detection results to the distance calculation module 802.
The distance calculation module 802 is configured to receive the goal area position and ball position detection results from the position detection module 801, calculate the distance between the two positions, and send the result to the highlight shot judgment module 804.
The audio extraction module 803 is configured to receive the football video stream, extract the audio from it, and send the audio to the highlight shot judgment module 804.
The highlight shot judgment module 804 is configured to receive the calculated distance between the goal area and the ball position from the distance calculation module 802 and the audio information from the audio extraction module 803, judge from them whether the current image content meets the highlight shot requirements, and, if it does, output the image or shot as the video summary.
In general, a highlight shot is a relatively short shot that occurs near the goal area and may involve scoring, such as a shot on goal or a goal.
Based on the device described above, Fig. 9 is an overall flowchart of the method of the invention. As shown in Fig. 9, the method comprises the following steps:
Step 901: Receive the input football video stream and perform shot boundary detection on it using the shot boundary detection method based on the moving-average window frame difference to obtain a shot set.
The shot boundary detection method based on the moving-average window frame difference in this step is as follows: scale each frame in the input football video stream to obtain a thumbnail of each frame; calculate the frame difference, the moving-average window frame difference, and the shot boundary coefficient of the thumbnails, and calculate the feature values from the frame difference, the moving-average window frame difference, and the shot boundary coefficient; detect abrupt boundaries and gradual boundaries according to the feature values, or according to the feature values and the shot boundary coefficient, and generate the abrupt boundary set and the gradual boundary set.
Step 902: Classify the shots in the obtained shot set using the shot classification method based on sub-window regions.
The classified shots fall into four types: main shot, medium shot, close-up shot, and other shots.
The shot classification method based on sub-window regions in this step is as follows: receive the input shot set produced by shot boundary detection and obtain the key frame of each shot; locate sub-window 1, sub-window 2, and sub-window 3 in the key frame according to the predefined sub-window locating rules; count the proportion of field-color pixels and/or the proportion of edge pixels in each sub-window, and determine the shot type from these proportions.
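The field-color pixel proportion used in step 902 can be illustrated with a minimal Python sketch. The image layout (rows of 8-bit RGB tuples), the `(top, left, height, width)` window format, and the hue, saturation, and value limits for "field color" are all assumptions made for illustration; the text does not specify them, and the sub-window locating rules themselves are not modeled here.

```python
import colorsys

def field_pixel_ratio(image, window, hue_range=(0.17, 0.45),
                      min_sat=0.2, min_val=0.2):
    """Fraction of pixels in a sub-window whose HSV color looks like grass.

    image:  list of rows, each row a list of (r, g, b) tuples in 0..255.
    window: (top, left, height, width), assumed to lie inside the image.
    The hue/saturation/value limits are hypothetical, not from the source.
    """
    top, left, height, width = window
    field = 0
    for row in image[top:top + height]:
        for r, g, b in row[left:left + width]:
            # colorsys works on floats in [0, 1] and returns hue in [0, 1).
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            if hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val:
                field += 1
    return field / float(height * width)
```

A classifier along the lines of the text would compute this ratio for each of the three sub-windows of a key frame and compare the results with its own decision rules.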
Step 903: Perform highlight shot detection on the classified shot set and output the detected highlight shots as the video summary.
The invention determines whether a highlight shot has occurred by judging whether the distance between the goal area and the ball position is less than a preset threshold, supplemented by audio information.
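As a rough illustration of the decision in step 903, the following Python sketch combines the distance test with an audio cue. The text does not say how the audio information is used, so treating it as a scalar energy that must reach a minimum level is purely an assumption, as are the parameter names.

```python
import math

def is_highlight(goal_pos, ball_pos, audio_energy,
                 max_distance, min_audio_energy):
    """Return True if the frame is a highlight candidate.

    goal_pos, ball_pos: (x, y) image coordinates of the goal area and the ball.
    audio_energy:       hypothetical scalar audio feature for the same moment.
    Requiring both conditions is an assumption; the source only says the
    distance test is supplemented by audio information.
    """
    distance = math.hypot(goal_pos[0] - ball_pos[0], goal_pos[1] - ball_pos[1])
    return distance < max_distance and audio_energy >= min_audio_energy
```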
The method of the invention is described in detail below through a preferred embodiment.
Unlike prior-art shot boundary detection methods, the invention uses a shot boundary detection method based on the moving-average window frame difference. Fig. 10 is a flowchart of this method. As shown in Fig. 10, it comprises the following steps:
Step 1001: Scale each frame in the input football video stream to obtain a thumbnail of each frame.
To reduce the amount of computation, each frame is first scaled to a thumbnail before shot boundary detection. Specifically, an interpolating function is constructed with the original image pixels as sampling points, the resampling points required by the scaling are interpolated with this function to obtain their color values, and the scaled image pixels obtained in this way form the thumbnail. For example, one-dimensional interpolation can be performed separately in the row and column directions. Because the nodes chosen when constructing the interpolating function are equidistant and are generally taken in the order $x_0, x_1, x_2, \ldots$, the $n$-th order Newton interpolation formula can be used to interpolate the image:
$$N_n(x) = z_0 + a_1 t + a_2 t(t-1) + a_3 t(t-1)(t-2) + a_4 t(t-1)(t-2)(t-3) + \cdots + a_n t(t-1)(t-2)\cdots(t-n+1)$$
where $t = x - x_0$, the coefficients $a_1, a_2, a_3, \ldots$ are computed from the sample values, and $z_0, z_1, z_2, z_3, z_4, \ldots$ are the red/green component values at the sampling points $x_0, x_1, x_2, x_3, x_4, \ldots$ of a given row or column.
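The interpolation step can be sketched in Python as follows. The coefficient formula is not reproduced in the text, so this sketch assumes the standard Newton forward-difference form for unit-spaced nodes, $a_k = \Delta^k z_0 / k!$; the function names are illustrative only.

```python
from math import factorial

def newton_coefficients(z):
    """Return [z0, a1, ..., an], assuming a_k = (k-th forward difference at z0) / k!."""
    coeffs = [float(z[0])]
    diffs = [float(v) for v in z]
    for k in range(1, len(z)):
        # Each pass turns the (k-1)-th differences into the k-th differences.
        diffs = [diffs[j + 1] - diffs[j] for j in range(len(diffs) - 1)]
        coeffs.append(diffs[0] / factorial(k))
    return coeffs

def newton_interpolate(z, t):
    """Evaluate N_n at offset t = x - x0, with samples z at unit spacing."""
    result = 0.0
    term = 1.0  # running product t(t-1)...(t-k+1)
    for k, a in enumerate(newton_coefficients(z)):
        result += a * term
        term *= (t - k)
    return result
```

For samples of a quadratic such as 0, 1, 4, 9, 16, evaluating at t = 2.5 gives 6.25, which matches the underlying function. A thumbnail would be built by applying this along rows and then along columns for each color component.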
Step 1002: Calculate the frame difference, the moving-average window frame difference, and the shot boundary coefficient of the thumbnails, and calculate the feature values from the frame difference, the moving-average window frame difference, and the shot boundary coefficient.
In this step, the frame difference of the thumbnails is calculated from the color histogram difference in the HSV space. Because each component in the HSV space takes continuous values, the components must be quantized before the frame difference is calculated. Following common practice in image processing, the invention quantizes each component to 256 levels.
Meanwhile, to reduce the influence of camera motion or large object motion, each frame is divided into non-uniform blocks and Gaussian-weighted before the HSV histogram difference is calculated. Fig. 11 is a schematic diagram of the non-uniform blocking and Gaussian weighting of a frame, in which W denotes the width of the frame and H denotes its height. This approach is adopted because the main content of real video is usually concentrated in the central part, and weighting emphasizes that content. After a frame is divided into blocks, the HSV histogram difference of corresponding blocks is calculated and each block is then weighted. After this processing, the frame difference between frame i and frame j becomes:
where $H_h(i, m, k)$, $H_s(i, m, k)$, and $H_v(i, m, k)$ denote the histograms of the h, s, and v components in block m of frame i, and $H_h(j, m, k)$, $H_s(j, m, k)$, and $H_v(j, m, k)$ denote the histograms of the h, s, and v components in block m of frame j.
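A block-weighted histogram frame difference of this kind might look like the Python sketch below. The exact formula is not reproduced in the text, so the sketch assumes a sum of absolute bin differences over the h, s, and v histograms of each block, scaled by a per-block weight; the data layout and the weights are hypothetical.

```python
def block_hist_diff(hist_a, hist_b):
    """Sum of absolute bin differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

def frame_difference(frame_i, frame_j, weights):
    """Weighted HSV histogram difference D(i, j) between two frames.

    frame_i, frame_j: one dict per block, with keys "h", "s", "v" mapping to
                      histograms (lists of bin counts, e.g. 256 bins each).
    weights:          one weight per block, e.g. Gaussian weights that favor
                      the central blocks of the frame.
    """
    total = 0.0
    for block_i, block_j, w in zip(frame_i, frame_j, weights):
        block_diff = sum(block_hist_diff(block_i[c], block_j[c])
                         for c in ("h", "s", "v"))
        total += w * block_diff
    return total
```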
After the frame difference has been calculated, the moving-average window frame difference is calculated next. In a real video sequence, the frame difference changes greatly when a shot transition occurs, while within a shot it generally changes little. Let the frame difference at a shot transition be $D_{Switch}$ and the frame difference within a shot be $D_{Inner}$; then $D_{Switch} \gg D_{Inner}$. Let the width of the sliding window be 2N+1. The moving-average window frame difference of frame i is defined as:
where D(i, j) denotes the frame difference between frame i and frame j. Fig. 12 is a schematic diagram of the calculation of the moving-average window frame difference. Taking N=3 as an example, the moving-average window frame difference is calculated with the above formula for each of the 6 frames in the window around frame i, giving Fsub(i+1), Fsub(i+2), Fsub(i+3), Fsub(i-1), Fsub(i-2), and Fsub(i-3). Suppose a shot transition occurs at frame i. Because the frame differences within a shot are very small, the frame differences between a frame in the preceding shot and the transition frame or any of the frames immediately following it differ very little from one another and are likewise taken to be $D_{Switch}$. It follows that, when a shot transition occurs, the moving-average window frame differences of the 6 frames around the middle frame of the window approximately satisfy the ratio 1:2:3:3:2:1. A similar conclusion holds for other values of N.
Based on the moving-average window frame difference above, the shot boundary coefficient SBC(i) is then obtained:
Here Vec is the ideal approximate ratio of the moving-average window frame differences of the frames in the sliding window other than the middle frame. As stated above, this ratio is 1:2:3:3:2:1 when N=3, so Vec is the 6-dimensional vector Vec = (1, 2, 3, 3, 2, 1). As the formula shows, when a shot transition occurs the shot boundary coefficient approaches 1, that is, SBC(i) → 1. To widen the difference further, SBC(i) can be adjusted as: SBC'(i) = exp[-10 × (1 - SBC(i))].
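The behavior described here can be reproduced with a short Python sketch. The SBC formula itself is not reproduced in the text; the sketch assumes it is the normalized correlation (cosine similarity) between the window of moving-average frame differences and Vec, which is one formula that tends to 1 when the window matches the ideal ratio. The adjustment SBC'(i) follows the expression given above.

```python
import math

def shot_boundary_coefficient(fsub_neighbors, vec):
    """Assumed SBC: cosine similarity between the Fsub window and Vec.

    fsub_neighbors: Fsub values of the 2N frames around frame i,
                    ordered from i-N to i+N with frame i itself left out.
    vec:            ideal ratio vector, e.g. (1, 2, 3, 3, 2, 1) for N = 3.
    """
    dot = sum(f * v for f, v in zip(fsub_neighbors, vec))
    norm_f = math.sqrt(sum(f * f for f in fsub_neighbors))
    norm_v = math.sqrt(sum(v * v for v in vec))
    if norm_f == 0.0 or norm_v == 0.0:
        return 0.0  # no signal in the window: treat as no boundary
    return dot / (norm_f * norm_v)

def adjusted_sbc(sbc):
    """SBC'(i) = exp(-10 * (1 - SBC(i))), as given in the text."""
    return math.exp(-10.0 * (1.0 - sbc))
```

Under this assumption a window proportional to (1, 2, 3, 3, 2, 1) gives SBC = 1 and SBC' = 1, while a flat window gives a smaller SBC that the exponential pushes well below 1.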
After the frame difference, the moving-average window frame difference, and the shot boundary coefficient have been calculated in turn, the feature values can be calculated from them. The invention uses two feature values, $D^{(1)}(i, i+1)$ and $D^{(2)}(i, i+1)$, where:
$$D^{(1)}(i, i+1) = D(i, i+1) \times \mathrm{SBC}'(i)$$
$$D^{(2)}(i, i+1) = \mathrm{Fsub}(i) \times (1 - \mathrm{SBC}'(i))$$
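The two feature values translate directly into code. The Python sketch below simply evaluates the two expressions above; the argument names are illustrative.

```python
def feature_values(d_adjacent, fsub_i, sbc_adj):
    """Return (D1, D2) for frame i.

    d_adjacent: frame difference D(i, i+1)
    fsub_i:     moving-average window frame difference Fsub(i)
    sbc_adj:    adjusted shot boundary coefficient SBC'(i), in (0, 1]
    """
    d1 = d_adjacent * sbc_adj          # emphasized at abrupt transitions
    d2 = fsub_i * (1.0 - sbc_adj)      # emphasized away from abrupt transitions
    return d1, d2
```

When SBC'(i) is close to 1, D1 keeps almost the full frame difference and D2 is suppressed, which is why D1 is used for abrupt transitions and D2 for gradual ones.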
Step 1003: Detect abrupt boundaries and gradual boundaries according to the feature values, and generate the abrupt boundary set and the gradual boundary set.
The feature values $D^{(1)}(i, i+1)$ and $D^{(2)}(i, i+1)$ calculated in step 1002 are used to detect abrupt shots and gradual shots, respectively. Equivalently, the detection can be understood as using sliding-window Gaussian models of length L with two different sets of parameters.
A threshold T is set as T = μ + rσ, where μ is the mean of the selected feature value within the sliding window, σ is its standard deviation within the sliding window, and r is a constant. To detect abrupt shots, the threshold T is calculated from the mean and standard deviation of $D^{(1)}(i, i+1)$, and $D^{(1)}(i, i+1)$ is compared with T; if it is greater than T, an abrupt shot boundary is considered found. Gradual shot boundaries can be detected in the same way by replacing $D^{(1)}(i, i+1)$ with $D^{(2)}(i, i+1)$. All detected abrupt boundaries and gradual boundaries are stored in order, forming the abrupt boundary set $B_{Cut} = \{B_1, B_2, \ldots, B_n\}$ and the gradual boundary set $B_{Grad} = \{B_1, B_2, \ldots, B_n\}$.
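The adaptive threshold test can be sketched in Python as below. How the window is placed relative to the current frame is not stated in the text; this sketch assumes a window centered on the frame, clipped at the ends of the sequence, and uses the population standard deviation.

```python
from statistics import mean, pstdev

def detect_boundaries(features, window_len, r):
    """Return indices i where features[i] > mu + r * sigma over a local window.

    features:   sequence of D1 (or D2) values, one per frame pair.
    window_len: sliding window length L.
    r:          threshold constant.
    Centering the window on i is an assumption of this sketch.
    """
    half = window_len // 2
    boundaries = []
    for i, value in enumerate(features):
        lo = max(0, i - half)
        hi = min(len(features), i + half + 1)
        window = features[lo:hi]
        threshold = mean(window) + r * pstdev(window)
        if value > threshold:
            boundaries.append(i)
    return boundaries
```

Running it once on the D1 sequence and once on the D2 sequence, each with its own L and r, yields the preliminary abrupt and gradual boundary sets.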
The abrupt boundary set and gradual boundary set obtained above are only preliminary results and need further screening to reduce false detections. Experience shows that the preliminary detection of abrupt boundaries is fairly accurate, so only the gradual boundary set needs further screening. For convenience, the gradual boundary set obtained by preliminary detection is called the candidate gradual boundary set. The abrupt shots mixed into $B_{Grad}$ are removed by taking the relative complement of $B_{Cut}$ in $B_{Grad}$, which gives the new candidate gradual boundary set $B'_{Grad} = B_{Grad} - B_{Cut}$. The new candidate set $B'_{Grad}$ is then localized further. Because $D^{(2)}(i, i+1)$ is fairly smooth, the second-order difference can be computed to the left and to the right starting from the peak position, and the start and end positions of a gradual transition can be determined more accurately from the sign change of the second-order difference and from three consecutive second-order difference values being less than a predefined threshold. Taking a relative complement and computing second-order differences are common knowledge in the art and are not described in detail here.
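The two building blocks named in this screening step are simple enough to show in a few lines of Python. This is only a sketch of the relative complement and of the discrete second-order difference; the full start and end localization rule (sign change plus three consecutive small values) is not implemented here.

```python
def screen_gradual_candidates(b_grad, b_cut):
    """Return B'_Grad = B_Grad - B_Cut, keeping the original order."""
    cut = set(b_cut)
    return [b for b in b_grad if b not in cut]

def second_differences(values):
    """Discrete second-order difference v[k+1] - 2*v[k] + v[k-1] for interior k."""
    return [values[k + 1] - 2 * values[k] + values[k - 1]
            for k in range(1, len(values) - 1)]
```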
Repeated tests show that detection works best when, for abrupt boundary detection, the coefficient r in the threshold T is between 4 and 5 and the window length L is 25, and, for gradual boundary detection, the coefficient r is between 1 and 2 and the window length L is 35.
To test the effect of the shot boundary detection method of the invention, videos of various kinds were tested in experiments. Table 1 gives the statistics of boundary detection on the different types of video using the method of the invention.
Table 1
Except the method for introducing above, in actual applications, can also adopt self-organization mapping (SOM) method to realize that sudden change shot boundary of the present invention detects, this method does not need to be provided with threshold value, detects the sudden change shot boundary fully adaptively.
SOM is a kind of nothing supervision competition neural network with self-learning capability that Finn Kohonen proposes, Kohonen thinks, neural network is when receiving extraneous input, will be divided into different zones, different zones has different response characteristics to different patterns, be that different neurons responds signal excitation of different nature in the best way, thereby form the ordered graph on a kind of topological meaning.In this network, output node extensively links to each other with other node of its neighborhood, and the phase mutual excitation.Be connected by weight vector between input node and the output node,, constantly adjust weight vector, make that when stablizing all nodes of each neighborhood have similar output to certain input by certain rule.
Figure 12A is a schematic diagram of the SOM network structure. As shown in the figure, p denotes the input vector, which in the embodiment of the present invention is the two-dimensional vector formed by the shot boundary coefficient SBC'(i) and the feature value $D^{(1)}(i, i+1)$; W denotes the weight vector, whose initial value can be set to a random number greater than 0 and less than 1; the transfer function of the competitive layer is a = compet(n).
The values SBC'(i) and $D^{(1)}(i, i+1)$ computed for each frame are normalized and used as input vectors, that is, as samples fed into the SOM network. The SOM network calculates the Euclidean distance between each input vector and the weight vector, and the image corresponding to the input vector with the smallest distance is the first detected abrupt boundary. The SOM network then adjusts the weight vector according to the learning rule:
The preceding computation is then repeated, that is, abrupt boundaries are detected under the new weight vector. Each weight vector computation corresponds to a learning rate; when a learning rate is smaller than all previous learning rates, the detection process ends, and the abrupt boundaries detected so far form the abrupt boundary set.
The SOM method used above is prior art and is not described further here.
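By way of illustration only, the following Python sketch shows one pass of the SOM-based procedure described above: the two-dimensional feature vectors are normalized, the input vector closest to the weight vector in Euclidean distance is selected, and the weight vector is then moved toward it. The update w + lr·(p − w) is the standard Kohonen rule and is an assumption here, since the learning rule of this embodiment is not reproduced in the text. The function names, the learning rate, the sample values and the initial weight (chosen only to make the example deterministic) are likewise hypothetical.

```python
import math

def normalize(samples):
    """Min-max normalize each of the two feature columns to [0, 1]."""
    cols = list(zip(*samples))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(s, lo, hi))
        for s in samples
    ]

def nearest_sample(samples, weight):
    """Index of the input vector with the smallest Euclidean distance to weight."""
    return min(range(len(samples)), key=lambda i: math.dist(samples[i], weight))

def kohonen_update(weight, sample, lr):
    """Standard Kohonen rule (assumed): move the weight toward the sample."""
    return tuple(w + lr * (p - w) for w, p in zip(weight, sample))

# Hypothetical (SBC'(i), D1(i, i+1)) pairs for five frames; frame 2 is an outlier.
features = [(0.1, 0.2), (0.2, 0.1), (9.0, 8.5), (0.15, 0.25), (0.1, 0.1)]
inputs = normalize(features)
weight = (0.9, 0.9)                      # hypothetical initial weight in (0, 1)
winner = nearest_sample(inputs, weight)  # frame whose vector is closest to the weight
weight = kohonen_update(weight, inputs[winner], lr=0.5)
```

In a full implementation this selection and update would be repeated until the stopping condition on the learning rate described above is met.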
Shot classification is the basis for video summary generation and a precondition for fast video browsing. To address the insufficient accuracy and excessive computation of prior-art shot classification methods, the present invention proposes a shot classification method based on subwindow regions, which reduces the amount of computation, improves computational efficiency, and maintains high accuracy. Figure 13 is a flowchart of the subwindow-based shot classification method of the present invention. As shown in Figure 13, the method comprises the following steps:
Step 1301: Receive the input set of shots produced by shot boundary detection, and obtain the key frame of each shot.
Each shot carries a start frame number $f_{start}$ and an end frame number $f_{end}$. In this step the key frame is obtained by dividing the sum of the start and end frame numbers by 2, i.e. $f_{key} = (f_{start} + f_{end})/2$.
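As a minimal sketch of the key frame rule above, the following Python snippet takes the middle frame of each shot. The use of integer division (so that the result is a valid frame number) and the sample frame numbers are assumptions; the text does not state how an odd sum is rounded.

```python
def key_frame(f_start, f_end):
    """Middle frame number of a shot; integer division is an assumed rounding rule."""
    return (f_start + f_end) // 2

shots = [(0, 120), (121, 305)]  # hypothetical (start, end) frame numbers
key_frames = [key_frame(s, e) for s, e in shots]
```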
Step 1302: Locate subwindow 1, subwindow 2 and subwindow 3 in the key frame according to predefined subwindow positioning rules.
The shot classification method of the present invention distinguishes shot types by the proportion of field-color pixels in the image. Because different shot types contain different content, the proportion of field-color pixels in the whole image differs markedly between them. For example, in main shots and in middle shots that have the field as background, the proportion of field-color pixels is relatively high, whereas in close-up shots and other shots, such as audience shots, it is relatively low. At certain specific positions this difference is even more pronounced.
Experiments show that the variation in the proportion of field-color pixels across shot types is most pronounced in the rectangular regions shown in Figure 14. In the figure, the region enclosed by the dashed line is subwindow 1, the region enclosed by the solid line is subwindow 2, and the region enclosed by the bold solid line is subwindow 3. Computing the field-color ratio over these rectangular regions both reduces computational complexity and preserves the color distribution characteristics of the whole frame.
Main shots differ from non-main shots in that field-color pixels occupy a large proportion of the lower half of a main-shot frame, while non-field-color pixels such as players, the ball and the referee are scattered and occupy a small proportion. In the subwindow 1 region shown in Figure 14, once non-field areas such as the stands, the goal and the coaches' bench are excluded, essentially the whole of subwindow 1 is field area. The proportion of field-color pixels computed in this window differs clearly between the main shot type and the non-main shot types, so subwindow 1 can be used to distinguish them.
However, if main and non-main shot types are distinguished by subwindow 1 alone, some shots that are in fact main shots will inevitably be misclassified as non-main shots. Subwindow 2 is therefore introduced: by computing the proportion of field-color pixels in subwindow 2, main shots that were mistaken for non-main shots are separated back out of the non-main shot types.
To distinguish middle shots from non-middle shots among the non-main shot types, a characteristic of middle-shot frames is used: compared with non-middle shot types, field-color pixels occupy a larger proportion of the bottom region of a middle-shot frame. Based on test results, the subwindow 3 region shown in Figure 14 is therefore selected to distinguish middle shots from non-middle shots.
To distinguish close-up shots from other shots among the non-middle shot types, note that objects in other shots are of finer granularity, so a window of the same size contains more edge information. The image is therefore first binarized with the Canny operator, and the proportion of edge pixels is then counted to identify the other shot type. Experiments show that the subwindow 1 region shown in Figure 14 gives good results.
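The edge-pixel ratio described above can be sketched in a few lines of Python. To keep the example free of external libraries, a simple forward-difference gradient threshold stands in for the Canny operator named in the text; this substitution, the function name and the threshold value are all assumptions made for illustration.

```python
def edge_ratio(gray, threshold=50):
    """Fraction of pixels whose forward-difference gradient exceeds threshold.

    gray is a list of rows of 0-255 intensities. The gradient test is a
    simplified stand-in for Canny edge detection, not the Canny operator itself.
    """
    h, w = len(gray), len(gray[0])
    edges = total = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]   # horizontal intensity change
            gy = gray[y + 1][x] - gray[y][x]   # vertical intensity change
            total += 1
            if abs(gx) + abs(gy) > threshold:
                edges += 1
    return edges / total if total else 0.0

flat = [[100] * 6 for _ in range(6)]        # uniform region, e.g. a close-up of a shirt
stripes = [[0, 255] * 3 for _ in range(6)]  # finely textured region, e.g. a crowd
```

A uniform window yields a ratio near zero, while a finely textured window yields a high ratio, which is the property the text relies on to separate close-up shots from other shots.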
Step 1303: Compute the proportion of field-color pixels and/or the proportion of edge pixels in each subwindow, and determine the shot type from these proportions.
Once the position of each subwindow has been determined, the proportion of field-color pixels in each subwindow is computed. The present invention computes this proportion in the HSV space. HSV has two advantages: the human eye can perceive changes in each color component of this space, namely h, s and v, independently; and the Euclidean distance between color triples (h, s, v) in this space is linearly related to the color difference perceived by the human eye, so HSV is a color model that matches human visual perception. Since a football field consists almost entirely of field-color pixels, the hue component can be used to compute the proportion of field-color pixels. Figure 15 is a schematic diagram of the hue distribution. As shown in Figure 15, when computing the proportion of field-color pixels the present invention sets the range of the hue component H to between 75° and 105°.
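As an illustrative sketch of the hue-based count described above, the following Python code converts RGB pixels to HSV with the standard library and reports the fraction whose hue falls within 75° to 105°. It assumes the hue angle follows the usual 0° to 360° convention of `colorsys` (scaled from its 0 to 1 output); the function name and the sample pixels are hypothetical.

```python
import colorsys

def field_pixel_ratio(pixels, hue_min=75.0, hue_max=105.0):
    """Fraction of (r, g, b) pixels (0-255 each) whose hue lies in [hue_min, hue_max] degrees."""
    if not pixels:
        return 0.0
    count = 0
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if hue_min <= h * 360.0 <= hue_max:
            count += 1
    return count / len(pixels)

grass = (128, 255, 0)  # yellow-green, hue of roughly 90 degrees
red = (255, 0, 0)      # hue of 0 degrees, outside the range
window = [grass, grass, grass, red]  # hypothetical subwindow contents
ratio = field_pixel_ratio(window)
```

Only the hue component is examined, in line with the text; a practical implementation might also bound saturation and value to reject gray or very dark pixels, but that is not stated in the source.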
Figure 16 is a flowchart of the method of the present invention for determining the shot type from the proportion of field-color pixels or edge pixels in each subwindow region. As shown in Figure 16, the method comprises the following steps:
Step 1601: Compute the proportion R1 of field-color pixels in subwindow 1.
Step 1602: Determine whether R1 is greater than or equal to the preset threshold T1. If so, the shot is of the main shot type; label it as a main shot and output it. Otherwise, go to step 1603.
Step 1603: Given that R1 is less than T1, determine whether R1 is greater than or equal to T1'. If so, go to step 1604; otherwise, go to step 1606.
Step 1604: Compute the proportion R2 of field-color pixels in subwindow 2.
Step 1605: Determine whether R2 is greater than or equal to the preset threshold T2. If so, the shot is of the main shot type; label it as a main shot and output it. Otherwise, go to step 1606.
Step 1606: Compute the proportion R3 of field-color pixels in subwindow 3.
Step 1607: Determine whether R3 is greater than or equal to the preset threshold T3. If so, the shot is of the middle shot type; label it as a middle shot and output it. Otherwise, go to step 1608.
Step 1608: Binarize the subwindow 1 region and compute the proportion R4 of edge pixels in the binarized subwindow 1.
Step 1609: Determine whether R4 is greater than or equal to the preset threshold T4. If so, the shot is of the other shot type; label it as an other shot and output it. Otherwise, the shot is of the close-up shot type; label it as a close-up shot and output it.
Experiments show that when the above thresholds are set to T1 = 0.95, T1' = 0.8, T2 = 0.9, T3 = 0.7 and T4 = 0.25, the accuracy of football video shot classification can exceed 90%.
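The decision cascade of steps 1601 to 1609 can be sketched in Python as follows, using the threshold values reported above. For brevity the sketch assumes all four ratios have already been computed and are passed in together, whereas the flowchart computes R2, R3 and R4 only when they are needed; the function name and the label strings are hypothetical.

```python
T1, T1_PRIME, T2, T3, T4 = 0.95, 0.8, 0.9, 0.7, 0.25  # thresholds from the text

def classify_shot(r1, r2, r3, r4):
    """Follow steps 1601-1609; r1..r3 are field-color ratios, r4 is the edge ratio."""
    if r1 >= T1:                      # step 1602
        return "main"
    if r1 >= T1_PRIME and r2 >= T2:   # steps 1603-1605
        return "main"
    if r3 >= T3:                      # step 1607
        return "middle"
    return "other" if r4 >= T4 else "close-up"  # step 1609
```

For example, `classify_shot(0.85, 0.92, 0.0, 0.0)` recovers a main shot that subwindow 1 alone would have missed, which is the role the text assigns to subwindow 2.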
In experiments, several football videos were used to test the accuracy and correctness of the shot classification method of the present invention. Table 2 gives the test result statistics for the method.
Table 2
Figure 17 is a flowchart of the highlight shot detection method of the present invention. As shown in Figure 17, the method comprises the following steps:
Step 1701: Receive the classified set of shots and the video stream, and extract the audio information.
The classified set of shots mentioned here consists of shots of two types, main shots and middle shots. The audio data can be extracted from the video stream using the techniques and specifications provided in the existing Moving Picture Experts Group (MPEG)-7 standard.
Step 1702: Detect the goal area position and the ball position in the image, and calculate the distance between them.
The goal area is detected using the method described in the article "Analysis and Presentation of Soccer Highlights from Digital Video" (Yow D, Yeo Boon-Lock, Yeung M, et al., ACCV'95, 1995): the image is checked for goal posts, and the goal area position is determined from them. Detecting the ball position requires automatic detection and tracking of moving objects. The moving objects in football video are the players and the ball, and their colors differ clearly from the field, which appears as a large green background, so color information can be used to separate players and ball from the whole image. The specific method is as follows. Each pixel is labeled green (G) or non-green (N) according to its color. In each connected region of pixels labeled N, operations similar to the opening and closing operations on binary images are performed to remove noise from the image. The stands, which are unrelated to the moving objects, are then removed to obtain rough moving-object regions. Color features are extracted, and each possible moving object is enclosed in a rectangular bounding box of a different color; these rectangular regions serve as the candidate blocks for detection and tracking. As shown in Figure 19, the image on the left is the original football video image, and the image on the right is the image after the above processing, with the detection and tracking candidate blocks determined.
Further, player candidate blocks and ball candidate blocks can be told apart by the area of each rectangular region, and contextual features such as position, size and speed are extracted for each of them. For each candidate block, the color information of the corresponding region in the current frame is used as a template and the speed and direction in the current frame are used as a reference; template matching is then used to search within a suitable range of the next frame. This establishes an association between the candidate blocks of adjacent frames and thus achieves continuous tracking of the candidate blocks.
However, continuous tracking of the ball candidate block is somewhat difficult, because the ball generally occupies very few pixels in the whole image and is often occluded by players. In this case, tracking can be handled using the contextual features of the candidate blocks in consecutive frames. For example, suppose that in six consecutive frames the ball candidate block can be tracked in the first five frames but not in the sixth. The position information of the ball candidate block in the first five frames can then be used to construct a function, and when the sixth frame is processed the position of the ball candidate block is derived from that function.
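As one possible reading of the six-frame example above, the following Python sketch fits a straight line to the ball positions of the first five frames by least squares and extrapolates it to the sixth. The text only says that the positions "constitute a function"; the choice of a linear model, the function names and the sample track are assumptions made for illustration.

```python
def fit_line(ts, vs):
    """Least-squares line v = a*t + b through the points (t, v)."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    den = sum((t - mt) ** 2 for t in ts)
    a = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / den
    return a, mv - a * mt

def predict_position(track, t_next):
    """Extrapolate (x, y) at frame t_next from [(t, x, y), ...] observations."""
    ts = [t for t, _, _ in track]
    ax, bx = fit_line(ts, [x for _, x, _ in track])
    ay, by = fit_line(ts, [y for _, _, y in track])
    return ax * t_next + bx, ay * t_next + by

# Hypothetical ball centers (frame, x, y) tracked in frames 1 to 5.
track = [(1, 10.0, 50.0), (2, 14.0, 48.0), (3, 18.0, 46.0),
         (4, 22.0, 44.0), (5, 26.0, 42.0)]
x6, y6 = predict_position(track, 6)  # estimated position in the occluded frame 6
```

A linear model suits short occlusions; a ball in flight over many frames might call for a higher-order fit, which the text leaves open.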
After the goal area and the ball position have been detected in each frame, the distance between them is calculated.
Step 1703: Compare the calculated distance with a preset threshold, and determine whether the distance is less than the threshold.
If so, a shot on goal is considered to have occurred, and step 1704 is executed; otherwise, no shot on goal is considered to have occurred, and detection continues with the next frame.
To accommodate views taken at different focal lengths, the threshold mentioned here is not a fixed value but is set to different values according to the size of the moving objects.
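A minimal sketch of the size-dependent distance test in step 1703 is given below. It assumes that the threshold scales linearly with the apparent size of the ball in pixels; the text only states that the threshold varies with the size of the moving objects, so the scaling rule, the `scale` factor and the function name are hypothetical.

```python
import math

def is_shot_candidate(ball_center, goal_center, ball_size, scale=5.0):
    """True when the ball is closer to the goal area than a size-dependent threshold.

    threshold = scale * ball_size (in pixels) is an assumed rule: close-up
    views, where the ball looks larger, get a larger threshold than wide views.
    """
    distance = math.dist(ball_center, goal_center)
    return distance < scale * ball_size
```

With a ball 12 pixels across, a 50-pixel gap to the goal area counts as a shot candidate, while the same gap with an 8-pixel ball (a wider view) does not.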
Step 1704: Determine whether cheering occurs in the audio at this moment. If so, go to step 1705; otherwise, no shot on goal is considered to have occurred, and detection continues with the next frame.
Audio information is a good supplement and aid to highlight shot detection based on images alone. In a football match, a highlight is usually accompanied by excited commentary and by cheering from the spectators, and these sounds are closely related to the intensity of the players and their actions. Moreover, when a goal is scored, keywords such as "Goal" or "the ball is in" usually appear in the audio. Making good use of this information can greatly improve the accuracy of highlight shot detection.
When step 1703 has determined that a shot on goal may have occurred in the current picture, this step further determines whether cheering occurs in the audio at this moment. If not, no shot on goal is considered to have occurred, and detection continues with the next frame; if so, step 1705 is executed.
Step 1705: Determine whether a keyword occurs in the audio at this moment. If so, a goal is considered to have been scored; otherwise, a shot on goal is considered to have occurred without a goal being scored.
This step further determines whether a keyword such as "Goal" or "the ball is in" occurs in the audio at this moment. If not, a shot on goal is considered to have occurred without a goal being scored; if so, a goal is considered to have been scored, and this frame or shot is output as part of the video summary.
It should be noted that which scenes are output as the video summary can be decided according to actual conditions and need not be limited to goals.
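The decision logic of steps 1703 to 1705 can be summarized in the following Python sketch. The three boolean inputs stand for the outputs of the distance test, the cheer detector and the keyword detector, none of which is implemented here; the function name and the label strings are hypothetical.

```python
def classify_frame(shot_candidate, cheer_detected, keyword_detected):
    """Combine the image cue with the two audio cues of steps 1703-1705."""
    if not shot_candidate or not cheer_detected:
        return "no shot"          # steps 1703 and 1704: move on to the next frame
    if keyword_detected:
        return "goal"             # step 1705: output as part of the video summary
    return "shot without goal"    # step 1705: shot on goal, but no goal scored
```

Because the label is returned rather than acted upon, the caller remains free to decide which outcomes are included in the video summary, as the note above allows.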
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
It can be seen that, with the technical solution of the present invention, the shot boundary detection method based on the moving average window effectively separates shot boundaries from non-boundaries, and it can be combined with any conventional shot boundary detection method based on adjacent-frame differences to further improve the detection results. The shot classification method of the present invention computes only on subwindow regions that account for less than 6% of the whole image; the algorithm is simple to implement, significantly improves computational efficiency, and still maintains high accuracy. In addition, compared with prior-art methods that generate a video summary from camera motion, the approach adopted by the present invention, which obtains the video summary by detecting the classified shots, is simpler to implement and more accurate.
In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.