CN101990093A

CN101990093A - Method and device for detecting replay section in video

Info

Publication number: CN101990093A
Application number: CN2009101611618A
Authority: CN
Inventors: 韩博; 吴伟国
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2009-08-06
Filing date: 2009-08-06
Publication date: 2011-03-23

Abstract

The application relates to a method and a device for detecting a replay segment in a video. The method comprises: acquiring the reliability of the motion vectors in at least one pair of frames at the two ends of a candidate segment, and classifying whether the candidate segment is a replay segment according to that reliability; or classifying whether the candidate segment is a replay segment by using a causal probabilistic relationship that integrates a shot-transition-pattern matching score, a statistical feature of variable-speed playback shots, a statistical feature of shot content types, and a statistical feature of the number of shots. The invention allows replay segments in a video to be detected effectively.

Description

Method and apparatus for detecting a replay segment in a video

Technical field

The application relates to video processing, and in particular to the detection of replay segments in video.

Background art

Replay detection is an important area of video processing with many uses. For example, after the live broadcast of a sports event, viewers seldom watch the whole program again from beginning to end; they generally care only about a small number of highlights, such as shots on goal, corner kicks, free kicks, penalty kicks and red cards in a football match. The automatic identification and extraction of highlights is therefore very valuable. Because replay segments in sports video are used to show an exciting event that has just happened once more from various viewing angles, replays are naturally linked to highlight shots and are a stable and effective cue for highlight detection.

For convenience of the later explanation, some basic concepts are introduced first. A video consists of one or more shots. A shot is a group of intrinsically related frames captured continuously by one camera, and it presents content that is continuous in space and time. A video segment is a relative notion: it is a part of a video or of a larger video segment, and it too consists of one or more shots. A replay segment is a video segment in a video stream (for example, but not limited to, a live television stream) that reproduces an earlier event.

There are two types of shot transition: abrupt transition and gradual transition (GT). An abrupt transition is also often called a cut; it is the case in which the last frame of the previous shot is directly followed by the first frame of the next shot. In interlaced television broadcast video, a cut may of course contain one frame, sandwiched between the two shots, that is formed by aliasing the last frame of the previous shot with the first frame of the next shot. Because of video compression coding, the aliasing cannot be removed completely even by sampling alternate lines of this frame. This case is also regarded as a cut.

Unlike a cut, a gradual transition carries the previous shot over to the next shot through a change lasting several frames; that is, some frames lie between two adjacent shots of the video and belong to neither of them. Common types of gradual transition include fade-out/fade-in, dissolve and wipe. In a fade-out, the image of the previous shot fades gradually until the picture is a single solid color, followed by a cut to the next shot; a fade-in is the reverse process. A fade-out and a fade-in can of course be used back to back. In a dissolve, the image of the next shot gradually strengthens while the image of the previous shot gradually weakens, and the transition is completed while the two images overlap. In a wipe, the image of the next shot starts from some region and grows according to a certain rule until it completely covers the image of the previous shot. Unlike an ordinary wipe, a more complex wipe in which an animated logo flies in and out at the same time is called a graphic wipe, also known as a logo transition.

Much work in the literature has attempted to detect replay segments effectively.

Because replay shots are usually played at a lower speed, some work has been devoted to detecting slow motion. In H. Pan, P. van Beek et al., "Detection of slow-motion replay segments in sports video for highlights generation", ICASSP 2001, vol. 3, pp. 1649-1652, fluctuations of the frame difference are used to characterize slow motion produced by frame repetition.

In L. Wang, X. Liu et al., "Generic slow-motion replay detection in sports video", ICIP 2004, vol. 3, pp. 24-27, color, motion and shot-length statistics are used to characterize slow motion.

In L. Gu, D. Bone et al., "Replay detection in sports video sequences", Proc. Eurographics Workshop on Multimedia 1999, Springer Verlag, pp. 3-12, the proposed method searches different shots for segments that match in space and time. This method assumes that a replay and the corresponding non-replay segment are captured by the same camera.

In J. Wang, E. Chng et al., "Soccer replay detection using scene transition structure analysis", ICASSP 2005, vol. 2, pp. 433-436, the shot-type information of the context is proposed as a feature for replay detection in football video. This method requires the scenes of the shots to be classified in order to extract the feature.

To emphasize a replay segment, a logo transition usually appears twice, once before the segment begins and once after it ends. Many works detect replay segments by selecting logo samples and matching logo templates, for example H. Pan, B. Li et al., "Automatic detection of replay segments in broadcast sports programs by detection of logos in scene transitions", ICASSP 2002, vol. 4, pp. 3385-3388; X. Tong, H. Lu et al., "Replay detection in broadcasting sports video", Int'l Conf. Image & Graphics 2004, pp. 337-340; and Q. Huang, J. Hu et al., "A reliable logo and replay detector for sports video", ICME 2007, pp. 1695-1698.

Summary of the invention

A brief summary of the invention is given below in order to provide a basic understanding of some of its aspects. This summary is not an exhaustive overview of the invention. It is not intended to identify key or essential parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.

A main object of the invention is to provide a new method for detecting replay segments in a video.

According to one aspect of the invention, a method for detecting a replay segment in a video comprises: a first motion vector reliability obtaining step of obtaining the reliability of the motion vectors in at least one pair of frames at the two ends of a candidate segment; and a classification step of classifying whether the candidate segment is a replay segment according to the reliability of the motion vectors.

According to another aspect of the invention, an apparatus for detecting a replay segment in a video comprises: a first motion vector reliability obtaining device that obtains the reliability of the motion vectors in at least one pair of frames at the two ends of a candidate segment; and a candidate segment classifier that classifies whether the candidate segment is a replay segment according to the reliability of the motion vectors.

In addition, embodiments of the invention provide a computer program for implementing the above method for detecting a replay segment in a video.

Embodiments of the invention also provide a computer program product, at least in the form of a computer-readable medium, on which computer program code for implementing the above method for detecting a replay segment in a video is recorded.

The invention can detect replay segments in a video effectively.

Brief description of the drawings

The above and other objects, features and advantages of the invention will be understood more easily from the following description of its embodiments with reference to the accompanying drawings. The components in the drawings merely illustrate the principle of the invention. In the drawings, identical or similar technical features or components are denoted by identical or similar reference numerals.

Fig. 1 is a schematic diagram illustrating the reliability of motion vectors;

Fig. 2 is a flow chart of a method for detecting a replay segment in a video according to one embodiment of the invention;

Fig. 3 is a flow chart of a method for detecting a replay segment in a video according to another embodiment of the invention;

Fig. 4 is a schematic diagram illustrating how the reliability categories of motion vectors are distributed in gradual-transition frames;

Fig. 5 is a schematic diagram of pairing detected shot transitions to obtain candidate segments in one embodiment of the invention;

Fig. 6 is a histogram showing the effect of detecting replay segments using mismatched motion vectors;

Fig. 7 is a flow chart of a method for detecting a replay segment in a video in one embodiment of the invention;

Fig. 8 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 9 is a flow chart of the step of obtaining camera motion features in the methods for detecting a replay segment in a video shown in Fig. 7 and Fig. 8;

Fig. 10 is a schematic diagram showing the effect of detecting camera motion using motion vector reliability;

Fig. 11 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 12 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 13 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 14 is a flow chart of the step of obtaining the shot content type in the method for detecting a replay segment in a video shown in Fig. 13;

Fig. 15 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 16 is a flow chart of a method for detecting a replay segment in a video in another embodiment of the invention;

Fig. 17 is a schematic diagram of the causal probabilistic network constructed in the embodiment shown in Fig. 16;

Fig. 18 is a block diagram of an example computing device that can be used to implement the method and apparatus of the invention for detecting a replay segment in a video;

Figs. 19 to 29 are flow charts of the respective embodiments of the apparatus of the invention for detecting a replay segment in a video.

Detailed description of the embodiments

Embodiments of the invention are described below with reference to the drawings. Elements and features described in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for clarity, the drawings and the description omit the representation and description of components and processes that are unrelated to the invention or known to those of ordinary skill in the art.

Method for detecting a replay segment in a video

First embodiment

The applicant has found that the shot transitions at the two ends of a replay segment generally use a similar transition pattern; for example, both are wipes, or both are logo transitions. Therefore, if the similarity of the shot-transition patterns at the two ends of a video segment can be identified or characterized, that similarity can be used to classify the video segment, that is, to determine whether it is a replay segment or how probable it is that it is one.

Furthermore, the applicant has found a new means of characterizing the similarity of shot-transition patterns. Its principle is described in detail below.

It is well known that motion vectors can characterize the motion of objects between video frames; specifically, a motion vector is the displacement of an object in the current frame relative to the corresponding object in a reference frame. Depending on the application, the object may be an image block whose shape and size can vary. Generally, a frame is divided into rectangular image blocks of equal size. An image block can be as small as one pixel. Hereinafter, unless otherwise stated, references to image blocks include the single-pixel case.

In the prior art, many motion vector search methods can be used to search for motion vectors in video, for example full search, fast search exploiting the center-biased characteristic, and fast diamond search and hexagon search based on gradient descent. However, the motion vectors found by these methods are not necessarily reliable; that is, they may not reflect the true motion of the object. A motion vector is reliable if it accurately describes the displacement of the true region corresponding to the block. In general applications, unreliable motion vectors are harmful. The applicant has found, however, that the distribution of unreliable motion vectors is itself regular. For example, during a shot transition, the distribution of unreliable motion vectors can reflect the pattern of the transition.
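As a rough illustration of the full (exhaustive) search mentioned above, the following Python sketch finds, for one block of the current frame, the offset into the reference frame with the smallest sum of absolute differences (SAD). The function name `full_search`, the SAD criterion, the block size and the search radius are assumptions chosen for illustration; the text does not prescribe a particular search method or matching cost.

```python
import numpy as np

def full_search(cur, ref, top, left, block=8, radius=4):
    """Exhaustively search `ref` for the block of `cur` at (top, left).

    Returns ((dy, dx), sad): the offset of the best-matching reference
    block and its SAD residual. All names here are illustrative.
    """
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_offset, best_sad = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_offset, best_sad = (dy, dx), sad
    return best_offset, best_sad
```

The returned offset points from the block's position in the current frame to its best match in the reference frame; whether the motion vector is taken as this offset or as its negative is a matter of convention. A small residual alone does not make the vector reliable, which is the point the paragraph above makes.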

Unreliable motion vectors arise for identifiable reasons. Fig. 1 shows examples of several image blocks in the current frame (Fig. 1B) and the reference frame (Fig. 1A). Each white rectangle is the image block found to correspond to a black rectangle, and the displacement between them is the motion vector. As shown in the figure, unreliable motion vectors mainly occur in the following cases: A, occlusion by a moving foreground (here, the ball); B, motion of a non-rigid object (here, the face and arm); C, smooth texture (here, the ground of the playing field); D, unidirectional texture (here, a line on the ground); E, spatially repeated texture (here, a dashed line on the ground). An abrupt change of content is a further cause of unreliable motion vectors; for example, a wipe is equivalent to occlusion by a foreground, and the boundary of the wipe may form a unidirectional texture.

Fig. 4 shows examples of the motion vectors found for frames in several shot transitions. In the figure, the small squares are the image blocks used for motion vector search. The motion vectors of the image blocks drawn as empty black boxes and as black crossed boxes are unreliable, and the motion vectors of the remaining image blocks (white boxes) are reliable. It can be seen that the distribution of unreliable motion vectors is associated with the pattern of the shot transition. (For brevity, the distribution of a kind of motion vector is hereinafter treated as equivalent to the distribution of the image blocks having that kind of motion vector.) For example, in the wipe of Fig. 4A, the unreliable motion vectors appear mainly at and near the boundary between the new and old pictures.

Therefore, the reliability of the motion vectors at the two ends of a video segment, which reflects the shot-transition patterns at the two ends (and hence their similarity), can be used to classify whether a candidate segment is a replay segment.

Accordingly, as shown in Fig. 2, a method for detecting a replay segment in a video is proposed, comprising a first motion vector reliability obtaining step 202 and a classification step 206.

In the first motion vector reliability obtaining step 202, the reliability of the motion vectors in at least one pair of frames at the two ends of the candidate segment is obtained. This can be done in several ways. For example, data obtained by preprocessing the video can be used, that is, data about motion vector reliability can be read from an external source. Alternatively, motion vector search and motion vector reliability classification can be performed on a candidate segment that has not been preprocessed; both can be implemented with any existing or future technique. In particular, as mentioned above, various prior-art techniques are available for motion vector search. Several prior-art techniques are also available for classifying the reliability of motion vectors; for example, T. Yoshida, A. Miyamoto et al., "Reliability metric of motion vectors and its applications to motion estimation", Proc. SPIE, vol. 2501, VCIP 1995, pp. 799-809, discloses a measure of motion vector reliability. The entire content of that document is incorporated into this application by reference.

In the classification step 206, the candidate segment is classified as a replay segment or not according to the reliability of the motion vectors. This step can be implemented with any existing or future classifier technique trained by learning from samples.

In a variant of this embodiment, shown in Fig. 3, after the reliability of the motion vectors at the two ends of the candidate segment has been obtained (step 202), a matching score calculation step 304 calculates the degree to which the shot-transition patterns at the two ends of the candidate segment match, based on the distribution of the reliability of the motion vectors in the at least one pair of frames at the two ends. This degree of matching can be expressed as a matching score. The classification step 206 can use the matching score (or a variant of it) directly, treating it as reflecting the probability that the candidate segment is a replay segment, and judge on that basis whether the candidate segment is (or may be) a replay segment. The classification step 206 can of course also be implemented with any classifier technique trained by learning from samples. As for the calculation of the matching score in step 304, any suitable way of calculating the degree of matching between two distributions of a variable can be designed. The degree of matching includes, but is not limited to, similarity. For example, if the wipes at the two ends of a replay segment use the same pattern, including the wipe direction, the degree of matching should reflect the similarity of the reliability distributions. It is also possible, however, that the wipes at the two ends use symmetric patterns, for example a right-to-left wipe on entering the replay and a left-to-right wipe on leaving it. In that case, the degree of matching can be reflected by symmetry.

In this application, for the case in which the shot-transition patterns at the two ends are identical, the applicant also proposes a way of calculating the matching score by calculating similarity.

For the pair of shot transitions (denoted GT1 and GT2) that enclose a candidate replay segment, the following expression can be calculated as the matching score:

$$\max_{B}\left(\frac{\sum_{i \in \mathrm{GT}_1,\; i+B \in \mathrm{GT}_2} S(i,\, i+B)}{N(B)}\right) \tag{1}$$

Here, GT1 and GT2 denote the sets of frames at the two ends of the replay segment, i and i+B denote frame numbers, and B denotes the relative frame offset, that is, the offset between a pair of corresponding frames in GT1 and GT2. N(B) denotes the number of frame pairs, that is, for a given B, the number of distinct values of i for which frame i belongs to GT1 and frame i+B belongs to GT2. S(i, i+B) is the similarity (or degree of matching) between frame i in GT1 and frame i+B in GT2, calculated from the distributions of motion vector reliability. The term inside the outer parentheses is the average of the similarities over the frame pairs considered.

The physical meaning of maximizing over B in formula (1) is that the most accurate matching score is obtained only when GT1 and GT2 are aligned with each other. Because it cannot be known in advance which frame in GT2 best matches frame i in GT1, the position at which GT1 and GT2 match best is found by varying B.
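The alignment search of formula (1) can be sketched as follows. This is a minimal illustration, assuming the per-pair similarity S is supplied as a callable; the function name `matching_score`, the `similarity` callback and the `min_pairs` parameter are hypothetical and not taken from the text.

```python
def matching_score(gt1, gt2, similarity, min_pairs=1):
    """Formula (1): maximum over B of the mean S(i, i+B).

    gt1, gt2   -- frame numbers of the two shot transitions
    similarity -- callable S(i, j) for frame i in GT1 and frame j in GT2
    min_pairs  -- assumed extra guard: ignore offsets with fewer pairs
    Returns (score, B) for the best offset, or None if no offset qualifies.
    """
    gt1, gt2 = set(gt1), set(gt2)
    if not gt1 or not gt2:
        return None
    best = None
    # Every offset B that can pair at least one frame of GT1 with one of GT2.
    for b in range(min(gt2) - max(gt1), max(gt2) - min(gt1) + 1):
        pairs = [(i, i + b) for i in gt1 if i + b in gt2]
        if len(pairs) < min_pairs:      # N(B) too small
            continue
        mean = sum(similarity(i, j) for i, j in pairs) / len(pairs)
        if best is None or mean > best[0]:
            best = (mean, b)
    return best
```

With `min_pairs=1` the function follows formula (1) literally; a larger value would keep a single lucky frame pair at an extreme offset from dominating the maximum, which is a design choice beyond what the text states.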

If the start and end frames of the shot transitions GT1 and GT2 can be obtained accurately by shot-transition detection (described later), the best-matching value of B is essentially known, and the maximization over B in formula (1) is unnecessary. The following expression is then used as the matching score:

$$\frac{\sum_{i \in \mathrm{GT}_1,\; i+B \in \mathrm{GT}_2} S(i,\, i+B)}{N} \tag{2}$$

Here, N denotes the number of frame pairs.

Of course, because the picture changes rapidly during a shot transition, a deviation of even one frame may affect the matching score considerably, so the maximization over B can still be performed in this case.

When calculating the matching score, if the start and end frames of GT1 and GT2 have been obtained by shot-transition detection, the calculation can use some or all of the frames between the start and end frames. If only one pair of matching frames is used, no averaging is needed and the similarity S(i, i+B) itself serves as the matching score. If the start and end frames are not obtained by shot-transition detection, a suitable number of frames can be chosen at the two ends of the candidate segment, either empirically or at random, to calculate the matching score.

As mentioned above, the similarity can be calculated in any way. One way of calculating it, proposed by the applicant, is as follows:

Here, i and j denote a pair of frames at the two ends of the candidate segment; b_i and b_j denote blocks used for motion vector calculation in frame i and frame j, respectively, at mutually corresponding spatial positions in the frame; M_i(b_i) and M_j(b_j) denote the reliability of the motion vectors of block b_i in frame i and block b_j in frame j, taking the value 1 when the motion vector of the block is a mismatched motion vector and 0 otherwise; and #(block) denotes the number of blocks in a frame.

In formula (3), Σ_{b_i} M_i(b_i) is the number of unreliable motion vectors in frame i, Σ_{b_j} M_j(b_j) is the number of unreliable motion vectors in frame j, and Σ_{b_i} M_i(b_i) ∧ M_j(b_j) is the number of block pairs for which the corresponding blocks in frame i and frame j both have unreliable motion vectors; #(block) in the denominator removes the influence of the number of blocks into which a frame is divided. The formula expresses the ratio of the number of position-corresponding unreliable motion vectors in the two frames to the total number of unreliable motion vectors, and can therefore reflect the similarity of the distributions of unreliable motion vectors in frame i and frame j.
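To make the quantities described above concrete, the sketch below computes an overlap ratio between two per-block reliability maps. The exact normalization of formula (3), including the role of #(block), cannot be read from the text here, so the Dice-style ratio used below (twice the number of co-located unreliable blocks divided by the total number of unreliable blocks in both frames) is an assumption, as is the function name `frame_similarity`.

```python
import numpy as np

def frame_similarity(mask_i, mask_j):
    """Overlap of two boolean maps marking blocks with unreliable vectors.

    mask_i[r, c] is True when M_i(b) = 1 for the block at grid cell (r, c).
    The Dice-style normalization is an assumed stand-in for formula (3).
    """
    mask_i = np.asarray(mask_i, dtype=bool)
    mask_j = np.asarray(mask_j, dtype=bool)
    if mask_i.shape != mask_j.shape:
        raise ValueError("both frames must use the same block grid")
    both = np.logical_and(mask_i, mask_j).sum()   # sum of M_i(b_i) AND M_j(b_j)
    total = mask_i.sum() + mask_j.sum()           # unreliable vectors in both frames
    # Two frames without any unreliable vector carry no pattern to compare.
    return 0.0 if total == 0 else float(2.0 * both / total)
```

Because the result is a ratio of counts, it does not depend on how finely the frame is divided as long as both frames use the same grid.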

As mentioned above, the shot-transition patterns at the two ends of a replay segment may also be symmetric. In this case the symmetry can still be calculated with formula (3) and used as the similarity (in this application, where "similarity" and "symmetry" are not specifically distinguished, "similarity" includes "symmetry"). The only difference is that the blocks b_i and b_j used for motion vector calculation in frame i and frame j are chosen so that their spatial positions in the frame are symmetric. The way b_i and b_j are chosen symmetrically is consistent with the symmetry of the shot-transition patterns at the two ends of the replay segment. For example, if the shot-transition patterns at the two ends are, respectively, a uniform diagonal wipe from upper left to lower right (not shown) and a uniform diagonal wipe from lower left to upper right (for example as shown in Fig. 4E), or vice versa, then the positions of b_i and b_j in the frame are mirror images of each other in the vertical direction (top-bottom symmetric).

The average symmetry over several frame pairs at the two ends of the candidate segment is calculated in the same way as the average similarity (formula (1) or (2)).

For some simple wipes, spatial symmetry can be converted into temporal symmetry. For example, if the shot-transition patterns at the two ends of a replay segment are, respectively, a uniform vertical wipe from left to right and a uniform vertical wipe from right to left (for example as shown in Fig. 4A and Fig. 4D), or vice versa, then either the symmetry can be calculated in the same temporal order and averaged according to formula (1) or (2), or the similarity can be calculated according to formula (3) in the opposite temporal order (that is, the first frame of the first shot transition is compared with the last frame of the second shot transition, and so on), and the average can then be calculated as follows:

$$\frac{\sum_{i \in \mathrm{GT}_1,\; B-i \in \mathrm{GT}_2} S(i,\, B-i)}{N} \tag{2'}$$

Here, GT1 and GT2 denote the sets of frames at the two ends of the candidate segment, i and B-i denote frame numbers, B denotes the offset between the first frame of GT1 and its corresponding frame in GT2, and N denotes the number of frame pairs. The choice of i and B-i means that the first frame of the first shot transition corresponds to the last frame of the second shot transition. As with formula (1), the maximum of formula (2') over different values of B can be taken as the final average.
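The reverse-order pairing of formula (2') can be sketched in the same spirit. The helper names below are hypothetical, the similarity S is again assumed to be supplied as a callable, and both frame sets are assumed to be non-empty.

```python
def reversed_order_mean(gt1, gt2, similarity, b):
    """Formula (2') for one B: mean of S(i, B - i) with i in GT1, B - i in GT2."""
    gt2 = set(gt2)
    pairs = [(i, b - i) for i in gt1 if b - i in gt2]
    if not pairs:
        return None
    return sum(similarity(i, j) for i, j in pairs) / len(pairs)

def best_reversed_order_mean(gt1, gt2, similarity):
    """Maximize formula (2') over every B that yields at least one frame pair."""
    candidates = range(min(gt1) + min(gt2), max(gt1) + max(gt2) + 1)
    scored = [(reversed_order_mean(gt1, gt2, similarity, b), b) for b in candidates]
    scored = [s for s in scored if s[0] is not None]
    return max(scored) if scored else None
```

For transitions spanning frames 10 to 12 and 100 to 102, B = 112 pairs frame 10 with frame 102, frame 11 with frame 101 and frame 12 with frame 100, which is the first-with-last correspondence described above.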

Of course, corresponding frames may be symmetric both in space and in time (that is, in reverse order). In that case the symmetry can be calculated, the average can be calculated with formula (2'), and the maximum over different values of B can again be taken as the final average.

As a further variant, the calculation of similarity and the calculation of symmetry can be combined: the similarity and the symmetries in various directions are all calculated, and the maximum is chosen as the matching score.
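A minimal sketch of this combined variant for a single frame pair is given below. It compares one reliability map with the other map as-is and with its left-right and top-bottom mirror images, and keeps the largest score. The overlap measure `dice` is an assumed stand-in for the similarity of formula (3), and the set of mirror directions is illustrative only.

```python
import numpy as np

def dice(a, b):
    """Assumed overlap measure between two boolean block maps."""
    total = a.sum() + b.sum()
    return 0.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)

def best_spatial_match(mask_i, mask_j):
    """Return (score, name) of the best of the plain and mirrored comparisons."""
    mask_i = np.asarray(mask_i, dtype=bool)
    mask_j = np.asarray(mask_j, dtype=bool)
    variants = {
        "same": mask_j,                     # identical transition pattern
        "left-right": np.fliplr(mask_j),    # mirrored about the vertical axis
        "top-bottom": np.flipud(mask_j),    # mirrored about the horizontal axis
    }
    scores = {name: dice(mask_i, v) for name, v in variants.items()}
    name = max(scores, key=scores.get)
    return scores[name], name
```

In a full pipeline the maximum would more plausibly be taken over the averaged scores of all frame pairs rather than per frame, so that one direction is chosen consistently for the whole transition; the text leaves this open.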

Shot-transition detection was mentioned above. It can be performed with any existing or future technique. For example, J. Yuan, H. Wang et al., "A formal study of shot boundary detection", IEEE Trans. CSVT, vol. 17, no. 2, pp. 168-186, 2007, discloses a shot-transition detection method that uses a field color histogram feature. The entire content of that document is incorporated herein by reference. Embodiments of the invention can also directly use video segments that have been preprocessed so that their shot transitions and the start and end frames of those transitions are marked.

In addition, the candidate segments processed by embodiments of the invention may be provided by preprocessing of the video, or may be obtained from the results of shot-transition detection. The simplest way is to take the part between any two shot transitions as a candidate segment (this can be called shot-transition pairing). Depending on the application, the shot-transition pairs that define candidate segments can be restricted to improve efficiency, instead of using arbitrary pairs. Specifically, shot transitions in video from different domains have different characteristics, and the pairing can take these characteristics into account.

For example, in sports video, as shown in Fig. 5, a replay segment is generally enclosed between two gradual shot transitions (GT). Accordingly, when pairing shot transitions, candidate segments can be obtained by pairing only the gradual transitions.

Sports video, as shown in Fig. 5, generally also has the following characteristics:

---if a replay segment consists of several shots, one of the following two cases may occur: 1) all internal shot transitions are gradual transitions (GT); 2) all internal shot transitions are cuts;

---in the normal-play segments adjacent to a replay segment, only cuts are used as shot transitions;

---a replay segment contains fewer shots than the adjacent normal-play segments.

One or more of the above characteristics can also be taken into account when pairing shot transitions.
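The pairing rules above can be sketched as a simple filter over detected transitions. The tuple layout `(start_frame, end_frame, kind)`, the kind labels `"GT"` and `"cut"`, and the optional `max_inner` limit are assumptions made for illustration; only the first characteristic (internal transitions are all gradual or all cuts) is applied here.

```python
from itertools import combinations

def candidate_segments(transitions, max_inner=None):
    """Pair gradual transitions that may enclose a replay segment.

    transitions -- time-ordered list of (start_frame, end_frame, kind),
                   with kind either "GT" or "cut" (assumed labels)
    max_inner   -- optional assumed cap on transitions inside a candidate
    Returns a list of (opening_transition, closing_transition) pairs.
    """
    candidates = []
    for a, b in combinations(range(len(transitions)), 2):
        # Both ends of a candidate must be gradual transitions.
        if transitions[a][2] != "GT" or transitions[b][2] != "GT":
            continue
        inner_kinds = {t[2] for t in transitions[a + 1:b]}
        # Internal transitions must be all gradual or all cuts (or absent).
        if len(inner_kinds) > 1:
            continue
        if max_inner is not None and b - a - 1 > max_inner:
            continue
        candidates.append((transitions[a], transitions[b]))
    return candidates
```

Each returned pair delimits one candidate segment, namely the frames between the end of the opening transition and the start of the closing transition. The remaining characteristics, such as comparing shot counts with adjacent normal-play segments, could be added as further filters.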

Second embodiment

In the first embodiment described above, unreliable motion vectors are used to classify candidate segments. The applicant has found through research that dividing motion vectors simply into reliable and unreliable ones is still not precise enough. In fact, as described above with reference to Fig. 1, there are several reasons why a motion vector may be unreliable (cases A to E in Fig. 1). In this embodiment, the applicant proposes to further distinguish one type among the unreliable motion vectors and names it "mismatch".

For example, among the cases shown in Fig. 1, cases A and B can be treated as mismatches. It should be noted, however, that the name "mismatch" is not intended to limit its meaning in any way, and the type is not limited to cases A and B; any other case that is similar to A or B, or shares an intrinsic property with them, also belongs to the "mismatch" type. For example, an abrupt change of content is another possible cause of mismatched motion vectors. The remaining cases shown in Fig. 1 (C, D and E) can be named the "multi-match" type. As with "mismatch", the name "multi-match" does not limit its meaning, and the type is not limited to cases C, D and E. In addition, all motion vectors other than mismatched ones can be called non-mismatched motion vectors.

As shown in Figs. 4A-4C, in a logo transition (graphic wipe) the logo changes and/or moves very quickly. Because of abrupt changes and occlusion, the motion vectors of the blocks in the logo region are therefore very likely to be mismatched (black crossed boxes). In addition, as shown in Figs. 4D-4E, the wipes commonly used before and after replay segments can also be characterized well by the distribution of mismatched blocks. Gradual shot transitions of the same pattern therefore have very similar temporal and spatial distributions of mismatched blocks, even though the original video content may be entirely different. If the shot transitions at the head and tail of a replay segment are symmetric, the distributions of mismatched blocks are likewise symmetric in time and/or space, as described in the first embodiment.

Therefore, replacing the unreliable motion vectors in the first embodiment with mis-matched motion vectors gives better results. For brevity, the description is not repeated here.

To verify the ability of mis-matched motion vectors to distinguish replay segments from non-replay segments, 975 candidate segments were selected for testing, of which 632 were replay segments and 343 were non-replay segments, and two histograms were generated. As shown in Fig. 6, candidate segments with a high matching score are very likely to be true replay segments (black bars), while the matching scores of non-replay segments (white bars) are concentrated below 0.1. The replay segments with low matching scores are those whose shot transition patterns differ between the two ends.

In the present embodiment, the motion vector reliability can be obtained directly as a result of preprocessing the video segment and then used for classifying the video segment; the reliability of the motion vectors of the video segment does not necessarily have to be identified directly. The applicant's co-pending Chinese patent applications No. 200910119520.3 and No. 200910119521.8 describe methods and devices for identifying motion vector reliability, and the full text of both applications is incorporated herein by reference.

One way of identifying mis-matched motion vectors is briefly described below.

For reliability classification, features characterizing the intra-frame texture strength of a block and features characterizing the inter-frame matching stability of the block's motion vector can be used.

In general, motion estimation is performed by matching the current block against reference blocks at different position offsets, each offset being a motion vector. The algorithm selects the motion vector with the smallest matching residual, MVbest, as the motion vector of the current block. The matching stability of this motion vector is reflected not only by the residual at MVbest but also by the residuals of nearby motion vectors. Block matching can therefore be regarded as a function that maps the motion vector offset relative to MVbest to a residual value; that is, as a motion vector moves away from MVbest (i.e. the corresponding reference block moves away from the reference block of MVbest), the residual value changes. The functional relationship between the residual value and the offset reflects the inter-frame matching stability of the motion vector.

The residual value can be characterized in several ways. For example, it can be the sum of squared differences (SSD) of the corresponding pixel values of the current block and the reference block, or the sum of absolute differences (SAD) of those values.
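As a minimal illustration of the two residual measures named above, the following sketch computes SSD and SAD for two small pixel blocks. The function names and the sample pixel values are assumptions made for illustration only; they do not come from the application.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


# Made-up 2x2 luminance blocks (current block and one reference block).
current = [[10, 12], [11, 13]]
reference = [[9, 12], [14, 13]]
print(ssd(current, reference))  # 1 + 0 + 9 + 0 = 10
print(sad(current, reference))  # 1 + 0 + 3 + 0 = 4
```

SSD penalizes large pixel differences more strongly than SAD, which is one reason the two measures can rank candidate reference blocks differently.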

The residual value (taking the sum of squared differences, SSD, as an example) of a motion vector MV (MV_H, MV_V) near the motion vector MVbest, with respect to the reference block, can be modeled well by the following function:

a X'^{2} + b Y'^{2} + c X' Y' + d = SSD(MV_{H}, MV_{V}) \qquad (4)

\begin{cases} X' = X \cos\theta + Y \sin\theta \\ Y' = -X \sin\theta + Y \cos\theta \end{cases} \qquad (5)

\begin{cases} X = MV_{H} - MV_{H}^{best} \\ Y = MV_{V} - MV_{V}^{best} \end{cases} \qquad (6)

where a, b, c and d are coefficients, the subscripts H and V denote the horizontal and vertical components of a motion vector respectively, X and Y denote the offset of a motion vector MV near MVbest relative to MVbest, and the angle θ is:

\theta = \underset{\varphi}{\arg\min}\, R(\varphi), \quad \varphi \in [0, \pi) \qquad (7)

Here the angle θ represents the texture direction in the current block. The texture direction can be obtained by various existing or future techniques. For example, T. Yoshida, A. Miyamoto, et al., "Reliability metric of motion vectors and its applications to motion estimation", Proc. SPIE, vol. 2501, VCIP 1995, pp. 799-809, defines a texture function R that characterizes the directional texture strength of the current block. The entire content of that document is incorporated herein by reference. The co-pending Chinese patent applications No. 200910119520.3 and No. 200910119521.8 cited above also describe how to obtain the texture direction, and their entire content is incorporated herein by reference.

The coefficients a, b, c and d can be used to characterize the reliability of a motion vector. They can be obtained by surface fitting, using the residual values of several motion vectors near MVbest. For example, a least-squares surface fit can be performed on the SSD values of 13 motion vectors centered on MVbest. Using these coefficients together with the maximum and minimum values of the texture function R as features, the reliability of motion vectors can be classified effectively. For example, any classifier trained by learning from samples can be used to divide motion vectors into the reliable, mis-match and multi-match types described above, or into other types. A single classifier can classify motion vectors into several types at once, or classifiers can be used in a one-to-one manner, with each classifier responsible for one type of motion vector. The classifier can be a support vector machine (SVM), or of course any other classifier.
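The least-squares surface fit of equation (4) can be sketched as follows. This is only an illustration under stated assumptions: the 13 offsets are taken to be the diamond |X| + |Y| <= 2 around MVbest (the application does not say which 13 positions are used), the texture angle θ is assumed to be already known, and all function names are hypothetical.

```python
import math


def diamond_offsets(radius=2):
    """The 13 integer offsets with |X| + |Y| <= 2 (an assumed neighbourhood shape)."""
    return [(x, y) for x in range(-radius, radius + 1)
            for y in range(-radius, radius + 1) if abs(x) + abs(y) <= radius]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_residual_surface(offsets, ssd_values, theta):
    """Least-squares fit of a, b, c, d in equation (4) for a given texture angle."""
    rows = []
    for x, y in offsets:
        xr = x * math.cos(theta) + y * math.sin(theta)    # X' of equation (5)
        yr = -x * math.sin(theta) + y * math.cos(theta)   # Y' of equation (5)
        rows.append([xr * xr, yr * yr, xr * yr, 1.0])
    # Normal equations (A^T A) p = A^T s for the parameter vector p = (a, b, c, d).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    ats = [sum(r[i] * s for r, s in zip(rows, ssd_values)) for i in range(4)]
    return solve(ata, ats)


# Synthetic SSD values generated from known coefficients, then recovered by the fit.
offsets = diamond_offsets()
synthetic = [2.0 * x * x + 0.5 * y * y + 10.0 for x, y in offsets]
print([round(v, 6) for v in fit_residual_surface(offsets, synthetic, theta=0.0)])
```

The fitted coefficients would then be passed, together with the texture features, to whatever classifier is used for the reliability types.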

The motion vector reliability classification algorithm described above was trained and evaluated on a data set of 44160 blocks (138 frame pairs), in which reliable, mis-matched and multi-matched blocks account for approximately 60%, 30% and 10% respectively. In 5-fold cross-validation, an overall precision of 88.2% was achieved.

Third embodiment

In the third embodiment, it is proposed to first obtain the motion features of the camera (step 702), and then to use the camera motion features to classify the candidate segment and thereby identify replay segments (classification step 206).

In a replay segment, the motion features of the camera are likely to differ from those in a normally played segment. There are several reasons for this. First, a replay segment is usually a highlight shot, so compared with normal play it is often not a long-distance or wide-angle picture but a close-up or telephoto picture. To follow a fast-moving object (for example the ball in Fig. 10C), the camera motion pattern therefore usually differs from that during normal play. In addition, a replay segment may be played in slow motion or at increased speed, and in both cases the apparent camera motion features differ from those of normal play (slowed down or sped up). For all these reasons, camera motion is well suited to identifying replay segments.

Many existing or future techniques can be used to detect camera motion. For example, G. B. Rath and A. Makur, "Iterative least squares and compression based estimations for a four-parameter linear global motion model and global motion compensation", IEEE Trans. CSVT, vol. 9, no. 7, pp. 1075-1099, 1999, discloses detecting camera motion with an iterative least-squares method. The entire content of that document is incorporated herein by reference. In the present embodiment, the camera motion information can also be obtained from the preprocessing of the video segment.

Camera motion comprises pan (appearing as left-right movement of the picture), tilt (appearing as up-down movement of the picture), zoom (appearing as scaling of the picture, i.e. pixels moving away from or toward the picture center) and rotation (θ, appearing as rotation of the picture). Taking the picture center as the coordinate origin, camera motion can be characterized by the following formula, where (x, y) and (x', y') are the coordinates of a pixel before and after the camera motion respectively:

\begin{bmatrix} x' \\ y' \end{bmatrix} = zoom \cdot \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} pan \\ tilt \end{bmatrix} \qquad (8)
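To make equation (8) concrete, the sketch below applies the four-parameter model to a single pixel coordinate measured from the picture center. The function name and the sample parameter values are illustrative assumptions.

```python
import math


def apply_camera_motion(x, y, pan, tilt, zoom, theta):
    """Map a pixel (x, y), measured from the picture centre, through equation (8)."""
    x_new = zoom * (x * math.cos(theta) - y * math.sin(theta)) + pan
    y_new = zoom * (x * math.sin(theta) + y * math.cos(theta)) + tilt
    return x_new, y_new


# A pixel 10 units right of centre, zoomed by 2 and shifted by (2, -1), no rotation.
print(apply_camera_motion(10.0, 0.0, pan=2.0, tilt=-1.0, zoom=2.0, theta=0.0))  # (22.0, -1.0)
```

With zoom equal to 1 and θ equal to 0, the model reduces to a pure translation by (pan, tilt).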

Statistical features of any part or all of the camera motion parameters above can be used to classify the candidate segment. Such statistical features are, for example, the mean, the mean of the absolute value, or the variance of the camera motion across frames (i.e. over time). In addition or as an alternative, statistical features of the acceleration of the camera motion can also be used to classify the candidate segment. The acceleration of the camera motion can be obtained by subtracting the corresponding motion parameters of two frames.
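The statistics mentioned above can be sketched for a single motion parameter as follows. The per-frame pan values are made up, the function names are hypothetical, and the population variance is used here only as one possible choice of variance.

```python
from statistics import fmean, pvariance


def motion_statistics(values):
    """Mean, mean absolute value and variance of one motion parameter over frames."""
    return {
        "mean": fmean(values),
        "abs_mean": fmean(abs(v) for v in values),
        "variance": pvariance(values),
    }


def acceleration(values):
    """Frame-to-frame differences of a motion parameter."""
    return [b - a for a, b in zip(values, values[1:])]


pan = [1.0, -2.0, 3.0, 0.0]  # made-up per-frame pan values
print(motion_statistics(pan))
print(motion_statistics(acceleration(pan)))
```

The same two functions could be applied to tilt, rotation or log-zoom sequences to build a longer feature vector.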

In addition, as an alternative or a supplement, any statistical feature of the local motion and/or its acceleration can be used as a classification feature. Local motion can be obtained by subtracting the camera motion (i.e. the global motion) from the motion vector. Of course, any statistical feature of the motion vectors themselves (which include both camera/global motion and local motion) and/or their acceleration can also be used as a classification feature. The features above can be combined in any way.
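A small sketch of the subtraction described above, assuming that a block motion vector is the displacement of the block center from the frame before the camera motion to the frame after it, and that the global displacement at that position follows equation (8). Both function names and the sample values are illustrative.

```python
import math


def global_displacement(x, y, pan, tilt, zoom, theta):
    """Displacement that equation (8) predicts for a block centred at (x, y)."""
    x_new = zoom * (x * math.cos(theta) - y * math.sin(theta)) + pan
    y_new = zoom * (x * math.sin(theta) + y * math.cos(theta)) + tilt
    return x_new - x, y_new - y


def local_motion(block_center, motion_vector, camera):
    """Block motion vector minus the camera (global) motion at the block position."""
    gx, gy = global_displacement(*block_center, **camera)
    return motion_vector[0] - gx, motion_vector[1] - gy


camera = {"pan": 3.0, "tilt": 0.0, "zoom": 1.0, "theta": 0.0}
print(local_motion((40.0, 20.0), (5.0, 1.0), camera))  # (2.0, 1.0)
```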

As mentioned above, a candidate segment may contain more than one shot. In that case, as shown in Fig. 8, the method above can first be used to classify each shot, identifying whether it is a variable-speed playback shot or with what probability (step 802), and the candidate segment is then classified based on statistical features of the shot classification results (step 206). For example, the statistical feature can be the confidence of each shot being a variable-speed playback shot and/or the mean of these confidences; or, when step 802 gives a binary output, the statistical feature can be the proportion of variable-speed playback shots among all shots of the candidate segment (by number or by duration), and so on.
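The segment-level statistics listed above can be sketched as follows. The dictionary keys, the function name and the three sample shots are assumptions made for illustration.

```python
def segment_features(shots):
    """Aggregate per-shot results of step 802 into candidate-segment features.

    Each shot is a dict with 'confidence' (0..1), 'is_variable_speed' (bool)
    and 'duration' (seconds); these field names are hypothetical.
    """
    n = len(shots)
    total_time = sum(s["duration"] for s in shots)
    slow = [s for s in shots if s["is_variable_speed"]]
    return {
        "mean_confidence": sum(s["confidence"] for s in shots) / n,
        "count_ratio": len(slow) / n,
        "time_ratio": sum(s["duration"] for s in slow) / total_time,
    }


shots = [
    {"confidence": 0.9, "is_variable_speed": True, "duration": 4.0},
    {"confidence": 0.2, "is_variable_speed": False, "duration": 1.0},
    {"confidence": 0.7, "is_variable_speed": True, "duration": 5.0},
]
print(segment_features(shots))
```

The count ratio and the time ratio can differ noticeably when a short normal-speed shot sits between long slow-motion shots, as in this made-up example.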

As in the first and second embodiments, in classification step 206 the candidate segment can be classified with any classifier trained by learning from samples. In the present embodiment, the classifier can be a linear discriminant analysis (LDA) classifier, or of course any other classifier.

As shown by the dashed parts in Figs. 7 and 8, the present embodiment can also be combined with the first or second embodiment and any of their variants. In that case, classification step 206 can use classifiers of the same or different types to classify separately for the first or second embodiment and for the third embodiment, and combine the classification results in any suitable way to identify whether the candidate segment is a replay segment. Classification step 206 can also use a single classifier that classifies using the features obtained in the first or second embodiment together with those of the third embodiment. This is described in detail in the seventh embodiment.

Fourth embodiment

In the third embodiment above, any technique can be used to detect camera motion. In the prior art, however, camera motion is normally estimated from all motion vectors. Given the unreliable motion vectors discussed earlier, estimating camera motion from all motion vectors is clearly inaccurate.

For example, as shown in Fig. 10, most of the motion vectors in Figs. 10A and 10C are unreliable, and only a small part of them (marked by white boxes) are reliable. Clearly, if all motion vectors were used to estimate the camera motion, the result would be very inaccurate.

Therefore, in the present embodiment, as shown in Fig. 9, the reliability of the motion vectors in the candidate segment is obtained first (step 902), only the reliable motion vectors are used to estimate the camera motion (step 904), and the camera motion features are then calculated as described in the third embodiment (step 906).
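The idea of step 904 can be sketched with a single-pass least-squares fit of the four-parameter model of equation (8), using only the blocks flagged as reliable. This is a simplification and not the iterative least-squares method of Rath and Makur cited earlier: there is no outlier rejection loop. The complex-number formulation, the tuple layout of `blocks` and the function name are assumptions made for this sketch.

```python
import cmath


def estimate_camera_motion(blocks):
    """Fit pan, tilt, zoom and theta from reliable block motion vectors only.

    blocks: list of (x, y, mv_x, mv_y, reliable), with (x, y) measured from
    the picture centre. Returns (pan, tilt, zoom, theta), or None if fewer
    than two reliable vectors are available.
    """
    # Represent positions as complex numbers: w = s * z + t, where
    # s = zoom * exp(i * theta) and t = pan + i * tilt.
    pairs = [(complex(x, y), complex(x + mx, y + my))
             for x, y, mx, my, ok in blocks if ok]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    z_mean = sum(z for z, _ in pairs) / n
    w_mean = sum(w for _, w in pairs) / n
    denom = sum(((z - z_mean) * (z - z_mean).conjugate()).real for z, _ in pairs)
    if denom == 0:
        return None
    s = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in pairs) / denom
    t = w_mean - s * z_mean
    return t.real, t.imag, abs(s), cmath.phase(s)


# Four reliable blocks moving by (3, -1) and one unreliable outlier that is ignored.
blocks = [(-8, -8, 3, -1, True), (8, -8, 3, -1, True),
          (8, 8, 3, -1, True), (-8, 8, 3, -1, True),
          (0, 0, 50, -40, False)]
print(estimate_camera_motion(blocks))  # approximately (3.0, -1.0, 1.0, 0.0)
```

Dropping the unreliable block before the fit is what keeps the outlier (50, -40) from distorting the estimated pan and tilt.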

How the motion vector reliability is obtained has been described in detail above and is not repeated here.

The camera motion detected according to the present embodiment is reliable. For example, Figs. 10B and 10D show the results of camera motion compensation applied to Figs. 10A and 10C respectively, after the camera motion was obtained from the reliable motion vectors with the iterative least-squares method mentioned above. It can be seen that noticeable differences between the reference frame and the motion-compensated current frame appear only in the foreground and in the superimposed logo region. This shows that the camera motion estimate is accurate.

According to a variant of the present embodiment, the reliability of the motion vectors is taken into account not only when estimating the camera motion but also when calculating the camera motion features, as shown by the dashed arrow in Fig. 9. For example, camera motion estimated from a frame with few reliable motion vectors is clearly less trustworthy than camera motion estimated from a frame with many reliable motion vectors. Therefore, when calculating a camera motion feature such as the time average (i.e. the inter-frame mean) of the absolute value, the proportion of reliable motion vectors in each frame can be used as the weight of that frame's camera motion absolute value.

Similarly, when calculating the statistical features of the motion vectors and/or the local motion and/or their accelerations as described in the third embodiment, only reliable motion vectors may be used, and/or the number of reliable motion vectors may be used as the weight of each frame when calculating the time (inter-frame) average of the statistical feature.

Further, when camera motion is detected with the iterative least-squares method, some of the motion vectors, including reliable ones, may be discarded during convergence. Therefore, when calculating the camera motion features, the weight is preferably the proportion of motion vectors finally adopted.

When the statistical features of acceleration are calculated as described above, the acceleration of each frame is obtained by subtracting the motion vectors, local motion or global motion of two adjacent frames, so its reliability is determined by the frame with the lower confidence. In this case, when calculating the time average of the statistical feature, the weight for each frame can therefore be the smaller of the values of the two adjacent frames.

For the shot classification step, the candidate segment classification step and the classifier used, refer to the third embodiment; they are not described again here.

Some examples of the statistical features discussed above, as used for shot classification, are listed below; the present embodiment is obviously not limited to these examples.

Weighted time averages of camera motion:

\frac{\sum_{i \in Shot} p_{i}^{Global} \cdot |Pan_{i}|}{\sum_{i \in Shot} p_{i}^{Global}} \qquad (9)

\frac{\sum_{i \in Shot} p_{i}^{Global} \cdot |Tilt_{i}|}{\sum_{i \in Shot} p_{i}^{Global}} \qquad (10)

\frac{\sum_{i \in Shot} p_{i}^{Global} \cdot |\theta_{i}|}{\sum_{i \in Shot} p_{i}^{Global}} \qquad (11)

\frac{\sum_{i \in Shot} p_{i}^{Global} \cdot |\log(Zoom_{i})|}{\sum_{i \in Shot} p_{i}^{Global}} \qquad (12)

where i is the frame index, Shot denotes the shot to be classified, and p_{i}^{Global} denotes the proportion of motion vectors used in detecting the global motion (i.e. the camera motion) of frame i.
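Equations (9) to (12) share one form, which the following sketch implements for an arbitrary per-frame parameter. The sample values are made up, and returning 0.0 when all weights are zero is an assumed fallback that the application does not specify.

```python
import math


def weighted_abs_mean(values, weights):
    """Equations (9)-(12): sum(p_i * |v_i|) / sum(p_i) over the frames of one shot."""
    total = sum(weights)
    if total == 0:
        return 0.0  # assumed fallback when no frame contributed motion vectors
    return sum(p * abs(v) for p, v in zip(weights, values)) / total


pan = [4.0, -2.0, 1.0]         # made-up per-frame pan values
zoom = [1.0, 1.1, 0.9]         # made-up per-frame zoom factors
p_global = [0.5, 0.25, 0.25]   # share of motion vectors used for global motion
print(weighted_abs_mean(pan, p_global))  # (2.0 + 0.5 + 0.25) / 1.0 = 2.75
print(weighted_abs_mean([math.log(z) for z in zoom], p_global))  # equation (12)
```

Taking the logarithm of the zoom factor, as in equation (12), makes zooming in by a factor and zooming out by the same factor contribute equally.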

Weighted time averages of camera motion acceleration:

\frac{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global}) \cdot |Pan_{i} - Pan_{i+1}|}{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global})} \qquad (13)

\frac{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global}) \cdot |Tilt_{i} - Tilt_{i+1}|}{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global})} \qquad (14)

\frac{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global}) \cdot |\theta_{i} - \theta_{i+1}|}{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global})} \qquad (15)

\frac{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global}) \cdot |\log(Zoom_{i}) - \log(Zoom_{i+1})|}{\sum_{i \in Shot,\, i+1 \in Shot} \min(p_{i}^{Global}, p_{i+1}^{Global})} \qquad (16)

The parameters have the same meaning as above.
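Equations (13) to (16) can be sketched in the same way, with each adjacent frame pair weighted by the smaller of the two frame weights. The function name and the sample values are illustrative, and the zero-weight fallback is again an assumption.

```python
def weighted_abs_acceleration(values, weights):
    """Equations (13)-(16): |v_i - v_{i+1}| weighted by min(p_i, p_{i+1})."""
    pair_weights = [min(p, q) for p, q in zip(weights, weights[1:])]
    total = sum(pair_weights)
    if total == 0:
        return 0.0  # assumed fallback when no frame pair carries any weight
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return sum(w * d for w, d in zip(pair_weights, diffs)) / total


pan = [4.0, -2.0, 1.0]        # made-up per-frame pan values
p_global = [0.5, 0.25, 0.75]  # share of motion vectors used for global motion
print(weighted_abs_acceleration(pan, p_global))  # (0.25*6 + 0.25*3) / 0.5 = 4.5
```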

In addition, as shown in Fig. 11, statistical features of the motion vector reliability types can also be used as features for classifying candidate segments or shots. That is, after step 902 of obtaining the reliability of the motion vectors in the candidate segment, a step 1102 of obtaining motion vector reliability statistical features is added. Here too, the reliability of a motion vector can mean that it is reliable or unreliable, or that it is of the reliable, mis-match or multi-match type, or of other types.

For example, the inter-frame (time) mean of the proportion of reliable motion vectors and/or mis-matched motion vectors and/or multi-matched motion vectors and/or motion vectors used for detecting global motion can be used. Of course, reliable, mis-matched and multi-matched motion vectors together make up all motion vectors and are therefore correlated, so at most two of these types of statistical feature need to be used at the same time.

In addition, weighting the inter-frame camera motion acceleration by the smaller motion vector proportion of the two frames was discussed above. The smaller of the proportions of motion vectors used for detecting camera motion in two adjacent frames is therefore also a feature, and its time average, for example, can be used as a feature for classifying shots or candidate segments:

\underset{i \in Shot,\, i+1 \in Shot}{\mathrm{mean}} \left( \min(p_{i}^{Global}, p_{i+1}^{Global}) \right) \qquad (17)
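The reliability statistics of step 1102 and the feature of equation (17) can be sketched as follows. The label strings, the function names and the sample data are illustrative assumptions; each frame is represented simply as a list of per-block reliability labels.

```python
from collections import Counter


def reliability_ratios(frame_labels):
    """Mean over frames of the per-frame share of each reliability type."""
    types = ("reliable", "mis-match", "multi-match")
    sums = dict.fromkeys(types, 0.0)
    for labels in frame_labels:
        counts = Counter(labels)
        for t in types:
            sums[t] += counts[t] / len(labels)
    return {t: sums[t] / len(frame_labels) for t in types}


def mean_min_ratio(p_global):
    """Equation (17): mean over adjacent frame pairs of min(p_i, p_{i+1})."""
    pairs = [min(p, q) for p, q in zip(p_global, p_global[1:])]
    return sum(pairs) / len(pairs)


frames = [
    ["reliable", "reliable", "mis-match", "multi-match"],
    ["reliable", "mis-match", "mis-match", "mis-match"],
]
print(reliability_ratios(frames))
print(mean_min_ratio([0.5, 0.25, 0.75]))  # (0.25 + 0.25) / 2 = 0.25
```

Because the three ratios of a frame sum to one, any two of them determine the third, which is why at most two need to be used as features.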

The specific implementation of the classification step, and the combination of the present embodiment with the first or second embodiment, are similar to those of the third embodiment and are not described again here.

Note that the first and second embodiments include step 202 of obtaining the reliability of the motion vectors of at least one pair of frames at the two ends of the candidate segment, while the third and fourth embodiments include step 902 of obtaining the reliability of the motion vectors within the candidate segment (including the shots in the candidate segment). The main difference between these two steps is the object processed: step 202 obtains the reliability of the motion vectors of the transition frames at the two ends of the candidate segment, which may be related to shot transitions, whereas step 902 obtains the reliability of the motion vectors of the main content frames of the candidate segment or shot (preferably excluding the transition frames used for shot transitions). The way in which the motion vector reliability is obtained can be the same or different.

Therefore, in a variant of the present embodiment, as shown in Fig. 12, the first or second embodiment is combined with the present embodiment and they share the same motion vector reliability obtaining step 1202. That is, the reliability of the motion vectors of all frames of the candidate segment can be classified first, and the results are then selected as appropriate for the subsequent matching score calculation step 304 and/or camera motion detection step 904 and/or camera motion feature calculation step 906 and/or motion vector reliability statistical feature obtaining step 1102.

Fifth embodiment

Content is also a useful clue for judging whether a candidate segment is a replay segment. For example, the probability that a candidate segment is a replay segment differs depending on whether its content is of the sports type. In the present embodiment, as shown in Fig. 13, the content types of the shots in the candidate segment are obtained first (step 1302), and the candidate segment is then classified according to the distribution of shot content types (step 206).

Besides obtaining the content type directly from the result of preprocessing the video, several prior-art techniques can detect the content type (for example studio scene, commercial scene, sports scene and so on). For example, D. Brezeale and D. J. Cook, "Automatic Video Classification: A Survey of the Literature", IEEE Trans. SMC-Part C, vol. 38, no. 3, pp. 416-430, 2008, describes color-based, shot-based and motion-based features as effective visual features for video content classification. D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis", IEEE Trans. PAMI, vol. 24, no. 5, pp. 603-619, 2002, also describes that the dominant color ratio and the shot length can be used as features for video content classification. For classifying the shot content type, a support vector machine (SVM) can usually be used as the classifier, and of course any other existing or future classifier trained by learning from samples can also be used.

When the candidate segment contains only one shot, the shot content type can be used directly as the feature for classifying the candidate segment. When the candidate segment contains several shots, the statistical features of the shot content types are used in classification step 206 to classify the candidate segment. Such a statistical feature is, for example, the proportion of shots of each content type, such as the proportion of sports shots. This proportion can be a ratio of shot counts, a ratio of durations, and so on. The statistical feature can also be the confidence of each shot being of a certain content type (for example a sports shot) and/or the mean of these confidences, and so on.

To make the features statistically stable, in a variant of the present embodiment, as shown in Fig. 14, a group of several (for example 7) consecutive shots is used as the basic unit of classification. That is, a window defined by a number of shots is provided, and the candidate segment is traversed with this window one shot at a time, the group of shots in each window being classified as if it were a single shot (step 1402). Each shot thus belongs to several successive groups and obtains several content type classification results. Finally, for each shot, the content type classification results of the groups it belongs to are merged to give the content type classification result of that shot (step 1404).
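The windowed classification of steps 1402 and 1404 can be sketched as follows. The application does not say how the group results are merged, so a majority vote is assumed here; `classify_group` stands for any hypothetical group classifier, and the toy rule based on a dominant color ratio is made up for the example.

```python
from collections import Counter


def classify_shots_by_window(shots, classify_group, window=7):
    """Slide a window of shots over the segment and merge the group labels per shot."""
    if not shots:
        return []
    window = min(window, len(shots))
    votes = [Counter() for _ in shots]
    for start in range(len(shots) - window + 1):
        label = classify_group(shots[start:start + window])  # step 1402
        for index in range(start, start + window):
            votes[index][label] += 1
    # Step 1404, assumed here to be a majority vote over the groups of each shot.
    return [v.most_common(1)[0][0] for v in votes]


def toy_group_classifier(group):
    """Hypothetical rule: a group is 'sports' if its mean dominant color ratio exceeds 0.5."""
    return "sports" if sum(group) / len(group) > 0.5 else "other"


ratios = [0.9, 0.8, 0.2, 0.9, 0.7]  # made-up dominant color ratio per shot
print(classify_shots_by_window(ratios, toy_group_classifier, window=3))
```

In this made-up example the third shot, whose own ratio is low, still receives the label of its neighbours, which illustrates the stabilizing effect of grouping.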

The same with first to the 4th execution mode, in classification step 206, can classify to candidate segment with any grader of training by sample learning.

As shown by the dashed parts in Fig. 13, the present embodiment can also be combined with the first or second embodiment and any of their variants. In that case, classification step 206 can use classifiers of the same or different types to classify separately for the first or second embodiment and for the fifth embodiment, and combine the classification results in any suitable way to identify whether the candidate segment is a replay segment. Classification step 206 can also use a single classifier that classifies using the features obtained in the first or second embodiment together with those of the fifth embodiment. Similarly, the present embodiment can be combined with the third or fourth embodiment (not shown), or with the first or second embodiment together with the third or fourth embodiment (not shown). This is described in detail in the seventh embodiment.

Sixth embodiment

The way of obtaining candidate segments has been described in the first embodiment. In brief, a candidate segment can be the segment between any two shot transitions, but more often it is obtained by pairing shot transitions according to certain rules. For example, for sports video, the first embodiment discussed some characteristics on which the pairing of shot transitions is based. Of course, other types of video may have similar or corresponding (for example opposite) characteristics.

From the characteristics described in the first embodiment, whether a candidate segment is a replay segment is also related to the number of shots in the candidate segment. Therefore, in the present embodiment, as shown in Fig. 15, the number of shots in the candidate segment is obtained first (step 1502), and the candidate segment is then classified based on this number (step 206).

As in the first to fifth embodiments, in classification step 206 the candidate segment can be classified with any classifier trained by learning from samples.

As shown by the dashed parts in Fig. 15, the present embodiment can also be combined with the first or second embodiment and any of their variants. In that case, classification step 206 can use classifiers of the same or different types to classify separately for the first or second embodiment and for the sixth embodiment, and combine the classification results in any suitable way to identify whether the candidate segment is a replay segment. Classification step 206 can also use a single classifier that classifies using the features obtained in the first or second embodiment together with those of the sixth embodiment. Similarly, the present embodiment can be combined with the third or fourth embodiment and/or the fifth embodiment (not shown), or with the first or second embodiment, the third or fourth embodiment and the fifth embodiment (not shown). This is described in detail in the seventh embodiment.

Seventh embodiment

The first to sixth embodiments discussed classifying candidate segments based on, respectively, the shot transition pattern matching score at the two ends of the candidate segment, the variable-speed playback shot statistical features, the shot content type statistical features and the number of shots, as well as some combinations of these features.

In fact, each editing effect, such as the shot structure, the gradual transition pattern of shot transitions, variable-speed playback (including slow motion) and the content type (for example sports scene), can reflect the occurrence of a replay in a video (for example a sports video). The features above can therefore be combined in any way, and the first to sixth embodiments and all their variants can likewise be combined in any way. Such a combination is realized through the final classification step 206. Fig. 16 shows an embodiment that combines the four features above. The extraction of each feature is the same as in the first to sixth embodiments and is not repeated here. Classification step 206 is explained below. It should be understood that the following explanation applies to any combination of the four features above and to any combination of the first to sixth embodiments and any of their variants.

As described in the preceding embodiments, classification step 206 can use classifiers of the same or different types to classify separately for each embodiment and combine the classification results in any suitable way (for example by summation or weighted summation) to identify whether the candidate segment is a replay segment. Classification step 206 can also use a single classifier that classifies using the features obtained in all the embodiments.
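One possible form of the weighted summation mentioned above is sketched below. The score names, the normalization by the weight sum and the 0.5 decision threshold are all assumptions for illustration; the application leaves the combination rule open.

```python
def fuse_scores(scores, weights=None, threshold=0.5):
    """Combine per-classifier scores (each in 0..1) by a normalized weighted sum."""
    if weights is None:
        weights = {name: 1.0 for name in scores}  # plain averaging by default
    total = sum(weights[name] for name in scores)
    fused = sum(weights[name] * score for name, score in scores.items()) / total
    return fused, fused >= threshold


# Made-up outputs of four separately trained classifiers for one candidate segment.
scores = {
    "transition_match": 0.75,
    "variable_speed": 0.5,
    "content_type": 1.0,
    "shot_count": 0.25,
}
print(fuse_scores(scores))  # (0.625, True)
```

Unequal weights could be used to favour the more discriminative classifiers, for example by setting them from validation results.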

In one embodiment provided by the applicant, it is recognized that the features above in fact form a causal probabilistic relationship network. For video content and its editing format, various causal relationships link the features of its different aspects. Weighing the main causal probabilistic relationships against processing efficiency, the main relationships can be selected to form a fairly simple causal probabilistic relationship network on which meaningful processing can be carried out. Fig. 17 shows an example of such a network. In Fig. 17 there are five nodes: the probability (or a binary yes/no decision) of the "Is it a replay segment?" node 1702 is the inference target, and the four features form the other four nodes: the "shot transition pattern matching score" node 1704, the "shot content type statistical feature" node 1706, the "variable-speed playback shot statistical feature" node 1708 and the "number of shots" node 1710. Each directed edge represents a dependence between two nodes.

Based on this causal probabilistic relationship network, a causal probabilistic relationship network classifier, for example a Bayesian network classifier, can be built, and the conditional probability parameters of each directed edge are learned from training data. The trained classifier is used for classification in the present embodiment. As mentioned above, nodes can be removed from this network, and other nodes can of course be added; directed edges representing causal probabilistic relationships can be removed or added (for example between node 1704 and node 1708); and the directed edges are drawn as bidirectional in Fig. 17 but can also be unidirectional. Obviously, after the network is changed, the classifier needs to be retrained.
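As a rough illustration of learning conditional probabilities from training data, the sketch below uses the simplest possible network structure: the replay node has one edge to each of the four feature nodes and the features are treated as conditionally independent (a naive Bayes classifier). This is a stand-in chosen for brevity, not the network of Fig. 17. The discretized feature values, the class name and the tiny training set are all made up.

```python
import math
from collections import defaultdict


class DiscreteNaiveBayes:
    """One 'is replay' node with an edge to every (discretized) feature node."""

    def fit(self, samples, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.alpha = alpha  # Laplace smoothing constant
        self.class_count = {c: labels.count(c) for c in self.classes}
        self.value_count = defaultdict(float)  # (class, feature index, value) -> count
        self.values = [set() for _ in samples[0]]
        for x, c in zip(samples, labels):
            for j, v in enumerate(x):
                self.value_count[(c, j, v)] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, x):
        """Unnormalized log posterior of each class for one feature tuple."""
        total = sum(self.class_count.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_count[c] / total)
            for j, v in enumerate(x):
                num = self.value_count.get((c, j, v), 0.0) + self.alpha
                den = self.class_count[c] + self.alpha * len(self.values[j])
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, x):
        scores = self.log_posterior(x)
        return max(scores, key=scores.get)


# Features: (match score, variable-speed share, content type, number of shots).
train_x = [
    ("high", "many", "sports", "few"), ("high", "many", "sports", "few"),
    ("high", "few", "sports", "many"), ("low", "few", "other", "many"),
    ("low", "few", "sports", "many"), ("low", "many", "other", "few"),
]
train_y = [True, True, True, False, False, False]
model = DiscreteNaiveBayes().fit(train_x, train_y)
print(model.predict(("high", "many", "sports", "few")))
print(model.predict(("low", "few", "other", "many")))
```

A fuller Bayesian network would add edges between feature nodes, such as the one between nodes 1704 and 1708 mentioned above, at the cost of larger conditional probability tables.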

Causal probabilistic relationship network classifiers, and Bayesian network classifiers in particular, are available in many different structures in the prior art, which are not described here. Many other methods and classifiers can also be used for classification, for example a decision tree. Simple heuristics (such as thresholding) can also be used, especially when the number of nodes is small.

Note that when the features are combined, some technical features of the first to sixth embodiments can be shared, can supplement one another or can replace one another. For example, as mentioned above, a shared motion vector reliability obtaining step 1202 can precede the shot transition pattern matching score calculation step 304 and the camera motion feature obtaining step 702. As another example, when the shot content type is obtained on its own, statistical features of the motion information within the shot can also be used as features for content type classification; but when it is combined with the shot transition pattern matching score feature and/or the variable-speed playback shot statistical feature, motion information need not be used for shot content type classification, because those two features already use motion information. As a further example, detecting shot transitions and pairing gradual transitions to obtain candidate segments is a precondition for obtaining all the features above; it can be obtained from outside as a common preprocessing result, or carried out as common preprocessing.

To verify the validity of the present invention, extensive tests were carried out on 24 sports videos of various kinds (about 40 hours in total), covering football, basketball, volleyball, beach volleyball, tennis, table tennis, badminton, hockey, swimming, diving, track and field, weightlifting, boxing, judo and others. Of these, 16 videos were used as training data and the remaining 8 as test data. The data are in MPEG-2 format, recorded from television broadcasts within one year.

Shot transitions in the test data were detected automatically, while gradual shot transitions in the training data were marked manually in order to show the effectiveness of gradual-transition pattern matching. On the training data, which contain 690 replay segments, the gradual-transition pattern matching method achieved a recall of 97.7% and a precision of 68.4%. On the test data, which contain 295 replay segments, recall and precision were 90.5% and 71.4% respectively. Recall is lower than on the training data because the detection of gradual shot transitions in the test data was carried out automatically rather than manually, so accuracy declined somewhat.

For shot content type detection, the detection of sports shots was tested as an example; over all shots of the test videos (5143 sports shots and 3118 non-sports shots) a precision of 91.8% was achieved. Variable-speed playback shot detection (with slow-motion detection as the example) was evaluated on the candidate segments selected by shot transition pattern matching. On a test set of 208 slow-motion shots and 436 normal-speed shots, an LDA classifier trained on the 1074 slow-motion shots and 1057 normal-speed shots of the training set achieved a precision of 81.9%.

Neither gradual-transition pattern matching nor variable-speed playback detection, used alone, can successfully eliminate the non-replay candidate segments selected by gradual-transition pairing. With the causal probabilistic network proposed in this embodiment, a recall of 87.1% and a precision of 88.3% were finally achieved. This result is satisfactory for an automatic online system. Moreover, the gradual-transition pattern matching results show that errors in gradual transition detection caused a marked reduction in recall; therefore, if the gradual transition detection algorithm is strengthened, performance could rise above 90%.

Apparatus for detecting replay segments in a video

Figure 18 shows an exemplary structure of a computing device that can be used to implement the apparatus of the present invention for detecting replay segments in a video.

In Figure 18, a central processing unit (CPU) 1801 performs various processing according to a program stored in read-only memory (ROM) 1802 or a program loaded from storage section 1808 into random-access memory (RAM) 1803. RAM 1803 also stores, as needed, the data required when CPU 1801 performs the various processing.

CPU 1801, ROM 1802 and RAM 1803 are connected to each other via bus 1804. Input/output interface 1805 is also connected to bus 1804.

The following components are connected to input/output interface 1805: an input section 1806, including a keyboard, a mouse and the like; an output section 1807, including a display such as a cathode ray tube (CRT) display or a liquid crystal display (LCD), a loudspeaker and the like; a storage section 1808, including a hard disk and the like; and a communication section 1809, including a network interface card such as a LAN card, a modem and the like. Communication section 1809 performs communication processing via a network such as the Internet.

As needed, a drive 1810 is also connected to input/output interface 1805. A removable medium 1811, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on drive 1810 as needed, so that a computer program read from it is installed into storage section 1808 as needed.

A program can be installed on the computing device from a network such as the Internet or from a storage medium such as removable medium 1811.

Those skilled in the art will understand that this storage medium is not limited to the removable medium 1811 shown in Figure 18, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of removable medium 1811 include magnetic disks (including floppy disks), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile disc (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)) and semiconductor memory. Alternatively, the storage medium may be ROM 1802, a hard disk contained in storage section 1808, or the like, in which the program is stored and which is distributed to the user together with the device containing it.

Various embodiments of the apparatus of the present invention for detecting replay segments in a video are described in detail below. For brevity, aspects already covered in the foregoing description of the method for detecting replay segments in a video are not repeated.

Eighth embodiment

This embodiment corresponds to the first and second embodiments.

As shown in Figure 19, an apparatus for detecting replay segments in a video is proposed, comprising a first motion vector reliability acquisition device 1902 and a candidate segment classification device 1906.

The first motion vector reliability acquisition device 1902 is configured to obtain the reliability of the motion vectors in at least one pair of frames at the two ends of a candidate segment. The reliability can be obtained in several ways. For example, preprocessed data for the video can be used, that is, data on motion vector reliability are read from an external source. Alternatively, motion vector search and motion vector reliability classification can be performed on a candidate segment that has not been preprocessed; both can be implemented with any existing or future technique, as described in the first and second embodiments, and are not repeated here.

The candidate segment classification device 1906 is configured to classify, according to the reliability of the motion vectors, whether the candidate segment is a replay segment. It can be implemented with any existing or future classifier technique trained by learning from samples.

In a variation of this embodiment, as shown in Figure 20, a matching score calculation device 2004 may further be included after the first motion vector reliability acquisition device 1902, to calculate the degree of matching between the shot transition patterns at the two ends of the candidate segment from the distribution of the reliability of the motion vectors in at least one pair of frames at those two ends. This degree of matching can be expressed as a matching score. The candidate segment classification device 1906 can use the matching score (or a transformation of it) directly, treating it as reflecting the probability that the candidate segment is a replay segment, and judge accordingly whether the candidate segment is (or may be) a replay segment. Of course, the candidate segment classification device 1906 can also be implemented with any classifier technique trained by learning from samples. As for how the matching score calculation device 2004 computes the score, for any two distributions of any variable a suitable way of calculating their degree of matching can be designed. The degree of matching includes, but is not limited to, similarity. For example, if the wipes at the two ends of a replay segment use the same pattern, including the wipe direction, the degree of matching should reflect the similarity of the reliability distributions. However, the wipes at the two ends may instead use symmetric patterns, for example wiping from right to left when entering the replay and from left to right when leaving it. In that case the degree of matching can be reflected by symmetry (which can be regarded as a special form of similarity).

As described in the first and second embodiments, the reliability of a motion vector can take the values reliable and unreliable, in which case the matching score calculation device 2004 is configured to calculate the matching score of the two ends of the candidate segment from the distribution of unreliable motion vectors in at least one pair of frames at those two ends. The reliability can also take the values mismatched and non-mismatched, in which case the matching score calculation device 2004 is configured to calculate the matching score of the two ends of the candidate segment from the distribution of mismatched motion vectors in at least one pair of frames at those two ends.

Further, the matching score calculation device 2004 can be configured to calculate, as the matching score, the similarity (which may include symmetry) of the distributions of unreliable or mismatched motion vectors in at least one pair of frames at the two ends of the candidate segment.

For the specific way of calculating the matching score, the preprocessing for shot transition detection and candidate segment determination, and the search and reliability classification of motion vectors, see the first and second embodiments.
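
As an illustration only, the idea of scoring the two ends by similarity or by symmetry can be sketched as follows. The sketch assumes that the unreliable (or mismatched) motion vectors of a frame are given as a binary grid of blocks, uses Jaccard overlap as the similarity measure, and uses a left-right flip for the symmetric case. The grid representation, the measure and the function names are assumptions of this sketch, not the calculation defined in the first and second embodiments.

```python
def jaccard(mask_a, mask_b):
    """Overlap of two equally sized binary grids: |A and B| / |A or B|."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    return inter / union if union else 0.0


def mirror(mask):
    """Flip a grid left to right, for wipes that run in opposite directions."""
    return [list(reversed(row)) for row in mask]


def matching_score(start_mask, end_mask):
    """Best of plain similarity and mirror symmetry between the two ends."""
    return max(jaccard(start_mask, end_mask),
               jaccard(start_mask, mirror(end_mask)))


# Hypothetical 2x4 block grids: 1 marks a block with an unreliable vector.
entering = [[1, 1, 0, 0],
            [1, 1, 0, 0]]
leaving = [[0, 0, 1, 1],
           [0, 0, 1, 1]]
score = matching_score(entering, leaving)
```

In this toy case the two grids do not overlap at all, but the mirrored grid coincides with the first one, so the symmetric comparison is the one that yields a high score.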

Ninth embodiment

This embodiment corresponds to the third embodiment.

The apparatus of this embodiment for detecting replay segments in a video comprises a camera motion feature acquisition device 2102 for obtaining motion features of the camera, and a candidate segment classification device 1906 that uses the camera motion features to classify candidate segments and thereby identify replay segments.

As described in the third embodiment, many existing and future techniques can be used to detect camera motion. In this embodiment, the camera motion feature acquisition device 2102 can also obtain camera motion information from preprocessing of the video segment.

Camera motion includes pan, tilt, zoom and rotation. The candidate segment classification device 1906 can use statistical features of some or all of these camera motions to classify candidate segments. Such statistical features are, for example, the mean over frames (that is, over time), the mean of the absolute value, the variance and so on of the camera motion. In addition or alternatively, various statistical features of the acceleration of the camera motion can be used to classify candidate segments. The acceleration of the camera motion can be obtained by subtracting the corresponding motion parameters of two frames.
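
The statistics named above can be sketched in a few lines. The example assumes one camera motion parameter (here a pan value) is already available per frame, with at least two frames; the function names and sample values are hypothetical, and the population variance is chosen arbitrarily since the text does not say which variance is meant.

```python
from statistics import fmean, pvariance


def motion_statistics(values):
    """Mean, mean absolute value and variance of one parameter over frames."""
    return {
        "mean": fmean(values),
        "mean_abs": fmean([abs(v) for v in values]),
        "variance": pvariance(values),
    }


def acceleration(values):
    """Frame-to-frame difference of a motion parameter."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical per-frame pan values for one shot.
pan = [1.0, -1.0, 3.0, -3.0]
pan_stats = motion_statistics(pan)
pan_accel = acceleration(pan)
accel_stats = motion_statistics(pan_accel)
```

The same two functions can be applied to tilt, zoom and rotation, and the resulting values concatenated into the feature vector passed to the classifier.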

As mentioned above, a candidate segment may contain more than one shot. In that case, as shown in Figure 22, the apparatus for detecting replay segments in a video can further comprise a variable-speed playback shot classification device 2202, which first classifies each shot by the method above to identify whether, or with what probability, it is a variable-speed playback shot; the candidate segment classification device 1906 then classifies the candidate segment based on statistical features of the shot classification results. For example, such a statistical feature can be the confidence that each shot is a variable-speed playback shot and/or the mean of those confidences; or, where the variable-speed playback shot classification device 2202 gives a binary output, it can be the proportion of variable-speed playback shots among all shots of the candidate segment (by number or by duration), and so on.
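
For the binary-output case, the two proportions can be computed as in the following minimal sketch. It assumes each shot is described by a flag and a duration in seconds; this representation and the function name are assumptions made for illustration.

```python
def variable_speed_ratios(shots):
    """shots: list of (is_variable_speed, duration_in_seconds) pairs.

    Returns the share of variable-speed shots by count and by duration.
    """
    if not shots:
        return 0.0, 0.0
    flagged = [(flag, dur) for flag, dur in shots if flag]
    count_ratio = len(flagged) / len(shots)
    total_time = sum(dur for _, dur in shots)
    if total_time:
        time_ratio = sum(dur for _, dur in flagged) / total_time
    else:
        time_ratio = 0.0
    return count_ratio, time_ratio


# Hypothetical candidate segment with four shots.
segment = [(True, 2.0), (False, 6.0), (True, 2.0), (False, 6.0)]
count_ratio, time_ratio = variable_speed_ratios(segment)
```

Here half of the shots are flagged but they cover only a quarter of the duration, which shows why the two proportions can carry different information.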

As in the eighth embodiment, the candidate segment classification device 1906 can be any classifier trained by learning from samples. In this embodiment the classifier can be a linear discriminant analysis (LDA) classifier, although other classifiers can of course be used.

As shown by the dashed parts of Figures 21 and 22, this embodiment can also be combined with the eighth embodiment and any of its specific variations. In that case, the candidate segment classification device 1906 can use classifiers of the same or different types to classify separately for the eighth and ninth embodiments and combine the classification results in any suitable manner, thereby identifying whether the candidate segment is a replay segment. The candidate segment classification device 1906 can also use a single classifier (see the seventh and thirteenth embodiments) that classifies using the features obtained in the eighth and ninth embodiments.

Tenth embodiment

This embodiment corresponds to the fourth embodiment.

On the basis of the ninth embodiment, as shown in Figure 23, the camera motion feature acquisition device 2102 of this embodiment further comprises: a second motion vector reliability acquisition device 2302 for obtaining the reliability of the motion vectors in the candidate segment; a camera motion detection device 2304 configured to estimate camera motion using only reliable motion vectors; and a camera motion feature calculation device 2306 for calculating camera motion features by the methods of the third and ninth embodiments.

In a variation of this embodiment, not only the camera motion detection device 2304 but also the camera motion feature calculation device 2306 takes the reliability of motion vectors into account, as indicated by the dashed line pointing from the camera motion detection device 2304 to the camera motion feature calculation device 2306. For example, camera motion estimated from a frame with few reliable motion vectors is clearly less trustworthy than camera motion estimated from a frame with many. Therefore, when calculating a camera motion feature such as the temporal (inter-frame) mean of the absolute value, the proportion of reliable motion vectors in each frame can be used as the weight of that frame's absolute camera motion.
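
The weighting just described amounts to a weighted average over frames. A minimal sketch, assuming one motion value and one reliable-vector proportion per frame (the function name and the sample numbers are hypothetical):

```python
def weighted_mean_abs_motion(motion, reliable_ratio):
    """Temporal mean of |motion|, each frame weighted by its reliable ratio."""
    total_weight = sum(reliable_ratio)
    if total_weight == 0:
        return 0.0  # no frame has any reliable vector
    weighted = sum(w * abs(m) for m, w in zip(motion, reliable_ratio))
    return weighted / total_weight


# A frame with no reliable vectors contributes nothing to the feature.
only_first = weighted_mean_abs_motion([2.0, -4.0], [1.0, 0.0])
# Equal weights reduce to the ordinary mean of absolute values.
equal = weighted_mean_abs_motion([2.0, -4.0], [0.5, 0.5])
```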

In addition, when statistical features of motion vectors and/or local motion and/or their accelerations are calculated as in the third or ninth embodiment, the camera motion feature calculation device 2306 can similarly be configured to use only reliable motion vectors, and/or to use the number of reliable motion vectors as the weight of each frame when calculating the temporal (inter-frame) mean of a statistical feature.

Further, considering that when camera motion is detected by iterative least squares some motion vectors are discarded during convergence (that is, they are treated as unreliable), the camera motion feature calculation device 2306 is preferably configured to use the proportion of motion vectors finally adopted as the weight.

When statistical features of acceleration are calculated as described above, the acceleration of each frame is obtained by subtracting the motion vectors, local motion or global motion of two adjacent frames, so its reliability is determined by the less reliable of the two frames. In this case, therefore, when the camera motion feature calculation device 2306 calculates the temporal mean of a statistical feature, the weight of each frame can be taken as the smaller of the weights of the two adjacent frames.
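
A short sketch of this rule, under the assumption that each frame has one motion value and one reliability weight; the acceleration between two adjacent frames then receives the smaller of their two weights. The function name and sample values are hypothetical.

```python
def weighted_mean_abs_acceleration(motion, weights):
    """Weighted temporal mean of |acceleration| with min-of-pair weights."""
    accel = [b - a for a, b in zip(motion, motion[1:])]
    accel_weights = [min(wa, wb) for wa, wb in zip(weights, weights[1:])]
    total = sum(accel_weights)
    if total == 0:
        return 0.0
    return sum(w * abs(a) for a, w in zip(accel, accel_weights)) / total


# The middle frame is less reliable, so both differences get weight 0.5.
balanced = weighted_mean_abs_acceleration([0.0, 2.0, 6.0], [1.0, 0.5, 1.0])
# An unreliable last frame removes the second difference entirely.
first_only = weighted_mean_abs_acceleration([0.0, 2.0, 6.0], [1.0, 1.0, 0.0])
```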

For the variable-speed playback shot classification device 2202, the candidate segment classification device 1906 and the classifiers used, see the third and ninth embodiments; they are not repeated here.

For the specific way of calculating the camera motion statistical features, see the third embodiment.

In addition, as shown in Figure 24, the candidate segment classification device 1906 can also use statistical features of the motion vector reliability types as features for classifying candidate segments or shots. That is, a motion vector reliability statistical feature acquisition device 2402 for obtaining such features is added after the second motion vector reliability acquisition device 2302. As before, the reliability of a motion vector can mean reliable or unreliable, or can mean one of the three types reliable, mismatched and multiple-match, or other types.

For the specific calculation of the motion vector reliability statistical features, see the fourth embodiment.

The specific implementation of the candidate segment classification device 1906, and the combination of this embodiment with the eighth embodiment, are similar to those of the ninth embodiment and are not repeated here.

Note that the eighth embodiment has a first motion vector reliability acquisition device 1902 for obtaining the reliability of the motion vectors in at least one pair of frames at the two ends of a candidate segment, while the ninth and tenth embodiments have a second motion vector reliability acquisition device 2302 for obtaining the reliability of the motion vectors within the candidate segment (including the shots in the candidate segment). The main difference between the two devices is the object they process: the first motion vector reliability acquisition device 1902 obtains the reliability of the motion vectors of transition frames at the two ends of the candidate segment that may be related to shot transitions, whereas the second motion vector reliability acquisition device 2302 obtains the reliability of the motion vectors of the main content frames of the candidate segment or shot (preferably excluding the transition frames used for shot transitions). The way in which they obtain the motion vectors may be the same or different.

Therefore, in a variation of this embodiment, as shown in Figure 25, the eighth embodiment is combined with the tenth embodiment and the two share a single motion vector reliability acquisition device 2502. That is, the reliability of the motion vectors of all frames of the candidate segment can be classified first, and the results then used as appropriate by the subsequent matching score calculation device 2004 and/or camera motion detection device 2304 and/or camera motion feature calculation device 2306 and/or motion vector reliability statistical feature acquisition device 2402. Accordingly, in this specification the motion vector reliability acquisition device 2502 can replace the first motion vector reliability acquisition device 1902 and the second motion vector reliability acquisition device 2302 wherever they appear; this is not pointed out again below.

Eleventh embodiment

This embodiment corresponds to the fifth embodiment.

As shown in Figure 26, the apparatus of this embodiment for detecting replay segments in a video comprises a shot content type acquisition device 2602 for obtaining the content type of the shots in a candidate segment, and a candidate segment classification device 1906 that classifies the candidate segment according to the distribution features of the shot content types.

For the specific way of obtaining the shot content type, see the fifth embodiment.

When a candidate segment contains only one shot, the shot content type can be used directly as the feature for classifying the candidate segment. When a candidate segment contains several shots, the candidate segment classification device 1906 classifies it using statistical features of the shot content types. Such a statistical feature is, for example, the proportion of shots of each content type, such as the proportion of sports shots. The proportion can be by number of shots, by duration, and so on. The statistical feature can also be the confidence that each shot is of a given content type (for example a sports shot) and/or the mean of those confidences, and so on.

To make the features statistically stable, in a variation of this embodiment, as shown in Figure 27, the shot content type acquisition device can comprise a shot group traversal device 2702 and a merging device 2704. The shot group traversal device 2702 uses a group of several (for example 7) consecutive shots as the basic unit of classification. In other words, a window defined by a number of shots is provided and is moved through the candidate segment one shot at a time, and the shot group in each window is treated as a single shot and classified. Each shot thus belongs to several successive groups and receives several content type classification results. Finally, the merging device 2704 merges, for each shot, the content type classification results of the shot groups it belongs to, giving the content type classification result of each shot.
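
The traversal and merging can be sketched as below. The text does not state how the results are merged, so majority voting is used here purely as an assumption; the group classifier is passed in as a function, and the toy classifier, labels and shot values are hypothetical.

```python
from collections import Counter


def classify_shots_by_window(shots, classify_group, window=7):
    """Slide a window of `window` shots one shot at a time, classify each
    group as a whole, then give every shot the majority label of the
    groups that contain it."""
    n = len(shots)
    if n == 0:
        return []
    votes = [Counter() for _ in range(n)]
    # If the segment is shorter than the window, use a single group.
    starts = range(n - window + 1) if n > window else [0]
    for start in starts:
        group = shots[start:start + window]
        label = classify_group(group)
        for i in range(start, start + len(group)):
            votes[i][label] += 1
    return [v.most_common(1)[0][0] for v in votes]


def toy_group_classifier(group):
    """Hypothetical stand-in: each shot is a sports-likeness score in [0, 1]."""
    return "sports" if sum(group) / len(group) >= 0.5 else "other"


labels = classify_shots_by_window([1, 1, 1, 0, 0, 0],
                                  toy_group_classifier, window=3)
```

Other merging rules, such as averaging group confidences, would fit the same loop; the choice depends on what the group classifier outputs.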

As in the eighth to tenth embodiments, the candidate segment classification device 1906 can be implemented with any classifier trained by learning from samples.

As shown by the dashed part of Figure 27, this embodiment can also be combined with the eighth embodiment and any of its specific variations. In that case, the candidate segment classification device 1906 can use classifiers of the same or different types to classify separately for the eighth and eleventh embodiments and combine the classification results in any suitable manner, thereby identifying whether the candidate segment is a replay segment. The candidate segment classification device 1906 can also use a single classifier that classifies using the features obtained in the eighth and eleventh embodiments. Similarly, this embodiment can be combined with the ninth or tenth embodiment (not shown), or with the eighth embodiment together with the ninth or tenth embodiment (not shown); see the seventh and thirteenth embodiments.

Twelfth embodiment

This embodiment corresponds to the sixth embodiment.

In this embodiment, as shown in Figure 28, the apparatus for detecting replay segments in a video comprises a shot count acquisition device 2802 for obtaining the number of shots in a candidate segment, and a candidate segment classification device 1906 that classifies the candidate segment based on this number of shots.

As in the eighth to eleventh embodiments, the candidate segment classification device 1906 can be implemented with any classifier trained by learning from samples.

As shown by the dashed part of Figure 28, this embodiment can also be combined with the eighth embodiment and any of its specific variations. In that case, the candidate segment classification device 1906 can use classifiers of the same or different types to classify separately for the eighth and twelfth embodiments and combine the classification results in any suitable manner, thereby identifying whether the candidate segment is a replay segment. The candidate segment classification device 1906 can also use a single classifier that classifies using the features obtained in the eighth and twelfth embodiments. Similarly, this embodiment can be combined with the ninth or tenth embodiment and/or the eleventh embodiment (not shown), or with the eighth embodiment, the ninth or tenth embodiment, and the eleventh embodiment (not shown); see the seventh and thirteenth embodiments.

Thirteenth embodiment

The eighth to twelfth embodiments discussed, respectively, how the candidate segment classification device 1906 classifies candidate segments based on the results of the matching score calculation device 2004, the variable-speed playback shot classification device 2202, the shot content type acquisition device 2602 and the shot count acquisition device 2802, that is, based on the shot transition matching score, the variable-speed playback shot statistical features, the shot content type statistical features and the number of shots. Some combinations of these four devices were also discussed in the foregoing embodiments.

In fact, the above devices or features can be combined arbitrarily, as can the eighth to twelfth embodiments and all of their variations. Such a combination can be realized through the final candidate segment classification device 1906.

Figure 29 shows an embodiment that combines the above four devices or features. The extraction of each feature is the same as in the eighth to twelfth embodiments and is not repeated here. The candidate segment classification device 1906 is explained below. It should be understood that the following explanation applies to any combination of the above four devices and to any combination of the eighth to twelfth embodiments and their variations.

As described in the preceding embodiments, the candidate segment classification device 1906 can use classifiers of the same or different types to classify separately for each embodiment and combine the classification results in any suitable manner (for example by summation or weighted summation), thereby identifying whether the candidate segment is a replay segment. The candidate segment classification device 1906 can also use a single classifier that classifies using the features obtained in the various embodiments.
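
Combining separate classification results by weighted summation can be sketched as follows. The sketch assumes each per-feature classifier outputs a score in [0, 1]; normalizing by the weight sum and thresholding at 0.5 are choices made for this illustration only, and the names and sample scores are hypothetical.

```python
def fuse_scores(scores, weights=None, threshold=0.5):
    """Weighted sum of per-classifier scores, normalized by the weight sum.

    Returns the fused score and the replay / non-replay decision.
    """
    if weights is None:
        weights = [1.0] * len(scores)  # plain summation, then averaging
    total = sum(weights)
    fused = sum(s * w for s, w in zip(scores, weights)) / total
    return fused, fused >= threshold


# Hypothetical scores from the matching score, variable-speed playback,
# content type and shot count classifiers, in that order.
fused, is_replay = fuse_scores([0.9, 0.8, 0.4, 0.3])
trusted, _ = fuse_scores([0.9, 0.8, 0.4, 0.3], weights=[2.0, 0.0, 0.0, 0.0])
```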

In one embodiment provided by the applicant, it is recognized that the above features in fact form a causal probabilistic network (for example as shown in Figure 17). From this network a causal probabilistic network classifier, for example a Bayesian network classifier, can be constructed, with the conditional probability parameters of each directed edge of the network learned from training data. In this embodiment the trained classifier is used for classification.
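
Learning conditional probabilities from training data can be illustrated with the simplest Bayesian network structure, naive Bayes, in which a single class node (replay or not) points to every feature node. The sketch below is only an illustration under stated assumptions: the structure is not that of Figure 17, the features are assumed to be discretized beforehand, Laplace smoothing is added, and the class name, feature values and training samples are all hypothetical.

```python
import math
from collections import defaultdict


class NaiveBayesReplayClassifier:
    """Class node -> each discrete feature node, with Laplace smoothing."""

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.class_count = {c: labels.count(c) for c in self.classes}
        n_features = len(samples[0])
        # value_count[c][j][v]: class-c samples whose feature j equals v
        self.value_count = {
            c: [defaultdict(int) for _ in range(n_features)]
            for c in self.classes
        }
        self.values = [set() for _ in range(n_features)]
        for x, y in zip(samples, labels):
            for j, v in enumerate(x):
                self.value_count[y][j][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, x):
        """Return P(class | features) for every class."""
        total = sum(self.class_count.values())
        log_p = {}
        for c in self.classes:
            lp = math.log(self.class_count[c] / total)
            for j, v in enumerate(x):
                num = self.value_count[c][j].get(v, 0) + 1
                den = self.class_count[c] + len(self.values[j])
                lp += math.log(num / den)
            log_p[c] = lp
        top = max(log_p.values())
        norm = sum(math.exp(lp - top) for lp in log_p.values())
        return {c: math.exp(lp - top) / norm for c, lp in log_p.items()}


# Hypothetical discretized features: (matching score level, playback speed).
train_x = [("high", "slow")] * 3 + [("low", "normal")] * 3
train_y = [True] * 3 + [False] * 3
model = NaiveBayesReplayClassifier().fit(train_x, train_y)
p = model.posterior(("high", "slow"))
```

A network with edges between feature nodes, as the text allows, would need conditional probability tables indexed by each node's parents rather than by the class alone.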

It should be noted that when the features are combined, some technical features of the foregoing eighth to twelfth embodiments can be shared, can complement each other, or can substitute for one another. For example, as described above, the matching score calculation device 2004 and the camera motion feature acquisition device 2102 can share the motion vector reliability acquisition device 2502. As another example, when the shot content type statistical feature is used alone, statistical features of the motion information within a shot can also be used when classifying the shot content type. However, when it is combined with the shot transition pattern matching score feature and/or the variable-speed playback shot statistical feature, the latter two already use motion information, so motion information need not be used when classifying the shot content type. As a further example, the detection of shot transitions and the pairing of gradual transitions to obtain candidate segments are preconditions for obtaining all of the above features; they can be performed as common preprocessing, or the results of common preprocessing can be obtained from an external source.

Some embodiments of the present invention have been described in detail above. As those of ordinary skill in the art will understand, all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software or a combination of them, in any computing device (including a processor, a storage medium and so on) or network of computing devices. Those of ordinary skill in the art can achieve this with their basic programming skills once they understand the content of the present invention, so no detailed explanation is needed here.

Furthermore, it is evident that where the above description involves possible external operations, any display device and any input device connected to any computing device, together with the corresponding interfaces and control programs, will be used. In general, the relevant hardware and software in a computer, computer system or computer network, and the hardware, firmware, software or combination of them that implements the various operations of the foregoing method of the present invention, constitute the apparatus of the present invention and its components.

Therefore, on this understanding, the object of the present invention can also be achieved by running a program or a set of programs on any information processing device. The information processing device can be a well-known general-purpose device. Accordingly, the object of the present invention can also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present invention, as does a medium storing or transmitting such a program product. Obviously, the storage or transmission medium can be of any type known to those skilled in the art or developed in the future, so there is no need to enumerate the various storage or transmission media here.

In the apparatus and method of the present invention, the components or steps can obviously be decomposed, combined and/or recombined after decomposition. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. It should also be pointed out that the steps of the above series of processes can naturally be performed chronologically in the order described, but need not be. Some steps can be performed in parallel or independently of one another. Also, in the above description of specific embodiments of the invention, features described and/or illustrated for one embodiment can be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.

It should be emphasized that the term "comprises/comprising", as used herein, refers to the presence of a feature, element, step or component, but does not exclude the presence or addition of one or more other features, elements, steps or components.

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the application is not limited to the specific embodiments of the processes, devices, means, methods and steps described in the specification. One of ordinary skill in the art will readily appreciate from the disclosure of the present invention that processes, devices, means, methods or steps, existing or to be developed in the future, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein can be used according to the present invention. Accordingly, the appended claims are intended to include such processes, devices, means, methods or steps within their scope.

The present invention can be used in fields such as video segmentation, retrieval, and analysis. For example, in the field of sports video analysis, structural categories and semantic annotations can be established automatically or semi-automatically, making it easier for people to quickly find and browse highlights. Such applications can serve end viewers as well as program production at TV stations and content search on websites, for example a TV station's commentary on past matches during a match, or the downloading of sports match clips from multimedia websites.

Claims

1. A method for detecting a playback segment in a video, comprising:

a first motion vector reliability obtaining step of obtaining the reliability of motion vectors in at least one pair of frames at the two ends of a candidate segment; and

a classification step of classifying whether the candidate segment is a playback segment according to the reliability of the motion vectors.

2. The method for detecting a playback segment in a video as claimed in claim 1, further comprising:

a matching score calculation step of calculating a matching score of the two ends of the candidate segment according to the distribution of the reliability of the motion vectors in the at least one pair of frames at the two ends of the candidate segment;

wherein the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score.

3. The method for detecting a playback segment in a video as claimed in claim 2, wherein:

the reliability of the motion vectors comprises reliable and unreliable.

4. The method for detecting a playback segment in a video as claimed in claim 2, wherein:

the reliability of the motion vectors comprises mismatched and non-mismatched;

wherein the matching score calculation step is configured to calculate the matching score of the two ends of the candidate segment according to the distribution of the mismatched motion vectors in the at least one pair of frames at the two ends of the candidate segment.

5. The method for detecting a playback segment in a video as claimed in claim 4, wherein the matching score calculation step is configured to calculate, as the matching score, the similarity of the distributions of the mismatched motion vectors in the at least one pair of frames at the two ends of the candidate segment.

6. The method for detecting a playback segment in a video as claimed in claim 5, wherein the similarity is calculated as follows:

where i and j denote a pair of frames at the two ends of the candidate segment; S(i, j) denotes the similarity between frame i and frame j; b_i and b_j denote blocks in frame i and frame j, respectively, that are used to calculate motion vectors and whose spatial positions within the frame correspond to or are symmetric with each other; M_i(b_i) and M_j(b_j) denote the reliability of the motion vectors of block b_i in frame i and block b_j in frame j, respectively, taking the value 1 when the motion vector of the block is a mismatched motion vector and 0 otherwise; and #(block) denotes the number of blocks in a frame.
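The quantities defined in claim 6 can be illustrated with a short Python sketch. The formula itself is not reproduced in this text, so the agreement ratio used below (the fraction of paired block positions whose reliability flags are equal) is only an assumption and not the claimed formula. The function name `mismatch_similarity`, the row/column layout of the maps, and the `mirror` option (a horizontal reading of "symmetric" block positions) are likewise hypothetical.

```python
from typing import Sequence


def mismatch_similarity(m_i: Sequence[Sequence[int]],
                        m_j: Sequence[Sequence[int]],
                        mirror: bool = False) -> float:
    """Assumed similarity S(i, j) between the reliability maps of two frames.

    m_i[r][c] is 1 if the motion vector of the block at row r, column c of
    frame i is flagged (value 1 in the definition of M_i), and 0 otherwise.
    With mirror=True, block (r, c) of frame i is paired with the horizontally
    symmetric block of frame j (an assumed reading of "symmetric").
    NOTE: the agreement ratio is an illustrative assumption, not the formula
    of the claim.
    """
    rows, cols = len(m_i), len(m_i[0])
    same_shape = (len(m_j) == rows
                  and all(len(row) == cols for row in m_i)
                  and all(len(row) == cols for row in m_j))
    if not same_shape:
        raise ValueError("reliability maps must have the same rectangular shape")
    agree = 0
    for r in range(rows):
        for c in range(cols):
            cj = cols - 1 - c if mirror else c  # partner column in frame j
            if m_i[r][c] == m_j[r][cj]:
                agree += 1
    return agree / (rows * cols)  # normalise by #(block)
```

Under this assumption, two frames with identical flag maps score 1.0, and each disagreeing block position lowers the score by 1/#(block).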

7. The method for detecting a playback segment in a video as claimed in claim 6, wherein the average of the similarity calculated over multiple pairs of frames at the two ends of the candidate segment is used as the matching score:

$$\frac{1}{N} \sum_{i \in GT_1,\ i + B_1 \in GT_2} S(i, i + B_1)$$

or

$$\frac{1}{N} \sum_{i \in GT_1,\ B_2 - i \in GT_2} S(i, B_2 - i)$$

where GT_1 and GT_2 denote the sets of frames at the two ends of the candidate segment, respectively; i, i + B_1, and B_2 - i denote frame numbers; B_1 denotes the offset between the frames of each pair; B_2 denotes the offset between the first frame of GT_1 and its corresponding frame in GT_2; and N denotes the number of frame pairs.
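The two averages of claim 7 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `average_pair_score`, the `mirrored` switch that selects between the i + B1 and B2 - i pairings, and the choice to return 0.0 when no valid pair exists are all assumptions. The per-pair similarity S is passed in as a callable, so any definition of S can be used.

```python
from typing import Callable, Iterable, Set


def average_pair_score(gt1: Iterable[int],
                       gt2: Set[int],
                       similarity: Callable[[int, int], float],
                       offset: int,
                       mirrored: bool = False) -> float:
    """Average S over frame pairs taken from the two ends of a candidate segment.

    mirrored=False pairs frame i with i + offset (offset plays the role of B1).
    mirrored=True pairs frame i with offset - i (offset plays the role of B2).
    A pair is counted only if its partner frame belongs to gt2, so N is the
    number of valid pairs.
    """
    total, n = 0.0, 0
    for i in gt1:
        j = offset - i if mirrored else i + offset
        if j in gt2:
            total += similarity(i, j)
            n += 1
    # Returning 0.0 for an empty pairing is an assumption of this sketch.
    return total / n if n else 0.0
```

For example, with GT1 = {0, 1, 2}, GT2 = {10, 11} and B1 = 10, only the pairs (0, 10) and (1, 11) contribute, so N = 2.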

8. The method for detecting a playback segment in a video as claimed in claim 2, further comprising:

a camera motion feature obtaining step of obtaining a camera motion feature in the candidate segment;

wherein the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score and the camera motion feature.

9. The method for detecting a playback segment in a video as claimed in claim 8, further comprising:

a shot classification step of classifying whether a shot is a variable-speed playback shot according to the camera motion feature;

wherein the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score and a statistical feature of the variable-speed playback shots.

10. The method for detecting a playback segment in a video as claimed in claim 9, wherein the camera motion feature obtaining step comprises:

a second motion vector reliability obtaining step of obtaining the reliability of the motion vectors in the candidate segment;

a motion detection step of detecting the camera motion in the candidate segment using the reliable motion vectors; and

a feature calculation step of calculating the camera motion feature according to the camera motion and the distribution of the reliability of the motion vectors.
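The idea in claim 10 of detecting camera motion from only the reliable motion vectors can be illustrated with a small Python sketch. The claim does not specify a motion model, so the purely translational model and the per-component median used here are assumptions, as are the function name `estimate_camera_translation` and the boolean reliability flags.

```python
from statistics import median
from typing import List, Optional, Tuple

Vector = Tuple[float, float]


def estimate_camera_translation(vectors: List[Vector],
                                reliable: List[bool]) -> Optional[Vector]:
    """Estimate a global (dx, dy) camera shift from reliable block motion vectors.

    vectors[k] is the motion vector of block k and reliable[k] says whether it
    was judged reliable. Unreliable vectors are discarded before estimation.
    The median is an assumed, outlier-tolerant choice; the claim itself does
    not name an estimator. Returns None if no reliable vector is available.
    """
    kept = [v for v, ok in zip(vectors, reliable) if ok]
    if not kept:
        return None
    return (median(dx for dx, _ in kept), median(dy for _, dy in kept))
```

Discarding unreliable vectors first keeps blocks with ambiguous or wrong matches from distorting the estimate, which is the point of obtaining the reliability before the motion detection step.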

11. The method for detecting a playback segment in a video as claimed in claim 10, wherein the reliability of a motion vector comprises reliable, multi-matched, and mismatched, the method further comprising:

obtaining a statistical feature of the reliability of the motion vectors;

wherein the shot classification step is configured to classify whether a shot is a variable-speed playback shot according to the camera motion feature and the statistical feature of the reliability of the motion vectors.

12. The method for detecting a playback segment in a video as claimed in claim 2, further comprising:

a shot content type obtaining step of obtaining the content types of the shots in the candidate segment;

wherein the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score and a statistical feature of the shot content types.

13. The method for detecting a playback segment in a video as claimed in claim 12, wherein the shot content type obtaining step comprises:

a traversal step of traversing all shots in the candidate segment at a certain step size with a window containing a predetermined number of shots, treating the shot group contained in each window as a single shot, and classifying its content type; and

a merging step of, for each shot, merging the classification results of all shot groups that contain the shot, thereby obtaining the content type of the shot.
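The traversal and merging steps of claim 13 can be sketched in Python as below. This is an illustration under stated assumptions: the claim only says the group results are "merged", so the majority vote is an assumed merging rule; the group classifier `classify_group` is a hypothetical callable supplied by the caller; and the extra final window that guarantees every trailing item is covered is a choice of this sketch.

```python
from collections import Counter
from typing import Callable, List, Sequence, TypeVar

Shot = TypeVar("Shot")


def classify_shot_contents(shots: Sequence[Shot],
                           classify_group: Callable[[Sequence[Shot]], str],
                           window: int = 3,
                           step: int = 1) -> List[str]:
    """Label each item by voting over the sliding windows that contain it.

    Traversal: a window of `window` consecutive items slides with stride
    `step`; each windowed group is classified as if it were a single item.
    Merging: every item collects the labels of all groups containing it and
    takes the most frequent one (majority vote, an assumed merging rule).
    """
    if not shots:
        return []
    last_start = max(len(shots) - window, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the trailing items are covered
    votes = [Counter() for _ in shots]
    for start in starts:
        group = shots[start:start + window]
        label = classify_group(group)
        for k in range(start, start + len(group)):
            votes[k][label] += 1
    return [v.most_common(1)[0][0] for v in votes]
```

Classifying groups rather than single items gives the classifier more context, and the per-item merge then smooths out isolated misclassifications.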

14. The method for detecting a playback segment in a video as claimed in claim 2, further comprising:

a shot number obtaining step of obtaining the number of shots in the candidate segment;

wherein the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score and the number of shots.

15. The method for detecting a playback segment in a video as claimed in claim 9, wherein:

the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score, the statistical feature of the variable-speed playback shots, and a statistical feature of the shot content types.

16. The method for detecting a playback segment in a video as claimed in claim 15, wherein:

the classification step is configured to classify whether the candidate segment is a playback segment according to the matching score, the statistical feature of the variable-speed playback shots, the statistical feature of the shot content types, and the number of shots.

17. The method for detecting a playback segment in a video as claimed in claim 16, wherein the matching score, the statistical feature of the variable-speed playback shots, the statistical feature of the shot content types, and the number of shots constitute a causal probabilistic network, and the classification step is configured to classify using a causal probabilistic network classifier.
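How the four quantities of claim 17 could be combined probabilistically is sketched below in Python. The structure of the network is not given in the claim, so the simplest possible structure is assumed here: the class node is the only parent of every feature node, which is a naive Bayes model. The function name `replay_posterior`, the discretised feature values, and the conditional probability tables supplied by the caller are all hypothetical; in practice such tables would be learned from labelled data.

```python
from typing import Mapping, Tuple


def replay_posterior(prior: float,
                     cpt: Mapping[str, Mapping[str, Tuple[float, float]]],
                     evidence: Mapping[str, str]) -> float:
    """Posterior probability of the positive class given discretised features.

    Assumed structure: the class node is the sole parent of each feature node.
    cpt[feature][value] = (P(value | positive), P(value | negative)).
    evidence maps each observed feature to its discretised value.
    """
    p_yes, p_no = prior, 1.0 - prior
    for feature, value in evidence.items():
        like_yes, like_no = cpt[feature][value]
        p_yes *= like_yes  # accumulate the likelihood under each class
        p_no *= like_no
    total = p_yes + p_no
    if total == 0.0:
        raise ValueError("evidence has zero probability under both classes")
    return p_yes / total  # Bayes' rule with normalisation
```

A candidate segment would then be accepted when the posterior exceeds a chosen threshold, for example 0.5; that threshold is also an assumption of this sketch.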

18. An apparatus for detecting a playback segment in a video, comprising:

a first motion vector reliability obtaining device that obtains the reliability of motion vectors in at least one pair of frames at the two ends of a candidate segment; and

a candidate segment classification device that classifies whether the candidate segment is a playback segment according to the reliability of the motion vectors.

19. The apparatus for detecting a playback segment in a video as claimed in claim 18, further comprising:

a matching score calculation device that calculates a matching score of the two ends of the candidate segment according to the distribution of the reliability of the motion vectors in the at least one pair of frames at the two ends of the candidate segment;

wherein the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score.

20. The apparatus for detecting a playback segment in a video as claimed in claim 19, wherein the reliability of the motion vectors comprises mismatched and non-mismatched, and the matching score calculation device is configured to calculate the matching score of the two ends of the candidate segment according to the distribution of the mismatched motion vectors in the at least one pair of frames at the two ends of the candidate segment.

21. The apparatus for detecting a playback segment in a video as claimed in claim 19, further comprising:

a camera motion feature obtaining device that obtains a camera motion feature in the candidate segment;

wherein the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score and the camera motion feature.

22. The apparatus for detecting a playback segment in a video as claimed in claim 21, further comprising:

a shot classification device that classifies whether a shot is a variable-speed playback shot according to the camera motion feature;

wherein the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score and a statistical feature of the variable-speed playback shots.

23. The apparatus for detecting a playback segment in a video as claimed in claim 22, wherein the camera motion feature obtaining device comprises:

a second motion vector reliability obtaining device that obtains the reliability of the motion vectors in the candidate segment;

a motion detection device that detects the camera motion in the candidate segment using the reliable motion vectors; and

a feature calculation device that calculates the camera motion feature according to the camera motion and the distribution of the reliability of the motion vectors.

24. The apparatus for detecting a playback segment in a video as claimed in claim 23, wherein the reliability of a motion vector comprises reliable, multi-matched, and mismatched, the apparatus further comprising:

a device that obtains a statistical feature of the reliability of the motion vectors;

wherein the shot classification device is configured to classify whether a shot is a variable-speed playback shot according to the camera motion feature and the statistical feature of the reliability of the motion vectors.

25. The apparatus for detecting a playback segment in a video as claimed in claim 19, further comprising:

a shot content type obtaining device that obtains the content types of the shots in the candidate segment;

wherein the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score and a statistical feature of the shot content types.

26. The apparatus for detecting a playback segment in a video as claimed in claim 25, wherein the shot content type obtaining device comprises:

a shot group traversal device that traverses all shots in the candidate segment at a certain step size with a window containing a predetermined number of shots, treats the shot group contained in each window as a single shot, and classifies its content type; and

a merging device that, for each shot, merges the classification results of all shot groups that contain the shot, thereby obtaining the content type of the shot.

27. The apparatus for detecting a playback segment in a video as claimed in claim 19, further comprising:

a shot number obtaining device that obtains the number of shots in the candidate segment;

wherein the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score and the number of shots.

28. The apparatus for detecting a playback segment in a video as claimed in claim 22, wherein:

the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score, the statistical feature of the variable-speed playback shots, and a statistical feature of the shot content types.

29. The apparatus for detecting a playback segment in a video as claimed in claim 28, wherein:

the candidate segment classification device is configured to classify whether the candidate segment is a playback segment according to the matching score, the statistical feature of the variable-speed playback shots, the statistical feature of the shot content types, and the number of shots.

30. The apparatus for detecting a playback segment in a video as claimed in claim 29, wherein the candidate segment classification device is a causal probabilistic network classifier.