WO2008093277A2 - Method and apparatus for smoothing a transition between a first video segment and a second video segment - Google Patents

Method and apparatus for smoothing a transition between a first video segment and a second video segment

Info

Publication number
WO2008093277A2
WO2008093277A2 (PCT/IB2008/050296)
Authority
WO
WIPO (PCT)
Application number
PCT/IB2008/050296
Other languages
French (fr)
Other versions
WO2008093277A3 (en)
Inventor
Dzevdet Burazerovic
Pedro Fonseca
Jan A. D. Nesvadba
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to JP2009547788A priority Critical patent/JP2010518672A/en
Publication of WO2008093277A2 publication Critical patent/WO2008093277A2/en
Publication of WO2008093277A3 publication Critical patent/WO2008093277A3/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268: Signal distribution or switching

Abstract

A transition between a first video segment (C1,...,CM) and a second video segment (S1,...,SN) is smoothed by determining (103) a first profile of content of the first video segment, determining (103) a second profile of content of the second video segment, and inserting (105) the first video segment within the second video segment at a location (Sj, Sj+1) where the determined first profile is similar to the determined second profile, so as to smooth the transition between the first video segment and the second video segment.

Description

Method and apparatus for smoothing a transition between a first video segment and a second video segment
FIELD OF THE INVENTION
The present invention relates to method and apparatus for smoothing a transition between a first video segment and a second video segment.
BACKGROUND OF THE INVENTION
Due to the proliferation of digital multimedia broadcast and distribution, commercials now claim an important role in people's daily lives. A radio or TV program without commercials is becoming increasingly rare. Companies use commercials to advertise their products, while broadcasters need commercials to generate supporting (or even primary) revenue. On the other hand, average consumers often see commercial breaks as an unsolicited intrusion into their viewing or listening experience. Consumers therefore use video recorders to skip these commercial blocks, which reduces broadcasters' advertising revenue.
SUMMARY OF THE INVENTION
It is desirable to smooth the transition to a commercial break (or block) to improve the viewing experience such that it becomes less important or desirable to skip commercials.
This is effectively achieved according to a first aspect of the present invention by a method for smoothing a transition between a first video segment and a second video segment, the method comprising the steps of: determining a first profile of content of a first video segment; determining a second profile of content of a second video segment; and inserting the first video segment within the second video segment at a location where the determined first profile is similar to the determined second profile to smooth the transition between the first video segment and the second video segment.
This is also achieved according to a second aspect of the present invention by apparatus for smoothing a transition between a first video segment and a second video segment, the apparatus comprising: first determining means for determining a first profile of content of a first video segment; second determining means for determining a second profile of content of a second video segment; and third determining means for determining a location for insertion of the first video segment within the second video segment where the determined first profile is similar to the determined second profile to smooth the transition between the first video segment and the second video segment.
In this way a set of simple video editing options is provided, enabling more seamless integration of given commercials with other, non-commercial content (e.g. narrative content such as a movie or a TV series). The editing is intended to preserve the essential information, while minimizing the abruptness of a transition to a commercial break or block and making commercial skipping less desirable. The invention is particularly effective when used in professional movie and TV broadcasting and editing.
The system of the present invention effectively changes the insertion point for a block of commercials. In addition, individual commercials within the block may be rearranged, and the audiovisual content at the boundaries between the individual commercials may be modified, as may the transitions from/to the adjoining non-commercial content. The audiovisual content at the boundaries between the individual commercials, as well as at the transitions from/to the adjoining non-commercial content, can also be modified when the commercials are inserted at fixed locations, e.g. when a content creator has determined a fixed moment for inserting commercials. Further, in a preferred embodiment, the system can verify its own output by applying methods and strategies to detect the commercials once edited; if the commercials are still detectable, the material is edited again by feeding it back through the system.
The content of the commercial break is profiled, and based on this profile the choice of where to insert the commercial break can be made. In practice there may be some limitation on the specific point at which the commercials can be inserted (for example, only after the end of a scene) and on the general location within the content (for example, between 15 and 20 minutes into the content). Within these constraints, the optimum location for commercial insertion can be chosen to minimize the difference between the commercials and the enclosing content and hence provide the desired smooth transition. Further, in profiling the content of the commercial break, the individual commercials within the block can be rearranged. The choice regarding the order in which commercials should be placed next to each other and towards the boundaries with non-commercial content is made on the basis of the respective profiles. This can be used to smooth out the typically high audiovisual variation inside the commercial block. In fact, it is the pattern of frequent and abrupt interruptions of multiple audiovisual features within a relatively short period of time (several minutes) that is particularly disruptive and annoying for the viewer.
The audiovisual content at the boundaries between the commercials may be modified, as well as at the transitions from/to the adjoining non-commercial content. It is known that gradual transitions between visual (camera) shots, e.g. cross-fades and dissolves, are less disruptive and more difficult to detect than abrupt cuts, as disclosed, for example, in Ying Li, C.-C. Jay Kuo, "Video Content Analysis Using Multimodal Information", 2003, Kluwer Academic Publishers Group, ISBN 1-4020-7490-5. Thus, by providing such gradual transitions at the boundaries between commercials, a transition is further smoothed and detection based on a high rate of visual shot-cuts or on audiovisual super-separators is effectively disturbed. Similar effects could also be created in audio, where the insertion of non-audible noise can also be useful.
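The gradual transitions mentioned above can be sketched as a simple linear alpha blend between the tail of one segment and the head of the next. This is a minimal illustration only, assuming frames are NumPy arrays of float pixel values; the function name, array shapes, and the linear ramp are illustrative assumptions, not part of the disclosed apparatus.

```python
import numpy as np

def crossfade(seg_a, seg_b, overlap):
    """Blend the tail of seg_a into the head of seg_b over `overlap` frames.

    seg_a, seg_b: arrays of shape (frames, height, width), float pixel values.
    Returns the joined sequence, `overlap` frames shorter than plain
    concatenation, with a linear dissolve in the blended region.
    """
    # Per-frame blend weight, ramping from 0 (all seg_a) to 1 (all seg_b).
    alphas = np.linspace(0.0, 1.0, overlap)[:, None, None]
    blended = (1.0 - alphas) * seg_a[-overlap:] + alphas * seg_b[:overlap]
    return np.concatenate([seg_a[:-overlap], blended, seg_b[overlap:]])
```

Choosing `overlap` on the order of half a second of frames would give the gentle dissolve described; the same ramp applied to audio samples would realize the analogous audio effect.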
BRIEF DESCRIPTION OF DRAWINGS
For a more complete understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a simplified schematic diagram of apparatus according to an embodiment of the present invention; Figs. 2 (a) to (d) illustrate a first example of low-level feature statistics in a feature movie, including commercial blocks; and
Figs. 3 (a) to (d) illustrate a second example of low- level feature statistics in a feature movie, including commercial blocks.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
With reference to Fig. 1, apparatus of an embodiment of the present invention will be described in more detail. The apparatus 100 comprises an input terminal 101 for receiving a multimedia data stream. The input terminal 101 is connected to the input of a multimedia (audio visual) content analyzer 103 and the input of a video editor 105. The output of the video editor 105 is connected to an output terminal 107 of the apparatus 100. The output of the multimedia content analyzer 103 is connected to the control 109 of the video editor 105. The output of the video editor 105 is also connected in a feedback loop to the control 109 of the video editor 105 via a reference detector 111.
Operation of the apparatus will now be described in detail below. A multimedia data stream comprising a second video segment S1,...,SN and a first video segment C1,...,CM is input to the input terminal 101 of the apparatus 100. For simplicity, the first and second segments are shown input separately on the input terminal 101. The first video segment consists of a plurality of individual commercials C1,...,CM or, for example, segments of informative data. The second video segment consists of non-commercial video content broadcast as a contiguous sequence of (groups of) visual shots, S1,...,SN. The first and/or second video segments include both the visual and the corresponding audio data.
The input multimedia data stream is analyzed by the multimedia content analyzer 103. The first and second video segments C1,...,CM and S1,...,SN are first identified, e.g. based on audiovisual inspection done by a human, and possibly labeled (by means of associated metadata) for easier indexing and access. Then, features that can be characteristic of the behavior of the first and second video segments are extracted. A multitude of audiovisual features are known to the skilled person, as well as methods for their extraction. Some illustrative examples are now listed:
• Higher-level features:
- Presence of humans in a scene (can be established by means of face or speech recognition)
- Object or speaker tracking (including the detection of speaker change)
- Mood of the content (e.g. derived from the mood of music or by analysis of speech prosody)
• Intermediary-level features:
- Audio composition of the scene (voice, music, voice + music, voice + background noise, etc.)
- Localization of visual (camera) shot-cuts
- Detection of the presence of audiovisual delimiters, i.e. super-separators (a conjunction of monochrome video frames and an audio silence)
- Detection of the presence of overlaid text, broadcasters' logos, etc.
• Lower-level features:
- Visual:
  - Dominant color (e.g. the color of the largest clusters in a color space)
  - Luma and chroma (color) averages, histograms, gradients, etc.
  - Level and gradient of visual activity (e.g. derived from the statistics of coding parameters such as motion vectors)
  - Level and gradient of scene complexity (e.g. derived from the statistics of coding parameters such as the product of coding bit rate and the quantization parameter)
- Audio (temporal and spectral properties):
  - Volume (e.g. of a speaker)
  - Tempo (e.g. of a speaker)
  - Background noise characteristics
  - Pitch dynamics (e.g. of the speaker)
The extracted features are then processed by the analyzer 103 to generate content profiles that control the video editor 105. Content profiling is the estimation of content similarity based on the extracted features. These profiles are generated by the different methods described below.
A profile may typically be composed of feature statistics, for example the mean and standard deviation computed for each feature over a number of consecutive video frames (the analysis window). For the high-level features, the standard deviation would probably be most meaningful, while other measures suitable for binary signals are also conceivable.
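Such a windowed profile can be sketched as follows. This is a minimal illustration assuming one scalar feature value per frame; the function name and the boundary clipping are illustrative choices, not specified in the patent.

```python
import numpy as np

def feature_profile(values, window):
    """Per-position mean and standard deviation of one feature signal.

    Each position's statistics are computed over an analysis window of
    roughly `window` frames centered at that position, clipped at the
    sequence boundaries (so windows shrink near the ends).
    """
    values = np.asarray(values, dtype=float)
    half = window // 2
    means, stds = [], []
    for i in range(len(values)):
        w = values[max(0, i - half): i + half + 1]
        means.append(w.mean())
        stds.append(w.std())
    return np.array(means), np.array(stds)
```

With a window of 3500 frames this reproduces the kind of sliding statistic shown in Figs. 2(b) and 3(b) for the per-frame average luma.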
In a first embodiment, each candidate feature is considered separately, and the results obtained from the different features are combined to form a final decision. Accordingly, single-feature profiles are created for the content of a first video segment and a second video segment. These single-feature profiles are compared to yield a similarity estimate (a confidence or probability). The estimates can be obtained by, for example, measuring a metric such as a distance: the smaller the distance, the greater the similarity. The multiple estimates are then combined into a single decision using well-known techniques such as majority voting, linear decision models with weighting, fuzzy logic, Markov models, etc. In an alternative embodiment, a composite profile is obtained from a conjunction of lower-level features: a multi-dimensional feature vector containing the statistics (as described above) of each feature is obtained. The similarity between such feature vectors extracted from different content items is then measured using techniques known from the field of statistical pattern classification, for instance data clustering, as disclosed by Richard O. Duda, Peter E. Hart, David G. Stork, "Pattern Classification", 2001, John Wiley & Sons, ISBN 0-471-05669-3. This may be achieved by using techniques such as supervised learning and neural networks.
Finally, it should be noted that combining higher-level features in multi-dimensional feature vectors to determine a measure of similarity might not be adequate or even feasible (for example, it may be difficult to quantify high-level features such as speaker tracking or the mood of the content). In this case, the features are evaluated separately, for instance by applying heuristics, to obtain similarity measures, after which they may be combined using the techniques described above. The concept of content profiling according to the embodiments above is further explained with reference to Figs. 2(a) to (d) and 3(a) to (d), which illustrate two examples of statistics of such features. The examples are a feature movie and a sequence of animated cartoons.
Figs. 2(a) and 3(a) represent ground truth (manual annotation), in which 1 corresponds to commercials and 0 to non-commercial content. Figs. 2(b) and 3(b) represent the standard deviation of the per-frame average luma. At each frame position, this is computed over an analysis window of 3500 video frames (~2.5 minutes of PAL video) centered at that position. Figs. 2(c) and 3(c) represent the probability of speech for the same samples as Figs. 2(b) and 3(b), respectively. Figs. 2(d) and 3(d) represent the probability of music for the same samples.
For the sake of illustration, the original data has been sub-sampled by a factor of 2.
In the first example of Figs. 2(a) to (d), the most discriminative feature is speech probability as shown in Fig. 2(c). It would appear to already separate the commercial blocks, as each commercial block creates a characteristic "carved plateau" in the predominantly low-amplitude movie data.
Consider the commercial block CB2: if the same commercial block had been inserted a little later, such that it lay close to the subsequent peak of non-commercial content, it would create a "plateau" covering both the commercial block and a piece of the movie at the same time. Hence, it would create a more seamless transition with the movie data that follows. The transition would appear seamless both to humans and to automatic classifiers that have learned that "plateaus" should mainly be a characteristic of commercial blocks. A similar observation can be made for CB3 which, if inserted later, would create a more seamless transition with the movie data that follows; refer again to Fig. 2(c).
In the second example of Figs. 3(a) to (d), it is the average luma that provides the best cue for separation, whereas the audio features are quite indiscriminative. In this case, an additional difference can be observed between the commercial blocks themselves. The third commercial block creates the most prominent "plateau", which is distinguishable from those corresponding to the first and the fourth, as is clearly observed from Fig. 3(b). The leveling of this plateau could be achieved by redistributing individual commercials among the different blocks.
It is conceivable that, with some other genres, neither of these features would be as discriminative, but rather some other feature(s). It is exactly this dependency of the discriminative power of lower-level features on the content type (genre) that makes the combination of multiple features for generating the profile preferable. This is also favorable in that any disturbances that editing would produce in one feature output could be "multiplied", that is, appear in other features. It is even conceivable that artificial patterns could arise in some normally non-discriminative features increasing the effectiveness of the video editor.
The output of the analysis above is input into the control 109 of the video editor 105 for recommending a certain editing action to the user (broadcaster), or for performing an appropriate editing action automatically. A possible result of such editing is shown in Fig. 1. A commercial block C3C1 is composed and inserted between shot groups Sj and Sj+1. This may be because the terminating part of Sj was found to most resemble the starting part of C3 and the starting part of Sj+1 to most resemble the terminating part of C1, or else because high similarity was observed at the transition from C3 to C1. As a result, a smooth transition is observed between the non-commercial portion and a commercial, which makes for more pleasant viewing and also helps to prevent automatic commercial block detection by PVRs. Furthermore, segments 'T', which consist of extra content that may arise between different segments due to cross-fading, the insertion of silences as is well known in the art, etc., may be inserted.
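The selection of the insertion point described above can be sketched as a search over candidate shot-group boundaries for the one with the lowest combined profile distance: the shot before the boundary should match the head of the commercial block, and the shot after it should match the tail. All names are hypothetical, and the plain Euclidean distance stands in for whatever metric the analyzer 103 uses.

```python
import numpy as np

def best_insertion_point(program_profiles, block_head, block_tail, candidates):
    """Pick the candidate boundary that best matches the commercial block.

    program_profiles: one feature vector per shot group, shape (n_shots, n_features)
    block_head: feature vector of the block's first commercial
    block_tail: feature vector of the block's last commercial
    candidates: indices j meaning 'insert between shot group j and j+1'
    Returns the j with the smallest total profile distance.
    """
    def dist(a, b):
        return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float))

    # Cost: mismatch at the lead-in boundary plus mismatch at the lead-out.
    costs = {j: dist(program_profiles[j], block_head)
                + dist(program_profiles[j + 1], block_tail)
             for j in candidates}
    return min(costs, key=costs.get)
```

Restricting `candidates` to scene-end boundaries within an allowed time range implements the practical constraints mentioned earlier (e.g. only after the end of a scene, between 15 and 20 minutes into the content).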
The editing operations can also be performed on compressed video (i.e. after encoding), which is common in professional video production. It is also conceivable that the reference detector 111 and the multimedia (audiovisual) content analyzer 103 could overlap, as they may both incorporate a number of the same operations.
The edited data stream output from the editor 105 is fed back, via the reference detector 111, to the control 109 of the editor 105 to make adjustments to the editor 105. The reference detector 111 comprises a known commercial block detector which seeks transitions between non-commercial portions and commercial blocks in order to distinguish the commercial from the non-commercial portion. If a transition created by the editor 105 is not smooth, this will be detected by the reference detector 111 and fed to the control 109, which adjusts the operation of the editor 105 to improve the smoothing of the transition between the different video segments. The edited data stream is then placed on the output terminal 107 of the apparatus 100.
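Selecting where in the program the composed block is inserted reduces to picking, among candidate shot boundaries, the location whose local profile best matches the block's edge profile. A minimal sketch follows; the candidate indices, the local window, and the scalar edge profile are illustrative assumptions:

```python
import numpy as np

def best_split_point(program_trace, block_edge_profile, candidates):
    """Among candidate shot-boundary indices into the program's feature
    trace, pick the split whose local mean profile is most similar to the
    commercial block's edge profile (highest negated absolute difference)."""
    x = np.asarray(program_trace, dtype=float)
    def local_sim(i, w=2):
        local = x[max(0, i - w):i + w].mean()
        return -abs(local - block_edge_profile)
    return max(candidates, key=local_sim)
```

This corresponds to inserting at the location of the highest similarity estimate rather than merely the first location exceeding a threshold.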
While the invention has been described in connection with preferred embodiments, it will be understood that modifications thereof within the principles outlined above will be evident to those skilled in the art, and thus the invention is not limited to the preferred embodiments but is intended to encompass such modifications. The invention resides in each and every novel characteristic feature and each and every combination of characteristic features. Reference numerals in the claims do not limit their protective scope. Use of the verb "to comprise" and its conjugations does not exclude the presence of elements other than those stated in the claims. Use of the article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
'Means', as will be apparent to a person skilled in the art, are meant to include any hardware (such as separate or integrated circuits or electronic elements) or software (such as programs or parts of programs) which perform in operation or are designed to perform a specified function, be it solely or in conjunction with other functions, be it in isolation or in co-operation with other elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the apparatus claim enumerating several means, several of these means can be embodied by one and the same item of hardware. 'Computer program product' is to be understood to mean any software product stored on a computer-readable medium, such as a floppy disk, downloadable via a network, such as the Internet, or marketable in any other manner.

Claims
1. A method for smoothing a transition between a first video segment and a second video segment, the method comprising the steps of:
- determining a first profile of content of a first video segment;
- determining a second profile of content of a second video segment; and
- inserting said first video segment within said second video segment at a location where said determined first profile is similar to said determined second profile to smooth the transition between said first video segment and said second video segment.
2. A method according to claim 1, wherein the step of inserting said first video segment within said second video segment comprises: inserting said first video segment within said second video segment at a location where similarity between said determined first profile and said determined second profile exceeds a similarity threshold.
3. A method according to claim 1, wherein the step of determining a first profile comprises the steps of:
- extracting at least one first feature from each frame of a plurality of frames of said first video segment;
- determining first or higher order statistical properties of said extracted first features; and
- generating the first profile of said determined statistical properties of said extracted first features.
4. A method according to claim 3, wherein the step of determining a second profile comprises the steps of:
- extracting at least one second feature from each frame of a plurality of frames of said second video segment, said at least one second feature corresponding to said at least one first feature;
- determining first or higher order statistical properties of said extracted second features; and
- generating the second profile of said determined statistical properties of said extracted second features.
5. A method according to claim 1, wherein the step of inserting said first video segment within said second video segment comprises the steps of:
- determining a plurality of similarity estimates from said generated first profile and said generated second profile for a plurality of extracted first and second features;
- combining said plurality of similarity estimates; and
- inserting said first video segment within said second video segment at a location where said combined similarity estimate is above a predetermined threshold.
6. A method according to claim 1, wherein the step of inserting said first video segment within said second video segment comprises the steps of:
- determining a plurality of similarity estimates from said generated first profile and said generated second profile for a plurality of corresponding portions of the first and second profiles;
- determining a highest similarity estimate of said plurality of similarity estimates; and
- inserting said first video segment within said second video segment at a location of said highest similarity estimate.
7. A method according to claim 1, wherein the method further comprises: inserting an insertion portion at the transition between said first video segment and said second video segment on the basis of any remaining difference between said first profile and said second profile to smooth the transition between said first video segment and said second video segment.
8. A computer program product comprising a plurality of program code portions for carrying out the method according to any one of the preceding claims.
9. Apparatus for smoothing a transition between a first video segment and a second video segment, the apparatus comprising:
- a first determining means for determining a first profile of content of a first video segment;
- a second determining means for determining a second profile of content of a second video segment; and
- a third determining means for determining a location for insertion of said first video segment within said second video segment where said determined first profile is similar to said determined second profile to smooth the transition between said first video segment and said second video segment.
10. Apparatus according to claim 9, wherein the apparatus further comprises editing means for inserting said first video segment within said second video segment at said determined location.
11. Apparatus according to claim 9, wherein the apparatus further comprises:
- extracting means for extracting at least one first feature from each frame of a plurality of frames of said first video segment and/or at least one second feature from each frame of a plurality of frames of said second video segment; and
- processing means for determining first or higher order statistical properties of said extracted first features and generating the first profile of said determined statistical properties of said extracted first features, and/or determining first or higher order statistical properties of said extracted second features and generating the second profile of said determined statistical properties of said second features.
PCT/IB2008/050296 2007-02-01 2008-01-28 Method and apparatus for smoothing a transition between a first video segment and a second video segment WO2008093277A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009547788A JP2010518672A (en) 2007-02-01 2008-01-28 Method and apparatus for smoothing a transition between a first video segment and a second video segment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07101558 2007-02-01
EP07101558.0 2007-02-01

Publications (2)

Publication Number Publication Date
WO2008093277A2 true WO2008093277A2 (en) 2008-08-07
WO2008093277A3 WO2008093277A3 (en) 2008-10-23

Family

ID=39563508

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/050296 WO2008093277A2 (en) 2007-02-01 2008-01-28 Method and apparatus for smoothing a transition between a first video segment and a second video segment

Country Status (3)

Country Link
JP (1) JP2010518672A (en)
CN (1) CN101601280A (en)
WO (1) WO2008093277A2 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5924127B2 (en) * 2012-05-24 2016-05-25 カシオ計算機株式会社 Movie generation apparatus, movie generation method, and program
EP4091332A1 (en) 2020-01-15 2022-11-23 Dolby International AB Adaptive streaming of media content with bitrate switching

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001030073A1 (en) * 1999-10-19 2001-04-26 Koninklijke Philips Electronics N.V. Television receiver and method of using same for displaying information messages


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018007951A1 (en) * 2016-07-07 2018-01-11 Corephotonics Ltd. Dual-camera system with improved video smooth transition by image blending
US10706518B2 (en) 2016-07-07 2020-07-07 Corephotonics Ltd. Dual camera system with improved video smooth transition by image blending

Also Published As

Publication number Publication date
WO2008093277A3 (en) 2008-10-23
CN101601280A (en) 2009-12-09
JP2010518672A (en) 2010-05-27


Legal Events

Code: WWE (Wipo information: entry into national phase); Ref document number: 200880003957.4; Country of ref document: CN
Code: 121 (Ep: the epo has been informed by wipo that ep was designated in this application); Ref document number: 08702541; Country of ref document: EP; Kind code of ref document: A2
Code: WWE (Wipo information: entry into national phase); Ref document number: 2008702541; Country of ref document: EP
Code: ENP (Entry into the national phase); Ref document number: 2009547788; Country of ref document: JP; Kind code of ref document: A
Code: NENP (Non-entry into the national phase); Ref country code: DE
Code: WWE (Wipo information: entry into national phase); Ref document number: 5006/CHENP/2009; Country of ref document: IN