WO2008093277A2 - Method and apparatus for smoothing a transition between a first video segment and a second video segment - Google Patents
- Publication number
- WO2008093277A2 PCT/IB2008/050296
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video segment
- profile
- determining
- features
- determined
- Prior art date
Links
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/268—Signal distribution or switching
Definitions
- the present invention relates to a method and apparatus for smoothing a transition between a first video segment and a second video segment.
- a method for smoothing a transition between a first video segment and a second video segment comprising the steps of: determining a first profile of content of a first video segment; determining a second profile of content of a second video segment; and inserting the first video segment within the second video segment at a location where the determined first profile is similar to the determined second profile to smooth the transition between the first video segment and the second video segment.
- apparatus for smoothing a transition between a first video segment and a second video segment comprising: first determining means for determining a first profile of content of a first video segment; second determining means for determining a second profile of content of a second video segment; and third determining means for determining a location for insertion of the first video segment within the second video segment where the determined first profile is similar to the determined second profile to smooth the transition between the first video segment and the second video segment.
- the system of the present invention effectively changes the insertion point for a block of commercials.
- individual commercials within the block may be rearranged, and further the audiovisual content at the boundaries between the individual commercials may be modified, as well as at the transitions from/to the adjoining non-commercial content.
- the audiovisual content at the boundaries between the individual commercials, as well as at the transitions from/to the adjoining non-commercial content can also be modified when the commercials are inserted at fixed locations, e.g. when a content creator has determined a fixed moment for inserting commercials.
- the system can be verified by using known methods and strategies to detect the commercials once edited; if the commercials are still detectable, the material is edited again by feeding it back through the system.
- the content of the commercial break is profiled, and based on this profile the choice of where to insert the commercial break can be made.
- this profile can indicate the type of location at which the commercials can be inserted (for example, only after the end of a scene) and the general location within the content (for example, between 15 and 20 minutes into the content).
- the optimum location for commercial insertion can be chosen to minimize the difference between the commercials and the enclosing content and hence provide the desired smooth transition.
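As an illustration only (the function name and the reduction of a profile to a single comparable value are assumptions, not taken from the patent), choosing the insertion point that minimizes the difference between the commercial-break profile and the surrounding content profile could be sketched as:

```python
def best_insertion_point(content_profiles, break_profile, candidates, dist):
    """Among candidate insertion points (e.g. indices just after scene
    ends), pick the one whose local content profile is closest to the
    profile of the commercial break, minimizing the transition jump."""
    return min(candidates, key=lambda i: dist(content_profiles[i], break_profile))
```

Here `dist` is any profile-difference metric; with real content, the profiles would be the multi-feature statistics described later in the text.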
- the individual commercials within the block can be rearranged.
- the choice regarding the order in which commercials should be put next to each other and towards the boundaries with non-commercial content is determined on the basis of the respective profiles. This can be used to smooth out the typically high audiovisual variation inside the commercial block. In fact, it is the pattern of frequent and abrupt interruptions of multiple audiovisual features within a relatively short period of time (several minutes) which is particularly disruptive and annoying for the viewer.
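One plausible reading of this rearrangement, sketched here with hypothetical names and scalar profiles as a simplification, is a greedy ordering that keeps adjacent profiles as similar as possible:

```python
def order_commercials(profiles, start_profile, dist):
    """Greedily order commercials so that each next commercial's profile
    is the closest to the previously placed one, starting from the
    profile of the adjoining non-commercial content."""
    remaining = dict(enumerate(profiles))
    order, current = [], start_profile
    while remaining:
        idx = min(remaining, key=lambda i: dist(remaining[i], current))
        order.append(idx)
        current = remaining.pop(idx)
    return order
```

The greedy choice is only one possible strategy; an exhaustive search over orderings would be feasible for the small number of commercials in a typical block.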
- the audiovisual content at the boundaries between the commercials may be modified, as well as at the transitions from/to the adjoining non-commercial content. It is known that gradual transitions between visual (camera) shots, e.g. cross-fades and dissolves, are less disruptive and more difficult to detect than abrupt cuts, for example, as disclosed in Ying Li, C.-C. Jay Kuo, "Video Content Analysis Using Multimodal Information", 2003, Kluwer Academic Publishers Group, ISBN 1-4020-7490-5. Thus, by providing such gradual transitions at the boundaries between commercials, a transition is further smoothed, and detection of a high rate of visual shot-cuts or audiovisual super-separators is effectively disturbed. Similar effects could also be created in audio, where the insertion of non-audible noise can also be useful.
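A cross-fade of the kind mentioned here can be sketched as a linear blend over overlapping frames; the frame representation and function name below are assumptions made for illustration, not part of the patent:

```python
def cross_fade(tail_frames, head_frames):
    """Blend the last frames of one segment into the first frames of the
    next, replacing an abrupt cut with a dissolve. Each frame is a flat
    list of pixel intensities; both sequences have equal length."""
    n = len(tail_frames)
    blended = []
    for k in range(n):
        alpha = (k + 1) / (n + 1)  # fade weight ramps towards the new segment
        blended.append([(1 - alpha) * a + alpha * b
                        for a, b in zip(tail_frames[k], head_frames[k])])
    return blended
```

A dissolve like this spreads the shot change over many frames, which is exactly what makes it harder for a cut detector to localize.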
- Fig. 1 is a simplified schematic diagram of apparatus according to an embodiment of the present invention
- Figs. 2 (a) to (d) illustrate a first example of low-level feature statistics in a feature movie, including commercial blocks
- Figs. 3 (a) to (d) illustrate a second example of low-level feature statistics in a feature movie, including commercial blocks.
- the apparatus 100 comprises an input terminal 101 for receiving a multimedia data stream.
- the input terminal 101 is connected to the input of a multimedia (audiovisual) content analyzer 103 and the input of a video editor 105.
- the output of the video editor 105 is connected to an output terminal 107 of the apparatus 100.
- the output of the multimedia content analyzer 103 is connected to the control 109 of the video editor 105.
- the output of the video editor 105 is also connected in a feedback loop to the control 109 of the video editor 105 via a reference detector 111.
- a multimedia data stream comprising a second video segment S1,...,SN and a first video segment C1,...,CM is input to the input terminal 101 of the apparatus 100.
- the first and second segments are shown input separately on the input terminal 101.
- the first video segment consists of a plurality of individual commercials C1,...,CM or segments of informative data, for example.
- the second video segment consists of non-commercial video content broadcast as a contiguous sequence of (groups of) visual shots, S1,...,SN.
- the first and/or second video segments include both the visual and the corresponding audio data.
- the input multimedia data stream is analyzed by the multimedia content analyzer 103.
- the first and second video segments C1,...,CM and S1,...,SN are first identified, e.g. based on audiovisual inspection done by a human, and possibly labeled (by means of associating metadata) for easier indexing and access. Then, features that can be characteristic of the behavior of the first and second video segments are extracted.
- audiovisual features, as well as methods for their extraction, are known to the skilled person.
- Presence of humans in a scene can be established by means of face or speech recognition
- Object or speaker tracking including the detection of speaker change
- Mood of the content e.g. derived from the mood of music or by analysis of speech prosody
- Audio composition of the scene (voice, music, voice + music, voice + background noise, etc.)
- Localization of visual (camera) shot-cuts
- Dominant color e.g. the color of the largest clusters in a color space
- Level and gradient of visual activity e.g. derived from the statistics of coding parameters such as motion vectors.
- Level and gradient of scene complexity e.g. derived from the statistics of coding parameters such as the product of coding bit rate and the quantization parameter.
- Audio (temporal and spectral properties):
  - volume, e.g. of a speaker
  - tempo, e.g. of a speaker
- the extracted features are then processed by the analyzer 103 to generate content profiles to control the video editor 105.
- Content profiling is the estimation of content similarity based on the extracted features. These profiles are generated by the different methods described below.
- a profile may, typically, be composed of feature statistics, for example the mean and standard deviation computed for each feature over a number of consecutive video frames (the analysis window). For the high-level features, the standard deviation would probably be most meaningful, while other measures suitable for binary signals are also conceivable.
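The per-feature statistics described here (mean and standard deviation over a sliding analysis window) could be computed as follows; the function name and edge handling are illustrative assumptions:

```python
from statistics import mean, stdev

def profile(feature_values, window):
    """Sliding-window profile of one per-frame feature: at each frame
    position, the mean and standard deviation over a window of
    consecutive frames centered at that position."""
    half = window // 2
    stats = []
    for i in range(len(feature_values)):
        w = feature_values[max(0, i - half):i + half + 1]
        stats.append((mean(w), stdev(w) if len(w) > 1 else 0.0))
    return stats
```

In the examples of Figs. 2 and 3 the window spans 3500 frames; the sketch above works with any window size.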
- each candidate feature is considered separately, and the results obtained from different features are combined to form a final decision. Accordingly, single-feature profiles are created for the content of a first video segment and a second video segment. These single-feature profiles are compared to yield a similarity estimate.
- the estimates can be obtained by, for example, computing a metric, such as a distance: the smaller the distance, the greater the similarity.
- the multiple estimates are then combined into a single decision using well-known techniques, such as majority voting, linear decision models with weighting, fuzzy logic, Markov Models, etc.
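Two of the combination techniques named above, majority voting and a linear decision model with weighting, reduce to a few lines each; the helper names below are assumptions for illustration:

```python
def majority_vote(decisions):
    """Combine per-feature boolean decisions ("similar here?") by
    simple majority."""
    return sum(decisions) > len(decisions) / 2

def weighted_score(similarities, weights):
    """Linear decision model: a normalized weighted sum of the
    per-feature similarity estimates."""
    return sum(s * w for s, w in zip(similarities, weights)) / sum(weights)
```

The more elaborate alternatives the text mentions (fuzzy logic, Markov models) would replace these combiners without changing the surrounding pipeline.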
- a composite profile is obtained from a conjunction of lower-level features: a multi-dimensional feature vector containing the statistics (as described above) of each feature is obtained.
- the similarity between such feature-vectors extracted from different content items is then measured using techniques known from the field of statistical pattern classification, for example, as disclosed by Richard O. Duda, Peter E. Hart, David G. Stork, "Pattern Classification", 2001 by John Wiley & Sons, ISBN 0-471-05669-3, for instance data clustering. This may be achieved by using techniques such as supervised learning and neural networks.
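For composite profiles, the distance between two multi-dimensional feature vectors ("the smaller the distance, the greater the similarity", per the text above) might simply be Euclidean; this minimal sketch assumes equal-length numeric vectors and is not the patent's prescribed metric:

```python
import math

def feature_distance(v1, v2):
    """Euclidean distance between two composite feature vectors; the
    smaller the distance, the greater the estimated similarity."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
```

In practice the features would first be normalized so that no single dimension dominates the distance.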
- combining higher-level features in multidimensional feature vectors to determine a measure of similarity might not be adequate or even feasible (for example it may be difficult to quantize high-level features such as speaker tracking or mood of the content).
- the features are evaluated separately, for instance by applying heuristics to obtain similarity measures after which they may be combined using the techniques described above.
- the concept of content profiling according to the embodiments above is further explained with reference to Figs. 2 (a) to (d) and 3 (a) to (d), which illustrate two examples of statistics of such features: one for a feature movie and one for a sequence of animated cartoons.
- Figs. 2(a) and 3(a) represent ground truth (manual annotation) in which 1 corresponds to commercials, 0 to non-commercial content.
- Figs. 2(b) and 3(b) represent the standard deviation of the average luma of one video frame of the example. At each frame position, this is computed over an analysis window of 3500 video frames (≈2.5 minutes of PAL video), centered at that position.
- Figs. 2(c) and 3(c) represent the probability of speech for the same sample of Figs. 2(b) and 3(b), respectively.
- Figs. 2(d) and 3(d) represent the probability of music for the same sample of Figs. 2(b), 2(c), 3(b) and 3(c), respectively.
- the original data has been sub-sampled by a factor of 2.
- the output of the analysis above is input into the control 109 of the video editor 105 for recommending to the user (broadcaster) a certain editing action, or to perform an appropriate editing action automatically.
- a possible result of such editing is shown in Fig. 1.
- a commercial block C3C1 is composed and inserted between shot groups Sj and Sj+1. This may be because the terminating part of Sj was found to most closely resemble the starting part of C3 and the starting part of Sj+1 the terminating part of C1, or else because high similarity was observed at the transition from C3 to C1.
- a smooth transition is observed between the non-commercial portion and a commercial, which makes for more pleasant viewing and also helps to prevent automatic commercial block detection by PVRs.
- the segments 'T', which consist of extra content that may arise between different segments due to cross-fading, the insertion of silences, etc., as is well known in the art, may be inserted.
- the editing operations can also be performed with compressed video (i.e. after encoding), which is common in professional video production. Also, it is conceivable that the reference detector 111 and the multimedia (audiovisual) content analyzer 103 could overlap, as they both may incorporate a number of the same operations.
- the edited data stream output from the editor 105 is fed back to the control 109 of the editor 105 to make adjustments to the editor 105 via the reference detector 111.
- the reference detector 111 comprises a known commercial block detector which seeks transitions between non-commercial portions and commercial blocks in order to distinguish between the commercial and non-commercial portions. If the transition created by the editor 105 is not smooth, this will be detected by the reference detector 111 and fed to the control 109 to adjust operation of the editor 105 to improve smoothing of the transition between the different video segments.
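The feedback loop through the reference detector 111 could be sketched as follows, with `edit` and `detect` as stand-ins (assumed for illustration, not from the patent) for the editor 105 and the detector 111:

```python
def smooth_until_undetected(stream, edit, detect, max_passes=5):
    """Re-edit the stream until the reference commercial-block detector
    no longer flags it, or until a pass limit is reached."""
    for _ in range(max_passes):
        if not detect(stream):
            return stream  # transition is smooth enough to evade detection
        stream = edit(stream)
    return stream
```

The pass limit guards against content where no amount of re-editing fools the detector; in the apparatus this corresponds to the control 109 accepting a best-effort result.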
- the edited data stream is then placed on the output terminal 107 of the apparatus 100.
- 'Means' as will be apparent to a person skilled in the art, are meant to include any hardware (such as separate or integrated circuits or electronic elements) or software (such as programs or parts of programs) which perform in operation or are designed to perform a specified function, be it solely or in conjunction with other functions, be it in isolation or in co-operation with other elements.
- the invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the apparatus claim enumerating several means, several of these means can be embodied by one and the same item of hardware.
- 'Computer program product' is to be understood to mean any software product stored on a computer-readable medium, such as a floppy disk, downloadable via a network, such as the Internet, or marketable in any other manner.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Signal Processing For Recording (AREA)
- Apparatus For Radiation Diagnosis (AREA)
- Image Processing (AREA)
- Picture Signal Circuits (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Claims
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009547788A JP2010518672A (en) | 2007-02-01 | 2008-01-28 | Method and apparatus for smoothing a transition between a first video segment and a second video segment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07101558 | 2007-02-01 | ||
EP07101558.0 | 2007-02-01 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2008093277A2 true WO2008093277A2 (en) | 2008-08-07 |
WO2008093277A3 WO2008093277A3 (en) | 2008-10-23 |
Family
ID=39563508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2008/050296 WO2008093277A2 (en) | 2007-02-01 | 2008-01-28 | Method and apparatus for smoothing a transition between a first video segment and a second video segment |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP2010518672A (en) |
CN (1) | CN101601280A (en) |
WO (1) | WO2008093277A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018007951A1 (en) * | 2016-07-07 | 2018-01-11 | Corephotonics Ltd. | Dual-camera system with improved video smooth transition by image blending |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5924127B2 (en) * | 2012-05-24 | 2016-05-25 | カシオ計算機株式会社 | Movie generation apparatus, movie generation method, and program |
EP4091332A1 (en) | 2020-01-15 | 2022-11-23 | Dolby International AB | Adaptive streaming of media content with bitrate switching |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001030073A1 (en) * | 1999-10-19 | 2001-04-26 | Koninklijke Philips Electronics N.V. | Television receiver and method of using same for displaying information messages |
2008
- 2008-01-28 CN CNA2008800039574A patent/CN101601280A/en active Pending
- 2008-01-28 JP JP2009547788A patent/JP2010518672A/en not_active Withdrawn
- 2008-01-28 WO PCT/IB2008/050296 patent/WO2008093277A2/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001030073A1 (en) * | 1999-10-19 | 2001-04-26 | Koninklijke Philips Electronics N.V. | Television receiver and method of using same for displaying information messages |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018007951A1 (en) * | 2016-07-07 | 2018-01-11 | Corephotonics Ltd. | Dual-camera system with improved video smooth transition by image blending |
US10706518B2 (en) | 2016-07-07 | 2020-07-07 | Corephotonics Ltd. | Dual camera system with improved video smooth transition by image blending |
Also Published As
Publication number | Publication date |
---|---|
WO2008093277A3 (en) | 2008-10-23 |
CN101601280A (en) | 2009-12-09 |
JP2010518672A (en) | 2010-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Brezeale et al. | Automatic video classification: A survey of the literature | |
US6469749B1 (en) | Automatic signature-based spotting, learning and extracting of commercials and other video content | |
US7526181B2 (en) | System and method for automatically customizing a buffered media stream | |
JP4699476B2 (en) | Video summarization device | |
US7327885B2 (en) | Method for detecting short term unusual events in videos | |
US20050180730A1 (en) | Method, medium, and apparatus for summarizing a plurality of frames | |
KR101341808B1 (en) | Video summary method and system using visual features in the video | |
US7149365B2 (en) | Image information summary apparatus, image information summary method and image information summary processing program | |
US20030123850A1 (en) | Intelligent news video browsing system and method thereof | |
JP4332700B2 (en) | Method and apparatus for segmenting and indexing television programs using multimedia cues | |
JP2003101939A (en) | Apparatus, method, and program for summarizing video information | |
KR20020035153A (en) | System and method for automated classification of text by time slicing | |
US20100259688A1 (en) | method of determining a starting point of a semantic unit in an audiovisual signal | |
Brezeale et al. | Using closed captions and visual features to classify movies by genre | |
US20080256576A1 (en) | Method and Apparatus for Detecting Content Item Boundaries | |
WO2008093277A2 (en) | Method and apparatus for smoothing a transition between a first video segment and a second video segment | |
Yang et al. | Key frame extraction using unsupervised clustering based on a statistical model | |
Huang et al. | A film classifier based on low-level visual features | |
CN101355673B (en) | Information processing device, information processing method | |
JP5257356B2 (en) | Content division position determination device, content viewing control device, and program | |
Huang et al. | Movie classification using visual effect features | |
JP5254900B2 (en) | Video reconstruction method, video reconstruction device, and video reconstruction program | |
Barbieri et al. | Movie-in-a-minute: automatically generated video previews | |
Kyperountas et al. | Audio PCA in a novel multimedia scheme for scene change detection | |
Khan et al. | Unsupervised commercials identification in videos |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 200880003957.4 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08702541 Country of ref document: EP Kind code of ref document: A2 |
WWE | Wipo information: entry into national phase |
Ref document number: 2008702541 Country of ref document: EP |
ENP | Entry into the national phase in: |
Ref document number: 2009547788 Country of ref document: JP Kind code of ref document: A |
NENP | Non-entry into the national phase in: |
Ref country code: DE |
WWE | Wipo information: entry into national phase |
Ref document number: 5006/CHENP/2009 Country of ref document: IN |