WO2005117431A1 - Method for synchronising video and audio data - Google Patents
Method for synchronising video and audio data
- Publication number
- WO2005117431A1 (PCT/AU2005/000747; AU2005000747W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- stream
- audio
- frame
- time
- Prior art date: 2004-05-26
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/414—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
- H04N21/4143—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a Personal Computer [PC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4305—Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/804—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
- H04N9/8042—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/426—Internal components of the client ; Characteristics thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/44—Receiver circuitry for the reception of television signals according to analogue transmission standards
- H04N5/60—Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals
- H04N5/602—Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals for digital sound signals
Definitions
- The present invention concerns a method for synchronising audio and video media streams, in particular to provide to a user the experience of seamless audio playback and smoothly synchronised video playback.
- The invention relates to the field of data processing, for processing a stream of data including audio and video data comprised in a sequence of frames.
- A new architecture and method of operation are described below.
- The MPEG standard (from the Moving Picture Experts Group (MPEG)) is a well-established standard for audio and video compression and decompression algorithms, for use in the digital transmission and receipt of audio and video broadcasts. It provides for the efficient compression of data according to an established psychoacoustic model, to enable real-time transmission, decompression and broadcast of high-quality sound and video images.
- Other standards have also been established for the encoding and decoding of audio and video data transmitted in digital format, such as data for digital television systems.
- Compression standards are based on the psycho-acoustics of human perception. Generally, video and audio need to match to an accuracy of not much worse than 1/20 of a second to be acceptable to the viewer. Accuracy worse than 1/10 of a second is usually noticeable by the viewer, and accuracy worse than 1/5 of a second is almost always noticeable.
- Maintaining synchronisation between video and audio data is a straightforward matter if the streams are integrated and played using a single video/audio source. This is not the case for digital video, where the audio data and the video data are separated and independently decoded, processed and played. Furthermore, computer users may need to view digital video while performing some other task or function within the computer, such as sending or receiving information from a computer network. This is quite possible in a multitasking computing environment, but can introduce significant multimedia synchronisation problems between the audio and the video data.
- Audio hardware does not generally support simple alterations in the audio rate, and in any case varying the audio rate produces a result generally unpleasant to the viewer, such as wavering alterations in pitch, deterioration in speech, etc.
- The audio data is therefore generally taken as providing the standard of player time, and the video is made to keep pace with it.
- A further approach is simply to increase the performance level of the hardware, to ensure that the intensive computing requirements are met, and synchronisation of the audio and video can therefore be maintained.
- The system, however, has no control over the processing power (or over the simultaneous competing needs) of individual machines. It is therefore important that the synchronisation processes are as performance-tolerant as possible.
- United States Patent No. 6,310,652 to Li et al. discusses a synchronisation method in which a 'presentation time' of data frames is continuously compared with a 'reference time' calculated by the playing device. Subsequent frames, or portions thereof, are then either dropped or repeated depending on whether the presentation time is earlier or later than the calculated reference time.
- This solution is less than ideal, in that it not only requires specialised hardware to calculate the reference time, but also involves dropping or repeating both audio and video frames, resulting in an unsatisfactory user experience.
- United States Patent No. 6,272,776 to Griffiths discusses playing the video data ahead of the corresponding audio data in order to maintain synchronisation.
- The 'initial due time' of the video data is first determined; this is typically the time-stamped initial start time for the video and audio data, indicating when the video and audio data should be played.
- An 'offset time' is then applied to the video due time, which adjusts when the video data should be played relative to the corresponding audio data and produces an adjusted video due time earlier than the initial video due time.
- The particular value of the offset - and hence the amount of time by which the video data is played ahead of the audio data - may be varied depending on how early or late a frame of video data is relative to the corresponding audio data. Variations in the offset may also be made to account for an increase in available processing power, which allows a smaller offset to be applied.
- The method is said to be advantageous in that it allows video to be played ahead of the audio in order to 'build in a margin' for any future late frames while degrading the video as little as possible.
- However, the method attempts to jump to the exact point of synchronisation between the audio and video data upon each detection of an early or late video frame. Typically this results in a blurred or jerky image, in a similar manner to when video frames are dropped or paused in order to achieve synchronisation.
- The invention provides a method for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising the steps of: calculating the audio time in accordance with the time elapsed since the start of the audio data stream; determining at a certain point in time the offset of the video stream from the audio stream; adjusting, if an offset is detected, the frame delivery rate by a prescribed amount; and repeating the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a prescribed amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.
- The audio is synchronised to the system clock, and that synchronisation produces an offset.
- This offset is relative to the point in time when the media started playing. It should be noted that there is considerable jitter in the audio synchronisation offset, and this is dependent on the hardware, the load on the computer at the time, and many other factors.
- Video frames are displayed at a certain, adjustable rate, that rate being trimmed (adjusted by a small amount) in accordance with the apparent time difference between the audio and video.
- The invention also provides a computer software product for playing a multimedia digital data stream comprising audio data and video data, the latter displayed to a user in a sequence of frames, in order to provide synchronisation between the streams, comprising computer program code which, when executed: calculates the audio time in accordance with the time elapsed since the start of the audio data stream; determines at a certain point in time the offset of the video stream from the audio stream; adjusts, if an offset is detected, the frame delivery rate by a prescribed amount; and repeats the above steps at successive points in time, to constantly adjust the frame delivery rate by no more than a maximum amount for each successive frame, so as to constantly trim the video stream display to enhance synchronisation with the audio stream.
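The following is a minimal, illustrative sketch (in Python) of such a loop; it is not the patented implementation, and the helper callables `audio_elapsed_ms` and `display_frame`, as well as the choice of trimming the wall-clock frame interval, are assumptions introduced purely for illustration.

```python
import time

MAX_TRIM_MS = 2.0        # bound on the per-frame adjustment (the "prescribed amount")
CHECK_INTERVAL_S = 1.0   # the audio/video offset is re-evaluated about once per second

def play_synchronised(frame_count, fps, audio_elapsed_ms, display_frame):
    """Loose, audio-led synchronisation: the audio clock leads and the video trims to it.

    audio_elapsed_ms: callable returning audio time elapsed since playback start, in ms
    display_frame:    callable that renders frame number n
    """
    frame_interval_ms = 1000.0 / fps
    trim_ms = 0.0
    next_check = time.monotonic() + CHECK_INTERVAL_S

    for frame_no in range(frame_count):
        display_frame(frame_no)

        now = time.monotonic()
        if now >= next_check:
            video_time_ms = (frame_no + 1) * frame_interval_ms
            offset_ms = audio_elapsed_ms() - video_time_ms   # positive: audio ahead of video
            # nudge the frame delivery rate towards synchronisation,
            # never by more than the prescribed maximum per frame
            trim_ms = max(-MAX_TRIM_MS, min(MAX_TRIM_MS, offset_ms))
            next_check = now + CHECK_INTERVAL_S

        # wait roughly one frame interval, shortened or lengthened by the small trim
        time.sleep(max(0.0, (frame_interval_ms - trim_ms) / 1000.0))
```

The essential property illustrated is that the per-frame adjustment is bounded, so the video is nudged towards the audio rather than jumping to the exact synchronisation point.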
- The present invention may be practised on any suitable computing device with the necessary hardware and software resources for decoding and playing digital audio and video data streams.
- Suitable computing devices include personal computers (PCs), hand-held devices, multiprocessor systems, mobile telephone handsets, DVD players, and terrestrial, satellite or cable digital television set-top boxes.
- The data to be played may be provided as streamed data, or may be stored for playback in any suitable form.
- The audio playback is synchronised to the system clock of the particular device, and this is the only variable that is considered an absolute reference for the purposes of the technique of the invention.
- The system clock measures time in milliseconds. When the audio stream is started, the system clock time is recorded. A calculation is then performed to determine how much audio time has elapsed.
- Some media playback devices, such as those implemented on the Mac OS, provide this information directly, in the form of an actual time value.
- Other devices, such as those utilising the DirectX Application Programming Interfaces, only provide the position of a playback pointer in an audio buffer, rather than an actual elapsed time value.
- Because 'ring buffers' are often used, it is necessary to keep track of the number of buffers of data that have been used, along with the sample rate of the media, in order to calculate how much audio time has elapsed.
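As an illustration of this calculation (not taken from the patent or from any specific API), the elapsed audio time can be reconstructed from the number of buffers consumed and the position of the playback pointer; all parameter names below are hypothetical:

```python
def audio_time_elapsed_ms(buffers_completed, play_cursor_bytes,
                          buffer_size_bytes, sample_rate_hz,
                          bytes_per_sample, channels):
    """Estimate elapsed audio time from ring-buffer progress.

    buffers_completed: number of times the ring buffer has been fully consumed
    play_cursor_bytes: current position of the playback pointer within the buffer
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    total_bytes_played = buffers_completed * buffer_size_bytes + play_cursor_bytes
    return 1000.0 * total_bytes_played / bytes_per_second

# Example: 16-bit stereo at 44.1 kHz, two full 64 KiB buffers plus 32 KiB into the third
elapsed_ms = audio_time_elapsed_ms(2, 32768, 65536, 44100, 2, 2)
```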
- The audio time is considered the 'lead', and the video attempts to loosely synchronise to this time.
- A timer event exists which prompts the video to display a frame. This prompting is oversampled, and is often ignored. In the current embodiment of the device it is set to 100 prompts per second, and therefore 3 in 4 prompts are ignored when displaying 25 frames per second media. This setting is arbitrary; the trade-off is the amount of CPU overhead used against the smoothness of playback.
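A sketch of such an oversampled prompt handler is shown below; `display_frame` and the closure-based bookkeeping are illustrative assumptions rather than the actual embodiment:

```python
PROMPTS_PER_SECOND = 100   # arbitrary; trades CPU overhead against smoothness of playback

def make_prompt_handler(fps, display_frame):
    """Return a handler intended to be called PROMPTS_PER_SECOND times per second."""
    frame_interval = 1.0 / fps
    state = {"next_due": 0.0, "frame_no": 0}

    def on_timer_prompt(now):
        # Most prompts are ignored: a frame is only shown when one is actually due.
        if now >= state["next_due"]:
            display_frame(state["frame_no"])
            state["frame_no"] += 1
            state["next_due"] = now + frame_interval

    return on_timer_prompt
```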
- The video time is calculated from the same base offset as the audio time.
- The actual video time is calculated as the time elapsed since the last frame was displayed, plus that frame's number multiplied by the interval between frames (the reciprocal of the frame rate).
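Expressed as code, and using assumed variable names, the video time at any instant might be computed as follows:

```python
def video_time_ms(last_frame_number, last_frame_shown_at_ms, now_ms, fps):
    """Video time = nominal time of the last displayed frame + time since it was shown."""
    frame_interval_ms = 1000.0 / fps
    return last_frame_number * frame_interval_ms + (now_ms - last_frame_shown_at_ms)
```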
- The determination of whether the audio time is 'in front of' or 'behind' the video time occurs at a frequency of around once per second. This is sufficiently frequent to afford a smooth and constant effective synchronisation between the audio and video streams.
- If the audio and the video become excessively out of synchronism (in accordance with prescribed criteria; currently 200 ms is considered excessively out of synchronism), the following considerations come into play. If the audio is excessively ahead of the video, one or more entire frames are omitted ('dropped') to enable the video to catch up with the audio; as many frames are discarded as are required to catch up. If the video time is well ahead of the audio time, then the video is stalled until the audio catches up.
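A sketch of this coarse correction, assuming the 200 ms criterion above and hypothetical `drop_frame` and `stall_until_audio_reaches` helpers, might read:

```python
EXCESS_OFFSET_MS = 200.0   # prescribed criterion for being excessively out of synchronism

def coarse_resync(audio_time_ms, video_time_ms, frame_interval_ms,
                  drop_frame, stall_until_audio_reaches):
    """Apply the drastic corrections used only when the offset is excessive."""
    offset_ms = audio_time_ms - video_time_ms
    if offset_ms > EXCESS_OFFSET_MS:
        # audio far ahead: discard as many whole frames as are required to catch up
        frames_to_drop = int(offset_ms // frame_interval_ms)
        for _ in range(frames_to_drop):
            drop_frame()
    elif offset_ms < -EXCESS_OFFSET_MS:
        # video far ahead: hold the current frame until the audio catches up
        stall_until_audio_reaches(video_time_ms)
```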
- The accompanying Figure 1 diagrammatically illustrates the method of the invention.
- The horizontal time axis represents the time t elapsed from commencement of the multimedia data playback, as measured by the system clock.
- The upper trace shows the audio data stream, the audio played time APT representing the synchronisation point we are aiming for.
- The lower trace shows the video data stream, and the latest frame to be played (LFP) is shown in the figure as trailing the synchronisation point objective.
- The next frame is therefore scheduled for display at the 'apparent' time of 1/fps later, but with a +2 ms deviation, to trim it towards synchronisation with the audio data stream.
- If the video data stream is instead ahead of the audio data stream, then the next frame is scheduled for display 1/fps later, but with a -2 ms deviation. If the latest frame to be played occurred more than a prescribed time interval ta before the synchronisation point (LFP < -ta), then one or more frames are omitted. If the latest frame to be played is timed to display more than a prescribed time interval ta after the synchronisation point (LFP > ta), then the video is held for the audio to catch up.
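Collecting the cases described for Figure 1, the decision for the next frame might be sketched as follows; the sign convention for the ±2 ms 'apparent time' deviation follows the description above, and the assumption that ta equals the 200 ms criterion, together with all function and parameter names, is illustrative only:

```python
TRIM_MS = 2.0    # prescribed per-frame deviation
T_A_MS = 200.0   # prescribed interval t_a (assumed here to equal the 200 ms criterion)

def next_frame_action(lfp_ms, apt_ms, fps):
    """Decide how the next frame is handled, following the cases of Figure 1.

    lfp_ms: time of the latest frame to be played (video clock)
    apt_ms: audio played time, i.e. the synchronisation point
    Returns an (action, value) pair.
    """
    frame_interval_ms = 1000.0 / fps
    if lfp_ms < apt_ms - T_A_MS:
        # LFP more than t_a before the synchronisation point: omit one or more frames
        return ("omit_frames", int((apt_ms - lfp_ms) // frame_interval_ms))
    if lfp_ms > apt_ms + T_A_MS:
        # LFP more than t_a after the synchronisation point: hold for the audio to catch up
        return ("hold", lfp_ms - apt_ms)
    if lfp_ms < apt_ms:
        # video trailing: next frame at the apparent time 1/fps later, with a +2 ms deviation
        return ("schedule", +TRIM_MS)
    # video ahead: next frame at the apparent time 1/fps later, with a -2 ms deviation
    return ("schedule", -TRIM_MS)
```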
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2004902811 | 2004-05-26 | ||
AU2004902811A AU2004902811A0 (en) | 2004-05-26 | Method for synchronising video and audio data |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005117431A1 (en) | 2005-12-08 |
Family
ID=35451272
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/AU2005/000747 WO2005117431A1 (en) | 2004-05-26 | 2005-05-26 | Method for synchronising video and audio data |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2005117431A1 (en) |
- 2005-05-26: PCT/AU2005/000747 filed as WO2005117431A1 (active, Application Filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5617502A (en) * | 1996-03-22 | 1997-04-01 | Cirrus Logic, Inc. | System and method synchronizing audio and video digital data signals during playback |
US6452974B1 (en) * | 1998-01-02 | 2002-09-17 | Intel Corporation | Synchronization of related audio and video streams |
US6337883B1 (en) * | 1998-06-10 | 2002-01-08 | Nec Corporation | Method and apparatus for synchronously reproducing audio data and video data |
US20030058224A1 (en) * | 2001-09-18 | 2003-03-27 | Chikara Ushimaru | Moving image playback apparatus, moving image playback method, and audio playback apparatus |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101119461B (en) * | 2006-08-02 | 2010-05-12 | 广达电脑股份有限公司 | System and method for maintaining video frame and audio frame synchronous broadcasting |
US8126309B2 (en) * | 2007-02-19 | 2012-02-28 | Kabushiki Kaisha Toshiba | Video playback apparatus and method |
EP2076052A3 (en) * | 2007-12-28 | 2009-07-29 | Intel Corporation | Synchronizing audio and video frames |
US9571901B2 (en) | 2007-12-28 | 2017-02-14 | Intel Corporation | Synchronizing audio and video frames |
CN112637488A (en) * | 2020-12-17 | 2021-04-09 | 深圳市普汇智联科技有限公司 | Edge fusion method and device for audio and video synchronous playing system |
CN112637488B (en) * | 2020-12-17 | 2022-02-22 | 深圳市普汇智联科技有限公司 | Edge fusion method and device for audio and video synchronous playing system |
CN112714353A (en) * | 2020-12-28 | 2021-04-27 | 杭州电子科技大学 | Distributed synchronization method for multimedia stream |
CN112714353B (en) * | 2020-12-28 | 2022-08-30 | 杭州电子科技大学 | Distributed synchronization method for multimedia stream |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109714634B (en) | Decoding synchronization method, device and equipment for live data stream | |
US8111327B2 (en) | Method and apparatus for audio/video synchronization | |
US10930318B2 (en) | Gapless video looping | |
KR102536652B1 (en) | Dynamic reduction of alternative content playback to support aligning the end of the alternative content with the end of the substitute content. | |
EP1684516B1 (en) | Software-based audio rendering | |
US20070217505A1 (en) | Adaptive Decoding Of Video Data | |
CN113225598B (en) | Method, device and equipment for synchronizing audio and video of mobile terminal and storage medium | |
US20070019931A1 (en) | Systems and methods for re-synchronizing video and audio data | |
CN109963184A (en) | A kind of method, apparatus and electronic equipment of audio-video network broadcasting | |
US10638180B1 (en) | Media timeline management | |
KR102469142B1 (en) | Dynamic playback of transition frames while transitioning between media stream playbacks | |
CN106470352B (en) | Live channel playing method, device and system | |
US8279344B2 (en) | Synchronization of video presentation by video cadence modification | |
CN108810656B (en) | Real-time live broadcast TS (transport stream) jitter removal processing method and processing system | |
US20180367827A1 (en) | Player client terminal, system, and method for implementing live video synchronization | |
US10148722B2 (en) | Methods and nodes for synchronized streaming of a first and a second data stream | |
CN108259964B (en) | Video playing rate adjusting method and system | |
CN101119461B (en) | System and method for maintaining video frame and audio frame synchronous broadcasting | |
WO2005117431A1 (en) | Method for synchronising video and audio data | |
JP2020522193A (en) | Temporal placement of rebuffering events | |
CN113766261A (en) | Method and device for determining pre-pulling duration, electronic equipment and storage medium | |
US8848803B2 (en) | Information processing device and method, and program | |
JP3906712B2 (en) | Data stream processing device | |
US11283852B2 (en) | Methods and nodes for synchronized streaming of a first and a second data stream | |
CN113271496B (en) | Video smooth playing method and system in network live broadcast and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC - FORM EPO 1205A DATED 30-03-2007 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 05742122 Country of ref document: EP Kind code of ref document: A1 |