WO2005098854A1 - Audio reproduction device, audio reproduction method, and program - Google Patents
Audio reproduction device, audio reproduction method, and program
- Publication number
- WO2005098854A1 (PCT/JP2005/006685; application JP2005006685W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- video
- reproduction
- signal
- time information
- Prior art date
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/102—Programmed access in sequence to addressed parts of tracks of operating record carriers
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4305—Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/007—Two-channel systems in which the audio signals are in digital form
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10592—Audio or video recording specifically adapted for recording or reproducing multichannel signals
Definitions
- the present invention relates to an audio reproducing apparatus, an audio reproducing method, and a program.
- the present invention relates to an audio reproducing device for reproducing a compression-coded digital audio signal.
- MPEG is a well-known standard for compression-encoding audio signals and video signals into a digital signal and subsequently decoding them.
- In MPEG, the audio signal and the video signal are encoded separately and then multiplexed, so that the multiplexed, compression-coded audio and video signals can later be decoded and reproduced in synchronization with each other.
- Information on the time at which each signal is to be reproduced and displayed (hereinafter referred to as "time information") is added at compression time. Accordingly, when decompressing the compression-encoded digital audio and video signals, the playback device refers to its own system time reference value and to this time information, and plays back the audio signal and the video signal in synchronization with each other.
- FIG. 1 is a block diagram showing a configuration of a dual audio decoder 183 that performs the reproducing method.
- the dual audio decoder 183 includes a first audio decoder 183a and a second audio decoder 183b, and a first audio selection circuit 183c and a second audio selection circuit 183d.
- For example, the first audio decoder 183a decodes the first audio signal (e.g., a Japanese audio signal), and the second audio decoder 183b decodes the second audio signal (e.g., an English audio signal).
- The decoded first and second audio signals are processed by the first audio selection circuit 183c and the second audio selection circuit 183d. For example, if the audio output has one left channel and one right channel, the first and second audio signals are each processed to be output on one monaural channel, or only one of the two signals is output in 2-channel stereo. If the number of audio output channels is greater than two, the first and second audio signals are processed to be output in a combination of stereo and monaural.
- For a 5.1-channel-capable output, the first audio selection circuit 183c and the second audio selection circuit 183d can output each audio signal individually, for example as 2-channel stereo, or can select and output only one of the audio data in 5.1 channels.
- Patent Document 1: Japanese Patent Application Laid-Open No. 10-145735 (pages 10-11, FIGS. 4, 8, and 9)
- Patent Document 1 describes a method of decoding a plurality of pieces of data having different angles by a plurality of moving picture decoding means, and combining and displaying them by a video data combining means.
- Patent Document 1 also discloses adding a plurality of audio data in different languages to video data, decoding each audio data with a plurality of audio decoding means and mixing and reproducing them, or selecting and reproducing any one of them.
- However, Patent Document 1 does not specifically describe means for mixing the two types of data or means for establishing reproduction synchronization. Even restricting attention to audio, there is no explanation of how to mix two audio data whose sampling rates differ, what mixing ratio to apply to each audio data, how to mix audio data with different numbers of channels (such as surround audio and stereo audio), how to handle the mixing sections, or how to synchronize the audio data with each other.
- For example, if the first audio is compression-encoded by the Dolby Digital system and the second audio is encoded as linear PCM, processing that accommodates the different encoding methods is required.
- the present invention has been made in consideration of the above problems, and has as its object to provide an audio reproducing device that reproduces a plurality of digital audio signals in synchronization with each other.
- The audio reproducing apparatus of the present invention is an apparatus for reproducing and outputting audio signals, comprising: synchronization means for synchronizing a plurality of audio signals by allocating a plurality of pieces of audio reproduction time information of each of the plurality of audio signals on one time axis; and synthesizing means for synthesizing the plurality of audio signals using the plurality of pieces of audio reproduction time information allocated on the time axis.
- Because the audio reproducing apparatus of the present invention allocates the audio reproduction time information of the plurality of audio signals on one time axis, it can reproduce a plurality of digital audio signals in synchronization.
- the time axis may be a time axis specified by a plurality of pieces of the audio reproduction time information of any one of the plurality of audio signals.
- The synchronization means assigns the plurality of pieces of audio reproduction time information of another audio signal onto the time axis specified by the audio reproduction time information of any one of the audio signals.
- a plurality of sounds can be synchronized by matching the audio reproduction time information of the other audio signal with the audio reproduction time information of the main audio signal.
- A third aspect of the present invention is an audio reproducing device wherein the time axis is a time axis specified by a plurality of pieces of the audio reproduction time information of the one audio signal being reproduced at variable speed. This has the effect that, even in variable speed reproduction, a plurality of audio signals can be synchronized by decoding using the audio reproduction time information of the audio signal being reproduced at variable speed.
- the plurality of audio signals are multiplexed with a video signal, and the time axis is specified by a plurality of pieces of video reproduction time information of the video signal.
- A fifth aspect of the present invention is the audio reproducing apparatus wherein the time axis is a time axis specified by the video reproduction time information of the video signal being reproduced at variable speed. This has the effect of synchronizing the audio with the reproduced video during skip playback, in accordance with the output of the skip-reproduced video.
- a sixth aspect of the present invention is the audio reproducing apparatus according to the present invention, wherein the time axis is a time axis specified by a variable speed system time reference signal. This has the effect of synchronizing video and audio by making the system time reference signal, which is the reference for the entire system, variable.
- Another aspect of the present invention is an audio reproducing apparatus further comprising sampling rate conversion means for converting the sampling rate of another audio signal in accordance with the sampling rate of any one of the plurality of audio signals, wherein the synthesizing means synthesizes the one audio signal with the other audio signal converted by the sampling rate conversion means. This makes it possible to reproduce a plurality of sounds at the sampling rate of the one audio signal.
- For example, whether the main audio alone is reproduced or sub-audio such as commentary is reproduced with it, the user can hear the multiple sounds at a fixed sampling rate.
- An eighth aspect of the present invention is the audio reproduction device according to the present invention, wherein any one of the audio signals is an audio signal having a longest continuous audio reproduction section among the plurality of audio signals.
- Sub-audio such as commentary may be inserted for the purpose of supplementing the main audio, for example with commentary on a specific scene, so its audio reproduction section is assumed to be shorter than that of the main audio. Therefore, if the audio signal with the longer reproduction section is selected, the number of times the sampling rate must be changed midway can be reduced.
- A ninth aspect of the present invention is the audio reproducing apparatus wherein the one audio signal is the audio signal having the least intermittent audio reproduction section among the plurality of audio signals.
- Converting the sampling rate of the more intermittent audio signals to match the least intermittent audio signal (including an audio signal with no intermittent sections) reduces the number of times the sampling rate is changed during reproduction.
- A tenth aspect of the present invention is the audio reproducing apparatus wherein the one audio signal is the audio signal having the highest sampling rate among the plurality of audio signals. This has the effect of keeping the high-quality audio as it is and upsampling the other audio, maintaining the sound quality as far as possible.
- An eleventh aspect of the present invention is the audio reproduction device wherein the one audio signal is the audio signal having the lowest sampling rate among the plurality of audio signals. This has the effect of reducing the amount of audio data transmitted by converting to the low sampling rate, for example when the transmission band for audio output is limited.
- A twelfth aspect of the present invention is the audio reproducing apparatus wherein the one audio signal is an audio signal whose sampling rate does not change among the plurality of audio signals. If the sampling rate changes midway, audio muting may be required at the point of the rate change during playback. Taking the signal whose rate does not change as the main one therefore has the effect of keeping audio playback continuous.
- A thirteenth aspect of the present invention is an audio reproduction device that synthesizes the plurality of audio signals by adding another audio signal to any one of them, further comprising output level adjusting means for reducing the reproduction output level of the one audio signal only in the portion to which the other audio signal is added.
- A fourteenth aspect of the present invention is the audio reproducing apparatus wherein the output level adjusting means synthesizes the one audio signal, with its reproduction output level so adjusted, with the other audio signal.
- A fifteenth aspect of the present invention is an audio reproducing apparatus further comprising integration/distribution means for integrating or distributing the reproduction signal channels of another audio signal in accordance with the number of reproduction signal channels of any one of the plurality of audio signals. This has the effect of adding a specific audio signal to a channel without causing audio distortion even if the numbers of reproduction channels of the reproduced signals differ from each other.
- A sixteenth aspect of the present invention is an audio reproducing apparatus further comprising integration/distribution means for integrating or distributing the reproduction signal channels of each audio signal in accordance with the number of channels of the audio output device connected to the audio reproducing apparatus. The number of reproduction signal channels is integrated or distributed according to the number of channels of the user's audio output device (for example, the number of connected speakers) before audio synthesis is performed.
- In a further aspect, the integration/distribution means integrates or distributes the reproduction signal channels of each audio signal in accordance with the audio output channels of the audio output device designated by the user. The number of reproduction signal channels is integrated or distributed according to the channels (for example, the connected speakers) on which the user wants the audio reproduced, and synthesis is then performed.
- the present invention can also be realized as a sound reproducing method using the characteristic constituent means of the sound reproducing apparatus of the present invention as steps, or as a program for causing a computer to execute those steps.
- the program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as a communication network.
- the present invention can provide an audio reproduction device that reproduces a plurality of digital audio signals in synchronization.
- the audio reproduction device of the present invention can execute mixing of a plurality of audio signals having different sampling rates and encoding methods, and synchronous reproduction of a plurality of audio signals in variable speed reproduction.
- FIG. 1 is a configuration diagram of a dual audio decoder that performs a conventional audio reproduction method.
- FIG. 2 is a block diagram illustrating a configuration of an image and sound reproduction device according to Embodiment 1.
- FIG. 3 is a flowchart showing a method for synchronously reproducing video and audio in Embodiment 1.
- FIG. 4 is a diagram for explaining a method of storing audio reproduction data in the embodiment.
- FIG. 5 is a diagram showing an example in which a plurality of images are overlapped in the embodiment.
- FIG. 6 is a diagram showing an example of a temporal relationship in which a main video and a commentary video are displayed in the embodiment.
- FIG. 7 is a block diagram illustrating a configuration of a video reproduction device that superimposes a commentary video on the main video in Embodiments 1 and 4.
- FIG. 8 is a configuration diagram of an audio reproducing apparatus that superimposes a main sound and a sub sound in each embodiment.
- FIG. 9 is a diagram showing a relationship between audio reproduction time information of a main audio and audio reproduction time information of a sub audio.
- FIG. 10 is a diagram showing a state in which audio playback time information is added to the audio streams of the main audio and the sub audio.
- FIG. 11 is a diagram showing a configuration example of an addition output unit for describing a voice addition method according to the first embodiment.
- FIG. 12 is a diagram for explaining connection between the audio reproduction device according to the first embodiment and an externally connected device.
- FIG. 13 is a diagram for explaining sound integration.
- FIG. 14 is a diagram for explaining audio distribution.
- FIG. 15 is a diagram for explaining connection between the audio reproduction device according to the first embodiment and an externally connected device.
- FIG. 16 is a diagram showing a state in which the sub sound has not yet ended even after the main sound has ended.
- FIG. 17 is a diagram showing a state in which a sound effect is synthesized with a main sound.
- FIG. 18 is a diagram for explaining synthesis and integration of audio signals.
- FIG. 19 is a diagram showing a DVD on which a plurality of audio signals are recorded.
- FIG. 20 is a flowchart showing processing for adding a sub-voice to a main voice and performing voice synthesis before or after variable speed processing in the second embodiment.
- FIG. 21 is a block diagram for explaining a method of performing variable speed control by an audio output processing unit according to the second and third embodiments.
- FIG. 22 is a diagram for explaining the principle of audio variable speed processing according to the second embodiment.
- FIG. 23 is a flowchart showing a method for synchronously reproducing a plurality of videos according to Embodiment 4.
- FIG. 2 is a block diagram illustrating a configuration of the image and sound reproduction device according to the first embodiment.
- The configuration of the video and audio reproduction device, the video reproduction method, and the audio reproduction method according to the first embodiment will be described mainly with reference to FIG. 2.
- The present invention relates to a technique for reproducing a plurality of digital audio signals in synchronization with each other. Before describing that technique in detail, the technology for reproducing a signal in which a video signal and an audio signal are multiplexed is explained.
- FIG. 2 is a block diagram showing a configuration of the video and audio reproduction device according to the first embodiment.
- the video and audio reproduction device according to the first embodiment is a device that reproduces a signal in which a video signal and an audio signal are multiplexed.
- The device is composed of an input unit 1, a video buffer unit A102, a video buffer unit B103, a video decoding unit A104, a video decoding unit B105, an image synthesizing unit 106, an audio buffer unit A2, an audio buffer unit B3, an audio decoding unit A4, an audio decoding unit B5, and an audio synthesizing unit 6.
- The video buffer unit A102, video buffer unit B103, video decoding unit A104, video decoding unit B105, and image synthesizing unit 106 are the components that process the video signal.
- the audio buffer unit A2, the audio buffer unit B3, the audio decoding unit A4, the audio decoding unit B5, and the audio synthesizing unit 6 are components that process audio signals.
- The input unit 1 receives a multiplexed audio and video signal from a data recording device (not shown), such as an optical disc storing content encoded by various encoding methods, or from a source of compression-encoded digital audio and video signals such as digital broadcasting.
- The input unit 1 separates the multiplexed signal into a video signal and an audio signal, extracts the video playback time information from the video signal, and extracts the audio playback time information from the audio signal.
- the video signal and the audio signal input to the input unit 1 are two-channel signals, respectively. Therefore, the input unit 1 separates the multiplexed audio signal and video signal into a video signal and an audio signal for each channel.
- First, each of the video buffer unit A102, video buffer unit B103, video decoding unit A104, video decoding unit B105, and image synthesizing unit 106, which process the video signal, will be described.
- the video buffer unit A102 is a component unit that stores the video signal of the first channel separated by the input unit 1.
- The video buffer unit A102 includes a video playback time information management unit A121, which stores the video playback time information of the first-channel video signal, and a compressed video buffer unit A122, which stores the compressed video data of the first-channel video signal.
- the video playback time information management unit A121 has a table that associates the compressed video data of the first channel with the video playback time information.
- the video buffer unit B103 is a component that stores the video signal of the second channel separated by the input unit 1.
- The video buffer unit B103 includes a video playback time information management unit B131, which stores the video playback time information of the second-channel video signal, and a compressed video buffer unit B132, which stores the compressed video data of the second-channel video signal.
- The video playback time information management unit B131 has a table that associates the compressed video data of the second channel with the video playback time information.
- The video decoding unit A104 is a component that analyzes the attribute information (video header information) of the compressed video data of the first channel stored in the compressed video buffer unit A122, and decompresses the compressed video data in accordance with the video playback time information stored in the video playback time information management unit A121.
- the video decoding unit A104 has a frame buffer unit A141 for storing the expanded video data.
- The video decoding unit B105 is a component that analyzes the attribute information (video header information) of the compressed video data of the second channel stored in the compressed video buffer unit B132, and decompresses the compressed video data in accordance with the video playback time information stored in the video playback time information management unit B131.
- the video decoding unit B105 has a frame buffer unit B151 for storing the expanded video data.
- the image synthesizing unit 106 is a component that synthesizes each video data expanded by the video decoding unit A104 and the video decoding unit B105 and outputs the synthesized data to an external display unit.
- Next, the audio buffer unit A2, audio buffer unit B3, audio decoding unit A4, audio decoding unit B5, and audio synthesizing unit 6, which process the audio signal, will be described.
- the audio buffer unit A 2 is a component that stores the audio signal of the first channel separated by the input unit 1.
- The audio buffer unit A2 includes a compressed audio buffer unit A21, which stores the compressed audio data of the first-channel audio signal, and an audio playback time information management unit A22, which stores the audio playback time information of the first-channel audio signal.
- the audio reproduction time information management unit A22 has a table that associates the compressed audio data of the first channel with the audio reproduction time information.
- the audio buffer unit B3 is a component that stores the audio signal of the second channel separated by the input unit 1.
- The audio buffer unit B3 includes a compressed audio buffer unit B31, which stores the compressed audio data of the second-channel audio signal, and an audio playback time information management unit B32, which stores the audio playback time information of the second-channel audio signal.
- the audio reproduction time information management section B32 has a table for associating the compressed audio data of the second channel with the audio reproduction time information.
- The audio decoding unit A4 is a component that analyzes the attribute information (audio header information) of the compressed audio data of the first channel stored in the compressed audio buffer unit A21, and decompresses the compressed audio data in accordance with the audio playback time information stored in the audio playback time information management unit A22.
- the audio decoding unit A4 has a PCM buffer unit A41 for storing expanded audio data.
- The audio decoding unit B5 is a component that analyzes the attribute information (audio header information) of the compressed audio data of the second channel stored in the compressed audio buffer unit B31, and decompresses the compressed audio data in accordance with the audio playback time information stored in the audio playback time information management unit B32.
- the audio decoding section B5 has a PCM buffer section B51 for storing expanded audio data.
- the voice synthesis unit 6 is a component unit that synthesizes each audio data expanded by the audio decoding unit A4 and the audio decoding unit B5 and outputs the synthesized data to an external speaker.
- A video signal and an audio signal are each handled in units of decoding and reproduction called access units (one frame in the case of video data).
- Time stamp information indicating when each access unit should be decoded and played back is attached to it.
- This time stamp information is called a Presentation Time Stamp (PTS); the video PTS is hereinafter referred to as "VPTS" and the audio PTS as "APTS".
- The system reference section is a component that generates the system time reference System Time Clock (STC) inside the reference decoder of the MPEG system.
- To create the system time reference STC, the System Clock Reference (SCR) or the Program Clock Reference (PCR) is used.
- When the last byte of each of these fields arrives (at read time), the system reference section sets the system time reference STC to the same value as that indicated by the SCR or PCR, thereby establishing the reference time.
- In this way, the video and audio reproduction device can maintain the system time reference STC at exactly the same clock value and clock frequency as the reference.
- The system clock of the STC is 27 MHz.
- Each PTS (in 90 kHz units) is compared against a value obtained by dividing the system time reference STC with a counter or the like.
- The accuracy of this comparison against the system time reference STC is therefore 90 kHz. If each decoder reproduces each reproduction unit so that the video reproduction time information VPTS and the audio reproduction time information APTS are synchronized with the system time reference STC within that 90 kHz accuracy, an AV-synchronized output is obtained.
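- As a rough illustration of this clock relationship, the sketch below derives 90 kHz PTS units from a 27 MHz STC counter and gates output on the comparison. It is a minimal sketch with invented helper names, not code from the patent.

```python
# Minimal sketch: 27 MHz STC counter divided down to 90 kHz PTS units.
STC_HZ = 27_000_000          # system time clock frequency
PTS_HZ = 90_000              # VPTS/APTS tick frequency
DIVIDER = STC_HZ // PTS_HZ   # = 300

def stc_to_pts_ticks(stc_count: int) -> int:
    """Divide the 27 MHz STC counter down to 90 kHz PTS units."""
    return stc_count // DIVIDER

def is_time_to_output(pts: int, stc_count: int) -> bool:
    """A reproduction unit is output once its PTS is matched or exceeded by the STC."""
    return stc_to_pts_ticks(stc_count) >= pts

# A frame stamped PTS=90000 (1 second) becomes due once the STC reaches 27,000,000.
assert is_time_to_output(90_000, 27_000_000)
assert not is_time_to_output(90_000, 26_999_699)
```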
- FIG. 3 is a flowchart of the AV synchronization process.
- Here, the case where one channel each of a video stream and an audio stream are multiplexed is described (the case where two channels each of video and audio streams are multiplexed is explained later).
- First, the input unit 1 separates the encoded data input from the data recording device or the like into compressed video data, video playback time information VPTS, compressed audio data, and audio playback time information APTS.
- The compressed video buffer unit A122 stores the compressed video data, and the video playback time information management unit A121 stores the video playback time information VPTS (step 301).
- the video playback time information management unit A121 stores the video playback time information VPTS together with the address of each compressed video data in the compressed video buffer unit A122.
- Similarly, the compressed audio buffer unit A21 stores the compressed audio data, and the audio reproduction time information management unit A22 stores the audio reproduction time information APTS (step 302).
- the audio reproduction time information management unit A22 divides the audio reproduction time information APTS into units called slots, as shown in FIG. 4, and stores them together with the address of each audio data in the compressed audio buffer unit A21. Therefore, the audio reproduction time information management unit A22 stores the value of the audio reproduction time information APTS and a pointer to the address where the compressed audio data related thereto is stored.
- The order of step 301 and step 302 is changed as appropriate according to the order in which the video signal and the audio signal are input to the input unit 1.
- The compressed audio buffer unit A21 has a write pointer that moves to the end of the most recently written data, indicating the latest write position.
- the compressed audio buffer unit A21 also has a read pointer for specifying a read position of the compressed audio data, and the position of the read pointer is updated by reading the compressed audio data by the audio decoding unit A4.
- The compressed audio buffer unit A21 is a ring-shaped storage unit in which, once data has been written up to the final address, the write position returns to the first address. Therefore, the next data can be written up to the position from which data has been read, and the input unit 1 manages the writing of the compressed audio data so that the write pointer does not overtake the read pointer.
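- A minimal sketch of such a ring buffer follows (byte-oriented storage, illustrative names; not code from the patent). The write side refuses to overtake the read pointer, just as the input unit 1 is described as doing.

```python
class CompressedAudioRingBuffer:
    """Ring-shaped store: writes wrap from the final address back to the first,
    and the write pointer must never overtake the read pointer (sketch only)."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.size = size
        self.write_ptr = 0  # latest write position
        self.read_ptr = 0   # position the audio decoding unit reads from
        self.fill = 0       # bytes currently stored

    def write(self, data: bytes) -> bool:
        """Store compressed audio data; refuse rather than overwrite unread data."""
        if len(data) > self.size - self.fill:
            return False  # write pointer would overtake the read pointer
        for b in data:
            self.buf[self.write_ptr] = b
            self.write_ptr = (self.write_ptr + 1) % self.size  # wrap to first address
        self.fill += len(data)
        return True

    def read(self, n: int) -> bytes:
        """Read up to n bytes for the decoder, advancing the read pointer."""
        n = min(n, self.fill)
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.read_ptr])
            self.read_ptr = (self.read_ptr + 1) % self.size
        self.fill -= n
        return bytes(out)
```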
- the video decoding unit A104 acquires compressed video data from the compressed video buffer unit A122, and acquires video playback time information VPTS from the video playback time information management unit A121 (step 303).
- the audio decoding unit A4 acquires the compressed audio data from the compressed audio buffer unit A21, and acquires the audio reproduction time information APTS from the audio reproduction time information management unit A22 (step 304).
- the video decoding unit A104 performs video decoding, and stores the decoded data in the frame buffer unit A141 (step 305).
- Similarly, before the audio playback time information APTS reaches the system time reference STC, the audio decoding unit A4 performs audio decoding and stores the decoded data in the PCM buffer unit A41 (step 306).
- the video decoding unit A104 and the audio decoding unit A4 decode each data, but do not output decoded data immediately after decoding.
- Then, the audio decoding unit A4 refers to the system time reference STC, and at the point when the audio playback time information APTS matches or exceeds the system time reference STC, outputs the decoded audio data associated with that APTS from the PCM buffer unit A41 (step 307).
- Likewise, the video decoding unit A104 refers to the system time reference STC, and at the point when the video playback time information VPTS matches or exceeds the system time reference STC, outputs the decoded video data associated with that VPTS from the frame buffer unit A141 (step 308).
- The video and audio reproduction device may also output a stream such as Dolby Digital as-is from an optical output terminal.
- In that case, the stream is stored in a stream buffer (not shown), and when the audio playback time information APTS matches or exceeds the system time reference STC, the decoded audio data associated with that APTS is output.
- When the input of encoded data ends, the video and audio reproduction device ends decoding.
- Otherwise, the process returns to the video signal storage step (step 301), in which the compressed video buffer unit A122 stores the compressed video data and the video playback time information management unit A121 stores the video playback time information VPTS.
- the video and audio playback device synchronizes the video playback time information VPTS and the audio playback time information APTS with the system time reference STC, and outputs video decode data and audio decode data.
- If the video decoded data and audio decoded data are output so that the video playback time information VPTS stays within the range from 50 ms ahead of to 30 ms behind the audio playback time information APTS, the lip-sync deviation remains small enough not to be noticeable.
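- A minimal sketch of this tolerance check (PTS values in 90 kHz ticks, so 1 ms = 90 ticks; the function name is invented):

```python
AHEAD_LIMIT = 50 * 90   # video may be output up to 50 ms ahead of the APTS
BEHIND_LIMIT = 30 * 90  # ... or up to 30 ms behind it

def lip_sync_ok(video_output_time: int, apts: int) -> bool:
    """True if outputting video at this STC time (90 kHz ticks) keeps the
    lip-sync deviation within the unnoticeable range described above."""
    return apts - AHEAD_LIMIT <= video_output_time <= apts + BEHIND_LIMIT
```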
- Next, consider the case where a commentary video by the content creator is superimposed as a sub-screen on the main video, which is the normally reproduced video.
- The sound corresponding to the commentary video is hereinafter referred to as the "sub-audio", and the sound reproduced together with the main video as the "main audio".
- the commentary video is a video for explaining the video of the main part.
- For example, the commentary video shows a commentator explaining the place names and other details of the landscape shown in the main video.
- The sub-audio is audio that explains the main video; it is output together with the commentary video while the commentary video is displayed.
- FIG. 6 is a diagram showing an example of a temporal relationship in which a main video and a commentary video are displayed.
- As shown in FIG. 6, the main video is displayed throughout the program from beginning to end, and the commentary video is displayed a plurality of times during the program, each time for a predetermined period shorter than the length of the program.
- the sub-audio is output when the commentary video is displayed as described above.
- In some cases, the time during which the commentary video is displayed may be longer than the time during which the main video is displayed, and likewise the time during which the sub-audio is output may be longer than the time during which the main audio is output.
- FIG. 7 is a block diagram showing a configuration of an image reproducing apparatus that superimposes a commentary video on a main video.
- The video decoding unit A104 decodes the video data of the main video, and the video decoding unit B105 decodes the video data of the commentary video.
- Synchronization of the data decoded by the video decoding unit A104 and the video decoding unit B105 is managed using the video playback time information VPTS and the like in each video stream. When the video playback time information VPTS matches the system time reference STC, the decoded data obtained by the video decoding unit A104 and the decoded data obtained by the video decoding unit B105 are output, so the decoded data can be output in synchronization.
- Suppose that one of the main video and the commentary video has 24 frames per second (movie material) and the other has 30 frames per second.
- In this case, the image processing unit 160 converts the format of the video obtained from the movie material to 30 frames per second, and then enlarges or shrinks one or both of the two images.
- the frame synchronization section 162 performs frame synchronization of the two images.
- the composite output unit 161 outputs two images by superimposing one image on the other image. As a result, the main video and the commentary video are superimposed and displayed after being synchronized.
- FIG. 8 is a block diagram showing a configuration of an audio reproducing apparatus for superimposing a main audio and a sub audio.
- the input unit 1 stores compressed audio data of main audio and audio reproduction time information APTS in the audio buffer unit A2, and stores compressed audio data of sub audio and audio reproduction time information APTS in the audio buffer. Store in part B3.
- the synchronization setting unit 11 assigns each audio reproduction time information APTS of the sub audio to the time axis T specified by each audio reproduction time information APTS of the main audio.
- In FIG. 9, the audio playback time information APTS values of the main audio, indicated by blocks, are "M00", "M11", "M20", "M29", "M40", and "M52"; the audio playback time information APTS values of the sub-audio, likewise indicated by blocks, are "S00", "S09", "S20", "S31", and "S40", and these are assigned on the time axis T.
- the synchronization setting unit 11 retains the difference between the values of the adjacent audio reproduction time information APTS of the sub-audio and allocates each audio reproduction time information APTS of the sub-audio on the time axis T.
- Specifically, the synchronization setting unit 11 assigns each audio reproduction time information APTS of the sub-audio to the position obtained by adding the difference value "11" to the value of that APTS. For example, when assigning the sub-audio reproduction time information "S09" on the time axis T, the synchronization setting unit 11 adds the difference value "11" to the value "09", and thus assigns "S09" to the position "M20".
- each audio reproduction time information APTS of the sub-audio is allocated on the time axis T in a state where the difference between the values of the adjacent audio reproduction time information APTS of the sub-audio is maintained.
- When the main audio and the sub-audio are reproduced using the audio reproduction time information APTS allocated in this way, they are reproduced in synchronization.
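- A minimal sketch of this mapping, using the example values from FIG. 9 (function and variable names are invented for illustration):

```python
def assign_sub_apts_to_main_axis(sub_apts_list, offset):
    """Place each sub-audio APTS on the main audio's time axis T by adding a
    fixed offset, preserving the spacing between adjacent sub-audio APTS."""
    return [apts + offset for apts in sub_apts_list]

# Sub-audio APTS values "S00".."S40" from FIG. 9, with the difference value 11.
sub_apts = [0, 9, 20, 31, 40]
print(assign_sub_apts_to_main_axis(sub_apts, offset=11))
# [11, 20, 31, 42, 51] -- "S09" lands at position "M20" on the time axis T
```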
- Then, the audio decoding unit A4 decodes the compressed audio data of the main audio stored in the audio buffer unit A2 and, referring to the audio reproduction time information APTS, plays the audio at the time synchronized with the system time reference STC.
- Similarly, the audio decoding unit B5 decodes the compressed audio data of the sub-audio stored in the audio buffer unit B3 and, referring to the audio reproduction time information APTS, plays the audio at the time synchronized with the system time reference STC.
- the main sound and the sub sound are reproduced in synchronization.
- The difference "11" between the audio playback time information "M00" at the beginning of the main audio and the audio playback time information "S00" at the beginning of the sub-audio is recorded, for example, in the header of the stream, and arises when the start time of the commentary video (sub-audio) is specified in advance. The difference may also be "0"; that is, the main audio and the sub-audio may start at the same time.
- When the start time of the sub-audio is set by a user's remote control operation or the like, the difference is the difference between the sub-audio playback time information and the main audio playback time information at that start time.
- A single recording medium (such as a disc) stores the audio streams of compression-encoded data for the main audio and the sub-audio; flag information for identifying the main audio and the sub-audio is stored in the bitstream header information of each audio stream.
- As the main audio, for example, one of Dolby Digital 5.1ch Japanese audio, Dolby Digital 5.1ch English audio, and linear PCM 2ch audio is selected and reproduced.
- As the sub-audio, for example, the author's commentary in Dolby Digital 2ch English audio is reproduced.
- Each audio stream carries audio reproduction time information APTS. The user selects the main audio and the sub-audio to be reproduced simultaneously by selecting mixed reproduction of sub-audio and main audio from a menu.
- It can also be assumed that the main audio is English and the sub-audio is any of Japanese, French, or German, and that there are a plurality of main audio streams and a plurality of sub-audio streams.
- the user selects the sound to be reproduced.
- In content such as a movie, an identifier identifying the main audio for reproducing the movie's scenes and an identifier identifying the sub-audio in which the movie's creator explains the ingenuity of its creation are written into the content in advance.
- the main audio and the sub audio are distinguished from each other, and both can be reproduced in synchronization. As a result, the user can reproduce the main sound and the sub sound in synchronization.
- FIG. 10 shows the state in which audio playback time information APTS is added to each audio stream in the case of one main audio stream and three sub-audio streams.
- the secondary audio is, for example, an audio stream of English audio, Japanese audio, and Korean audio.
- Any of the sub-audio streams can be reproduced in synchronization with the main audio by the operation of the synchronization setting unit 11 described above.
- the audio frame size of each data may be different due to the difference in the audio coding method between the main audio and the sub audio.
- As long as audio playback time information APTS is added to each audio stream, however, the main audio and the sub-audio can be played back in synchronization by using the system time reference STC together with each APTS.
- If the plurality of audio decoding units are configured to process independently, then even when the audio frame processing unit differs between encoding systems, each audio stream can be decoded according to its own audio playback time information APTS and reproduced in synchronization.
- the sampling rate of the main audio and the sampling rate of the sub audio may be different.
- the rate converter 7 converts the sampling rate of one reproduced audio signal according to the sampling rate of the other reproduced audio signal.
- the main audio and the sub audio can be reproduced at the same sampling rate.
- the rate conversion unit 7 adjusts the sampling rate of the sub audio to the sampling rate of the main audio.
- the main sound and the sub sound are reproduced at a fixed sampling rate regardless of the presence or absence of the commentary sound, so that the user can hear the main sound and the sub sound without feeling uncomfortable.
- As a method of converting the sampling rate, there is a method of using a DA converter, which converts digital audio to analog audio, together with an AD converter, which performs the reverse operation and converts the analog audio back to digital audio at the desired rate.
- There are also a method of converting to the desired sampling rate using a semiconductor sampling rate converter circuit, and a method of generating rate-converted audio by thinning out or interpolating samples, which is easily applied when the sampling rates are in a multiple relationship.
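- As a sketch of the thinning/interpolation approach for a 2:1 rate relationship (e.g. 48 kHz and 24 kHz), assuming PCM samples held in Python lists; linear interpolation is chosen here only for illustration, where a real converter would use a proper interpolation filter:

```python
def upsample_x2(samples):
    """Double the sampling rate (e.g. 24 kHz -> 48 kHz) by inserting a
    linearly interpolated sample between each pair of input samples."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append((s + nxt) / 2)  # interpolated midpoint
    return out

def downsample_x2(samples):
    """Halve the sampling rate (e.g. 48 kHz -> 24 kHz) by thinning out every
    other sample (a real converter would low-pass filter first)."""
    return samples[::2]
```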
- a method of selecting an audio signal having a main sampling rate in a case where the identifiers of the main audio and the sub audio are not recorded will be described.
- First, there is a method of selecting the audio signal whose continuous audio reproduction section is longer, and converting the sampling rate of the audio signal whose reproduction section is shorter to match it.
- a sub-sound is inserted as a commentary to assist the main sound, such as commentary on a specific scene, the sound reproduction section of the sub-sound is shorter than that of the main sound.
- the longer playback section is selected as the audio signal having the main sampling rate, and the sampling rate of the shorter playback section is converted in accordance with the sampling rate of the selected audio signal.
- The reproduction of the sub-audio may start partway through and end partway through, for example when only a specific scene is accompanied by commentary. If the audio signal with the longer reproduction section is selected as the one whose sampling rate governs, the time during which audio is reproduced at a constant sampling rate becomes longer, and the time during which the user feels discomfort becomes shorter.
- Second, there is a method of selecting the audio signal having no intermittent reproduction sections and adjusting the sampling rate of the audio signal having intermittent reproduction sections to the sampling rate of the non-intermittent signal. For example, when reproducing an audio signal whose commentary reproduction sections are intermittent, one per scene, its sampling rate is converted to match the non-intermittent signal.
- Third, there is a method of converting the sampling rate of the audio signal having the lower sampling rate to match the higher sampling rate. In other words, the high-quality audio signal is left as it is, and the other audio signal is up-sampled before the two are synthesized.
- If the sampling rate of one audio signal is an integer multiple of the other's, the circuit that synthesizes the audio after rate conversion can be simplified. For example, if one audio signal is sampled at 96 kHz and the other at 48 kHz, or one at 48 kHz and the other at 24 kHz, simple frequency interpolation can be used, and the interpolated audio data can be added as-is, making synthesis easy.
- Fourth, conversely, the audio signal having the lower sampling rate may be selected, and the sampling rate of the audio signal having the higher sampling rate converted down to match it.
- This method is used when the transmission band for audio output is limited or when high-quality reproduced audio is not required. For example, when audio data is transmitted over a specific transmission path, converting to the lower sampling rate can be expected to reduce the amount of audio data transmitted. In this case too, if the sampling rate of one audio signal is an integer multiple of the other's, the circuit that synthesizes the audio after rate conversion can be simplified.
- For example, if one audio signal is sampled at 96 kHz and the other at 48 kHz, or one at 48 kHz and the other at 24 kHz, the thinned-out audio data can be added as-is, making synthesis easy.
- Fifth, there is a method of selecting the audio signal whose continuous reproduction sections keep a constant sampling rate, and converting the sampling rate of the audio signal whose rate changes midway to that constant rate. This method is used when there are multiple commentaries or when the sampling rate of the main audio changes from time to time.
- Audio muting may be required at the point where the sampling rate changes, so it is preferable to select the audio signal whose rate does not change as the main one. This reduces the number of sections in which the audio is muted, and continuous audio reproduction can be realized easily.
- When the encoding method to be decoded changes, the decoding program or the hardware circuit settings may need to be changed. In such a case, along with re-initializing the audio decoder, it is necessary to clear the compressed audio data and information such as the read and write pointers stored in the corresponding compressed audio buffer unit, and to delete the audio playback time information APTS and the storage address pointer information in the associated audio playback time information management unit. This audio buffer information needs to be cleared only for the stream whose coding method or sampling rate changed; for streams that did not change, decoding and playback of the compressed audio data simply continue, and the user can enjoy the audio without being aware of the switch.
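- A sketch of that re-initialization, with illustrative stand-in objects for the units described above (none of these names come from the patent):

```python
def reinitialize_audio_path(decoder, compressed_buffer, apts_manager):
    """Clear per-stream state when the coding method or sampling rate changes."""
    decoder.reset()                  # re-initialize the audio decoder itself
    compressed_buffer.read_ptr = 0   # discard buffered compressed audio data
    compressed_buffer.write_ptr = 0  # ... along with the read/write pointers
    compressed_buffer.fill = 0
    apts_manager.clear()             # delete stored APTS values and address pointers
```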
- Next, the addition ratio processing unit A8 and the addition ratio processing unit B9 change the reproduction output levels. For example, addition ratio information indicating the addition ratio of sub-audio such as commentary to the main audio is stored in the header information of each audio stream, or in the sub-audio stream itself, on a recording medium or the like.
- The addition ratio processing unit A8 and the addition ratio processing unit B9 apply addition ratios to one or both of the main audio and the sub-audio with values according to the addition ratio information, after which the main audio and the sub-audio are synthesized.
- For example, the addition ratio processing unit A8 and the addition ratio processing unit B9 may add the main audio and the sub-audio after lowering both output levels, to 0.7 times the original or the like, or may lower the reproduction output level of only one of the two audio signals before synthesizing it with the other.
- Depending on the content, the reproduction output level of one of the audio signals is left unreduced.
- For example, if one audio signal is synthesized at the fixed output level "1" while the output level of the audio to be added is reduced from the fixed value "1" to "0.6", the audio on the unattenuated side is heard relatively emphasized.
- When the audio to be synthesized is commentary and the user wants to listen closely to the explanation, the reproduction level of the commentary audio is raised and the reproduction level of the main audio is lowered.
- Alternatively, one original audio may simply be replaced with the other. Also, when the level of one audio is raised by the user's intention in a section where the audio signals are synthesized, the output level of the other original audio is reduced according to that increment. This is because, if one audio were added at its original volume while the other is increased, signal components exceeding the reproduction dynamic range could arise in part of the summed audio, producing noise such as clipping and making the audio very hard to listen to. Conversely, when the output level of the sub-audio is lowered, the addition ratio of the main audio may be relatively increased.
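- The sketch below mixes two decoded PCM signals with addition ratios as described above, and clamps the sum to show where clipping would otherwise occur (the 16-bit full scale and all names are assumptions for illustration):

```python
FULL_SCALE = 32767  # 16-bit PCM reproduction dynamic range

def mix(main, sub, main_ratio=1.0, sub_ratio=0.6):
    """Scale each audio by its addition ratio and add them. With sub_ratio
    reduced (e.g. 0.6), the main audio is heard relatively emphasized."""
    out = []
    for m, s in zip(main, sub):
        v = m * main_ratio + s * sub_ratio
        # A sum beyond the dynamic range would produce clipping noise;
        # clamp here (a real device would instead lower the addition ratios).
        out.append(max(-FULL_SCALE - 1, min(FULL_SCALE, int(v))))
    return out
```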
- The addition output unit 10 then synthesizes the audio. At that point, the number of playback channels of each audio may differ.
- FIG. 11 shows a configuration example of the addition output unit 10 (the rate conversion unit 7 is omitted for simplicity of the drawing).
- After the addition ratio of one audio is adjusted according to the number of its reproduction signal channels, the addition output unit 10 integrates or distributes the channels of the other audio's reproduction signal and synthesizes the two.
- Addition channel information for the sub-audio, such as commentary for the main audio, is stored in the header information of each audio stream or in the commentary-side stream and recorded on a recording medium or the like.
- The addition output unit 10 synthesizes the audio according to this addition channel information.
- For example, the addition output unit 10 synthesizes the sub-audio into the center channel of the main audio.
- The addition channel information can specify the mixing level and channel mapping of each addition channel, restrictions such as channels to which addition is prohibited, the sampling rate, the number of sampling bits per channel, and the data rate of the compressed stream. Furthermore, if detailed addition ratio information such as an addition volume coefficient table accompanies the addition channel information, the sub-audio can be added to, for example, the front-right channel of the main audio with its output level lowered to 0.7 times or the like.
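- A sketch of applying such addition channel information, assuming a parsed header represented as a dictionary; the field names, channel order, and the 0.7 coefficient are illustrative:

```python
import numpy as np

# Hypothetical addition channel information as it might be parsed from a
# stream header: destination channel in the main audio plus a volume gain.
ADD_CHANNEL_INFO = {"dest_channel": "FR", "gain": 0.7}

MAIN_CHANNELS = ["FL", "FR", "C", "LFE", "SL", "SR"]  # assumed 5.1ch layout

def add_sub_to_main(main: np.ndarray, sub_mono: np.ndarray, info: dict) -> np.ndarray:
    """main: shape (6, n) 5.1ch PCM; sub_mono: shape (n,) mono PCM.
    Adds the sub-audio into the designated main channel at the given gain."""
    out = main.copy()
    idx = MAIN_CHANNELS.index(info["dest_channel"])
    n = min(out.shape[1], len(sub_mono))
    out[idx, :n] += info["gain"] * sub_mono[:n]
    return out
```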
- For example, if the audio reproduced by the audio decoding unit A4 is 5.1ch and the audio reproduced by the audio decoding unit B5 is monaural 1ch, a destination channel (for example, the center channel) is designated as the addition destination of the audio reproduced by the audio decoding unit B5.
- In addition to changes such as reducing the gain of the main-audio channel that receives the addition, the addition ratios of the other channels are changed as necessary in consideration of the balance with the remaining main-audio channels. It is desirable that the addition ratio can be set flexibly at the user's request, for example lowering the main-audio volume when the sub-audio volume is raised, and raising the main-audio volume when the sub-audio volume is lowered.
- The addition unit sets the addition ratio so that the audio can be added without clipping.
- For example, the channel that would clip is first set to a value that does not cause clipping, and the addition ratios of the other channels are then set again according to their relative output level with respect to that channel.
- A configuration in which the user sets the addition ratio for each channel may also be provided. In that case, each addition ratio processing unit performs addition according to the number of reproduction channels.
- When the addition value is changed according to a user's instruction, performing a process such as pausing reproduction, muting the audio, and then changing the addition coefficient makes it possible to change the addition value without generating abnormal sound during the change. Alternatively, if a detection unit is provided that detects clipping before the decoded audio is multiplied by the addition ratio and synthesized for output, the addition ratio processing unit A8 and the addition ratio processing unit B9 can automatically adjust the addition value, set the addition ratio again, and re-synthesize so that clipping does not occur, thereby preventing abnormal noise.
- Alternatively, a processing unit may be provided that, when the detection unit finds a point at which clipping occurs, changes the addition coefficient so that the audio output level gradually decreases to a level at which clipping no longer occurs. This makes it possible to provide a device in which abnormal noise is not output continuously.
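- A sketch of such automatic re-adjustment: detect the clipping peak before output and back the sub-audio coefficient off gradually until the sum fits the dynamic range (the step and limit values are illustrative):

```python
import numpy as np

def mix_with_clip_guard(main, sub, main_ratio, sub_ratio,
                        step=0.05, limit=1.0):
    """Detect clipping before output and gradually lower the sub-audio
    coefficient until the summed peak fits within the dynamic range."""
    n = min(len(main), len(sub))
    while sub_ratio > 0.0:
        mixed = main_ratio * main[:n] + sub_ratio * sub[:n]
        peak = np.max(np.abs(mixed))
        if peak <= limit:                        # no clipping: safe to output
            return mixed, sub_ratio
        sub_ratio = max(0.0, sub_ratio - step)   # back off gradually
    return main_ratio * main[:n], 0.0            # sub fully attenuated
```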
- The synthesis of audio may also be affected by the configuration of an external device connected to the audio reproducing device.
- For example, suppose the external audio device 92 shown in FIG. 12 is connected to the audio reproducing device.
- Even if the original playback content is 5.1ch, the connected speakers may provide only three channels.
- In that case, the number of channels of one audio signal is integrated or distributed according to the number of channels of the external audio device 92, the same is done for the other audio signal, and the two are synthesized.
- The number of channels to be reproduced and output may also be changed by the user.
- The number of reproduction signal channels of each audio is integrated or distributed to the channels designated by the user for audio output, according to the setting of the external audio device 92 or of the output unit in the audio reproducing device.
- The audio playback device can thus be set so that the user configures all or part of the audio output and the values required for the addition ratio processing are applied automatically.
- For example, the L channel of the main audio is output from the first speaker.
- The R channel of the main audio is output from the second speaker.
- The C channel of the main audio, the FL channel of the sub-audio, and the FR channel of the sub-audio are added together and output from the third speaker.
- The channel to which the sub-audio is added may also be changed over time.
- For example, the L channel, the R channel, or both channels of the sub-audio are first added only to the L channel of the main audio, then to the L channel and the C channel of the main audio, then only to the C channel, then to the C channel and the R channel, and finally only to the R channel. The addition destination thus moves across the channels over time, as sketched below.
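- A sketch of such a time-varying addition destination, assuming an L/C/R main layout and a hand-written schedule (the times, weights, and gain are illustrative):

```python
import numpy as np

# Illustrative schedule: (start time in seconds, weights for main L/C/R).
PAN_SCHEDULE = [
    (0.0, (1.0, 0.0, 0.0)),   # sub added only to main L
    (2.0, (0.5, 0.5, 0.0)),   # L and C
    (4.0, (0.0, 1.0, 0.0)),   # C only
    (6.0, (0.0, 0.5, 0.5)),   # C and R
    (8.0, (0.0, 0.0, 1.0)),   # R only
]

def pan_weights(t: float):
    """Return the (L, C, R) addition weights in effect at time t."""
    w = PAN_SCHEDULE[0][1]
    for start, weights in PAN_SCHEDULE:
        if t >= start:
            w = weights
    return w

def add_sub_panned(main_lcr: np.ndarray, sub: np.ndarray, rate: int, gain=0.7):
    """main_lcr: shape (3, n) L/C/R PCM; sub: shape (n,) mono sub-audio."""
    out = main_lcr.copy()
    for i in range(min(out.shape[1], len(sub))):
        wl, wc, wr = pan_weights(i / rate)
        out[0, i] += gain * wl * sub[i]
        out[1, i] += gain * wc * sub[i]
        out[2, i] += gain * wr * sub[i]
    return out
```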
- Suppose an external video device 91 or an external audio device 92 is connected to the audio reproducing device, and the audio reproducing device identifies the externally connected device by, for example, its device ID.
- If the device recognizes information such as the number of speakers that can be output, obtains the setting information of the channels with which the audio is to be synthesized, and sets the selection of addition before or after each output process at the time of variable-speed reproduction, convenience is further improved.
- The audio playback device receives an ID number or the like identifying the type of the partner output device and sets the various setting conditions by referring to a table in the main body or on a memory card of setting conditions. With such a configuration, the main audio and the sub-audio can be synthesized according to the number of channels that can be output, without user operation on the audio reproducing device.
- FIG. 15 shows the configuration of two devices connected by HDMI.
- A line 87 for exchanging device-specific information and a ROM 85 for storing device-specific information are shown.
- The source device 81 and the sink device 82 perform an authentication procedure to confirm that they can be connected to each other.
- Device-specific information data is then sent.
- When the audio playback device, which is the source-side device 81, obtains the device-specific information of the external video device 91 and the external audio device 92 by this method, it acquires the limit on the number of synthesizable channels and the restriction information on the synthesizable image formats and changes its settings accordingly. If the acquired information is stored by the audio reproducing apparatus as default setting values, AV viewing can proceed in the same state as long as the device connection does not change; whenever the connected device ID or the like changes, the partner device's information must be received again and the settings changed.
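- A hedged sketch of this device-recognition flow; the device IDs, capability fields, and cache are illustrative stand-ins for the real device-specific information exchange (which HDMI carries as EDID data):

```python
DEFAULTS_CACHE = {}  # device_id -> settings, persisted across sessions

def query_sink_capabilities(device_id: str) -> dict:
    """Stand-in for receiving device-specific information from the sink."""
    known = {
        "TV-1234": {"max_channels": 2, "formats": ["NTSC"]},
        "AVAMP-5678": {"max_channels": 6, "formats": ["NTSC", "PAL"]},
    }
    return known.get(device_id, {"max_channels": 2, "formats": ["NTSC"]})

def settings_for(device_id: str) -> dict:
    """Reuse cached defaults while the connected device stays the same;
    re-query and update whenever the device ID changes."""
    if device_id not in DEFAULTS_CACHE:
        DEFAULTS_CACHE[device_id] = query_sink_capabilities(device_id)
    return DEFAULTS_CACHE[device_id]
```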
- The synthesized output of the main audio and the sub-audio is produced by synthesizing and outputting the PCM data stored in each PCM buffer.
- The PCM data can be reproduced on the external audio equipment 92 by outputting it from an audio DAC built into the audio playback device, or over an optical digital cable conforming to a digital audio interface standard such as IEC 60958.
- Alternatively, the PCM data created by synthesizing the main audio and the sub-audio may be audio-encoded into a digital code such as the Dolby Digital system and output to an externally connected device over an optical digital cable, an HDMI cable, or the like, according to a digital audio interface standard for compression-encoded streams such as IEC 61937.
- The externally connected devices envisioned here include monitor output devices such as TVs, audio output amplifiers, interface devices such as AV amplifiers with an AV selector function, portable output devices, in-car AV playback devices, and the like.
- The addition output unit 10 outputs the audio data processed by each addition ratio processing unit at the same sampling rate and without audio clipping. Care is also taken so that the continuity of the sound is not lost when the sampling rate is converted or the addition ratio is changed.
- As described above, the audio synthesis unit 6 includes the rate conversion unit 7, the addition ratio processing unit A8, the addition ratio processing unit B9, and the addition output unit 10.
- Although the case where the rate conversion unit 7 is provided only on the audio decoding unit B5 side has been described, it may instead be provided on the audio decoding unit A4 side, or on both the audio decoding unit A4 side and the audio decoding unit B5 side.
- A configuration in which three or more pieces of compressed audio data are decoded by respective decoding units and two or more of the resulting voices are synthesized is also possible.
- Furthermore, if the system time reference that serves as the reference for the entire system is made variable, and the update of the reference value of the system time reference signal is made variable, a plurality of audio signals can be synchronized with each other by decoding them in accordance with audio reproduction time information based on that reference value information.
- The encoded data stream of the compressed audio data for the sub-audio is not limited to one provided on a single recording medium; it may be input from a device connected via a network, or provided from a recording medium different from the one on which the main audio is recorded. Both may be downloaded from external devices connected via a network and played back. In some cases the data is recorded in advance in a recording device such as a device-specific semiconductor memory or a hard disk device, or recorded as an initial setting. In any case, synchronous reproduction is possible as long as the audio reproduction time information of the main audio and the sub-audio is associated so as to guarantee their synchronized playback. If the time information is not associated, the two need not be played back synchronously, and playback need not rely on the reproduction time information.
- The input stream is not limited to a stream recorded on a recording medium such as a DVD or a stream recorded by receiving a digital broadcast signal; it may be a stream in which an external analog signal has been digitized and encoded.
- If reproduction time information is attached at encoding time, AV synchronization can be achieved at the time of playback.
- A system that realizes after-recording playback can also be built by encoding another audio stream in synchronization with the originally reproduced audio and attaching audio reproduction time information that refers to the audio reproduction time information of the original audio stream.
- the commentary video is displayed a plurality of times during a predetermined period shorter than the length of the main video.
- the commentary video may start in the middle of the main video and may not be finished even after the main video is completed. Accordingly, the sub sound does not end even when the main sound ends (see "SB" part in Fig. 16).
- the sub-audio is reproduced in synchronization with the main audio according to the audio reproduction time information APTS of the main audio.
- After the main audio ends, the sub-audio may be reproduced according to (1) the system time reference STC, (2) the audio reproduction time information APTS of the main audio, predicted for the period after the main audio ends, or (3) the sub-audio's own audio reproduction time information APTS.
- the commentary video may be displayed enlarged.
- a sound effect (for example, a buzzer sound) may be synthesized with the main sound.
- If the audio reproduction time information APTS is included in the sound effect signal, the sound effect can be handled as a sub-audio and played back in synchronization with the main audio and the sub-audio using that audio reproduction time information APTS.
- If the sound effect signal does not include audio reproduction time information APTS, synchronous reproduction can likewise be performed by defining the reproduction time information APTS of the main audio at the sound effect's playback start time as the audio reproduction time information of the sound effect.
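- A minimal sketch of stamping a sound effect with the main audio's APTS, assuming APTS values counted in 90 kHz ticks (the usual PTS unit) and effects represented as plain dictionaries:

```python
def assign_effect_apts(effect: dict, main_apts_now: int) -> dict:
    """If the effect carries no APTS of its own, stamp it with the main
    audio's APTS at the moment it should start, so the ordinary
    APTS-based synchronization logic can treat it like a sub-audio."""
    if effect.get("apts") is None:
        effect["apts"] = main_apts_now
    return effect

def due_for_output(effect: dict, main_apts_now: int,
                   tolerance_ticks: int = 90 * 30) -> bool:
    """Mix the effect in when its APTS matches the main audio's current
    APTS within a tolerance (30 ms at 90 kHz here, an illustrative value)."""
    return abs(effect["apts"] - main_apts_now) <= tolerance_ticks
```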
- When the 5.1ch audio signal must be integrated into three channels because of limitations on the output speakers or the like, that is, when output is on the three channels "TL", "TR", and "TC", the signals are integrated as follows: "L" and "SL" of the synthesized audio go to "TL" of the integrated audio, "R" and "SR" go to "TR", and "C" and "SUB" go to "TC".
- the attached data is information for specifying the number of channels, the encoding method, the sampling rate, the audio reproduction section, and the like of each audio signal. Further, the attached data may include addition ratio information and addition channel information. It may also include information for specifying the start time of the sub sound. Thus, the sound reproducing apparatus can easily synthesize or integrate a plurality of sounds.
- With reference to FIG. 8, a block diagram showing the configuration of the audio reproducing apparatus according to the second embodiment, the configuration and the audio reproducing method of that apparatus will now be described.
- The audio reproducing apparatus separates a plurality of audio signals from the input compressed audio data and reads out the respective pieces of audio reproduction time information.
- The main audio signal is decoded based on its reproduction time information, and the other audio signal is decoded in synchronization with it.
- The audio decoder has a processing capability equal to or higher than that required for normal-speed reproduction and is capable of variable-speed audio output processing.
- The audio decoding of one of the audio signals is performed at the time of reproduction with variable-speed processing.
- FIG. 20 is a diagram showing the flow of the process of selecting whether the sub-audio is added to the main audio before or after the variable-speed processing that follows audio decoding, and of synthesizing and reproducing the audio.
- the result of the audio decoding is stored in the PCM buffer unit.
- In step 331, either before or after the audio variable-speed processing is selected as the point of synthesis. The selection criteria are described later.
- If synthesis before the variable-speed processing is selected (Yes in step 331), then in step 332 the sub-audio such as commentary is added to the main audio when the audio reproduction time information of the main audio and that of the commentary sub-audio match (within the allowable output time difference, for example within several tens of ms), and in step 333 the audio variable-speed processing is performed. On the other hand, if synthesis after the variable-speed processing is selected (No in step 331), the main audio undergoes variable-speed processing in step 334, and the sub-audio is added to the main audio in step 335. In step 307, the audio obtained by adding the sub-audio to the main audio is output in synchronization with the video output.
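- The selection of FIG. 20 as a runnable sketch; the equal-ratio mix and the interpolation-based stretch here are crude stand-ins for the real addition and variable-speed processes, and APTS values are simplified to milliseconds:

```python
import numpy as np

def mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = min(len(a), len(b))
    return 0.5 * (a[:n] + b[:n])          # simple equal-ratio addition

def time_stretch(pcm: np.ndarray, speed: float) -> np.ndarray:
    """Crude stretch by resampling the time axis (this also shifts pitch;
    a stand-in for the frame-based variable-speed process in the text)."""
    n_out = int(len(pcm) / speed)
    idx = np.linspace(0, len(pcm) - 1, n_out)
    return np.interp(idx, np.arange(len(pcm)), pcm)

def synthesize_and_stretch(main, sub, main_apts, sub_apts, speed,
                           mix_before_stretch, tolerance_ms=30):
    """Mixing before the stretch makes the sub-audio share the main
    audio's speed change; mixing after keeps it at normal speed."""
    in_sync = abs(main_apts - sub_apts) <= tolerance_ms    # step 332 check
    if mix_before_stretch:                                 # Yes in step 331
        mixed = mix(main, sub) if in_sync else main
        return time_stretch(mixed, speed)                  # step 333
    stretched = time_stretch(main, speed)                  # step 334
    return mix(stretched, sub) if in_sync else stretched   # step 335
```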
- FIG. 21 is a block diagram for explaining the method of variable-speed control performed by the audio output processing unit 61 according to the second embodiment.
- An example of the variable-speed control used to realize the reproduction speed conversion function shown in FIG. 22 is described in detail below.
- The audio signal from the PCM buffer unit A41 is input to the variable-speed processing unit 62 and subjected to the variable-speed processing described below; it is then temporarily placed in the output buffer unit 63 and output to the audio synthesis unit 6.
- There are several methods for implementing variable-speed playback. The first is to repeat normal-speed reproduction and skip reproduction; the second is to actually perform high-speed decoding.
- In the first method, the variable-speed processing unit 62 in the audio output unit does not play back all audio frames: it creates reproduced audio data from specific audio frames so that the playback time after the audio output processing conversion is shortened (for example, halved), stores the data in the output buffer unit 63, and obtains the audio reproduction time information APTS value corresponding to each audio frame that is actually reproduced.
- The video output unit acquires this synchronization information and performs AV synchronization by skipping the display of specific frames so as to output video corresponding to the relevant audio reproduction time information APTS.
- That is, AV-synchronized variable-speed playback is achieved by displaying video in step with the audio reproduction time information APTS when skip playback is performed in audio-frame processing units.
- One option is to perform the variable-speed processing after adding the other decoded audio following the audio decoding process: since the audio output processing unit 61 then applies the variable-speed processing to the sum, the added audio is output with the same speed change as the decoded main audio.
- Alternatively, the sub-audio can be added to the main audio after the variable-speed processing. Since the sub-audio is added after the audio output processing unit 61 has performed the variable-speed processing, the added sub-audio can be reproduced at normal speed even though the decoded main audio has undergone the speed change.
- In the second method, the input unit 1 fetches data faster than the input speed required for normal reproduction, divides it into a video stream and an audio stream, and stores them in the respective buffer units, thereby activating the plural video decoding units and plural audio decoding units.
- Each decoder decodes at a speed higher than the normal playback speed (using the given resources effectively regardless of the playback speed) and stores the decoded results in the respective frame buffers and PCM buffers.
- Audio decoding processing capacity higher than that for normal-speed reproduction is required. For example, to maintain a reproduction speed of about 1.3 times, it is desirable to have a decoding capacity of about 1.5 times, slightly above the reproduction speed. The same margin is required not only in decoding performance but also in the read performance from the playback media and in the transfer performance.
- The audio data stored in the PCM buffer or the like is then processed as follows. In the figure, the upper side shows the normal-speed data before the variable-speed processing and the lower side shows the high-speed data after it.
- The upper part shows six audio frames (one audio frame is about 10 ms or more) played normally in time T1. The lower part shows the case where the playback of the first and second audio frames is overlapped, so that the six audio frames are played in time T2, five sixths of T1.
- Defining the compression ratio as the time length after processing divided by the time length before processing, the speed ratio is the reciprocal of the compression ratio. Here, therefore, high-speed reproduction is performed at 6/5, i.e. 1.2 times.
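- The same example as a sketch: cross-fading the first two of six frames yields five frame lengths of output, i.e. a compression ratio of 5/6 and a speed of 1.2 times (the frame size of 480 samples assumes 10 ms at 48 kHz):

```python
import numpy as np

def overlap_first_two_frames(frames: list[np.ndarray]) -> np.ndarray:
    """Six input frames, with frames 0 and 1 cross-faded into one frame,
    give five frame lengths of output: compression 5/6, speed 6/5."""
    n = len(frames[0])
    fade_out = np.linspace(1.0, 0.0, n)   # weights for frame 0
    fade_in = 1.0 - fade_out              # weights for frame 1
    merged = fade_out * frames[0] + fade_in * frames[1]
    return np.concatenate([merged] + frames[2:])

# Example: 6 frames of 480 samples (10 ms at 48 kHz) -> 5 * 480 samples out.
frames = [np.random.randn(480) * 0.1 for _ in range(6)]
out = overlap_first_two_frames(frames)
assert len(out) == 5 * 480   # T2 = (5/6) * T1
```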
- In this way, variable-speed control is performed in the audio output processing unit 61. If a means is provided for selecting whether the other decoded audio is added right after the audio decoding process or after the variable-speed processing, the added audio can be reproduced at a pitch different from that of the original sound.
- The synchronization between the main playback audio and the sub-audio is as described above: it suffices to add by referring to the PTS of the other audio, based on the PTS values originally computed for all audio frames.
- For the overlapped portion of the audio frames, a rule may be determined in advance as to which frame's PTS is treated as valid.
- The audio reproducing device may reproduce the currently reproduced main audio so that reproduction continuity is maintained. At that time, sampling rate conversion, addition value changes, output channel changes, and the like may be performed in the same manner as in the previous embodiment.
- AV-synchronized playback is easy to achieve if the audio reproduction time information APTS, which serves as the audio playback reference time, is used.
- As the selection means for deciding where the addition for playback synthesis is performed, a determination unit that determines the content of the playback stream may be provided.
- According to the result obtained by the determination unit, the timing for adding the audio information extracted from the data at playback time can be selected as either before or after the audio output processing, and the timing for adding the text or character information extracted from the data can be selected as either before or after the video output processing.
- Alternatively, a selection unit that selects the reproduction processing content specified by the user may be provided. According to the result obtained by the selection unit, the same choices can be made: the audio information extracted from the data is added either before or after the audio output processing, and the text or character information extracted from the data is added either before or after the video output processing.
- Addition can thus be selected according to a user's instruction, for example whether to add the voice information and text information before the variable-speed processing or after it.
- A determination unit that determines both the content of the stream to be reproduced and the usage by the user may also be provided. According to its result, at playback time the timing for adding the audio information extracted from the data is selected as either before or after the audio output processing, and the timing for adding the text or character information extracted from the data is selected as either before or after the video output processing.
- For example, in variable-speed processing the voice information and character information may be added before the speed change according to the user's instruction, whereas in pitch-change processing, in which only the pitch is altered, they may be added after the change. The addition point before or after each output process can thus be selected by taking the user's instruction into account in addition to the content.
- The audio reproduction device of the third embodiment will now be described, mainly with reference to FIG. 8, a block diagram showing its configuration, and FIG. 21, which shows the configuration of the audio output processing unit that performs variable-speed control. The configuration and the audio reproduction method are described below.
- The audio output processing unit 61 is not limited to variable-speed reproduction processing; it may also perform processing that changes the pitch of the decoded audio.
- For example, a digital broadcast signal is received and recorded, and a stream in which at least audio is encoded is played back while time synchronization is ensured; the audio information extracted from the data is then used in the audio synthesis processing together with the synchronization information.
- The audio output processing unit 61 can also perform acoustic effect processing that adds various surround effects. Whether the surround effect is applied after the sub-audio is added, or the sub-audio is added after the surround effect is applied, can be changed; this alters the sense of spaciousness of the sub-audio and the speakers from which it is output.
- The audio output processing unit 61 may also apply a delay to compensate for the synchronization processing delay between the video processing and the audio processing. When the output delays of the connected video and audio equipment can be set on the audio playback device, it can likewise be set whether the sub-audio is added before or after the delay is applied.
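- A sketch of this switchable ordering; the echo-based `surround()` placeholder, the 40 ms delay, and the mixing ratios are illustrative, and a sampling rate around 48 kHz is assumed:

```python
import numpy as np

def apply_chain(main: np.ndarray, sub: np.ndarray, rate: int,
                mix_before_effects: bool, delay_ms: int = 40) -> np.ndarray:
    """Same mix and effect stages, with the mix point switchable."""
    def surround(x):
        echo = np.zeros_like(x)
        d = rate // 100                    # 10 ms echo tap (assumes rate >= 100)
        echo[d:] = 0.3 * x[:-d]
        return x + echo                    # trivial stand-in for a surround effect

    def delay(x):
        pad = np.zeros(rate * delay_ms // 1000)
        return np.concatenate([pad, x])    # lip-sync compensation delay

    def mix(a, b):
        n = min(len(a), len(b))
        return 0.7 * a[:n] + 0.3 * b[:n]

    if mix_before_effects:
        return delay(surround(mix(main, sub)))  # sub shares effects and delay
    return mix(delay(surround(main)), sub)      # sub stays dry and undelayed
```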
- The image reproduction device and the audio reproduction device according to the fourth embodiment, and the corresponding image and audio reproduction methods, will be described with reference to FIGS. 7 and 8, which are block diagrams showing their configurations, and FIG. 23, which is a flowchart showing a method of synchronously reproducing a plurality of videos.
- When skipping after combining (Yes in step 351), the result decoded by the video decoding unit B105 is stored in the frame buffer unit B151 (step 405). Then, in step 352, if the playback time information of the video decoding unit A104 and that of the video decoding unit B105 match (within the allowable output time difference, for example within 33 ms), the decoded images are superimposed, after which image skip output processing is performed in step 353.
- When combining after skipping (No in step 351), the image skip processing is performed in step 354, and in step 355 the decoded image of the video decoding unit B105 whose playback time information matches that of the video decoding unit A104 is superimposed. Then, in step 308, the image is output in synchronization with the audio output.
- In this way, based on the synchronization information at the time of decoding the other video, either before or after the video output processing is selected, and the images are synthesized and reproduced. The way the added image is output can thus be changed: the other image may be added to the decoded image and the combined image then skipped in the image skip output processing, or the image skip processing may be performed first and the other decoded image added afterwards.
- In the time-information-matched addition before the video synthesis processing, the video skip processing is performed, and only the decoded image whose playback time information matches the video playback time information VPTS of the video to be displayed is selected and added.
- In the addition after the video synthesis processing, decoded images are added and displayed without regard to the video playback time information VPTS of the displayed video once the video skip processing has been performed.
- This skip processing corresponds to high-speed I playback, in which only I pictures are played back and P and B pictures are skipped, and to IP playback, in which only B pictures are skipped. In either case B pictures are not reproduced, whether the input unit 1 discards the B-picture data or it is discarded after decoding, so reproduction time information for B pictures is not required. Consequently, during high-speed reproduction with skipping, the reproduction time information of the finally output image is what is valid.
- When the contents of the frame buffer units are added by the image synthesizing unit 106 and the playback time information matches, the addition result is output as video; if it does not match, the processing waits, without performing the addition, until the frame output time synchronized with the next data.
- The time difference between successive PTS values is about 33 ms (one frame period), so agreement within this difference is treated as synchronized.
- The synthesis of the main audio data and sub-audio such as commentary in the PCM buffer unit may be synchronized on the same principle. In this case, if the difference is within about 10 ms (an accuracy of several ms to several tens of ms, depending on the audio compression method), it is determined that synchronization is established and a synthesized sound can be generated.
- Even when synchronization is temporarily lost, the currently output (or about-to-be-output) PTS value is referred to, converted into reproduction time information, and used to set the time at which the video data and audio data are synchronized; the data can then be synthesized in the same manner as in normal synchronous reproduction.
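- The tolerance checks above as a sketch (33 ms for video, 10 ms for audio, per the text; PTS values are expressed in milliseconds for simplicity):

```python
VIDEO_TOL_MS = 33   # about one video frame period
AUDIO_TOL_MS = 10   # audio frame tolerance (several ms to tens of ms)

def in_sync(pts_a_ms: float, pts_b_ms: float, tol_ms: float) -> bool:
    """PTS match test used before compositing: values within the
    tolerance are treated as the same presentation instant."""
    return abs(pts_a_ms - pts_b_ms) <= tol_ms

def try_composite(video_pts_a, video_pts_b, audio_pts_a, audio_pts_b):
    """Composite video frames / mix audio only when their time stamps
    agree; otherwise wait for the next frame output time."""
    do_video = in_sync(video_pts_a, video_pts_b, VIDEO_TOL_MS)
    do_audio = in_sync(audio_pts_a, audio_pts_b, AUDIO_TOL_MS)
    return do_video, do_audio
```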
- For seamless playback, the following methods maintain continuity across a connection point as far as possible.
- Seamless editing is performed mainly with respect to the video. The audio for the video before the connection point is reproduced by one audio decoding unit A4 up to the last reproduction time before the seamless connection point.
- Meanwhile, the audio corresponding to the reproduction time of the first image after the seamless connection point is decoded, so that sound can be output at the synchronization time.
- Playback can then be performed by switching between the two decoded audios; if necessary, fade processing is applied to the audio.
- When the image processing unit 160 is provided and images are combined by the image synthesizing unit 106 after video decoding, output size conversion such as enlargement or reduction of the combined screen can be set after decoding. When child screens are combined, it is possible to select whether to combine them after reducing their size, or to cut out a specific part and enlarge it; selection of partial enlargement or reduction of the original screen is also possible.
- Various format conversions can be implemented for the output TV monitor: conversion from high resolution to low resolution or vice versa (for example, from a standard resolution of 480i to a high resolution of 1080i), output format conversion, frequency format conversion between the NTSC system and the PAL system, and IP conversion from interlaced to progressive image quality. These need not be performed in the order given in this example.
- A plurality of format conversions (resolution format, output format, and so on) may be performed simultaneously. When two images are combined, such as when one is an NTSC image and the other a PAL image, or one is a standard-quality image and the other a high-quality image, combining is easier if the two formats are matched in advance.
- When these superimposed images are displayed, it may be desirable to paste a GUI screen or the like that assists the user's operation onto them, combining the images at a screen size that suits the GUI menu arrangement. For example, if the main image is displayed as the background screen, the commentary image is superimposed as a sub-image, and a translucent menu screen for various settings is superimposed on top, the image effect corresponding to the setting menu is easy to confirm.
- Certain subtitles are called closed caption signals, and the specification stipulates that display and non-display are switched by the user's remote control operation. When applied to the embodiments of the present invention, it is therefore desirable to select the addition point for each output process and the display selection according to the user's instruction. Furthermore, even when subtitle characters are scrolled vertically or horizontally, or when various display effects such as wipes are involved, being able to select before or after the various output processes prevents annoyances such as important information being overlooked during fast-forward, or the user being unable to move to the next screen until all subtitles have been displayed and confirmed.
- Examples of closed captions and similar schemes include not only US closed captions but also European teletext.
- For data broadcasts, settings can be made such that the audio information extracted from the stream data is added before the audio output processing and the character information is added after the video output processing, according to the result obtained by the determination unit.
- Likewise, for an additional sound such as a buzzer, an after-recording sound in which a plurality of recorded sounds are added, or a microphone echo sound in which echo such as karaoke is added to an accompaniment sound, the same effects as described above are obtained if the addition can be selected to occur before or after the audio output processing.
- Similarly, the same effects are obtained by selecting whether subtitles, superimposed characters, and characters and graphics to be inserted at editing time are added before or after the video output processing. These functions can be realized by installing a dedicated audio processing element or digital signal processor (DSP), or by using a high-performance CPU.
- Although the input data has been described as data input from the outside or from an external recording medium, it may instead be data already present in the device.
- the input unit 1 separates input data into a video signal and an audio signal.
- the video signal and the audio signal may be file data separated in advance.
- A configuration is also possible in which compressed video data and its associated playback time information, and compressed audio data and its associated playback time information, are input, and the compressed video data and compressed audio data are synchronized and played back using the respective playback time information.
- An audio reproducing apparatus that performs the audio reproducing method of the present invention can, for example, play back the result of editing, on a personal computer, a signal shot with a video camera or the like.
- Examples of application of the data reproducing method and apparatus include a set-top box, a digital satellite broadcast receiver and its recording device, a DVD player or DVD recorder, VCD-related devices, a hard disk recorder, and a personal computer. By creating an AV playback program according to the audio playback method of the present invention, an external operation program can be loaded into a personal computer or the like to perform AV-synchronized operation while synthesizing audio or images.
- a part or all of the components shown in FIG. 2 may be realized by one integrated circuit (integrated chip). Further, some or all of the components shown in FIG. 7 may be realized by one integrated circuit (integrated chip). Further, some or all of the components shown in FIG. 8 may be realized by one integrated circuit (integrated chip). Further, a part or all of the components shown in FIG. 12 may be realized by one integrated circuit (integrated chip). Further, some or all of the components shown in FIG. 21 may be realized by one integrated circuit (integrated chip).
- The audio reproducing method and the audio reproducing apparatus according to the present invention synchronize a plurality of coded digital audio signals on the basis of their synchronization signals and perform operations such as changing the sampling rate even when the encoding systems differ.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Television Signal Processing For Recording (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006512092A JP3892478B2 (ja) | 2004-04-06 | 2005-04-05 | 音声再生装置 |
US11/547,305 US7877156B2 (en) | 2004-04-06 | 2005-04-05 | Audio reproducing apparatus, audio reproducing method, and program |
EP05728821A EP1734527A4 (en) | 2004-04-06 | 2005-04-05 | AUDIOWIEDERGABEANORDNUNG, AUDIOWIEDRGABEMETHODE AND PROGRAM |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-112224 | 2004-04-06 | ||
JP2004112224 | 2004-04-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005098854A1 true WO2005098854A1 (ja) | 2005-10-20 |
Family
ID=35125327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/006685 WO2005098854A1 (ja) | 2004-04-06 | 2005-04-05 | 音声再生装置、音声再生方法及びプログラム |
Country Status (6)
Country | Link |
---|---|
US (1) | US7877156B2 (ja) |
EP (1) | EP1734527A4 (ja) |
JP (1) | JP3892478B2 (ja) |
KR (1) | KR100762608B1 (ja) |
CN (1) | CN100505064C (ja) |
WO (1) | WO2005098854A1 (ja) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9609278B2 (en) | 2000-04-07 | 2017-03-28 | Koplar Interactive Systems International, Llc | Method and system for auxiliary data detection and delivery |
JP4912296B2 (ja) * | 2005-04-28 | 2012-04-11 | パナソニック株式会社 | リップシンク補正システム、リップシンク補正装置及びリップシンク補正方法 |
US8000423B2 (en) * | 2005-10-07 | 2011-08-16 | Zoran Corporation | Adaptive sample rate converter |
US20070299983A1 (en) * | 2006-06-21 | 2007-12-27 | Brothers Thomas J | Apparatus for synchronizing multicast audio and video |
US20070297454A1 (en) * | 2006-06-21 | 2007-12-27 | Brothers Thomas J | Systems and methods for multicasting audio |
US20080133249A1 (en) * | 2006-11-30 | 2008-06-05 | Hashiguchi Kohei | Audio data transmitting device and audio data receiving device |
JP4991272B2 (ja) * | 2006-12-19 | 2012-08-01 | 株式会社東芝 | カメラ装置およびカメラ装置における再生制御方法 |
KR100809717B1 (ko) | 2007-01-12 | 2008-03-06 | 삼성전자주식회사 | 더블 패터닝된 패턴의 전기적 특성을 콘트롤할 수 있는반도체 소자 및 그의 패턴 콘트롤방법 |
CN101889441A (zh) * | 2007-11-16 | 2010-11-17 | 松下电器产业株式会社 | 便携式终端和用于视频输出的方法 |
US8798133B2 (en) * | 2007-11-29 | 2014-08-05 | Koplar Interactive Systems International L.L.C. | Dual channel encoding and detection |
KR101403682B1 (ko) * | 2007-12-13 | 2014-06-05 | 삼성전자주식회사 | 오디오 데이터를 전송하는 영상기기 및 그의 오디오 데이터전송방법 |
JP5283914B2 (ja) * | 2008-01-29 | 2013-09-04 | キヤノン株式会社 | 表示制御装置及び表示制御方法 |
EP2141689A1 (en) | 2008-07-04 | 2010-01-06 | Koninklijke KPN N.V. | Generating a stream comprising interactive content |
JP2009277277A (ja) * | 2008-05-13 | 2009-11-26 | Funai Electric Co Ltd | 音声処理装置 |
US8515239B2 (en) * | 2008-12-03 | 2013-08-20 | D-Box Technologies Inc. | Method and device for encoding vibro-kinetic data onto an LPCM audio stream over an HDMI link |
JP2010197957A (ja) * | 2009-02-27 | 2010-09-09 | Seiko Epson Corp | 画像音声供給装置、画像音声出力装置、画像供給方法、画像音声出力方法、及びプログラム |
US8984626B2 (en) | 2009-09-14 | 2015-03-17 | Tivo Inc. | Multifunction multimedia device |
US8605564B2 (en) * | 2011-04-28 | 2013-12-10 | Mediatek Inc. | Audio mixing method and audio mixing apparatus capable of processing and/or mixing audio inputs individually |
JP5426628B2 (ja) * | 2011-09-16 | 2014-02-26 | 株式会社東芝 | 映像再生装置、映像再生装置の制御方法及びプログラム |
DE112013005221T5 (de) * | 2012-10-30 | 2015-08-20 | Mitsubishi Electric Corporation | Audio/Video-Reproduktionssystem, Video-Anzeigevorrichtung und Audio-Ausgabevorrichtung |
US9154834B2 (en) * | 2012-11-06 | 2015-10-06 | Broadcom Corporation | Fast switching of synchronized media using time-stamp management |
US9263014B2 (en) * | 2013-03-14 | 2016-02-16 | Andrew John Brandt | Method and apparatus for audio effects chain sequencing |
US9350474B2 (en) * | 2013-04-15 | 2016-05-24 | William Mareci | Digital audio routing system |
US20160170970A1 (en) * | 2014-12-12 | 2016-06-16 | Microsoft Technology Licensing, Llc | Translation Control |
WO2016091332A1 (en) * | 2014-12-12 | 2016-06-16 | Huawei Technologies Co., Ltd. | A signal processing apparatus for enhancing a voice component within a multi-channel audio signal |
JP6582589B2 (ja) * | 2015-06-16 | 2019-10-02 | ヤマハ株式会社 | オーディオ機器 |
KR102582494B1 (ko) * | 2016-12-09 | 2023-09-25 | 주식회사 케이티 | 오디오 컨텐츠를 분석하는 장치 및 방법 |
KR20180068069A (ko) * | 2016-12-13 | 2018-06-21 | 삼성전자주식회사 | 전자 장치 및 이의 제어 방법 |
CN106851385B (zh) * | 2017-02-20 | 2019-12-27 | 北京乐我无限科技有限责任公司 | 视频录制方法、装置和电子设备 |
CN107230474B (zh) * | 2017-04-18 | 2020-06-09 | 福建天泉教育科技有限公司 | 一种合成音频数据的方法及*** |
US11475872B2 (en) * | 2019-07-30 | 2022-10-18 | Lapis Semiconductor Co., Ltd. | Semiconductor device |
CN112083379B (zh) * | 2020-09-09 | 2023-10-20 | 极米科技股份有限公司 | 基于声源定位的音频播放方法、装置、投影设备及介质 |
CN113138744B (zh) * | 2021-04-30 | 2023-03-31 | 海信视像科技股份有限公司 | 一种显示设备和声道切换方法 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE2456642A1 (de) | 1974-11-29 | 1976-08-12 | Deutscher Filmdienst Waldfried | Synchronisierverfahren fuer die tonfilmtechnik |
JPH0662500A (ja) | 1992-08-05 | 1994-03-04 | Mitsubishi Electric Corp | ミューズデコーダ |
JP2766466B2 (ja) * | 1995-08-02 | 1998-06-18 | 株式会社東芝 | オーディオ方式、その再生方法、並びにその記録媒体及びその記録媒体への記録方法 |
JPH09288866A (ja) | 1996-04-22 | 1997-11-04 | Sony Corp | 記録再生装置および方法 |
US6044307A (en) * | 1996-09-02 | 2000-03-28 | Yamaha Corporation | Method of entering audio signal, method of transmitting audio signal, audio signal transmitting apparatus, and audio signal receiving and reproducing apparatus |
CA2257572A1 (en) | 1997-04-12 | 1998-10-22 | Yoshiyuki Nakamura | Editing system and editing method |
JP2000228054A (ja) | 1999-02-08 | 2000-08-15 | Sharp Corp | 情報再生装置 |
US6778756B1 (en) * | 1999-06-22 | 2004-08-17 | Matsushita Electric Industrial Co., Ltd. | Countdown audio generation apparatus and countdown audio generation system |
JP4555072B2 (ja) * | 2002-05-06 | 2010-09-29 | シンクロネイション インコーポレイテッド | ローカライズされたオーディオ・ネットワークおよび関連するディジタル・アクセサリ |
KR100910975B1 (ko) | 2002-05-14 | 2009-08-05 | 엘지전자 주식회사 | 인터넷을 이용한 대화형 광디스크 재생방법 |
US7706544B2 (en) * | 2002-11-21 | 2010-04-27 | Fraunhofer-Geselleschaft Zur Forderung Der Angewandten Forschung E.V. | Audio reproduction system and method for reproducing an audio signal |
US20040199276A1 (en) * | 2003-04-03 | 2004-10-07 | Wai-Leong Poon | Method and apparatus for audio synchronization |
DE10322722B4 (de) * | 2003-05-20 | 2005-11-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vorrichtung und Verfahren zum Synchronisieren eines Audiossignals mit einem Film |
JP4305065B2 (ja) * | 2003-06-12 | 2009-07-29 | ソニー株式会社 | Av同期処理装置および方法ならびにav記録装置 |
US20050058307A1 (en) * | 2003-07-12 | 2005-03-17 | Samsung Electronics Co., Ltd. | Method and apparatus for constructing audio stream for mixing, and information storage medium |
2005
- 2005-04-05 WO PCT/JP2005/006685 patent/WO2005098854A1/ja active Application Filing
- 2005-04-05 EP EP05728821A patent/EP1734527A4/en not_active Ceased
- 2005-04-05 CN CNB2005800119734A patent/CN100505064C/zh active Active
- 2005-04-05 US US11/547,305 patent/US7877156B2/en active Active
- 2005-04-05 JP JP2006512092A patent/JP3892478B2/ja active Active
- 2005-04-05 KR KR1020067019351A patent/KR100762608B1/ko active IP Right Grant
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05101608A (ja) * | 1991-10-09 | 1993-04-23 | Fujitsu Ltd | 音声編集装置 |
JPH05266634A (ja) * | 1992-03-19 | 1993-10-15 | Fujitsu Ltd | オーディオデータの重ね合せ方法及び重ね合せ装置 |
JPH07296519A (ja) * | 1994-04-28 | 1995-11-10 | Sony Corp | ディジタルオーディオ信号伝送装置 |
JPH10145735A (ja) * | 1996-11-05 | 1998-05-29 | Toshiba Corp | 復号装置および画像/音声再生方法 |
JPH10243349A (ja) * | 1997-02-21 | 1998-09-11 | Matsushita Electric Ind Co Ltd | データ作成方法,及びデータ再生装置 |
JPH10340542A (ja) * | 1997-06-06 | 1998-12-22 | Toshiba Corp | マルチストリームのデータ記録媒体とそのデータ伝送再生装置及び方法 |
JPH1153841A (ja) * | 1997-08-07 | 1999-02-26 | Pioneer Electron Corp | 音声信号処理装置および音声信号処理方法 |
JPH11120705A (ja) * | 1997-10-17 | 1999-04-30 | Toshiba Corp | ディスク再生方法及び装置 |
JPH11328863A (ja) * | 1998-05-19 | 1999-11-30 | Toshiba Corp | デジタル音声データ処理装置 |
JP2001036863A (ja) * | 1999-07-22 | 2001-02-09 | Nec Ic Microcomput Syst Ltd | 画像処理装置 |
JP2003257125A (ja) * | 2002-03-05 | 2003-09-12 | Seiko Epson Corp | 音再生方法および音再生装置 |
Non-Patent Citations (1)
Title |
---|
See also references of EP1734527A4 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006246296A (ja) * | 2005-03-07 | 2006-09-14 | Nec Electronics Corp | データ処理装置及びデータ処理方法 |
JP4541191B2 (ja) * | 2005-03-07 | 2010-09-08 | ルネサスエレクトロニクス株式会社 | データ処理装置及びデータ処理方法 |
JP2007127861A (ja) * | 2005-11-04 | 2007-05-24 | Kddi Corp | 付属情報埋め込み装置および再生装置 |
JP2007150853A (ja) * | 2005-11-29 | 2007-06-14 | Toshiba Corp | 供給装置と処理装置及び指示方法 |
US8478910B2 (en) | 2005-11-29 | 2013-07-02 | Kabushiki Kaisha Toshiba | Supply device and processing device as well as instruction method |
JP2008090936A (ja) * | 2006-10-02 | 2008-04-17 | Sony Corp | 信号処理装置、信号処理方法、およびプログラム |
US8719040B2 (en) | 2006-10-02 | 2014-05-06 | Sony Corporation | Signal processing apparatus, signal processing method, and computer program |
US7463170B2 (en) | 2006-11-30 | 2008-12-09 | Broadcom Corporation | Method and system for processing multi-rate audio from a plurality of audio processing sources |
US7852239B2 (en) | 2006-11-30 | 2010-12-14 | Broadcom Corporation | Method and system for processing multi-rate audio from a plurality of audio processing sources |
EP1927982A3 (en) * | 2006-11-30 | 2008-06-11 | Broadcom Corporation | Method and system for processing multi-rate audio from a plurality of audio processing sources |
KR100915115B1 (ko) * | 2006-11-30 | 2009-09-03 | 브로드콤 코포레이션 | 복수의 오디오 프로세싱 소스들로부터 멀티 레이트오디오를 처리하는 방법 및 시스템 |
KR100915116B1 (ko) * | 2006-11-30 | 2009-09-03 | 브로드콤 코포레이션 | 다중 경로의 멀티 레이트 오디오 프로세싱 시의 믹싱복잡도를 줄일 수 있도록 레이트 변환 필터를 이용하는방법 및 시스템 |
US8169344B2 (en) | 2006-11-30 | 2012-05-01 | Broadcom Corporation | Method and system for audio CODEC voice ADC processing |
JP2008159238A (ja) * | 2006-11-30 | 2008-07-10 | Matsushita Electric Ind Co Ltd | 音声データ送信装置および音声データ受信装置 |
EP1928110A2 (en) * | 2006-11-30 | 2008-06-04 | Broadcom Corporation | Method and system for utilizing rate conversion filters to reduce mixing complexity during multipath multi-rate audio processing |
US7936288B2 (en) | 2006-11-30 | 2011-05-03 | Broadcom Corporation | Method and system for audio CODEC voice ADC processing |
EP1928110A3 (en) * | 2006-11-30 | 2008-12-10 | Broadcom Corporation | Method and system for utilizing rate conversion filters to reduce mixing complexity during multipath multi-rate audio processing |
JP2009063752A (ja) * | 2007-09-05 | 2009-03-26 | Toshiba Corp | 音声再生装置及び音声再生方法 |
JP2009289372A (ja) * | 2008-05-30 | 2009-12-10 | Toshiba Corp | 音声データ処理装置および音声データ処理方法 |
JP2010288262A (ja) * | 2009-05-14 | 2010-12-24 | Yamaha Corp | 信号処理装置 |
JP2011044213A (ja) * | 2009-08-24 | 2011-03-03 | Sony Corp | 情報処理装置および方法、並びにプログラム |
JP2011070076A (ja) * | 2009-09-28 | 2011-04-07 | Nec Personal Products Co Ltd | 情報処理装置 |
JP2011077678A (ja) * | 2009-09-29 | 2011-04-14 | Toshiba Corp | データストリーム処理装置、映像装置、およびデータストリーム処理方法 |
JP2014140135A (ja) * | 2013-01-21 | 2014-07-31 | Kddi Corp | 情報再生端末 |
JP2017521922A (ja) * | 2014-06-10 | 2017-08-03 | テンセント テクノロジー (シェンジェン) カンパニー リミテッド | ビデオリモートコメンタリー同期方法及びシステム並びにターミナルデバイス |
US9924205B2 (en) | 2014-06-10 | 2018-03-20 | Tencent Technology (Shenzhen) Company Limited | Video remote-commentary synchronization method and system, and terminal device |
JP2019165487A (ja) * | 2019-05-15 | 2019-09-26 | 東芝映像ソリューション株式会社 | 放送受信装置及び放送受信方法 |
JP2019165488A (ja) * | 2019-05-15 | 2019-09-26 | 東芝映像ソリューション株式会社 | 放送受信装置及び放送受信方法 |
WO2021065496A1 (ja) * | 2019-09-30 | 2021-04-08 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
Also Published As
Publication number | Publication date |
---|---|
JP3892478B2 (ja) | 2007-03-14 |
US7877156B2 (en) | 2011-01-25 |
CN1942962A (zh) | 2007-04-04 |
KR20070003958A (ko) | 2007-01-05 |
KR100762608B1 (ko) | 2007-10-01 |
EP1734527A1 (en) | 2006-12-20 |
EP1734527A4 (en) | 2007-06-13 |
US20080037151A1 (en) | 2008-02-14 |
JPWO2005098854A1 (ja) | 2007-08-16 |
CN100505064C (zh) | 2009-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3892478B2 (ja) | 音声再生装置 | |
JP5586950B2 (ja) | プリセットオーディオシーンを用いたオブジェクトベースの3次元オーディオサービスシステム及びその方法 | |
JP4536653B2 (ja) | データ処理装置および方法 | |
JP4602204B2 (ja) | 音声信号処理装置および音声信号処理方法 | |
US20060210245A1 (en) | Apparatus and method for simultaneously utilizing audio visual data | |
KR100802179B1 (ko) | 프리셋 오디오 장면을 이용한 객체기반 3차원 오디오서비스 시스템 및 그 방법 | |
TW200830874A (en) | Image information transmission system, image information transmitting apparatus, image information receiving apparatus, image information transmission method, image information transmitting method, and image information receiving method | |
JP4613674B2 (ja) | 音声再生装置 | |
WO2011142129A1 (ja) | デジタル放送受信装置及びデジタル放送受信方法 | |
JP2012019386A (ja) | 再生装置、再生方法、およびプログラム | |
JP2003111023A (ja) | データ記録装置、データ記録方法、プログラム、および媒体 | |
JP4013800B2 (ja) | データ作成方法及びデータ記録装置 | |
JP2008288935A (ja) | 音声処理装置 | |
JP4285099B2 (ja) | データ再生方法及びデータ再生装置 | |
JP2006148679A (ja) | データ処理装置 | |
JP2007235519A (ja) | 映像音声同期方法及び映像音声同期システム | |
JP4270084B2 (ja) | 記録再生装置 | |
KR100681647B1 (ko) | Pvr의 편집 관리 시스템 및 그 제어 방법 | |
JP2006148839A (ja) | 放送装置、受信装置、及びこれらを備えるデジタル放送システム | |
JP2008125015A (ja) | 映像音声記録再生装置 | |
WO2013098898A1 (ja) | 放送受信装置および音声信号再生方法 | |
JP2005033576A (ja) | コンテンツ記録再生装置 | |
JP2010098522A (ja) | デジタル放送受信機 | |
JP2002247476A (ja) | 多言語音声出力機能付き放送受信装置及び再生装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006512092 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020067019351 Country of ref document: KR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2005728821 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11547305 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580011973.4 Country of ref document: CN |
|
WWP | Wipo information: published in national office |
Ref document number: 2005728821 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1020067019351 Country of ref document: KR |
|
WWP | Wipo information: published in national office |
Ref document number: 11547305 Country of ref document: US |