EP2489206A1 - Processing of sound data encoded in a sub-band domain - Google Patents
Processing of sound data encoded in a sub-band domain
- Publication number
- EP2489206A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- ear
- channel
- lateral
- virtual
- channels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- the invention relates to the processing of sound data.
- In the context of processing sound data in a multichannel format (5.1 or more), the aim is to provide a 3D spatialization effect called "Virtual Surround".
- Such treatments involve filters that aim to reproduce a sound field at the entrances of a person's ear canals. Indeed, a listener is able to locate sounds in space with a certain precision, thanks to the perception of the sounds by both ears.
- the signals emitted by the sound sources undergo acoustic transformations as they propagate to the ears. These acoustic transformations are characteristic of the acoustic channel established between a sound source and a point of the auditory canal of the individual.
- Each ear has its own acoustic channel, and these acoustic channels depend on the position and orientation of the source relative to the listener, on the shape of the listener's head and ear, and also on the acoustic environment (e.g. reverberation due to a room effect).
- These acoustic channels can be modeled by filters commonly called "Head Related Impulse Responses" (HRIR) or "Head Related Transfer Functions" (HRTF), according to whether a representation is given in the time domain or in the frequency domain, respectively. Referring to FIG.
- the HRTF functions for the left ear and for the right ear are identical for sources which lie in the median plane (plane P which separates the left half from the right half of the body, as illustrated in FIG. 2).
- Acoustic cues exploited by the brain to locate sounds are often classified into two families of cues:
- transaural playback means listening, on two remote loudspeakers, to audio content initially in a multi-channel format.
- a matrixing of the channels, hereinafter referred to as "downmix".
- Downmix processing is a matrix processing that makes it possible to go from N channels to M channels with N > M. It will be considered in the following that a "Downmix" treatment (since it does not take spatialization effects into account) does not involve a filter based on HRTF functions.
- "Downmix" processing matrices used in sound reproduction devices (PC, DVD, TV, etc.) have constant coefficients that are independent of time and frequency.
- SG and SR are respectively left and right stereo output signals,
- EAVG and EAVD are respectively input signals which would have been intended to supply the left AVG and right AVD lateral loudspeakers (illustrated in FIG. 2),
- EARG and EARD are respectively input signals which would have been intended to supply the left rear ARG and right rear ARD loudspeakers, located behind the listener AU of FIG. 2,
- EC is an input signal which would have been intended to supply a central loudspeaker C located in front of the listener AU, and
- the treatment referred to below as "ITU Downmix" does not allow precise spatial perception of sound events.
- a "Downmix" type treatment, in general, does not allow spatial perception since it does not involve an HRTF filter.
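As a rough illustration of such a constant-coefficient matrix treatment, the following Python sketch mixes five input channels down to two. It assumes the usual ITU-style form in which the centre and rear signals are added to the lateral signals with gains g and gs. The value 0.707 for g is the one quoted in this text; using the same value for gs, and the function name `itu_style_downmix`, are assumptions made for the example.

```python
import numpy as np

G_CENTRE = 0.707  # value quoted in the text for the centre gain g
G_REAR = 0.707    # assumption: same value used for the rear gain gs


def itu_style_downmix(e_avg, e_avd, e_arg, e_ard, e_c, g=G_CENTRE, gs=G_REAR):
    """Return (left, right) stereo signals from five input channels."""
    # Constant 2x5 matrix: independent of time and frequency, no HRTF filter.
    matrix = np.array([
        [1.0, 0.0, gs, 0.0, g],   # left output
        [0.0, 1.0, 0.0, gs, g],   # right output
    ])
    channels = np.vstack([e_avg, e_avd, e_arg, e_ard, e_c])
    left, right = matrix @ channels
    return left, right
```

With this form, a sound present only in the left lateral input appears only in the left output, which is the limitation the text describes for headphone listening.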
- the feeling of immersion that multi-channel content can offer is then lost with headphone listening compared with listening on a system with more than two loudspeakers (for example in the 5.1 format as illustrated in FIG. 2).
- a sound supposed to be emitted by a source moving from the front to the back of the listener is not correctly reproduced on a simple stereo system (on headphones or a pair of loudspeakers).
- a sound present only in the channel SG (or SR) and processed by the ITU Downmix treatment is output only in the left (or right, respectively) earpiece when listening on headphones, whereas when listening on a system with more than two loudspeakers (for example in the 5.1 format), the right (or left, respectively) ear also perceives a diffracted signal.
- Binaural Downmix: a downmix to a binaural format
- the virtual loudspeakers are created by the so-called “binaural synthesis” technique.
- This technique consists in applying head-related transfer functions (HRTF) to monophonic audio signals, to obtain a binaural signal which, when listened to on headphones, gives the sensation that the sound sources come from a particular direction in space.
- the signal of the right ear is obtained by filtering the monophonic signal by the HRTF function of the right ear and the signal of the left ear is obtained by filtering this same monophonic signal by the HRTF function of the left ear.
- the resulting binaural signal is then available for headphone listening.
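The binaural synthesis described above can be sketched in the time domain as two convolutions of the same monophonic signal, one per ear. This is a minimal Python sketch; the HRIR values are toy placeholders chosen for the example, not measured responses, and `binaural_synthesis` is a hypothetical name.

```python
import numpy as np


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one monophonic signal by a pair of HRIRs (time domain)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right


# Toy HRIR pair (placeholders, not measured data) for a source on the left:
# the right ear receives a delayed, attenuated copy.
hrir_left = np.array([1.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.5])
left, right = binaural_synthesis(np.array([1.0, 2.0]), hrir_left, hrir_right)
```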
- with reference to FIG. 3A, a transfer function defined by a filter is associated with each acoustic path between an ear of the listener and a virtual loudspeaker (placed as recommended in the 5.1 multi-channel format in the example shown).
- HCg (respectively HCd) is the filter corresponding to an HRTF for the path between the central loudspeaker C and the left ear OG (respectively right OD) of the listener,
- HGg (respectively HDd) is the filter corresponding to a so-called "ipsi-lateral" HRTF (ear "illuminated" by the loudspeaker) for the direct path (solid line) between the left lateral loudspeaker AVG (respectively right lateral AVD) and the left ear OG (respectively right OD) of the listener,
- HGd (respectively HDg) is the filter corresponding to a so-called "contra-lateral" HRTF (ear in the "shadow" of the head) for the indirect path (in dashed lines) between the left lateral loudspeaker AVG (respectively right lateral AVD) and the right ear OD (respectively left OG) of the listener,
- HGSg (respectively HDSd) is the filter corresponding to an ipsi-lateral HRTF for the direct path (solid line) between the left rear loudspeaker ARG
- HDSg is the filter corresponding to a contra-lateral HRTF for the indirect path (in dashed lines) between the left rear loudspeaker ARG (respectively right rear ARD) and the right ear OD
- this standard provides an embodiment in which a multi-channel signal is transported in the form of a stereo downmix and spatialization parameters (CLD for "Channel Level Difference", ICC for "Inter-Channel Coherence", and CPC for "Channel Prediction Coefficient").
- CLD: Channel Level Difference
- ICC: Inter-Channel Coherence
- CPC: Channel Prediction Coefficient
- These parameters make it possible, in a first step, to expand the stereo downmix to three signals L', R' and C.
- they then allow the expansion of the signals L', R' and C to obtain 5.1 signals (denoted L, Ls, R, Rs, C and LFE for "Low Frequency Effect").
- the C and LFE signals are not separated.
- Signal C is used for the binaural Downmix processing. Thus, from two monophonic signals, three signals are first constructed (for the respective left L', right R' and centre C' channels).
- channels L and Ls respectively, of the left and right surround virtual speakers in 5.1 format, for sample 1 of the frequency band m in time-frequency transform
- - is the expression of the spectrum of the HRTF for a path between a right speaker in 5.1 format and the right ear
- - is the expression of the spectrum of the HRTF for a path between a left loudspeaker in 5.1 format and the left ear
- the present invention improves the situation.
- the applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.
- a first advantage that arises from such a construction is the significant reduction in the complexity of the treatments.
- central virtual speaker transfer functions no longer need to be taken into account.
- the coefficients of the matrix are no longer expressed as a function of the HRTF spectra but simply as a function of the spatialization gains of the M channels on the N virtual loudspeakers located in a hemisphere around a first ear.
- the N-channel representation comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker, as in "virtual surround"
- the coefficients of the matrix being expressed, in a time-frequency transform subband domain (for example of the "PQMF" type, for "Pseudo-Quadrature Mirror Filters"), by:
- contra-lateral relative to the right ear of the listener, deconvolved by an ipsi-lateral transfer function, relating to the left ear, for a virtual left speaker, direct or respectively ambient,
- contra-lateral relative to the left ear of the listener, deconvolved by an ipsi-lateral transfer function, relative to the right ear, for a virtual right speaker, direct or respectively ambient,
- ipsi-lateral corresponding to selected interaural delays, and - are selected weights.
- the coefficient g may advantageously have a value of 0.707 (corresponding to the square root of 1/2, when the energy of the central loudspeaker signal is split in equal halves between the lateral loudspeakers), as recommended in the "ITU Downmix" treatment.
- the matrix filtering is expressed according to a product of matrices of type:
- the filtering of the contra-lateral component, defined by the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function, makes it possible to reduce the timbre distortion introduced by the binauralization processing.
- such a filtering amounts to a low-pass filtering delayed by a value corresponding to the interaural delay.
- the brain perceives, in one ear, the original signal (without treatment) and, in the other ear, the delayed and low-pass filtered signal. Above the cutoff frequency, the difference in perceived level compared with diotic listening to the original signal attenuated by 6 dB is minimal. Below the cutoff frequency, on the other hand, the signal is perceived as twice as loud. For signals containing frequencies below the cutoff frequency, the difference in timbre will therefore consist of an amplification of the low frequencies.
- Such timbre distortion may advantageously be eliminated simply by high-pass filtering, which may be the same for all HRTF transfer functions (loudspeaker directions).
- the above-mentioned correction of timbre distortion can advantageously be applied to the binaural stereo signal resulting from the downmix.
- an automatic gain control can advantageously be provided at the end of the treatment, so that the levels delivered by the Downmix processing and by the binauralization processing within the meaning of the invention are similar.
- there is provided at the end of the processing chain a high-pass filter and an automatic gain control.
- a selected gain is also applied to the two left-channel and right-channel signals in the two-channel representation (binaural or transaural®), before restitution, the selected gain being controlled to limit the energy of the left-channel and right-channel signals to, at most, the energy of the signals of the virtual loudspeakers.
- preferentially, the automatic gain control is applied to the two left-channel and right-channel signals, downstream of the application of the frequency-dependent weighting factor.
- the coefficients of the aforementioned matrix, which are involved in the matrix filtering, vary as a function of frequency, according to a weighting by a chosen factor (Gain) that is less than one if the frequency is below a chosen threshold, and equal to one otherwise.
- the factor is about 0.5 and the chosen frequency threshold is about 500 Hz, to eliminate a coloration distortion.
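The frequency-dependent weighting described above can be sketched as a per-subband gain table. In this minimal Python sketch, the factor 0.5 and the 500 Hz threshold come from the text, while the subband centre frequencies and the function name `low_frequency_gain` are hypothetical values chosen for the example.

```python
import numpy as np


def low_frequency_gain(freqs_hz, threshold_hz=500.0, gain=0.5):
    """Weighting factor per frequency: `gain` below the threshold, 1 otherwise."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return np.where(freqs_hz < threshold_hz, gain, 1.0)


# Hypothetical subband centre frequencies in Hz.
centres = [125.0, 375.0, 625.0, 875.0]
weights = low_frequency_gain(centres)
```

The resulting weights would then multiply either the matrix coefficients of each subband or the output signals, depending on the variant chosen.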
- Another advantage provided by the invention is the transport of the encoded signal and its processing with a decoder to improve its sound quality, for example a decoder of the MPEG Surround® type.
- no transfer function is applied for the direct paths (ipsi-lateral contributions) and additional processing is provided on the indirect paths (spectrum of the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function)
- the untreated part of the stereo submix ipsilateral contributions
- the above can be generalized to any type of downmix processing.
- downmix processing to two channels usually involves applying weighting to the channels (virtual speakers), then summing the N channels to two output signals.
- Applying binaural spatialization processing to Downmix processing involves applying to the N weighted channels the HRTF filters corresponding to the positions of the N virtual speakers. Since these filters are equal to 1 for the ipsi-lateral contributions, we find the Downmix treatment by applying the sum of the ipsi-lateral contributions.
- the signals obtained by a binauralization processing within the meaning of the invention appear as the sum of Downmix type signals and of a stereo signal comprising the localization cues necessary for the brain to perceive the spatialization of the sounds.
- This second signal is hereinafter referred to as "Downmix Binaural Additionnel", so that the treatment in the sense of the invention here called “Downmix Binaural” is such that:
- α can be a coefficient between 0 and 1.
- a listener can choose the level of the coefficient α between 0 and 1, either continuously or by switching between 0 and 1 (in "ON-OFF" mode). Thus, a weighting α of the second treatment "Downmix Binaural Additional" can be chosen in the overall processing using matrix filtering within the meaning of the invention.
- This embodiment has the advantage of requiring only a low bandwidth for the transmission of the results of the Downmix and DBA processing from an encoder to a decoder, as shown in FIG. 7 described below, additional bit rate being required only if the result of the DBA treatment is significant compared with the result of the Downmix.
- α ∈ {0; 0.25; 0.5; 0.75; 1}.
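The weighted combination of the Downmix signal and the additional signal (DBA) can be sketched as follows in Python. The sketch assumes that the listed values 0, 0.25, 0.5, 0.75 and 1 are the allowed levels of the weighting coefficient and that a requested value is snapped to the nearest one; that snapping rule and the function names are assumptions made for the example.

```python
import numpy as np

# Assumed set of allowed weighting levels (read from the values listed above).
ALPHA_LEVELS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])


def quantize_alpha(alpha):
    """Snap a requested weight to the nearest allowed level."""
    return float(ALPHA_LEVELS[np.argmin(np.abs(ALPHA_LEVELS - alpha))])


def binaural_downmix(downmix, dba, alpha):
    """Combine a Downmix signal with the additional signal weighted by alpha."""
    return np.asarray(downmix) + quantize_alpha(alpha) * np.asarray(dba)
```

With a weight of 0 the output is the plain Downmix, and with a weight of 1 the additional signal is added in full, which corresponds to the "ON-OFF" mode mentioned in the text.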
- This additional signal requires only a small bit rate to transport it. Indeed, it takes the form of a residual, low-pass filtered signal and is thus a priori much less energetic than the Downmix signal. In addition, it has redundancies with the Downmix signal. This property can be exploited advantageously in conjunction with codecs of the Dolby Surround, Dolby Pro Logic or MPEG Surround type.
- the "Downmix Binaural Additional" signal can then be compressed and transported in addition to the Downmix signal and/or in a scalable manner, at a low bit rate.
- the addition of the two stereo signals allows the listener to take full advantage of the binaural signal with a quality very close to a 5.1 format.
- matrix filtering within the meaning of the invention consists in applying, in an advantageous embodiment:
- a second processing leading, when executed in conjunction with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels, to obtain a two-channel, binaural or transaural representation.
- the application of the second processing is decided optionally (for example as a function of the bit rate, the spatialized rendering capabilities of a terminal, or others).
- the first aforementioned treatment can be applied in an encoder communicating with a decoder, while the second treatment is advantageously applied at the decoder.
- the management of the treatment within the meaning of the invention may advantageously be carried out by a computer program comprising instructions for implementing the method according to the invention when this program is executed by a processor, for example in a decoder in particular.
- the invention also aims at such a program.
- the present invention also relates to a module equipped with a processor and a memory and capable of executing this computer program.
- a module within the meaning of the invention for the processing of sound data encoded in a subband domain, for binaural or transaural® two-channel rendering, then comprises means for applying matrix filtering to switch from an N-channel sound representation, with N > 0, to a two-channel representation.
- the N-channel sound representation consists of considering N virtual loudspeakers surrounding a listener's head, and, for each virtual loudspeaker of at least part of the loudspeakers:
- the applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.
- Such a module may advantageously be a decoder of the MPEG Surround® type and furthermore include decoding means of the MPEG Surround® type, or may alternatively be implemented in such a decoder.
- FIG. 1 shows schematically a restitution on two speakers around the head of a listener
- FIG. 2 shows schematically a reproduction on five loudspeakers in 5.1 multi-channel format
- FIG. 3A schematically represents the ipsi-lateral (solid lines) and contra-lateral (dashed lines) paths in multi-channel 5.1 format;
- FIG. 3B shows a prior art processing scheme for switching from a multi-channel 5.1 format illustrated in Fig. 3A to a binaural or transaural format
- FIG. 4A schematically represents the ipsilateral (solid lines) and contra-lateral (dashed lines) paths in multi-channel 5.1 format, with the ipsilateral and counter-lateral paths of the central loudspeaker;
- FIG. 4B represents a processing diagram for the transition from a multi-channel 5.1 format illustrated in FIG. 4A to a binaural or transaural format, with only four filters in an embodiment within the meaning of the invention;
- FIG. 5 illustrates a treatment equivalent to the application of one of the filters of FIG. 4B
- FIG. 6 illustrates an additional processing of high-pass filtering and automatic gain control to be applied to the outputs SG and SD to avoid a coloration distortion and a difference in timbre between a "downmix" treatment and a treatment according to the invention
- FIG. 7 illustrates the situation of a treatment in the sense of the invention, made with the encoder in an exemplary embodiment of the invention, particularly in the case of an additional DBA treatment to be combined with the Downmix treatment.
- Reference is first made to FIG. 4A to describe an example of implementation of the processing to switch from a multi-channel representation (format 5.1 in the example described) to a binaural or transaural stereo two-channel representation.
- five speakers configured in 5.1 format are illustrated:
- the channels associated with loudspeaker positions (for example the AVG and ARG loudspeakers of FIG. 4A)
- first hemisphere with respect to the listener: that of the left ear OG
- second hemisphere with respect to the listener: that of the right ear OD
- first and second hemispheres are separated by the median plane of the listener.
- the additional treatment preferably comprises the application of filterings (C/I)AVG, (C/I)AVD, (C/I)ARG, (C/I)ARD (FIG. 4B) defined, in the coded (or transformed) domain, by the spectrum of a contra-lateral acoustic transfer function deconvolved by an ipsi-lateral transfer function.
- the ipsi-lateral transfer function is associated with a direct acoustic path IAVG, IAVD, IARG, IARD (FIG.
- the spatialization of the virtual loudspeaker is provided by a pair of transfer functions, HRTF (expressed in the frequency domain) or HRIR (expressed in the time domain). These transfer functions reflect the ipsi-lateral path (direct path between the loudspeaker and the closest ear, in solid lines in FIG. 4A) and the contra-lateral path (path between the loudspeaker and the ear masked by the listener's head, in dashed lines in FIG. 4A).
- the filter associated with the ipsi-lateral path is advantageously omitted and, for the contra-lateral path, a filter corresponding to the contra-lateral transfer function deconvolved by the ipsi-lateral transfer function is used. Thus, for each virtual loudspeaker (except the central loudspeaker C), only one filter is used.
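Since a deconvolution in the time domain corresponds to a division of spectra, the single filter per virtual loudspeaker can be sketched as a bin-by-bin (or subband-by-subband) ratio. This Python sketch is an illustration under that assumption; the small regularization term `eps` and the function name `contra_over_ipsi` are not taken from the text.

```python
import numpy as np


def contra_over_ipsi(hrtf_contra, hrtf_ipsi, eps=1e-12):
    """Spectrum of the contra-lateral HRTF deconvolved by the ipsi-lateral one.

    Deconvolution in the time domain is a division of spectra, done here
    bin by bin (or subband by subband).
    """
    hrtf_contra = np.asarray(hrtf_contra, dtype=complex)
    hrtf_ipsi = np.asarray(hrtf_ipsi, dtype=complex)
    # eps guards against division by a vanishing ipsi-lateral bin (assumption).
    return hrtf_contra * np.conj(hrtf_ipsi) / (np.abs(hrtf_ipsi) ** 2 + eps)
```

Applying this ratio on the contra-lateral path while leaving the ipsi-lateral path untouched preserves the relative difference between the two ears, which is the information carried by the original filter pair.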
- the filter referenced (C/I)ARG is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the left rear loudspeaker ARG and the right ear OD, deconvolved by the ipsi-lateral transfer function of the path between the left rear loudspeaker ARG and the left ear OG of the individual,
- the filter referenced (C/I)ARD is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the right rear loudspeaker
- the filter referenced (C/I)AVG is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the left lateral loudspeaker AVG and the right ear OD, deconvolved by the ipsi-lateral transfer function of the path between the left lateral loudspeaker AVG and the left ear OG of the individual, and
- the filter referenced (C/I)AVD is defined, in the transformed domain, by the spectrum of the contra-lateral transfer function of the path between the right lateral loudspeaker AVD and the left ear OG, deconvolved by the ipsi-lateral transfer function of the path between the right lateral loudspeaker AVD and the right ear OD of the individual.
- the signal which, in 5.1 encoding, is intended to supply the central loudspeaker C (in the median plane of symmetry of the listener's head) is distributed in two fractions (preferably equal, 50% and 50%) on two channels which are added to the two respective channels of the left and right lateral loudspeakers.
- the associated signal is mixed with the signals associated with the ARG left rear speaker and ARD right rear speaker.
- central loudspeakers (front loudspeaker for reproducing mid-range frequencies, front loudspeaker for reproducing low frequencies, or other)
- since the channel associated with a central loudspeaker position C, in the median plane, is divided into a first and a second signal fraction, respectively added to the AVG loudspeaker channel in the first hemisphere (around the left ear OG) and to the AVD loudspeaker channel in the second hemisphere (around the right ear OD), it is not necessary to provide for filtering by the transfer functions associated with the loudspeakers in the median plane, without any change in the perception of the spatialization of the sound stage in binaural or transaural restitution.
- the processing complexity is greatly reduced since the filters associated with the loudspeakers located in the median plane are removed. Another advantage is that the coloring effect of the associated signals is reduced.
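The overall treatment with only four filters can be sketched, for one subband sample, as follows. This Python sketch assumes that the centre signal is split equally onto the two lateral channels before the contra-lateral filters are applied, and that the rear signals are weighted by a gain gs. The argument names and the value used for gs are assumptions made for the example, and the `ci_*` arguments stand for the complex subband values of the filters (C/I)AVG, (C/I)AVD, (C/I)ARG and (C/I)ARD.

```python
import numpy as np


def binauralize_subband(e_avg, e_avd, e_arg, e_ard, e_c,
                        ci_avg, ci_avd, ci_arg, ci_ard, g=0.707, gs=0.707):
    """One subband sample per channel in, one (left, right) pair out."""
    # Centre channel split equally onto the two lateral channels.
    front_left = e_avg + g * e_c
    front_right = e_avd + g * e_c
    rear_left = gs * e_arg
    rear_right = gs * e_ard
    # Ipsi-lateral paths are left unfiltered; contra-lateral paths get C/I.
    left = front_left + rear_left + ci_avd * front_right + ci_ard * rear_right
    right = front_right + rear_right + ci_avg * front_left + ci_arg * rear_left
    return left, right
```

Setting the four filter values to zero gives back a plain downmix, which illustrates why the result can be seen as a Downmix signal plus an additional spatialization signal.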
- the spectrum of the contralateral transfer function deconvolved by the ipsilateral transfer function can be defined, in the transformed domain, by: and being the gain and the phase of the
- each filter is equivalent to applying:
- an equalizer filter 1, preferably of the low-pass type,
- an interaural delay (or "ITD") 10, to take account of the path differences between a virtual source and each ear,
- possibly an attenuation 12 with respect to the unfiltered signal components (for example the AVG component on the SG channel of FIG. 4B).
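The three equivalent operations listed above (delay, attenuation, low-pass equalization) can be sketched in the time domain as follows. This Python sketch uses an integer sample delay and a one-pole low-pass filter, both chosen for simplicity; the text does not specify the filter structure, and the parameter names are assumptions made for the example.

```python
import numpy as np


def contra_lateral_path(signal, delay_samples, attenuation, smoothing):
    """Delay, attenuate and low-pass filter a signal (time-domain sketch)."""
    signal = np.asarray(signal, dtype=float)
    # Interaural delay: shift by an integer number of samples.
    delayed = np.concatenate([np.zeros(delay_samples), signal])
    # Attenuation relative to the unfiltered (ipsi-lateral) component.
    delayed *= attenuation
    # One-pole low-pass equalizer: y[n] = (1 - a) * x[n] + a * y[n - 1].
    out = np.zeros_like(delayed)
    previous = 0.0
    for n, x in enumerate(delayed):
        previous = (1.0 - smoothing) * x + smoothing * previous
        out[n] = previous
    return out
```

The ipsi-lateral channel would simply carry the input signal unchanged, so only this one path needs processing per virtual loudspeaker.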
- the applied ITD delay is "substantially" interaural, the term "substantially" referring in particular to the fact that the exact morphology of the listener may not be rigorously taken into account (e.g. if default HRTFs are used, in particular the so-called "Kemar head" HRTFs).
- the binaural synthesis of a virtual loudspeaker (AVG for example) consists simply in playing the input signal without modification on the relative ipsi-lateral channel (channel SG in FIG. 4B) and in applying, to the signal to be played on the contra-lateral channel (channel SD in FIG. 4B), the corresponding filter (C/I)AVG, which applies a delay, an attenuation and a low-pass filtering.
- the resulting signal is delayed, attenuated and filtered by eliminating the high frequencies, which results, from the point of view of auditory perception, in a masking of the signal received by the "contra-lateral" ear (OD, in the example where the virtual loudspeaker is the left lateral AVG), relative to the signal received by the "ipsi-lateral" ear (OG).
- the coloration that can be perceived is therefore directly that of the signal received by the ipsilateral ear.
- this signal undergoes no transformation and, therefore, the treatment in the sense of the invention should provide only a weak coloration.
- a processing of the output signals SG and SD of FIG. 4B can be provided consisting in applying a high-pass filter FPH followed by an automatic gain control AGC.
- the high-pass filter is equivalent to applying the "Gain" factor described above, with:
- Gain = 0.5 if the frequency f is less than 500 Hz, and Gain = 1 otherwise.
- this factor is applied globally at the output of the signals SG and SD, as an alternative to an individual application to each coefficient of the matrix explained below.
- the automatic gain control is calibrated on the overall intensity of the signals corresponding to the Downmix treatment, given by:
- the gains g and gs are applied globally, to the signal C for the gain g and to the signals ARG and ARD for the gain gs.
- the energy of the left-channel signal S'G and of the right-channel signal S'D is thus limited at the end of this treatment to, at most, the overall energy ID² of the signals from the virtual loudspeakers.
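The energy limitation performed by the automatic gain control can be sketched as follows for one block of real-valued output samples. This Python sketch assumes that the reference energy of the Downmix treatment is already known; the block-wise formulation and the function name `limit_to_downmix_energy` are assumptions made for the example.

```python
import numpy as np


def limit_to_downmix_energy(left, right, downmix_energy):
    """Scale (left, right) so their energy does not exceed downmix_energy."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    energy = np.sum(left ** 2) + np.sum(right ** 2)
    if energy <= downmix_energy or energy == 0.0:
        return left, right  # already within the limit
    gain = np.sqrt(downmix_energy / energy)
    return gain * left, gain * right
```

The same gain is applied to both channels so that the level difference between the ears, and therefore the spatialization, is left unchanged.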
- the recovered signals S'G and S'D can finally be routed to a sound reproduction device in binaural stereophonic mode.
- the overall intensity of the signals is usually calculated directly from the energy of the input signals.
- this data will be taken into account for the estimation of the intensity ID.
- implementing the invention suppresses the monaural localization cues.
- the further a source deviates from the median plane, the more the interaural cues predominate over the monaural cues.
- when the angle between the side speakers (or between the rear speakers) is greater than 60°, monaural cues have little influence on the perceived position of the virtual speakers.
- the difference perceived here is smaller than the difference the listener could perceive because the HRTFs used are not specific to him (for example, HRTF models derived from the so-called "KEMAR head" technique).
- the spatial perception of the signal is respected, without introducing coloration and while preserving the timbre of the sound sources.
- the solution within the meaning of the present invention roughly halves the number of filters required and also corrects the coloration effects.
- the choice of the position of the virtual loudspeakers can significantly influence the quality of the spatialization. It has proved preferable to place the lateral and rear virtual speakers at ±45° from the median plane, rather than at ±30° from the median plane as in the configuration recommended by the International Telecommunication Union (ITU). When the virtual speakers approach the median plane, the ipsilateral and contralateral HRTF functions tend to resemble each other, and the previous simplifications may no longer give a satisfactory spatialization.
- the position of a lateral loudspeaker is advantageously in an angular sector of 10° to 90°, and preferably 30° to 60°, from a plane of symmetry P and facing the face of the listener. More particularly, the position of a lateral loudspeaker will preferably be close to 45° from the plane of symmetry.
- a processing module 72 within the meaning of the invention operates directly downstream of an encoder 71 to deliver, as indicated previously, data processed according to a processing of the type:
- Downmix + ⁇ DBA (with DBA for "Downmix Binaural Additional").
- the coefficients of the matrix are such that:
- the global processing matrix H 1 1, k is still expressed as the sum of two matrices:, with
- the matrix consists of applying function-based filtering
- the present invention is not limited to the embodiment described before by way of example; it extends to other variants.
- the case described above is that of processing two initial stereo signals to be encoded and spatialized to binaural stereo by way of a 5.1 spatialization.
- the SG and SD channels of FIG. 4B may furthermore undergo dynamic low-pass filtering of the Dolby® or other type.
- the present invention also relates to a MOD module (FIG. 4B) for processing sound data, for the transition from a multi-channel format to a binaural or transaural format, in the transformed domain, the elements of which could be those illustrated in FIG. 4B.
- such a module then comprises processing means, such as a processor PROC and a working memory MEM, for the implementation of the invention. It can be implemented in any type of decoder, in particular that of a device for sound reproduction (PC, walkman, mobile phone, or other) and possibly film viewing. Alternatively, the module may be designed to operate separately from the reproduction, for example to prepare content in binaural or transaural format for subsequent decoding.
- the present invention also relates to a computer program, downloadable via a telecommunication network and/or stored in a memory of a processing module of the aforementioned type and/or stored on a memory medium intended to cooperate with a reader of such a processing module, and comprising instructions for the implementation of the invention when they are executed by a processor of said module.
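As an illustration of the post-processing of the signals SG and SD described above (high-pass gain followed by automatic gain control), a minimal Python sketch is given below. It is not the implementation of the invention: all function names are hypothetical, the gain applied at or above 500 Hz (`high_gain`, set here to 1.0) is an assumption because the text does not state it, and the gain control is modeled as a simple limiter that scales both channels so that their combined energy does not exceed a reference energy standing in for I_D².

```python
import math


def band_gain(freq_hz, cutoff_hz=500.0, low_gain=0.5, high_gain=1.0):
    """Gain for one sub-band: low_gain below the cutoff, high_gain otherwise.

    low_gain=0.5 below 500 Hz follows the text; high_gain=1.0 is an assumption.
    """
    return low_gain if freq_hz < cutoff_hz else high_gain


def apply_band_gain(bands, freqs_hz):
    """Apply the per-band gain globally to one output channel (SG or SD)."""
    return [x * band_gain(f) for x, f in zip(bands, freqs_hz)]


def energy(bands):
    """Sum of squared magnitudes; also works for complex sub-band samples."""
    return sum(abs(x) ** 2 for x in bands)


def limit_energy(left, right, ref_energy):
    """Assumed gain control: never amplify, only scale down so that the
    combined energy of both channels is at most ref_energy."""
    total = energy(left) + energy(right)
    if total == 0.0 or total <= ref_energy:
        return list(left), list(right)
    scale = math.sqrt(ref_energy / total)
    return [x * scale for x in left], [x * scale for x in right]


if __name__ == "__main__":
    freqs = [125.0, 250.0, 1000.0, 4000.0]   # hypothetical band centers
    sg = apply_band_gain([1.0, 1.0, 1.0, 1.0], freqs)
    sd = apply_band_gain([0.5, 0.5, 0.5, 0.5], freqs)
    sg_out, sd_out = limit_energy(sg, sd, ref_energy=2.0)
    print(sg_out, sd_out, energy(sg_out) + energy(sd_out))
```

Applying one scale factor to both channels leaves the level ratio between S'G and S'D unchanged, which is why a shared limiter was chosen for this sketch.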
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0957118 | 2009-10-12 | ||
PCT/FR2010/052119 WO2011045506A1 (fr) | 2009-10-12 | 2010-10-08 | Traitement de donnees sonores encodees dans un domaine de sous-bandes |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2489206A1 (de) | 2012-08-22 |
Family
ID=42145029
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10781956A Withdrawn EP2489206A1 (de) | 2009-10-12 | 2010-10-08 | Verarbeitung von in einer subbanddomäne codierten schalldaten |
Country Status (3)
Country | Link |
---|---|
US (1) | US8976972B2 (de) |
EP (1) | EP2489206A1 (de) |
WO (1) | WO2011045506A1 (de) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101685408B1 (ko) * | 2012-09-12 | 2016-12-20 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | 3차원 오디오를 위한 향상된 가이드 다운믹스 능력을 제공하기 위한 장치 및 방법 |
FR3012247A1 (fr) * | 2013-10-18 | 2015-04-24 | Orange | Spatialisation sonore avec effet de salle, optimisee en complexite |
WO2015058818A1 (en) | 2013-10-22 | 2015-04-30 | Huawei Technologies Co., Ltd. | Apparatus and method for compressing a set of n binaural room impulse responses |
CN104681034A (zh) | 2013-11-27 | 2015-06-03 | 杜比实验室特许公司 | 音频信号处理 |
DE102014214052A1 (de) * | 2014-07-18 | 2016-01-21 | Bayerische Motoren Werke Aktiengesellschaft | Virtuelle Verdeckungsmethoden |
EP2980789A1 (de) * | 2014-07-30 | 2016-02-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vorrichtung und Verfahren zur Verbesserung eines Audiosignals, Tonverbesserungssystem |
US9749757B2 (en) * | 2014-09-02 | 2017-08-29 | Oticon A/S | Binaural hearing system and method |
US9596544B1 (en) * | 2015-12-30 | 2017-03-14 | Gregory Douglas Brotherton | Head mounted phased focused speakers |
EP3453190A4 (de) | 2016-05-06 | 2020-01-15 | DTS, Inc. | Systeme zur immersiven audiowiedergabe |
US10979844B2 (en) | 2017-03-08 | 2021-04-13 | Dts, Inc. | Distributed audio virtualization systems |
WO2018182274A1 (ko) * | 2017-03-27 | 2018-10-04 | 가우디오디오랩 주식회사 | 오디오 신호 처리 방법 및 장치 |
CN108156561B (zh) * | 2017-12-26 | 2020-08-04 | 广州酷狗计算机科技有限公司 | 音频信号的处理方法、装置及终端 |
US11212631B2 (en) * | 2019-09-16 | 2021-12-28 | Gaudio Lab, Inc. | Method for generating binaural signals from stereo signals using upmixing binauralization, and apparatus therefor |
TWI740206B (zh) * | 2019-09-16 | 2021-09-21 | 宏碁股份有限公司 | 訊號量測的校正系統及其校正方法 |
US20220366919A1 (en) * | 2019-09-23 | 2022-11-17 | Dolby Laboratories Licensing Corporation | Audio encoding/decoding with transform parameters |
CN112653985B (zh) * | 2019-10-10 | 2022-09-27 | 高迪奥实验室公司 | 使用2声道立体声扬声器处理音频信号的方法和设备 |
CN115865688A (zh) * | 2022-11-25 | 2023-03-28 | 天津光电通信技术有限公司 | 一种双通道高速模拟采集回放设备 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004103023A1 (ja) * | 1995-09-26 | 2004-11-25 | Ikuichiro Kinoshita | 仮想音像定位用伝達関数表作成方法、その伝達関数表を記録した記憶媒体及びそれを用いた音響信号編集方法 |
DE69712230T2 (de) * | 1997-05-08 | 2002-10-31 | St Microelectronics Asia | Verfahren und gerät zur frequenzdomäneabwärtsumsetzung mit zwangblockschaltung für audiodekoderfunktionen |
US6442277B1 (en) * | 1998-12-22 | 2002-08-27 | Texas Instruments Incorporated | Method and apparatus for loudspeaker presentation for positional 3D sound |
US7505601B1 (en) | 2005-02-09 | 2009-03-17 | United States Of America As Represented By The Secretary Of The Air Force | Efficient spatial separation of speech signals |
EP1984913A4 (de) * | 2006-02-07 | 2011-01-12 | Lg Electronics Inc | Vorrichtung und verfahren zum codieren/decodieren eines signals |
KR101358700B1 (ko) * | 2006-02-21 | 2014-02-07 | 코닌클리케 필립스 엔.브이. | 오디오 인코딩 및 디코딩 |
JP4572945B2 (ja) * | 2008-03-28 | 2010-11-04 | ソニー株式会社 | ヘッドフォン装置、信号処理装置、信号処理方法 |
US8321214B2 (en) * | 2008-06-02 | 2012-11-27 | Qualcomm Incorporated | Systems, methods, and apparatus for multichannel signal amplitude balancing |
2010
- 2010-10-08 US US13/500,955 patent/US8976972B2/en active Active
- 2010-10-08 WO PCT/FR2010/052119 patent/WO2011045506A1/fr active Application Filing
- 2010-10-08 EP EP10781956A patent/EP2489206A1/de not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO2011045506A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2011045506A1 (fr) | 2011-04-21 |
US20120201389A1 (en) | 2012-08-09 |
US8976972B2 (en) | 2015-03-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP2489206A1 (de) | Verarbeitung von in einer subbanddomäne codierten schalldaten | |
EP2042001B1 (de) | Binaurale spatialisierung kompressionsverschlüsselter tondaten | |
EP1600042B1 (de) | Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe | |
JP4874555B2 (ja) | 聴覚情景の後部残響音ベースの合成 | |
CA2820199C (en) | Signal generation for binaural signals | |
FR2790634A1 (fr) | Procede de synthese d'un champ sonore tridimensionnel | |
EP1566077A1 (de) | Entzerrung des ausgangssignals in einemstereo-verbreiterungsnetzwerk | |
WO2007101958A2 (fr) | Optimisation d'une spatialisation sonore binaurale a partir d'un encodage multicanal | |
EP1886535B1 (de) | Verfahren zum herstellen mehrerer zeitsignale | |
CN101855917A (zh) | 生成具有增强的感知质量的立体声信号的方法和装置 | |
EP2000002A2 (de) | Verfahren und einrichtung zur effizienten kunstkopf-schallortslokalisierung im transformierten bereich | |
EP2005420A1 (de) | Einrichtung und verfahren zur codierung durch hauptkomponentenanalyse eines mehrkanaligen audiosignals | |
EP3729832B1 (de) | Verarbeitung eines monophonen signals in einem 3d-audiodecodierer, der binauralen inhalt liefert | |
JP7286876B2 (ja) | 変換パラメータによるオーディオ符号化/復号化 | |
US11470435B2 (en) | Method and device for processing audio signals using 2-channel stereo speaker | |
WO2006075079A1 (fr) | Procede d’encodage de pistes audio d’un contenu multimedia destine a une diffusion sur terminaux mobiles | |
CA3142575A1 (en) | Stereo headphone psychoacoustic sound localization system and method for reconstructing stereo psychoacoustic sound signals using same | |
EP3920552A1 (de) | Zentralisierte verarbeitung eines eingangs-audiodatenstroms | |
Toledo et al. | The role of spectral features in sound localization | |
KR20060004529A (ko) | 입체 음향을 생성하는 장치 및 방법 | |
WO2017032946A1 (fr) | Procédé de mesure de filtres phrtf d'un auditeur, cabine pour la mise en oeuvre du procédé, et procédés permettant d'aboutir à la restitution d'une bande sonore multicanal personnalisée | |
FR3002406A1 (fr) | Procede et dispositif de generation de signaux d'alimentation destines a un systeme de restitution sonore |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20120403 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| DAX | Request for extension of the European patent (deleted) | |
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: ORANGE |
| 17Q | First examination report despatched | Effective date: 20141217 |
| GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| INTG | Intention to grant announced | Effective date: 20150824 |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20160105 |