EP2489206A1 - Processing of sound data encoded in a subband domain - Google Patents

Processing of sound data encoded in a subband domain

Info

Publication number: EP2489206A1
Authority: EP; European Patent Office
Prior art keywords: ear; channel; lateral; virtual; channels
Prior art date: 2009-10-12
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Withdrawn

Application number

EP10781956A

Other languages

English (en)

French (fr)

Inventor

Marc Emerit

Rozenn Nicol

Grégory PALLONE

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Orange SA

Original Assignee

France Telecom SA

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2009-10-12

Filing date

2010-10-08

Publication date

2012-08-22

2010-10-08 Application filed by France Telecom SA

2012-08-22 Publication of EP2489206A1

Status: Withdrawn

Classifications

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

The invention relates to the processing of sound data.
In the context of processing sound data in a multi-channel format (5.1 or more), the aim is to provide a 3D spatialization effect called "Virtual Surround".
Such processing involves filters that aim to reproduce a sound field at the entrances of a listener's ear canals. A listener is able to locate sounds in space with a certain precision, thanks to the perception of sounds by both ears.
The signals emitted by the sound sources undergo acoustic transformations as they propagate to the ears. These acoustic transformations are characteristic of the acoustic channel established between a sound source and a point in the ear canal of the individual.
Each ear has its own acoustic channel, and these acoustic channels depend on the position and orientation of the source relative to the listener, on the shape of the listener's head and ears, and also on the acoustic environment (e.g. reverberation due to a room effect).
These acoustic channels can be modeled by filters commonly called "Head Related Impulse Responses" (HRIR) or "Head Related Transfer Functions" (HRTF), according to whether the representation is given in the time domain or in the frequency domain, respectively. Referring to FIG.
The HRTFs for the left ear and for the right ear are identical for sources that lie in the median plane (plane P, which separates the left half from the right half of the body, as illustrated in FIG. 2).
The acoustic cues exploited by the brain to locate sounds are often classified into two families:
Transaural playback means listening, on two remote loudspeakers, to audio content initially in a multi-channel format.
a matrixing of the channels, hereinafter referred to as "downmix".
Downmix processing is a matrix processing that goes from N channels to M channels, with N > M. In the following, a "Downmix" processing is considered not to involve any filter based on HRTFs, since it does not take spatialization effects into account.
"Downmix" processing matrices used in sound reproduction devices (PC, DVD, TV, etc.) have constant coefficients that are independent of time and frequency.
SG and SR are respectively left and right stereo output signals
EAVG and EAVD are respectively input signals which would have been intended to supply left side speakers AVG and right AVD (illustrated in FIG. 2)
EARG and EARD are respectively input signals that would have been intended to supply the left rear loudspeaker ARG and the right rear loudspeaker ARD, located behind the listener AU of FIG. 2,
Ec is an input signal that would have been intended to power a central loudspeaker C located in front of the AU listener, and
The processing referred to below as "ITU Downmix" does not allow precise spatial perception of sound events.
More generally, a "Downmix" type processing does not allow spatial perception, since it does not involve an HRTF filter.
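As a rough illustration of such a constant-coefficient downmix, the sketch below folds five input channels into two outputs, sample by sample. The function name `itu_downmix` and the rear gain `g_s = 0.707` are assumptions made for this example; the center gain `g = 0.707` is the value this document associates with the ITU downmix.

```python
def itu_downmix(e_avg, e_avd, e_arg, e_ard, e_c, g=0.707, g_s=0.707):
    """Fold five channels into a stereo pair using constant gains.

    g weights the center channel; g_s weights the rear channels
    (g_s = 0.707 is an assumption made for this sketch).
    No HRTF filtering is involved: every coefficient is a plain scalar.
    """
    s_left = [l + g * c + g_s * ls for l, c, ls in zip(e_avg, e_c, e_arg)]
    s_right = [r + g * c + g_s * rs for r, c, rs in zip(e_avd, e_c, e_ard)]
    return s_left, s_right
```

Because the gains do not depend on time or frequency, a sound present only in the left input ends up only in the left output, which is the loss of spatial information described above.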
The feeling of immersion that multi-channel content can offer is then lost with headphone listening, compared with listening on a system with more than two loudspeakers (for example in the 5.1 format, as illustrated in FIG. 2).
A sound supposed to be emitted by a source moving from the front to the back of the listener is not correctly reproduced on a plain stereo system (on headphones or on a pair of loudspeakers).
A sound present only in the channel SG (or SR) and processed by the ITU downmix is output only in the left (or right, respectively) earpiece in the case of headphone listening, whereas in the case of listening on a system with more than two loudspeakers (for example in the 5.1 format), the right (or left, respectively) ear also perceives a diffracted signal.
Downmix binaural a binaural format
The virtual loudspeakers are created by the so-called "binaural synthesis" technique.
This technique consists in applying head-related transfer functions (HRTF) to monophonic audio signals, to obtain a binaural signal that gives the listener, on headphones, the feeling that sound sources come from a particular direction in space.
The signal for the right ear is obtained by filtering the monophonic signal with the HRTF of the right ear, and the signal for the left ear is obtained by filtering the same monophonic signal with the HRTF of the left ear.
The resulting binaural signal is then available for headphone listening.
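The filtering step can be sketched in the time domain as two convolutions of the same monophonic signal, one per ear. This is only a minimal illustration: the helper names are hypothetical, and the impulse responses below are made-up toy values, not measured HRIRs.

```python
def convolve(signal, impulse_response):
    """Full-length direct-form FIR convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one monophonic signal with the left-ear and right-ear responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy responses for a source on the listener's left (illustrative values only):
# the left ear receives the signal directly, the right ear later and weaker.
toy_hrir_left = [1.0, 0.0, 0.0]
toy_hrir_right = [0.0, 0.0, 0.5]
left_ear, right_ear = binaural_synthesis([1.0, 2.0], toy_hrir_left, toy_hrir_right)
```

With real HRIRs the two responses would also differ in spectral shape, not only in delay and level.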
In FIG. 3A, a transfer function defined by a filter is associated with each acoustic path between an ear of the listener and a virtual loudspeaker (placed as recommended for the 5.1 multi-channel format in the example shown).
HCg (respectively HCd) is the filter corresponding to an HRTF for the path between the central loudspeaker C and the left ear OG (respectively the right ear OD) of the listener,
HGg (respectively HDd) is the filter corresponding to a so-called "ipsilateral" HRTF (ear "illuminated" by the loudspeaker) for the direct path (solid line) between the left lateral loudspeaker AVG (respectively the right lateral loudspeaker AVD) and the left ear OG (respectively the right ear OD) of the listener,
HGd (respectively HDg) is the filter corresponding to a so-called "contralateral" HRTF (ear in the "shadow" of the head) for the indirect path (dashed lines) between the left lateral loudspeaker AVG (respectively the right lateral loudspeaker AVD) and the right ear OD (respectively the left ear OG) of the listener,
HGSg (respectively HDSd) is the filter corresponding to an ipsi-lateral HRTF for the direct path (solid line) between the ARG left rear speaker
HDSg is the filter corresponding to a contralateral HRTF for the indirect path (in dashed lines) between the ARG left rear loudspeaker (ARD right rear respectively) and the right OD ear
This standard provides an embodiment in which a multi-channel signal is transported in the form of a stereo downmix and spatialization parameters (CLD for "Channel Level Difference", ICC for "Inter-Channel Coherence", and CPC for "Channel Prediction Coefficient").
These parameters make it possible, in a first step, to expand the stereo downmix into three signals L', R' and C.
They then allow the expansion of the signals L', R' and C to obtain the 5.1 signals (denoted L, Ls, R, Rs, C and LFE for "Low Frequency Effect").
The C and LFE signals are not separated.
The signal C is used for the binaural Downmix processing. Here, then, three signals are first constructed from two monophonic signals (for the respective left L', right R' and center C' channels).
channels L and Ls respectively, of the left and right surround virtual speakers in 5.1 format, for sample 1 of the frequency band m in time-frequency transform
- is the expression of the spectrum of the HRTF for a path between a right speaker in 5.1 format and the right ear
- is the expression of the spectrum of the HRTF for a path between a left loudspeaker in 5.1 format and the left ear
The present invention improves the situation.
The applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the subband domain, of the second transfer function deconvolved by the first transfer function.
A first advantage of such a construction is a significant reduction in processing complexity.
The transfer functions of the central virtual loudspeaker no longer need to be taken into account.
The coefficients of the matrix are no longer expressed as a function of the HRTF spectra but simply as a function of the spatialization gains of the M channels on the N virtual loudspeakers located in a hemisphere around a first ear.
The N-channel representation comprises, per hemisphere around an ear, at least one direct virtual loudspeaker and one ambience virtual loudspeaker, as in "virtual surround",
the coefficients of the matrix being expressed, in a time-frequency transform subband domain (for example of the "PQMF" type, for "Pseudo-Quadrature Mirror Filters"), by:
contra-lateral relative to the right ear of the listener, deconvolved by an ipsi-lateral transfer function, relating to the left ear, for a virtual left speaker, direct or respectively ambient,
contra-lateral relative to the left ear of the listener, deconvolved by an ipsi-lateral transfer function, relative to the right ear, for a virtual right speaker, direct or respectively ambient,
ipsi-lateral corresponding to selected interaural delays, and - are selected weights.
The coefficient g may advantageously be equal to 0.707 (the square root of 1/2, corresponding to an equal-energy split of the central loudspeaker signal between the side loudspeakers), as recommended in the "ITU Downmix" processing.
the matrix filtering is expressed according to a product of matrices of type:
Filtering the contralateral component with the contralateral transfer function deconvolved by the ipsilateral transfer function makes it possible to reduce the timbre distortion introduced by the binauralization processing.
Such filtering amounts to a low-pass filtering delayed by a value corresponding to the interaural delay.
The brain perceives, on one ear, the original (unprocessed) signal and, on the other ear, the delayed and low-pass-filtered signal. Above the cutoff frequency, the difference in perceived level compared with diotic listening to the original signal attenuated by 6 dB is minimal. Below the cutoff frequency, on the other hand, the signal is perceived as twice as loud. For signals containing frequencies below the cutoff frequency, the difference in timbre will therefore consist of an amplification of the low frequencies.
Such a timbre alteration may advantageously be eliminated simply by high-pass filtering, which may be the same for all HRTFs (loudspeaker directions).
The above-mentioned timbre correction can advantageously be applied to the binaural stereo signal resulting from the downmix.
An automatic gain control can advantageously be provided at the end of the processing, so that the levels delivered by the Downmix processing and by the binauralization processing within the meaning of the invention are similar.
there is provided at the end of the processing chain a high-pass filter and an automatic gain control.
A selected gain is also applied to the two left-channel and right-channel signals in the two-channel representation (binaural or transaural®) before rendering, the selected gain being controlled so as to limit the energy of the left-channel and right-channel signals to, at most, the signal energy of the virtual loudspeakers.
Preferably, the automatic gain control is applied to the two left-channel and right-channel signals downstream of the application of the frequency-dependent weighting factor.
The coefficients of the aforementioned matrix used in the matrix filtering vary with frequency: they are weighted by a chosen factor (Gain) less than one if the frequency is below a chosen threshold, and by one otherwise.
The factor is about 0.5 and the chosen frequency threshold is about 500 Hz, so as to eliminate a coloration distortion.
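The frequency-dependent weighting can be sketched as follows. The sketch assumes that each subband is represented by a single center frequency, which is a simplification introduced for the example; the function names are hypothetical, and the defaults are the approximate values quoted above.

```python
def frequency_weight(freq_hz, gain=0.5, threshold_hz=500.0):
    """Return `gain` below the threshold frequency and 1.0 otherwise."""
    return gain if freq_hz < threshold_hz else 1.0


def weight_subbands(coefficients, center_freqs_hz, gain=0.5, threshold_hz=500.0):
    """Scale one coefficient per subband by the frequency-dependent factor.

    Assumption: center_freqs_hz[i] is the center frequency of subband i.
    """
    return [
        c * frequency_weight(f, gain, threshold_hz)
        for c, f in zip(coefficients, center_freqs_hz)
    ]
```

Applied to coefficients that are all equal to 1, this yields 0.5 in the subbands centered below 500 Hz and 1 elsewhere, which behaves like a crude high-pass shelf.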
Another advantage provided by the invention is the transport of the encoded signal and its processing by a decoder to improve its sound quality, for example a decoder of the MPEG Surround® type.
No transfer function is applied for the direct paths (ipsilateral contributions), and additional processing is provided on the indirect paths (spectrum of the contralateral transfer function deconvolved by the ipsilateral transfer function).
the untreated part of the stereo submix ipsilateral contributions
the above can be generalized to any type of downmix processing.
downmix processing to two channels usually involves applying weighting to the channels (virtual speakers), then summing the N channels to two output signals.
Applying binaural spatialization processing to Downmix processing involves applying to the N weighted channels the HRTF filters corresponding to the positions of the N virtual speakers. Since these filters are equal to 1 for the ipsi-lateral contributions, we find the Downmix treatment by applying the sum of the ipsi-lateral contributions.
The signals obtained by a binauralization processing within the meaning of the invention can be viewed as the sum of Downmix-type signals and of a stereo signal comprising the localization cues that the brain needs to perceive the spatialization of the sounds.
This second signal is hereinafter referred to as "Downmix Binaural Additional" (DBA), so that the processing within the meaning of the invention, here called "Downmix Binaural", is such that:
α can be a coefficient between 0 and 1.
A listener can choose the value of the coefficient α between 0 and 1, either continuously or by switching between 0 and 1 (in "ON-OFF" mode). A weighting of the second processing, "Downmix Binaural Additional", can thus be chosen within the global processing using matrix filtering within the meaning of the invention.
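The expression itself is not reproduced above. Read from the surrounding sentences, the combination appears to be the Downmix signal plus the additional binaural signal weighted by the coefficient between 0 and 1; the sketch below follows that reading, which is an assumption, and uses the hypothetical names `downmix_binaural` and `alpha`.

```python
def downmix_binaural(downmix, dba, alpha):
    """Combine a Downmix signal with the additional binaural signal (DBA).

    Assumed reading of the text: output = downmix + alpha * dba,
    with alpha between 0 (plain downmix) and 1 (full binaural effect).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [d + alpha * a for d, a in zip(downmix, dba)]


# ON-OFF mode corresponds to switching alpha between 0.0 and 1.0.
plain = downmix_binaural([1.0, 2.0], [0.5, -0.5], 0.0)
half = downmix_binaural([1.0, 2.0], [0.5, -0.5], 0.5)
```

The same function would be applied separately to the left and right output channels.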
This embodiment has the advantage of requiring only a low bandwidth for transmitting the results of the Downmix and DBA processing from an encoder to a decoder, as shown in FIG. 7 described below, since bit rate is requested only if the result of the DBA processing is significant compared with the result of the Downmix.
α = 0; 0.25; 0.5; 0.75; 1.
This additional signal requires only a small bit rate for its transport. Indeed, it is a residual, low-pass-filtered signal and thus a priori much less energetic than the Downmix signal. In addition, it has redundancies with the Downmix signal. This property can be exploited advantageously in conjunction with codecs of the Dolby Surround, Dolby Prologic or MPEG Surround type.
The "Downmix Binaural Additional" signal can then be compressed and transported in addition to the Downmix signal and/or in a scalable manner, with little bit rate.
the addition of the two stereo signals allows the listener to take full advantage of the binaural signal with a quality very close to a 5.1 format.
matrix filtering within the meaning of the invention consists in applying, in an advantageous embodiment:
a second processing leading when executed in conjunction with the first processing, to a spatialization of the N virtual loudspeakers respectively associated with the N channels to obtain a bi-channel, binaural or transaural representation.
the application of the second processing is decided optionally (for example as a function of the bit rate, the spatialized rendering capabilities of a terminal, or others).
The first aforementioned processing can be applied in an encoder communicating with a decoder, while the second processing is advantageously applied at the decoder.
The management of the processing within the meaning of the invention may advantageously be carried out by a computer program comprising instructions for implementing the method according to the invention when this program is executed by a processor, for example at a decoder in particular.
the invention also aims at such a program.
the present invention also relates to a module equipped with a processor and a memory and capable of executing this computer program.
A module within the meaning of the invention for the processing of sound data encoded in a subband domain, for binaural or transaural® two-channel rendering, then comprises means for applying matrix filtering to switch from an N-channel sound representation, with N > 0, to a two-channel representation.
the N-channel sound representation consists of considering N virtual loudspeakers surrounding a listener's head, and, for each virtual loudspeaker of at least part of the loudspeakers:
the applied matrix filtering comprises a multiplicative coefficient defined by the spectrum, in the field of the subbands, of the second transfer function deconvolved by the first transfer function.
Such a module may advantageously be a decoder of the MPEG Surround® type and furthermore include decoding means of the MPEG Surround® type, or may alternatively be implemented in such a decoder.
FIG. 1 schematically shows rendering on two loudspeakers around the head of a listener
FIG. 2 schematically shows rendering on five loudspeakers in the 5.1 multi-channel format
FIG. 3A schematically represents the ipsilateral (solid lines) and contralateral (dashed lines) paths in the 5.1 multi-channel format;
FIG. 3B shows a prior-art processing scheme for switching from the 5.1 multi-channel format illustrated in FIG. 3A to a binaural or transaural format
FIG. 4A schematically represents the ipsilateral (solid lines) and contralateral (dashed lines) paths in the 5.1 multi-channel format, with the ipsilateral and contralateral paths of the central loudspeaker;
FIG. 4B represents a processing diagram for the transition from the 5.1 multi-channel format illustrated in FIG. 4A to a binaural or transaural format, with only four filters in an embodiment within the meaning of the invention;
FIG. 5 illustrates a processing equivalent to the application of one of the filters of FIG. 4B
FIG. 6 illustrates an additional processing of high-pass filtering and automatic gain control to be applied to the outputs SG and SD to avoid a coloration distortion and a difference in timbre between a "downmix" processing and a processing according to the invention
FIG. 7 illustrates a processing within the meaning of the invention carried out at the encoder in an exemplary embodiment of the invention, particularly in the case of an additional DBA processing to be combined with the Downmix processing.
Reference is first made to FIG. 4A to describe an example of implementation of the processing for switching from a multi-channel representation (5.1 format in the example described) to a binaural or transaural stereo two-channel representation.
five speakers configured in 5.1 format are illustrated:
the channels associated with speaker positions for example the AVG and ARG loudspeakers of FIG. 4A
speaker positions for example the AVG and ARG loudspeakers of FIG. 4A
first hemisphere with respect to the listener that of the left ear OG
second hemisphere relative to the listener that of his right ear OD
first and second hemispheres are separated by the median plane of the listener.
The additional processing preferably comprises the application of filters (C/I)AVG, (C/I)AVD, (C/I)ARG, (C/I)ARD (FIG. 4B) defined, in the coded (or transformed) domain, by the spectrum of a contralateral acoustic transfer function deconvolved by an ipsilateral transfer function.
the ipsi-lateral transfer function is associated with a direct acoustic path Uvc IAVD, RG, URD (FIG.
The spatialization of a virtual loudspeaker is provided by a pair of transfer functions, HRTF (expressed in the frequency domain) or HRIR (expressed in the time domain). These transfer functions represent the ipsilateral path (direct path between the loudspeaker and the closest ear, in solid lines in FIG. 4A) and the contralateral path (path between the loudspeaker and the ear masked by the listener's head, in dashed lines in FIG. 4A).
The filter associated with the ipsilateral path is advantageously omitted, and a filter corresponding to the contralateral transfer function deconvolved by the ipsilateral transfer function is used for the contralateral path. Thus, for each virtual loudspeaker (except the central loudspeaker C), only one filter is used.
- the filter referenced (C/I)ARG is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the left rear loudspeaker ARG and the right ear OD, deconvolved by the ipsilateral transfer function of the path between the left rear loudspeaker ARG and the left ear OG of the individual,
- the filter referenced (C/I)ARD is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the right rear loudspeaker ARD and the left ear OG, deconvolved by the ipsilateral transfer function of the path between the right rear loudspeaker ARD and the right ear OD of the individual,
- the filter referenced (C/I)AVG is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the left lateral loudspeaker AVG and the right ear OD, deconvolved by the ipsilateral transfer function of the path between the left lateral loudspeaker AVG and the left ear OG of the individual, and
- the filter referenced (C/I)AVD is defined, in the transformed domain, by the spectrum of the contralateral transfer function of the path between the right lateral loudspeaker AVD and the left ear OG, deconvolved by the ipsilateral transfer function of the path between the right lateral loudspeaker AVD and the right ear OD of the individual.
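Since deconvolution in a transformed domain corresponds to a division of spectra, each of these filters can be sketched as a per-subband ratio of complex values. The sketch below is an illustration only: the function name `contra_over_ipsi` and the small regularization term `eps`, which guards against division by a vanishing ipsilateral value, are assumptions, and the spectra are toy values rather than measured HRTFs.

```python
def contra_over_ipsi(h_contra, h_ipsi, eps=1e-12):
    """Per-subband spectrum of a contralateral response deconvolved by the ipsilateral one.

    h_contra and h_ipsi hold one complex value per subband.
    eps is an assumed regularization, not something stated in the text.
    """
    return [
        hc * hi.conjugate() / (abs(hi) ** 2 + eps)
        for hc, hi in zip(h_contra, h_ipsi)
    ]


# Toy spectra with two subbands (illustrative values only).
toy_ipsi = [2.0 + 0.0j, 0.0 + 1.0j]
toy_contra = [1.0 + 0.0j, 0.5 + 0.0j]
toy_ratio = contra_over_ipsi(toy_contra, toy_ipsi)
```

Multiplying the ipsilateral spectrum by this ratio gives back the contralateral spectrum, which is why the ipsilateral path can be left unfiltered while the contralateral path carries the interaural difference.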
the signal which, in encoding 5.1, is intended to supply the central loudspeaker C (in the median plane of symmetry of the listener's head), is distributed in two fractions (preferably equal to 50% and 50%) on two channels adding to two respective channels of the left and right side speakers.
the associated signal is mixed with the signals associated with the ARG left rear speaker and ARD right rear speaker.
central loudspeakers front speaker for a reproduction of the midrange frequencies, front speaker for a reproduction of low frequencies, or other
When the channel associated with a central loudspeaker position C, in the median plane, is divided into a first and a second signal fraction, added respectively to the channel of the loudspeaker AVG in the first hemisphere (around the left ear OG) and to the channel of the loudspeaker AVD in the second hemisphere (around the right ear OD), it is not necessary to provide filtering by the transfer functions associated with the loudspeakers in the median plane, and this without any change in the perception of the spatialization of the sound scene in binaural or transaural rendering.
The processing complexity is greatly reduced, since the filters associated with the loudspeakers located in the median plane are removed. Another advantage is that the coloring effect on the associated signals is reduced.
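The exact matrix expression is not reproduced in this text, so the following is only one plausible reading of the structure described: the center channel is split between the two front channels, every channel reaches its ipsilateral output unfiltered, and it reaches the opposite output through its contralateral-over-ipsilateral coefficient. All names are hypothetical; the values are for one sample of one subband, and the rear gain `g_s` defaulting to 1.0 is an assumption.

```python
def binauralize_subband(avg, avd, arg, ard, c,
                        ci_avg, ci_avd, ci_arg, ci_ard,
                        g=0.707, g_s=1.0):
    """Sketch of a two-channel output for one subband sample.

    avg, avd, arg, ard, c: complex subband samples of the five channels.
    ci_*: contralateral-over-ipsilateral coefficient of each loudspeaker.
    g splits the center channel; g_s weights the rear channels (assumed).
    """
    left_front = avg + g * c      # center fraction added to the left channel
    right_front = avd + g * c     # center fraction added to the right channel
    left_rear = g_s * arg
    right_rear = g_s * ard
    # Ipsilateral contributions pass unfiltered; contralateral ones are filtered.
    s_left = left_front + left_rear + ci_avd * right_front + ci_ard * right_rear
    s_right = right_front + right_rear + ci_avg * left_front + ci_arg * left_rear
    return s_left, s_right
```

Setting the four coefficients to zero reduces this sketch to a plain downmix, with no filter at all for the center channel.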
the spectrum of the contralateral transfer function deconvolved by the ipsilateral transfer function can be defined, in the transformed domain, by: and being the gain and the phase of the
Each filter is equivalent to applying:
- an equalizer filter 1, preferably of the low-pass type,
- an interaural delay (or "ITD") 10, to take account of the differences in path between a virtual source and each ear, and
- possibly an attenuation 12 with respect to the unfiltered signal components (for example the AVG component on the SG channel of FIG. 4B).
The applied ITD delay is "substantially" interaural, the term "substantially" referring in particular to the fact that the exact morphology of the listener may not be rigorously taken into account (e.g. if default HRTFs are used, in particular the so-called "Kemar head" HRTFs).
The binaural synthesis of a virtual loudspeaker then consists simply in playing the input signal without modification on the corresponding ipsilateral channel (channel SG in FIG. 4B) and in applying, to the signal to be played on the contralateral channel (channel SD in FIG. 4B), the corresponding filter (C/I)AVG, which applies a delay, an attenuation and a low-pass filtering.
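The delay, attenuation and low-pass chain can be illustrated in the time domain, although the processing described here operates on subband data. The sketch below uses a one-pole smoother as the low-pass stage, which is a stand-in chosen for brevity; the function names and all numeric values are illustrative assumptions.

```python
def contralateral_filter(signal, delay_samples, attenuation, lowpass_coeff):
    """Delay, attenuate and low-pass a signal.

    The low-pass stage is a one-pole smoother (an assumed stand-in):
    y[n] = lowpass_coeff * y[n-1] + (1 - lowpass_coeff) * attenuation * x[n]
    """
    delayed = [0.0] * delay_samples + list(signal)
    out, state = [], 0.0
    for x in delayed:
        state = lowpass_coeff * state + (1.0 - lowpass_coeff) * attenuation * x
        out.append(state)
    return out


def synthesize_virtual_speaker(signal, delay_samples, attenuation, lowpass_coeff):
    """Return (ipsilateral, contralateral) signals for one virtual loudspeaker."""
    ipsi = list(signal) + [0.0] * delay_samples   # played without modification
    contra = contralateral_filter(signal, delay_samples, attenuation, lowpass_coeff)
    return ipsi, contra


toy_ipsi, toy_contra = synthesize_virtual_speaker([1.0, 0.0, 0.0], 2, 0.5, 0.5)
```

For a left virtual loudspeaker, the first returned signal would feed the left output and the second the right output.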
The resulting signal is delayed, attenuated and filtered by removing the high frequencies, which results, from the point of view of auditory perception, in the signal received by the "contralateral" ear (OD, in the example where the virtual loudspeaker is the left lateral loudspeaker AVG) being masked relative to the signal received by the "ipsilateral" ear (OG).
The coloration that can be perceived is therefore directly that of the signal received by the ipsilateral ear.
This signal undergoes no transformation and, therefore, the processing within the meaning of the invention should introduce only a weak coloration.
a processing of the output signals SG and SD of FIG. 4B can be provided consisting in applying a high-pass filter FPH followed by an automatic gain control AGC.
The high-pass filter is equivalent to applying the "Gain" factor described above, with:
Gain = 0.5 if the frequency f is less than 500 Hz, and
Gain = 1 otherwise.
This factor is applied globally to the output signals SG and SD, as an alternative to applying it individually to each coefficient of the matrix explained below.
the automatic gain control is calibrated on the overall intensity of the signals corresponding to the Downmix treatment, given by:
The gains g and gs are applied globally, g to the signal C and gs to the signals ARG and ARD.
The energy of the left-channel signal S'G and of the right-channel signal S'D is thus limited, at the end of this processing, to at most the overall energy ID² of the signals from the virtual loudspeakers.
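The expression of the reference intensity is not reproduced in this text, so the sketch below simply takes a reference energy as an input and scales both outputs by a common gain when their combined energy exceeds it. The function names and the choice of a single broadband gain are assumptions made for illustration.

```python
import math


def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


def automatic_gain_control(s_left, s_right, reference_energy):
    """Limit the combined output energy to at most reference_energy.

    Returns the scaled left and right signals and the gain applied.
    The gain never exceeds 1: signals already within the limit pass unchanged.
    """
    out_energy = energy(s_left) + energy(s_right)
    if out_energy == 0.0 or out_energy <= reference_energy:
        gain = 1.0
    else:
        gain = math.sqrt(reference_energy / out_energy)
    return [gain * x for x in s_left], [gain * x for x in s_right], gain
```

In practice the gain would typically be updated frame by frame and smoothed over time, which this sketch leaves out.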
the recovered signals S 'G and S'D can finally be routed to a sound reproduction device in binaural stereophonic mode.
the overall intensity of the signals is usually calculated directly from the energy of the input signals.
this data will be taken into account for the estimation of the intensity l D.
The implementation of the invention results in a suppression of the monaural localization cues.
The more a source deviates from the median plane, the more the interaural cues predominate over the monaural cues.
Insofar as the angle between the lateral loudspeakers (or between the rear loudspeakers) is greater than 60°, monaural cues have little influence on the perceived position of the virtual loudspeakers.
The difference perceived here is smaller than the difference the listener could perceive if the HRTFs used were not specific to him (for example, HRTF models derived from the so-called "Kemar head" technique).
The spatial perception of the signal is respected, without introducing coloration and while retaining the timbre of the sound sources.
The solution within the meaning of the present invention roughly halves the number of filters to provide and further corrects the coloration effects.
The choice of position of the virtual loudspeakers can significantly influence the quality of the spatialization. It has proved preferable to place the lateral and rear virtual loudspeakers at +/- 45° with respect to the median plane, rather than at +/- 30° from the median plane as in the configuration recommended by the International Telecommunication Union (ITU). Indeed, when the virtual loudspeakers approach the median plane, the ipsilateral and contralateral HRTF functions tend to resemble each other, and the preceding simplifications may no longer give a satisfactory spatialization.
The position of a lateral loudspeaker is advantageously within an angular sector of 10° to 90°, and preferably 30° to 60°, from a plane of symmetry P, facing the listener's face. More particularly, the position of a lateral loudspeaker will preferably be close to 45° from the plane of symmetry.
A processing module 72 within the meaning of the invention operates directly downstream of an encoder 71 to deliver, as indicated previously, data processed according to a treatment of the type:
Downmix + ⁇ DBA (with DBA for "Downmix Binaural Additional").
The coefficients of the matrix are such that:
The global processing matrix H 1 1, k is again expressed as the sum of two matrices, with
The matrix consists of applying function-based filtering
The present invention is not limited to the embodiment described above by way of example; it extends to other variants.
The case described above is that of processing two initial stereo signals to be encoded and spatialized into binaural stereo by way of a 5.1 spatialization.
The SG and SD channels of FIG. 4B may furthermore undergo dynamic low-pass filtering of the Dolby® type or of another type.
The present invention also relates to a module MOD (FIG. 4B) for processing sound data, for converting from a multi-channel format to a binaural or transaural format in the transformed domain, the elements of which could be those illustrated in FIG. 4B.
Such a module then comprises processing means, such as a processor PROC and a working memory MEM, for implementing the invention. It can be implemented in any type of decoder, in particular in a sound reproduction device (PC, portable music player, mobile phone, or other), possibly one that also plays video. Alternatively, the module may be designed to operate separately from the reproduction, for example to prepare content in binaural or transaural format for subsequent decoding.
The present invention also relates to a computer program, downloadable via a telecommunication network and/or stored in a memory of a processing module of the aforementioned type and/or stored on a memory medium intended to cooperate with a reader of such a processing module, and comprising instructions for implementing the invention when they are executed by a processor of said module.


EP10781956A 2009-10-12 2010-10-08 Verarbeitung von in einer subbanddomäne codierten schalldaten Withdrawn EP2489206A1 (de)

Applications Claiming Priority (2)

Application Number	Priority Date	Filing Date	Title
FR0957118		2009-10-12
PCT/FR2010/052119 WO2011045506A1 (fr)	2009-10-12	2010-10-08	Traitement de donnees sonores encodees dans un domaine de sous-bandes

Publications (1)

Publication Number	Publication Date
EP2489206A1 (de)	2012-08-22

Family

ID=42145029

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
EP10781956A Withdrawn EP2489206A1 (de)	2009-10-12	2010-10-08	Verarbeitung von in einer subbanddomäne codierten schalldaten

Country Status (3)

Country	Link
US (1)	US8976972B2 (de)
EP (1)	EP2489206A1 (de)
WO (1)	WO2011045506A1 (de)


2010
- 2010-10-08: US application US13/500,955, patent US8976972B2 (status: Active)
- 2010-10-08: WO application PCT/FR2010/052119, published as WO2011045506A1 (status: Application Filing)
- 2010-10-08: EP application EP10781956A, published as EP2489206A1 (status: Withdrawn)


Also Published As

Publication number	Publication date
WO2011045506A1 (fr)	2011-04-21
US20120201389A1 (en)	2012-08-09
US8976972B2 (en)	2015-03-10

Legal Events

Date	Code	Title	Description
2012-07-20	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2012-08-22	17P	Request for examination filed	Effective date: 20120403
2012-08-22	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
2013-01-16	DAX	Request for extension of the european patent (deleted)
2013-09-25	RAP1	Party data changed (applicant data changed or rights of an application transferred)	Owner name: ORANGE
2015-01-21	17Q	First examination report despatched	Effective date: 20141217
2015-08-23	GRAP	Despatch of communication of intention to grant a patent	Free format text: ORIGINAL CODE: EPIDOSNIGR1
2015-09-23	INTG	Intention to grant announced	Effective date: 20150824
2016-05-27	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
2016-06-29	18D	Application deemed to be withdrawn	Effective date: 20160105
