EP1600042A1 - Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe - Google Patents

Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe

Info

Publication number
EP1600042A1
EP1600042A1 EP04712070A
Authority
EP
European Patent Office
Prior art keywords
signals
matrix
sub
filters
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP04712070A
Other languages
English (en)
French (fr)
Other versions
EP1600042B1 (de)
Inventor
Abdellatif Benjelloun Touimi
Marc Emerit
Jean-Marie Pernaux
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA filed Critical France Telecom SA
Publication of EP1600042A1
Application granted
Publication of EP1600042B1
Anticipated expiration
Expired - Lifetime
Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S3/00 - Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02 - Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S3/00 - Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 - Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S2400/00 - Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 - Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S2420/00 - Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 - Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Definitions

  • the invention relates to a processing of sound data for a spatialized reproduction of acoustic signals.
  • headphones are preferably used.
  • the constraints of this type of terminal (computing power, memory size) make it difficult to implement sound spatialization techniques.
  • the sound spatialization covers two different types of processing. From a monophonic audio signal, one seeks to give the listener the illusion that the sound source(s) are positioned at precise points in space (positions which one wants to be able to modify in real time), and immersed in a space with specific acoustic properties (reverberation, or other acoustic phenomena such as occlusion). For example, on mobile type telecommunication terminals, it is natural to envisage a sound rendering with a stereophonic headset. The most effective technique for positioning sound sources is then binaural synthesis.
  • HRTFs from the English "Head Related Transfer Functions"
  • HRTFs are therefore functions of a spatial position, more particularly of an azimuth angle θ and an elevation angle φ, and of the sound frequency f.
  • a similar spatialization processing consists of a so-called "transaural" synthesis, in which reproduction simply takes place over two loudspeakers rather than over a headset (the headset taking the form of two left and right earpieces).
  • the implementation of this technique is done in so-called "bi-channel" form (processing shown schematically in Figure 1 relating to the prior art).
  • the source signal is filtered by the HRTF function of the left ear and by the HRTF function of the right ear.
  • the two left and right channels deliver acoustic signals which are then broadcast to the listener's ears with stereo headphones.
  • This bi-channel binaural synthesis is of the so-called "static" type because, in this case, the positions of the sound sources do not change over time.
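The static bi-channel processing described above can be sketched in a few lines; a minimal Python illustration, with random placeholder impulse responses standing in for measured HRIRs (the function name and signal lengths are illustrative, not taken from the patent):

```python
import numpy as np

def binaural_static(sources, hrirs_left, hrirs_right):
    """Static bi-channel binaural synthesis: each mono source is filtered
    by the left- and right-ear impulse responses (HRIRs) of its chosen
    direction, and the contributions are summed into global L and R."""
    n_out = len(sources[0]) + len(hrirs_left[0]) - 1
    L = np.zeros(n_out)
    R = np.zeros(n_out)
    for s, hl, hr in zip(sources, hrirs_left, hrirs_right):
        L += np.convolve(s, hl)  # left-ear HRTF filtering
        R += np.convolve(s, hr)  # right-ear HRTF filtering
    return L, R

# Placeholder data: 2 sources, 256-tap HRIRs (cf. 256 samples at 44.1 kHz)
rng = np.random.default_rng(0)
sources = [rng.standard_normal(1024) for _ in range(2)]
hl = [rng.standard_normal(256) for _ in range(2)]
hr = [rng.standard_normal(256) for _ in range(2)]
L, R = binaural_static(sources, hl, hr)
```

Note that this direct form needs one filter pair per source, which is what motivates the "multi-channel" implementation based on a linear decomposition of the HRTFs.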
  • the audio and / or speech streams are transmitted in a compressed coded format.
  • frequency-type coders, or coders by frequency transform, of the MPEG-2/4 standard are considered below.
  • the time/frequency transformation can take the form of a filter bank in frequency sub-bands or an MDCT type transform (for "Modified Discrete Cosine Transform").
  • by "sub-band domain" is meant a domain defined in a space of frequency sub-bands, a domain of a temporal space transformed into frequency, or a frequency domain.
  • the conventional method consists in first decoding, carrying out the sound spatialization processing on the time signals, then recoding the resulting signals, for transmission to a reproduction terminal.
  • This tedious succession of steps is often very costly in terms of computing power, the memory required for processing and the algorithmic delay introduced. It is therefore often unsuitable for the constraints imposed by the machines where the processing takes place and for the communication constraints.
  • the present invention improves the situation.
  • One of the aims of the present invention is to propose a method for processing sound data grouping the coding / decoding operations in compression of the audio streams and the spatialization of said streams.
  • Another object of the present invention is to propose a process for processing sound data, by spatialization, which adapts to a variable number (dynamically) of sound sources to be positioned.
  • a general aim of the present invention is to propose a method for processing sound data, by spatialization, allowing a wide distribution of spatialized sound data, in particular a distribution for the general public, the reproduction devices being simply equipped with a decoder of the received signals and with reproduction loudspeakers.
  • a process for processing sound data for a spatialized reproduction of acoustic signals, in which: a) at least one first set and one second set of weighting terms, representative of a direction of perception of said acoustic signals by a listener, are obtained; b) and said acoustic signals are applied to at least two sets of filter units, arranged in parallel, to deliver at least a first output signal and a second output signal, each corresponding to a linear combination of the acoustic signals weighted by one set of weighting terms (from the first set and the second set respectively) and filtered by said filtering units.
  • Each acoustic signal in step a) of the method within the meaning of the invention is at least partially coded in compression and is expressed in the form of a vector of sub-signals associated with respective frequency sub-bands, and each filtering unit is arranged to perform a matrix filtering applied to each vector, in the space of the frequency sub-bands.
  • each matrix filtering is obtained by conversion, in the space of the frequency sub-bands, of a filter with impulse response (finite or infinite) defined in the time space.
  • Such an impulse response filter is preferably obtained by determining an acoustic transfer function depending on a direction of perception of a sound and the frequency of this sound.
  • these transfer functions are expressed by a linear combination of terms depending on the frequency and weighted by terms depending on the direction, which allows, as indicated above, on the one hand, to process a variable number of acoustic signals in step a) and, on the other hand, to dynamically vary the position of each source over time.
  • such an expression of the transfer functions "integrates" the interaural delay which is conventionally applied to one of the output signals, with respect to the other, before the restitution, in binaural processing.
  • matrices of gain filters associated with each signal are provided.
  • said first and second output signals being preferably intended to be decoded into first and second restitution signals
  • advantageously, the above-mentioned linear combination already takes account of a time difference between these first and second restitution signals.
  • the combination of the techniques of linear decomposition of HRTFs with filtering techniques in the sub-band domain makes it possible to combine the benefits of the two techniques, to arrive at sound spatialization systems of low complexity and reduced memory for multiple encoded audio signals.
  • direct filtering of signals in the coded domain avoids a complete decoding per audio stream before proceeding to the spatialization of the sources, which implies a considerable gain in complexity.
  • the sound spatialization of audio streams can occur at different points in a transmission chain (servers, network nodes or terminals).
  • the nature of the application and the architecture of the communication used can favor one case or another.
  • the spatialization processing is preferably carried out at the level of the terminals in a decentralized architecture and, on the contrary, at the level of the audio bridge (or MCU, for "Multipoint Control Unit") in a centralized architecture.
  • the spatialization can be carried out either in the server or in the terminal, or even during the creation of content.
  • a spatialization processing is preferably provided directly at the level a content server.
  • the present invention can also find applications in the field of the transmission of multiple audio streams included in structured sound scenes, as provided by the MPEG-4 standard.
  • FIG. 1 schematically illustrates a processing corresponding to a binaural "dual-channel" static synthesis for temporal digital audio signals Si, of the prior art
  • FIG. 2 schematically shows an implementation of binaural synthesis based on the linear decomposition of HRTFs for non-coded temporal digital audio signals, of the prior art
  • FIG. 3 schematically represents a system, within the meaning of the prior art, of binaural spatialization of N audio sources initially coded, then completely decoded for spatialization processing in the time domain and then recoded for transmission to one or more restitution devices, here from a server;
  • FIG. 4 schematically represents a system, within the meaning of the present invention, of binaural spatialization of N audio sources partially decoded for spatialization processing in the sub-band domain and then completely recoded for transmission to one or more restitution devices, here from a server;
  • FIG. 5 schematically shows a sound spatialization processing in the sub-band domain, within the meaning of the invention, based on the linear decomposition of HRTFs in the binaural context;
  • FIG. 6 schematically shows an encoding / decoding process for spatialization, carried out in the sub-band domain and based on a linear decomposition of transfer functions in the ambisonic context, in an alternative embodiment of the invention
  • FIG. 7 schematically represents a binaural spatialization processing of N coded audio sources, within the meaning of the present invention, carried out with a communication terminal, according to a variant of the system of FIG. 4;
  • FIG. 8 schematically shows an architecture of a centralized teleconferencing system, with an audio bridge between a plurality of terminals;
  • FIG. 9 schematically represents a processing, within the meaning of the present invention, of spatialization of (N-1) coded audio sources among N sources at the input of an audio bridge of a system according to FIG. 8, carried out at the level of this audio bridge, according to a variant of the system of the figure.
  • FIG. 1 shows a conventional "two-channel" binaural synthesis processing.
  • This processing consists in filtering the signal of the sources (Si), which one wishes to place at a position chosen in space, by the left (HRTF_l) and right (HRTF_r) acoustic transfer functions corresponding to the appropriate direction (θi, φi).
  • Two signals are obtained which are then added to the left and right signals resulting from the spatialization of other sources, to give the global signals L and R broadcast to the left and right ears of a listener.
  • the number of filters required is then 2.N for a static binaural synthesis and 4.N for a dynamic binaural synthesis, N being the number of audio streams to be spatialized.
  • each HRTF filter is first broken down into a minimum phase filter, characterized by its module, and into a pure delay τi.
  • the spatial and frequency dependencies of the modules of the HRTFs are separated thanks to a linear decomposition.
  • These modules of the HRTF transfer functions are then written as a sum of spatial functions C n (θ, φ) and reconstruction filters L n (f), as expressed below: |H(θ, φ, f)| = Σ n=1..P C n (θ, φ) · L n (f) (Eq [1]).
  • These coefficients have the particularity of depending only on the position (θ, φ) where one wishes to place the source, and not on the frequency f. The number of these coefficients depends on the number P of basis vectors that has been kept for reconstruction.
  • the N signals from all the sources, weighted by the "directional" coefficients C n i, are then added (for the right channel and the left channel separately), then filtered by the filter corresponding to the nth basis vector.
  • the addition of an additional source does not require the addition of two additional filters (often FIR or IIR type).
  • the P basis filters are in fact shared by all the sources present. This implementation is called "multi-channel".
  • the coefficients C n i correspond to the directional coefficients for the source i at the position (θi, φi) and for the reconstruction filter n. They are noted C for the left channel (L) and D for the right channel (R). It is indicated that the principle of processing the right channel R is the same as that of the left channel L. However, the dotted arrows for the processing of the right channel have not been represented, for the sake of clarity of the drawing. Between the two vertical broken lines in FIG. 2, a system denoted I, of the type represented in FIG. 3, is then defined.
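As a sketch of this "multi-channel" implementation, assuming P shared basis filters Ln, Rn and per-source directional gains C[n][i], D[n][i] (the basis filters below are random placeholders; in practice they come from the linear decomposition of the HRTF moduli):

```python
import numpy as np

def binaural_multichannel(sources, C, D, Ln, Rn):
    """"Multi-channel" binaural synthesis: the sources are first mixed with
    direction-dependent scalar gains C[n][i] (left) and D[n][i] (right);
    only the P shared basis filters Ln[n], Rn[n] are then applied, so
    adding a source adds gains but no additional filter."""
    P = len(Ln)
    n_out = len(sources[0]) + len(Ln[0]) - 1
    L = np.zeros(n_out)
    R = np.zeros(n_out)
    for n in range(P):
        mix_l = sum(C[n][i] * s for i, s in enumerate(sources))
        mix_r = sum(D[n][i] * s for i, s in enumerate(sources))
        L += np.convolve(mix_l, Ln[n])  # one filter per basis vector, not per source
        R += np.convolve(mix_r, Rn[n])
    return L, R

# Placeholder basis filters and directional gains (P = 2, N = 3)
rng = np.random.default_rng(0)
sources = [rng.standard_normal(64) for _ in range(3)]
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 3))
Ln = [rng.standard_normal(16) for _ in range(2)]
Rn = [rng.standard_normal(16) for _ in range(2)]
L, R = binaural_multichannel(sources, C, D, Ln, Rn)
```

By linearity of convolution, the result is identical to filtering each source by its recombined HRTF, while the filtering cost stays fixed at P filters per output channel.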
  • a first method is based on a so-called Karhunen-Loève decomposition and is described in particular in document WO94/10816.
  • Another method is based on the principal component analysis of HRTFs and is described in WO96/13962. The more recent document FR-2782228 also describes such an implementation.
  • a step of decoding the N signals is necessary before the spatialization processing proper.
  • This step requires considerable computing resources (which is problematic on current communication terminals, in particular of portable type). Furthermore, this step causes a delay on the processed signals, which affects the interactivity of the communication. If the transmitted sound scene comprises a large number of sources (N), the decoding step may in fact become more costly in computing resources than the sound spatialization step proper. In fact, as indicated above, the cost of calculating the binaural "multi-channel" synthesis depends very little on the number of sound sources to be spatialized.
  • the spatialization of N sound sources (forming, for example, part of a complex MPEG-4 type sound scene) therefore requires: - a complete decoding of the N audio sources S1, ..., Si, ..., SN encoded at the input of the represented system (noted "System I") to obtain N decoded audio streams, corresponding for example to PCM signals (for "Pulse Code Modulation"), - a spatialization processing in the time domain ("System T") to obtain two spatialized signals L and R,
  • the decoding of the N coded streams is necessary before the stage of spatialization of the sound sources, which leads to an increase in the cost of calculation and the addition of a delay due to the processing of the decoder. It is noted that the initial audio sources are generally stored directly in coded format in current content servers.
  • the number of signals resulting from the spatialization processing is generally greater than two, which further increases the cost of calculation to completely recode these signals before their transmission by the communication network.
  • Reference is now made to FIG. 4 to describe an implementation of the method within the meaning of the present invention.
  • this operation mainly consists in recovering the parameters of the sub-bands from the coded binary audio stream. This operation depends on the initial encoder used. It can consist, for example, of an entropy decoding followed by an inverse quantization as in an MPEG-1 Layer III coder. Once these parameters of the sub-bands have been found, the processing is carried out in the domain of the sub-bands, as will be seen below.
  • the overall calculation cost of the spatialization operation of the coded audio streams is then considerably reduced. Indeed, the initial decoding operation in a conventional system is replaced by a partial decoding operation of much lower complexity.
  • the computing load in a system within the meaning of the invention becomes substantially constant as a function of the number of audio streams that it is desired to spatialize. Compared to conventional systems, a gain is obtained in terms of computation cost which then becomes proportional to the number of audio streams that one wishes to spatialize.
  • the partial decoding operation results in a lower processing time than the full decoding operation, which is particularly interesting in an interactive communication context.
  • The system for implementing the method according to the invention, performing the spatialization in the sub-band domain, is denoted "System II" in FIG. 4.
  • the binaural transfer functions or HRTFs are accessible in the form of temporal impulse responses. These functions generally consist of 256 time samples, at a sampling frequency of 44.1 kHz (typical in the audio field). These impulse responses can come from measurements or acoustic simulations.
  • the pre-processing steps for obtaining the parameters in the sub-band domain are preferably the following:
  • G is a matrix of filters.
  • the directional coefficients C n i, D n i to be applied in the domain of the sub-bands are scalars of the same values as the C n i and D n i respectively in the time domain;
  • the filter matrices Gi, applied independently to each source, "integrate" the conventional delay operation adding the interaural delay between a signal Li and a signal Ri to be reproduced.
  • these filter matrices thus replace the delay lines τi of FIG. 2.
  • the dependency relationship between the aliasing components of the different sub-bands is preferably preserved during the filtering operation so that their removal is ensured by the bank of synthesis filters.
  • critical sampling means that the total number of output samples of the sub-bands corresponds to the number of input samples. This filter bank is also assumed to satisfy the perfect reconstruction condition.
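These two properties (critical sampling and perfect reconstruction) can be checked on a minimal MDCT (TDAC) filter bank with a sine window, a common perfect-reconstruction construction; this sketch is an illustration, not the patent's exact bank:

```python
import numpy as np

M = 8                                                 # number of sub-bands
n = np.arange(2 * M)
w = np.sin(np.pi / (2 * M) * (n + 0.5))               # sine (Princen-Bradley) window
k = np.arange(M)[:, None]
# MDCT kernel: M x 2M cosine modulation matrix
C = np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct_analysis(x):
    """Critically sampled analysis: one block of M sub-band samples
    per M input samples (frames of 2M samples, hop of M)."""
    frames = [x[t:t + 2 * M] for t in range(0, len(x) - M, M)]
    return np.array([C @ (w * f) for f in frames])

def mdct_synthesis(X, n_samples):
    """Windowed inverse transform and overlap-add; the time-domain
    aliasing of adjacent frames cancels (perfect reconstruction)."""
    y = np.zeros(n_samples)
    for i, Xf in enumerate(X):
        y[i * M:i * M + 2 * M] += w * (2.0 / M) * (C.T @ Xf)
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(16 * M)
X = mdct_analysis(x)
y = mdct_synthesis(X, len(x))
# interior samples are perfectly reconstructed (the edges lack overlap)
```

The cancellation of aliasing between adjacent frames is exactly the dependency relationship between sub-band aliasing components that the filtering operation must preserve.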
  • the complete filtering matrix is then calculated in sub-bands by the following formula:
  • K = (L/M) - 1 (characterizing the bank of filters used), L being the length of the analysis and synthesis filters of the filter banks used.
  • δ corresponds to the number of bands which sufficiently overlap, on one side, the bandwidth of a filter of the filter bank. It therefore depends on the type of filter banks used in the chosen coding. For example, for the MDCT filter bank, δ can be taken equal to 2 or 3. For the Pseudo-QMF filter bank of MPEG-1 coding, δ is taken equal to 1.
  • the result of this transposition of a finite or infinite impulse response filter to the domain of the subbands is a matrix of filters of size MxM.
  • the filters of the main diagonal and of a few adjacent sub-diagonals can be used to obtain a result similar to that obtained by filtering in the time domain (without thereby altering the quality of the reproduction).
  • the matrix S sb (z) resulting from this transposition, then reduced, is that used for the filtering in sub-bands.
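Applying such a matrix of sub-band filters, reduced to its main diagonal and a few adjacent sub-diagonals, can be sketched as follows (the filter contents and the helper name are illustrative, not taken from the patent):

```python
import numpy as np

def apply_subband_matrix(X, S_sb, delta):
    """X: (M, T) array of sub-band signals, one row per sub-band.
    S_sb[m][k]: short FIR filter mapping input sub-band k to output
    sub-band m.  Only the main diagonal and the delta adjacent
    sub-diagonals are applied (band-diagonal reduction)."""
    M, T = X.shape
    taps = len(S_sb[0][0])
    Y = np.zeros((M, T + taps - 1))
    for m in range(M):
        for k in range(max(0, m - delta), min(M, m + delta + 1)):
            Y[m] += np.convolve(S_sb[m][k], X[k])  # cross-band FIR filtering
    return Y

# Toy example: a purely diagonal matrix of filters acts per sub-band
rng = np.random.default_rng(0)
M, T = 4, 32
X = rng.standard_normal((M, T))
S_sb = [[np.array([1.0, 0.0]) if m == k else np.zeros(2)
         for k in range(M)] for m in range(M)]
Y = apply_subband_matrix(X, S_sb, delta=1)
```

The off-diagonal terms are what preserve the aliasing relationships between neighbouring sub-bands; dropping all but δ sub-diagonals trades a small accuracy loss for a large reduction in filtering cost.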
  • the expression of the polyphase matrices E(z) and R(z) is indicated below for an MDCT filter bank, as used in MPEG-2/4 AAC, Dolby AC-2 & AC-3, or TDAC coders of the Applicant.
  • the following processing can also be adapted to a Pseudo-QMF type filter bank of the MPEG-1/2 Layer I-II coder.
  • R(z) = J M T 0 + J M T 1 z^-1, where J M corresponds to the anti-identity matrix of size M×M and T 0 and T 1 are matrices of size M×M resulting from the following partition:
  • the polyphase analysis matrix is then expressed as follows:
  • the values of the window (-1)^l h(2lM + k) are typically provided, with 0 ≤ k ≤ 2M - 1, 0 ≤ l ≤ m - 1.
  • partial decoding of N audio sources S1, ..., Si, ..., SN coded in compression is carried out, to obtain signals S1, ..., Si, ..., SN preferably corresponding to signal vectors whose coefficients are values each assigned to a sub-band.
  • "partial decoding" is understood to mean a processing which makes it possible to obtain, from the signals coded in compression, such signal vectors in the domain of the sub-bands. Position information can also be obtained, from which the respective gain values Gi are deduced.
  • the spatialization processing is carried out in a server connected to a communication network.
  • these signal vectors L and R can be completely recoded in compression to broadcast the compressed signals L and R (left and right channels) in the communication network and intended for the restitution terminals.
  • an initial step of partial decoding of the coded signals Si is provided, before the spatialization processing.
  • this step is much less expensive and faster than the complete decoding operation which was necessary in the prior art ( Figure 3).
  • the L and R signal vectors are already expressed in the sub-band domain, and the partial recoding of FIG. 4 to obtain the signals L and R coded in compression is faster and less costly than a complete coding such as that shown in FIG. 3. It is indicated that the two vertical broken lines in FIG. 5 delimit the spatialization processing carried out in "System II" of FIG. 4.
  • the present invention also relates to such a system comprising means for processing the partially decoded signals Si, for the implementation of the method according to the invention.
  • This last document presents a method for transposing a finite impulse response (FIR) filter in the sub-band domain of pseudo-QMF filter banks of the MPEG-1 Layer I-II coder and MDCT of the MPEG-2/4 coder AAC.
  • the equivalent filtering operation in the sub-band domain is represented by a matrix of FIR filters.
  • this proposal falls within the context of a transposition of HRTF filters directly in their classical form, and not in the form of a linear decomposition as expressed by equation Eq [1] above and on a basis of filters within the meaning of the invention.
  • a drawback of the method within the meaning of this last document consists in that the spatialization processing cannot be adapted to any number of sources or encoded audio streams to be spatialized.
  • each HRTF filter (of order 200 for an FIR and of order 12 for an IIR) gives rise to a matrix of filters (square) of dimension equal to the number of sub-bands of the bank of filters used.
  • an adaptation of a linear decomposition of HRTFs in the sub-band domain does not present this problem, since the number (P) of matrices of basis filters L n and R n is much smaller.
  • These matrices are then permanently stored in a memory (of the content server or of the playback terminal) and allow simultaneous spatialization processing of any number of sources, as shown in FIG. 5.
  • a generalization of the spatialization processing in the sense of FIG. 5 is described below to other processing of sound rendering, such as a processing called "ambisonic encoding".
  • a sound rendering system can generally be in the form of a real or virtual sound recording system (for a simulation) consisting of an encoding of the sound field. This phase consists in recording p sound signals in a real way or in simulating such signals (virtual encoding) corresponding to the whole of a sound scene comprising all the sounds, as well as a room effect.
  • the aforementioned system can also be in the form of a sound rendering system consisting in decoding the signals coming from the sound pickup to adapt them to the sound rendering transducer devices (such as a plurality of loudspeakers or a stereo headset).
  • the p signals are transformed into n signals which supply the n loudspeakers.
  • binaural synthesis consists in taking a real sound recording, using a pair of microphones introduced into the ears of a human head (artificial or real).
  • N audio streams Sj represented in the sub-band domain after partial decoding undergo spatialization processing, for example ambisonic encoding, to deliver p signals Ei encoded in the sub-band domain.
  • spatialization processing therefore respects the general case governed by the equation Eq [2] above.
  • the application to the signals Sj of the matrix of filters Gi (to define the interaural delay ITD) is no longer necessary here, in the ambisonic context.
  • the filters K j i (f) are fixed and depend, at constant frequency, only on the sound rendering system and its arrangement with respect to a listener. This situation is shown in Figure 6 (to the right of the vertical dotted line), in the example of the ambisonic context.
  • the Ei signals spatially encoded in the sub-band domain are completely recoded in compression, transmitted in a communication network, recovered in a rendering terminal, and partially decoded in compression to obtain a representation in the sub-band domain.
  • an encoding format with three signals W, X, Y for p sound sources is expressed, for encoding, by:
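The coefficient tables of the patent are not reproduced in this text; as an illustration, the conventional first-order horizontal ambisonic (B-format) encoding combines each source with gains that depend only on its azimuth θi. The 1/√2 factor on W is a common convention, assumed here rather than taken from the patent:

```python
import numpy as np

def ambisonic_encode_2d(sources, azimuths):
    """First-order horizontal ambisonic (B-format) encoding of p sources:
    W = sum s_i / sqrt(2), X = sum s_i*cos(theta_i), Y = sum s_i*sin(theta_i).
    The 1/sqrt(2) weighting of W is a common convention (an assumption here)."""
    W = sum(s / np.sqrt(2.0) for s in sources)
    X = sum(s * np.cos(th) for s, th in zip(sources, azimuths))
    Y = sum(s * np.sin(th) for s, th in zip(sources, azimuths))
    return W, X, Y
```

As with the binaural case, the encoding gains are scalars per source and per channel, so the same sub-band signal vectors can be combined directly in the coded domain.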
  • Table I: values of the coefficients defining the filters K j i (f) for 0 ≤ f ≤ f 1
  • Table II: values of the coefficients defining the filters K j i (f) for f 1 ≤ f ≤ f 2
  • coded signals (Si) emanate from N remote terminals. They are spatialized at the level of the teleconference server (for example at the level of an audio bridge for a star architecture as represented in FIG. 8), for each participant. This step, carried out in the sub-band domain after a partial decoding phase, is followed by a partial recoding.
  • the signals thus coded in compression are then transmitted via the network and, upon reception by a rendering terminal, are decoded completely in compression and applied to the two left and right channels 1 and r, respectively, of the rendering terminal, in the case of binaural spatialization.
  • the decoding processing in compression thus makes it possible to deliver two left and right time signals which contain the position information of the N distant speakers and which supply two respective loudspeakers (a headset with two earpieces).
  • m channels can be recovered at the output of the communication server, if the encoding / decoding in spatialization are carried out by the server.
  • This spatialization can be static or dynamic and, moreover, interactive. Thus, the position of the speakers is fixed or may vary over time. If the spatialization is not interactive, the position of the different speakers is fixed: the listener cannot modify it. On the other hand, if the spatialization is interactive, each listener can configure their terminal to position the voices of the N other speakers where they wish, substantially in real time.
  • the reproduction terminal receives N audio streams (Si) coded in compression (MPEG, AAC, or other) from a communication network.
  • After a partial decoding to obtain the signal vectors (Si), the terminal ("System II") processes these signal vectors to spatialize the audio sources, here in binaural synthesis, into two signal vectors L and R, which are then applied to synthesis filter banks for decoding in compression.
  • the left and right PCM signals, respectively l and r, resulting from this decoding are then intended to directly supply the loudspeakers.
  • This type of processing advantageously adapts to a decentralized teleconferencing system (several terminals connected in point-to-point mode).
  • This scene can be simple, or even complex as often in the context of MPEG-4 transmissions where the sound scene is transmitted in a structured format.
  • the client terminal receives, from a multimedia server, a multiplex bit stream corresponding to each of the coded primitive audio objects, as well as instructions as to their composition for reconstructing the sound scene.
  • "Audio object” means an elementary bit stream obtained by an MPEG-4 Audio coder.
  • the MPEG-4 System standard provides a special format, called "AudioBIFS" (for "BInary Format for Scene description”), in order to transmit these instructions.
  • the role of this format is to describe the spatiotemporal composition of audio objects.
  • these different decoded streams can undergo further processing.
  • a sound spatialization processing step can be carried out.
  • the manipulations to be performed are represented by a graph.
  • the decoded audio signals at the input of the graph are provided.
  • Each node of the graph represents a type of processing to be carried out on an audio signal.
  • the various sound signals are provided at the output of the graph to be restored or to be associated with other media objects (images or other).
  • transform coders are used mainly for high quality audio transmission (monophonic and multi-channel). This is the case for AAC and TwinVQ coders based on the MDCT transform.
  • In a receiving MPEG-4 terminal, it then suffices to integrate the low decoding layer at the nodes of the upper layer which provide specific processing, such as binaural spatialization by HRTF filters.
  • the nodes of the "AudioBIFS" graph which involve binaural spatialization can be treated directly in the field of sub-bands (MDCT for example).
  • MDCT sub-bands
  • the processing of the signals for spatialization can only be carried out at the audio bridge.
  • the terminals TER1, TER2, TER3 and TER4 receive flows already mixed and therefore no processing can be carried out at their level for spatialization.
  • the audio bridge must carry out a spatialization of the speakers coming from the terminals for each of the N subsets made up of (N-1) speakers among the N participating in the conference. Processing in the coded domain naturally brings all the more benefit.
  • FIG. 9 schematically represents the processing system provided in the audio bridge. This processing is thus carried out on a subset of (N-1) coded audio signals among the N at the input of the bridge.
  • the left and right coded audio frames in the case of binaural spatialization, or the m coded audio frames in the case of a general spatialization (for example in ambisonic encoding) as represented in FIG. 9, which result from this processing are thus transmitted to the remaining terminal which participates in the teleconference but which is not included in this subset (corresponding to an "audio terminal").
  • N processing operations of the type described above are carried out in the audio bridge (N subsets of (Nl) coded signals). It is indicated that the partial coding in FIG.
  • the position of the sound source to be spatialized can vary over time, which amounts to varying over time the directional coefficients of the domain of the subbands n i and D ⁇ .
  • the variation of the value of these coefficients is preferably done in a discrete manner.
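The bridge processing described above can be sketched as follows. This is a minimal illustration, not the patented method: the patent applies matrices of subband filters (derived from HRTFs or ambisonic encoding) to MDCT-domain frames, whereas this sketch stands in simple per-source scalar gains (`gains`, hypothetical values) for those filters, and toy per-subband frames for the coded signals. It shows only the combinatorial structure: for each of the N participants, the bridge mixes the other (N−1) spatialized streams.

```python
# Hedged sketch of the audio-bridge mixing structure: for each of the N
# conference participants, spatialize and sum the other N-1 sources
# directly on subband-domain frames. The per-source (gL, gR) gain pairs
# below are hypothetical stand-ins for the patent's subband filter
# matrices; the frame contents are toy data.

N = 4          # conference participants
SUBBANDS = 8   # subband coefficients per frame (stand-in for MDCT bins)

# One subband-domain frame per participant (toy data: constant values).
frames = [[float(k + 1)] * SUBBANDS for k in range(N)]

# Hypothetical left/right directional gains, one pair per source position.
gains = [(0.9, 0.1), (0.6, 0.4), (0.4, 0.6), (0.1, 0.9)]

def bridge_mix(listener):
    """Mix the (N-1) other sources into left/right subband frames."""
    left = [0.0] * SUBBANDS
    right = [0.0] * SUBBANDS
    for src in range(N):
        if src == listener:
            continue  # a participant never receives its own signal
        gL, gR = gains[src]
        for b in range(SUBBANDS):
            left[b] += gL * frames[src][b]
            right[b] += gR * frames[src][b]
    return left, right

# N such mixes are computed, one per participant (N subsets of N-1 signals).
outputs = [bridge_mix(k) for k in range(N)]
```

Varying the source positions over time, as the last bullets describe, would amount to updating the entries of `gains` at discrete instants.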

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Stereophonic System (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP04712070A 2003-02-27 2004-02-18 Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe Expired - Lifetime EP1600042B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR0302397 2003-02-27
FR0302397A FR2851879A1 (fr) 2003-02-27 2003-02-27 Procede de traitement de donnees sonores compressees, pour spatialisation.
PCT/FR2004/000385 WO2004080124A1 (fr) 2003-02-27 2004-02-18 Procede de traitement de donnees sonores compressees, pour spatialisation

Publications (2)

Publication Number Publication Date
EP1600042A1 true EP1600042A1 (de) 2005-11-30
EP1600042B1 EP1600042B1 (de) 2006-08-09

Family

ID=32843028

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04712070A Expired - Lifetime EP1600042B1 (de) 2003-02-27 2004-02-18 Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe

Country Status (7)

Country Link
US (1) US20060198542A1 (de)
EP (1) EP1600042B1 (de)
AT (1) ATE336151T1 (de)
DE (1) DE602004001868T2 (de)
ES (1) ES2271847T3 (de)
FR (1) FR2851879A1 (de)
WO (1) WO2004080124A1 (de)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100606734B1 (ko) 2005-02-04 2006-08-01 엘지전자 주식회사 삼차원 입체음향 구현 방법 및 그 장치
DE102005010057A1 (de) * 2005-03-04 2006-09-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Erzeugen eines codierten Stereo-Signals eines Audiostücks oder Audiodatenstroms
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8194880B2 (en) 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US9185487B2 (en) 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
KR100754220B1 (ko) 2006-03-07 2007-09-03 삼성전자주식회사 Mpeg 서라운드를 위한 바이노럴 디코더 및 그 디코딩방법
EP1994526B1 (de) * 2006-03-13 2009-10-28 France Telecom Gemeinsame schallsynthese und -spatialisierung
EP1994796A1 (de) * 2006-03-15 2008-11-26 Dolby Laboratories Licensing Corporation Binaurales rendering mit subbandfiltern
FR2899423A1 (fr) * 2006-03-28 2007-10-05 France Telecom Procede et dispositif de spatialisation sonore binaurale efficace dans le domaine transforme.
US8266195B2 (en) * 2006-03-28 2012-09-11 Telefonaktiebolaget L M Ericsson (Publ) Filter adaptive frequency resolution
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US8150065B2 (en) 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
US20080273708A1 (en) * 2007-05-03 2008-11-06 Telefonaktiebolaget L M Ericsson (Publ) Early Reflection Method for Enhanced Externalization
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
JP2009128559A (ja) * 2007-11-22 2009-06-11 Casio Comput Co Ltd 残響効果付加装置
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
KR101496760B1 (ko) * 2008-12-29 2015-02-27 삼성전자주식회사 서라운드 사운드 가상화 방법 및 장치
US8639046B2 (en) * 2009-05-04 2014-01-28 Mamigo Inc Method and system for scalable multi-user interactive visualization
CN102577441B (zh) * 2009-10-12 2015-06-03 诺基亚公司 用于音频处理的多路分析
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US8786852B2 (en) 2009-12-02 2014-07-22 Lawrence Livermore National Security, Llc Nanoscale array structures suitable for surface enhanced raman scattering and methods related thereto
US8718290B2 (en) 2010-01-26 2014-05-06 Audience, Inc. Adaptive noise reduction using level cues
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US9378754B1 (en) 2010-04-28 2016-06-28 Knowles Electronics, Llc Adaptive spatial classifier for multi-microphone systems
US9395304B2 (en) 2012-03-01 2016-07-19 Lawrence Livermore National Security, Llc Nanoscale structures on optical fiber for surface enhanced Raman scattering and methods related thereto
US9491299B2 (en) * 2012-11-27 2016-11-08 Dolby Laboratories Licensing Corporation Teleconferencing using monophonic audio mixed with positional metadata
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
FR3009158A1 (fr) * 2013-07-24 2015-01-30 Orange Spatialisation sonore avec effet de salle
DE102013223201B3 (de) * 2013-11-14 2015-05-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Verfahren und Vorrichtung zum Komprimieren und Dekomprimieren von Schallfelddaten eines Gebietes
CN107112025A (zh) 2014-09-12 2017-08-29 美商楼氏电子有限公司 用于恢复语音分量的***和方法
US10249312B2 (en) * 2015-10-08 2019-04-02 Qualcomm Incorporated Quantization of spatial vectors
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
US10598506B2 (en) * 2016-09-12 2020-03-24 Bragi GmbH Audio navigation using short range bilateral earpieces
FR3065137B1 (fr) 2017-04-07 2020-02-28 Axd Technologies, Llc Procede de spatialisation sonore

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5583962A (en) * 1991-01-08 1996-12-10 Dolby Laboratories Licensing Corporation Encoder/decoder for multidimensional sound fields
KR100206333B1 (ko) * 1996-10-08 1999-07-01 윤종용 두개의 스피커를 이용한 멀티채널 오디오 재생장치및 방법
US7116787B2 (en) * 2001-05-04 2006-10-03 Agere Systems Inc. Perceptual synthesis of auditory scenes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2004080124A1 *

Also Published As

Publication number Publication date
ES2271847T3 (es) 2007-04-16
DE602004001868D1 (de) 2006-09-21
EP1600042B1 (de) 2006-08-09
FR2851879A1 (fr) 2004-09-03
WO2004080124A1 (fr) 2004-09-16
ATE336151T1 (de) 2006-09-15
DE602004001868T2 (de) 2007-03-08
US20060198542A1 (en) 2006-09-07

Similar Documents

Publication Publication Date Title
EP1600042B1 (de) Verfahren zum bearbeiten komprimierter audiodaten zur räumlichen wiedergabe
EP2042001B1 (de) Binaurale spatialisierung kompressionsverschlüsselter tondaten
EP2374123B1 (de) Verbesserte codierung von mehrkanaligen digitalen audiosignalen
JP5090436B2 (ja) 変換ドメイン内で効率的なバイノーラルサウンド空間化を行う方法およびデバイス
EP2143102B1 (de) Verfahren zur audiokodierung und -dekodierung, audiokodierer, audiodekodierer und zugehörige computerprogramme
WO2007101958A2 (fr) Optimisation d'une spatialisation sonore binaurale a partir d'un encodage multicanal
EP2374124A1 (de) Verwaltete codierung von mehrkanaligen digitalen audiosignalen
FR2875351A1 (fr) Procede de traitement de donnees par passage entre domaines differents de sous-bandes
EP2005420A1 (de) Einrichtung und verfahren zur codierung durch hauptkomponentenanalyse eines mehrkanaligen audiosignals
EP1695335A1 (de) Verfahren zum synthetisieren akustischer spazialisierung
EP2319037A1 (de) Rekonstruktion von mehrkanal-audiodaten
EP3935629A1 (de) Räumliche audiocodierung mit interpolation und quantifizierung von drehungen
EP3025514B1 (de) Klangverräumlichung mit raumwirkung
EP1994526B1 (de) Gemeinsame schallsynthese und -spatialisierung
WO2006075079A1 (fr) Procede d’encodage de pistes audio d’un contenu multimedia destine a une diffusion sur terminaux mobiles
Touimi et al. Efficient method for multiple compressed audio streams spatialization
EP4042418B1 (de) Bestimmung von korrekturen zur anwendung auf ein mehrkanalaudiosignal, zugehörige codierung und decodierung
WO2022003275A1 (fr) Codage optimise d'une information representative d'une image spatiale d'un signal audio multicanal
Pernaux Efficient Method for Multiple Compressed Audio Streams Spatialization

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20050825

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIN1 Information on inventor provided before grant (corrected)

Inventor name: PERNAUX, JEAN-MARIE

Inventor name: BENJELLOUN TOUIMI, ABDELLATIF

Inventor name: EMERIT, MARC

DAX Request for extension of the european patent (deleted)
GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

Effective date: 20060809

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REF Corresponds to:

Ref document number: 602004001868

Country of ref document: DE

Date of ref document: 20060921

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20061109

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20061109

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20061109

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20070109

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20061220

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070228

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2271847

Country of ref document: ES

Kind code of ref document: T3

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070510

BERE Be: lapsed

Owner name: FRANCE TELECOM

Effective date: 20070228

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070228

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20061110

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20080229

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20080229

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070218

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060809

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20070210

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 12

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 14

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 15

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230119

Year of fee payment: 20

Ref country code: ES

Payment date: 20230301

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20230120

Year of fee payment: 20

Ref country code: GB

Payment date: 20230121

Year of fee payment: 20

Ref country code: DE

Payment date: 20230119

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 602004001868

Country of ref document: DE

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20240226

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20240217

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20240219

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20240219

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20240217