EP2268064A1 - Device and method for converting spatial audio signal - Google Patents

Device and method for converting spatial audio signal

Info

Publication number
EP2268064A1
EP2268064A1 (Application EP09163760A)
Authority
EP
European Patent Office
Prior art keywords
audio
virtual loudspeaker
frequency
input signal
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP09163760A
Other languages
German (de)
French (fr)
Inventor
Svein Berge
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Berges Allmenndigitale Radgivningstjeneste
Original Assignee
Berges Allmenndigitale Radgivningstjeneste
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Berges Allmenndigitale Radgivningstjeneste filed Critical Berges Allmenndigitale Radgivningstjeneste
Priority to EP09163760A (EP2268064A1)
Priority to EP10167042.0A (EP2285139B1)
Priority to ES10167042.0T (ES2690164T3)
Priority to PL10167042T (PL2285139T3)
Priority to US12/822,015 (US8705750B2)
Publication of EP2268064A1
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/07 Synergistic effects of band splitting and sub-band processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S3/004 For headphones

Definitions

  • the audio processor may comprise a binaural synthesizer unit arranged to generate first and second audio output signals by applying Head-Related Transfer Functions to each of the virtual loudspeaker signals.
  • such audio processor may be implemented by a decoding matrix corresponding to the determined virtual loudspeaker positions and a transfer function matrix corresponding to the Head-Related Transfer Functions being combined into an output transfer matrix prior to being applied to the audio input signals.
  • a smoothing may be performed on transfer functions of such output transfer matrix prior to being applied to the input signals, which will serve to improve reproduction of transient sounds.
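  • By way of illustration, the equivalence that makes this combination possible can be sketched in Python; the matrix shapes and random values below are illustrative only and not taken from the patent:

```python
import numpy as np

# Illustrative shapes: B-format input (4 channels), 4 virtual loudspeakers,
# binaural output (2 channels), all for a single frequency band.
rng = np.random.default_rng(1)
decode = rng.standard_normal((4, 4))   # virtual loudspeakers x input channels
hrtf = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))  # ears x loudspeakers

# Combining the two matrices once per band gives a 2x4 output transfer
# matrix that can be smoothed before being applied to the input spectra ...
combined = hrtf @ decode

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # one band of W, X, Y, Z
out_combined = combined @ x

# ... and applying it is equivalent to decoding to loudspeaker signals
# first and then applying the HRTFs.
out_two_step = hrtf @ (decode @ x)
```

  Because the combined 2x4 matrix fully describes the band's input-to-output mapping, smoothing can be applied to it as a unit, which the two-step order would not allow.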
  • the phase of the Head-Related Transfer Functions is differentiated with respect to frequency, and after combining components of Head-Related Transfer Functions corresponding to different directions, the phase of the combined transfer functions is integrated with respect to frequency. This serves to preserve the group delay of the Head-Related Transfer Functions. Even more specifically, the phase of the Head-Related Transfer Functions may be differentiated with respect to frequency at frequencies above a frequency limit only, such as above 1.8 kHz, and after combining components of Head-Related Transfer Functions corresponding to different directions, the phase of the combined transfer functions is integrated with respect to frequency at frequencies above the frequency limit.
  • the phase of the Head-Related Transfer Functions may be left unaltered below a first frequency limit, such as below 1.6 kHz, and differentiated with respect to frequency at frequencies above a second frequency limit with a higher frequency than the first frequency limit, such as 2.0 kHz, and with a gradual transition in between, and after combining components of HRTFs corresponding to different directions, the inverse operation is applied to the combined function.
  • the audio input signal is preferably a multi-channel audio signal arranged for decomposition into plane wave components.
  • the input signal may be one of: a B-format sound field signal, a higher-order ambisonics recording, a stereo recording, and a surround sound recording.
  • the invention provides a device comprising an audio processor according to the first aspect.
  • the device may be one of: a device for recording sound or video signals, a device for playback of sound or video signals, a portable device, a computer device, a video game device, a hi-fi device, an audio converter device, and a headphone unit.
  • the invention provides a method for converting a multi-channel audio input signal comprising at least two channels, such as a B-format Sound Field signal, into a set of audio output signals, such as a set of two audio output signals arranged for headphone reproduction, the method comprising
  • the method may be implemented in pure software, e.g. in the form of a generic code or in the form of a processor specific executable code. Alternatively, the method may be implemented partly in specific analog and/or digital electronic components and partly in software. Still alternatively, the method may be implemented in a single dedicated chip.
  • Fig. 1 shows an audio processor with its basic components according to the invention.
  • Input to the audio processor is a multi-channel audio signal.
  • This signal is split into a plurality of frequency bands in a filter bank, e.g. in the form of an FFT analysis performed on each of the plurality of channels.
  • Sound source separation SSS is then performed on the frequency-separated signal.
  • a plane wave expansion calculation PWE is performed on each frequency band in order to determine one or two dominant sound source directions.
  • the one or two dominant sound source directions are then applied to a virtual loudspeaker position calculation algorithm VLP serving to select a set of virtual sound source or virtual loudspeaker directions, e.g. such that at least one virtual loudspeaker direction coincides with a dominant sound source direction.
  • the input signal is transferred or decoded DEC according to a decoding matrix corresponding to the selected virtual loudspeaker directions, and optionally Head-Related Transfer Functions corresponding to the virtual loudspeaker directions are applied before the frequency components are finally combined in a summation unit SU to form a set of output signals, e.g. two output signals in case of a binaural implementation, or such as four, five, six, seven or even more output signals in case of conversion to a format suitable for reproduction through a surround sound set-up of loudspeakers.
  • the audio processor can be implemented in various ways, e.g. in the form of a processor forming part of a device, wherein the processor is provided with executable code to perform the invention.
  • Figs. 2 and 3 illustrate components of a preferred embodiment suited to convert an input signal having three-dimensional characteristics and being in an "ambisonic B-format".
  • the ambisonic B-format system is a very high quality sound positioning system which operates by breaking down the directionality of the sound into spherical harmonic components termed W, X, Y and Z.
  • the ambisonic system is then designed to utilize a plurality of output speakers to cooperatively recreate the original directional components.
  • a B-format signal is input having X, Y, Z and W components.
  • Each component of the B-format input set is processed through a corresponding filter bank 1-4 each of which divides the input into a number of output frequency bands (The number of bands being implementation dependent, typically in the range of 1024 to 4096).
  • Elements 5, 6, 7, 8 and 10 are replicated once for each frequency band, although only one of each is shown in Fig. 2 .
  • the four signals (one from each filter bank 1-4) are processed by a plane wave expansion element 5, which determines the smallest number of plane waves necessary to recreate the local sound field encoded in the four signals.
  • the plane wave expansion element also calculates the direction, phase and amplitude of these waves.
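  • The patent's plane wave expansion solves for up to two simultaneous plane waves; as a simplified sketch of the underlying idea only, a single dominant direction per band can be estimated from the cross-spectrum between the omni and velocity channels (this is not the patent's closed-form two-wave solution):

```python
import numpy as np

def dominant_direction(w, x, y, z):
    """Estimate a single dominant direction of arrival per frequency band.

    w, x, y, z are complex FFT coefficients of the B-format channels for
    one time frame (one value per band).  The estimate is the real part
    of the cross-spectrum between the omni and velocity channels,
    normalized to a unit vector per band.
    """
    v = np.stack([
        np.real(np.conj(w) * x),
        np.real(np.conj(w) * y),
        np.real(np.conj(w) * z),
    ])
    norm = np.linalg.norm(v, axis=0)
    norm[norm == 0] = 1.0
    return v / norm  # shape (3, n_bands): unit direction per band

# For a single plane wave from direction (dx, dy, dz), the velocity
# channels are (up to convention-dependent scaling) X = dx*W etc., so
# the estimate recovers the direction exactly.
w = np.array([1.0 + 1.0j, 2.0 - 0.5j])   # toy omni spectrum, two bands
d_true = np.array([0.6, 0.8, 0.0])
x, y, z = d_true[0] * w, d_true[1] * w, d_true[2] * w
d_est = dominant_direction(w, x, y, z)
```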
  • the input signal is denoted w, x, y, z, with subscripts r and i denoting real and imaginary parts.
  • Equation 5 gives zero, one or two real values for cos 2φ, corresponding to zero, one or two solutions to the equations.
  • Each value for cos 2φ corresponds to several possible values of φ, one in each quadrant, or the values 0 and π. Only one of these is correct.
  • the correct quadrant can be determined from equation 9 and the requirement that w 1 and w 2 should be positive.
  • If equation 5 gives no real solutions, more than two plane waves are necessary to reconstruct the local sound field. It may also be advantageous to use an alternative method when the matrix to invert in equation 4 is singular or nearly singular. When allowing for more than two plane waves, an infinite number of possible solutions exist. Since this alternative method is necessary only for a small part of most signals, the choice of solution is not critical. One possible choice is that of two plane waves travelling in the directions of the principal axes of the ellipse which is described by the time-dependent velocity vector associated with each frequency band.
  • the quadrant of φ can be determined from equation (18) and the requirement that w′1 and w′2 should be positive.
  • the output of 5 consists of the two vectors ⟨x1, y1, z1⟩ and ⟨x2, y2, z2⟩.
  • This output is connected to an element 6 which sorts these two vectors according to the value of their y element. In an alternative embodiment of the invention, only one of the two vectors is passed on from element 6. The choice can be that of the longest vector or the one with the highest degree of similarity with neighbouring vectors.
  • the output of 6 is connected to a smoothing element 7 which suppresses rapid changes in the direction estimates.
  • the output of 7 is connected to an element 8 which generates suitable transfer functions from each of the input signals to each of the output signals, a total of eight transfer functions. Each of these transfer functions is passed through a smoothing element 9.
  • This element suppresses large differences in phase and amplitude between neighbouring frequency bands and also suppresses rapid changes in phase and amplitude.
  • the output of 9 is passed to a matrix multiplier 10 which applies the transfer functions to the input signals and creates two output signals. Elements 11 and 12 sum each of the output signals from 10 across all filter bands to produce a binaural signal.
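  • The per-band matrix multiplication of element 10 and the summation of elements 11 and 12 can be sketched as follows; the frame layout and shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def render_binaural(bands_wxyz, transfer):
    """Apply a per-band 2x4 transfer matrix and sum across bands.

    bands_wxyz : complex array, shape (4, n_bands) - one frame of the
                 four filter-bank outputs (W, X, Y, Z).
    transfer   : complex array, shape (n_bands, 2, 4) - the per-band
                 input-to-output transfer functions from the generator.
    Returns the left/right time-domain frames; the inverse FFT plays
    the role of the summation across all filter bands.
    """
    # For each band b: spectra[:, b] = transfer[b] @ bands_wxyz[:, b]
    spectra = np.einsum('bij,jb->ib', transfer, bands_wxyz)
    left = np.fft.irfft(spectra[0])
    right = np.fft.irfft(spectra[1])
    return left, right

# Toy check: a transfer matrix that routes W to the left ear and X to
# the right ear reproduces exactly those channels.
rng = np.random.default_rng(0)
bands = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
transfer = np.zeros((5, 2, 4), dtype=complex)
transfer[:, 0, 0] = 1.0   # left ear hears only W
transfer[:, 1, 1] = 1.0   # right ear hears only X
left, right = render_binaural(bands, transfer)
```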
  • Referring to Fig. 3, there is illustrated schematically the preferred embodiment of the transfer matrix generator referenced in Fig. 2. While one transfer matrix generator is provided for each frequency band, elements 3, 4 and 5 are shared across all frequency bands. An element 1 generates two new vectors whose directions are chosen so as to maximize the angles between the four resulting vectors. In an alternative embodiment of the invention, only one vector is passed into the transfer matrix generator. In this case, element 1 must generate three new vectors, preferably such that the resulting four vectors point towards the vertices of a regular tetrahedron. This alternative approach is also beneficial in cases where the two input vectors are collinear or nearly collinear.
  • An element 5 stores a set of head-related transfer functions.
  • An element 3 alters the phase of these transfer functions, leaving the amplitude unchanged.
  • the reason for this transformation is that any uncertainty in the direction estimates translates to an uncertainty in the absolute phase of the head-related transfer functions which increases proportionally with frequency. Without this alteration to the transfer functions, too much noise would be added to the phase of high-frequency components in the signal, resulting in poor reproduction of transient sounds.
  • $\tilde H(f) = H(f)\,\dfrac{\overline{H\!\left(f-\delta(f)\right)}}{\left|H\!\left(f-\delta(f)\right)\right|}$, where $\delta(f) = \dfrac{\Delta f}{1+e^{(f_c-f)/f_0}}$
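  • As a sketch of the phase alteration performed by element 3, the following blends the original phase with its finite difference in frequency, leaving the amplitude unchanged; the limits, step size and blend shape are illustrative choices, not values taken from the patent:

```python
import numpy as np

def differentiate_hrtf_phase(h, freqs, f1=1600.0, f2=2000.0, df=50.0):
    """Blend an HRTF's phase with its finite difference in frequency.

    Below f1 the phase is left unaltered; above f2 it is replaced by the
    difference phi(f) - phi(f - df), a discrete derivative proportional
    to the group delay; in between, the two regimes are cross-faded.
    The amplitude is left unchanged, as stated for element 3.
    """
    phase = np.unwrap(np.angle(h))
    shifted = np.interp(freqs - df, freqs, phase)    # phi(f - df), clamped at 0
    w = np.clip((freqs - f1) / (f2 - f1), 0.0, 1.0)  # 0 below f1, 1 above f2
    w = 0.5 - 0.5 * np.cos(np.pi * w)                # gradual transition
    new_phase = (1.0 - w) * phase + w * (phase - shifted)
    return np.abs(h) * np.exp(1j * new_phase)

# For a pure delay tau, the phase is 2*pi*tau*f, so above f2 the modified
# phase becomes the constant 2*pi*tau*df.
freqs = np.arange(0.0, 22050.0, 10.0)
tau = 1e-4
h = np.exp(1j * 2 * np.pi * tau * freqs)
h_mod = differentiate_hrtf_phase(h, freqs)
```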
  • Element 2 uses the virtual loudspeaker directions to select and interpolate between the modified head-related transfer functions closest to the direction of each virtual loudspeaker. For each virtual loudspeaker, there are two head-related transfer functions; one for each ear, providing a total of eight transfer functions which are passed to element 4.
  • the outputs of elements 4 and 6 are multiplied in a matrix multiplication 7 to produce the suitable transfer matrix.
  • the decoding matrix is multiplied with the transfer function matrix before their product is multiplied with the input signals.
  • Alternatively, the input signals may first be multiplied with the decoding matrix and the product subsequently multiplied with the transfer function matrix.
  • However, this would preclude the possibility of smoothing the overall transfer functions. Such smoothing is advantageous for the reproduction of transient sounds.
  • the overall effect of the arrangement shown in Figs. 2 and 3 is to decompose the local sound field into a large number of plane waves and to pass these plane waves through corresponding head-related transfer functions in order to produce a binaural signal suited for headphone reproduction.
  • Fig. 4 illustrates a block diagram of an audio device with an audio processor according to the invention, e.g. the one illustrated in Figs. 2 and 3 .
  • the device may be a dedicated headphone unit, a general audio device offering the conversion of a multi-channel input signal to another output format as an option, or the device may be a general computer with a sound card provided with software suited to perform the conversion method according to the invention.
  • the device may be able to perform on-line conversion of the input signal, e.g. by receiving the multi-channel input audio signal in the form of a digital bit stream.
  • the device may generate the output signal in the form of an audio output file based on an audio file as input.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

An audio processor for converting a multi-channel audio input signal (X, Y, Z, W), such as a B-format Sound Field signal, into a set of audio output signals (L, R), such as a set of two audio output signals (L, R) arranged for headphone reproduction. A filter bank splits the input signal (X, Y, Z, W) into frequency bands. A sound source separation unit uses plane wave expansion on the input signal (X, Y, Z, W) to determine one or two dominant sound source directions. These are used to determine virtual loudspeaker positions selected such that one or both of the virtual loudspeaker positions coincide with one or both of the dominant directions. The input signal (X, Y, Z, W) is then decoded into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and finally the frequency components are combined in a summation unit to arrive at the set of audio output signals (L, R). E.g. Head-Related Transfer Functions (HRTFs) are applied to arrive at a binaural signal suited for headphone reproduction. A high spatial fidelity is obtained due to the coincidence of virtual loudspeaker positions and the determined dominant sound source direction(s). Improved performance can be obtained by differentiating the phase of a high-frequency part of the HRTFs with respect to frequency before combining the components of the HRTFs, followed by a corresponding integration of this part with respect to frequency after combining.

Description

    FIELD OF THE INVENTION
  • The invention relates to the field of audio signal processing. More specifically, the invention provides a processor and a method for converting a multi-channel audio signal, such as a sound field signal, into another type of multi-channel audio signal suited for playback via headphones or loudspeakers, while preserving spatial information in the original signal.
  • BACKGROUND OF THE INVENTION
  • The use of B-format measurements, recordings and playback in the provision of more ideal acoustic reproductions which capture part of the spatial characteristics of an audio reproduction is well known.
  • In the case of conversion of B-format signals to multiple loudspeakers in a loudspeaker array, there is a well recognized problem due to the spreading of individual virtual sound sources over a large number of playback speaker elements. In the case of binaural playback of B-format signals, the approximations inherent in the B-format sound field can lead to less precise localization of sound sources, and a loss of the out-of-head sensation that is an important part of the binaural playback experience.
  • In prior art, filter banks are used to split each component of the spatial sound field set into a set of frequency bands. The short-term correlation between the W (omni) channel and each of the three other channels is then used to estimate the direction of arrival of sound within each frequency band. The input signal is split into two parts: one consisting of the frequency bands where a clear direction of arrival was detected, and one consisting of the remainder of the frequency bands. The first part of the signal is processed through head-related transfer functions corresponding to the estimated direction of arrival in each frequency band. The second part is processed through a linear decoding matrix and a fixed set of head-related transfer functions corresponding to a virtual loudspeaker array.
  • US 6,628,787 by Lake Technology Ltd. describes a specific method for creating a multi-channel or binaural signal from a B-format sound-field signal. The sound-field signal is split into frequency bands, and in each band a direction factor is determined. Based on the direction factor, speaker drive signals are computed for each band by panning the signals to drive the nearest speaker. In addition, residual signal components are apportioned to the speaker signals by means of known decoding techniques.
  • There are several problems with the known methods. Firstly, the direction estimate is generally missing or incorrect when more than a single sound source emits sound at the same time and within the same frequency band. This leads to imprecise or incorrect localization when more than one sound source is present and when echoes interfere with the direct sound from a single source. Secondly, the use of head-related transfer functions from different directions in different frequency bands leads to a phase mismatch which increases proportionally with frequency. This in turn causes two further problems: the group delay of high-frequency input signals no longer corresponds to the group delay encoded in the head-related transfer functions, giving a wrong inter-aural time difference and therefore inaccurate localization; and the temporal evolution of the input signal is distorted, leading to poor reproduction of transient sounds such as applause and percussion instruments.
  • SUMMARY OF THE INVENTION
  • In view of the above, it may be seen as an object of the present invention to provide a processor and a method for converting a multi-channel audio input, such as a B-format sound field input, into an audio output suited for playback over headphones or via loudspeakers, while still preserving the substantial spatial information contained in the original multi-channel input.
  • In a first aspect, the invention provides an audio processor arranged to convert a multi-channel audio input signal comprising at least two channels, such as a B-format Sound Field signal, into a set of audio output signals, such as a set of two audio output signals arranged for headphone reproduction, the audio processor comprising
    • a filter bank arranged to separate the input signal into a plurality of frequency bands, such as partially overlapping frequency bands,
    • a sound source separation unit arranged, for at least a part of the plurality of frequency bands, to
      • perform a plane wave expansion computation on the multi-channel audio input signal so as to determine at least one dominant direction corresponding to a direction of a dominant sound source in the audio input signal,
      • determine an array of at least two, such as four, virtual loudspeaker positions selected such that one or more of the virtual loudspeaker positions at least substantially coincides, such as precisely coincides, with the at least one dominant direction, and
      • decode the audio input signal into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and
    • a summation unit arranged to sum the virtual loudspeaker signals for the at least part of the plurality of frequency bands to arrive at the set of audio output signals.
  • Such an audio processor provides an advantageous conversion of the multi-channel input signal due to the combination of plane wave expansion extraction of directions for dominant sound sources for each frequency band and the selection of at least one virtual loudspeaker position coinciding with a direction for at least one dominant sound source. For example, this provides a virtual loudspeaker signal highly suited for generation of a binaural output signal by applying Head-Related Transfer Functions to the virtual loudspeaker signals. The reason is that it is ensured that a dominant sound source is represented in the virtual loudspeaker signal by its direction, whereas prior art systems with a fixed set of virtual loudspeaker positions will in general split such a dominant sound source between the nearest fixed virtual loudspeaker positions. When applying Head-Related Transfer Functions, this means that the dominant sound source will be reproduced through two sets of Head-Related Transfer Functions corresponding to the two fixed virtual loudspeaker positions, which results in a rather blurred spatial image of the dominant sound source. According to the invention, the dominant sound source will be reproduced through one set of Head-Related Transfer Functions corresponding to its actual direction, thereby resulting in an optimal reproduction of the 3D spatial information contained in the original input signal.
  • Thus, in a preferred embodiment, the audio processor is arranged to generate the set of audio output signals such that it is arranged for playback over headphones, e.g. by applying Head-Related Transfer Functions, or by other known ways of creating a spatial effect based on a single input signal and its direction.
  • The filter bank may comprise at least 500, such as 1000 to 5000, preferably partially overlapping filters covering the frequency range of 0 Hz to 22 kHz. Specifically, an FFT analysis with a window length of 2048 to 8192 samples, i.e. 1024 to 4096 bands covering 0-22050 Hz, may be used. However, it is appreciated that the invention may also be performed with fewer filters, if a reduced performance is accepted.
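Such a filter bank can be sketched as a windowed FFT analysis applied to each channel. The following is a minimal Python/NumPy illustration (the patent does not prescribe an implementation; a Hann-windowed STFT stands in for the bank of partially overlapping filters, and all names are illustrative):

```python
import numpy as np

def filter_bank(b_format, win_len=4096, hop=2048):
    """Split each channel of a (channels x samples) multi-channel signal
    into FFT bins; a Hann-windowed STFT stands in for the bank of
    partially overlapping filters (win_len/2 + 1 bands per frame)."""
    win = np.hanning(win_len)
    n_frames = 1 + (b_format.shape[1] - win_len) // hop
    return np.stack([
        np.fft.rfft(b_format[:, i * hop:i * hop + win_len] * win, axis=1)
        for i in range(n_frames)
    ])  # shape: (frames, channels, bins)
```

With a window length of 4096 at 44.1 kHz this yields 2049 frequency bands per frame, in line with the 1024-4096 range mentioned above.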
  • The sound source separation unit preferably determines the at least one dominant direction in each frequency band for each time frame, such as a time frame having a size of 2,000 to 10,000 samples, e.g. 2048 to 8192 samples, as mentioned. However, it is to be understood that a lower update rate for the dominant direction may be used, if a reduced performance is accepted.
  • For audio input signals where a panning technique is used to position sound sources, such as stereo recordings or surround sound recordings, less spatial information is present, in comparison with a B-format input. To compensate, the filter outputs from two consecutive time frames should be sent to the sound source separation unit. This is preferably achieved with a plurality of delay elements.
  • The virtual loudspeaker positions may be selected by a rotation of a set of at least two positions in a fixed spatial interrelation. Especially, the set of positions in a fixed spatial interrelation comprises four positions, such as four positions arranged in a tetrahedron, which has been found to provide an accurate localization of the sound sources without increasing the noise level of the recorded signal.
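The rotation of a fixed tetrahedral set so that one vertex coincides with a dominant direction can be sketched as follows (Python/NumPy; the patent does not specify a rotation algorithm, so Rodrigues' rotation formula is used here as one possible construction, and all names are illustrative):

```python
import numpy as np

def rotate_tetrahedron(dominant):
    """Rotate a fixed regular tetrahedron of virtual loudspeaker
    directions so that its first vertex coincides with the (unit)
    dominant direction; the fixed mutual angles are preserved."""
    tet = np.array([[1.0, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
    tet /= np.sqrt(3.0)
    a = tet[0]
    b = np.asarray(dominant, float)
    b /= np.linalg.norm(b)
    c = np.dot(a, b)
    if np.isclose(c, -1.0):   # antipodal case: rotate pi about an axis perp. to a
        u = np.cross(a, [1.0, 0.0, 0.0])
        u /= np.linalg.norm(u)
        rot = 2.0 * np.outer(u, u) - np.eye(3)
    else:                     # Rodrigues' formula taking a onto b
        v = np.cross(a, b)
        k = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
        rot = np.eye(3) + k + (k @ k) / (1.0 + c)
    return tet @ rot.T        # rows: rotated virtual loudspeaker directions
```

Because a pure rotation is applied, the pairwise angles between the four directions remain those of the regular tetrahedron.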
  • The plane wave expansion may determine two dominant directions, in which case the array of at least two virtual loudspeaker positions is selected such that two of the virtual loudspeaker positions at least substantially coincide, such as precisely coincide, with the two dominant directions. Hereby it is ensured that the two most dominant sound sources in a frequency band are represented as precisely in space as possible, thus leading to the best possible spatial reproduction of audio material with two spatially distributed dominant sound sources, e.g. two singers or two musical instruments playing at the same time.
  • In order to generate a binaural two-channel output signal, the audio processor may comprise a binaural synthesizer unit arranged to generate first and second audio output signals by applying Head-Related Transfer Functions to each of the virtual loudspeaker signals. Especially, such audio processor may be implemented by a decoding matrix corresponding to the determined virtual loudspeaker positions and a transfer function matrix corresponding to the Head-Related Transfer Functions being combined into an output transfer matrix prior to being applied to the audio input signals. Hereby a smoothing may be performed on transfer functions of such output transfer matrix prior to being applied to the input signals, which will serve to improve reproduction of transient sounds.
  • In a preferred embodiment, the phase of the Head-Related Transfer Functions is differentiated with respect to frequency, and after combining components of Head-Related Transfer Functions corresponding to different directions, the phase of the combined transfer functions is integrated with respect to frequency. This serves to preserve the group delay of the Head-Related Transfer Functions. Even more specifically, the phase of the Head-Related Transfer Functions may be differentiated with respect to frequency at frequencies above a frequency limit only, such as above 1.8 kHz, and after combining components of Head-Related Transfer Functions corresponding to different directions, the phase of the combined transfer functions is integrated with respect to frequency at frequencies above the frequency limit. Hereby, only the phase at higher frequencies is manipulated, and thus at lower frequencies where the interaural phase difference is significant, the phase is left unchanged. In a more specific embodiment, the phase of the Head-Related Transfer Functions may be left unaltered below a first frequency limit, such as below 1.6 kHz, and differentiated with respect to frequency at frequencies above a second frequency limit with a higher frequency than the first frequency limit, such as 2.0 kHz, and with a gradual transition in between, and after combining components of HRTFs corresponding to different directions, the inverse operation is applied to the combined function.
  • The audio input signal is preferably a multi-channel audio signal arranged for decomposition into plane wave components. Especially, the input signal may be one of: a B-format sound field signal, a higher-order ambisonics recording, a stereo recording, and a surround sound recording.
  • In a second aspect, the invention provides a device comprising an audio processor according to the first aspect. Especially, the device may be one of: a device for recording sound or video signals, a device for playback of sound or video signals, a portable device, a computer device, a video game device, a hi-fi device, an audio converter device, and a headphone unit.
  • In a third aspect, the invention provides a method for converting a multi-channel audio input signal comprising at least two channels, such as a B-format Sound Field signal, into a set of audio output signals, such as a set of two audio output signals arranged for headphone reproduction, the method comprising
    • separating the input signal into a plurality of frequency bands, such as partially overlapping frequency bands,
    • performing a sound source separation for at least a part of the plurality of frequency bands, comprising
    • performing a plane wave expansion computation on the multi-channel audio input signal so as to determine at least one dominant direction corresponding to a direction of a dominant sound source in the audio input signal,
    • determining an array of at least two, such as four, virtual loudspeaker positions selected such that one or more of the virtual loudspeaker positions at least substantially coincides, such as precisely coincides, with the at least one dominant direction, and
    • decoding the audio input signal into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and
    • summing the virtual loudspeaker signals for the at least part of the plurality of frequency bands to arrive at the set of audio output signals.
  • The method may be implemented in pure software, e.g. in the form of a generic code or in the form of a processor specific executable code. Alternatively, the method may be implemented partly in specific analog and/or digital electronic components and partly in software. Still alternatively, the method may be implemented in a single dedicated chip.
  • It is appreciated that two or more of the mentioned embodiments can advantageously be combined. It is also appreciated that embodiments and advantages mentioned for the first aspect apply as well to the second and third aspects.
  • BRIEF DESCRIPTION OF THE DRAWING
  • Embodiments of the invention will be described, by way of example only, with reference to the drawings.
    • Fig. 1 illustrates basic components of one embodiment of the audio processor,
    • Fig. 2 illustrates details of an embodiment for converting a B-format sound field signal into a binaural signal,
    • Fig. 3 illustrates a possible implementation of the transfer matrix generator referred to in Fig. 2, and
    • Fig. 4 illustrates an audio device with an audio processor according to the invention.
    DESCRIPTION OF EMBODIMENTS
  • Fig. 1 shows an audio processor with its basic components according to the invention. Input to the audio processor is a multi-channel audio signal. This signal is split into a plurality of frequency bands in a filter bank, e.g. in the form of an FFT analysis performed on each of the plurality of channels. Sound source separation SSS is then performed on the frequency-separated signal. First, a plane wave expansion calculation PWE is performed on each frequency band in order to determine one or two dominant sound source directions. The one or two dominant sound source directions are then applied to a virtual loudspeaker position calculation algorithm VLP serving to select a set of virtual sound source or virtual loudspeaker directions, e.g. by rotation of a fixed set of virtual loudspeaker directions, such that one or both (in case of two) dominant sound source directions coincide with respective virtual loudspeaker directions. Then, the input signal is transferred or decoded DEC according to a decoding matrix corresponding to the selected virtual loudspeaker directions, and optionally Head-Related Transfer Functions corresponding to the virtual loudspeaker directions are applied, before the frequency components are finally combined in a summation unit SU to form a set of output signals, e.g. two output signals in case of a binaural implementation, or four, five, six, seven or even more output signals in case of conversion to a format suitable for reproduction through a surround sound set-up of loudspeakers.
  • The audio processor can be implemented in various ways, e.g. in the form of a processor forming part of a device, wherein the processor is provided with executable code to perform the invention.
  • Figs. 2 and 3 illustrate components of a preferred embodiment suited to convert an input signal that has three-dimensional characteristics and is in the "ambisonic B-format". The ambisonic B-format system is a very high quality sound positioning system which operates by breaking down the directionality of the sound into spherical harmonic components termed W, X, Y and Z. The ambisonic system is then designed to utilize a plurality of output speakers to cooperatively recreate the original directional components. For a description of the B-format system, reference is made to: http://en.wikipedia.org/wiki/Ambisonics.
  • Referring to Fig. 2, the preferred embodiment is directed at providing an improved spatialization of input audio signals. A B-format signal is input having X, Y, Z and W components. Each component of the B-format input set is processed through a corresponding filter bank 1-4, each of which divides the input into a number of output frequency bands (the number of bands is implementation dependent, typically in the range of 1024 to 4096).
  • Elements 5, 6, 7, 8 and 10 are replicated once for each frequency band, although only one of each is shown in Fig. 2. For each frequency band, the four signals (one from each filter bank 1-4) are processed by a plane wave expansion element 5, which determines the smallest number of plane waves necessary to recreate the local sound field encoded in the four signals. The plane wave expansion element also calculates the direction, phase and amplitude of these waves. The input signal is denoted w, x, y, z, where the subscripts r and i denote the real and imaginary parts. The local sound field can in most cases be recreated by two plane waves, as expressed in the following equations:

    $$\begin{pmatrix} w_1 \\ x_1 \\ y_1 \\ z_1 \end{pmatrix} e^{i\phi_1} + \begin{pmatrix} w_2 \\ x_2 \\ y_2 \\ z_2 \end{pmatrix} e^{i\phi_2} = \begin{pmatrix} w_r \\ x_r \\ y_r \\ z_r \end{pmatrix} + \begin{pmatrix} w_i \\ x_i \\ y_i \\ z_i \end{pmatrix} i \tag{1}$$

    $$x_1^2 + y_1^2 + z_1^2 = w_1^2 \tag{2}$$

    $$x_2^2 + y_2^2 + z_2^2 = w_2^2 \tag{3}$$

  • The solution to these equations is

    $$\begin{pmatrix} w_1 & x_1 & y_1 & z_1 \\ w_2 & x_2 & y_2 & z_2 \end{pmatrix} = \begin{pmatrix} \cos\phi_1 & \cos\phi_2 \\ \sin\phi_1 & \sin\phi_2 \end{pmatrix}^{-1} \begin{pmatrix} w_r & x_r & y_r & z_r \\ w_i & x_i & y_i & z_i \end{pmatrix} \tag{4}$$

    where

    $$\cos^2\phi_n = \frac{2a^2 - bc + b^2 \pm 2a\sqrt{a^2 - bc}}{(c - b)^2 + 4a^2} \tag{5}$$

    $$a = -w_r w_i + x_r x_i + y_r y_i + z_r z_i \tag{6}$$

    $$b = -w_r^2 + x_r^2 + y_r^2 + z_r^2 \tag{7}$$

    $$c = -w_i^2 + x_i^2 + y_i^2 + z_i^2 \tag{8}$$

  • Equation 5 gives zero, one or two real values for $\cos^2\phi$, corresponding to zero, one or two solutions to the equations. Each value for $\cos^2\phi$ corresponds to several possible values of $\phi$, one in each quadrant, or the values 0 and $\pi$. Only one of these is correct. The correct quadrant can be determined from equation 9 and the requirement that $w_1$ and $w_2$ should be positive.

    $$\sin\phi_n \cos\phi_n = \frac{(c - b)\cos^2\phi_n + b}{2a} \tag{9}$$
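A minimal numerical sketch of this two-plane-wave solution is given below (Python/NumPy assumed; function and variable names are illustrative, not from the patent, and the fallback method for the case of no real solution is not implemented):

```python
import numpy as np

def two_plane_waves(wxyz):
    """Solve equations (4)-(9) for one frequency bin: split a complex
    B-format vector (w, x, y, z) into two plane waves u_n * exp(i*phi_n)
    with x_n^2 + y_n^2 + z_n^2 = w_n^2 and w_n > 0."""
    re, im = wxyz.real, wxyz.imag
    g = np.array([-1.0, 1.0, 1.0, 1.0])          # signs in (6)-(8)
    a = np.sum(g * re * im)
    b = np.sum(g * re * re)
    c = np.sum(g * im * im)
    disc = a * a - b * c
    if disc < 0:                                  # no real solution to (5)
        raise ValueError("more than two plane waves required")
    phi = []
    for sign in (1.0, -1.0):                      # the two roots of (5)
        cos2 = (2 * a * a - b * c + b * b + sign * 2 * a * np.sqrt(disc)) \
               / ((c - b) ** 2 + 4 * a * a)
        cos2 = np.clip(cos2, 0.0, 1.0)
        sc = ((c - b) * cos2 + b) / (2 * a)       # sin*cos, equation (9)
        phi.append(np.arctan2(sc, cos2))          # quadrant from (cos^2, sin*cos)
    m = np.array([[np.cos(phi[0]), np.cos(phi[1])],
                  [np.sin(phi[0]), np.sin(phi[1])]])
    u = np.linalg.inv(m) @ np.vstack([re, im])    # equation (4)
    for n in range(2):                            # enforce w_n > 0
        if u[n, 0] < 0:
            u[n], phi[n] = -u[n], phi[n] + np.pi
    return u[0], u[1], phi[0], phi[1]
```

Synthesizing a bin from two known plane waves and decomposing it again reproduces the original field, with both recovered 4-vectors satisfying the null constraints (2) and (3).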
  • When equation 5 gives no real solutions, more than two plane waves are necessary to reconstruct the local sound field. It may also be advantageous to use an alternative method when the matrix to invert in equation 4 is singular or nearly singular. When allowing for more than two plane waves, an infinite number of possible solutions exist. Since this alternative method is necessary only for a small part of most signals, the choice of solution is not critical. One possible choice is that of two plane waves travelling in the directions of the principal axes of the ellipse which is described by the time-dependent velocity vector associated with each frequency band. In addition to these two plane waves, a spherical wave is necessary to reconstruct the W component of the incoming signal:

    $$\begin{pmatrix} w_0 \\ 0 \\ 0 \\ 0 \end{pmatrix} e^{i\phi_0} + \begin{pmatrix} w_1 \\ x_1 \\ y_1 \\ z_1 \end{pmatrix} e^{i\phi_1} + \begin{pmatrix} w_2 \\ x_2 \\ y_2 \\ z_2 \end{pmatrix} e^{i\phi_2} = \begin{pmatrix} w_r \\ x_r \\ y_r \\ z_r \end{pmatrix} + \begin{pmatrix} w_i \\ x_i \\ y_i \\ z_i \end{pmatrix} i \tag{10}$$

    $$x_1^2 + y_1^2 + z_1^2 = w_1^2 \tag{11}$$

    $$x_2^2 + y_2^2 + z_2^2 = w_2^2 \tag{12}$$

  • The chosen solution is

    $$\begin{pmatrix} w_1' & x_1 & y_1 & z_1 \\ w_2' & x_2 & y_2 & z_2 \end{pmatrix} = \begin{pmatrix} \cos\phi_1 & \cos\phi_2 \\ \sin\phi_1 & \sin\phi_2 \end{pmatrix}^{-1} \begin{pmatrix} w_r & x_r & y_r & z_r \\ w_i & x_i & y_i & z_i \end{pmatrix} \tag{13}$$

    where

    $$\cos^2\phi_n = \frac{1}{2} \pm \frac{b - c}{2\sqrt{4a^2 + (b - c)^2}} \tag{14}$$

    $$a = x_r x_i + y_r y_i + z_r z_i \tag{15}$$

    $$b = x_r^2 + y_r^2 + z_r^2 \tag{16}$$

    $$c = x_i^2 + y_i^2 + z_i^2 \tag{17}$$

  • As before, the quadrant of $\phi$ can be determined from equation 18 and the requirement that $w_1'$ and $w_2'$ should be positive.

    $$\sin\phi_n \cos\phi_n = \frac{2a\cos^2\phi_n - a}{b - c} \tag{18}$$
  • The values of $w_0$ and $\phi_0$ are not used in subsequent steps.
  • The output of 5 consists of the two vectors ⟨x1, y1, z1⟩ and ⟨x2, y2, z2⟩. This output is connected to an element 6 which sorts these two vectors according to the value of their y element. In an alternative embodiment of the invention, only one of the two vectors is passed on from element 6. The choice can be that of the longest vector or the one with the highest degree of similarity with neighbouring vectors. The output of 6 is connected to a smoothing element 7 which suppresses rapid changes in the direction estimates. The output of 7 is connected to an element 8 which generates suitable transfer functions from each of the input signals to each of the output signals, a total of eight transfer functions. Each of these transfer functions is passed through a smoothing element 9. This element suppresses large differences in phase and amplitude between neighbouring frequency bands and also suppresses rapid changes in phase and amplitude. The output of 9 is passed to a matrix multiplier 10 which applies the transfer functions to the input signals and creates two output signals. Elements 11 and 12 sum each of the output signals from 10 across all filter bands to produce a binaural signal.
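The band-to-band smoothing of element 9 can be sketched as a moving average over neighbouring frequency bands (Python/NumPy; the passage does not specify the smoothing method, so a plain moving average is assumed here, and smoothing over time is omitted):

```python
import numpy as np

def smooth_bands(tf, width=5):
    """Moving average of complex transfer-function values across
    neighbouring frequency bands; a simple hypothetical stand-in for
    smoothing element 9, which suppresses large band-to-band
    differences in phase and amplitude."""
    kernel = np.ones(width) / width
    return np.convolve(tf, kernel, mode="same")
```

A constant transfer function passes through unchanged (away from the edges), while abrupt band-to-band jumps are attenuated.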
  • Referring to Fig. 3, there is illustrated schematically the preferred embodiment of the transfer matrix generator referenced in Fig. 2. While one transfer matrix generator is provided for each frequency band, elements 3, 4 and 5 are shared across all frequency bands. An element 1 generates two new vectors whose directions are chosen so as to maximize the angles between the four resulting vectors. In an alternative embodiment of the invention, only one vector is passed into the transfer matrix generator. In this case, element 1 must generate three new vectors, preferably such that the resulting four vectors point towards the vertices of a regular tetrahedron. This alternative approach is also beneficial in cases where the two input vectors are collinear or nearly collinear.
  • The four vectors are used to represent the directions to four virtual loudspeakers which will be used to play back the input signals. An element 6 calculates a decoding matrix by inverting the following matrix:

    $$\begin{pmatrix} 1 & x_1' & y_1' & z_1' \\ 1 & x_2' & y_2' & z_2' \\ 1 & x_3' & y_3' & z_3' \\ 1 & x_4' & y_4' & z_4' \end{pmatrix}$$

    where

    $$(x_n', y_n', z_n') = \frac{(x_n, y_n, z_n)}{\sqrt{x_n^2 + y_n^2 + z_n^2}}$$
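This matrix inversion can be sketched as follows (Python/NumPy; names are illustrative, and the unscaled W convention of the equations above is assumed rather than the common B-format scaling of W):

```python
import numpy as np

def decoding_matrix(directions):
    """Element 6: build the matrix of rows (1, x_n', y_n', z_n') from the
    four virtual loudspeaker direction vectors and invert it."""
    d = np.ones((4, 4))
    for n, v in enumerate(directions):
        d[n, 1:] = v / np.linalg.norm(v)   # unit direction (x', y', z')
    return np.linalg.inv(d)
```

With this construction, a plane wave arriving from the direction of virtual loudspeaker n decodes to a gain of one on that loudspeaker and zero on the other three.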
  • An element 5 stores a set of head-related transfer functions. An element 3 alters the phase of these transfer functions, leaving the amplitude unchanged. The reason for this transformation is that any uncertainty in the direction estimates translates to an uncertainty in the absolute phase of the head-related transfer functions which increases proportionally with frequency. Without this alteration to the transfer functions, too much noise would be added to the phase of high-frequency components in the signal, resulting in poor reproduction of transient sounds. The transformation performed by element 3 is:

    $$\tilde{H}(f) = H(f)\,\frac{\overline{H(f - \Delta f)} + e^{(f_c - f)/f_0}}{\left|\overline{H(f - \Delta f)} + e^{(f_c - f)/f_0}\right|} \tag{19}$$
  • The effect of this transformation is none at low frequencies. At high frequencies, the phase of the transfer function is differentiated with respect to frequency. The transition happens around $f = f_c$, in a transition region of approximate width $f_0$. In this equation, $\Delta f$ is the band-to-band frequency difference. Since the human ability to perceive inter-aural phase difference is limited to frequencies below approx. 1200-1600 Hz, reasonable values for $f_c$ and $f_0$ are 1800 Hz and 200 Hz, respectively. Above this transition frequency, humans are still sensitive to inter-aural group delay, which is restored by performing the inverse transformation, as is done in element 4:

    $$H(f) = \tilde{H}(f)\,\frac{H(f - \Delta f) + e^{(f_c - f)/f_0}}{\left|H(f - \Delta f) + e^{(f_c - f)/f_0}\right|} \tag{20}$$
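The pair of transformations can be sketched as follows (Python/NumPy; names are illustrative, and the reconstruction of equations (19) and (20) above is assumed, with the inverse applied recursively using the already-restored previous band):

```python
import numpy as np

def differentiate_phase(h, freqs, fc=1800.0, f0=200.0):
    """Equation (19): leave the phase untouched well below fc and
    differentiate it with respect to frequency well above fc, while
    keeping the amplitude unchanged."""
    out = np.empty_like(h)
    out[0] = h[0]
    for n in range(1, len(h)):
        ref = np.conj(h[n - 1]) + np.exp((fc - freqs[n]) / f0)
        out[n] = h[n] * ref / abs(ref)   # unit-modulus factor: phase only
    return out

def integrate_phase(ht, freqs, fc=1800.0, f0=200.0):
    """Equation (20), the inverse: re-integrate the phase recursively,
    restoring the group delay above fc (element 4)."""
    out = np.empty_like(ht)
    out[0] = ht[0]
    for n in range(1, len(ht)):
        ref = out[n - 1] + np.exp((fc - freqs[n]) / f0)
        out[n] = ht[n] * ref / abs(ref)
    return out
```

Applying the inverse directly to the differentiated function recovers the original exactly, which illustrates why the group delay survives the round trip.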
  • Element 2 uses the virtual loudspeaker directions to select and interpolate between the modified head-related transfer functions closest to the direction of each virtual loudspeaker. For each virtual loudspeaker, there are two head-related transfer functions, one for each ear, providing a total of eight transfer functions which are passed to element 4.
  • The outputs of elements 4 and 6 are multiplied in a matrix multiplication 7 to produce the suitable transfer matrix.
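The order of operations can be illustrated numerically (Python/NumPy, with random stand-in matrices; real HRTF and decoding matrices would be complex-valued, one pair per frequency band): combining the two matrices first and then applying the product to the input gives the same result as decoding first and applying the HRTFs afterwards, which is why the combined output transfer matrix can then be smoothed as a whole.

```python
import numpy as np

rng = np.random.default_rng(0)
hrtf = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))  # ears x speakers
dec = rng.standard_normal((4, 4))                                      # speakers x (W,X,Y,Z)
b_bin = rng.standard_normal(4) + 1j * rng.standard_normal(4)           # one B-format bin

t = hrtf @ dec    # 2x4 output transfer matrix produced by element 7
lr = t @ b_bin    # binaural bin, applied directly to the input
assert np.allclose(lr, hrtf @ (dec @ b_bin))   # same as decode-then-HRTF
```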
  • In the arrangement shown in Fig. 3, the decoding matrix is multiplied with the transfer function matrix before their product is multiplied with the input signals. In an alternative embodiment of the invention, the input signals are first multiplied with the decoding matrix and their product subsequently multiplied with the transfer function matrix. However, this would preclude the possibility of smoothing of the overall transfer functions. Such smoothing is advantageous for the reproduction of transient sounds.
  • The overall effect of the arrangement shown in Figs. 2 and 3 is to decompose the local sound field into a large number of plane waves and to pass these plane waves through corresponding head-related transfer functions in order to produce a binaural signal suited for headphone reproduction.
  • Fig. 4 illustrates a block diagram of an audio device with an audio processor according to the invention, e.g. the one illustrated in Figs. 2 and 3. The device may be a dedicated headphone unit, a general audio device offering the conversion of a multi-channel input signal to another output format as an option, or the device may be a general computer with a sound card provided with software suited to perform the conversion method according to the invention.
  • The device may be able to perform on-line conversion of the input signal, e.g. by receiving the multi-channel input audio signal in the form of a digital bit stream. Alternatively, e.g. if the device is a computer, the device may generate the output signal in the form of an audio output file based on an audio file as input.
  • To sum up, the invention provides an audio processor for converting a multi-channel audio input signal X, Y, Z, W, such as a B-format Sound Field signal, into a set of audio output signals L, R, such as a set of two audio output signals L, R arranged for headphone reproduction. A filter bank splits the input signal X, Y, Z, W into frequency bands. A sound source separation unit uses plane wave expansion on the input signal X, Y, Z, W to determine one or two dominant sound source directions. These are used to determine virtual loudspeaker positions selected such that one or both of the virtual loudspeaker positions coincide with one or both of the dominant directions. The input signal X, Y, Z, W is then decoded into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and finally the frequency components are combined in a summation unit to arrive at the set of audio output signals L, R. E.g. Head-Related Transfer Functions (HRTFs) are applied to arrive at a binaural signal suited for headphone reproduction. A high spatial fidelity is obtained due to the coincidence of virtual loudspeaker positions and the determined dominant sound source direction(s).
  • Improved performance can be obtained by differentiating the phase of the high-frequency part of the HRTFs with respect to frequency before combining the components of the HRTFs, followed by a corresponding integration of this part with respect to frequency after the combination.
  • In the claims, the term "comprising" does not exclude the presence of other elements or steps. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition, singular references do not exclude a plurality. Thus, references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs are included in the claims; however, they are included only for clarity and should not be construed as limiting the scope of the claims.

Claims (15)

  1. An audio processor arranged to convert a multi-channel audio input signal (X, Y, Z, W) comprising at least two channels, such as a B-format Sound Field signal, into a set of audio output signals (L, R), such as a set of two audio output signals (L, R) arranged for headphone reproduction, the audio processor comprising
    - a filter bank arranged to separate the input signal (X, Y, Z, W) into a plurality of frequency bands, such as partially overlapping frequency bands,
    - a sound source separation unit arranged, for at least a part of the plurality of frequency bands, to
    - perform a plane wave expansion computation on the multi-channel audio input signal (X, Y, Z, W) so as to determine at least one dominant direction corresponding to a direction of a dominant sound source in the audio input signal (X, Y, Z, W),
    - determine an array of at least two, such as four, virtual loudspeaker positions selected such that one or more of the virtual loudspeaker positions at least substantially coincides, such as precisely coincides, with the at least one dominant direction, and
    - decode the audio input signal (X, Y, Z, W) into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and
    - a summation unit arranged to sum the virtual loudspeaker signals for the at least part of the plurality of frequency bands to arrive at the set of audio output signals (L, R).
  2. Audio processor according to claim 1, wherein the filter bank comprises at least 500, such as 1000 to 5000, partially overlapping filters covering a frequency range of 0 Hz to 22 kHz.
  3. Audio processor according to claim 1 or 2, wherein the virtual loudspeaker positions are selected by a rotation of a set of at least three positions in a fixed spatial interrelation.
  4. Audio processor according to claim 3, wherein the set of positions in a fixed spatial interrelation comprises four positions, such as four positions arranged in a tetrahedron.
  5. Audio processor according to any of the preceding claims, wherein the wave expansion determines two dominant directions, and wherein the array of at least two virtual loudspeaker positions is selected such that two of the virtual loudspeaker positions at least substantially coincide, such as precisely coincide, with the two dominant directions.
  6. Audio processor according to any of the preceding claims, comprising a binaural synthesizer unit arranged to generate first and second audio output signals (L, R) by applying Head-Related Transfer Functions (HRTF) to each of the virtual loudspeaker signals.
  7. Audio processor according to claim 6, wherein a decoding matrix corresponding to the determined virtual loudspeaker positions and a transfer function matrix corresponding to the Head-Related Transfer Functions (HRTF) are combined into an output transfer matrix prior to being applied to the audio input signals (X, Y, Z, W).
  8. Audio processor according to claim 7, wherein a smoothing is performed on transfer functions of the output transfer matrix prior to being applied to the input signals (X, Y, Z, W).
  9. Audio processor according to any of claims 6-8, wherein the phase of the Head-Related Transfer Functions (HRTF) is differentiated with respect to frequency, and after combining components of Head-Related Transfer Functions (HRTF) corresponding to different directions, the phase of the combined transfer functions is integrated with respect to frequency.
  10. Audio processor according to claim 9, wherein the phase of the Head-Related Transfer Functions (HRTF) is left unaltered below a first frequency limit, such as below 1.6 kHz, and differentiated with respect to frequency at frequencies above a second frequency limit with a higher frequency than the first frequency limit, such as 2.0 kHz, and with a gradual transition in between, and after combining components of Head-Related Transfer Functions (HRTF) corresponding to different directions, the inverse operation is applied to the combined function.
  11. Audio processor according to any of the preceding claims, wherein the audio input signal is a multi-channel audio signal arranged for decomposition into plane wave components, such as one of: a B-format sound field signal, a higher-order ambisonics recording, a stereo recording, and a surround sound recording.
  12. Audio processor according to any of the preceding claims, wherein the sound source separation unit determines the at least one dominant direction in each frequency band for each time frame, wherein a time frame has a size of 2,000 to 10,000 samples.
  13. Audio processor according to any of the preceding claims, wherein the set of audio output signals (L, R) is arranged for playback over headphones.
  14. Device comprising an audio processor according to any of claims 1-13, such as the device being one of: a device for recording sound or video signals, a device for playback of sound or video signals, a portable device, a computer device, a video game device, a hi-fi device, an audio converter device, and a headphone unit.
  15. Method for converting a multi-channel audio input signal (X, Y, Z, W) comprising at least two channels, such as a B-format Sound Field signal, into a set of audio output signals (L, R), such as a set of two audio output signals (L, R) arranged for headphone reproduction, the method comprising
    - separating the input signal (X, Y, Z, W) into a plurality of frequency bands, such as partially overlapping frequency bands,
    - performing a sound source separation for at least a part of the plurality of frequency bands, comprising
    - performing a plane wave expansion computation on the multi-channel audio input signal (X, Y, Z, W) so as to determine at least one dominant direction corresponding to a direction of a dominant sound source in the audio input signal (X, Y, Z, W),
    - determining an array of at least two, such as four, virtual loudspeaker positions selected such that one or more of the virtual loudspeaker positions at least substantially coincides, such as precisely coincides, with the at least one dominant direction, and
    - decoding the audio input signal (X, Y, Z, W) into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and
    - summing the virtual loudspeaker signals for the at least part of the plurality of frequency bands to arrive at the set of audio output signals (L, R).
EP09163760A 2009-06-25 2009-06-25 Device and method for converting spatial audio signal Withdrawn EP2268064A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP09163760A EP2268064A1 (en) 2009-06-25 2009-06-25 Device and method for converting spatial audio signal
EP10167042.0A EP2285139B1 (en) 2009-06-25 2010-06-23 Device and method for converting spatial audio signal
ES10167042.0T ES2690164T3 (en) 2009-06-25 2010-06-23 Device and method to convert a spatial audio signal
PL10167042T PL2285139T3 (en) 2009-06-25 2010-06-23 Device and method for converting spatial audio signal
US12/822,015 US8705750B2 (en) 2009-06-25 2010-06-23 Device and method for converting spatial audio signal


Publications (1)

Publication Number Publication Date
EP2268064A1 true EP2268064A1 (en) 2010-12-29

Family

ID=41265575


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945308A (en) * 2013-01-23 2014-07-23 中国科学院声学研究所 Sound reproduction method and system based on wave field synthesis and wave field analysis
CN106358126A (en) * 2016-09-26 2017-01-25 宇龙计算机通信科技(深圳)有限公司 Multi-audio frequency playing method, system and terminal
CN109448743A (en) * 2012-12-12 2019-03-08 杜比国际公司 The method and apparatus that the high-order ambiophony of sound field is indicated to carry out compression and decompression
US10412531B2 (en) 2016-01-08 2019-09-10 Sony Corporation Audio processing apparatus, method, and program
US10582329B2 (en) 2016-01-08 2020-03-03 Sony Corporation Audio processing device and method
US10595148B2 (en) 2016-01-08 2020-03-17 Sony Corporation Sound processing apparatus and method, and program
CN113138363A (en) * 2021-04-22 2021-07-20 苏州臻迪智能科技有限公司 Sound source positioning method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000019415A2 (en) * 1998-09-25 2000-04-06 Creative Technology Ltd. Method and apparatus for three-dimensional audio display
US6259795B1 (en) * 1996-07-12 2001-07-10 Lake Dsp Pty Ltd. Methods and apparatus for processing spatialized audio
US6628787B1 (en) 1998-03-31 2003-09-30 Lake Technology Ltd Wavelet conversion of 3-D audio signals


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448743A (en) * 2012-12-12 2019-03-08 Dolby International AB Method and apparatus for compressing and decompressing higher order ambisonic representations of a sound field
CN109448743B (en) * 2012-12-12 2020-03-10 Dolby International AB Method and apparatus for compressing and decompressing higher order ambisonic representations of a sound field
CN103945308A (en) * 2013-01-23 2014-07-23 Institute of Acoustics, Chinese Academy of Sciences Sound reproduction method and system based on wave field synthesis and wave field analysis
CN103945308B (en) * 2013-01-23 2016-03-02 Institute of Acoustics, Chinese Academy of Sciences Sound reproduction method and system based on wave field synthesis and wave field analysis
US10412531B2 (en) 2016-01-08 2019-09-10 Sony Corporation Audio processing apparatus, method, and program
US10582329B2 (en) 2016-01-08 2020-03-03 Sony Corporation Audio processing device and method
US10595148B2 (en) 2016-01-08 2020-03-17 Sony Corporation Sound processing apparatus and method, and program
CN106358126A (en) * 2016-09-26 2017-01-25 Yulong Computer Telecommunication Scientific (Shenzhen) Co., Ltd. Multi-audio playback method, system and terminal
CN113138363A (en) * 2021-04-22 2021-07-20 Suzhou Zhendi Intelligent Technology Co., Ltd. Sound source localization method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US8705750B2 (en) Device and method for converting spatial audio signal
US20200335115A1 (en) Audio encoding and decoding
EP3320692B1 (en) Spatial audio processing apparatus
US9154895B2 (en) Apparatus of generating multi-channel sound signal
US8180062B2 (en) Spatial sound zooming
US9794721B2 (en) System and method for capturing, encoding, distributing, and decoding immersive audio
TWI415111B (en) Spatial decoder unit, spatial decoder device, audio system, consumer electronic device, method of producing a pair of binaural output channels, and computer readable medium
EP2976769B1 (en) Method and apparatus for enhancing directivity of a 1st order ambisonics signal
US7231054B1 (en) Method and apparatus for three-dimensional audio display
EP1354495B1 (en) Method of decoding two-channel matrix encoded audio to reconstruct multichannel audio
US8605909B2 (en) Method and device for efficient binaural sound spatialization in the transformed domain
US9055371B2 (en) Controllable playback system offering hierarchical playback options
US20150163615A1 (en) Method and device for rendering an audio soundfield representation for audio playback
EP2268064A1 (en) Device and method for converting spatial audio signal
US6628787B1 (en) Wavelet conversion of 3-D audio signals
Wiggins An investigation into the real-time manipulation and control of three-dimensional sound fields
Rafaely et al. Spatial audio signal processing for binaural reproduction of recorded acoustic scenes–review and challenges
US11979723B2 (en) Content based spatial remixing
WO2000019415A2 (en) Method and apparatus for three-dimensional audio display
CN114450977A (en) Apparatus, method or computer program for processing a representation of a sound field in the spatial transform domain
KR101637407B1 (en) Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
Hold et al. Parametric binaural reproduction of higher-order spatial impulse responses
Deshpande et al. Blind localization and segregation of two sources including a binaural head movement model
Trevino et al. Enhancing stereo signals with high-order Ambisonics spatial information
JP2006270649A (en) Voice acoustic signal processing apparatus and method thereof

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA RS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110630