EP2484127B1 - Method, computer program and apparatus for processing audio signals

Info

Publication number: EP2484127B1
Application number: EP10819956.3A
Authority: EP (European Patent Office)
Prior art keywords: groups, difference, interaural, group, determining
Legal status: Active
Other languages: German (de), French (fr)
Other versions: EP2484127A1 (en), EP2484127A4 (en)
Inventor: Ravi Rangnath Shenoy
Current Assignee: Nokia Technologies Oy
Original Assignee: Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publications: EP2484127A1, EP2484127A4, EP2484127B1 (granted)

Classifications

    • H04S 5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S 2400/05 Generation or adaptation of centre channel in multi-channel audio systems
    • H04S 2420/01 Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S 2420/03 Application of parametric coding in stereophonic audio systems
    • H04S 2420/07 Synergistic effects of band splitting and sub-band processing

Definitions

  • the present invention relates to apparatus for the processing of audio signals.
  • the invention further relates to, but is not limited to, apparatus for processing audio and speech signals in audio playback devices.
  • Audio rendering and sound virtualization have been growing areas in recent years. There are different playback techniques, some of which are mono, stereo, 5.1 surround, and ambisonics.
  • signal processing integrated within apparatus, or performed prior to the final playback apparatus, has been designed to allow a virtual sound image to be created in many applications such as music playback, movie sound tracks, 3D audio, and gaming.
  • until recently the standard for commercial audio content, for music or movies, was stereo audio signal generation. Signals from different musical instruments, speech or voice, and other audio sources creating the sound scene were combined to form a stereo signal.
  • Commercially available playback devices would typically have two loudspeakers placed at a suitable distance in front of the listener. The goal of stereo rendering was limited to creating phantom images at a position between the two speakers and is known as panned stereo.
  • the same content could be played on portable playback devices as well, since headphones and earplugs also use two channels.
  • stereo widening and 3D audio applications have recently become more popular, especially for portable devices with audio playback capabilities. There are various techniques for these applications that provide the user with a spatial feeling and 3D audio content.
  • the techniques employ various signal processing algorithms and filters. It is known that the effectiveness of spatial audio is stronger over headphone playback.
  • Commercial audio today boasts of 5.1, 7.1 and 10.1 multichannel content where 5, 7 or 10 channels are used to generate surrounding audio scenery.
  • An example of a 5.1 multichannel system is shown in Figure 2 where the user 211 is surrounded by a front left channel speaker 251, a front right channel speaker 253, a centre channel speaker 255, a left surround channel speaker 257 and a right surround channel speaker 259. Phantom images can be created using this type of setup lying anywhere on the circle 271 as shown in Figure 2 .
  • a channel in multichannel audio is not necessarily unique. Audio signals for one channel, after frequency dependent phase shifts and magnitude modifications, can become the audio signal for a different channel.
  • the multichannel audio signals are matrix downmixed.
  • the original multi-channel content is no longer available in its component form (each component being each channel in say 5.1). All of the channels from 5.1 are present in the down-mixed stereo.
  • when such downmixed stereo content is played over headphones, the phantom images lie on an imaginary line joining the left and right ears. This line is known as the interaural axis and the experience is often called inside-the-head feeling or lateralization.
  • Each of these extracted audio signals may be then virtualized to different virtual locations.
  • a virtualizer typically introduces frequency dependent relative delays and amplification or attenuation to the signals before the signals are sent to headphone speakers. The introduction of typical virtualization would pan certain sources away from the mid plane, where the user does not have any control over how loud or quiet these sources are.
  • for example, the user may be interested in a vocalist located centre stage rather than the audience located off-centre stage, and the stereo audio signals may easily mask the key sections of the vocalist with the background noise from the audience.
  • the sources that appear to be originating from the centre can often be at higher or lower audio levels relative to the rest of the sources in the audio scene. Listeners typically do not have any control over this level and often want to amplify or attenuate these central sources depending on their perceptual preference. Lack of this feature often results in a poor audio experience.
  • This invention proceeds from the consideration that prior art solutions for centre channel extraction do not produce good quality centre channel audio signals.
  • listening to centre channel audio signals produces a poor listening experience.
  • the poor quality centre channel audio signals produce poor quality listening experiences when virtualized.
  • Embodiments of the present invention aim to address the above problem.
  • EP1784048 discloses a signal processing apparatus generating, from left-channel and right-channel stereo signals, a centre-channel signal.
  • the stereo signals are split into different frequency bands by two identical filter banks.
  • the phase difference between the stereo signals is determined by a phase difference detector.
  • a gain is calculated by a gain generator as a function of the phase difference.
  • the gain is set to 0 for phase differences of ±180° and to 1 for phase differences of 0°.
  • This gain is applied by a multiplier to the average of the stereo signals within each frequency band.
  • the resulting outputs of all frequency bands are synthesised by a signal synthesiser to form the resulting centre-channel signal.
  • US2005169482 discloses an audio spatial environment engine for converting from an N channel audio system to an M channel audio system, where N is an integer greater than M.
  • the audio spatial environment engine includes one or more correlators receiving two or more of the N channels of audio data and eliminating delays between the channels that are irrelevant to an average human listener.
  • One or more Hilbert transform systems each perform a Hilbert transform on one or more of the correlated channels of audio data.
  • One or more summers receive at least one of the correlated channels of audio data and at least one of the Hilbert transformed correlated channels of audio data and generate one of the M channels of audio data.
  • JP2002078100 discloses a system for processing a stereophonic signal provided with: frequency band division sections that divide a stereophonic signal into frequency bands in each channel; a similarity calculation section that calculates the similarity between channels for each frequency band; an attenuation coefficient calculation section that calculates an attenuation coefficient to suppress or emphasize a sound source signal localized around the middle on the basis of the similarity; a multiplier that multiplies the attenuation coefficient with each frequency band signal; and a sound source signal synthesis section and an output section that resynthesize each frequency band signal in each channel after the multiplication and output the result.
  • in one known localization study, the stimuli were manipulated by delaying or attenuating the signal to one ear (by up to 600 μs or 20 dB) or by altering the spectral cues at one or both ears. Listener weighting of the manipulated cues was determined by examining the resulting localization response biases. In accordance with the Duplex Theory defined for pure tones, listeners gave high weight to ITD and low weight to ILD for low-pass stimuli, and high weight to ILD for high-pass stimuli. Most (but not all) listeners gave low weight to ITD for high-pass stimuli. This weight could be increased by amplitude-modulating the stimuli or reduced by lengthening stimulus onsets.
  • the ITD weight was greater than or equal to that given to ILD.
  • Manipulations of monaural spectral cues and the interaural level spectrum had little influence on lateral angle judgements.
  • An example of the duplex theory of sound localization and the use of ITD and ILD values is given by SENGPIEL: 'Die Duplex-Theorie von Lord Rayleigh', 30 April 2008, page 1, XP055099399.
  • according to the invention there is provided an apparatus for processing audio signals comprising the set of features according to claim 8.
  • a computer-readable medium encoded with instructions that, when executed by a computer, perform the method of any of claims 1 to 6.
  • An electronic device or a chipset may comprise the apparatus as described above.
  • FIG. 1 shows a schematic block diagram of an exemplary electronic device 10 or apparatus, which may incorporate a centre channel extractor.
  • the centre channel extracted by the centre channel extractor in some embodiments is suitable for an up-mixer.
  • the electronic device 10 may for example be a mobile terminal or user equipment for a wireless communication system.
  • the electronic device may be a Television (TV) receiver, a portable digital versatile disc (DVD) player, or an audio player such as an iPod.
  • the electronic device 10 comprises a processor 21 which may be linked via a digital-to-analogue converter 32 to a headphone connector for receiving a headphone or headset 33.
  • the processor 21 is further linked to a transceiver (TX/RX) 13, to a user interface (UI) 15 and to a memory 22.
  • the processor 21 may be configured to execute various program codes.
  • the implemented program codes comprise a channel extractor for extracting a centre channel audio signal from a stereo audio signal.
  • the implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed.
  • the memory 22 could further provide a section 24 for storing data, for example data that has been processed in accordance with the embodiments.
  • the channel extracting code may in embodiments be implemented in hardware or firmware.
  • the user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display.
  • the transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network.
  • the apparatus 10 may in some examples further comprise at least two microphones for inputting audio or speech that is to be processed according to embodiments of the application or transmitted to some other electronic device or stored in the data section 24 of the memory 22.
  • a corresponding application to capture stereo audio signals using the at least two microphones may be activated to this end by the user via the user interface 15.
  • the apparatus 10 in such examples may further comprise an analogue-to-digital converter configured to convert the input analogue audio signal into a digital audio signal and provide the digital audio signal to the processor 21.
  • the apparatus 10 may in some examples also receive a bit stream with correspondingly encoded stereo audio data from another electronic device via the transceiver 13.
  • the processor 21 may execute the channel extraction program code stored in the memory 22.
  • the processor 21 in these examples may process the received stereo audio signal data, and output the extracted channel data.
  • the headphone connector 33 may be configured to communicate to a headphone set or earplugs wirelessly, for example by a Bluetooth profile, or using a conventional wired connection.
  • the received stereo audio data may in some examples also be stored, instead of being processed immediately, in the data section 24 of the memory 22, for instance for enabling a later processing and presentation or forwarding to still another electronic device.
  • FIG. 3 shows in further detail an up-mixer 106 suitable for the implementation of some examples of the application.
  • the up-mixer is configured to receive a stereo audio signal and generate a left channel audio signal L", a centre channel audio signal C' and a right channel audio signal R".
  • the up-mixer 106 is configured to receive the left channel audio signal at the left input 451 and the right channel audio signal at the right input 453.
  • the up-mixer 106 furthermore comprises a centre channel extractor 455 which receives the left channel audio signal L and the right channel audio signal R and generates a centre channel audio signal C.
  • the centre channel audio signal C is in some examples furthermore passed to a first amplifier 461 which applies a gain A_1 to the signal and outputs the amplified signal to the left channel modifier 465.
  • the left channel audio signal L is further passed to a left channel filter 454 which applies a delay to the audio signal substantially equal to the time required to generate the centre channel audio signal C.
  • the left channel filter 454 in some examples may be implemented by an all pass filter.
  • the filtered left channel audio signal is passed to the left channel modifier 465.
  • the left channel modifier 465 is configured to subtract the amplified centre channel audio signal A_1·C from the filtered left channel audio signal to generate a modified left channel audio signal L'.
  • the modified left channel audio signal in some embodiments is passed to the left channel amplifier 487.
  • the centre channel audio signal C is furthermore in some examples passed to a second amplifier 463 which applies a gain A_2 to the signal and outputs the amplified signal to the right channel modifier 467.
  • the right channel audio signal R is further passed to a right channel filter 456 which applies a delay to the audio signal substantially equal to the time required to generate the centre channel audio signal C.
  • the right channel filter 456 in some examples may be implemented by an all pass filter.
  • the filtered right channel audio signal is passed to the right channel modifier 467.
  • the right channel modifier 467 is configured to subtract the amplified centre channel audio signal A_2·C from the filtered right channel audio signal to generate a modified right channel audio signal R'.
  • the modified right channel audio signal in some examples is passed to the right channel amplifier 491.
  • the left channel amplifier 487 in some examples is configured to receive the modified left channel audio signal L', amplify the modified left channel audio signal and output the amplified left channel signal L".
  • the up-mixer 106 furthermore is configured in some examples to comprise a centre channel amplifier 489 configured to receive the centre channel audio signal C, amplify the centre channel audio signal and output an amplified centre channel signal C' .
  • the up-mixer 106 in the same examples comprises a right channel amplifier 491 configured to receive the modified right channel audio signal R', amplify the modified right channel audio signal and output the amplified right channel signal R".
  • the gain of the left channel amplifier 487, centre channel amplifier 489 and right channel amplifier 491 in some examples may be determined by the user for example using the user interface 15 so as to control the importance of the 'centre' stage audio components with respect to the 'left' and 'right' stage audio components.
  • the user may control the gain of the 'centre' over the 'left' and 'right' components so that the user may emphasise the vocalist over the instruments or audience audio components according to the earlier examples.
  • the gains may be controlled or determined automatically or semiautomatically. Such examples may be implemented for applications such as Karaoke.
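  • as an illustrative sketch (not taken from the patent text), the up-mixer signal flow of Figure 3 may be expressed as follows, assuming a hypothetical extract_centre() helper standing in for the centre channel extractor 455 and modelling the all-pass filters 454 and 456 as simple sample delays:

        import numpy as np

        def upmix(left, right, extract_centre, a1=1.0, a2=1.0,
                  g_l=1.0, g_c=1.0, g_r=1.0, latency=0):
            """Generate L'', C' and R'' from a stereo pair (Figure 3 sketch)."""
            c = extract_centre(left, right)                # centre channel extractor 455
            l_d = np.concatenate((np.zeros(latency), left))[:len(left)]    # filter 454
            r_d = np.concatenate((np.zeros(latency), right))[:len(right)]  # filter 456
            l_mod = l_d - a1 * c                           # left channel modifier 465
            r_mod = r_d - a2 * c                           # right channel modifier 467
            return g_l * l_mod, g_c * c, g_r * r_mod       # amplifiers 487, 489, 491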
  • the extraction of the centre channel uses both magnitude and phase information for lower frequency components and magnitude information only for higher frequencies. More specifically, the centre channel extractor 455 in some embodiments uses frequency dependent magnitude and phase difference information between the stereo signals and compares this information against the user's interaural level difference (ILD) and interaural phase difference (IPD) to decide whether the signal is located at the centre, i.e. in the median plane (the vertical plane passing through the midpoint between the two ears and the nose).
  • the proposed method can in some embodiments be customized according to the user's own head related transfer function. It can in some other embodiments be used to extract sources in the median plane from a binaurally recorded signal.
  • the methods and apparatus as described hereafter may extract the centre channel using at least one of the interaural level difference (ILD), the interaural phase difference (IPD) and the interaural time difference (ITD).
  • the selection of the at least one difference used may differ according to the frequency being analysed. In the example described above and in the following, there is a first selection for a first frequency range, where the interaural level difference and interaural phase difference are used at a low frequency range, and a second selection where only the interaural level difference is used at a higher frequency range.
  • the centre channel audio signal components are present in both the left and right stereo audio signals where the components have the same intensity and zero delay i.e. no phase difference.
  • when listening over headphones, a listener would perceive this sound to be on the median plane (the vertical plane passing through the midpoint of the two ears and the nose). The absence of finer frequency specific cues would mean that a listener would often perceive this signal at the centre of the head. In other words the listener may not be able to determine whether the signal is at the front or back or up or down on that plane.
  • the stereo right channel audio signal does not contain any, or any significant, components of the front left audio channel signal. As a result, the user perceives this signal to be at the left ear.
  • the principle of how to identify the centre channel components for extraction from such a downmixed stereo audio signal is to determine a selection of at least one of the ITD, IPD and ILD in the stereo signal and compare these values to the listener's accustomed ILD, IPD and ITD values in order to evaluate the direction.
  • This approach may be termed as Perceptual Basis from here on.
  • for a centrally located source, the overall level difference is minimal, there should be minimal interaural time delay (in other words the ITD is small), and furthermore minimal interaural phase delay (in other words the IPD is small).
  • the analysis may be carried out on a time domain basis for example where ITD is the selected difference and in some other embodiments on a spectral domain basis.
  • the spatial analysis may in some embodiments be done on a frequency sub-band basis.
  • the analysis may employ time domain analysis such that in these other examples, instead of calculating the relative phase, the time difference between the envelopes of signal pairs in the time domain is calculated.
  • the frequency sub-band based analysis is in some embodiments based on the superimposition of signals from all the sources in that given frequency band.
  • the extraction in some embodiments uses the differences in different frequency sub-bands (such as level, time or phase differences, or a selection or combination of differences) to estimate the direction of the source in that frequency sub-band.
  • the net differences are compared to the differences (ILD, IPD and ITD cues) that are unique to that particular listener. These values are obtained from Head Related Transfer Function (HRTF) for that particular listener.
  • HRTF Head Related Transfer Function
  • more than one of the cues may be used to estimate the source direction in the lower frequency ranges (<1.5 kHz) but a single cue (for example the ILD or in other embodiments the ITD) may be the dominant cue at a higher frequency range (>1.5 kHz).
  • the determination of the use of a dominant cue such as the use of the ILD for higher frequency ranges in some embodiments is because a high frequency source signal may see multiple phase wraparounds before reaching the contralateral ear.
  • a crude or basic estimator for the centre channel is 0.5·(L(n)+R(n)). This average of samples in the time domain may perfectly preserve the original centre channel, but all of the remaining channels may also leak into the extracted centre channel. This leakage may be controlled by applying frequency specific gating or gains, as sketched below.
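  • as a minimal sketch of this crude estimator (the frequency specific gating developed in the following embodiments is what removes the leakage):

        import numpy as np

        def crude_centre(left, right):
            # 0.5*(L(n)+R(n)): preserves a true centre component exactly,
            # but components panned left or right leak into the estimate too.
            return 0.5 * (left + right)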
  • a weighting may be applied to the components of the band or sub-band to prevent leakage of non-centre components into the extracted centre channel audio signal.
  • a beam pattern may be formed to gate or filter unwanted leakage from other channels. This may be considered to be forming a perceptual beam pattern to allow signals that are located in the median plane.
  • the centre channel extractor 455 may receive the left channel audio signal L and the right channel audio signal R.
  • the audio signals may be described with respect to a time and thus at time n the left channel audio signal may be labelled as L(n) and the right channel audio signal may be labelled as R(n).
  • This operation of receiving the left and right channel audio signals may be shown with respect to Figure 6 in step 651.
  • the centre channel extractor 455 may comprise a sub-band generator 601 which is configured to receive the left and right channel audio signals and output for each channel a number of frequency sub-band signals.
  • the number of sub-bands may be N+1 and thus the output of the sub-band generator 601 comprises N+1 left channel sub-band audio signals L_0(n),...,L_N(n) and N+1 right channel sub-band audio signals R_0(n),...,R_N(n).
  • the frequency range for each sub-band may be any suitable frequency division design.
  • the sub-bands in some embodiments may be regular whilst in some other embodiments the sub-bands may be determined according to psychoacoustical principles.
  • in some embodiments the sub-bands may have overlapping frequency ranges; in some other embodiments at least some sub-bands may have abutting or separated frequency ranges.
  • the sub-band generator is shown as a filterbank comprising a pair of first filters 603 (one left channel low pass filter 603_L and one right channel low pass filter 603_R) with a cut-off frequency of 150 Hz, a pair of second filters 605 (one left channel band pass filter 605_L and one right channel band pass filter 605_R) with a centre frequency of 200 Hz and a bandwidth of 150 Hz, a pair of third filters 607 (one left channel band pass filter 607_L and one right channel band pass filter 607_R) with a centre frequency of 400 Hz and a bandwidth of 200 Hz, and so on to a pair of N+1th filters 609 (one left channel band pass filter 609_L and one right channel band pass filter 609_R) with a centre frequency of 2500 Hz and a bandwidth of 500 Hz.
  • Any suitable filter design may be used in embodiments of the application to implement the filters.
  • gammatone or gammachirp filterbanks, which are models particularly suited to the human hearing system, may be used.
  • a suitable finite impulse response (FIR) filter design may be used to generate the sub-bands.
  • the filtering process may be configured in some embodiments to be carried out in the frequency domain and thus the sub-band generator 601 may in these embodiments comprise a time to frequency domain converter, a frequency domain filtering and a frequency to time domain converter.
  • The operation of generating sub-bands is shown in Figure 6 by step 653.
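  • a sketch of the sub-band generator 601 as a bank of Butterworth filters using the example band edges above is given below; the filter order and design are illustrative assumptions, and gammatone, gammachirp or FIR designs could equally be substituted:

        import numpy as np
        from scipy.signal import butter, lfilter

        def make_filterbank(fs):
            # (low, high) edges in Hz: 150 Hz low pass, then band passes at
            # 200 +/- 75 Hz, 400 +/- 100 Hz, ..., 2500 +/- 250 Hz.
            bands = [(None, 150), (125, 275), (300, 500), (2250, 2750)]
            filts = []
            for lo, hi in bands:
                if lo is None:
                    filts.append(butter(4, hi, btype='low', fs=fs))         # filters 603
                else:
                    filts.append(butter(4, [lo, hi], btype='band', fs=fs))  # filters 605..609
            return filts

        def split_subbands(x, filts):
            # One filtered copy of x per sub-band, e.g. L_0(n), ..., L_N(n).
            return [lfilter(b, a, x) for (b, a) in filts]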
  • the centre channel extractor 455 in some embodiments may further comprise a gain determiner 604.
  • the gain determiner 604 is configured in some embodiments to receive the left and right channel sub-band audio signals from the sub-band generator 601 and determine a gain function value to be passed to a combined signal amplifier 610.
  • the gain determiner 604 for clarity reasons is partially shown as separate gain determiner apparatus for the first sub-band (a first sub-band gain determiner 604_0) and the N+1th sub-band (an N+1th sub-band gain determiner 604_N).
  • This separation of the gain determination into sub-band apparatus allows the gain determination to be carried out in parallel or substantially in parallel.
  • the same operation may be carried out serially for each sub-band in some embodiments of the application and as such may employ a number of separate sub-band gain determiner apparatus fewer than the number of sub-bands.
  • the gain determiner 604 in some embodiments may comprise a gain estimator 633 and a threshold determiner 614.
  • the gain estimator 633 in some embodiments receives the left and right channel sub-band audio signal values, and the threshold values for each sub-band from the threshold determiner 614 and determines the gain function value for each sub-band.
  • the threshold determiner 614 is configured in some embodiments to generate the threshold values for each sub-band. In some embodiments the threshold determiner generates or stores two thresholds for each sub-band, a lower threshold value threshold_1 and a higher threshold value threshold_2. The thresholds generated for each sub-band, such as threshold_1 and threshold_2, are generated based on the listener's head related transfer function (HRTF). In some embodiments the HRTF for the specific listener may be determined using any suitable method. For example, in some embodiments the HRTF may be generated by selecting a suitable HRTF from the Centre for Image Processing and Integrated Computing (CIPIC) database or any other suitable HRTF database.
  • a suitable HRTF may be retrieved from an earlier-determined HRTF for a user, measured using an HRTF measuring device.
  • the threshold determiner 614 generates sub-band threshold values dependent on an idealized or modelled HRTF function such as a dummy head model HRTF.
  • Figure 8a shows a sample HRTF for the left and right ears for frequencies from 20 Hz to 20 kHz for an azimuth of 0 degrees, in other words with a source directly in front of the listener. From this plot it can be seen that the interaural level difference (ILD) for most frequencies up to about 5 kHz is less than 6 dB. This would be true for sources that are directly in front of the listener.
  • Figure 8b shows, for the same listener, a sample HRTF for the left and right ears for frequencies from 20 Hz to 20 kHz for a source azimuth of -65 degrees. The level differences in this example are now much greater at higher frequencies.
  • Figures 9a and 9b show the signal level HRTF for the left and right ears for 200 Hz and 2 kHz signals for a sample listener at different azimuth angles all around the listener.
  • in order that the centre channel extractor may perceive a signal to be in the median plane (0 or 180 degrees azimuth), the threshold determiner 614 may have to determine threshold values where the left and right levels of a stereo signal (in other words the difference between the two traces for that azimuth angle) are very close at lower as well as at higher frequencies.
  • This closeness metric is a function of frequency and tolerance around the intended azimuth angle (e.g. +/-15 degrees from 0 degree azimuth).
  • phase differences may in some examples also be checked at lower frequencies and limits can be established.
  • the threshold values generated by the threshold determiner thus specify the differences allowed between the left and right channels to enable the extraction of the centre channel for each frequency band.
  • the selected or generated HRTF may be associated with a number of predetermined threshold values for each sub-band.
  • the thresholds may be determined by determining the ILD between the left and right HRTF for the user at +/- 15 degree range from the centre.
  • the thresholds may be determined by examining the total power in a frequency band or sub-band (for example, in some examples this may be an indicated or selected critical band).
  • a band filtered head related impulse response (HRIR) may be cross correlated to determine the difference between the left and right ear responses in terms of phase/time differences.
  • the threshold determiner 614 in these embodiments may use these Interaural Level Difference (ILD) values, Interaural Time Difference (ITD) and/or Interaural Phase Difference (IPD) values to set the threshold values for each band/sub-band accordingly.
  • where the differences selected for the lower frequency range are the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD), HRTF or HRIR values for the ILD and IPD may be used to set the threshold values for the lower frequency range.
  • where the difference selected for the higher frequency range is based on the Interaural Level Difference (ILD) values only, the HRTF or HRIR values for the ILD may be used to set the threshold values for the higher frequency ranges.
  • in other words the threshold is set based on the selected difference or differences shown in the HRTF or HRIR. The operation of determining threshold values for sub-bands is shown in Figure 6 by step 656.
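  • one possible sketch of this threshold determination, assuming the listener's HRTF magnitudes are tabulated per azimuth and sub-band (for example from the CIPIC database); the array layout and tolerance parameter are hypothetical:

        import numpy as np

        def ild_thresholds(h_mag_l, h_mag_r, azimuths_deg, tol_deg=15.0):
            """Per-band ILD thresholds from |HRTF| arrays of shape (azimuth, band)."""
            near_centre = np.abs(azimuths_deg) <= tol_deg   # tolerance cone around 0 degrees
            ild_db = 20.0 * np.log10(h_mag_l / h_mag_r)     # ILD per azimuth and band
            # threshold_1 per band: largest |ILD| the HRTF shows inside the cone;
            # a higher threshold_2 could be derived similarly from a wider cone.
            return np.max(np.abs(ild_db[near_centre, :]), axis=0)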
  • the gain estimator 633 in some examples, and as shown in Figure 4 , comprises a Discrete Fourier Transformer (DFT) computation block 606 and a coefficient comparator 608.
  • the DFT computation block 606 in some examples receives the left and right channel sub-band audio signal values.
  • the DFT computation block 606 generates complex frequency domain values for each sub-band for both the left and right channel.
  • any suitable time to frequency domain transformer may be used to generate complex frequency domain values such as a discrete cosine transform (DCT), Fast Fourier Transform (FFT), or wavelet transform.
  • DCT discrete cosine transform
  • FFT Fast Fourier Transform
  • the DFT computation block 606 may thus in these examples compute v_k(n) for each new input sample.
  • Values of M and k may in some examples be chosen for each sub-band independently to approximately capture the frequency range of the given sub-band filter.
  • W_M^k and cos(2π·k/M) are constants.
  • the DFT computation block 606 in these examples sets the values of v_k(n-2) and v_k(n-1) to zero initially, and also resets them after every M samples. After doing the above processing for M samples, y_k(n) is the required DFT coefficient. The DFT computation block computes these coefficients for all the sub-bands for both left and right channel signals; a sketch of the recursion is given below.
  • the DFT coefficients determined by the DFT computation block 606 are complex numbers.
  • the left channel DFT coefficients are represented as H_L(k), and the right channel DFT coefficients are represented as H_R(k), where k represents the sub-band number.
  • the DFT coefficients are passed to the coefficient comparator 608.
  • the operation of generating the DFT coefficients is shown in Figure 6 by step 655.
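  • the recursion above corresponds to a Goertzel-style computation of a single DFT coefficient per sub-band per block of M samples; a sketch is given below, assuming the standard twiddle factor W_M^k = e^(-j·2π·k/M) (the patent's exact formula is not reproduced on this page):

        import numpy as np

        def goertzel_coeff(x, k, M):
            """One complex DFT coefficient y_k from a block of M samples of x."""
            w = np.exp(-2j * np.pi * k / M)          # W_M^k, the constant from the text
            c = 2.0 * np.cos(2.0 * np.pi * k / M)    # recursion constant
            v1 = v2 = 0.0                            # v_k(n-1), v_k(n-2): zero at block start
            for n in range(M):
                v = x[n] + c * v1 - v2               # v_k(n) recursion
                v2, v1 = v1, v
            # y_k = v_k(M-1) - W_M^k * v_k(M-2); the final rotation yields X(k) exactly.
            return (v1 - w * v2) * np.exp(2j * np.pi * k / M)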
  • the coefficient comparator 608 receives the DFT coefficients from the DFT computation block 606 and the threshold values for each sub-band from the threshold determiner 614 to determine the gain function value for each sub-band.
  • the coefficient comparator 608 is configured in some examples to determine how close the sub-band interaural difference values (for example at least one of the interaural level difference ILD, the interaural time difference ITD, and the interaural phase difference IPD) are to the ILD, IPD and ITD values for centre of head (front or back) localization. In other words, where the signal component was part of the original centre channel there would be virtually no interaural difference (or, put another way, the ILD, IPD and ITD values would be expected to be close to zero). The coefficient comparator 608 thus attempts to find closeness in the H_L(k) and H_R(k) values.
  • this 'closeness' can be measured by determining the Euclidean distance between the H_L(k) and H_R(k) points on the complex plane. In other examples other distance metrics may be applied.
  • the pure phase difference value may be determined by calculating the minimum phase impulse response for the sub-band. For example, if a head related impulse response for the left and right channel signals is determined and converted to minimum phase impulse response form, the difference between the phase responses of the minimum phase impulse responses may be treated as the IPD value.
  • a graphical representation of the selection of the difference and the thresholds, for some embodiments in which the selected differences are level and phase differences, is shown in Figure 7: an example of the normalized H_L(k) value H_L(k)/max(H_L(k),H_R(k)) 711 with an orientation of θ_L from the real axis, and the normalized H_R(k) value H_R(k)/max(H_L(k),H_R(k)) 713 with an orientation of θ_R from the real axis. Furthermore the vector difference distance 705 is shown. It would be understood that non-normalized differences and values may be determined in some other embodiments.
  • the coefficient comparator 608 may in some embodiments determine the distance of the difference vector (or scalar) 705 for the sub-band and compare that distance against the defined/generated threshold values for the sub-band. For example, in the embodiments described above where the differences selected for the lower frequency range are based on a selection of the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD) values, the difference is the vector difference, which is compared against a vector threshold - which may be represented by the circle from the end of one of the vectors in Figure 7.
  • where the difference selected for the higher frequency range is based on the Interaural Level Difference (ILD) values only, the difference is the scalar difference produced by rotating one of the left or right normalized vectors onto the other vector.
  • the threshold or thresholds themselves may further be vector in nature (in other words that the level difference is more significant than the phase difference).
  • two threshold values are determined/generated and passed to the coefficient comparator 608 to be checked against the sub-band difference vector distance.
  • in some embodiments only one threshold value is determined/generated and checked against; in some other embodiments more than two threshold values may be used.
  • the coefficient comparator 608 may determine that if the two DFT vectors H_L(k) and H_R(k) for a specific sub-band k are close, in other words the difference vector distance is less than the smaller threshold value, or mathematically: difference vector distance < threshold_1, then a gain g_k of 1 (0 dB) is assigned to that sub-band. This is represented by a first region 721 in Figure 7.
  • the comparator 608 has determined that as the difference values between the two channels are small (such as a selection of one of the ILD, IPD and ITD; for example, for the lower frequency range a selection of the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD) values, and for the higher frequency range the Interaural Level Difference (ILD) values only), this sub-band comprises audio information which with a high confidence level was originally centre channel audio signal.
  • The comparison operation against the first threshold value is shown in Figure 6 by step 657. Furthermore the operation of the assignment of the gain g_k of 1 where the difference is less than the threshold is shown in step 659. Following step 659 the method progresses to the operation of combining left and right channel audio signals.
  • the coefficient comparator 608 furthermore determines that if the difference between the vectors (for the IPD and ILD lower frequency range) or scalars (for the ILD only higher frequency range), shown as the two DFT vectors H_L(k) and H_R(k) in Figure 7, for a specific sub-band k is greater than the lower threshold (threshold_1) value but less than a higher threshold (threshold_2), then a gain g_k which is less than 1 but greater than 0 is assigned to that sub-band.
  • This area is represented in Figure 7 by a second region 723.
  • the comparator 608 has determined that as the difference values (such as a selection of at least one of the ILD, IPD and ITD, as seen from the vector or scalar distances between the left and right channel sub-band values H_L and H_R) between the two channels are moderate, this sub-band comprises audio information which with a moderate confidence level was originally part of the centre channel audio signal.
  • the assigned gain is a function of the difference distance and the threshold values.
  • the assigned gain may be an interpolation of a value between 0 and 1, where the assigned gain is higher the nearer the difference value is to the lower threshold value. This interpolation may in some examples be linear and in some other examples non-linear.
  • the coefficient comparator 608 furthermore determines that if the distance of the vector (for the IPD and ILD lower frequency range) or of the scalar (for the ILD only higher frequency range) is greater than the higher threshold (threshold_2) value, then the gain g_k assigned to the sub-band is 0. This is represented in Figure 7 by a third region 725.
  • the comparator 608 has determined that as the difference values (such as at least one of the ILD, IPD and ITD) between the two channels are large, this sub-band comprises audio information which with low or no confidence was originally centre channel audio signal.
  • The comparison operation against the second, or higher, threshold value (threshold_2) is shown in Figure 6 by step 661. Furthermore the operation of the assignment of the gain of between 1 and 0 where the difference is less than the higher threshold (but implicitly greater than the lower threshold) is shown in step 665. Following step 665 the method progresses to the operation of combining left and right channel audio signals.
  • Furthermore the operation of the assignment of the gain of 0 where the difference is greater than the higher threshold (and implicitly greater than the lower threshold) is shown in step 663. Following step 663 the method progresses to the operation of combining left and right channel audio signals.
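  • a sketch of this three-region gain rule, assuming a Euclidean distance on the normalised coefficients and a linear interpolation between threshold_1 and threshold_2:

        import numpy as np

        def subband_gain(h_l, h_r, thr1, thr2, vector=True):
            norm = max(abs(h_l), abs(h_r)) or 1.0
            if vector:                        # lower bands: ILD and IPD jointly
                dist = abs(h_l / norm - h_r / norm)
            else:                             # higher bands: level difference only
                dist = abs(abs(h_l) - abs(h_r)) / norm
            if dist < thr1:
                return 1.0                    # region 721: confidently centre
            if dist < thr2:                   # region 723: moderate confidence
                return (thr2 - dist) / (thr2 - thr1)
            return 0.0                        # region 725: not centre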
  • the coefficient comparator 608 may for some sub-bands compare a non-vector (scalar) difference distance against the threshold value or values.
  • the non-vector difference is the difference between the magnitudes
  • the magnitude or level (ILD) difference is compared against the threshold values in the same way as described above.
  • the coefficient comparator 608 determines both vector and scalar differences and selects the result dependent on the sub-band being analysed.
  • the magnitude (scalar) difference may be determined and compared for the higher frequency sub-bands and the vector (phase and level) difference values may be determined for the lower frequency sub-bands.
  • the coefficient comparator 608 may in some examples compare the magnitude difference against the threshold values for sub-bands in the frequency range >1500 Hz and the vector difference against the threshold values for the sub-bands in the frequency range <1500 Hz.
  • although the embodiments above use difference thresholds or 'cue' values defined by the IPD and ILD, it would be appreciated that other cues, such as the inter-aural time difference (ITD), where the relative time difference between the right and left signals is determined and compared against one or more time threshold values, may be used in some other examples.
  • the ILD and ITD differences, which together describe a vector difference, may be employed in lower frequency ranges or sub-bands, and the ILD difference only, which describes a scalar difference, in higher frequency ranges or sub-bands.
  • the differences selected may be all three of the differences IPD, ILD and ITD which define a three dimensional vector.
  • the distance between the left and right channels may then define a three dimensional space and be tested against at least one three dimensional threshold.
  • the ILD may be employed for the whole frequency range being analysed, with the IPD and ITD being selected dependent on the frequency range being analysed.
  • a schematic view of a gain determiner 604 configured to determine a gain based on the selection of the ILD and ITD is shown.
  • the sub-band signals for the left and right channels are passed to the cross correlator 1201 and the level difference calculator 1203.
  • the cross correlator 1201 may determine a cross correlation between the filterbank pairs; for example, the cross correlation for the first band or sub-band may be determined between the output of the first band or sub-band of the left channel audio signal and the output of the first band or sub-band of the right channel audio signal.
  • the cross correlation would in these examples reveal a maximum peak occurring at the time delay between the two signals, in other words generating a result similar to the ITD, which is passed to the coefficient comparator 608.
  • the group delays of each of the filtered signals may be calculated and the ITD between the right and left signals after the filterbanks be determined from these group delay values.
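  • a sketch of the cross correlator 1201: the lag of the cross-correlation peak between the left and right sub-band signals estimates the ITD in samples (the max_lag search range is an assumed parameter):

        import numpy as np

        def itd_samples(l_band, r_band, max_lag):
            n = len(l_band)
            lags = range(-max_lag, max_lag + 1)
            # Cross-correlation xc(lag) = sum over n of L(n) * R(n + lag).
            xc = [np.dot(l_band[max(0, -lag):n - max(0, lag)],
                         r_band[max(0, lag):n - max(0, -lag)])
                  for lag in lags]
            return lags[int(np.argmax(xc))]   # delay between the two channel signals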
  • the level difference calculator 1203 may determine the magnitude of the sub-band components, determine the difference between the magnitudes, and pass these values to the coefficient comparator 608.
  • the threshold determiner 614 in these embodiments may determine at least one threshold value for each of the ILD value and the ITD value. In other words two sets of thresholds are determined, received or generated: one for timing and one for level.
  • the coefficient comparator 608 may then compare the determined ITD and ILD values against the associated set of threshold values to generate the associated gain or pass value.
  • the coefficient comparator 608 in some examples may generate the gain values by using a look-up table. For example, in the examples where the difference is the selection of the ITD and ILD values, a two dimensional look-up table is used with delay on one axis and level difference on the other. The gain is then read from the look-up table based on the input delay and level difference values for that sub-band.
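  • a sketch of the two dimensional look-up table variant; the table contents and quantisation steps below are illustrative assumptions only:

        import numpy as np

        # Rows: quantised |ITD|, columns: quantised |ILD|; gain falls as either cue grows.
        GAIN_TABLE = np.array([[1.0, 0.5, 0.0],
                               [0.8, 0.4, 0.0],
                               [0.4, 0.2, 0.0],
                               [0.0, 0.0, 0.0]])

        def gain_from_table(itd, ild, itd_step, ild_step, table=GAIN_TABLE):
            i = min(int(abs(itd) / itd_step), table.shape[0] - 1)   # delay axis
            j = min(int(abs(ild) / ild_step), table.shape[1] - 1)   # level difference axis
            return table[i, j]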
  • one difference or cue may be used for one frequency range (or sub-band) and a second difference or cue for a different frequency range (or sub-band).
  • the ITD cue may be used for higher frequency signals because ITD is effective at higher frequencies whereas the IPD is used at lower frequencies.
  • the ITD can be thought of as the time difference between the envelopes of signal pairs whereas the IPD is the difference between the signal contents (in other words inside the envelope).
  • the IPD and ITD may be determined at lower frequencies.
  • any suitable combination of the IPD, ITD, and/or ILD cues may be used to determine or identify the sub-band components which may be used to generate the centre channel audio signal, by comparing the difference value against one or more threshold values.
  • the ILD may be used on sub-bands analysed above 1500 Hz
  • IPD may be used on sub-bands analysed below 1500 Hz
  • the ITD may be used for sub-bands analysed from 0 to 5000 Hz. (From the viewpoint of frequency ranges this could, for example, be seen as a lower frequency range <1500 Hz with the IPD and ITD differences selected and a higher frequency range >1500 Hz with the ILD and ITD selected.)
  • each of the differences may be used for different analysis ranges which may overlap or abut or be separated.
  • the IPD is selected for a first frequency range from 0 Hz to 500 Hz
  • the ITD is selected for a second frequency range from 501 Hz to 1500 Hz
  • the ILD is selected for a third frequency range from 1501 Hz to 5000 Hz.
  • two regions may be defined with different gain values.
  • two regions may be defined with one region being a pass region (i.e. the switch is on or gain is equal to one) where the cue value is less than the threshold, and the second region being a block region (i.e. the switch is off or gain is zero) where the cue value is greater than the threshold.
  • more than two thresholds would produce more than three regions.
  • the comparator 608 applies an additional 1st order low pass smoothing function to the gain values to reduce any perceptible distortion due to the time-varying nature of the gain.
  • the comparator 608 may apply a higher order smoothing function or any suitable smoothing function to the output gain values in order to attempt to reduce perceptible distortion.
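  • a sketch of such a 1st order smoothing of the time-varying gain, with an assumed smoothing constant alpha:

        def smooth_gain(g_new, g_prev, alpha=0.9):
            # g_s(n) = alpha * g_s(n-1) + (1 - alpha) * g(n): a 1st order low pass.
            return alpha * g_prev + (1.0 - alpha) * g_new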
  • the gain values are in some embodiments output to the amplifier 610.
  • the centre channel extractor 455 in some examples and embodiments comprises a sub-band combiner 602 which receives the left and right channel sub-band audio signals and outputs combined left and right sub-band audio signals.
  • the sub-band combiner 602 is shown to comprise an array of adders. Each adder in some examples receives one sub-band left channel audio signal and the corresponding sub-band right channel audio signal and outputs a combined signal for that sub-band.
  • first adder 623 adding the left and right channel audio signals for the sub-band 0
  • second adder 625 adding the left and right channel audio signals for the sub-band 1
  • third adder 627 adding the left and right channel audio signals for the sub-band 2
  • N+1th adder 629 adding the left and right channel audio signals for the sub-band N.
  • the fourth to N'th adders are not shown in Figure 5 for clarity reasons.
  • the combination is an averaging of the left and right channel audio signals for a specific sub-band.
  • The process of combining the sub-band left and right channel audio signals is shown in Figure 6 by step 667.
  • the centre channel extractor 455 comprises in some embodiments an amplifier 610 for amplifying the combined left and right channel audio signals for each sub-band by the assigned gain value for the sub-band and outputting an amplified value of the combined audio signal to the sub-band combiner 612.
  • the amplifier 610 in some examples may comprise as shown in Figure 5 an array of variable gain amplifiers where the gain is set from a control signal from the gain determiner 604.
  • a first variable gain amplifier 633 amplifying the sub-band 0 combined audio signal B_0 by the sub-band 0 assigned gain value g_0
  • a second variable gain amplifier 635 amplifying the sub-band 1 combined audio signal B_1 by the sub-band 1 assigned gain value g_1
  • a third variable gain amplifier 637 amplifying the sub-band 2 combined audio signal B_2 by the sub-band 2 assigned gain value g_2
  • an N+1th variable gain amplifier 639 amplifying the sub-band N combined audio signal B_N by the sub-band N assigned gain value g_N.
  • the fourth to Nth variable gain amplifiers are not shown in Figure 5 for clarity reasons.
  • The operation of amplifying the combined values by the assigned gains is shown in Figure 6 by step 669.
  • the centre channel extractor 455 may further in some examples comprise a sub-band combiner 612.
  • the sub-band combiner 612 in some embodiments receives for each sub-band the amplified combined sub-band audio signal value and combines them to generate an extracted centre channel audio signal.
  • the sub-band combiner 612 comprises an adder 651 for performing a summing of the amplified combined sub-band audio signals.
  • The operation of the combining of the sub-bands is shown in Figure 6 by step 673.
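  • a sketch tying steps 667 to 673 together: each sub-band pair is averaged (combiner 602), weighted by its assigned gain (amplifier 610), and the weighted bands are summed (combiner 612):

        import numpy as np

        def combine_subbands(left_bands, right_bands, gains):
            """left_bands, right_bands: per-band signals; gains: g_0 .. g_N."""
            centre = np.zeros_like(left_bands[0])
            for l_b, r_b, g_k in zip(left_bands, right_bands, gains):
                b_k = 0.5 * (l_b + r_b)   # B_k: averaged sub-band (adders 623..629)
                centre += g_k * b_k       # variable gain amplifiers 633..639
            return centre                 # extracted centre channel C (adder 651)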
  • The difference between the basic averaging of the combined left and right channel signals and a centre channel signal extracted according to some of the embodiments is shown in Figures 10a and 10b.
  • the basic averaging of the left and right channel signals does not detect (and therefore does not suppress) the audio components where the signal is clearly in the left or right channel, and thus audio sources which originated in the right or left sound stage 'bleed' into the extracted centre channel signal.
  • the example embodiment where the allocated gain is applied to the combination of the left and right channel signals produces an extracted centre channel signal which is far less susceptible to audio signals originating from other channels.
  • centre channel extraction may in further examples provide further use cases so that the user may control the centre channel depending on user preferences.
  • although the centre channel extractor has been described with respect to an upmixing and virtualisation process for headphones, the centre channel extraction apparatus and method are suitable for many different audio signal processing operations.
  • the apparatus may be employed to extract, from pairs of channels, audio signals located at various directions relative to the pairs of channels.
  • the same centre channel extraction process may be used to extract so called unknown sources.
  • a device such as a camera, with microphones mounted on opposite sides to record stereo sound, may generate a pair of audio signals from which the channel extraction apparatus or methods may produce a centre channel audio signal for presentation.
  • a centre channel signal may be determined in order to isolate an audio source located in the 'centre'.
  • the vocalist audio components may be extracted from the signal containing the instruments and audience component signals.
  • the extracted centre channel may be subtracted from the left L' and right R' channel audio signals to generate a modified left L" and right R" channel audio signal.
  • this output stereo signal would thus have the vocalist removed, rendering the resultant stereo audio signal suitable for karaoke.
  • the process and apparatus may be implemented by an electronic device (such as a mobile phone) or through a server/database.
  • the centre channel extractor 455 further comprises a pre-processor.
  • a left channel audio signal pre-processor part 1151 is shown. It would be appreciated that the pre-processor would further comprise a mirror image right channel audio signal pre-processor part which is not shown in order to clarify the figure.
  • the pre-processor is implemented prior to the sub-band generator 601 and the output of the pre-processor in such embodiments is input to the sub-band generator 601.
  • the pre-processor is configured to apply pre-processing to the signal to remove some of the uncorrelated signal in the left and right channels. Therefore in some examples the pre-processor attempts to remove these uncorrelated signals from the left and right channel audio signals before the generation of the sub-band audio signals.
  • the left channel audio signal may be expressed as a combination of two components: a component S(n) that is coherent with the right channel audio signal and an uncorrelated component N_1(n). Similarly the right channel signal may also be expressed as a combination of two components, the coherent component S(n) and an uncorrelated component N_2(n).
  • the left channel audio signal pre-processor part 1151 comprises a Least Mean Square (LMS) processor 1109 to estimate the uncorrelated component.
  • LMS Least Mean Square
  • the left channel audio signal is input into a delay 1101 with a length of T+1 and then passed to a first pre-processor combiner 1105 and a second pre-processor combiner 1107.
  • the right channel audio signal is in these examples input to a filter W 1103 with a length of 2T+1 and whose filter parameters are controlled by the LMS processor 1109.
  • the output of the filter is furthermore in these examples passed to the first pre-processing combiner 1105 to generate an estimate of the uncorrelated component N_1', which is passed to the second pre-processing combiner 1107 to be subtracted from the delayed left channel audio signal in an attempt to remove the uncorrelated information.
  • the LMS processor 1109 in examples such as these receives both the N_1' estimate of the uncorrelated information and the right channel audio signal, and chooses the filter parameters such that the correlated information is output to be subtracted at the first pre-processing combiner 1105.
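  • a sketch of the pre-processor part 1151 as a conventional LMS canceller; the step size mu and the exact update rule are assumptions, since they are not specified here:

        import numpy as np

        def lms_decorrelate(left, right, T=16, mu=0.01):
            taps = 2 * T + 1                             # filter W 1103 length
            w = np.zeros(taps)
            out = np.zeros_like(left)
            for n in range(taps, len(left)):
                x = right[n - taps + 1:n + 1][::-1]      # input vector to filter W 1103
                l_d = left[n - (T + 1)]                  # delay 1101 of length T+1
                n1_est = l_d - np.dot(w, x)              # combiner 1105: N_1' estimate
                w += mu * n1_est * x                     # LMS update of W
                out[n] = l_d - n1_est                    # combiner 1107 output
            return out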
  • although the above describes embodiments of the invention operating within an electronic device 10 or apparatus, the invention as described below may be implemented as part of any audio processor.
  • embodiments of the invention may be implemented in an audio processor which may implement audio processing over fixed or wired communication paths.
  • user equipment may comprise an audio processor such as those described in embodiments of the invention above.
  • the terms electronic device and user equipment are intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
  • At least some embodiments may be a computer-readable medium encoded with instructions that, when executed by a computer, perform: filtering at least two audio signals to generate at least two groups of audio components per audio signal; determining a difference between the at least two audio signals for each group of audio components; and generating a further audio signal by selectively combining the at least two audio signals for each group of audio components dependent on the difference between the at least two audio signals for each group of audio components.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • As used in this application, the term 'circuitry' refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as a combination of processor(s), or portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term in this application, including any claims.
  • the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • the term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.


Description

  • The present invention relates to apparatus for processing of audio signals. The invention further relates to, but is not limited to, apparatus for processing audio and speech signals in audio playback devices.
  • Audio rendering and sound virtualization has been a growing area in recent years. There are different playback techniques, some of which are mono, stereo playback, surround 5.1, ambisonics, etc. In addition to playback techniques, apparatus or signal processing integrated within apparatus or signal processing performed prior to the final playback apparatus has been designed to allow a virtual sound image to be created in many applications such as music playback, movie sound tracks, 3D audio, and gaming applications.
  • The standard for commercial audio content until recently, for music or movie, was stereo audio signal generation. Signals from different musical instruments, speech or voice, and other audio sources creating the sound scene were combined to form a stereo signal. Commercially available playback devices would typically have two loudspeakers placed at a suitable distance in front of the listener. The goal of stereo rendering was limited to creating phantom images at a position between the two speakers and is known as panned stereo. The same content could be played on portable playback devices as well, as it relied on a headphone or an earplug which uses two channels. Furthermore the use of stereo widening and 3D audio applications has recently become more popular especially for portable devices with audio playback capabilities. There are various techniques for these applications that provide the user spatial feeling and 3D audio content. The techniques employ various signal processing algorithms and filters. It is known that the effectiveness of spatial audio is stronger over headphone playback.
    Commercial audio today boasts of 5.1, 7.1 and 10.1 multichannel content where 5, 7 or 10 channels are used to generate surrounding audio scenery. An example of a 5.1 multichannel system is shown in Figure 2 where the user 211 is surrounded by a front left channel speaker 251, a front right channel speaker 253, a centre channel speaker 255, a left surround channel speaker 257 and a right surround channel speaker 259. Phantom images can be created using this type of setup lying anywhere on the circle 271 as shown in Figure 2. Furthermore a channel in multichannel audio is not necessarily unique. Audio signals for one channel after frequency dependent phase shifts and magnitude modifications can become the audio signal for a different channel. This in a way helps to create phantom audio sources around the listener leading to a surround sound experience. However such equipment is expensive and many end users do not have the multi-loudspeaker equipment for replaying the multichannel audio content. To enable multichannel audio signals to be played on previous generation stereo playback systems, the multichannel audio signals are matrix downmixed.
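By way of a hedged illustration only, one common matrix downmix convention (ITU-R BS.775-style coefficients, LFE omitted) folds the centre and surround channels into the stereo pair as follows; the patent does not prescribe a particular matrix:

```python
def matrix_downmix_5_1(fl, fr, c, ls, rs, k=0.7071):
    """Fold 5.1 channels (front left/right, centre, left/right
    surround) into a stereo pair; k = 1/sqrt(2) is one common
    scaling for the centre and surround contributions."""
    left = fl + k * c + k * ls
    right = fr + k * c + k * rs
    return left, right
```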
  • After the downmix the original multi-channel content is no longer available in its component form (each component being each channel in say 5.1). All of the channels from 5.1 are present in the down-mixed stereo. When such stereo signals are played back over headphones directly, the phantom images lie on an imaginary line joining the left and right ears. This line is known as the interaural axis and the experience is often called inside-the-head feeling or lateralization.
  • However in real life a user would not experience an audio source that is localized inside their head. As a result of this unnatural playback method, prolonged listening to this form of stereo audio over headphones leads to listener fatigue. Furthermore even if some stereo widening is applied to the two channels of a panned stereo, users perceive a limited surround feeling.
  • To overcome such problems firstly there is a re-synthesis of multichannel signals from the stereo signals. Such re-synthesis typically involves an upmixing of the stereo-signal to extract additional channel audio signals. In particular centre channel extraction is important as the centre channel could be speech/vocal audio signals, specific musical instruments or both.
  • Each of these extracted audio signals may be then virtualized to different virtual locations. A virtualizer typically introduces frequency dependent relative delays and amplification or attenuation to the signals before the signals are sent to headphone speakers. The introduction of typical virtualization would pan certain sources away from the mid plane where the user does not have any control how loud or quiet these sources could be.
  • For example the user may be interested in a vocalist located centre stage rather than the audience located off-centre stage and the stereo audio signals may easily mask the key sections of the vocalist by the background noise from the audience.
  • The sources that appear to be originating from the centre can often be at higher or lower audio levels relative to the rest of the sources in the audio scene. Listeners typically do not have any control over this level and often want to amplify or attenuate these central sources depending on their perceptual preference. Lack of this feature often results in a poor audio experience.
  • This invention proceeds from the consideration that prior art solutions for centre channel extraction do not produce good quality centre channel audio signals. Thus listening to centre channel audio signals produces a poor listening experience. Furthermore the poor quality centre channel audio signals produce poor quality listening experiences when virtualized.
  • Embodiments of the present invention aim to address the above problem.
  • EP1784048 discloses a signal processing apparatus generating, from left-channel and right-channel stereo signals, a centre-channel signal. The stereo signals are split into different frequency bands by two identical filter banks. Within each frequency band, the phase difference between the stereo signals is determined by a phase difference detector. For each frequency band, a gain is calculated by a gain generator as a function of the phase difference. The gain is set to 0 for phase differences of ±180° and to 1 for phase differences of 0°. This gain is applied by a multiplier to the average of the stereo signals within each frequency band. The resulting outputs of all frequency bands are synthesised by a signal synthesiser to form the resulting centre-channel signal.
  • US2005169482 (A1) discloses an audio spatial environment engine for converting from an N channel audio system to an M channel audio system, where N is an integer greater than M. The audio spatial environment engine includes one or more correlators receiving two or more of the N channels of audio data and eliminating delays between the channels that are irrelevant to an average human listener. One or more Hilbert transform systems each perform a Hilbert transform on one or more of the correlated channels of audio data. One or more summers receive at least one of the correlated channels of audio data and at least one of the Hilbert transformed correlated channels of audio data and generate one of the M channels of audio data.
  • JP2002078100 (A) discloses a system for processing a stereophonic signal provided with frequency band division sections that divide a stereophonic signal into frequency bands in each channel, a similarity calculation section that calculates the similarity between channels for each frequency band, an attenuation coefficient calculation section that calculates an attenuation coefficient to suppress or emphasize a sound source signal localized around the middle on the basis of the similarity, a multiplier that multiplies the attenuation coefficient with each frequency band signal, and a sound source signal synthesis section and an output section that resynthesize each frequency band signal in each channel after the multiplication of the attenuation coefficient and provide an output of the result.
  • An article by Ewan A. Macpherson et al. in the Journal of the Acoustical Society of America, "Listener weighting of cues for lateral angle: The duplex theory of sound localization revisited", vol. 111, no. 5, 1 May 2002, pages 2219-2236, XP012002885, discloses that the virtual auditory space technique was used to quantify the relative strengths of interaural time difference (ITD), interaural level difference (ILD), and spectral cues in determining the perceived lateral angle of wideband, low-pass, and high-pass noise bursts. Listeners reported the apparent locations of virtual targets that were presented over headphones and filtered with listeners' own directional transfer functions. The stimuli were manipulated by delaying or attenuating the signal to one ear (by up to 600 µs or 20 dB) or by altering the spectral cues at one or both ears. Listener weighting of the manipulated cues was determined by examining the resulting localization response biases. In accordance with the Duplex Theory defined for pure tones, listeners gave high weight to ITD and low weight to ILD for low-pass stimuli, and high weight to ILD for high-pass stimuli. Most (but not all) listeners gave low weight to ITD for high-pass stimuli. This weight could be increased by amplitude-modulating the stimuli or reduced by lengthening stimulus onsets. For wideband stimuli, the ITD weight was greater than or equal to that given to ILD. Manipulations of monaural spectral cues and the interaural level spectrum had little influence on lateral angle judgements. An example of the duplex theory of sound localization and use of ITD and ILD values is given by SENGPIEL: 'Die Duplex-Theorie von Lord Rayleigh', 30 April 2008, pages 1 - 1, XP055099399.
  • There is provided according to a first aspect of the invention a method for processing audio signals comprising the set of features according to claim 1.
  • Further advantageous features of the method steps are defined in dependent claims 2 - 6.
  • According to a second aspect of the invention there is provided an apparatus for processing audio signals comprising the set of features according to claim 8.
  • Further advantageous features of the apparatus are defined in dependent claims 9-13.
  • According to a third aspect of the invention there is provided a computer-readable medium encoded with instructions that, when executed by a computer perform the method of any of claims 1 to 6. An electronic device or a chipset may comprise the apparatus as described above.
  • Brief Description of Drawings
  • For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
    • Figure 1 shows schematically an electronic device employing embodiments of the application;
    • Figure 2 shows schematically a 5 channel audio system configuration;
    • Figure 3 shows schematically a stereo to multichannel up-mixer;
    • Figure 4 shows schematically a centre channel extractor as shown in Figure 3;
    • Figure 5 shows schematically a centre channel extractor as shown in Figure 3 and 4 in further detail;
    • Figure 6 shows a flow diagram illustrating the operation of the centre channel extractor according to embodiments of the application;
    • Figure 7 shows schematically a Euclidean difference distance showing the first and second threshold distances;
    • Figures 8a and 8b show graphically head related transfer functions across frequencies for specific azimuth angles for use in determining first and second threshold values according to some embodiments;
    • Figures 9a and 9b show graphically head related transfer functions across azimuth positions for specific frequencies for use in determining first and second threshold values according to some embodiments;
    • Figures 10a and 10b show graphically perception beam determinations for frequencies;
    • Figure 11 shows schematically a pre-processing stage for the left channel audio signal; and
    • Figure 12 shows schematically a section of the centre channel extractor for some further examples.
  • The following describes apparatus and methods for the provision of enhancing centre channel extraction. In this regard reference is first made to Figure 1, a schematic block diagram of an exemplary electronic device 10 or apparatus, which may incorporate a centre channel extractor. The centre channel extracted by the centre channel extractor in some embodiments is suitable for an up-mixer.
  • The electronic device 10 may for example be a mobile terminal or user equipment for a wireless communication system. In other embodiments the electronic device may be a Television (TV) receiver, portable digital versatile disc (DVD) player, or audio player such as an iPod.
  • The electronic device 10 comprises a processor 21 which may be linked via a digital-to-analogue converter 32 to a headphone connector for receiving a headphone or headset 33. The processor 21 is further linked to a transceiver (TX/RX) 13, to a user interface (UI) 15 and to a memory 22.
  • The processor 21 may be configured to execute various program codes. The implemented program codes comprise a channel extractor for extracting a centre channel audio signal from a stereo audio signal. The implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed. The memory 22 could further provide a section 24 for storing data, for example data that has been processed in accordance with the embodiments.
  • The channel extracting code may in embodiments be implemented in hardware or firmware.
  • The user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display. The transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network.
  • It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.
  • The apparatus 10 may in some examples further comprise at least two microphones for inputting audio or speech that is to be processed according to embodiments of the application or transmitted to some other electronic device or stored in the data section 24 of the memory 22. A corresponding application to capture stereo audio signals using the at least two microphones may be activated to this end by the user via the user interface 15. The apparatus 10 in such examples may further comprise an analogue-to-digital converter configured to convert the input analogue audio signal into a digital audio signal and provide the digital audio signal to the processor 21.
  • The apparatus 10 may in some examples also receive a bit stream with correspondingly encoded stereo audio data from another electronic device via the transceiver 13. In these examples, the processor 21 may execute the channel extraction program code stored in the memory 22. The processor 21 in these examples may process the received stereo audio signal data, and output the extracted channel data.
  • In some examples the headphone connector 33 may be configured to communicate to a headphone set or earplugs wirelessly, for example by a Bluetooth profile, or using a conventional wired connection.
  • The received stereo audio data may in some examples also be stored, instead of being processed immediately, in the data section 24 of the memory 22, for instance for enabling a later processing and presentation or forwarding to still another electronic device.
  • It would be appreciated that the schematic structures described in figures 3, 4, 5 and 11 and the method steps in figure 6 represent only a part of the operation of a complete audio processing chain comprising some examples and embodiments as exemplarily shown implemented in the electronic device shown in figure 1.
  • Figure 3 shows in further detail an up-mixer 106 suitable for the implementation of some examples of the application. The up-mixer is configured to receive a stereo audio signal and generate a left channel audio signal L", a centre channel audio signal C' and a right channel audio signal R".
  • The up-mixer 106 is configured to receive the left channel audio signal at the left input 451 and the right channel audio signal at the right input 453. The up-mixer 106 furthermore comprises a centre channel extractor 455 which receives the left channel audio signal L and the right channel audio signal R and generates a centre channel audio signal C. Although the above and the following describe an input of a left and right channel audio signal and an up-mixed output of a left, centre and right channel audio signal, it would be appreciated that the input may be any pair of input audio signal channels, such as a first and second input channel audio signal, and the output an upmixed first, second and third output channel, where at least one of the three output channels is an extraction of the first and second input channels.
  • The centre channel audio signal C is in some examples furthermore passed to a first amplifier 461 which applies a gain A1 to the signal and outputs the amplified signal to the left channel modifier 465.
  • The left channel audio signal L is further passed to a left channel filter 454 which applies a delay to the audio signal substantially equal to the time required to generate the centre channel audio signal C. The left channel filter 454 in some examples may be implemented by an all pass filter. The filtered left channel audio signal is passed to the left channel modifier 465.
  • The left channel modifier 465 is configured to subtract the amplified centre channel audio signal A1C from the filtered left channel audio signal to generate a modified left channel audio signal L'. The modified left channel audio signal in some embodiments is passed to the left channel amplifier 487.
  • The centre channel audio signal C is furthermore in some examples passed to a second amplifier 463 which applies a gain A2 to the signal and outputs the amplified signal to the right channel modifier 467.
  • The right channel audio signal R is further passed to a right channel filter 456 which applies a delay to the audio signal substantially equal to the time required to generate the centre channel audio signal C. The right channel filter 456 in some examples may be implemented by an all pass filter. The filtered right channel audio signal is passed to the right channel modifier 467.
  • The right channel modifier 467 is configured to subtract the amplified centre channel audio signal A2C from the filtered right channel audio signal to generate a modified right channel audio signal R'. The modified right channel audio signal in some examples is passed to the right channel amplifier 491.
  • The left channel amplifier 487 in some examples is configured to receive the modified left channel audio signal L', amplify the modified left channel audio signal and output the amplified left channel signal L". The up-mixer 106 furthermore is configured in some examples to comprise a centre channel amplifier 489 configured to receive the centre channel audio signal C, amplify the centre channel audio signal and output an amplified centre channel signal C'. The up-mixer 106 in the same examples comprises a right channel amplifier 491 configured to receive the modified right channel audio signal R', amplify the modified right channel audio signal and output the amplified right channel signal R".
  • The gain of the left channel amplifier 487, centre channel amplifier 489 and right channel amplifier 491 in some examples may be determined by the user for example using the user interface 15 so as to control the importance of the 'centre' stage audio components with respect to the 'left' and 'right' stage audio components. In other words the user may control the gain of the 'centre' over the 'left' and 'right' components so that the user may emphasise the vocalist over the instruments or audience audio components according to the earlier examples. In some other embodiments the gains may be controlled or determined automatically or semiautomatically. Such examples may be implemented for applications such as Karaoke.
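A minimal sketch of the output stage of the up-mixer of Figure 3, assuming the signals are NumPy arrays already delay-matched by the all pass filters; the function and argument names are illustrative, the gain labels follow the figure:

```python
def upmix_outputs(l_delayed, r_delayed, c, a1, a2, g_l, g_c, g_r):
    """Scale the extracted centre by A1/A2, subtract it from the
    delay-matched left/right channels (modifiers 465/467), then apply
    the user-controlled output gains (amplifiers 487/489/491)."""
    l_mod = l_delayed - a1 * c                 # modified left channel L'
    r_mod = r_delayed - a2 * c                 # modified right channel R'
    return g_l * l_mod, g_c * c, g_r * r_mod   # L'', C', R''
```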
  • With respect to Figures 4 and 5, schematic views of the centre channel extractor 455 with respect to some examples of the application are shown in further detail; furthermore with respect to Figure 6 the operation of the centre channel extractor 455 according to some embodiments of the application is described.
  • In some embodiments as described in further detail hereafter the extraction of the centre channel uses both magnitude and phase information for lower frequency components and magnitude information only for higher frequencies. More specifically the centre channel extractor 455 in some embodiments uses frequency dependent magnitude and phase difference information between the stereo signals and compares this information against the user's interaural level difference (ILD) and interaural phase difference (IPD) to decide if the signal is located at the centre, i.e. in the median plane (the vertical plane passing through the midpoint between the two ears and the nose). The proposed method can in some embodiments be customized according to the user's own head related transfer function. It can in some other embodiments be used to extract sources in the median plane for a binaurally recorded signal.
  • However it would be appreciated that the methods and apparatus as described hereafter may extract the centre channel using at least one of the interaural level difference (ILD), the interaural phase difference (IPD) and the interaural time difference (ITD). Furthermore it may be understood that in some embodiments the selection of the at least one difference used may differ according to the frequency being analysed. As in the example described above and following, there is a first selection for a first frequency range where the interaural level difference and interaural phase difference are used at a low frequency range and a second selection where only the interaural level difference is used at a higher frequency range.
  • For example for the case of a centre channel extracted from a downmixed stereo system, the centre channel audio signal components are present in both the left and right stereo audio signals where the components have the same intensity and zero delay i.e. no phase difference. When listening over headphones, a listener would perceive this sound to be on the median plane (the vertical plane passing through the midpoint between the two ears and the nose). The absence of finer frequency specific cues would mean that a listener would often perceive this signal at the centre of the head. In other words the listener may not be able to determine whether the signal is at the front or back or up or down on that plane.
  • Now consider the case of signals originally from a front left audio channel downmixed into a left and right channel stereo audio signal. As would be expected, the stereo right channel audio signal does not contain any, or any significant, components of the front left audio channel signal. As a result, the user perceives this signal to be at the left ear.
  • The principle of how to identify the centre channel components for extraction from such a downmixed stereo audio signal is to determine a selection of at least one of the ITD, IPD and ILD in the stereo signal and compare the determined values to the accustomed ILD, IPD and ITD values in order to evaluate the direction. This approach may be termed the Perceptual Basis from here on.
  • Thus for a single source (instrument, single vocalist etc) to be in the median plane, the overall level difference is minimal, there should be minimal interaural time delay (in other words the ITD is small), and furthermore minimal interaural phase delay (in other words the IPD is small).
  • It would be understood that the analysis may be carried out on a time domain basis, for example where ITD is the selected difference, and in some other embodiments on a spectral domain basis. For example in the presence of multiple channels the spatial analysis may in some embodiments be done on a frequency sub-band basis. In some examples the analysis may employ time domain analysis such that in these other examples, instead of calculating the relative phase, the time difference between the envelopes of signal pairs in the time domain is calculated.
  • The frequency sub-band based analysis is in some embodiments based on the superimposition of signals from all the sources in that given frequency band. The extraction in some embodiments uses the differences in different frequency sub-bands (such as level, time or phase differences or a selection or combination of differences) to estimate the direction of the source in that frequency sub band. The net differences are compared to the differences (ILD, IPD and ITD cues) that are unique to that particular listener. These values are obtained from the Head Related Transfer Function (HRTF) for that particular listener. Furthermore in some embodiments more than one of the cues (ILD, IPD, ITD) may be used to estimate the source direction in the lower frequency ranges (<1.5 kHz) but a single cue (for example the ILD or in other embodiments the ITD) may be the dominant cue at a higher frequency range (>1.5 kHz). The determination of the use of a dominant cue such as the use of the ILD for higher frequency ranges in some embodiments is because a high frequency source signal may see multiple phase wraparounds before reaching the contralateral ear.
  • A crude or basic estimator for the centre channel is 0.5*(L(n)+R(n)). This average of samples in time domain may perfectly preserve the original centre channel, but all of the remaining channels may also leak into the extracted centre channel. This leakage may be controlled by applying frequency specific gating or gains.
  • Thus, for example where interaural phase difference and interaural level differences are the selected differences, then for each frequency band or sub-band, where the IPD and/or ILD pair for the band or sub-band does not match up well against the IPD and/or ILD pair for the user centre direction, a weighting may be applied to the components of the band or sub-band to prevent leakage of non-centre components into the extracted centre channel audio signal. In other words, by comparing the IPD and ILD pair of the stereo signal to the cues for a centre channel, a beam pattern may be formed to gate or filter unwanted leakage from other channels. This may be considered to be forming a perceptual beam pattern to allow signals that are located in the median plane.
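The idea may be summarised in a short illustrative sketch: the crude per-band average is kept, but weighted by a per-band gain that is close to 1 only where the band's interaural differences match the listener's 'centre' cues:

```python
def gated_centre_estimate(l_bands, r_bands, gains):
    """Per sub-band: centre estimate 0.5*(L+R) weighted by a gain in
    [0, 1] derived from how well the band's ILD/IPD match the centre
    direction; the weighted bands are summed to form the centre signal."""
    return sum(g * 0.5 * (l + r) for l, r, g in zip(l_bands, r_bands, gains))
```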
  • The centre channel extractor 455 may receive the left channel audio signal L and the right channel audio signal R. The audio signals may be described with respect to a time and thus at time n the left channel audio signal may be labelled as L(n) and the right channel audio signal may be labelled as R(n).
  • This operation of receiving the left and right channel audio signals may be shown with respect to Figure 6 in step 651.
  • The centre channel extractor 455 may comprise a sub-band generator 601 which is configured to receive the left and right channel audio signals and output for each channel a number of frequency sub-band signals. In some embodiments the number of sub-bands may be N+1 and thus the output of the sub-band generator 601 comprises N+1 left channel sub-band audio signals L0(n),...,LN(n) and N+1 right channel sub-band audio signals R0(n),...,RN(n). The frequency range for each sub-band may be any suitable frequency division design. For example the sub-bands in some embodiments may be regular whilst in some other embodiments the sub-bands may be determined according to psychoacoustical principles. In some embodiments of the application the sub-bands may have overlapping frequency ranges whilst in some other embodiments at least some sub-bands may have abutting or separated frequency ranges.
  • With respect to the centre channel extractor 455 shown in Figure 5 the sub-band generator is shown as a filterbank comprising a pair of first filters 603 (one left channel low pass filter 603L and one right channel low pass filter 603R) with cut-off frequency of 150Hz, a pair of second filters 605 (one left channel band pass filter 605L and one right channel band pass filter 605R) with a centre frequency of 200Hz and a bandwidth of 150Hz, a pair of third filters 607 (one left channel band pass filter 607L and one right channel band pass filter 607R) with a centre frequency of 400Hz and a bandwidth of 200Hz, and on to a pair of N+1th filters 609 (one left channel band pass filter 609L and one right channel band pass filter 609R) with centre frequency of 2500Hz and bandwidth of 500Hz. For clarity reasons further filters generating sub-band signals for other frequency ranges are not shown in figure 5.
  • Any suitable filter design may be used in embodiments of the application to implement the filters. Thus in some embodiments there may be different filterbank designs with suitable characteristics for the filterbank filters chosen.
  • For example, gammatone or gammachirp filterbank models which are models of particularly suitable filterbanks for the human hearing system may be used. In some other embodiments a suitable finite impulse response (FIR) filter design may be used to generate the sub-bands.
  • Furthermore the filtering process may be configured in some embodiments to be carried out in the frequency domain and thus the sub-band generator 601 may in these embodiments comprise a time to frequency domain converter, a frequency domain filtering and a frequency to time domain converter.
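A minimal sketch of such a filterbank using Butterworth filters (one possible design; gammatone, gammachirp or FIR designs are equally valid per the text), assuming a 44.1 kHz sampling rate and showing only a few of the band-pass pairs:

```python
import numpy as np
from scipy.signal import butter, lfilter

def make_subbands(x, fs=44100.0):
    """Split a signal into the sub-bands of Figure 5: a low-pass band
    below 150Hz, then band-pass bands given as (centre, bandwidth)
    pairs, continuing up to 2500Hz centre / 500Hz bandwidth. Applied
    identically to the left and right channel signals."""
    bands = [(200.0, 150.0), (400.0, 200.0), (2500.0, 500.0)]  # more pairs lie in between
    b, a = butter(4, 150.0 / (fs / 2.0), btype='low')
    subbands = [lfilter(b, a, x)]                    # sub-band 0: low-pass
    for fc, bw in bands:
        edges = [(fc - bw / 2.0) / (fs / 2.0), (fc + bw / 2.0) / (fs / 2.0)]
        b, a = butter(2, edges, btype='band')        # band-pass for this sub-band
        subbands.append(lfilter(b, a, x))
    return subbands
```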
  • The operation of generating sub-bands is shown in Figure 6 by step 653.
  • The centre channel extractor 455 in some embodiments may further comprise a gain determiner 604. The gain determiner 604 is configured in some embodiments to receive the left and right channel sub-band audio signals from the sub-band generator 601 and determine a gain function value to be passed to a combined signal amplifier 610.
  • With respect to Figure 5 the gain determiner 604 for clarity reasons is partially shown as separate gain determiner apparatus for the first sub-band (a first sub-band gain determiner 6040) and the N+1th sub-band (a N+1th sub-band gain determiner 604N). This separation of the gain determination into sub-band apparatus allows the gain determination to be carried out in parallel or substantially in parallel. However it would be understood that the same operation may be carried out serially for each sub-band in some embodiments of the application and as such may employ a number of separate sub-band gain determiner apparatus fewer than the number of sub-bands.
  • The gain determiner 604 in some embodiments may comprise a gain estimator 633 and a threshold determiner 614. The gain estimator 633 in some embodiments receives the left and right channel sub-band audio signal values, and the threshold values for each sub-band from the threshold determiner 614 and determines the gain function value for each sub-band.
  • The threshold determiner 614 is configured in some embodiments to generate the threshold values for each sub-band. In some embodiments the threshold determiner generates or stores two thresholds for each sub-band, a lower threshold value threshold1 and a higher threshold value threshold2. The thresholds generated for each sub-band such as threshold1 and threshold2 are generated based on the listener's head related transfer function (HRTF). In some embodiments the HRTF for the specific listener may be determined using any suitable method for determining the HRTF. For example in some embodiments the HRTF may be generated by selecting a suitable HRTF from the Centre for Image Processing and Integrated Computing (CIPIC) database or any suitable HRTF database. In some other embodiments a suitable HRTF may be retrieved from an earlier determined HRTF for a user, determined using a HRTF measuring device. In some other embodiments the threshold determiner 614 generates sub-band threshold values dependent on an idealized or modelled HRTF function such as a dummy head model HRTF.
  • With respect to Figures 8a, 8b, 9a and 9b sample signal level HRTFs are shown. Figure 8a shows a sample HRTF for the left and right ears for frequencies from 20Hz to 20kHz for an azimuth of 0 degrees, in other words with a source directly in front of the listener. From this plot it can be seen that the interaural level differences (ILD) for most of the frequencies up to about 5KHz are less than 6dB. This would be true for sources that are directly in front of the listener. Figure 8b shows for the same listener a sample HRTF for the left and right ears for frequencies from 20Hz to 20kHz for a source azimuth of -65 degrees. The level differences in this example are now much greater at higher frequencies.
  • Figures 9a and 9b show the signal level HRTF for the left and right ears for a 200Hz and a 2KHz signal for a sample listener for different azimuth angles all around the listener.
  • Thus the threshold determiner 614, in order to determine threshold values such that the centre channel extractor may perceive a signal to be in the median plane (0, 180 degrees), may have to determine threshold values where the left and right levels of a stereo signal (in other words the difference between the two traces for that azimuth angle) are very close at lower as well as at higher frequencies. This closeness metric is a function of frequency and tolerance around the intended azimuth angle (e.g. +/-15 degrees from 0 degree azimuth). Similarly phase differences may in some examples also be checked at lower frequencies and limits can be established. The threshold values generated by the threshold determiner thus specify the differences allowed between the left and right channels to enable the extraction of the centre channel for each frequency band.
  • In some embodiments of the application the selected or generated HRTF may be associated with a number of predetermined threshold values for each sub-band. In some further examples the thresholds may be determined by determining the ILD between the left and right HRTF for the user at +/- 15 degree range from the centre.
  • In some further examples the thresholds may be determined by examining the total power in a frequency band or sub-band (for example in some examples this may be an indicated or selected critical band). Similarly in some examples a band filtered Head Related Impulse Response (HRIR) may be cross correlated to determine the difference between the left and right ear response in terms of phase/time differences.
  • Then the threshold determiner 614 in these embodiments may use these Interaural Level Difference (ILD) values, Interaural Time Difference (ITD) and/or Interaural Phase Difference (IPD) values to set the threshold values for each band/sub-band accordingly. For example in the embodiments described above where the differences selected are for the lower frequency range based on a selection of the Interaural Level Difference (ILD) values, and Interaural Phase Difference (IPD) values then HRTF or HRIR values for the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD) may be used to set the threshold values for the lower frequency range. Similarly in these embodiments where the difference selected for the higher frequency range is based on the Interaural Level Difference (ILD) values only then the HRTF or HRIR values for the Interaural Level Difference (ILD) may be used to set the threshold values for the higher frequency ranges. In other words dependent on the difference selection the threshold is set based on the selected difference or differences shown in the HRTF or HRIR. The operation of determining threshold values for sub-bands is shown in figure 6 by step 656.
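As an illustration of one such derivation (the exact mapping from HRTF data to thresholds is a design choice, and the function and names below are assumptions), per-band ILD thresholds might be computed from complex HRTFs indexed by azimuth and frequency bin, measured over a +/-15 degree tolerance about the front:

```python
import numpy as np

def ild_thresholds(hrtf_left, hrtf_right, band_bins, azimuths_deg, tol=15.0):
    """For one sub-band (a set of frequency bins), measure |ILD| in dB
    over all azimuths within +/-tol degrees of centre; threshold1 is
    the largest such ILD, threshold2 an illustrative wider margin."""
    near_centre = [i for i, az in enumerate(azimuths_deg) if abs(az) <= tol]
    ilds = []
    for i in near_centre:
        p_l = np.mean(np.abs(hrtf_left[i, band_bins]) ** 2)   # left-ear band power
        p_r = np.mean(np.abs(hrtf_right[i, band_bins]) ** 2)  # right-ear band power
        ilds.append(abs(10.0 * np.log10(p_l / p_r)))
    threshold1 = max(ilds)          # within this: confidently 'centre'
    threshold2 = 2.0 * threshold1   # beyond this: confidently not (illustrative margin)
    return threshold1, threshold2
```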
  • The gain estimator 633 in some examples, and as shown in Figure 4, comprises a Discrete Fourier Transformer (DFT) computation block 606 and a coefficient comparator 608. The DFT computation block 606 in some examples receives the left and right channel sub-band audio signal values. The DFT computation block 606 generates complex frequency domain values for each sub-band for both the left and right channel. In other examples any suitable time to frequency domain transformer may be used to generate complex frequency domain values such as a discrete cosine transform (DCT), Fast Fourier Transform (FFT), or wavelet transform.
  • In some examples the DFT computation block 606 may generate the complex coefficients for each sub-band using the Goertzel algorithm: $v_k(n) = 2\cos\left(\frac{2\pi k}{M}\right) v_k(n-1) - v_k(n-2) + x(n)$.
  • The DFT computation block 606 may thus in these examples compute vk(n) for each new input sample.
  • After M samples have been computed the DFT computation block 606 calculates the DFT coefficient by evaluating the following equation once: $y_k(M) = v_k(M) - W_M^k \, v_k(M-1)$, where $W_M^k = \exp\left(-j \frac{2\pi k}{M}\right)$.
  • Values of M and k may in some examples be chosen for each sub-band independently to approximately capture the frequency range of the given sub-band filter. $W_M^k$ and $\cos(2\pi k / M)$ are constants.
  • The DFT computation block 606 in these examples sets the values of vk(n-2) and vk(n-1) to zero initially, and also resets them after every M samples. After processing M samples in this way, yk(M) is the required DFT coefficient. The DFT computation block 606 computes these coefficients for all the sub-bands for both left and right channel signals.
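A runnable sketch of the recursion just described (a standard Goertzel evaluation; the variable names follow the text, the function name is illustrative):

```python
import numpy as np

def goertzel_coefficient(x, k, M):
    """Compute y_k(M), the k-th DFT coefficient of M samples of x,
    via v_k(n) = 2cos(2*pi*k/M)*v_k(n-1) - v_k(n-2) + x(n) and a
    single final evaluation y_k(M) = v_k(M) - W_M^k * v_k(M-1)."""
    c = 2.0 * np.cos(2.0 * np.pi * k / M)    # per-band constant
    v1 = v2 = 0.0                            # v_k(n-1), v_k(n-2), zero-initialised
    for n in range(M):
        v1, v2 = c * v1 - v2 + x[n], v1      # one recursion step per new sample
    v1, v2 = c * v1 - v2, v1                 # one extra step (x(M) = 0) gives v_k(M)
    w = np.exp(-2j * np.pi * k / M)          # W_M^k
    return v1 - w * v2                       # y_k(M)
```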
  • The DFT coefficients determined by the DFT computation block 606 are complex numbers. The left channel DFT coefficients are represented as HL(k), and the right channel DFT coefficients are represented as HR(k) where k represents the sub-band number.
  • The DFT coefficients are passed to the coefficient comparator 608. The operation of generating the DFT coefficients is shown in figure 6 by step 655.
  • The coefficient comparator 608 receives the DFT coefficients from the DFT computation block 606 and the threshold values for each sub-band from the threshold determiner 614 to determine the gain function value for each sub-band.
  • The coefficient comparator 608 is configured in some examples to determine how close the sub-band interaural difference values (for example at least one of the interaural level difference (ILD), the interaural time difference (ITD), and the interaural phase difference (IPD)) are with respect to the ILD, IPD and ITD values for the centre of head (front or back) localization. In other words where the signal component was a part of the original centre channel there would be virtually no interaural difference (or put another way the ILD, IPD and ITD values would be expected to be close to zero). The coefficient comparator 608 thus attempts to find closeness in HL(k) and HR(k) values. As the DFT values for each sub-band for the left and right channels are complex numbers, this 'closeness' can be measured by determining the Euclidean distance between the HL(k) and HR(k) points on the complex plane. In other examples other distance metrics may be applied.
  • In some examples the pure phase difference value, the IPD, may be determined by calculating the minimum phase impulse response for the sub-band. For example if a head related impulse response for the left and right channel signal is determined and converted to a minimum phase impulse response form, the difference between the phase responses of the minimum phase impulse responses may be treated as the IPD value.
  • With respect to Figure 7 a graphical representation of the selection of the difference and the thresholds for some embodiments which have selected the differences as level and phase differences may be shown, in which an example of the normalized HL(k) value HL(k)/(max(HL(k),HR(k))) 711 with an orientation of φL from the real plane and the normalized HR(k) value HR(k)/(max(HL(k),HR(k))) 713 with an orientation of φR from the real plane is shown. Furthermore the vector difference distance 705 is shown. It would be understood that non-normalized differences and values may be determined in some other embodiments.
  • The coefficient comparator 608 may in some embodiments determine the distance of the difference vector (or scalar) 705 for the sub-band and also compare the distance against the defined/generated threshold values for the sub-band. For example in the embodiments described above where the differences selected are for the lower frequency range based on a selection of the Interaural Level Difference (ILD) values, and Interaural Phase Difference (IPD) values then the difference is the vector difference which is compared against a vector threshold - which may be represented by the circle from the end of one of the vectors in Figure 7.
  • Similarly in these embodiments where the difference selected for the higher frequency range is based on the Interaural Level Difference (ILD) values only then the difference is the scalar difference produced by rotating one of the left or right normalized vectors onto the other vector. Although the vector difference is compared against a scalar threshold it would be understood that the threshold or thresholds themselves may further be vector in nature (in other words that the level difference is more significant than the phase difference).
  • In some embodiments as described above two threshold values are determined/generated and passed to the coefficient comparator 608 to be checked against the sub-band difference vector distance. However in some other embodiments only one threshold value is determined/generated and checked against, or in some other embodiments more than two threshold values may be used.
  • In the two threshold per sub-band embodiments the coefficient comparator 608 may determine that if the two DFT vectors HL(k) and HR(k) for a specific sub-band k are close, in other words less than the smaller threshold (threshold1) value, or mathematically
    Difference vector distance < threshold1
    then a gain gk of 1 or 0dB is assigned to that sub-band. This is represented by a first region 721 in Figure 7. Thus the comparator 608 has determined that, as the difference values between the two channels (for example a selection of the interaural level difference (ILD) and interaural phase difference (IPD) values for the lower frequency range, and the interaural level difference (ILD) values only for the higher frequency range) are small, this sub-band comprises audio information which with a high confidence level was originally centre channel audio signal.
  • The comparison operation against the first threshold value is shown in Figure 6 by step 657. Furthermore the operation of the assignment of the gain gk of 1 where the difference is less than the threshold is shown in step 659. Following step 659 the method progresses to the operation of combining left and right channel audio signals.
  • In the same embodiments the coefficient comparator 608 furthermore determines that if the difference between the vectors (for the IPD and ILD lower frequency range) or scalars (for the ILD only higher frequency range), shown as the two DFT vectors HL(k) and HR(k) in Figure 7, for a specific sub-band k is greater than the lower threshold (threshold1) value but less than a higher threshold (threshold2) then a gain, gk, which is less than 1 but greater than 0 is assigned to that sub-band. This area is represented in Figure 7 by a second region 723. Thus the comparator 608 has determined that, as the difference values (such as a selection of at least one of the ILD, IPD and ITD as seen from the vector or scalar distances between the left and right channel sub-vector values HL and HR) between the two channels are moderate, this sub-band comprises audio information which with a moderate confidence level was originally part of the centre channel audio signal. In some embodiments the assigned gain is a function of the difference distance and the threshold values. For example, the assigned gain may be an interpolation of a value between 0 and 1 where the assigned gain is higher the nearer the difference value is to the lower threshold value. This interpolation may in some examples be a linear interpolation and may in some other examples be a non-linear interpolation.
  • Furthermore in the same examples the coefficient comparator 608 determines that if the distance of the vector (for the IPD and ILD lower frequency range) or of the scalar (for the ILD only higher frequency range) is greater than the higher threshold (threshold2) value then the gain gk assigned for the sub-band is 0. This is represented in Figure 7 by a third region 725. Thus the comparator 608 has determined that, as the difference values (such as at least one of ILD, IPD and ITD) between the two channels are large, this sub-band comprises audio information which with a low or no confidence level was originally centre channel audio signal.
  • The comparison operation against the second, or higher, threshold value (threshold2) is shown in Figure 6 by step 661. Furthermore the operation of the assignment of the gain of between 1 and 0 where the difference is less than the higher threshold (but implicitly greater than the lower threshold) is shown in step 665. Following step 665 the method progresses to the operation of combining left and right channel audio signals.
  • Furthermore the operation of the assignment of the gain of 0 where the difference is greater than the higher threshold (and implicitly greater than the lower threshold) is shown in step 663. Following step 663 the method progresses to the operation of combining left and right channel audio signals.
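The three-region rule may be sketched as follows (linear interpolation in the middle region, one of the interpolations the text allows; the function name is illustrative):

```python
def subband_gain(h_l, h_r, threshold1, threshold2):
    """Normalise the complex sub-band coefficients HL(k), HR(k),
    measure their Euclidean distance on the complex plane, and map it
    to a gain: 1 below threshold1 (region 721), 0 above threshold2
    (region 725), linearly interpolated in between (region 723)."""
    m = max(abs(h_l), abs(h_r)) or 1.0    # avoid divide-by-zero on silence
    d = abs(h_l / m - h_r / m)            # difference vector distance
    if d < threshold1:
        return 1.0
    if d > threshold2:
        return 0.0
    return (threshold2 - d) / (threshold2 - threshold1)
```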
  • In some examples the coefficient comparator 608 may for some sub-bands compare a non-vector (scalar) difference distance against the threshold value or values. In such examples the non-vector difference is the difference between the magnitudes |HL(k)| and |HR(k)| without considering the phase (and thus, related by frequency, also the time) difference. In such embodiments the magnitude or level (ILD) difference is compared against the threshold values in the same way as described above.
  • In some examples the coefficient comparator 608 determines both vector and scalar differences and selects the result dependent on the sub-band being analysed. Thus in these examples the magnitude (scalar) difference may be determined and compared for the higher frequency sub-bands and the vector (phase and level) difference values may be determined for the lower frequency sub-bands. For example the coefficient comparator 608 may in some examples compare the magnitude difference against the threshold values for sub-bands in the frequency range >1500Hz and the vector difference against the threshold values for the sub-bands in the frequency range <1500Hz.
  • Although the examples described above use difference thresholds or 'cue' values defined by the IPD and ILD, it would be appreciated that other cues, such as the inter-aural time difference (ITD) where the relative time difference between the right and left signals is determined and compared against one or more time threshold values, may be used in some other examples. For example in some examples the ILD and ITD differences, which would describe a vector difference, may be employed in lower frequency ranges or sub-bands and ILD differences only, which would describe a scalar difference, in higher frequency ranges or sub-bands. Furthermore in some other examples the differences selected may be all three of the differences IPD, ILD and ITD, which define a three dimensional vector. The distance between the left and right channels may then define a three dimensional space and be tested against at least one three dimensional threshold. In further examples the ILD may be employed for the whole frequency range being analysed with the IPD and ITD being selected dependent on the frequency range being analysed.
  • With respect to figure 12 a schematic view of a gain determiner 604 configured to determine a gain based on the selection of the ILD and ITD is shown.
  • The sub-band signals for the left and right channels are passed to the cross correlator 1201 and the level difference calculator 1203.
  • The cross correlator 1201 may determine a cross correlation between the filterbank pairs, for example the cross correlation for the first band or sub-band may be determined between the output of the first band or sub-band of the left channel audio signal and the output of the first band or sub-band of the right signal. The cross correlation would in these examples reveal a maximum peak which would occur at the time delay between the two signals, in other words generating a result similar to the ITD, which is passed to the coefficient comparator 608.
  • In some other examples the group delays of each of the filtered signals may be calculated and the ITD between the right and left signals after the filterbanks be determined from these group delay values.
  • Furthermore the level difference calculator 1203 may determine the magnitude of the sub band components and may further determine the difference between the magnitude of the components and furthermore pass these values to the coefficient comparator 608.
  • The threshold determiner 614 in these embodiments may determine at least one threshold value for each of the ILD value and the ITD value. In other words two sets of thresholds are determined, received or generated, one for level difference and one for time difference.
  • The coefficient comparator 608 may then compare the determined ITD and ILD values against the associated set of threshold values to generate the associated gain or pass value.
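A minimal sketch of this ITD/ILD analysis for one band pair (illustrative only; the patent also allows group-delay-based ITD estimation, and the sampling rate here is an assumed value):

```python
import numpy as np

def itd_ild_for_band(l_band, r_band, fs=44100.0):
    """Cross-correlate the band-filtered left/right signals: the lag
    of the maximum peak gives an ITD-like value, and the ratio of band
    energies gives the ILD in dB."""
    xc = np.correlate(l_band, r_band, mode='full')
    lag = int(np.argmax(xc)) - (len(r_band) - 1)   # peak lag in samples
    itd = lag / fs                                 # seconds
    eps = 1e-12                                    # guard for silent bands
    ild = 10.0 * np.log10((np.sum(l_band ** 2) + eps) /
                          (np.sum(r_band ** 2) + eps))
    return itd, ild
```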
  • Although the examples above describe the coefficient comparator 608 as generating an associated gain value according to an algorithmic function, it would be appreciated that the coefficient comparator 608 in some examples may generate the gain values by using a lookup table. For example in the examples where the difference is the selection of the ITD and ILD values a two dimensional look-up table is used with delay on one axis and level difference on the other axis. The gain is then read from the look-up table based on the input delay and level difference values for that sub-band.
  • As described previously in some examples one difference or cue may be used for one frequency range (or sub-band) and a second difference or cue for a different frequency range (or sub-band). For example, in some embodiments the ITD cue may be used for higher frequency signals because ITD is effective at higher frequencies whereas the IPD is used at lower frequencies. The ITD can be thought of as the time difference between the envelopes of signal pairs whereas the IPD is the difference between the signal contents (in other words inside the envelope). In some further embodiments the IPD and ITD may be determined at lower frequencies.
  • In some further examples any suitable combination of the IPD, ITD, and/or ILD cues may be used to determine or identify the sub-band components which may be used to generate the centre channel audio signal by comparing the difference value against one or more threshold values.
• The above description has presented the embodiments where differing selections of differences are used from the viewpoint of a series of frequency ranges, for each of which a different selection of differences is tested against various threshold values. However the same embodiments may be presented from the viewpoint of the differences, in other words that each of the differences (for example IPD, ITD, ILD) has an effect on a different range of the sub-bands. For example, in some embodiments the ILD may be used on sub-bands analysed above 1500 Hz, the IPD may be used on sub-bands analysed below 1500 Hz, and the ITD used for sub-bands analysed from 0 to 5000 Hz. From the viewpoint of frequency ranges this could be seen as a lower frequency range (<1500 Hz) with the IPD and ITD differences being selected and a higher frequency range (>1500 Hz) with the ILD and ITD differences being selected.
  • In some embodiments each of the differences may be used for different analysis ranges which may overlap or abut or be separated. Thus a further example of such embodiments would be that the IPD is selected for a first frequency range from 0 Hz to 500 Hz, the ITD is selected for a second frequency range from 501 Hz to 1500 Hz and the ILD is selected for a third frequency range from 1501 Hz to 5000 Hz.
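• As a sketch of this per-range cue selection, the following illustrative Python function uses the example band edges given above (the edges and the returned labels are examples only, not normative values):

    def select_cue(band_centre_hz):
        # Pick which interaural cue to test for a sub-band based on its
        # centre frequency, following the illustrative three-range split.
        if band_centre_hz <= 500:
            return "IPD"
        if band_centre_hz <= 1500:
            return "ITD"
        return "ILD"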
• Although the above embodiment is described with reference to two threshold values per sub-band defining three regions (a first region 721 with a unity gain, a second region 723 with a sub-unity gain, and a third region with a zero gain), it would be appreciated that two or more regions may be defined with different gain values. For example, with one threshold value two regions may be defined: one region being a pass region (i.e. the switch is on or the gain is equal to one) where the cue value is less than the threshold, and the second region being a block region (i.e. the switch is off or the gain is zero) where the cue value is greater than the threshold. In other embodiments more than two thresholds would produce more than three regions.
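• The two-threshold, three-region mapping may be sketched as follows, where the mid-region gain of 0.5 is an assumed example value:

    def gain_from_thresholds(cue_value, t1, t2, mid_gain=0.5):
        # Map a cue difference onto three gain regions: pass below t1,
        # attenuate between t1 and t2, block at or above t2.
        d = abs(cue_value)
        if d < t1:
            return 1.0
        if d < t2:
            return mid_gain
        return 0.0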
• In some further examples the comparator 608 applies an additional first order low pass smoothing function to reduce any perceptible distortion caused by the time varying nature of the gain. Mathematically such a low pass filter may be implemented using the following equation:

    g_k(n) = (1 - α) · g_k(n-1) + α · g_k

  where g_k(n-1) is the output gain value for the k'th sub-band at the previous time instant, g_k is the value determined by the comparator 608, and g_k(n) is the output gain value for the k'th sub-band at the current time instant. In some other examples of the application the comparator 608 may apply a higher order smoothing function, or any other suitable smoothing function, to the output gain values in order to attempt to reduce perceptible distortion.
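• In code the recursion is a one-line update per frame; the value of alpha below is an assumed smoothing constant, not one specified in the patent:

    def smooth_gain(g_prev, g_target, alpha=0.1):
        # First order low pass smoothing of the per-band gain:
        # g_k(n) = (1 - alpha) * g_k(n - 1) + alpha * g_k.
        return (1.0 - alpha) * g_prev + alpha * g_target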
  • The gain values are in some embodiments output to the amplifier 610.
• The centre channel extractor 455 in some examples and embodiments comprises a sub-band combiner 602 which receives the left and right channel sub-band audio signals and outputs combined left and right sub-band audio signals. In figure 5 the sub-band combiner 602 is shown to comprise an array of adders. Each of the adders in some examples receives one sub-band left channel audio signal and the same sub-band right channel audio signal and outputs a combined signal for that sub-band. Thus in such embodiments there is shown a first adder 623 adding the left and right channel audio signals for sub-band 0, a second adder 625 adding the left and right channel audio signals for sub-band 1, a third adder 627 adding the left and right channel audio signals for sub-band 2, and an (N+1)'th adder 629 adding the left and right channel audio signals for sub-band N. The fourth to N'th adders are not shown in Figure 5 for clarity reasons.
• In some examples the combination is an averaging of the left and right channel audio signals for a specific sub-band. Thus the sub-band combiner may in these embodiments produce the following results:

    B_0(n) = 0.5 · (L_0(n) + R_0(n))
    B_1(n) = 0.5 · (L_1(n) + R_1(n))

  and so on until

    B_N(n) = 0.5 · (L_N(n) + R_N(n))
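• Treating the sub-band signals as rows of an array, this averaging reduces to a single vectorised statement (the array shapes are an assumption of this sketch):

    import numpy as np

    def combine_bands(left_bands, right_bands):
        # left_bands, right_bands: arrays shaped (num_bands, num_samples).
        # Returns B_k(n) = 0.5 * (L_k(n) + R_k(n)) for every band k.
        return 0.5 * (np.asarray(left_bands) + np.asarray(right_bands))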
  • These combined values are passed to the amplifier 610.
  • The process of combining the sub-band left and right channel audio signals is shown in Figure 6 by step 667.
  • The centre channel extractor 455 comprises in some embodiments an amplifier 610 for amplifying the combined left and right channel audio signals for each sub-band by the assigned gain value for the sub-band and outputting an amplified value of the combined audio signal to the sub-band combiner 612.
• The amplifier 610 in some examples may comprise, as shown in Figure 5, an array of variable gain amplifiers where the gain is set from a control signal from the gain determiner 604. In such examples there may be a first variable gain amplifier 633 amplifying the sub-band 0 combined audio signal B0 by the sub-band 0 assigned gain value g0, a second variable gain amplifier 635 amplifying the sub-band 1 combined audio signal B1 by the sub-band 1 assigned gain value g1, a third variable gain amplifier 637 amplifying the sub-band 2 combined audio signal B2 by the sub-band 2 assigned gain value g2, and an (N+1)'th variable gain amplifier 639 amplifying the sub-band N combined audio signal BN by the sub-band N assigned gain value gN. The fourth to N'th variable gain amplifiers are not shown in Figure 5 for clarity reasons.
  • These amplified values are then in some examples as described above passed to the sub-band combiner 612.
  • The operation of amplifying the combined values by the assigned gains is shown in Figure 6 by step 669.
  • The centre channel extractor 455 may further in some examples comprise a sub-band combiner 612. The sub-band combiner 612 in some embodiments receives for each sub-band the amplified combined sub-band audio signal value and combines them to generate an extracted centre channel audio signal.
• In some embodiments, as shown in figure 5, the sub-band combiner 612 comprises an adder 651 for performing a summing of the amplified combined sub-band audio signals. This summation may be expressed as the following equation:

    C(n) = Σ_{k=0}^{N-1} g_k · B_k(n)
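• The gain weighted summation may be sketched as follows (shapes as in the earlier combining sketch):

    import numpy as np

    def extract_centre(combined_bands, gains):
        # combined_bands: array shaped (num_bands, num_samples) holding
        # B_k(n); gains: one g_k per band.
        # Returns C(n) = sum over k of g_k * B_k(n).
        gains = np.asarray(gains, dtype=float)[:, None]
        return np.sum(gains * np.asarray(combined_bands), axis=0)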
  • The operation of the combining of the sub-bands is shown in figure 6 by step 673.
• The difference between the basic averaging of the combined left and right channel signals and a centre channel signal extracted according to some of the embodiments is shown in figures 10a and 10b. The basic averaging of the left and the right channel signals, as expected, does not detect the audio components where the signal is clearly in only the left or right channel, and thus audio sources which originated in the right or left sound stage 'bleed' into the extracted centre channel signal. However, as can be seen in figures 10a and 10b, the example embodiment where the allocated gain is applied to the combination of the left and right channel signals produces an extracted centre channel signal which is far less susceptible to audio signals originating from other channels.
  • The embodiments of the application as described above thus achieve a more natural and accurate centre channel extraction process. Such centre channel extraction may in further examples provide further use cases so that the user may control the centre channel depending on user preferences.
• Although the centre channel extractor has been described with respect to an upmixing and virtualisation process for headphones, the centre channel extraction apparatus and method are suitable for many different audio signal processing operations. Thus it would be appreciated that the apparatus may be employed to extract audio signals from pairs of channels at various directions relative to the pairs of channels. For example the same centre channel extraction process may be used to extract so-called unknown sources. For example a device, such as a camera, with microphones mounted on opposite sides to record stereo sound may generate a pair of audio signals from which the channel extraction apparatus or methods may then produce a centre channel audio signal for presentation. In other words, where a sound stage is recorded using stereo microphones, a centre channel signal may be determined in order to isolate an audio source located in the 'centre'. For example, where a vocalist is located at a centre stage position with the accompanying instruments on one side and the audience on the other, the vocalist audio components may be extracted from the signal containing the instrument and audience component signals. In another application of embodiments of the application, the extracted centre channel may be subtracted from the left L' and right R' channel audio signals to generate modified left L" and right R" channel audio signals. This output stereo signal would conventionally thus remove the vocalist from the audio signal, rendering the resultant stereo audio signal suitable for karaoke.
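• The karaoke variant amounts to a per-sample subtraction; a minimal sketch with illustrative names:

    def karaoke_stereo(left, right, centre):
        # L'' = L' - C and R'' = R' - C: remove the extracted centre
        # channel (e.g. the vocalist) from each stereo channel.
        return left - centre, right - centre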
  • Furthermore in some embodiments the process and apparatus may be implemented by an electronic device (such as a mobile phone) or through a server/database.
• In some examples the centre channel extractor 455 further comprises a pre-processor. With respect to figure 11 a left channel audio signal pre-processor part 1151 is shown. It would be appreciated that the pre-processor would further comprise a mirror image right channel audio signal pre-processor part which is not shown in order to clarify the figure. The pre-processor is implemented prior to the sub-band generator 601 and the output of the pre-processor in such embodiments is input to the sub-band generator 601. The pre-processor is configured to apply a pre-processing to the signal to remove some of the uncorrelated signal in the left and right channels. Therefore in some examples the pre-processor attempts to remove these uncorrelated signals from the left and right channel audio signals before the generation of the sub-band audio signals.
  • The left channel audio signal may be expressed as a combination of two components. These two components are a component S(n) that is coherent with right channel audio signal and an uncorrelated component N1(n). Similarly the right channel signal may also be expressed as a combination of two components, the coherent component S(n) and an uncorrelated component N2(n).
• In some examples the left channel audio signal pre-processor part 1151 comprises a Least Mean Square (LMS) processor 1109 to estimate the uncorrelated component. In such examples the left channel audio signal is input into a delay 1101 with a length of T+1 and then passed to a first pre-processor combiner 1105 and a second pre-processor combiner 1107. The right channel audio signal is in these examples input to a filter W 1103 with a length of 2T+1 and whose filter parameters are controlled by the LMS processor 1109. The output of the filter is furthermore in these examples passed to the first pre-processing combiner 1105 to generate an estimate of the uncorrelated component N1', which is passed to the second pre-processing combiner 1107 to be subtracted from the delayed left channel audio signal in an attempt to remove the uncorrelated information. The LMS processor 1109 in examples such as these receives both the N1' estimate of the uncorrelated information and the right channel audio signal and chooses the filter parameters such that the correlated information is output to be subtracted at the first pre-processing combiner 1105.
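• A minimal sketch of this adaptive structure follows, assuming a normalised LMS update and illustrative values for the delay T and step size mu (neither value is specified in the text; inputs are assumed to be NumPy arrays):

    import numpy as np

    def lms_preprocess_left(left, right, T=32, mu=0.01):
        # Filter W (length 2T+1) driven by the right channel predicts the
        # component coherent with the delayed left channel; the prediction
        # error is the uncorrelated estimate N1', which is subtracted from
        # the delayed left signal to keep only the correlated part.
        L = 2 * T + 1
        w = np.zeros(L)
        delayed_left = np.concatenate([np.zeros(T + 1), left])[:len(left)]
        cleaned = np.zeros(len(left))
        for n in range(L, len(left)):
            x = right[n - L:n][::-1]              # filter input vector
            s_hat = w @ x                          # coherent-part estimate
            n1 = delayed_left[n] - s_hat           # uncorrelated estimate N1'
            w += mu * n1 * x / (x @ x + 1e-9)      # normalised LMS update
            cleaned[n] = delayed_left[n] - n1      # remove uncorrelated part
        return cleaned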
• In some further examples, where the signals have been diffused significantly and the listener's localisation accuracy is therefore affected, the comparator 608 may use the inter-channel coherence (ICC) as another metric to measure uncorrelatedness or diffusion; if the signals are perfectly correlated then ICC = 1. A separate gain term that is a function of this metric may be further assigned to the sub-band gain in some examples and thus multiplied with the combined signals to prevent signals that are highly diffused from leaking into the extracted centre channel.
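• One common normalised-correlation form of this metric, offered here as an assumption since the text does not give the formula, is:

    import numpy as np

    def icc_gain(left_band, right_band, floor=0.0):
        # Normalised inter-channel coherence of a sub-band pair; equals 1
        # for perfectly correlated signals and falls towards 0 as the pair
        # becomes diffuse. The result can multiply the sub-band gain to
        # gate highly diffused content out of the centre channel.
        num = abs(np.dot(left_band, right_band))
        den = np.sqrt(np.dot(left_band, left_band) *
                      np.dot(right_band, right_band)) + 1e-12
        return max(floor, num / den)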
• Although the above examples describe embodiments of the invention operating within an electronic device 10 or apparatus, it would be appreciated that the invention as described above may be implemented as part of any audio processor. Thus, for example, embodiments of the invention may be implemented in an audio processor which may implement audio processing over fixed or wired communication paths.
  • Thus user equipment may comprise an audio processor such as those described in embodiments of the invention above.
• It shall be appreciated that the terms electronic device and user equipment are intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
  • In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
• The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, and CD.
• Thus at least some embodiments may be a computer-readable medium encoded with instructions that, when executed by a computer, perform: filtering at least two audio signals to generate at least two groups of audio components per audio signal; determining a difference between the at least two audio signals for each group of audio components; and generating a further audio signal by selectively combining the at least two audio signals for each group of audio components dependent on the difference between the at least two audio signals for each group of audio components.
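• Pulling the pieces together, a compact end-to-end sketch of this filter/compare/combine flow might look as follows. The band edges, filter design, choice of the ILD cue and the 6 dB threshold are all illustrative assumptions, and a sampling rate above 10 kHz is assumed so that the top band edge is valid:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def centre_from_stereo(left, right, fs, edges=(0, 500, 1500, 5000)):
        # Band-split both channels, test an interaural difference per band
        # against a threshold, and sum the gain-weighted band averages.
        centre = np.zeros(len(left), dtype=float)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [max(lo, 1), hi], btype="bandpass",
                         fs=fs, output="sos")
            lb, rb = sosfilt(sos, left), sosfilt(sos, right)
            ild = 20 * np.log10((np.std(lb) + 1e-12) / (np.std(rb) + 1e-12))
            gain = 1.0 if abs(ild) < 6.0 else 0.0  # single-threshold example
            centre += gain * 0.5 * (lb + rb)
        return centre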
  • The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • As used in this application, the term 'circuitry' refers to all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and
(b) to combinations of circuits and software (and/or firmware), such as: (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions and
(c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term in this application, including any claims. As a further example, as used in this application, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or similar integrated circuit in server, a cellular network device, or other network device.
  • The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims (13)

  1. A method for processing audio signals comprising:
    filtering at least two audio signals to generate at least two groups of at least two audio components;
    determining at least one interaural difference between respective pairs of the at least two groups, wherein the at least one interaural difference is determined based on at least one audio component of each group of the at least two groups;
comparing the determined at least one interaural difference between the respective pairs of the at least two groups against at least one difference threshold for each respective pair of the at least two groups, wherein the at least one difference threshold is generated from left and right head related transfer functions for a listener;
    determining a gain value for each group of the at least two groups based on the comparing;
    determining a sum of the at least two audio components for each group of the at least two groups;
    determining a product for each group of the at least two groups by multiplying the gain value and the sum determined for each group of the at least two groups; and
    generating a centre audio signal by combining the product determined for each group of the at least two groups.
  2. The method as claimed in claim 1, wherein filtering the at least two audio signals comprises filtering the at least two audio signals into at least one of:
    overlapping frequency range groups;
    abutting frequency range groups;
    linear interval frequency range groups; and
    non-linear interval frequency range groups.
  3. The method as claimed in claim 1 or 2, wherein the at least one interaural difference comprises at least one of:
    interaural level difference value;
    interaural phase difference value; and
    interaural time difference value.
  4. The method as claimed in claim 1, wherein determining the gain value further comprises:
    determining a first gain value for each group of the at least two groups with an interaural difference less than a first difference threshold;
    determining a second gain value for each group of the at least two groups with an interaural difference greater or equal to a first difference threshold of the at least one difference threshold and less than a second difference threshold of the at least one difference threshold;
    determining a third gain value for each group of the at least two groups with an interaural difference greater or equal to the second difference threshold of the at least one difference threshold.
  5. The method as claimed in claim 1 further comprising determining the left and right head related transfer functions for the listener dependent on at least one of:
    a measured head related transfer function;
    a measured head related impulse response;
    a chosen head related transfer function;
    a chosen head related impulse response;
    a modified head related transfer function; and
    a modified head related impulse response.
  6. The method as claimed in claim 1, wherein the at least two audio signals include a left channel audio signal and a right channel audio signal.
  7. A computer program product comprising instructions which when executed by at least one processor cause the processor to perform the method for processing audio signals comprising:
    filtering at least two audio signals to generate at least two groups of at least two audio components;
    determining at least one interaural difference between respective pairs of the at least two groups, wherein the at least one interaural difference is determined based on at least one audio component of each group of the at least two groups;
    comparing the determined at least one interaural difference between respective pairs of the at least two groups against at least one difference threshold for each respective pair of the at least two groups, wherein the at least one difference threshold is generated from a left and right head related transfer function for a listener;
    determining a gain value for each group of the at least two groups based on the comparing;
    determining a sum of the at least two audio components for each group of the at least two groups;
    determining a product for each group of the at least two groups by multiplying the gain value and the sum determined for each group of the at least two groups; and
    generating a centre audio signal by combining the product determined for each group of the at least two groups.
  8. An apparatus for processing audio signals comprising means for:
    filtering at least two audio signals to generate at least two groups of at least two audio components;
determining at least one interaural difference between respective pairs of the at least two groups, wherein the at least one interaural difference is determined based on at least one audio component of each group of the at least two groups;
    comparing the determined at least one interaural difference between respective pairs of the at least two groups against at least one difference threshold for each respective pair of the at least two groups, wherein the at least one difference threshold is generated from a left and right head related transfer function for a listener;
    determining a gain value for each group of the at least two groups based on the comparing;
    determining a sum of the at least two audio components for each group of the at least two groups;
    determining a product for each group of the at least two groups by multiplying the gain value and the sum determined for each group of the at least two groups; and
    generating a centre audio signal by combining the product determined for each group of the at least two groups.
  9. The apparatus as claimed in claim 8, wherein the means for filtering the at least two audio signals is further configured for filtering the at least two audio signals into at least one of:
    overlapping frequency range groups;
    abutting frequency range groups;
    linear interval frequency range groups; and
    non-linear interval frequency range groups.
  10. The apparatus as claimed in claim 8 or 9, wherein the at least one interaural difference comprises at least one of:
    interaural level difference value;
    interaural phase difference value; and
    interaural time difference value.
  11. The apparatus as claimed in claim 8, wherein the means for determining the gain value is further configured for:
    determining a first gain value for each group of the at least two groups with an interaural difference less than a first difference threshold of the at least one difference threshold;
    determining a second gain value for each group of the at least two groups with an interaural difference greater or equal to the first difference threshold of the at least one difference threshold and less than a second difference threshold of the at least one difference threshold;
    determining a third gain value for each group of the at least two groups with an interaural difference greater or equal to the second difference threshold of the at least one difference threshold.
12. The apparatus as claimed in claim 8, further comprising means for determining the left and right head related transfer functions for the listener dependent on at least one of:
    a measured head related transfer function;
    a measured head related impulse response;
    a chosen head related transfer function;
    a chosen head related impulse response;
    a modified head related transfer function; and
    a modified head related impulse response.
  13. The apparatus as claimed in claim 8, wherein the at least two audio signals include a left channel audio signal and a right channel audio signal.
EP10819956.3A 2009-09-30 2010-09-15 Method, computer program and apparatus for processing audio signals Active EP2484127B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2055DE2009 2009-09-30
PCT/FI2010/050709 WO2011039413A1 (en) 2009-09-30 2010-09-15 An apparatus

Publications (3)

Publication Number Publication Date
EP2484127A1 (en) 2012-08-08
EP2484127A4 (en) 2013-06-19
EP2484127B1 (en) 2020-02-12

Family

ID=43825606

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10819956.3A Active EP2484127B1 (en) 2009-09-30 2010-09-15 Method, computer program and apparatus for processing audio signals

Country Status (3)

Country Link
EP (1) EP2484127B1 (en)
CN (1) CN102550048B (en)
WO (1) WO2011039413A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014164361A1 (en) * 2013-03-13 2014-10-09 Dts Llc System and methods for processing stereo audio content
DE102013217367A1 (en) * 2013-05-31 2014-12-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. DEVICE AND METHOD FOR RAUMELECTIVE AUDIO REPRODUCTION
CN104468991A (en) * 2014-11-24 2015-03-25 广东欧珀移动通信有限公司 Mobile terminal and voice frequency receiving and sending method thereof
EP3373595A1 (en) 2017-03-07 2018-09-12 Thomson Licensing Sound rendering with home cinema system and television
EP4236359A3 (en) * 2017-12-13 2023-10-25 Oticon A/s A hearing device and a binaural hearing system comprising a binaural noise reduction system
KR102531634B1 (en) * 2018-08-10 2023-05-11 삼성전자주식회사 Audio apparatus and method of controlling the same
CN108989688B (en) * 2018-09-14 2019-05-31 成都数字天空科技有限公司 Virtual camera anti-fluttering method, device, electronic equipment and readable storage medium storing program for executing
KR102613035B1 (en) * 2022-03-23 2023-12-18 주식회사 알머스 Earphone with sound correction function and recording method using it


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0593128B1 (en) * 1992-10-15 1999-01-07 Koninklijke Philips Electronics N.V. Deriving system for deriving a centre channel signal from a stereophonic audio signal
US6853732B2 (en) * 1994-03-08 2005-02-08 Sonics Associates, Inc. Center channel enhancement of virtual sound images
US7929708B2 (en) 2004-01-12 2011-04-19 Dts, Inc. Audio spatial environment engine
KR101283741B1 (en) * 2004-10-28 2013-07-08 디티에스 워싱턴, 엘엘씨 A method and an audio spatial environment engine for converting from n channel audio system to m channel audio system
JP4479644B2 (en) * 2005-11-02 2010-06-09 ソニー株式会社 Signal processing apparatus and signal processing method
CN101401456B (en) * 2006-03-13 2013-01-02 杜比实验室特许公司 Rendering center channel audio
US8180062B2 (en) * 2007-05-30 2012-05-15 Nokia Corporation Spatial sound zooming

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002078100A (en) * 2000-09-05 2002-03-15 Nippon Telegr & Teleph Corp <Ntt> Method and system for processing stereophonic signal, and recording medium with recorded stereophonic signal processing program
WO2008031611A1 (en) * 2006-09-14 2008-03-20 Lg Electronics Inc. Dialogue enhancement techniques
WO2011116839A1 (en) * 2010-03-26 2011-09-29 Bang & Olufsen A/S Multichannel sound reproduction method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MACPHERSON EWAN A ET AL: "Listener weighting of cues for lateral angle: The duplex theory of sound localization revisited", THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AMERICAN INSTITUTE OF PHYSICS FOR THE ACOUSTICAL SOCIETY OF AMERICA, NEW YORK, NY, US, vol. 111, no. 5, 1 May 2002 (2002-05-01), pages 2219 - 2236, XP012002885, ISSN: 0001-4966, DOI: 10.1121/1.1471898 *
SENGPIEL: "Die Duplex-Theorie von Lord Rayleigh", 30 April 2008 (2008-04-30), pages 1 - 1, XP055099399, Retrieved from the Internet <URL:http://www.sengpielaudio.com/Duplex-Theorie.pdf> [retrieved on 20140130] *

Also Published As

Publication number Publication date
EP2484127A1 (en) 2012-08-08
WO2011039413A1 (en) 2011-04-07
CN102550048B (en) 2015-03-25
EP2484127A4 (en) 2013-06-19
CN102550048A (en) 2012-07-04

Similar Documents

Publication Publication Date Title
EP2484127B1 (en) Method, computer program and apparatus for processing audio signals
KR101567461B1 (en) Apparatus for generating multi-channel sound signal
KR101341523B1 (en) Method to generate multi-channel audio signals from stereo signals
JP6198800B2 (en) Apparatus and method for generating an output signal having at least two output channels
JP5298199B2 (en) Binaural filters for monophonic and loudspeakers
JP6620235B2 (en) Apparatus and method for sound stage expansion
JP6377249B2 (en) Apparatus and method for enhancing an audio signal and sound enhancement system
JP2011501486A (en) Apparatus and method for generating a multi-channel signal including speech signal processing
US9743215B2 (en) Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio
CN112313970B (en) Method and system for enhancing an audio signal having a left input channel and a right input channel
WO2018193163A1 (en) Enhancing loudspeaker playback using a spatial extent processed audio signal
JP4810621B1 (en) Audio signal conversion apparatus, method, program, and recording medium
US9794717B2 (en) Audio signal processing apparatus and audio signal processing method
JP2022536169A (en) Sound field rendering
WO2006126473A1 (en) Sound image localization device
JP2010217268A (en) Low delay signal processor generating signal for both ears enabling perception of direction of sound source
JP2015065551A (en) Voice reproduction system
JP2020039168A (en) Device and method for sound stage extension
AU2012252490A1 (en) Apparatus and method for generating an output signal employing a decomposer

Legal Events

Date       Code  Description
-          PUAI  Public reference made under article 153(3) EPC to a published international application that has entered the european phase (original code 0009012)
20120420   17P   Request for examination filed
-          AK    Designated contracting states (kind code A1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR
-          DAX   Request for extension of the european patent (deleted)
20130517   A4    Supplementary search report drawn up and despatched
20130513   RIC1  Information on IPC codes assigned before grant: H04S 3/00 (AFI), H04S 1/00 (ALI), H04S 5/00 (ALI)
20140213   17Q   First examination report despatched
-          RAP1  Party data changed (applicant data changed or rights of an application transferred): NOKIA CORPORATION, then NOKIA TECHNOLOGIES OY
20171205   INTG  Intention to grant announced (later deleted)
20180525   INTG  Intention to grant announced (later deleted)
20181119   INTG  Intention to grant announced (later deleted)
20190802   INTG  Intention to grant announced
20190814   INTG  Intention to grant announced
20190822   INTG  Intention to grant announced
-          GRAS  Grant fee paid
-          GRAA  (Expected) grant; status: the patent has been granted
-          AK    Designated contracting states (kind code B1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR
-          REG   References to national codes: GB FG4D; CH EP; IE FG4D; AT REF (ref. 1233718, kind T, effective 20200215); DE R096 (ref. 602010063076); LT MG4D; NL MP (effective 20200212); DE R097 (ref. 602010063076); AT MK05 (ref. 1233718, effective 20200212); CH PL; BE MM (effective 20200930)
20200212   PG25  Lapsed in contracting states (no translation filed or fee not paid within the prescribed time-limit): AT, AL, CZ, CY, DK, EE, ES, FI, HR, IT, LT, LV, MC, MK, MT, NL, PL, RO, SE, SI, SK, SM, TR; also BG and NO (20200512), GR (20200513), IS (20200612), PT (20200705)
20201113   26N   No opposition filed within time limit
20200915   GBPC  GB: european patent ceased through non-payment of renewal fee
20200915   PG25  Lapsed in contracting states (non-payment of due fees): GB, IE, LU; also BE, CH, FR, LI (20200930)
20230527   P01   Opt-out of the competence of the unified patent court (UPC) registered
20230802   PGFP  Annual fee paid to national office: DE (year of fee payment: 14)