GB2542579A - Spatial audio generator

Info

Publication number
GB2542579A
Authority
GB
United Kingdom
Prior art keywords
modifier
plane
audio
sound source
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1516800.8A
Other versions
GB201516800D0 (en)
Inventor
Gregory Stanier James
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB1516800.8A
Publication of GB201516800D0
Publication of GB2542579A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

A spatial audio generator (26) is disclosed that provides improvements in efficiency compared to conventional sound engines using HRTFs. The spatial audio generator receives an audio signal 22a for a sound source 10a and generates an audio output (28a-28n) corresponding to an observer location (20) spaced from the sound source. The observer location is defined by a variable in each of two orthogonal planes relative to the sound source. The spatial audio generator has: a distance modifier 30 for altering the received audio signal according to the distance between the sound source and the observer location; a first plane modifier 40 for altering the received audio signal according to the variable value in an azimuth plane; and a second plane modifier 32 for altering the received audio signal according to the variable value in a vertical plane. The modifiers each operate on the received audio signal in series, e.g. in a predetermined order, with the output of the final modifier providing a representation of the received audio signal observed at the observer location. A computationally inexpensive three-dimensional audio generator can be provided which can amass audio outputs from multiple stages, easily representing multiple different sound sources in the environment.

Description

Spatial Audio Generator
The present invention relates to an acoustic engine for generating audio output capable of representing the relative locations of a sound source and an observer, e.g. in three dimensions.
Humans have an excellent appreciation of auditory space and are able to rapidly and accurately locate sources of sound: the auditory system identifies the location of a sound, and the head then turns towards that location for visual analysis. The human auditory system is also a sophisticated spatial sensor, allowing humans to monitor the position of sound sources in three dimensions, with possible identification of objects.
Identification of the location of a source of sound in the horizontal plane is achieved primarily by detecting interaural phase difference and interaural level difference. For a sound source positioned directly in front of a listener, the signals reaching the left and right ears are almost identical. However, if a sound source is located left or right from the centre of the two ears, the sound signals reaching the left and right ears will differ in level and signals will reach one ear sooner than the other ear. If a sound source is located at the right of a listener’s head, sound signals will reach the listener’s right ear before reaching the left ear, and the listener is capable of identifying this interaural phase difference, and locating the sound source at the right of the listener’s head. Additionally, signals from a sound source located to the right of a listener’s head will reach the right ear at a higher level, or amplitude, than the left ear, due to attenuation as the sound signals travel through space. The listener is capable of identifying interaural level difference, and is able to locate the sound as coming from the right of the listener’s head.
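For illustration only (this is the classic Woodworth spherical-head approximation from the psychoacoustics literature, not part of the patent's method), the interaural time difference for a given azimuth can be estimated as follows; the head radius of 8.75 cm and speed of sound of 343 m/s are assumed values.

```python
import math

def woodworth_itd(azimuth_deg, a=0.0875, c=343.0):
    """Approximate ITD in seconds for a source at `azimuth_deg` (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))

print(f"{woodworth_itd(90) * 1e6:.0f} us")  # ~656 us for a source hard right
```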
Identification of the location of a source of sound in the vertical plane is achieved using spatial cues created by the external pinna. The pinna causes reflections of sound signals, thereby creating delayed signals, which produces a comb-filtered spectrum of peaks and troughs. The more the ear cavities are filled, i.e. the more sound signal enters the ear, the greater the error in locating the sound source.
It is desirable to generate sounds from loud-speakers or headphones which appear to the listener to be coming from a point in three-dimensional space. This is particularly desirable for use in simulated environments or augmented reality environments, where it is desirable to provide context to the sounds played to the user/listener. It is a particular technical challenge to provide for sound signals from multiple sources in three-dimensional space.
The Head-Related Transfer Function (HRTF) is the conventional model of a human ear and/or head that is used to process incoming sounds in order that they appear to be originating from a point in three-dimensional space when fed over headphones. As the signals received by the ears do not change linearly as the source of sound is moved around the head, a complex HRTF is usually required in order to accurately model the sound. This requires a large amount of processing, and as such typically only a few sound sources can be spatially positioned in real time.
It is an object of the invention to provide a spatial audio generator that provides improvements in efficiency over conventional sound engines that use HRTFs.
According to the present invention, there is provided a spatial audio generator arranged to receive an audio signal for a sound source and to generate an audio output corresponding to an observer location spaced from said sound source, the observer location being defined by a variable in each of two orthogonal planes relative to the sound source, the audio generator comprising: a distance modifier for altering the received signal according to the distance between the sound source and the observer location; a first plane modifier for altering the received signal according to the variable value in a first of the orthogonal planes; and, a second plane modifier for altering the received signal according to the variable value in a second of the orthogonal planes, wherein the distance modifier, the first plane modifier and the second plane modifier each operate on the received signal in series with an output of one modifier being fed as an input to the next modifier and the output of the final modifier of the series providing a representation of the received signal observed at the observer location.
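By way of a minimal sketch only, the series operation of the three modifiers might look as follows in Python (the language, function names, 44.1 kHz sample rate and the deliberately simplistic curves are all assumptions of this example, not part of the patent; fuller treatments of each stage appear in the description below).

```python
import numpy as np

FS = 44100   # assumed sample rate
C = 346.0    # speed of sound in m/s, the figure used later in the description

def distance_modifier(x, d):
    """Mono in, mono out: inverse-square attenuation plus propagation delay."""
    lag = int(round(d / C * FS))
    # max(d, 1.0) avoids amplification for sources closer than 1 m (an assumption)
    return np.concatenate([np.zeros(lag), x]) / max(d, 1.0) ** 2

def vertical_modifier(x, delay_us, k=0.7):
    """Mono in, mono out: delay-and-add (comb filter) carrying the elevation cue."""
    n = int(round(delay_us * 1e-6 * FS))
    y = np.concatenate([x, np.zeros(n)])  # room for the delayed tail
    y[n:] += k * x                        # add the attenuated, delayed copy
    return y

def horizontal_modifier(x, left_gain, right_gain):
    """Mono in, stereo out: interaural level difference only."""
    return left_gain * x, right_gain * x

# Series operation: each modifier's output feeds the next; the final
# modifier yields the stereo representation at the observer location.
mono = np.random.randn(FS)                  # 1 s of test signal
x = distance_modifier(mono, d=5.0)
x = vertical_modifier(x, delay_us=300.0)
left, right = horizontal_modifier(x, 1.0, 0.4)
```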
One of the first and second plane modifiers may be arranged to generate from a single input channel an output comprising a plurality of audio channels, e.g. a stereo output. Said plane modifier may be the final modifier in the series. The other plane modifier and the distance modifier may output a single channel output for the or each input channel. All other modifiers may output a single channel output for the or each input channel thereto.
The received signal may be a monaural audio signal. In other examples, the received signal may comprise a plurality of channels, e.g. a stereo input.
The observer location may comprise location/coordinate data and a direction, e.g. corresponding to a direction in which the observer is facing.
The first plane modifier may be an azimuth/horizontal plane modifier. The first plane modifier may duplicate the received signal, thereby providing a left signal and a right signal, and attenuate the left and right signals separately and/or by differing amounts based on the variable value in the first plane. In the event that the first plane modifier receives a plurality of channels, e.g. already comprising left and right signals, the received signal may not be duplicated and instead the first plane modifier may attenuate the existing left and right signals based on the variable value in the first plane.
The variable in the first plane may comprise an angle or lateral dimension relative to the direction in which the observer is facing. The first plane modifier may apply only relative left and right signal attenuation, e.g. without any additional audio signal processing.
The second plane modifier may be an elevation/vertical plane modifier. The second plane modifier may determine a delay to the received signal. The delay may be calculated according to the variable value in the second plane. The variable in the second plane may comprise an elevation angle or vertical dimension relative to the direction in which the observer is facing, e.g. horizontal.
The delay may be calculated according to a linear or non-linear function of the variable value in the second plane. The delay may decrease with increasing value of the variable in the second plane, e.g. elevation angle or height. A maximum and/or minimum threshold delay may be implemented (e.g. predetermined cut-off value(s)) such that the delay applied by the modifier cannot lie beyond said threshold value(s). The minimum threshold may or may not be zero. The maximum threshold may be between 400 µs and 600 or 700 µs, e.g. around 500 µs.
The second plane modifier may apply a delay-and-add function to the received signal. The delay may be added to the received signal to form a combined signal, which is output by the second plane modifier.
The second plane modifier may attenuate the delayed, received and/or combined signal. A fixed or variable attenuation may be used, e.g. according to the bandwidth of the received signal.
The audio generator may be arranged to receive an audio signal, e.g. a monaural/stereo signal, for a plurality of sound sources, e.g. wherein the received signal and/or location for each sound source is different. Each of the plurality of sound sources is typically spaced from the observer location.
For a plurality of sound sources, the received monaural signal for each sound source may be processed in parallel by the audio generator, e.g. with the distance modifier, first plane modifier and second plane modifier being applied in series for each sound source but in parallel across the different sound sources.
The distance modifier may attenuate the received signal. The distance modifier may or may not apply a time delay to the received signal, e.g. delaying the timing of playback to a user. The distance modifier may perform only signal attenuation and/or delay.
The distance modifier may attenuate the received signal as a function of the distance between the sound source and the observer, e.g. in two-dimensional or three-dimensional space. An inverse or inverse square function may or may not be used. The distance modifier may apply a delay to the received signal as a linear function of the distance between the sound source and observer location.
One or more further modifier may be used. The further modifier may comprise a directional modifier, e.g. according to whether the observer is facing towards or away from the sound source. The directional modifier may comprise an audio signal filter, e.g. a low-pass filter. The directional modifier may consist of a filter.
The audio generator may comprise a discretization controller or sampler arranged to process the monaural signal for the, or each, sound source. The discretization controller may sample the monaural signal prior to application of the modifiers.
The discretization controller may sample the representation of the received signal after processing by any, or all, of the modifiers, e.g. prior to output by the audio generator and/or playback to the user.
According to a second aspect of the invention, there is provided an audio playback system comprising the audio generator of the first aspect and a spatial modeller, the spatial modeller arranged to log the location of one or more sound source relative to the observer location and to output the relative location data to the audio generator.
According to a third aspect of the invention, there is provided a method of generating a spatial audio signal representing an observed audio signal at an observer location for a sound source spaced therefrom, the method comprising receiving the observer location defined by a variable in each of two orthogonal planes relative to the sound source, and processing a received audio signal for the sound source by applying the following audio modification processes in any series order: a distance modification by altering the received signal according to the distance between the sound source and the observer location; a first plane modification by altering the received signal according to the variable value in a first of the orthogonal planes; and, a second plane modification by altering the received signal according to the variable value in a second of the orthogonal planes, wherein the output of a preceding modification process is fed as an input into the ensuing modification process.
According to a fourth aspect of the invention, there is provided a data carrier comprising machine readable instructions for the operation of one or more computer processor to operate the method of the third aspect.
According to a further aspect of the invention there is provided a method for producing spatially adjusted sound signals, comprising applying a time delay and attenuation to a signal from a sound source based on a vertical variable, duplicating the signal, thereby providing a left signal and a right signal, and attenuating the left and right signals by differing amounts based on a horizontal variable, and outputting the signals for the left and right ear.
The invention offers a highly computationally efficient system for generating audio output for a user that contains a representation of the relative positioning of the sound source and user/observer location. This allows multiple sound sources to be modelled with less computational resource than in the prior art. Therefore, whilst the processing technique used by the present invention may not be as accurate for an individual sound source as previous implementations of HRTF, the invention may beneficially allow spatial context to be added to a sound source in circumstances in which it was previously considered to be computationally too expensive. Additionally or alternatively, the invention may allow more sound sources to be modelled using a finite computational resource, thereby allowing a richer audio experience by virtue of the number of sound sources.
Any of the optional features defined above in relation to any one aspect of the invention may be applied to any further aspect of the invention wherever practicable.
Working embodiments of the invention are described in further detail below with reference to the accompanying drawings, of which:
Fig. 1 shows a modelled environment comprising a sound source and listener;
Fig. 2 shows an overview of an audio generation system according to an example of the invention; and,
Fig. 3 shows an example of the steps used in an example of spatial audio processing according to the invention.
Turning to Fig. 1 there is shown a sound source 10 located within a region defined by a Cartesian coordinate system having conventional orthogonal axes consisting of a vertical axis 12, a first horizontal axis 14, and a second horizontal axis 16. The sound source 10 is shown as being located at the point 0, 0, 0 in Fig. 1 but could be defined at any location within the coordinate system in a conventional manner.
An observer 18 defined at a different point in space, at location 20, would thus perceive a sound 22 emanating from the source 10 differently to a listener located elsewhere. Examples of the invention described below provide methods of processing the audio signal representative of the sound 22 such that it represents the sound that would be heard by a listener at a specified location relative to the sound source 10. The embodiments described below concern a modelled environment in which the relative locations are defined in terms of coordinate data and/or other corresponding spatial location data.
In Fig. 1 the distance ‘d’ between the observer location 20 and the sound source 10 can be determined in three dimensional space from the coordinate data using standard trigonometry. Thus the spacing between the sound source 10 and location 20 can be defined in terms of a vector having a magnitude, d, and a direction defined in terms of an angular orientation. The angular orientation may be defined in terms of an angle in the horizontal plane comprising axes 14 and 16 and an angle in a vertical plane comprising axes 12 and 14. The angles may be defined relative to an axis in the relevant plane, e.g. axis 14, or else a direction of the emanated sound, e.g. in the example of a directional sound source, or a direction 24 in which the observer 18 is facing.
Fig. 1 represents a simplified example in which a single sound source 10 is present, but Fig. 2 shows a system in which multiple sound sources 10a, 10b, 10c, 10d to 10n are provided. Each sound source may emanate a different sound and/or be positioned at a different location to any, any combination, or all of the other sound sources.
Such spatial models may exist in mathematically/geometrically defined environments, i.e. simulated environments, on suitable conventional computing equipment and may be presented to a user via a screen, projector or the like.
Such a simulated environment may comprise a computational model of a real environment, or an entirely fictional environment, e.g. for training and/or entertainment purposes. In addition to wholly simulated environments, the invention may be implemented in a real environment, e.g. in which it is desirable for a user to receive computer generated audio cues, or an augmented-reality environment, in which simulated audio-visual cues are provided to a user within the context of a physical environment.
In Fig. 2, each sound source 10 has associated therewith a corresponding sound 22a-22n defined as an analogue or digital audio signal. The length of the signal may vary according to whether the sound is a single sound, repeating sound or ongoing stream of varying sound, e.g. such as a stream of music, conversation, machine operation or the like.
The audio signal corresponding to each sound 22a-n is processed as will be described below by the spatial audio generator 26 before being output as a modified audio signal 28a-n which can be played to a user, e.g. over headphones or using another suitable stereo or surround sound speaker system so as to give the user the impression that they are located at the observer location 20 relative to any sound sources within the environment.
The audio generator or sound engine 26 comprises one or more computer processor running machine code provided as a plurality of functional modules as will be described below. Whilst the use of modules to partition individual processing stages of the audio signal modification process is in many ways beneficial, it is not essential, as will be understood by the skilled person, provided the stages can be executed in a series order for a single sound source. The modules are described below in the order, from left to right, in which they are executed in the example of Fig. 3, although the order could be altered if desired.
Distance module
The distance ‘d’ described above in relation to Fig. 1 is made available to the audio generator 26 either from the computer model of the simulated space or else is calculated by the audio generator 26 from coordinate, or other spatial, data made available. The value of d will typically differ for each sound source relative to the observer location, and so different values are shown in Fig. 3 as Da, Db ... Dn as necessary.
For 2D visual applications, the distance can be fixed at a predetermined level, by applying a fixed attenuation and distance (typically 0 for both). For 3D visual applications that simulate a 3D world, the distance from the sound source to the user can usually be determined via variables within the model. An approximation of the distance, e.g. within a 5% or 10% threshold, may be used if necessary in the event that the precise value is not attainable.
The distance between the sound source 10a and observer is fed into the distance module 30, which then modifies the original audio signal 22a so as to provide an audio representation of the space between the source and observation location 20.
An attenuation is applied to the sound source 22a by module 30. The level of attenuation typically increases with distance from the source 10a. The amplitude of the sound may be attenuated according to the inverse of the square of its distance (i.e. 1/d²) from the source. A delay (or digital offset) from the sound source 10 to the listener can also be applied (e.g. using a speed of sound of 346 m/s). This might be particularly useful in circumstances where a single sound effect is played by multiple sound sources at the same time, each varying in speed and distance from the others. The delay may or may not be required depending on the circumstances of the environment or the sound source therein being modelled.
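As a worked example under the figures just stated (inverse-square attenuation, a speed of sound of 346 m/s and, as an assumption of this example, 44.1 kHz sampling), the distance module's numbers for an arbitrary 10 m spacing come out as follows.

```python
fs = 44100
d = 10.0                             # metres between source 10a and observer (example value)
gain = 1.0 / d ** 2                  # inverse-square attenuation: 0.01, i.e. -40 dB re 1 m
delay_s = d / 346.0                  # propagation delay: ~28.9 ms
delay_samples = round(delay_s * fs)  # ~1275 samples of digital offset
print(gain, delay_s, delay_samples)
```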
Vertical module
In this example, the vertical module 32 modifies the received audio signal 22a, after distance module 30, according to the vertical offset between the source 10a and listener location 20. This is input in this example as an angular value in the vertical plane, i.e. the angle subtended between the line ‘d’ in Fig. 1 (i.e. the direction from the source to the observer location) and a horizontal axis or plane.
The angle of elevation may take a value between +90° and -90° in this example.
The vertical module 32 takes the form of a delay-and-add operation as shown in Fig. 3 in which the input signal is processed and then added to the original signal at step 34 so as to produce a combined output signal of the module. For a single input channel (i.e. a monaural input), the output of module 32 will also be a single channel. The processing performed in module 32 comprises implementation of a time delay at 36 and signal attenuation at 38.
For 2D applications, the position of the sound source representing an object in a user display can be used to determine its vertical angle, or elevation. The angle can then be fed into an equation that determines the delay to be applied for the vertical effect. A suitable range has been found to be between 500 µs and 200 µs for angles below the centre, and between 200 µs and 0 µs for angles above the centre position. The centre may be horizontal or else the direction in which the listener is facing, depending on different examples of the invention. This could be a linear decrease in delay for increasing vertical angle, or a non-linear change if one is preferred.
For 3D applications, the vertical angle subtended between the sound source and the virtual listener can be calculated from variables within the model, and passed to the same equation used for the 2D model to determine the delay of the delay-and-add model to be applied.
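A minimal sketch of one such equation, assuming the piecewise-linear ranges quoted above (500-200 µs below the centre, 200-0 µs above it), an elevation expressed in degrees and clamped to ±90°, and the centre taken as 0° elevation:

```python
import numpy as np

def elevation_delay_us(elev_deg):
    """Map elevation angle to delay in microseconds (assumed piecewise-linear curve)."""
    e = float(np.clip(elev_deg, -90.0, 90.0))
    if e < 0.0:
        # -90 deg -> 500 us, 0 deg -> 200 us
        return 200.0 + (-e / 90.0) * 300.0
    # 0 deg -> 200 us, +90 deg -> 0 us
    return 200.0 - (e / 90.0) * 200.0

assert elevation_delay_us(-90) == 500.0
assert elevation_delay_us(0) == 200.0
assert elevation_delay_us(90) == 0.0
```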
The equation used to determine the delay from the vertically subtended angle could also be presented as a graph for the user to manipulate; i.e. it could be steeper for lower angles, and shallower for higher angles, for example in the form of an exponential decay or other suitable function. Alternatively the user may specify a desired profile that is not dictated by a mathematical function.
Maximum and minimum limits may be applied to the amount of signal modification available so that the effect does not change beyond a predefined point. It may also be useful to allow for limits beyond the extent of a visual display of the environment, e.g. so that a sound could be perceived to be coming down from above or up from below, e.g. off-screen, before it eventually comes into view on the screen.
The attenuation factor should be a multiplication of less than 1; otherwise the effect can become too strong and sound somewhat synthetic. A good factor to use for a sound with a reasonable bandwidth is around 0.7.
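For context (standard comb-filter analysis, not taken from the patent text): a delay-and-add with delay τ and gain k < 1 places spectral dips at odd multiples of 1/(2τ), so sweeping the delay through the range quoted above moves the first dip across the audible band, which is what makes the elevation cue perceivable.

```python
# Where the first comb-filter dip lands for delays in the quoted range.
for tau_us in (500.0, 200.0, 100.0):
    first_dip_hz = 1.0 / (2.0 * tau_us * 1e-6)
    print(f"{tau_us:.0f} us delay -> first dip at {first_dip_hz:.0f} Hz")
# 500 us -> 1000 Hz, 200 us -> 2500 Hz, 100 us -> 5000 Hz
```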
Horizontal module
The horizontal module 40 splits the received monaural signal into two channels, i.e. stereo channels 42 and 44. If a stereo signal input is provided to module 40, it will instead process the two received channels.
For 2D applications, the position of the sound source represented in a visual display can be used to determine its horizontal angle, or azimuth. This could be a model that provides an interaural level difference between the left and right channels. This means that when the sound is due to be perceived as coming from the left side of the screen, the left channel should deliver the un-attenuated sound, but the right channel should attenuate the sound by a certain amount so that its position is perceived as coming from the left. The reverse is true for a sound to be perceived as coming from the right hand side. There should therefore be no attenuation of either channel if the sound is to be perceived as coming from the dead centre, e.g. the direction 24 in which the listener is facing.
No equation is given here to determine the left and right attenuation factors depending on azimuth, as the ideal equation is a question of design choice and optimisation to suit personal taste, although it has been found that a good fit is the shape of a sine function. However, this can provide absolute attenuation at the farthest angles from the centre, which might not be what the designer desires. Accordingly, in other examples a portion of the shape of the sine function may be discarded. A similar approach can be used for determining the delays between the left and right channels for the interaural phase difference model. Experimentation has shown, however, that their effect is only apparent for low-frequency sounds; and since this invention works better using a broad bandwidth of sound frequencies, the IPD technique could be omitted to save on CPU/GPU processing.
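One possible sine-shaped ILD curve is sketched below; the exact shape, and the 0.2 floor used here to discard the absolute-attenuation portion at the extreme angles, are design-choice assumptions rather than values from the patent.

```python
import math

def ild_gains(azimuth_deg, floor=0.2):
    """Left/right gains for azimuth in [-90, +90] degrees (negative = left)."""
    a = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    g = max(floor, 1.0 - math.sin(abs(a)))  # sine-shaped roll-off, floored
    if a < 0:
        return 1.0, g   # source to the left: attenuate the right channel
    return g, 1.0       # source to the right (or centre): attenuate the left

print(ild_gains(0))     # (1.0, 1.0): no attenuation dead centre
print(ild_gains(-90))   # (1.0, 0.2): hard left, far channel floored at 0.2
```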
The ILD/IPD equations used to determine the panning or phase from the horizontally subtended angle could also be presented as a graph for the user to manipulate; i.e. it could be steeper for lower angles, and shallower for higher angles. There could also be maximum and minimum limits applied so that the effect does not change past a certain point. It may also be useful to increase the graph limits off-screen so that a sound could be perceived to be coming from the left or the right before it eventually comes into view on the screen.
Summation with multiple stages
If there is more than one stage or sound source to be positioned in 3D space, the left and right channels of each stage should be attenuated if necessary to prevent clipping, and added to the corresponding outputs of the other stages, as shown at 46, before being fed to the output.
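A minimal sketch of the summation, assuming each stage yields an equal-length (left, right) pair of numpy arrays and using peak normalisation as one possible way of attenuating to prevent clipping:

```python
import numpy as np

def mix_stages(stages):
    """`stages` is a non-empty list of (left, right) numpy array pairs of equal length."""
    left = sum(l for l, _ in stages)
    right = sum(r for _, r in stages)
    peak = max(np.abs(left).max(), np.abs(right).max(), 1.0)
    return left / peak, right / peak  # attenuate only if the sum would clip
```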
The number of stages that can be added is merely determined by the spare capacity of the available CPUs and/or GPUs. This means that if an application is using these units heavily (such as a 3D virtual world), there may not be room for many stages, but if their use is light, many stages could be used to position various sound sources at various positions at various stages in time. The invention has been found to be particularly useful in allowing a larger number of stages to be processed, thereby providing a richer audio experience, despite the fact that each stage may use simpler processing than in prior art HRTF examples. Prior art examples running on conventional computing equipment, such as desktop PCs or equivalent devices, tend to limit the number of stages that can be processed alongside other applications to around ten or fewer.
According to examples of the invention, the number of stages that can be processed may be greater than 10, 15 or 20 and may be in the hundreds if necessary, e.g. up to a thousand stages. In more practical examples, it has been found that processing of 10 to 100 or 200 stages, or an upper limit in the region of 30 or 50 to 100 stages, can provide beneficial results.
In different examples of the invention, the use of a significant number of sound sources can be beneficial, for example in modelling large room reflections, such as in a cathedral, where there are potentially hundreds, if not thousands, of different sound reflections, delays and directions over the life of a sound's journey from source to listener through the environment. Each reflection/echo could be modelled as a sound source if necessary.
For digital signal processing applications, a buffer may be used that is written to momentarily ahead of playback, with any overspill of the buffer arising from the distance, horizontal and vertical processing delays being accommodated in subsequent buffer writes.
One advantage of using a digital platform to process the sound is that a delay can be introduced very easily, simply by incrementing the position of the sample to read. However, multiplication through attenuation or amplification is more processor intensive than in the analogue domain, although this is comfortably handled by today's high-speed processors. And since the number of multiplications is orders of magnitude lower when using this invention compared with a typical HRTF, this simplified approach should be all the more attractive in today's market and applications.
Optional Direction Module
If the sound source is to be perceived as coming from behind the user, a low pass filter can be introduced into the path of the sound. This is because the ears are actually facing away from the sound source, blocking the direct path of the sound. A filter is therefore required to attenuate the higher frequencies, since they are more reluctant to bend around the ears to the ear canal than lower frequency sounds.
The simple use of a low-pass filter or not can thus provide an auditory cue to a listener whether the sound is in front or behind the user. Such a front-back module, whilst simplistic in isolation, has been found to provide good spatial information when used in combination with the other modules described herein. The front-back module is not required if only forward facing sound modelling is required and is not shown in Fig. 3 but can be included at any suitable point in the process of Fig. 3 between the incoming signal 22a and the addition of the different stages before output to the user.
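A sketch of such a front-back cue is given below, assuming a one-pole low-pass topology and an illustrative cut-off of around 5 kHz; the patent specifies only "a low pass filter", so both choices are assumptions of this example.

```python
import numpy as np
from scipy.signal import lfilter

def rear_lowpass(x, fs=44100, cutoff_hz=5000.0):
    """One-pole low-pass: y[n] = a*x[n] + (1-a)*y[n-1], applied to rear sources."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
    return lfilter([a], [1.0, a - 1.0], x)
```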
Sound Source Considerations
The sound source 10 can be any pre-recorded or live sound, but it is recommended that its bandwidth be as wide as possible (i.e. up to 20 kHz) so that the processing according to the invention, particularly the vertical effect, is perceivable to the user. It is also beneficial if the sound contains both relatively high and low frequencies, as it will produce a more effective 3D sound, although this is not essential for all sound sources. If the sound is digital, it is recommended that it be sampled at 44.1 kHz or above.
If the sound source is digitally sampled at the minimum recommended rate, there will be a resolution of approximately 10 degrees in the vertical and horizontal planes. The down-side of this reduced resolution is that it could produce definite step changes in the sound for changes in direction, especially for sounds with very high bandwidths. To prevent this, oversampling can be performed before entering the 3D sound processing stages, and then under-sampling before playback. However this oversampling would increase the need for CPU/GPU time, so a trade-off would be needed between resolution, number of stages and processing time in order to find the best-fit solution for a given application. Note also that the more load placed on the processor(s), the more latency is incurred, including the introduction of unwanted pops and clicks.
In order to sustain the quality of the signals during the vertical processing, it is recommended that the original signal be sampled at a high frequency, typically 192 kHz or above. If this cannot be achieved, some form of linear or other intelligent interpolation may be used to reduce the introduction of any distortion. Linear interpolation implies oversampling: instead of making the amplitudes of the additional samples the same as the previous ones, a straight line between the first and second original samples may be drawn, with the oversamples reaching up to, or down to, this line. For intelligent interpolation, the gradients of the lines through the first and second original samples may be assessed, and a fitting curve between the two points may be drawn. The amplitudes of the oversamples are then extended or reduced to reach this calculated curve.
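A sketch of the linear-interpolation variant, assuming numpy; np.interp draws exactly the straight line between consecutive original samples described above (the oversampling factor of 4 is an arbitrary example).

```python
import numpy as np

def oversample_linear(x, factor=4):
    """Linearly interpolate `factor - 1` new samples between each original pair."""
    n = len(x)
    old_t = np.arange(n)
    new_t = np.arange((n - 1) * factor + 1) / factor
    return np.interp(new_t, old_t, x)

x = np.array([0.0, 1.0, 0.0])
print(oversample_linear(x, 2))  # [0.  0.5 1.  0.5 0. ]
```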
The oversampling, e.g. including interpolation, prior to operation of one or more of the modules described above and/or under-sampling for playback may contribute to any aspect of the invention.
Audio Output
The amassed audio outputs of the different stages, representing different sound sources in the environment, may be listened to by the end user over speakers, such as headphones 46. Reverberation effects come into play when the sounds are delivered over loudspeakers, which tend to pull the user's localisation away from that intended and towards the position of the loudspeakers.
The invention has low processing requirements, making it potentially implementable using portable computing devices such as mobile phones, tablets and the like. Virtual reality and augmented reality headset technology is also becoming more widely considered, where this invention could usefully be applied. However, if the signals are strong and loud enough, the listener should be able to overcome the reverberation effects described above and have a pleasurable listening experience over loudspeakers. Loudspeakers can be effective for end user playback within a closed room or other controlled environment with aptly positioned speakers.
The order of the modules described above is not fixed, as long as they are sequential. In addition, it should be noted that if the horizontal panning module 40 is put first, the distance attenuation and delay module 30 and the vertical delay-and-add module 32 will need to be doubled up for each channel (i.e. left and right).

Claims (21)

Claims:
1. A spatial audio generator arranged to receive an audio signal for a sound source and to generate an audio output corresponding to an observer location spaced from said sound source, the observer location being defined by a variable in each of two orthogonal planes relative to the sound source, the audio generator comprising: a distance modifier for altering the received signal according to the distance between the sound source and the observer location; a first plane modifier for altering the received signal according to the variable value in a first of the orthogonal planes; and, a second plane modifier for altering the received signal according to the variable value in a second of the orthogonal planes, wherein the distance modifier, the first plane modifier and the second plane modifier each operate on the received signal in series with an output of one modifier being fed as an input to the next modifier and the output of the final modifier of the series providing a representation of the received signal observed at the observer location.
2. An audio generator according to claim 1, wherein the first plane modifier is an azimuth plane modifier arranged to process a plurality of channels for the received signal comprising a left signal and a right signal, by attenuating the left and right signals separately and/or by differing amounts based on the variable value in the first plane.
3. An audio generator according to claim 2, wherein the first plane modifier duplicates the received audio signal in the event that the received audio signal is a monaural signal so as to create the left and right signals.
4. An audio generator according to any preceding claim, wherein the observer location comprises a direction in which the observer location is facing and the variable in the first plane comprises an angle or lateral dimension relative to the direction in which the observer is facing.
5. An audio generator according to any preceding claim, wherein the first plane modifier applies only relative left and right signal attenuation to a received audio signal.
6. An audio generator according to any preceding claim, wherein the second plane modifier is an elevation modifier arranged to determine a delay to the received signal according to the variable value in the second plane.
7. An audio generator according to claim 6, wherein the variable in the second plane comprises an elevation angle or vertical dimension relative to the direction in which the observer is facing in a vertical plane.
8. An audio generator according to claim 6 or 7, wherein the second plane modifier applies a maximum and/or minimum threshold delay such that the delay applied by the modifier cannot lie beyond said maximum and/or minimum threshold.
9. An audio generator according to any one of claims 6-8, wherein the second plane modifier applies a delay-and-add function to the received signal.
10. An audio generator according to any one of claims 6-9, wherein the second plane modifier attenuates the delayed signal.
11. An audio generator according to any preceding claim, wherein the audio generator may be arranged to receive a plurality of audio signals for a plurality of sound sources, wherein the received signal and/or location for each sound source is different, and the audio generator processes the received signal for each sound source in parallel, the representations of the received signals for the plurality of sound sources being amassed by the audio generator prior to audio output.
12. An audio generator according to any preceding claim, wherein the distance modifier attenuates the received signal as a function of a distance between the sound source and the observer location.
13. An audio generator according to any preceding claim, wherein the distance modifier performs only signal attenuation and/or delay on the received signal.
14. An audio generator according to any preceding claim, wherein the observer location comprises a distance from a sound source and an observer direction.
15. An audio generator according to any preceding claim, wherein the first plane modifier generates from a received monaural signal an output comprising a plurality of audio channels, and the second plane modifier is the final modifier in the series.
16. An audio generator according to any preceding claim, further comprising a directional modifier, having a low-pass filter.
17. An audio generator according to any preceding claim, further comprising a discretization controller arranged to sample the received signal prior to application of the modifiers and/or after processing by any, or all, of the modifiers.
18. An audio playback system comprising the audio generator according to any preceding claim and a spatial modeller, the spatial modeller arranged to determine the location of one or more sound source relative to the observer location and to output location data to the audio generator.
19. A method of generating a spatial audio signal representing an observed audio signal at an observer location for a sound source spaced therefrom, the method comprising receiving the observer location defined by a variable in each of two orthogonal planes relative to the sound source, and processing a received audio signal for the sound source by applying the following audio modification processes in any series order: a distance modification by altering the received signal according to the distance between the sound source and the observer location; a first plane modification by altering the received signal according to the variable value in a first of the orthogonal planes; and, a second plane modification by altering the received signal according to the variable value in a second of the orthogonal planes, wherein the output of a preceding modification process is fed as an input into the ensuing modification process.
20. A data carrier comprising machine readable instructions for the operation of one or more computer processor to operate the method of claim 19.
21. A spatial audio generator substantially as hereinbefore described with reference to the accompanying figures.
GB1516800.8A 2015-09-22 2015-09-22 Spatial audio generator Withdrawn GB2542579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1516800.8A GB2542579A (en) 2015-09-22 2015-09-22 Spatial audio generator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1516800.8A GB2542579A (en) 2015-09-22 2015-09-22 Spatial audio generator

Publications (2)

Publication Number Publication Date
GB201516800D0 GB201516800D0 (en) 2015-11-04
GB2542579A 2017-03-29

Family

ID=54544645

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1516800.8A Withdrawn GB2542579A (en) 2015-09-22 2015-09-22 Spatial audio generator

Country Status (1)

Country Link
GB (1) GB2542579A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1188998A (en) * 1997-09-02 1999-03-30 Roland Corp Three-dimension sound image effect system
US20070269063A1 (en) * 2006-05-17 2007-11-22 Creative Technology Ltd Spatial audio coding based on universal spatial cues
US20080095077A1 (en) * 2006-10-24 2008-04-24 Cisco Technology, Inc. Telephony user interface to specify spatial audio direction and gain levels

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2572761A (en) * 2018-04-09 2019-10-16 Nokia Technologies Oy Quantization of spatial audio parameters
US11475904B2 (en) 2018-04-09 2022-10-18 Nokia Technologies Oy Quantization of spatial audio parameters
US11757610B1 (en) * 2022-04-18 2023-09-12 Nxp B.V. Low phase noise clock recovery over a data connection
EP4379506A1 (en) * 2022-11-30 2024-06-05 Nokia Technologies Oy Audio zooming

Also Published As

Publication number Publication date
GB201516800D0 (en) 2015-11-04

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)