EP0990369B1

EP0990369B1 - Sound reproduction system

Info

Publication number: EP0990369B1
Application number: EP98924440A
Authority: EP
Inventors: Michael Peter Hollier; Kelvin Chee Kin Foo; Malcolm Omar Hawksford (University of Essex)
Original assignee: British Telecommunications PLC
Current assignee: British Telecommunications PLC
Priority date: 1997-06-19
Filing date: 1998-05-27
Publication date: 2003-07-09
Anticipated expiration: 2018-05-27
Also published as: JP2002505057A; WO1998058522A2; WO1998058522A3; AU7664498A; AU735233B2; EP0990369A2; DE69816298D1; DE69816298T2

Description

This invention relates to sound reproduction systems, and in particular to an improved system for binaural synthesis, that is the generation of sound signals such that the pressures at a user's ears correspond to those which would have existed in the presence of the sound source to be simulated. Such sounds will have a true source, which is generally a loudspeaker or array of loudspeakers, but seem to the listener to originate from another source, located at the position of the source being simulated. This perceived source of the sound is known as a "virtual source".
The principle of using a conventional stereo loudspeaker setup for binaural synthesis was first conceived by Atal B S & Schroeder M R, "Apparent sound source translator", US Patent 3236949, 1966, and was later further optimised by Cooper D H & Bauck J L, "Prospects for transaural recording", Journal of the Audio Engineering Society, Vol. 37 (3-19), 1989, who introduced the term "Transaural Stereo". See also Schroeder M R, "Models of Hearing", Proc. IEEE, Vol. 63 (1332-1350), 1975 and Cooper D H & Bauck J L, "Generalised transaural stereo", 93rd AES Convention, Preprint 3401, 1992.
If signals at the ears relating to direction of sound sources can be reconstructed accurately, combined with accurate reconstruction of secondary images such as reflections, compelling spatial immersion could be accomplished.
In a loudspeaker listening situation, in order to synthesise the correct signals at the ears to simulate a sound source at some physical point other than the loudspeakers, the signals to the loudspeakers have to be tailored so as to reconstruct, at the listener's ears, sound pressures indistinguishable from those that the ears would have received in a free field setup. The propagation from each loudspeaker L1, L2 to each ear of a listener Z is represented in Figure 1, and can be characterised by the following matrix equation:

$$\begin{bmatrix} X_L \\ X_R \end{bmatrix} = \begin{bmatrix} H_{1L} & H_{2L} \\ H_{1R} & H_{2R} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}$$

where:

X_L = signal received at the left ear;
X_R = signal received at the right ear;
Y₁ = signal transmitted by the left source (loudspeaker L1);
Y₂ = signal transmitted by the right source (loudspeaker L2);
H_1L = transfer function of left source (loudspeaker L1) to left ear;
H_1R = transfer function of left source (loudspeaker L1) to right ear;
H_2L = transfer function of right source (loudspeaker L2) to left ear;
H_2R = transfer function of right source (loudspeaker L2) to right ear.

Solving for Y with known signals X that describe the sound source at an arbitrary point in space yields the appropriate signals to be fed to the loudspeakers. The signal X must therefore be filtered through a crosstalk cancellation stage formed by the inverted transfer-function matrix (hereinafter referred to as the crosstalk cancellation matrix), as depicted in the following equation:

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \frac{1}{H_{1L} H_{2R} - H_{1R} H_{2L}} \begin{bmatrix} H_{2R} & -H_{2L} \\ -H_{1R} & H_{1L} \end{bmatrix} \begin{bmatrix} X_L \\ X_R \end{bmatrix}$$
In theory, such a derivation method for a crosstalk cancellation solution could be applied to any set-up of a pair of loudspeakers, whether symmetrical or non-symmetrical.
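As an illustration of the inversion described above, the following is a minimal sketch for a single frequency bin, using Python's built-in complex numbers. The function names and all numerical values are assumptions made up for the example (a deliberately non-symmetrical pair); they are not taken from the specification. The sketch computes the loudspeaker feeds and then propagates them back through the four transfer functions to check that the desired ear signals are recovered.

```python
def crosstalk_canceller(h1l, h1r, h2l, h2r):
    """Invert the 2x2 matrix [[h1l, h2l], [h1r, h2r]] for one frequency bin.

    Returns the crosstalk cancellation matrix as a tuple of two rows.
    """
    det = h1l * h2r - h1r * h2l
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    # Row 0 gives Y_1 and row 1 gives Y_2 as weighted sums of (X_L, X_R).
    return ((h2r / det, -h2l / det),
            (-h1r / det, h1l / det))


def loudspeaker_feeds(c, x_l, x_r):
    """Apply the cancellation matrix c to the desired ear signals."""
    y1 = c[0][0] * x_l + c[0][1] * x_r
    y2 = c[1][0] * x_l + c[1][1] * x_r
    return y1, y2


# Hypothetical transfer functions at one frequency (non-symmetrical set-up).
h1l, h1r = 1.0 + 0.0j, 0.4 + 0.1j   # loudspeaker L1 to left / right ear
h2l, h2r = 0.3 - 0.2j, 0.9 + 0.1j   # loudspeaker L2 to left / right ear

# Hypothetical desired ear signals for the virtual source.
x_l, x_r = 0.5 + 0.5j, -0.2 + 0.1j

c = crosstalk_canceller(h1l, h1r, h2l, h2r)
y1, y2 = loudspeaker_feeds(c, x_l, x_r)

# Propagate the feeds back to the ears through the acoustic paths.
ear_l = h1l * y1 + h2l * y2
ear_r = h1r * y1 + h2r * y2
```

In a practical system the same calculation would be repeated for every frequency bin, and the singular case (determinant close to zero) would need some form of regularisation, which this sketch does not attempt.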
In a conventional crosstalk cancellation configuration for a stereo pair of loudspeakers, and for all transaural systems described so far, all synthesised sound images are filtered through a single crosstalk cancellation process. However, a stereo pair cannot give accurate reconstruction of signals from all directions. A typical stereo pair arranged in front of a listener gives accurate simulation only within approximately ± 100° of the direction the listener is facing. Moreover, if more than two loudspeakers are introduced, the crosstalk cancellation technique breaks down because the system becomes indeterminate (more unknowns than equations). Cooper and Bauck extended their generalised transaural theory to more than two discrete channels of information, generalising the crosstalk cancellation to any number of loudspeakers for any number of listeners. However, only approximate solutions were given, in an attempt to solve an indeterminate system for one listener.
According to the invention, there is provided a sound reproduction system, the system comprising a plurality of loudspeakers, and a processor capable of determining where, within a defined space, one or more virtual sound sources are located,
characterised by means, for each virtual sound source, for selecting a subset of the loudspeakers, said sub-set being selected from the plurality of loudspeakers on the basis of the location of the virtual sound source in the defined space, and means for applying a cross talk cancellation process to the selected sub-set of the loudspeakers.
In another aspect of the invention, there is provided a method for reproducing sound by way of a plurality of loudspeakers the method comprising the step of determining where, within a defined space, one or more virtual sound sources are located and characterised by applying, for each virtual sound source, a cross talk cancellation process to a sub-set of the loudspeakers, said sub-set being selected from the plurality of loudspeakers on the basis of the location of the virtual sound source in the defined space.
The plurality of loudspeakers from which the subset is selected allows accurate simulation over a greater range of virtual source locations than a single pair of loudspeakers could achieve. However, the selection of a subset (preferably a pair) from this larger plurality of loudspeakers allows the crosstalk processing to be greatly simplified. The pairwise concept introduced here embraces a finite number of independent crosstalk cancellation processes, each identifying with a pair of loudspeakers in a multiple speaker array. The derivation of the crosstalk cancellation matrix process for each pair is identical to that for a conventional pair. The number of independent crosstalk cancellation matrix modules which can be implemented in such an array is governed by the locations of loudspeakers in the multi-loudspeaker array, and the spatial coverage and accuracy achievable by an optimised pair of loudspeakers in that array.
Embodiments of the invention will now be described with reference to the drawings, in which:
Figure 1 shows a conventional stereo pair configuration with the respective transfer functions from sources to ears as already discussed;
Figure 2 illustrates four physical point sources with maximum possible number of crosstalk cancellation processes;
Figure 3 illustrates a lateral set of four loudspeakers, showing the loudspeakers' area of coverage on the lateral plane (the horizontal plane containing the ears);
Figure 4 illustrates the application of binaurally synthesised signals to appropriate crosstalk cancellation processes for the configuration of Figure 3;
Figure 5 illustrates a three loudspeaker configuration;
Figure 6 illustrates a five loudspeaker configuration;
Figure 7 illustrates an application of virtual static point sources to overcome limitations in available space;
Figure 8 illustrates another four-loudspeaker configuration;
Figure 9 shows schematically a pairwise crosstalk cancellation implementation circuit for localising five monophonic virtual sources using the four-loudspeaker layout of Figure 8.
Figure 2 shows a loudspeaker layout having four loudspeakers L1, L2, L3, L4. It is not in general necessary to implement all the possible pairwise processes, as in most configurations only adjacent pairs of loudspeakers are used, but for some virtual sources non-adjacent pairs may be selected (as will be seen when discussing Figure 6) so the maximum number of crosstalk cancellation processes between pairs of loudspeakers in an array of four loudspeakers is not four, but six, or more generally, for an array of n loudspeakers, n(n-1)/2.
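The count n(n-1)/2 is simply the number of unordered pairs that can be drawn from n loudspeakers. A short sketch using the standard library makes this concrete; the function name `loudspeaker_pairs` is an illustrative assumption.

```python
from itertools import combinations


def loudspeaker_pairs(labels):
    """List every unordered pair of loudspeakers in the array."""
    return list(combinations(labels, 2))


pairs = loudspeaker_pairs(["L1", "L2", "L3", "L4"])
for pair in pairs:
    print(pair)
print(len(pairs))  # 6 for a four-loudspeaker array
```

Only some of these pairs would normally be given a crosstalk cancellation module; the list is the upper bound on what could be implemented.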
The selection of an appropriate crosstalk cancellation process is governed by the direction of the synthesised sound source or sources, i.e. if synthesised sound images are to emanate from directions which are covered by one pair of loudspeakers, the processed directional signals are only applied to that pair of loudspeakers and its respective crosstalk cancellation process. If two or more sound sources of different directions are to be synthesised and played back via an array of multiple loudspeakers, respective crosstalk cancellation process modules relating to respective pairs of loudspeakers can be implemented to deliver each pair of directional signals to the ears, taking note that the process is always performed pairwise.
To illustrate the pairwise concept and the explanation given above, consider the lateral setup as shown in Figure 3. The layout consists of a ±30° frontal pair of loudspeakers L1, L2, and a ±120° rear pair of loudspeakers L3, L4 (angles of incidence are measured with respect to the direction due front of the listener Z). Seven virtual images V1 to V7 are shown emanating from different bearings. To deliver each binaurally-synthesised sound signal (carrying directional information) correctly to the listener's ears, each pair of signals is applied to the crosstalk cancellation process module of the pair of loudspeakers which covers the location of the sound image. Four areas of coverage are shown, with loudspeakers L1, L2 encompassing the frontal sector 31 (± 60°), L1, L3 and L2, L4 covering the left and right sectors 32, 33 respectively, and L3, L4 providing rear coverage (sector 34). The block diagram in Figure 4 illustrates the strategic switching of a number of processed signals having left and right components (X_L, X_R) as heard at the ears to appropriate modules 41, 42, 43, 44, each corresponding to the pair of loudspeakers appropriate to the lateral bearings of these signals.
Translating virtual moving sound sources using the pairwise concept can be achieved by correctly switching or directing the synthesised signals to the appropriate pairwise crosstalk cancellation process. Using the example shown in Figure 3, a sound source can be made to translate from the left sector (32) to the frontal sector (31), by first applying the synthesised signal to the crosstalk cancellation processor 42 for the left sector 32, to give its initial position as well as the points of movement within the left sector, depending on the angular step size between synthesised sources. Once the image shifts to the next sector, the synthesised signals are switched to the crosstalk cancellation processor 41 for the front sector 31 to continue projecting the moving source.
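The switching logic for the Figure 3 layout can be sketched as a simple lookup from bearing to sector and loudspeaker pair. This is only an illustration under stated assumptions: negative azimuths are taken to be to the listener's left, the sector boundaries are placed at ±60° and ±120°, and the choice of which sector owns a bearing lying exactly on a boundary is arbitrary. The function name `select_pair` is hypothetical.

```python
def select_pair(azimuth_deg):
    """Return (sector, loudspeaker pair) for a virtual source bearing.

    Assumptions: 0 degrees is due front, negative is left, positive is
    right, and boundaries sit at +/-60 and +/-120 degrees.
    """
    # Wrap any angle into the range [-180, 180).
    a = (azimuth_deg + 180.0) % 360.0 - 180.0
    if -60.0 <= a <= 60.0:
        return 31, ("L1", "L2")   # frontal sector
    if -120.0 <= a < -60.0:
        return 32, ("L1", "L3")   # left sector
    if 60.0 < a <= 120.0:
        return 33, ("L2", "L4")   # right sector
    return 34, ("L3", "L4")       # rear sector


# A source moving from the left sector round to due front in 20 degree steps.
trajectory = [-100, -80, -60, -40, -20, 0]
sectors_visited = [select_pair(a)[0] for a in trajectory]
```

For this trajectory the synthesised signal would first be routed to the module for the left sector and then switched to the module for the frontal sector once the bearing reaches the boundary. A real implementation would probably need to smooth the hand-over between modules to avoid audible discontinuities, which is not addressed here.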
The example shown above may appear to suggest that the pairwise concept restricts the crosstalk cancellation to within the angle between the pair of loudspeakers. However, the angle of coverage, be it lateral or spherical, strictly depends on how well a pair of loudspeakers can spatialise within its capability (in the sense of localisation accuracy). The following worked examples were taken from experiments which demonstrate that different paired configurations gave significantly different localisation abilities and reveal advantages of some unconventional loudspeaker placement over current layout practice.
An unusual layout, which may seem to be impractical on initial inspection, is shown in Figure 5. This has just three loudspeakers L1, L2, L3 (Left, Centre Front, and Right), arranged at 0° (Centre Front) and ± 90° (Right and Left). It displays good imaging ability within the respective loudspeakers' optimised fields of coverage as shown in Figure 5. The left and right frontal quadrants 51, 52 covered by the Left/Centre pair L1/L2 and Right/Centre pair L2/L3 give good static frontal sources even with a distinct degree of head rotation to face the virtually positioned source. The unconventional Left/Right pair L1/L3 along the axis of the ears gave remarkable rear incidence synthesised images covering the range from +90° to -90°, even on the onsets of the synthesised sound sample. The Left-Right ear axis loudspeaker pair L1/L3 not only gives coverage along the rear half of the lateral plane (sector 53), but it also encompasses the rear hemisphere, i.e. including point sources above or below the lateral plane.
Another example is illustrated in Figure 6. This illustrates that the coverage provided by some paired loudspeakers is limited but, by combining with several other pairs of loudspeakers in the array, the voids are filled and a desired spatialisation is fulfilled. Five loudspeakers are used, arranged at 0° (Centre-Front: L2), ± 60° (Right-Front: L3, and Left-Front: L1), and ± 120° (Right-Rear: L4 and Left-Rear: L5). The frontal ±60° stereo pair L1/L3 provides poor frontal images in the range covering ±10° (sectors 62/63). Addition of the centre-front unit L2 and implementation of crosstalk cancellation on the left-front/centre-front (L1/L2) and right-front/centre-front (L2/L3) pairs provides sufficient coverage for sound images in these sectors. It can be seen that there is a possibility of extending the coverage between the centre loudspeaker L2 and each of the respective front loudspeakers L1, L3. The pairwise concept employs a strategy of applying the best pair available to achieve good localisation and, in this case, subjective tests have shown that sound images projected at angles between -10° and -60° (sector 61) and between +10° and +60° (sector 64) are better localised using the left-front/right-front non-adjacent pair L1/L3 than those processed by either the left-front/centre-front or centre-front/right-front pairs (L1/L2, L2/L3).
The pairwise concept is not restricted to just these few loudspeaker configurations and locations. The invention delivers a new yet direct general approach to solving three-dimensional sound field spatialisation for multiple loudspeaker applications. The loudspeaker array itself may be designed to comply with other constraints such as cost (in particular the number of loudspeakers to be used) and the availability of locations to site the loudspeakers. With such a strategy, in general terms, the best localisation effect of a sound source is achieved by engaging the crosstalk cancellation process that relates to the most appropriate pair of loudspeakers available in the array. This is not restricted to the direct path of sound sources. Each individual reflection of a sound source could be treated as a further virtual source, with a suitable delay with respect to the primary source, to simulate a reflected sound. Applying the appropriate crosstalk cancellation process to each reflection could accurately render its position in space, which is essential to an immersive spatial environment.
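The treatment of a reflection as a delayed secondary virtual source can be sketched as follows. The text specifies only that a suitable delay is applied; the speed of sound, the sample rate, the path lengths and the attenuation gain used here are all assumptions introduced for the example, as are the function names.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def reflection_delay_samples(direct_path_m, reflected_path_m, sample_rate_hz):
    """Delay of the reflection relative to the direct sound, in samples."""
    extra_path_m = reflected_path_m - direct_path_m
    if extra_path_m < 0:
        raise ValueError("reflected path cannot be shorter than direct path")
    return round(extra_path_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def delayed_copy(signal, delay_samples, gain):
    """Secondary source signal: the primary signal, delayed and scaled."""
    return [0.0] * delay_samples + [gain * s for s in signal]


# Hypothetical geometry: 2.0 m direct path, 5.43 m path via a wall.
delay = reflection_delay_samples(2.0, 5.43, 48000)

primary = [1.0, 0.5, -0.25]
reflection = delayed_copy(primary, delay, gain=0.5)
```

The delayed signal would then be spatialised at the bearing of the reflection and routed to whichever loudspeaker pair covers that bearing, in the same way as any other virtual source.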
The introduction of unconventional loudspeaker locations also reveals exceptional rear localisation of sound images and, with another paired configuration that has good frontal attributes, gave strong distinction between front and rear virtual images therefore eliminating front-back and back-front ambiguities.
The ability to project static virtual point sources accurately contributes greatly to teleconferencing and fully immersive personal workstation applications. Further applications extend to home cinema setups in which the loudspeaker positions intended for a cinema need to be simulated. The home environment is restricted in both the number of available loudspeakers and in the availability of positions to place them. Virtual loudspeakers in such a setup could be rendered in their respective places as shown in Figure 7. In the example illustrated, five virtual units 71, 72, 73, 74, 75 are simulated by only three real units L1, L2, L3, configured as already described with reference to Figure 5. Two of the virtual units 74, 75 are located outside the confines of the room R in which the loudspeakers L1, L2, L3 and the listener Z are located. This could overcome the limitation of physical point sources and available listening space in a room. Directional loudspeakers can be used to reduce the volume of sound audible at locations away from the listener Z, and in particular at the locations of the virtual rear surround units 74, 75 outside the room R.
A simplified example of an implementation of the system, based on a four-loudspeaker array with only two sectors, as shown in Figure 8, will now be described. Figure 9 shows the array set up with pairwise crosstalk cancellation applied to a forward pair L1, L2 set at ±60° and a side pair L3, L4 set at ±90°, i.e. it is based on the assumption that the forward pair L1, L2 provides the best reconstruction of spatialised images in the front sector (Sector 81) and the side pair L3, L4 provides the best reconstruction of spatialised images in the rear sector (Sector 82).
The example depicts five virtual sources X₀, X₁, X₂, X₃, X₄ to be spatialised; however, the implementation of the pairwise concept does not limit the number of input sources.
The input sources X₀ - X₄ are each first subjected to analogue/digital conversion in a bank 91 of converters A/D. The input sources are then treated in a bank of processors 92 with the appropriate head-related transfer functions (HRTFs), H_X0L, H_X0R, H_X1L, H_X1R, H_X2L, H_X2R, H_X3L, H_X3R, H_X4L, H_X4R, where H_X0L is the HRTF of source X₀ to Left Ear, H_X0R is the HRTF of source X₀ to Right Ear, etc. The left outputs of the front three sources X_0L, X_1L, X_2L are then combined in a combiner 93, and similarly for the right outputs X_0R, X_1R, X_2R (combiner 93a), and the two combined outputs are filtered in a processor 94 by the forward pair crosstalk cancellation matrix for the reconstruction of virtual images in the front sector 81. The remaining two input sources X₃, X₄ are similarly combined and filtered by the side pair crosstalk cancellation matrix (processor 94a) for the reconstruction of virtual images in the rear sector 82. The outputs from the cancellation stages 94, 94a are then subject to digital/analogue conversion (D/A) (converters 96) for output to the appropriate loudspeakers L1, L2; L3, L4.
In the pairwise crosstalk cancellation processes, the following calculations are performed:

for loudspeaker L1: $Y_1 = H'_{2R} X_L + H'_{2L} X_R$, where:

$$H'_{2L} = \frac{-H_{2L}}{(H_{1L} \cdot H_{2R}) - (H_{1R} \cdot H_{2L})}, \qquad H'_{2R} = \frac{H_{2R}}{(H_{1L} \cdot H_{2R}) - (H_{1R} \cdot H_{2L})}$$

for loudspeaker L2: $Y_2 = H'_{1R} X_L + H'_{1L} X_R$, where:

$$H'_{1L} = \frac{H_{1L}}{(H_{1L} \cdot H_{2R}) - (H_{1R} \cdot H_{2L})}, \qquad H'_{1R} = \frac{-H_{1R}}{(H_{1L} \cdot H_{2R}) - (H_{1R} \cdot H_{2L})}$$

where:

H_1L = HRTF of Loudspeaker L1 to Left Ear
H_1R = HRTF of Loudspeaker L1 to Right Ear
H_2L = HRTF of Loudspeaker L2 to Left Ear
H_2R = HRTF of Loudspeaker L2 to Right Ear

for loudspeaker L3: $Y_3 = H'_{4R} X_L + H'_{4L} X_R$, where:

$$H'_{4L} = \frac{-H_{4L}}{(H_{3L} \cdot H_{4R}) - (H_{3R} \cdot H_{4L})}, \qquad H'_{4R} = \frac{H_{4R}}{(H_{3L} \cdot H_{4R}) - (H_{3R} \cdot H_{4L})}$$

for loudspeaker L4: $Y_4 = H'_{3L} X_R + H'_{3R} X_L$, where:

$$H'_{3L} = \frac{H_{3L}}{(H_{3L} \cdot H_{4R}) - (H_{3R} \cdot H_{4L})}, \qquad H'_{3R} = \frac{-H_{3R}}{(H_{3L} \cdot H_{4R}) - (H_{3R} \cdot H_{4L})}$$

where:

H_3L = HRTF of Loudspeaker L3 to Left Ear
H_3R = HRTF of Loudspeaker L3 to Right Ear
H_4L = HRTF of Loudspeaker L4 to Left Ear
H_4R = HRTF of Loudspeaker L4 to Right Ear
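The signal flow of Figure 9 can be sketched for a single frequency bin as follows. All numerical values are made up for illustration, the function and variable names are hypothetical, and the sketch assumes that the contributions of the four loudspeakers add linearly at each ear. It applies the source HRTFs, combines the front and rear groups separately, applies the pairwise calculations for each loudspeaker pair, and then checks what arrives at the ears.

```python
def cancel_pair(ha_l, ha_r, hb_l, hb_r, x_l, x_r):
    """Feeds for a loudspeaker pair (a, b) given desired ear signals.

    ha_l, ha_r: loudspeaker a to left / right ear.
    hb_l, hb_r: loudspeaker b to left / right ear.
    """
    det = ha_l * hb_r - ha_r * hb_l
    y_a = (hb_r / det) * x_l + (-hb_l / det) * x_r
    y_b = (-ha_r / det) * x_l + (ha_l / det) * x_r
    return y_a, y_b


# Hypothetical spectra of the five sources X0..X4 at one frequency.
sources = [1.0 + 0.0j, 0.5j, -0.3 + 0.2j, 0.8 - 0.1j, 0.2 + 0.6j]
# Hypothetical HRTFs from each virtual source position to each ear.
hrtf_l = [0.9 + 0.1j, 1.0 + 0.0j, 0.6 - 0.2j, 0.8 + 0.3j, 0.3 - 0.1j]
hrtf_r = [0.6 - 0.2j, 1.0 + 0.0j, 0.9 + 0.1j, 0.3 - 0.1j, 0.8 + 0.3j]

FRONT, REAR = (0, 1, 2), (3, 4)   # X0-X2 in sector 81, X3-X4 in sector 82


def binaural_sum(indices):
    """HRTF filtering (processors 92) followed by left/right combining."""
    x_l = sum(hrtf_l[i] * sources[i] for i in indices)
    x_r = sum(hrtf_r[i] * sources[i] for i in indices)
    return x_l, x_r


xf_l, xf_r = binaural_sum(FRONT)
xr_l, xr_r = binaural_sum(REAR)

# Hypothetical loudspeaker-to-ear transfer functions for the two pairs.
H1L, H1R, H2L, H2R = 1.0 + 0.0j, 0.35 - 0.1j, 0.35 - 0.1j, 1.0 + 0.0j
H3L, H3R, H4L, H4R = 0.9 + 0.1j, 0.2 - 0.3j, 0.2 - 0.3j, 0.9 + 0.1j

y1, y2 = cancel_pair(H1L, H1R, H2L, H2R, xf_l, xf_r)   # forward pair
y3, y4 = cancel_pair(H3L, H3R, H4L, H4R, xr_l, xr_r)   # side pair

# Signals arriving at the ears from all four loudspeakers together.
ear_l = H1L * y1 + H2L * y2 + H3L * y3 + H4L * y4
ear_r = H1R * y1 + H2R * y2 + H3R * y3 + H4R * y4
```

Under these assumptions each pair delivers its own group of binaural signals without crosstalk, so the ears receive the sum of the front and rear binaural signals.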

Claims

1. A sound reproduction system, the system comprising a plurality of loudspeakers (L1, L2, L3, L4), and a processor capable of determining where, within a defined space, one or more virtual sound sources (V1, V2, V3, V4, V5, V6, V7) are located,
characterised by means (91), for each virtual sound source, for selecting a sub-set of the loudspeakers, said sub-set being selected from the plurality of loudspeakers on the basis of the location of the virtual sound source in the defined space, and means (94) for applying a cross talk cancellation process to the selected sub-set of the loudspeakers.
2. A sound reproduction system as claimed in Claim 1, having means for reproducing at least a primary virtual sound source and a secondary virtual sound source, and for delaying the secondary virtual sound source signal with respect to the primary source, to simulate a reflection of the primary source.
3. A sound reproduction system as defined in Claim 1 or Claim 2 wherein the subsets of loudspeakers are pairs (L1, L2; L3, L4) of loudspeakers.
4. A sound reproduction system as claimed in claim 3 wherein there are four loudspeakers (L1, L2, L3, L4) arranged substantially at 30° and 120° to left and right of a pre-determined centre line, and wherein four sectors (31, 32, 33, 34) are defined bounded by divisions at substantially 60° and 120° to left and right of the centre line, and wherein for virtual sources in the sector (31) bounded by the divisions to 60° left and right of the centre line the loudspeakers (L1, L2) at 30° from the centre line are selected, for positions (34) greater than 120° to left and right of the centre line the loudspeakers (L3, L4) at 120° to left and right of the centre line are selected, and for intermediate angles (32, 33) to the left of the centre line the two left-hand loudspeakers (L1, L3) are selected and for intermediate angles to the right of the centre line the two right-hand loudspeakers (L2, L4) are selected.
5. A sound reproduction system as claimed in claim 3 wherein there are five loudspeakers (L1, L2, L3, L4, L5) arranged substantially at 0°, 60° and 120° to left and right of a pre-determined centre line, and wherein five sectors (61, 62, 63, 64, 65) are defined bounded by divisions at substantially 0°, 10°, and 120° to left and right of the centre line, and wherein for virtual sources in the sector (62) bounded by the divisions at 0° and 10° left of the centre line the loudspeakers (L1, L2) at 0° and 60° left from the centre line are selected, for virtual sources in the sector (63) bounded by the divisions at 0° and 10° right of the centre line the loudspeakers (L2, L3) at 0° and 60° right from the centre line are selected, for positions (65) greater than 120° to left and right of the centre line the loudspeakers (L4, L5) at 120° to left and right of the centre line are selected, and for intermediate angles (61, 64) between 10° and 120° left or right of the centre line the two loudspeakers (L1, L3) at 60° left and right of the centre line are selected.
6. A system according to claim 3 wherein there are three loudspeakers (L1, L2, L3), arranged in front of a listening point (Z) and at 90° of the centre line to left and right, wherein virtual sources (74, 75) to the rear of the listening point (Z) are reproduced using the left and right loudspeakers (L1, L3) and virtual sources to the front of the listening point (Z) are represented by the central loudspeaker (L2) and the left or right speaker (L1, L3) according to which side of the centre line the virtual source is.
7. A method for reproducing sound by way of a plurality of loudspeakers (L1, L2, L3, L4), the method comprising the step of determining where, within a defined space, one or more virtual sound sources (V1, V2, V3, V4, V5, V6, V7) are located and characterised by applying, for each virtual sound source, a cross talk cancellation process to a sub-set of the loudspeakers, said sub-set being selected from the plurality of loudspeakers on the basis of the location of the virtual sound source in the defined space.
8. A method as claimed in Claim 7, for reproducing at least a primary virtual sound source and a secondary virtual sound source, wherein the secondary virtual sound source signal is delayed with respect to the primary source, to simulate a reflection of the primary source.
9. A method as defined in claim 7 or 8 wherein the loudspeakers are selected pairwise.
10. A method of sound reproduction according to claim 7, 8, or 9 wherein a plurality of virtual sound sources are operated on simultaneously, with cross-talk cancellation processes applied to appropriate sub-sets of the loudspeakers for each virtual sound source.