GB2474680A - An audio processing method and apparatus - Google Patents

An audio processing method and apparatus

Info

Publication number
GB2474680A
GB2474680A GB0918584A
Authority
GB
United Kingdom
Prior art keywords
audio
audio signal
output
asset
game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0918584A
Other versions
GB2474680B (en)
GB0918584D0 (en)
Inventor
Nicolas Christian Andre Fournel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Europe Ltd
Original Assignee
Sony Computer Entertainment Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Computer Entertainment Europe Ltd filed Critical Sony Computer Entertainment Europe Ltd
Priority to GB0918584.4A
Publication of GB0918584D0
Publication of GB2474680A
Application granted
Publication of GB2474680B
Legal status: Active

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/45 Controlling the progress of the video game
    • A63F13/10
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/54 Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/02 Spatial or constructional arrangements of loudspeakers
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6063 Methods for processing data by generating or executing the game program for sound processing
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6063 Methods for processing data by generating or executing the game program for sound processing
    • A63F2300/6081 Methods for processing data by generating or executing the game program for sound processing generating an output signal, e.g. under timing constraints, for spatialization

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

A method of audio processing using an entertainment device operable to implement a game environment comprises generating an output audio signal relating to the game environment from an input audio signal, detecting an in-game event from a set of in-game events each having an associated audio event, and selecting an audio processing operation to be applied to the input audio signal to implement the audio event. The audio processing operation is selected in dependence upon a current spectral content of the output audio signal and spectral information representing the effect on the output audio signal of the candidate audio processing operations, such that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content. The method further comprises outputting the output audio signal. The generating step may comprise the step of including one or more of a plurality of audio assets, and candidate audio processing operations may comprise an audio asset selection operation. An audio processing apparatus is also described.

Description

AUDIO PROCESSING METHOD AND APPARATUS
The present invention relates to an audio processing method and apparatus.
Ever since entertainment devices have been used to play computer games, audio effects and audio content have typically been used in games so as to provide an enhanced audio experience for a user. Typical games use a so-called "audio engine" to generate an audio output signal for reproduction during a game, and to manage audio assets for reproduction during the game. Audio assets may often relate to a backing track for background music to be played during a game, or relate to sound effects for use in the game.
Typical audio engines are operable to carry out many different types of audio processing operation so as to generate the output audio signal. For example, the audio engine may select an audio asset for reproduction so that output of the audio asset substantially coincides with a particular game event within the game. As another example, a game audio engine may carry out audio mixing so as to include an audio asset in the output audio signal.
As a further example, the audio engine may apply an audio effect, such as phasing, flanging, compression, chorus, reverb, and the like, to one or more of the audio assets so as to affect the resultant output audio signal.
Typically, the output audio signal may comprise a plurality of audio channels. For example, for stereo output, the output audio signal will comprise a left channel and a right channel. As another example, for so-called 5.1 (five channel plus low frequency effect (LFE)) surround sound, the output audio signal would typically comprise five channels: front left; front centre; front right; rear left; and rear right. The low frequency effect (a "sixth" channel) may be generated from the other five channels or it may be generated independently.
However, where the entertainment device has limited processing resources, for example in a game having very processor intensive graphics, or when a game is implemented on a hand-held device, the audio engine may have to select an audio asset from a plurality of audio assets so as to provide an appropriate audio output. In some cases, the selection of which audio asset to output may be carried out in dependence upon available processing resources or other criteria, with the audio asset requiring the least processing power to output being selected for output. However, this may mean that an inappropriate audio asset may be selected.
This may occur, for example, where the output audio signal comprises substantial low frequency audio content, such as a rumble caused by a simulated earthquake. If the audio engine were to select, for example, an audio asset corresponding to an explosion to be played out during simulation of the earthquake, it is likely that a user may not perceive or distinguish the noise of the explosion from the audio content which is already being output. Accordingly, the audio experience for the user may be impaired, or at least appear to be somewhat unexciting and flat.
The present invention seeks to mitigate or at least alleviate the above problems.
In a first aspect, there is provided a method of audio processing using an entertainment device operable to implement a game environment, the method comprising the steps of: generating an output audio signal relating to the game environment from an input audio signal, the output audio signal being associated with audio spectra data indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal; detecting an in-game event in the game environment from a set of in-game events each having an associated audio event; selecting an audio processing operation from a plurality of candidate audio processing operations to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event, the audio processing operation being selected in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content; and outputting the output audio signal.
In a second aspect, there is provided an audio processing apparatus comprising an entertainment device operable to implement a game environment, the apparatus comprising: an output audio signal generator operable to generate an output audio signal from an input audio signal, the output audio signal being associated with audio spectra data indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal; an in-game event detector operable to detect an in-game event in the game environment from a set of in-game events each having an associated audio event; an audio processing operation selector operable to select from a plurality of candidate audio processing operations an audio processing operation to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event, the selector being operable to select the audio processing operation in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content; and an output element operable to output the output audio signal to an audio signal reproduction device.
Embodiments of the invention advantageously provide an enhanced audio experience for a user during game play of a video game. By selecting an audio processing operation in dependence upon the current spectral output of an output audio signal and audio spectra data representative of the effect on the output audio signal of a candidate audio processing operation, embodiments of the invention allow the audio output to be adjusted so that, for example, an audio effect or new audio asset included in the output audio signal can be clearly audible to a user when the effect or new audio asset is applied. In other words, the spectral content of the output audio signal after implementation of the audio event can be adjusted so as to approximate to a required spectral content. This improves the audio experience because the new audio asset or effect on the output audio signal is likely to sound more exciting and dynamic.
Further respective aspects and features of the invention are defined in the appended claims.
Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram of an entertainment device;
Figure 2 is a schematic diagram of a cell processor;
Figure 3 is a schematic diagram of a video graphics processor;
Figure 4 is a schematic diagram of a game audio engine in accordance with embodiments of the present invention;
Figure 5 is a schematic representation of audio spectra data in accordance with embodiments of the present invention;
Figure 6 is a schematic diagram of an audio asset selector of the game audio engine in accordance with embodiments of the present invention;
Figure 7 is a schematic diagram of an audio mixer of the game audio engine in accordance with embodiments of the present invention;
Figure 8 is a schematic diagram of an audio shader of the game audio engine in accordance with embodiments of the present invention;
Figure 9 is a schematic diagram of an audio tool and audio engine in accordance with embodiments of the present invention; and
Figure 10 is a flow chart of a method of audio processing in accordance with embodiments of the present invention.
A method of audio processing and an audio processing apparatus is disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of embodiments of the present invention. However, it will be apparent to a person skilled in the art that these specific details need not be employed to practise the present invention. Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity in presenting the embodiments.
Figure 1 schematically illustrates the overall system architecture of the Sony® Playstation 3® entertainment device. A system unit 10 is provided, with various peripheral devices connectable to the system unit.
The system unit 10 comprises: a Cell processor 100; a Rambus® dynamic random access memory (XDRAM) unit 500; a Reality Synthesiser graphics unit 200 with a dedicated video random access memory (VRAM) unit 250; and an I/O bridge 700.
The system unit 10 also comprises a Blu Ray® Disk BD-ROM® optical disk reader 430 for reading from a disk 440 and a removable slot-in hard disk drive (HDD) 400, accessible through the I/O bridge 700. Optionally the system unit also comprises a memory card reader 450 for reading compact flash memory cards, Memory Stick® memory cards and the like, which is similarly accessible through the I/O bridge 700.
The I/O bridge 700 also connects to four Universal Serial Bus (USB) 2.0 ports 710; a gigabit Ethernet port 720; an IEEE 802.11b/g wireless network (Wi-Fi) port 730; and a Bluetooth® wireless link port 740 capable of supporting up to seven Bluetooth connections.
In operation the I/O bridge 700 handles all wireless, USB and Ethernet data, including data from one or more game controllers 751. For example when a user is playing a game, the I/O bridge 700 receives data from the game controller 751 via a Bluetooth link and directs it to the Cell processor 100, which updates the current state of the game accordingly.
The wireless, USB and Ethernet ports also provide connectivity for other peripheral devices in addition to game controllers 751, such as: a remote control 752; a keyboard 753; a mouse 754; a portable entertainment device 755 such as a Sony Playstation Portable® entertainment device; a video camera such as an EyeToy® video camera 756; and a microphone headset 757. Such peripheral devices may therefore in principle be connected to the system unit 10 wirelessly; for example the portable entertainment device 755 may communicate via a Wi-Fi ad-hoc connection, whilst the microphone headset 757 may communicate via a Bluetooth link.
The provision of these interfaces means that the Playstation 3 device is also potentially compatible with other peripheral devices such as digital video recorders (DVRs), set-top boxes, digital cameras, portable media players, Voice over IP telephones, mobile telephones, printers and scanners.
In addition, a legacy memory card reader 410 may be connected to the system unit via a USB port 710, enabling the reading of memory cards 420 of the kind used by the Playstation® or Playstation 2® devices.
In the present embodiment, the game controller 751 is operable to communicate wirelessly with the system unit 10 via the Bluetooth link. However, the game controller 751 can instead be connected to a USB port, thereby also providing power by which to charge the battery of the game controller 751. In addition to one or more analogue joysticks and conventional control buttons, the game controller is sensitive to motion in 6 degrees of freedom, corresponding to translation and rotation in each axis. Consequently gestures and movements by the user of the game controller may be translated as inputs to a game in addition to or instead of conventional button or joystick commands. Optionally, other wirelessly enabled peripheral devices such as the Playstation Portable device may be used as a controller. In the case of the Playstation Portable device, additional game or control information (for example, control instructions or number of lives) may be provided on the screen of the device. Other alternative or supplementary control devices may also be used, such as a dance mat (not shown), a light gun (not shown), a steering wheel and pedals (not shown) or bespoke controllers, such as a single or several large buttons for a rapid-response quiz game (also not shown).
The remote control 752 is also operable to communicate wirelessly with the system unit 10 via a Bluetooth link. The remote control 752 comprises controls suitable for the operation of the Blu Ray Disk BD-ROM reader 430 and for the navigation of disk content.
The Blu Ray Disk BD-ROM reader 430 is operable to read CD-ROMs compatible with the Playstation and PlayStation 2 devices, in addition to conventional pre-recorded and recordable CDs, and so-called Super Audio CDs. The reader 430 is also operable to read DVD-ROMs compatible with the Playstation 2 and PlayStation 3 devices, in addition to conventional pre-recorded and recordable DVDs. The reader 430 is further operable to read BD-ROMs compatible with the Playstation 3 device, as well as conventional pre-recorded and recordable Blu-Ray Disks.
The system unit 10 is operable to supply audio and video, either generated or decoded by the Playstation 3 device via the Reality Synthesiser graphics unit 200, through audio and video connectors to a display and sound output device 300 such as a monitor or television set having a display 305 and one or more loudspeakers 310. The audio connectors 210 may include conventional analogue and digital outputs whilst the video connectors 220 may variously include component video, S-video, composite video and one or more High Definition Multimedia Interface (HDMI) outputs. Consequently, video output may be in formats such as PAL or NTSC, or in 720p, 1080i or 1080p high definition.
Audio processing (generation, decoding and so on) is performed by the Cell processor 100. The Playstation 3 device's operating system supports Dolby® 5.1 surround sound, Dolby® Theatre Surround (DTS), and the decoding of 7.1 surround sound from Blu-Ray® disks.
In the present embodiment, the video camera 756 comprises a single charge coupled device (CCD), an LED indicator, and hardware-based real-time data compression and encoding apparatus so that compressed video data may be transmitted in an appropriate format such as an intra-image based MPEG (motion picture expert group) standard for decoding by the system unit 10. The camera LED indicator is arranged to illuminate in response to appropriate control data from the system unit 10, for example to signify adverse lighting conditions. Embodiments of the video camera 756 may variously connect to the system unit 10 via a USB, Bluetooth or Wi-Fi communication port. Embodiments of the video camera may include one or more associated microphones and also be capable of transmitting audio data. In embodiments of the video camera, the CCD may have a resolution suitable for high-definition video capture. In use, images captured by the video camera may for example be incorporated within a game or interpreted as game control inputs.
In general, in order for successful data communication to occur with a peripheral device such as a video camera or remote control via one of the communication ports of the system unit 10, an appropriate piece of software such as a device driver should be provided.
Device driver technology is well-known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the present embodiment described.
Referring now to Figure 2, the Cell processor 100 has an architecture comprising four basic components: external input and output structures comprising a memory controller 160 and a dual bus interface controller 170A,B; a main processor referred to as the Power Processing Element 150; eight co-processors referred to as Synergistic Processing Elements (SPEs) 110A-H; and a circular data bus connecting the above components referred to as the Element Interconnect Bus 180. The total floating point performance of the Cell processor is 218 GFLOPS, compared with the 6.2 GFLOPs of the Playstation 2 device's Emotion Engine.
The Power Processing Element (PPE) 150 is based upon a two-way simultaneous multithreading Power 970 compliant PowerPC core (PPU) 155 running with an internal clock of 3.2 GHz. It comprises a 512 kB level 2 (L2) cache and a 32 kB level 1 (L1) cache. The PPE 150 is capable of eight single precision operations per clock cycle, translating to 25.6 GFLOPs at 3.2 GHz. The primary role of the PPE 150 is to act as a controller for the Synergistic Processing Elements 110A-H, which handle most of the computational workload.
In operation the PPE 150 maintains a job queue, scheduling jobs for the Synergistic Processing Elements 110A-H and monitoring their progress. Consequently each Synergistic Processing Element 110A-H runs a kernel whose role is to fetch a job, execute it and synchronise with the PPE 150.
Each Synergistic Processing Element (SPE) 110A-H comprises a respective Synergistic Processing Unit (SPU) 120A-H, and a respective Memory Flow Controller (MFC) 140A-H comprising in turn a respective Dynamic Memory Access Controller (DMAC) 142A-H, a respective Memory Management Unit (MMU) 144A-H and a bus interface (not shown).
Each SPU 120A-H is a RISC processor clocked at 3.2 GHz and comprising 256 kB local RAM 130A-H, expandable in principle to 4 GB. Each SPE gives a theoretical 25.6 GFLOPS of single precision performance. An SPU can operate on 4 single precision floating point numbers, 4 32-bit numbers, 8 16-bit integers, or 16 8-bit integers in a single clock cycle. In the same clock cycle it can also perform a memory operation. The SPU 120A-H does not directly access the system memory XDRAM 500; the 64-bit addresses formed by the SPU 120A-H are passed to the MFC 140A-H which instructs its DMA controller 142A-H to access memory via the Element Interconnect Bus 180 and the memory controller 160.
The Element Interconnect Bus (EIB) 180 is a logically circular communication bus internal to the Cell processor 100 which connects the above processor elements, namely the PPE 150, the memory controller 160, the dual bus interface 170A,B and the 8 SPEs 110A-H, totalling 12 participants. Participants can simultaneously read and write to the bus at a rate of 8 bytes per clock cycle. As noted previously, each SPE 110A-H comprises a DMAC 142A-H for scheduling longer read or write sequences. The EIB comprises four channels, two each in clockwise and anti-clockwise directions. Consequently for twelve participants, the longest step-wise data-flow between any two participants is six steps in the appropriate direction. The theoretical peak instantaneous EIB bandwidth for 12 slots is therefore 96 B per clock, in the event of full utilisation through arbitration between participants. This equates to a theoretical peak bandwidth of 307.2 GB/s (gigabytes per second) at a clock rate of 3.2 GHz.
The memory controller 160 comprises an XDRAM interface 162, developed by Rambus Incorporated. The memory controller interfaces with the Rambus XDRAM 500 with a theoretical peak bandwidth of 25.6 GB/s.
The dual bus interface 170A,B comprises a Rambus FlexIO® system interface 172A,B. The interface is organised into 12 channels each being 8 bits wide, with five paths being inbound and seven outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) between the Cell processor and the I/O Bridge 700 via controller 170A and the Reality Simulator graphics unit 200 via controller 170B.
Data sent by the Cell processor 100 to the Reality Simulator graphics unit 200 will typically comprise display lists, being a sequence of commands to draw vertices, apply textures to polygons, specify lighting conditions, and so on.
Referring now to Figure 3, the Reality Simulator graphics (RSX) unit 200 is a video accelerator based upon the NVidia® G70/71 architecture that processes and renders lists of commands produced by the Cell processor 100. The RSX unit 200 comprises a host interface 202 operable to communicate with the bus interface controller 170B of the Cell processor 100; a vertex pipeline 204 (VP) comprising eight vertex shaders 205; a pixel pipeline 206 (PP) comprising 24 pixel shaders 207; a render pipeline 208 (RP) comprising eight render output units (ROPs) 209; a memory interface 210; and a video converter 212 for generating a video output. The RSX 200 is complemented by 256 MB double data rate (DDR) video RAM (VRAM) 250, clocked at 600 MHz and operable to interface with the RSX 200 at a theoretical peak bandwidth of 25.6 GB/s. In operation, the VRAM 250 maintains a frame buffer 214 and a texture buffer 216. The texture buffer 216 provides textures to the pixel shaders 207, whilst the frame buffer 214 stores results of the processing pipelines. The RSX can also access the main memory 500 via the EIB 180, for example to load textures into the VRAM 250.
The vertex pipeline 204 primarily processes deformations and transformations of vertices defining polygons within the image to be rendered.
The pixel pipeline 206 primarily processes the application of colour, textures and lighting to these polygons, including any pixel transparency, generating red, green, blue and alpha (transparency) values for each processed pixel. Texture mapping may simply apply a graphic image to a surface, or may include bump-mapping (in which the notional direction of a surface is perturbed in accordance with texture values to create highlights and shade in the lighting model) or displacement mapping (in which the applied texture additionally perturbs vertex positions to generate a deformed surface consistent with the texture).
The render pipeline 208 performs depth comparisons between pixels to determine which should be rendered in the final image. Optionally, if the intervening pixel process will not affect depth values (for example in the absence of transparency or displacement mapping) then the render pipeline and vertex pipeline 204 can communicate depth information between them, thereby enabling the removal of occluded elements prior to pixel processing, and so improving overall rendering efficiency. In addition, the render pipeline 208 also applies subsequent effects such as full-screen anti-aliasing over the resulting image.
Both the vertex shaders 205 and pixel shaders 207 are based on the shader model 3.0 standard. Up to 136 shader operations can be performed per clock cycle, with the combined pipeline therefore capable of 74.8 billion shader operations per second, outputting up to 840 million vertices and 10 billion pixels per second. The total floating point performance of the RSX 200 is 1.8 TFLOPS.
Typically, the RSX 200 operates in close collaboration with the Cell processor 100; for example, when displaying an explosion, or weather effects such as rain or snow, a large number of particles must be tracked, updated and rendered within the scene. In this case, the PPU 155 of the Cell processor may schedule one or more SPEs 110A-H to compute the trajectories of respective batches of particles. Meanwhile, the RSX 200 accesses any texture data (e.g. snowflakes) not currently held in the video RAM 250 from the main system memory 500 via the element interconnect bus 180, the memory controller 160 and a bus interface controller 170B. The or each SPE 110A-H outputs its computed particle properties (typically coordinates and normals, indicating position and attitude) directly to the video RAM 250; the DMA controller 142A-H of the or each SPE 110A-H addresses the video RAM 250 via the bus interface controller 170B. Thus in effect the assigned SPEs become part of the video processing pipeline for the duration of the task.
In general, the PPU 155 can assign tasks in this fashion to six of the eight SPEs available; one SPE is reserved for the operating system, whilst one SPE is effectively disabled. The disabling of one SPE provides a greater level of tolerance during fabrication of the Cell processor, as it allows for one SPE to fail the fabrication process. Alternatively if all eight SPEs are functional, then the eighth SPE provides scope for redundancy in the event of subsequent failure by one of the other SPEs during the life of the Cell processor.
The PPU 155 can assign tasks to SPEs in several ways. For example, SPEs may be chained together to handle each step in a complex operation, such as accessing a DVD, video and audio decoding, and error masking, with each step being assigned to a separate SPE.
Alternatively or in addition, two or more SPEs may be assigned to operate on input data in parallel, as in the particle animation example above.
Software instructions implemented by the Cell processor 100 and/or the RSX 200 may be supplied at manufacture and stored on the HDD 400, and/or may be supplied on a data carrier or storage medium such as an optical disk or solid state memory, or via a transmission medium such as a wired or wireless network or internet connection, or via combinations of these.
The software supplied at manufacture comprises system firmware and the Playstation 3 device's operating system (OS). In operation, the OS provides a user interface enabling a user to select from a variety of functions, including playing a game, listening to music, viewing photographs, or viewing a video. The interface takes the form of a so-called cross media-bar (XMB), with categories of function arranged horizontally. The user navigates by moving through the function icons (representing the functions) horizontally using the game controller 751, remote control 752 or other suitable control device so as to highlight a desired function icon, at which point options pertaining to that function appear as a vertically scrollable list of option icons centred on that function icon, which may be navigated in analogous fashion. However, if a game, audio or movie disk 440 is inserted into the BD-ROM optical disk reader 430, the Playstation 3 device may select appropriate options automatically (for example, by commencing the game), or may provide relevant options (for example, to select between playing an audio disk or compressing its content to the HDD 400).
In addition, the OS provides an on-line capability, including a web browser, an interface with an on-line store from which additional game content, demonstration games (demos) and other media may be downloaded, and a friends management capability, providing on-line communication with other Playstation 3 device users nominated by the user of the current device; for example, by text, audio or video depending on the peripheral devices available. The on-line capability also provides for on-line communication, content download and content purchase during play of a suitably configured game, and for updating the firmware and OS of the Playstation 3 device itself. It will be appreciated that the term "on-line" does not imply the physical presence of wires, as the term can also apply to wireless connections of various types.
Embodiments of the present invention in which an audio processing operation is applied to an input audio signal in dependence upon the spectral content of an output signal will now be described with reference to Figures 4 to 10.
Figure 4 shows a schematic diagram of a game audio engine 1000 in accordance with embodiments of the present invention. The game audio engine 1000 comprises decision logic operable to determine how one or more audio processing operations should be applied to an input audio signal so as to generate an output audio signal.
In embodiments of the present invention, the audio engine is implemented by the system unit 10 under software control, although it will be appreciated that the audio engine could be implemented in hardware, for example as an application specific integrated circuit (ASIC), field programmable gate array (FPGA), and the like. It is to be understood that the term "game audio engine" should be taken to be any device, process, or method operable to implement the functionality of embodiments of the present invention as described below.
The audio engine 1000 is operable to generate an output audio signal for reproduction by, for example, the display and sound output device 300. In particular, the audio engine 1000 is operable to generate an output audio signal for reproduction during game play of a game being executed on the system unit 10. In embodiments of the present invention, the audio engine 1000 is operable to generate the output audio signal from audio assets stored on the hard disc drive 400, although it will be appreciated that the audio assets may be stored on any other suitable storage element such as a Blu-Ray® disc, CD-ROM, DVD-ROM, memory card, and the like.
In embodiments, an audio asset comprises audio data indicative of audio content of that asset such that the audio asset may be reproduced by a suitable reproduction device such as a loudspeaker. For example, an audio asset might comprise audio data relating to a backing track for background music to be played during a game or dialogue by a game character or a user. As another example, an audio asset may comprise audio data relating to a sound effect such as a siren, punch, gun shot, explosion, lightning, car engine noise, and the like. However, it will be appreciated that an audio asset may comprise audio data relating to any type of audio content. In some embodiments, an audio asset may be categorised according to whether it relates to a sound effect, music, or dialogue, although it will be appreciated that any other suitable categories could be used. In some embodiments, the category of the audio asset may be appended to the audio asset as category metadata, although it will be appreciated that other methods for indicating the category of the audio asset may be used.
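Purely by way of illustration, such an asset and its category metadata might be modelled as follows. The names AudioAsset and AssetCategory, and the fields shown, are assumptions for this sketch rather than structures defined by the embodiments:

    from dataclasses import dataclass
    from enum import Enum, auto

    class AssetCategory(Enum):
        # Example categories mentioned above; others could be used.
        SOUND_EFFECT = auto()
        MUSIC = auto()
        DIALOGUE = auto()

    @dataclass
    class AudioAsset:
        name: str
        samples: bytes            # raw audio data for reproduction
        sample_rate: int          # e.g. 48000 Hz (assumed)
        category: AssetCategory   # category metadata appended to the asset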
The audio engine 1000 is operable to analyse audio spectra data 1020 using the decision logic 1010 to determine how an audio processing operation should be applied to the audio assets so as to generate the output audio signal. In embodiments, the audio spectra data 1020 is indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal.
In embodiments, the system unit 10 is operable to implement a game environment corresponding to a game such as a sports game, fighting game, role playing game and the like although the system unit 10 could implement any suitable game environment. More generally, as mentioned above, the system unit 10 is operable to implement any of the functionality of the audio engine 1000.
The audio engine 1000 is operable to generate an output audio signal from an input audio signal such as one or more audio assets. The output audio signal is associated with the audio spectra data 1020. The audio engine 1000 is operable to detect an in-game event in the game environment from a set of in-game events each having an associated audio event.
For example, an in-game event may be a gun being fired within the game environment implemented by the system unit 10, with the associated audio event being the sound of a gun-shot. As another example, within a sports game such as cricket, a game event in which a game player uses a cricket bat to hit a ball would have an associated sound (audio event) corresponding to a sound of the cricket ball being hit. If the game player hits a "six" (a high scoring shot) within the game, the associated audio event could correspond to the sound of a crowd cheering. In other words, each in-game event in the set of in-game events (e.g. a ball being hit, a player hitting a "six" etc.) has an associated audio event. In other words, the set of in-game events comprises one or more in-game events which may occur during game play.
The audio engine 1000 is operable to select from a plurality of candidate audio processing operations an audio processing operation to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event. In some embodiments, the candidate audio processing operations comprise an audio asset selection operation for selecting an audio asset to be included in the input audio signal. In other embodiments, the candidate audio processing operations comprise an audio mixing operation to be applied to the audio assets in the input audio signal. In some embodiments, the candidate audio processing operations comprise one or more audio effect processing operations to be applied to the input audio signal. The audio asset selection operation, audio mixing operation and audio effect processing operations will be described in more detail later below.
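As a rough sketch of the dispatch just described, assuming hypothetical event names and operation labels (none of which are taken from the embodiments), the set of in-game events, their associated audio events and their candidate operations might be tabulated as:

    # Hypothetical mapping: each in-game event has one associated audio
    # event, and each audio event has a pool of candidate operations.
    CANDIDATE_OPERATIONS = {
        "gun_fired":  {"audio_event": "gun_shot",
                       "candidates": ["select_asset", "mix", "apply_effect"]},
        "ball_hit":   {"audio_event": "bat_on_ball",
                       "candidates": ["select_asset", "mix"]},
        "six_scored": {"audio_event": "crowd_cheer",
                       "candidates": ["select_asset", "mix", "apply_effect"]},
    }

    def on_in_game_event(event_name):
        """Look up the audio event and candidates for a detected event."""
        entry = CANDIDATE_OPERATIONS.get(event_name)
        if entry is not None:
            return entry["audio_event"], entry["candidates"]
        return None, []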
The audio engine 1000 is operable to select the audio processing operation in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data 1020 and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content. The audio spectra data 1020 and spectral information will be described in more detail below.
The audio engine 1000 is operable to output the output audio signal to an audio signal reproduction device such as the display and sound output device 300, although any suitable audio signal reproduction device could be used.
As mentioned above, prior art audio engines may select an audio asset in dependence upon available processing resources. Other techniques for selecting an audio asset include: selecting an audio asset in dependence upon a distance between a sound source in the game and an in-game listener (for example, an audio asset corresponding to an in-game sound source closest to an in-game listener could be selected for reproduction); assigning a respective priority to each audio asset and reproducing the asset with the highest priority; selecting an audio asset in accordance with how recently an audio asset was reproduced (for example, reproduction of older audio assets could be stopped and newer audio assets reproduced in place of the older audio assets); and selecting audio assets which are already being reproduced in preference over newer audio assets to be included in the audio output.
However, the above techniques may lead to a situation where a selected audio asset is not audible to a user in the output audio signal because the selected audio asset may have a frequency content which is similar to the frequency content of other audio assets that are being used to generate the current output audio signal.
Therefore, in some embodiments, the audio engine 1000 is operable to select an audio asset to include in the output audio signal in dependence upon the spectral content of the output signal. Where, for example, the audio output signal has a substantial low frequency component (as indicated by the audio spectra data), embodiments of the present invention may allow an audio asset which has a substantial high frequency content (for example corresponding to spectral information representing the effect on the output audio signal) to be selected in preference to an audio asset which has a substantial low frequency component.
However, it will be appreciated that any other suitable audio asset may be selected in accordance with the spectral content of the output audio signal and spectral information representing the effect on the output audio signal as appropriate. This advantageously provides an improved audio experience for the user because the audio engine 1000 can select an audio asset which is more likely to be discernable to a user when the output audio signal is reproduced. In other words, the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content where, for example, the required spectral content is such that the selected audio asset is more likely to be discernable by the user.
In order for the audio engine 1000 to select an audio asset in dependence upon the current spectral content of the output audio signal, the audio engine 1000 should have a way of determining the current spectral content of the output audio signal. In embodiments, the output audio signal is associated with audio spectra data 1020 representative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal.
In some embodiments, the audio engine is operable to carry out spectrum analysis on each channel of the output audio signal so as to generate the audio spectra data 1020 by using known techniques such as a Fast Fourier Transform (FFT) or a Constant Q Transform (CQT), although it will be appreciated that any other suitable method for generating the audio spectra data 1020 may be used. As mentioned above, the term "channel" should be taken to mean an output channel of the output audio signal such as a left channel and a right channel, for example. In some embodiments, the audio spectra data 1020 is generated each frame period corresponding to a frame rate of the game, although the audio spectra data 1020 could be generated periodically at any other suitable time period.
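A minimal sketch of such per-channel analysis, using an FFT as mentioned above; the five band edges below are illustrative assumptions, not values given by the embodiments:

    import numpy as np

    BAND_EDGES_HZ = [0, 200, 800, 3200, 12800, 24000]  # assumed 5 bands

    def band_levels_db(channel, sample_rate):
        """Return an audio level (dB) per frequency band for one channel."""
        spectrum = np.abs(np.fft.rfft(channel))
        freqs = np.fft.rfftfreq(len(channel), d=1.0 / sample_rate)
        levels = []
        for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]):
            band = spectrum[(freqs >= lo) & (freqs < hi)]
            power = np.sum(band ** 2)
            levels.append(10.0 * np.log10(power + 1e-12))  # avoid log(0)
        return levels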
However, generating the audio spectra data 1020 by carrying out spectrum analysis on the output audio signal can be computationally expensive. This is of particular importance where a game is implemented on a hand-held device such as a mobile telephone or personal digital assistant (PDA), as processor intensive tasks may shorten battery life due to an increase in power consumption when carrying out the tasks. Additionally, spectrum analysis of each audio channel may not provide an accurate representation of the panning position of each audio asset (i.e. the relative apparent position of each audio asset with respect to audio output devices such as loudspeakers or headphones); if an audio asset is to be reproduced such that it appears to sound as if located between two audio output devices, an input audio signal corresponding to that audio asset will be split between the two channels for the audio output devices. Therefore, it may be difficult to reconstruct the panning position of an audio asset by analysis of the channels of the output audio signal, especially if the output audio signal comprises a large number of audio assets.
Accordingly, in embodiments of the present invention, the audio spectra data 1020 is represented as an audio spectra matrix comprising frequency band and panning position elements. The structure of an audio spectra matrix in accordance with embodiments of the present invention will now be described with reference to Figure 5.
Figure 5 shows a schematic diagram of a format of audio spectra data in accordance with embodiments of the present invention. In particular, Figure 5 shows an example of a visual representation of the audio spectra matrix. The audio spectra matrix comprises matrix elements eij, each representative of an audio level (e.g. volume) for that element. Here, the subscript i refers to the frequency band, and the subscript j refers to the panning position. In the example shown in Figure 5, the audio spectra matrix comprises 5 columns corresponding to 5 frequency bands, and 16 rows corresponding to 16 panning positions. However, it will be appreciated that the audio spectra matrix may comprise any number of rows and columns as appropriate.
In the example shown in Figure 5, shading of the elements represents the audio level for each element. In particular, Figure 5 shows five audio levels, with 0 corresponding to white, and the heaviest shading corresponding to 4. However, it should be understood that the representative shading of Figure 5 is merely for understanding the drawings and should not be taken as being indicative of an actual audio level. Furthermore, it will be appreciated that any number of audio levels could be used. In preferred embodiments, the audio level is represented in decibels (dB), although the audio levels may be represented within the audio spectra matrix in any suitable manner.
Additionally, in Figure 5, the panning position is labelled in the horizontal direction and the frequency bands are indicated by the "primed" numbers in the vertical direction. The panning position and frequency bands can therefore be thought of as coordinates. For example, the coordinates (1, 1') refer to matrix element e11, and the coordinates (2, 1') refer to the matrix element e12. In Figure 5, frequency band 5' comprises higher frequencies than frequency band 1'. In other words, higher frequency bands are shown in the increasing y-direction.
In some embodiments, the audio spectra data and the spectral information respectively comprise audio spectra matrices as described below, each having matrix elements that associate panning data with frequency band data. As mentioned above, each matrix element has an associated audio level. The panning data (as indicated by the panning position) is indicative of an apparent relative position of components of the input audio signal with respect to a multi-channel sound output device such as a plurality of loudspeakers. For example, each channel of the multi-channel sound output device could correspond to a loudspeaker. In other words, the panning data comprises the data relating to the panning position.
In embodiments, the frequency band data is associated with a plurality of audio spectra frequency bands in the frequency domain representation of the output audio signal.
In the example shown in Figure 5, the panning position is illustrated with respect to a left position and a right position. In embodiments, the left position corresponds to a left-hand loudspeaker or headphone (a left-hand channel) and the right position corresponds to a right-hand loudspeaker or headphone (a right-hand channel) with respect to a user.
In embodiments, the audio spectra matrix is generally of the form shown below:

    e11  e12  ...  e1j
    e21  e22  ...  e2j        (matrix 1)
    ...  ...       ...
    ei1  ei2  ...  eij

Taking the example shown in Figure 5, the corresponding audio spectra matrix is matrix 2, where elements corresponding to panning positions 5-15 have not been shown (as indicated by lines of three dots in matrix 2) for the sake of conciseness. [Matrix 2: entries correspond to the audio levels shown in Figure 5; the numerical values are not reproduced in this text.]
In some embodiments, so that the audio spectra matrix for the output audio signal can be generated in accordance with the audio assets which make up the output audio signal, each audio asset is associated with a respective audio spectra matrix which represents the spectral content of that audio asset. Each audio spectra matrix associated with an audio asset has the same general form as matrix 1 shown above. The way in which audio spectra matrices are generated for each of the audio assets will be described in more detail later below.
Accordingly, in some embodiments, each time the content of the output audio signal is changed by the audio engine 1000, for example, by including another audio asset in the output audio signal in response to a game event, the audio engine 1000 is operable to update the audio spectra matrix associated with the output audio signal. Therefore, the audio engine 1000 can update the audio spectra matrix associated with the output audio signal so that a representation of the current spectral content is maintained. In embodiments, the audio engine 1000 is operable to update the audio spectra data substantially in real-time in accordance with the current spectral content of the output audio signal.
By maintaining the audio spectra matrix so that it represents the current spectral content of the output audio signal, the audio engine 1000 can apply an audio processing operation to an input audio signal so as to generate the output audio signal without having to carry out periodical spectral analysis of the output audio signal. Therefore, the use of an audio spectra matrix which is updated in accordance with the current spectral content of the output audio signal allows implementation of embodiments of the invention on portable devices such as mobile telephones, hand-held entertainment devices, personal digital assistants, and the like. Furthermore, the audio spectra matrix provides a better representation of the panning position of audio assets than detecting the panning position of audio assets by spectral analysis of the output channels of the output audio signal.
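One plausible way of maintaining the matrix incrementally is sketched below. The combining rule (taking the element-wise maximum of audio levels when an asset is added) is an assumption for illustration, as the embodiments do not prescribe a particular update rule:

    import numpy as np

    N_BANDS, N_PAN = 5, 16  # dimensions as in the Figure 5 example

    class OutputSpectraMatrix:
        """Audio spectra matrix maintained for the output audio signal."""

        def __init__(self):
            self.m = np.zeros((N_BANDS, N_PAN))

        def add_asset(self, asset_matrix):
            # Approximate the new output spectrum without re-analysing
            # the mixed signal: keep the louder level per element.
            self.m = np.maximum(self.m, asset_matrix)

        def rebuild(self, active_asset_matrices):
            # When an asset stops playing, rebuild from the remaining
            # assets' asset spectra matrices.
            self.m = np.zeros((N_BANDS, N_PAN))
            for a in active_asset_matrices:
                self.m = np.maximum(self.m, a)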
Although the panning position has been described with respect to a left-hand and a right-hand channel, it will be appreciated that any number of channels could be used, for example five channels for so-called 5.1 surround sound. Therefore, more generally, to represent the panning data for multiple audio output channels (for example, for five channels), in embodiments of the present invention, the audio spectra data and spectral information comprise audio spectra tensors indicative of the spectral content of the output audio signal and/or spectral information, with the audio spectra tensors having as many dimensions as necessary to represent the spectral content of the respective audio asset or the output audio signal. However, it will be appreciated that any multi-dimensional array could be used to represent the audio spectra data and the spectral information.
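For completeness, a one-line sketch of the tensor generalisation; the assumed 5 x 16 x 16 shape (frequency bands x front/back positions x left/right positions) is illustrative only and not taken from the embodiments:

    import numpy as np

    # Assumed rank-3 array generalising the 2-D audio spectra matrix
    # to a planar panning field suitable for 5.1 output.
    spectra_tensor = np.zeros((5, 16, 16))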
The selection of audio assets in accordance with the spectral content of the output audio signal will now be described in more detail with reference to Figure 6.
Figure 6 is a schematic diagram of an audio asset selector of the audio engine 1000 in accordance with embodiments of the present invention. In particular, the audio engine 1000 comprises an asset selector 2000, a spectral data generator 2010, and an asset selector controller 2020. The audio asset selector 2000 is operable to select one or more audio assets, AA1 to AA7, to include as at least part of the output audio signal in accordance with the current spectral content of the output audio signal.
In some embodiments, the spectral data generator 2010 is operable to carry out spectral analysis of the audio output signal so as to generate the audio spectra data. However, as mentioned above, the audio spectra data preferably comprises the audio spectra matrix associated with the output audio signal. Therefore, in embodiments, the spectral data generator 2010 is operable to update and maintain the audio spectra matrix in accordance with the current output audio signal. To achieve this functionality, the spectral data generator 2010 is operable to communicate bi-directionally with the asset selector controller 2020 (as indicated by the dashed line in Figure 6). Therefore, the audio spectra matrix can be updated in accordance with a selection of an audio asset by the asset selector controller 2020.
More generally, in embodiments, the audio engine 1000 is operable to include one or more of a plurality of the audio assets as at least part of the input audio signal. As mentioned above, in some embodiments, the candidate audio processing operations comprise an audio asset selection operation for selecting an audio asset to be included in the input audio signal.
The spectral information of the audio asset selection operation comprises asset spectra data (such as asset spectra matrices) indicative of the spectral content of respective audio assets in a frequency domain representation of those audio assets.
In embodiments, the audio engine 1000 is operable to select the audio asset selection operation from the candidate audio processing operations in response to the detection of the in-game event. In embodiments, the audio asset selection operation comprises selecting a first audio asset from the plurality of audio assets to be included in the input audio signal, the first audio asset being selected in dependence upon a comparison between the audio spectra data and the respective asset spectra data associated with each audio asset stored in the storage element so as to implement the audio event associated with the detected in-game event. An example of this functionality will now be described below.
An example of a selection of an audio asset to include as at least part of the audio output signal will now be described with reference to Figures 5 and 6.
In this example, the output audio signal comprises audio asset AA1, audio asset AA2, and audio asset AA3. The current spectral content of the output audio signal is that shown in Figure 5 and represented in matrix 2 above. Considering the first three panning positions, the corresponding audio spectra matrix for the output audio signal comprising audio assets AA1, AA2, and AA3 is matrix 3. [Matrix 3: the 5 band x 3 panning position sub-matrix of matrix 2; the numerical entries are not reproduced in this text.]

As will be appreciated from matrix 3 above and from Figure 5, the audio output signal comprising the audio assets AA1, AA2, and AA3 has a substantial low frequency component, as indicated by the audio level 4 in frequency bands 1' and 2' (e21 and e23 in matrix 3).
In this example, the asset spectra matrices for audio assets AA4 to AA7 are as follows: audio asset 4 (AA4) has matrix 4; audio asset 5 (AA5) has matrix 5; audio asset 6 (AA6) has matrix 6; and audio asset 7 (AA7) has matrix 7. [Matrices 4 to 7: numerical entries not reproduced in this text.]

As can be seen from matrix 4 above, matrix 4 has a substantial low frequency component. Matrix 5 has some higher frequency components corresponding to frequency bands 3' and 4', with the audio levels of matrix 5 all being less than an audio threshold level thrAL (where, in this example, thrAL = 3). Matrix 6 has a substantial high frequency component as indicated by the audio levels in frequency bands 3', 4', and 5', where the audio levels at e15, e23, e25, and e34 are greater than or equal to the audio threshold level thrAL. Matrix 7 has low frequency components in frequency bands 1' and 2' with corresponding audio levels which are less than the audio threshold level thrAL.
In order to select an audio asset to include as at least part of the output audio signal, in some embodiments, the asset selector controller 2020 is operable to generate a first set of frequency bands from the audio spectra frequency bands of the audio spectra data. To generate the first set, the asset selector controller 2020 is operable to determine, from the audio spectra matrix associated with the output audio signal, those frequency bands which have an audio level which is less than the audio threshold level thrAL.
In the example given above, matrix 3 comprises twelve elements whose audio levels are less than the audio threshold level thrAL. In this example, the first set comprises elements e11, e12, e13, e14, e15, e22, e23, e24, e25, e33, e34, and e35.
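A sketch of generating the first set, assuming the matrix is held as a NumPy array of audio levels and thrAL = 3 as in the worked example:

    import numpy as np

    THR_AL = 3  # audio threshold level from the worked example (assumed)

    def first_set(output_matrix):
        """Return (band, pan) index pairs whose level is below thrAL."""
        bands, pans = np.where(output_matrix < THR_AL)
        return list(zip(bands.tolist(), pans.tolist()))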
The audio asset controller 2020 is additionally operable to determine which audio assets have associated audio spectra matrices whose elements have respective audio levels which are greater than the audio threshold level thrAL in the corresponding data elements of the first set so as to generate an audio asset set comprising one or more audio assets. To achieve this, in some embodiments, the audio asset controller 2020 is operable to compare each element of the audio spectra matrix which has an audio level less than the audio threshold level thrAL (in other words, elements in the first set) with the corresponding elements in each of the audio spectra matrices associated with the audio assets.
For the example above, in matrix 4, elements e11 and e22 have audio levels which are greater than the audio threshold level thrAL. Matrix 5 does not have any elements which have an audio level greater than the audio threshold level thrAL at element positions corresponding to the elements in the first set. In matrix 6, elements e15, e23, e25, and e34 have audio levels greater than the audio threshold level thrAL. Matrix 7 does not have any elements which have an audio level greater than the audio threshold level thrAL at element positions corresponding to the elements in the first set.
In embodiments, the asset selector controller 2020 is operable to include, in the audio asset set, those audio assets which have elements whose audio levels are greater than the audio threshold level thrAL at element positions corresponding to the element positions of the first set. With reference to the above example, the audio asset set would thus comprise the audio asset AA4 and the audio asset AA6.
The asset selector controller 2020 is then operable to cause the asset selector 2000 to select an audio asset from the audio asset set in accordance with a selection criterion. In some embodiments, the selection criterion is which of the audio assets has an associated matrix which has the greatest number of elements having audio levels greater than the audio threshold level thrAL. In the above example, this selection criterion would lead to audio asset AA6 being selected to be included as at least a part of the output audio signal.
Were audio asset AA4 to be selected, this audio asset would be unlikely to be easily heard by a user in the output audio signal because the low frequency components at elements e11 and e22 of matrix 4 are adjacent to similar low frequency components in matrix 3.
Although matrix 5 corresponding to audio asset AA5 has an element corresponding to frequency band 4' at element e14, the corresponding audio level is 1, so that audio asset AA5 is unlikely to be clearly heard in the output audio signal. The audio levels of audio asset AA7 are all below those of the corresponding elements of the matrix 3 of the current output audio signal.
Therefore, audio asset AA7 would be unlikely to be heard in the resultant output audio signal if audio asset AA7 were to be included in the output audio signal.
As can be seen from matrix 6, audio asset AA6 has spectral components which do not substantially correspond to the current spectral components of the audio output signal.
Therefore, when included in the output audio signal, audio asset AA6 is likely to be easily discernible to a listener, thus leading to an improved game audio experience for the user.
Accordingly, by using a selection criterion which relates to which of the audio assets has an associated matrix which has the greatest number of elements having audio levels greater than the audio threshold level thrAL, an appropriate audio asset which is likely to be easily heard in the output audio signal can be selected to be included as at least part of the output audio signal in accordance with the current spectral content of the output audio signal.
In other embodiments, the selection criterion is which of the audio assets has an associated matrix which has the greatest number of elements having audio levels greater than the audio levels at corresponding element positions in the audio spectra matrix of the output audio signal (for example matrix 3).
Alternatively, in other embodiments, the asset selector controller 2020 is operable to compare the audio levels of each matrix element of the audio spectra matrix associated with the output audio signal with the respective audio levels of each corresponding matrix element of each of the audio assets. The asset selector controller 2020 is then operable to cause the asset selector 2000 to select an audio asset whose corresponding matrix is detected as having the greatest number of matrix elements with audio levels greater than the audio levels of the corresponding elements of the audio spectra matrix.
In other words, more generally, the audio engine 1000 is operable to generate a first set of data elements from one or more data elements of the audio spectra data, the data elements in the first set each being associated with respective audio levels which are less than an audio level threshold. The audio engine 1000 is then operable to compare the audio levels of the data elements in the first set with audio levels associated with corresponding data elements of the asset spectra data. The audio engine 1000 is operable to select an audio asset (for example audio asset AA6) whose audio levels are greater than the audio level threshold at data elements corresponding to the data elements in the first set as the first audio asset so that the spectral content of the output audio after the first audio asset is included in the input audio signal approximates to the required spectral content.
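By way of illustration, this selection procedure can be sketched as follows. The sketch assumes the audio spectra matrices are held as NumPy arrays with panning positions as rows and frequency bands as columns; the identifiers (THR_AL, select_asset and so on) are illustrative and are not taken from the patent.

    import numpy as np

    THR_AL = 3  # audio threshold level thrAL used in the worked example above

    def select_asset(output_spectra, asset_spectra):
        # First set: element positions of the output audio spectra matrix whose
        # audio levels are below the threshold (the twelve elements of matrix 3).
        first_set = output_spectra < THR_AL

        best_name, best_count = None, 0
        for name, spectra in asset_spectra.items():
            # Count the asset's elements that exceed the threshold at first-set
            # positions; the asset with the most such elements is selected.
            count = int(np.count_nonzero(spectra[first_set] > THR_AL))
            if count > best_count:
                best_name, best_count = name, count
        return best_name  # None if no candidate exceeds the threshold anywhere

For the worked example, select_asset(matrix_3, {"AA4": matrix_4, "AA5": matrix_5, "AA6": matrix_6, "AA7": matrix_7}) would return "AA6".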
It will be appreciated that the spectral content of an audio asset may vary with time.
Therefore, in some embodiments, the spectral content of an audio asset is represented by a sequence of audio spectra matrices. Each audio spectra matrix in the sequence of audio spectra matrices has a time stamp representative of a time at which that audio spectra matrix represents the spectral content of the audio asset with respect to a time stamp associated with a first audio spectra matrix in the sequence of audio spectra matrices of that audio asset. The time period between each time stamp of the audio spectra matrices of an audio asset is referred to herein as the time resolution. In some embodiments, a time period between the time stamps of the audio spectra matrices in the sequence corresponds to a time period between successive frames in the game. For example, where the frame rate of the game is 25 frames per second, the time period (time resolution) between time stamps of the audio spectra matrices is 0.04 seconds. However, it will be appreciated that any other suitable time resolution may be used. Additionally, it will be appreciated that any other suitable timing data could be associated with each audio spectra matrix in the sequence of audio spectra matrices so as to identify its relative time within the sequence.
More generally, in embodiments, successive audio spectra data (successive audio spectra data items) may be used to represent the spectral content of an audio asset over a period of time, with each audio spectra data item having an associated time stamp.
In some embodiments, the asset selector 2000 is operable to select an audio asset to include as at least part of the output audio signal based on spectral content corresponding to the start of an audio asset (for example, a first audio asset matrix having a time stamp of zero). However, this may mean that, as time progresses, the selected audio asset may become less appropriate for inclusion in the output audio signal. Accordingly, in some embodiments, the asset selector controller 2020 is operable to generate time averaged audio spectra data for each audio asset by calculating the mean average spectral content of that audio asset over the duration of that asset.
For example, if an audio asset has a duration of 4 seconds and its spectral content is represented at a rate of 25 audio spectra matrices per second (corresponding to a time resolution of 1/25 second and a total of 100 audio asset matrices), the asset selector controller 2020 can generate the mean average spectral content by calculating a mean average element value for each element position in the matrices. In other words, in embodiments, the asset selector controller 2020 is operable to generate a time averaged audio asset matrix from the sequence of audio asset matrices associated with that audio asset. However, it will be appreciated that a median average or a modal average could also be used.
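A minimal sketch of this time averaging, assuming the same NumPy array representation as before and a sequence of (time stamp, matrix) pairs (an illustrative structure, not one mandated by the patent):

    import numpy as np

    def time_averaged_spectra(sequence):
        # sequence: list of (time_stamp, matrix) pairs, e.g. 100 matrices for a
        # 4 second asset sampled at a 1/25 second time resolution.
        stack = np.stack([matrix for _, matrix in sequence])
        return stack.mean(axis=0)  # np.median(stack, axis=0) for a median average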
Accordingly, in some embodiments, the asset selector 2000 is operable to select an audio asset in dependence upon the average spectral content of the audio asset in a similar manner to that described above with reference to Figure 5 and matrices 1 to 7 above.
In some embodiments, the asset selector 2000 is operable to select an audio asset in dependence upon the number of matrix elements which have a level greater than a selection threshold level. For example, the asset selector 2000 could select an audio asset (to include as at least part of the output audio signal) whose corresponding matrix has the greatest number of elements having a value exceeding the selection threshold out of the audio assets available for selection (candidate audio assets).
In other embodiments, the asset selector 2000 is operable to select an audio asset in dependence upon the number of matrix elements which have a level less than the selection threshold level. For example, the asset selector 2000 could select an audio asset (to include as at least part of the output audio signal) whose corresponding matrix has the fewest elements having a value exceeding the selection threshold out of the audio assets available for selection (candidate audio assets). The selection threshold level can be preset within the audio engine 1000 or it may be set by a user via a suitable audio tool, although any other method of setting the selection threshold level could be used.
However, it will be appreciated that any other suitable method for selecting one or more audio assets to include as at least part of the output audio signal could be used.
Embodiments of the present invention in which one or more audio assets are mixed with one or more other audio assets in accordance with the current spectral content of the output audio signal will now be described with reference to Figure 7.
Figure 7 shows a schematic diagram of an audio mixer of the game audio engine in accordance with embodiments of the present invention. In particular, in these embodiments, the game audio engine 1000 comprises a mixer 3000, a spectral data generator 3010, and a mixer controller 3020. The spectral data generator 3010 has substantially the same functionality as the spectral data generator 2010 described above with reference to Figure 6.
The mixer controller 3020 is operable to communicate bi-directionally with the spectral data generator 3010 so as to control the mix of audio assets used to generate the output audio signal. The spectral data generator 3010 is operable to maintain a representation of the current spectral content of the output audio signal by updating the audio spectra matrix as appropriate in accordance with the spectral content of the output audio signal.
Alternatively, in other embodiments, the spectral data generator 3010 is operable to carry out spectrum analysis on the output audio signal so as to generate the audio spectra data.
In order to control the content of the output audio signal so as to provide a dynamic and interesting audio experience for a user, the mixer controller 3020 is operable to cause the mixer 3000 to adjust the relative proportions of the audio assets in accordance with the current spectral content of the output audio signal so as to substantially correspond to a mix profile. A mix profile (also referred to as a target mix) represents a desired mix of audio assets in the output audio signal. In some embodiments, the mix profile may be generated by the audio engine 1000 in response to in-game events. In the example shown in Figure 7, the input audio signal to the mixer 3000 comprises the seven audio assets AA1 to AA7. However, it will be appreciated that the input audio signal could comprise any suitable number of audio assets.
In embodiments, the mix profile is associated with respective spectral mixing information representing the effect on the output signal of the audio mixing operation.
Accordingly, the mixer 3000 is operable to predict how the spectral content of the output audio signal will change during an audio mixing operation based on the spectral mixing information. The mixer 3000 can then adjust the mix of audio assets appropriately so as to substantially correspond to the mix profile. In other words, the spectral content of the output audio signal can be adjusted to substantially correspond to the spectral content of the mix profile as indicated by the spectral mixing information associated with that mix profile.
In some embodiments, the audio engine 1000 is operable to carry out a mixing operation using the mixer 3000 so as to mix a selected audio asset with audio assets which form the output audio signal. For example, this may be necessary once an audio asset has been selected to be included as at least part of the output audio signal as described above with reference to Figure 6.
An example of mixing an audio asset with a plurality of other audio assets will now be described with reference to Figure 7. In this example, the input audio signal comprises the audio assets AA1 to AA6. As mentioned above, the audio engine is operable to select an audio asset to include as at least part of the output audio signal in dependence upon the spectral content of the output audio signal. In this example, the audio asset AA7 is selected to be included in the audio output signal. As indicated by the dashed line 3030, the mixer controller 3020 is operable to cause the mixer 3000 to adjust the proportion of the audio asset AA7 which is included in the output audio signal in dependence upon the spectral content of the output audio signal as indicated by data received from the spectral data generator 3010 so that the spectral content of the output audio signal corresponds to that of the mix profile.
In some embodiments, the mixer controller 3020 is operable to cause the mixer 3000 to adjust the mix of audio assets so as to correspond to the mix profile within a predetermined mix threshold. This enables the spectral content of output audio signal to substantially correspond to the desired mix profile without an iterative loop of successively finer adjustments of the mix occurring.
In some embodiments, the mixer controller 3020 is operable to control the mixer 3000 so as to adjust the mix of the audio assets already present in the input audio signal so as to substantially correspond to the mix profile. For example, the mixer controller 3020 could adjust the mix of the audio assets AA1 to AA6 (as indicated by the dashed line 3040). In another example, the mixer controller 3020 could adjust the mix of the audio assets AA1 to AA6 with respect to the audio asset AA7 so that the spectral content of the output audio signal substantially matches the mix profile (target mix).
However, if the mix is adjusted too quickly, for example by including a new audio asset in the output audio signal within the timescale of one image frame (e.g. 1/25 second), this may cause an audio artefact to be audible to the user, thus impairing the listening experience. Therefore, in some embodiments, the mixer controller 3020 is operable to use so-called "proportional, integral, differential" (PID) control to adjust the mix of audio assets to substantially match the mix profile. With suitable PID parameters, PID control allows the mix to be adjusted so that the spectral content of the output audio signal converges towards that of the target mix in timescales such that the audio asset is included at an appropriate point in the game, whilst the occurrence of audio artefacts due to, for example, overshoot or oscillation is reduced. PID control is well known in the field of systems control and so will not be described in more detail herein. However, it will be appreciated that any other suitable method for adjusting the mix of audio assets so that the spectral content of the output audio signal substantially matches the mix profile whilst minimising the occurrence of audio artefacts may be used.
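As an illustration of such PID control, the sketch below nudges one asset's gain towards its level in the mix profile once per update; the gains kp, ki and kd and the per-asset error signal are assumptions chosen for the example rather than values taken from the patent:

    class MixPID:
        """One controller per audio asset; update() is called once per frame."""
        def __init__(self, kp=0.5, ki=0.05, kd=0.1):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.previous_error = 0.0

        def update(self, target_level, current_level, dt):
            # Error between the mix profile (target mix) and the current mix.
            error = target_level - current_level
            self.integral += error * dt
            derivative = (error - self.previous_error) / dt
            self.previous_error = error
            # Gain adjustment for this update period; with suitable gains the
            # mix converges without overshoot or oscillation artefacts.
            return self.kp * error + self.ki * self.integral + self.kd * derivative

Calling update() once per frame (dt = 1/25 second for a 25 frames per second game) ramps a new asset in over several frames rather than switching it in within a single frame.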
More generally, in embodiments, the input audio signal comprises a plurality of the audio assets such as the audio assets AA1 to AA7. As mentioned above, in embodiments, the candidate audio processing operations comprise an audio mixing operation to be applied to the audio assets in the input audio signal. The audio mixing operation is associated with a respective mix profile (such as a mix profile as described above) indicative of the relative proportions of the audio assets in the input audio signal, and the mix profile is associated with respective spectral mixing information representing the effect on the output audio signal of the audio mixing operation. In embodiments, the audio engine is operable to select the audio mixing operation from the candidate audio processing operations in response to the detection of the in-game event, and adjust the relative proportions of the audio assets in the input audio signal in accordance with the current spectral content of the output audio signal so that the spectral content of the output audio signal substantially corresponds to the spectral content associated with the mix profile as indicated by the spectral mixing information so as to implement the audio event associated with the detected in-game event.
Embodiments of the present invention in which one or more audio effects are applied to the input audio signal in accordance with the current spectral content of the output audio signal will now be described with reference to Figure 8.
Figure 8 is a schematic diagram of an audio shader of the game audio engine in accordance with embodiments of the present invention. Here, the term "audio shader" is typically taken in the art as meaning an apparatus (for example hardware, or hardware acting under software control) for applying audio effects such as reverberation, chorus, compression, flanging, delay, equalisation, and the like to an input audio signal. Additionally, more complex effects such as simulating muffling due to fog, may be applied by the audio shader implementing suitable audio filters.
In the embodiments described with reference to Figure 8, the audio engine 1000 comprises an audio shader 4000, a spectral data generator 4010, and an effects controller 4020. The spectral data generator 4010 has the same functionality as the spectral data generator 3010 and the spectral data generator 2010 described above. The effects controller 4020 is operable to communicate bi-directionally with the spectral data generator 4010 so as to determine whether an audio effect should be applied to the input audio signal and/or to adjust the parameters of the audio effect in accordance with the current spectral content of the output audio signal. Additionally, in the example shown in Figure 8, the input audio signal comprises audio assets AA1 to AA7, although it will be appreciated that the input audio signal could comprise any suitable number of audio assets.
As mentioned above, in some embodiments, the candidate audio processing operations comprise one or more audio effect processing operations to be applied to the input audio signal. In some embodiments, each audio effect processing operation is associated with respective spectral effect information representing the effect of the audio effect processing operation on the input audio signal. For example, a low pass filter effect could have associated spectral effect information comprising a representation of the low pass filter in the frequency domain. However, any other suitable spectral effect information could be used. The audio engine 1000 is operable to select, in response to the detection of the in-game event, an audio effect processing operation from the candidate audio processing operations to be applied to the input audio signal in accordance with the spectral effect information and the current spectral content of the output audio signal as indicated by the audio spectra data.
For example, a game scene in a game may have a game character enter a game region which comprises fog. The effects controller 4020 may then analyse the spectral data as received from the spectral data generator 4010 to determine whether a "fog" audio effect should be applied to the input audio signal.
To simulate a game character entering a foggy region, an appropriate audio effect may comprise applying a low pass filter to reduce the audibility of high frequency noise, and applying a filter which reduces apparent reverberation. However, if the current spectral content of the output audio signal comprises a substantial amount of high frequency components, i.e. the higher frequency components are more audible than lower frequency components, applying a "fog" effect to the input audio signal may cause the output audio signal to sound dull and lifeless. Therefore, the effects controller 4020 may cause the audio shader 4000 not to apply the fog effect to the input audio signal. Alternatively, the effects controller 4020 may adjust the parameters of the fog effect as appropriate so as to cause the audio shader 4000 to apply the fog effect to the input audio signal with reduced apparent audibility to a user.
Furthermore, the effects controller 4020 could adjust the fog effect parameters so that only the reverberation appears to be reduced whilst the higher frequency components do not sound attenuated.
More generally, the audio shader 4000 is operable to adjust a degree by which the selected audio effect processing operation is applied to the input audio signal in accordance with the current spectral content of the output audio signal data so that the spectral content of the output audio signal after application of the selected audio effect processing operation approximates to the required spectral content.
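A minimal sketch of this degree adjustment for the fog example, assuming the same array layout as before (panning rows, frequency-band columns); the band split, threshold value and function name are illustrative assumptions:

    import numpy as np

    def fog_effect_amount(output_spectra, high_band_start=2, thr_al=3.0):
        # Mean audio level of the higher frequency bands (e.g. bands 3' to 5').
        high_level = output_spectra[:, high_band_start:].mean()
        # The stronger the existing high frequency content, the less the fog
        # low-pass effect is applied, avoiding a dull, lifeless output; at or
        # above the threshold the effect is skipped entirely (returns 0.0).
        return float(np.clip(1.0 - high_level / thr_al, 0.0, 1.0))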
As mentioned above, in some situations, the spectral content of an audio asset may vary with time. Therefore, in some embodiments, the audio shader 4000 is operable to adjust the parameters of an audio effect applied to an audio asset in dependence upon the current spectral content of that audio asset as indicated by the sequence of successive audio spectra data items associated with that audio asset. For example, the effects controller 4020 could adjust the parameters of an audio effect as the spectral content of an audio asset or the output audio signal changes with time.
It will be appreciated that the effects controller 4020 could adjust the parameters of any audio effect and/or determine whether the audio effect should be applied by the audio shader 4000 in accordance with the current spectral content of the output audio signal in any suitable manner. Additionally, any suitable audio effect could be used as appropriate.
Embodiments of the present invention in which audio assets are generated using an audio tool will now be described with reference to Figure 9.
Figure 9 is a schematic diagram of an audio tool and an audio engine in accordance with embodiments of the present invention. In particular, Figure 9 shows an audio tool 5000 operable to generate audio assets, as indicated by arrow 5010, together with spectral representations of the audio assets, as indicated by arrow 5020. In the schematic diagram of Figure 9, the audio tool 5000 is shown on a tool side, which indicates that the audio assets and associated audio spectra data are generated by the audio tool 5000 before the game is executed at run-time. The audio engine 1000 is located in Figure 9 on a run-time side, which indicates that the audio engine implements the audio processing operations when the game is executed, i.e. at run-time. A solid line 5030 in Figure 9 separates the tool side from the run-time side.
In embodiments, the audio tool 5000 is implemented by the system unit 10 under software control, although it will be appreciated that the audio tool could be implemented in hardware, for example as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), and the like. It is to be understood that the term "audio tool" should be taken to mean any device, process, or method operable to implement the functionality of embodiments of the audio tool described herein. Additionally, it will be appreciated that the audio tool could be implemented on any suitable apparatus such as an audio design workstation.
To generate the audio spectra data, the audio tool is operable to carry out spectrum analysis on the audio assets using known techniques such as Fast Fourier Transform, Constant Q-Transform, Wavelet transform and the like, although any suitable method of spectrum analysis could be used.
In embodiments, typically fewer than ten frequency bands are used to represent the spectral content of each audio asset, as this reduces processing resources needed to update the audio spectra data at run-time (for example when the audio engine is implemented as part of a game during game play). In some embodiments, between 3 and 10 frequency bands may be used to represent the spectral content of each audio asset. However, any number of frequency bands could be used.
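For illustration, a banded FFT analysis of one audio frame might be sketched as below; the five band edges and the Hann window are illustrative choices, not values specified in the patent:

    import numpy as np

    def banded_spectrum(frame, sample_rate=44100,
                        band_edges=(0, 200, 800, 2500, 8000, 22050)):
        # Magnitude spectrum of one windowed frame of audio samples.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
        levels = []
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):
            in_band = (freqs >= lo) & (freqs < hi)
            # Collapse each band to a single audio level (here a simple sum).
            levels.append(spectrum[in_band].sum())
        return np.array(levels)  # five levels, one per frequency band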
Additionally, the time resolution can be relatively coarse, although any suitable time resolution could be used. In some embodiments, the number of frequency bands used to represent the spectral content of each audio asset may be set by a user, for example by using a suitable interface to the audio tool 5000. For example, a user may select a greater number of frequency bands to represent the spectral content of audio assets as appropriate depending on an effect to be applied by the audio shader 4000. In other words, a greater number of frequency bands used to represent the spectral content of each audio asset can be thought of as increasing the frequency resolution for representing the audio assets.
In some embodiments, the time resolution can be set by the audio tool 5000. In some embodiments, the time resolution may be set by the user, although in other embodiments, the time resolution may be preset within the audio tool 5000. As mentioned above, time resolution is taken to mean a time period between the generation of audio spectra data (i.e. the spectral representation of the audio asset) for an audio asset.
In some embodiments, the audio engine 1000 is operable to interpolate the audio spectra data so as to generate data for audio output. However, this can increase the processing resources needed to generate the audio output. Therefore, in some embodiments, the time resolution is a multiple of an update rate of the audio engine, and/or a multiple of an update rate of the game (e.g. a frame rate of the game). For example, as mentioned above, the audio spectra data 1020 can be generated each frame period corresponding to a frame rate of the game. In other words, by generating the audio spectra data 1020 at a time resolution which is a multiple of the update rate of the audio engine and/or the update rate of the game, the need for audio interpolation between the audio spectra data is reduced.
In order to try and reduce the size of the audio spectra data, in some embodiments, the audio tool 5000 is operable to generate audio spectra difference data representative of a spectral difference between successive audio spectra data items. To generate the audio spectra difference data, the audio tool 5000 is operable to calculate the spectral flux of the audio spectra data items using known techniques such as normalising a power spectrum of each audio spectra data item and calculating the Euclidean distance between two adjacent audio spectra data items in the sequence of audio spectra data items. However, it will be appreciated that any other suitable method for calculating the spectral difference between successive audio spectra data items may be used.
To further reduce the size of audio spectra data used when selecting an audio processing operation to be carried out, in some embodiments, the audio tool 5000 is operable to detect whether the audio spectra difference data has a value greater than an audio spectra difference threshold. In some embodiments, the detection of whether the audio spectra difference data is greater than the audio spectra difference threshold is carried out in respect of each frequency band of the audio spectra data.
In other embodiments, the detection of whether the audio spectra difference data is greater than the audio spectra difference threshold is carried out in respect of a mean average value calculated by the audio tool 5000 over the frequency bands of the audio spectra difference data. In some embodiments, the same audio spectra difference threshold may be used in respect of each frequency band, although in other embodiments, a different audio spectra difference threshold may be used for each frequency band as appropriate. The audio spectra difference threshold may be set by a user via the audio tool 5000 or may be preset by the audio tool 5000. However, it will be appreciated that the audio spectra difference threshold(s) may be set in any other suitable manner.
In some embodiments, the audio tool 5000 is operable to log the time stamps associated with the audio spectra data items associated with audio spectra difference data which is detected to be greater than the audio spectra difference threshold. In these embodiments, audio spectra data associated with audio spectra difference data which is detected to be less than the audio spectra difference threshold is not used to represent the spectral content of the respective audio asset. This advantageously reduces the overall data size used to represent the spectral content of the audio assets in the frequency domain.
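A minimal sketch of this pruning, assuming a sequence of (time stamp, spectra) pairs as before; the flux threshold value is an illustrative assumption:

    import numpy as np

    def significant_time_stamps(sequence, flux_threshold=0.1):
        # Always keep the first audio spectra data item in the sequence.
        kept = [sequence[0][0]]
        for (_, prev), (ts, cur) in zip(sequence, sequence[1:]):
            # Normalise each spectra item, then take the Euclidean distance
            # between neighbours as the spectral flux.
            p = prev / (np.linalg.norm(prev) or 1.0)
            c = cur / (np.linalg.norm(cur) or 1.0)
            if np.linalg.norm(c - p) > flux_threshold:
                kept.append(ts)  # log the time stamp of a significant change
        return kept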
It will be appreciated that sample points for an FFT are typically distributed linearly throughout the frequency range. However, human hearing is logarithmic, and therefore the use of an FFT leads to relatively good high frequency resolution but relatively poor low frequency resolution. For a typical sample rate of 44.1 kHz (although any other suitable sample rate, such as 48 kHz, could be used) and 1024 samples, the bin size is approximately 43 Hz. At low frequencies, the step from 43 Hz to 86 Hz will sound to a listener to correspond to an octave in pitch.
However, at higher frequencies, a difference of 43 Hz may merely sound to a listener to be a difference of a minor second (a semitone) in pitch. Therefore, in embodiments of the present invention, the audio tool 5000 is operable to map the frequency bands to a perceptual scale of pitches, such as the Mel scale, although any other suitable perceptual scale of pitches could be used. Accordingly, when the audio engine carries out an audio operation in dependence upon the spectral content of the output audio signal, the resolution of the spectral transform is less likely to be biased towards higher frequencies.
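The mapping to the Mel scale may be sketched as follows, using the standard Mel formula; the band count and frequency limits are illustrative assumptions:

    import numpy as np

    def mel_band_edges(n_bands=5, f_min=20.0, f_max=22050.0):
        mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)        # Hz -> Mel
        inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # Mel -> Hz
        # Band edges equally spaced in Mel are logarithmically spaced in Hz,
        # so low frequencies get proportionally finer resolution.
        edges_mel = np.linspace(mel(f_min), mel(f_max), n_bands + 1)
        return inv_mel(edges_mel)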
It will be appreciated that the spectral content of any new audio assets (unknown at a design stage) is unlikely to already have been generated by the audio tool 5000 when a new audio asset is to be used during a game (e.g. at run-time). This may occur, for example, where a user includes their own music tracks or sound effects in a game. Therefore, in embodiments, the audio engine 1000 is operable to carry out spectrum analysis of the new audio assets (for example, as introduced into the game by a user) so as to generate the audio spectra data when the audio assets are loaded at run-time. In other words, more generally, the audio engine 1000 is operable to generate the audio spectra data by carrying out spectrum analysis on the output audio data substantially in real-time, for example by carrying out real-time spectrum analysis on the audio asset data of the new audio assets. In some embodiments, the user can use an audio tool provided by a game publisher, such as the audio tool 5000, in order to generate the audio spectra data.
In embodiments, the audio spectra data is generated by the audio tool 5000 or audio engine 1000 as appropriate as metadata which is associated with the respective audio asset. In other embodiments, the audio spectra data may be appended to the respective audio asset.
However, any other suitable data format for associating the audio spectra data with the respective audio assets may be used.
A method of audio processing in accordance with embodiments of the present invention will now be described with reference to Figure 10.
As mentioned above, the system unit 10 (entertainment device) is operable to implement a game environment so that a user may play a game using the system unit. At a step s100, the audio engine 1000 generates an output audio signal from an input audio signal, such as the audio assets AA1 to AA7. As mentioned above, the output audio signal relates to a game environment of a game executed on the system unit 10. The output audio signal is associated with audio spectra data, such as an audio spectra matrix, indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal.
Then, at a step s105, the audio engine 1000 detects an in-game event in the game environment from a set of in-game events each having an associated audio event.
At a step s110, the audio engine 1000 selects an audio processing operation from a plurality of candidate audio processing operations to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event. In embodiments, the candidate audio processing operations comprise any or all of: a selection of an audio asset as described above; a mixing operation as described above; and an audio effect operation as described above. The audio processing operation is selected in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content.
At a step s115, the audio engine causes the output audio signal to be output to the display and sound output device 300 as described above.
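Taken together, the steps s100 to s115 amount to a per-frame loop of the following shape; every identifier in this sketch is an illustrative stand-in for the components described above, not an API defined by the patent:

    def audio_engine_frame(engine, input_assets):
        output = engine.generate_output(input_assets)            # step s100
        event = engine.detect_in_game_event()                    # step s105
        if event is not None:
            spectra = engine.spectral_data(output)               # current content
            operation = engine.select_operation(event, spectra)  # step s110
            output = operation.apply(input_assets, spectra)
        engine.output(output)                                    # step s115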
In some embodiments, the audio engine 1000 is operable to generate one or more audio assets by real-time audio synthesis to be used during a game. In these embodiments, the audio engine 1000 is operable to generate audio spectra data representative of the spectral content of a synthesised audio asset in the frequency domain. In some embodiments, the audio engine 1000 is operable to generate the audio spectra data associated with the synthesised audio asset substantially at the same time as generating (synthesising) the audio asset.
Therefore, in embodiments, an audio asset generated by the audio engine 1000 by real-time audio synthesis can be used in a similar way to the audio assets described above which are generated by the audio tool 5000 or included in a game by a user.
Although the selection of an audio asset, the mixing of audio assets, and the application of audio effects have been described as separate embodiments, it will be appreciated that these embodiments may be combined in any suitable manner as appropriate. For example, an audio asset could be selected, mixed with other audio assets, and then an audio effect may be applied to the resultant mix. Furthermore, it will be appreciated that the decision logic 1010 may implement the functionality of any of the asset selector controller 2020, the mixer controller 3020, and the effects controller 4020. Additionally, the audio engine 1000 may implement the functionality of any of the spectral data generator 2010, the spectral data generator 3010, and the spectral data generator 4010. Furthermore, the audio engine 1000 could implement the functionality of any of the asset selector 2000, the mixer 3000, and the audio shader 4000. Additionally, the audio engine 1000 could implement any of the functionality of the audio tool 5000 as necessary. Furthermore, it will be appreciated that each of the embodiments described herein may be implemented separately or in combination with one or more of the other embodiments.
In embodiments, the system unit 10 is operable to implement any of the functionality of the embodiments described above, such as the audio engine 1000 and/or the audio tool 5000. However, it will be appreciated that the above described embodiments could be implemented using any other suitable processing device or apparatus.
It will be appreciated that in embodiments of the present invention, elements of the entertainment method may be implemented in the entertainment device in any suitable manner. Thus the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a computer program product comprising processor implementable instructions stored on a data carrier such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or transmitted via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable or bespoke circuit suitable for use in adapting the conventional equivalent device.
In conclusion, although a variety of embodiments have been described herein, these are provided by way of example only, and many variations and modifications on such embodiments will be apparent to the skilled person and fall within the scope of the present invention, which is defined by the appended claims and their equivalents.

Claims (14)

  1. A method of audio processing using an entertainment device operable to implement a game environment, the method comprising the steps of: generating an output audio signal relating to the game environment from an input audio signal, the output audio signal being associated with audio spectra data indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal; detecting an in-game event in the game environment from a set of in-game events each having an associated audio event; selecting an audio processing operation from a plurality of candidate audio processing operations to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event, the audio processing operation being selected in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content; and outputting the output audio signal.
  2. A method according to claim 1, in which: the generating step comprises the step of including one or more of a plurality of audio assets as at least part of the input audio signal; the candidate audio processing operations comprise an audio asset selection operation for selecting an audio asset to be included in the input audio signal, the spectral information of the audio asset selection operation comprising asset spectra data indicative of the spectral content of respective audio assets in a frequency domain representation of those audio assets; and the selecting step comprises selecting the audio asset selection operation in response to the detection of the in-game event, the audio asset selection operation comprising selecting a first audio asset from the plurality of audio assets to be included in the input audio signal, the first audio asset being selected in dependence upon a comparison between the audio spectra data and the respective asset spectra data associated with each audio asset stored in the storage element so as to implement the audio event associated with the detected in-game event.
  3. A method according to claim 2, in which the audio asset selection operation comprises: generating a first set of data elements from one or more data elements of the audio spectra data, the data elements in the first set each being associated with respective audio levels which are less than an audio level threshold; comparing the audio levels of the data elements in the first set with audio levels associated with corresponding data elements of the asset spectra data; and selecting an audio asset whose audio levels are greater than the audio level threshold at data elements corresponding to the data elements in the first set as the first audio asset so that the spectral content of the output audio after the first audio asset is included in the input audio signal approximates to the required spectral content.
  4. A method according to any one of the preceding claims, in which: the input audio signal comprises a plurality of the audio assets; the candidate audio processing operations comprise an audio mixing operation to be applied to the audio assets in the input audio signal, the audio mixing operation being associated with a respective mix profile indicative of the relative proportions of the audio assets in the input audio signal, and the mix profile being associated with respective spectral mixing information representing the effect on the output audio signal of the audio mixing operation; and the selecting step comprises: selecting the audio mixing operation from the candidate audio processing operations in response to the detection of the in-game event; and adjusting the relative proportions of the audio assets in the input audio signal in accordance with the current spectral content of the output audio signal so that the spectral content of the output audio signal substantially corresponds to the spectral content associated with the mix profile as indicated by the spectral mixing information so as to implement the audio event associated with the detected in-game event.
  5. A method according to any one of the preceding claims, in which: the candidate audio processing operations comprise one or more audio effect processing operations to be applied to the input audio signal, each audio effect processing operation being associated with respective spectral effect information representing the effect of the audio effect processing operation on the input audio signal; the selecting step comprises selecting, in response to the detection of the in-game event, an audio effect processing operation from the candidate audio processing operations to be applied to the input audio signal in accordance with the spectral effect information and the current spectral content of the output audio signal as indicated by the audio spectra data.
  6. A method according to claim 5, in which the selecting step comprises: adjusting a degree by which the selected audio effect processing operation is applied to the input audio signal in accordance with the current spectral content of the output audio signal data so that the spectral content of the output audio signal after application of the selected audio effect processing operation approximates to the required spectral content.
  7. A method according to any one of the preceding claims, comprising updating the audio spectra data substantially in real-time in accordance with the current spectral content of the output audio signal.
  8. A method according to any one of claims 1 to 7, comprising generating the audio spectra data by carrying out spectrum analysis on the output audio data substantially in real-time.
  9. A method according to any one of the preceding claims, in which: the audio spectra data and the spectral information respectively comprise audio spectra matrices each having matrix elements that associate panning data with frequency band data, each matrix element having an associated audio level; the panning data is indicative of apparent relative position of components of the input audio signal with respect to a multi-channel sound output device; and the frequency band data is associated with a plurality of audio spectra frequency bands in the frequency domain representation of the output audio signal.
  10. A method according to claim 9, in which the audio spectra frequency bands are associated with respective pitches in a perceptual audio scale.
  11. A computer program for implementing the method of any one of the preceding claims.
  12. An audio processing apparatus comprising an entertainment device operable to implement a game environment, the apparatus comprising: an output audio signal generator operable to generate an output audio signal from an input audio signal, the output audio signal being associated with audio spectra data indicative of the spectral content of the output audio signal in a frequency domain representation of the output audio signal; an in-game event detector operable to detect an in-game event in the game environment from a set of in-game events each having an associated audio event; an audio processing operation selector operable to select from a plurality of candidate audio processing operations an audio processing operation to be applied to the input audio signal in order to implement the audio event associated with the detected in-game event, the selector being operable to select the audio processing operation in dependence upon a current spectral content of the output audio signal as indicated by the audio spectra data and spectral information representing the effect on the output audio signal of the candidate audio processing operations so that the spectral content of the output audio signal after implementation of the audio event approximates to a required spectral content; and an output element operable to output the output audio signal to an audio signal reproduction device.
  13. An audio processing method substantially as described herein with reference to the accompanying drawings.
  14. An audio processing apparatus substantially as described herein with reference to the accompanying drawings.
GB0918584.4A 2009-10-22 2009-10-22 Audio processing method and apparatus Active GB2474680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0918584.4A GB2474680B (en) 2009-10-22 2009-10-22 Audio processing method and apparatus


Publications (3)

Publication Number Publication Date
GB0918584D0 GB0918584D0 (en) 2009-12-09
GB2474680A true GB2474680A (en) 2011-04-27
GB2474680B GB2474680B (en) 2012-01-18

Family

ID=41426582

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0918584.4A Active GB2474680B (en) 2009-10-22 2009-10-22 Audio processing method and apparatus

Country Status (1)

Country Link
GB (1) GB2474680B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112657179A (en) * 2020-12-31 2021-04-16 上海艾为电子技术股份有限公司 Motor control method, control system and control chip

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030070538A1 (en) * 2001-10-11 2003-04-17 Keiichi Sugiyama Audio signal outputting method, audio signal reproduction method, and computer program product
WO2007023331A1 (en) * 2005-08-25 2007-03-01 Nokia Corporation Method and device for embedding event notification into multimedia content
US20070243915A1 (en) * 2006-04-14 2007-10-18 Eran Egozy A Method and Apparatus For Providing A Simulated Band Experience Including Online Interaction and Downloaded Content
US20080060502A1 (en) * 2006-09-07 2008-03-13 Yamaha Corporation Audio reproduction apparatus and method and storage medium


Also Published As

Publication number Publication date
GB2474680B (en) 2012-01-18
GB0918584D0 (en) 2009-12-09

Similar Documents

Publication Publication Date Title
US8932134B2 (en) System and method of audio processing
EP1565035B1 (en) Dynamic sound source and listener position based audio rendering
US7113610B1 (en) Virtual sound source positioning
JP4921550B2 (en) How to give emotional features to computer-generated avatars during gameplay
EP2306399B1 (en) Image processing method, apparatus and system
EP2427869B1 (en) Entertainment device, system, and method
US8626321B2 (en) Processing audio input signals
US8260875B2 (en) Entertainment device, entertainment system and method for reproducing media items
US20100045869A1 (en) Entertainment Device, System, and Method
EP2156869A1 (en) Entertainment device and method of interaction
US11887616B2 (en) Audio processing
US10609502B2 (en) Methods and systems for simulating microphone capture within a capture zone of a real-world scene
EP2468371A1 (en) Audio data generation method and apparatus
US12035123B2 (en) Impulse response generation system and method
WO2021158273A1 (en) Augmented reality virtual audio source enhancement
US10701508B2 (en) Information processing apparatus, information processing method, and program
US11503226B2 (en) Multi-camera device
JP2012512598A (en) Compensate for blooming of shapes in the image
GB2474680A (en) An audio processing method and apparatus
GB2473263A (en) Augmented reality virtual image degraded based on quality of camera image
US7053906B2 (en) Texture mapping method, recording medium, program, and program executing apparatus
WO2022244261A1 (en) Sound source reproduction device, sound source reproduction method, and program
US11792593B2 (en) Audio processing
WO2010116171A1 (en) Transmission of video images modified based on stereoscopic video image acquisition