GB2569576A

GB2569576A - Audio generation system

Info

Publication number: GB2569576A
Application number: GB1721428.9A
Authority: GB
Inventors: Breuglemans Mark
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2019-06-26
Also published as: GB201721428D0

Abstract

An audio generation method comprising locating a virtual sound source in a virtual environment, detecting the real-world location and orientation of a peripheral 1200 and mapping it to a virtual position and orientation in the virtual environment. An indicated virtual sound source 1410 is then identified based on the peripheral, and audio corresponding to the sound source is output. Optionally, the sound is output through an HMD. Optionally, there is a virtual volume projecting from the peripheral, and the indicated object falls in the virtual volume. Optionally there may be restrictions on sound sources a user can listen to. Alternatively, a method comprising locating virtual sound receivers in a virtual environment and detecting the real-world location and orientation of a peripheral and mapping it to a virtual position and orientation in the virtual environment. A recipient is identified based on the mapped peripheral, and audio is transmitted to the recipient.

Description

AUDIO GENERATION SYSTEM

BACKGROUND

Field of the Disclosure

This disclosure relates to audio generation systems and methods.

Description of the Prior Art

In recent years there has been an increase in the amount of content that is made available to users for viewing using a head-mountable display (HMD) system. Such systems provide virtual content for viewing by a user, either overlaid upon a view of the real-world environment (in a see-through HMD arrangement) or in an entirely virtual environment (a full-immersion HMD arrangement).

In these arrangements, it is anticipated that sounds may be generated that correspond to the virtual objects and that these sounds are provided to a user via headphones (or another suitable audio output device). In some arrangements, sounds from the real-world environment may be blocked (either intentionally or incidentally) by the use of headphones. In such cases, these real-world sounds may be captured by a microphone arrangement and also provided to the user via headphones when appropriate.

With such an array of sound sources being available to a user, it is considered that the user may experience situations in which useful or otherwise interesting sounds are obscured by less interesting sounds. This may be particularly true in virtual environments in which a large number of users are able to interact, for example. In order to provide a good experience for a user, it is likely that they would need to hear the interesting or useful content in preference to other content; therefore a method for filtering this unwanted sound content out is desirable.

The present disclosure is provided in view of this desire to prevent useful or interesting sounds from being obscured by unwanted sounds.

Various aspects and features of the present disclosure are defined in the appended claims and within the text of the accompanying description and include at least an audio generation apparatus and a method of operating an audio generating apparatus as well as a computer program.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

Figure 1 schematically illustrates an HMD worn by a user;

Figure 2 is a schematic plan view of an HMD;

Figure 3 schematically illustrates the formation of a virtual image by an HMD;

Figure 4 schematically illustrates another type of display for use in an HMD;

Figure 5 schematically illustrates a pair of stereoscopic images;

Figures 6 and 7 schematically illustrate a user wearing an HMD connected to a Sony® PlayStation 3® games console;

Figure 8 schematically illustrates a change of view of a user of an HMD;

Figures 9a and 9b schematically illustrate HMDs with motion sensing;

Figure 10 schematically illustrates a position sensor based on optical flow detection;

Figure 11 schematically illustrates image processing carried out in response to a detected position or change in position of an HMD;

Figure 12 schematically illustrates a controller;

Figure 13 schematically illustrates a controller and an associated cone;

Figure 14 schematically illustrates a scene comprising a plurality of characters;

Figure 15 schematically illustrates a virtual scene;

Figure 16 schematically illustrates modifications of a cone associated with a controller;

Figure 17 schematically illustrates characters with corresponding importance ratings;

Figure 18 schematically illustrates a selection of character statuses;

Figure 19 schematically illustrates a virtual environment with an obstacle;

Figure 20 schematically illustrates an audio generation system;

Figure 21 schematically illustrates a virtual scene analysis unit;

Figure 22 schematically illustrates a peripheral identification unit;

Figure 23 schematically illustrates a sound identification unit;

Figure 24 schematically illustrates an audio generation method;

Figure 25 schematically illustrates a scene analysis method;

Figure 26 schematically illustrates a peripheral identification method;

Figure 27 schematically illustrates a sound source identification method;

Figure 28 schematically illustrates a system; and

Figure 29 schematically illustrates a method.

DESCRIPTION OF THE EMBODIMENTS

Referring now to Figure 1, a user 10 is wearing an HMD 20 (as an example of a generic head-mountable apparatus or virtual reality apparatus). The HMD comprises a frame 40, in this example formed of a rear strap and a top strap, and a display portion 50.

An HMD is presented here as an example of a system that is used to provide images and audio to a user; however, the present disclosure may be used with any suitable display system. For example, a television that is operable to present a view of a virtual environment to a user may be used instead of an HMD.

Note that the HMD of Figure 1 may comprise further features, to be described below in connection with other drawings, but which are not shown in Figure 1 for clarity of this initial explanation.

The HMD of Figure 1 completely (or at least substantially completely) obscures the user's view of the surrounding environment. All that the user can see is the pair of images displayed within the HMD.

The HMD has associated headphone audio transducers or earpieces 60 which fit into the user's left and right ears 70. The earpieces 60 replay an audio signal provided from an external source, which may be the same as the video signal source which provides the video signal for display to the user's eyes. A boom microphone 75 is mounted on the HMD so as to extend towards the user’s mouth.

The combination of the fact that the user can see only what is displayed by the HMD and, subject to the limitations of the noise blocking or active cancellation properties of the earpieces and associated electronics, can hear only what is provided via the earpieces, means that this HMD may be considered as a so-called “full immersion” HMD. Note however that in some embodiments the HMD is not a full immersion HMD, and may provide at least some facility for the user to see and/or hear the user’s surroundings. This could be by providing some degree of transparency or partial transparency in the display arrangements, and/or by projecting a view of the outside (captured using a camera, for example a camera mounted on the HMD) via the HMD’s displays, and/or by allowing the transmission of ambient sound past the earpieces and/or by providing a microphone to generate an input sound signal (for transmission to the earpieces) dependent upon the ambient sound. In some embodiments, a directional microphone may be provided on the HMD so as to allow sounds in the real-world environment to be selectively captured and passed to the user. Alternatively, or in addition, an array of microphones may be provided (or any other suitable arrangement) so as to record sound in the environment and identify a source of the sound.

A front-facing camera 122 may capture images to the front of the HMD, in use. A Bluetooth® antenna 124 may provide communication facilities or may simply be arranged as a directional antenna to allow a detection of the direction of a nearby Bluetooth transmitter.

In operation, a video signal is provided for display by the HMD. This could be provided by an external video signal source 80 such as a video games machine or data processing apparatus (such as a personal computer), in which case the signals could be transmitted to the HMD by a wired or a wireless connection 82. Examples of suitable wireless connections include Bluetooth® connections. Audio signals for the earpieces 60 can be carried by the same connection. Similarly, any control signals passed from the HMD to the video (audio) signal source may be carried by the same connection. Furthermore, a power supply 83 (including one or more batteries and/or being connectable to a mains power outlet) may be linked by a cable to the HMD. Note that the power supply 83 and the video signal source 80 may be separate units or may be embodied as the same physical unit. There may be separate cables for power and video (and indeed for audio) signal supply, or these may be combined for carriage on a single cable (for example, using separate conductors, as in a USB cable, or in a similar way to a “power over Ethernet” arrangement in which data is carried as a balanced signal and power as direct current, over the same collection of physical wires). The video and/or audio signal may be carried by, for example, an optical fibre cable. In other embodiments, at least part of the functionality associated with generating image and/or audio signals for presentation to the user may be carried out by circuitry and/or processing forming part of the HMD itself. A power supply may be provided as part of the HMD itself.

Some embodiments of the disclosure are applicable to an HMD having at least one electrical and/or optical cable linking the HMD to another device, such as a power supply and/or a video (and/or audio) signal source. So, embodiments of the disclosure can include, for example:

(a) an HMD having its own power supply (as part of the HMD arrangement) but a cabled connection to a video and/or audio signal source;

(b) an HMD having a cabled connection to a power supply and to a video and/or audio signal source, embodied as a single physical cable or more than one physical cable;

(c) an HMD having its own video and/or audio signal source (as part of the HMD arrangement) and a cabled connection to a power supply; or

(d) an HMD having a wireless connection to a video and/or audio signal source and a cabled connection to a power supply.

If one or more cables are used, the physical position at which the cable 82 and/or 84 enters or joins the HMD is not particularly important from a technical point of view. Aesthetically, and to avoid the cable(s) brushing the user’s face in operation, it would normally be the case that the cable(s) would enter or join the HMD at the side or back of the HMD (relative to the orientation of the user’s head when worn in normal operation). Accordingly, the position of the cables 82, 84 relative to the HMD in Figure 1 should be treated merely as a schematic representation.

Accordingly, the arrangement of Figure 1 provides an example of a head-mountable display system comprising a frame to be mounted onto an observer’s head, the frame defining one or two eye display positions which, in use, are positioned in front of a respective eye of the observer and a display element mounted with respect to each of the eye display positions, the display element providing a virtual image of a video display of a video signal from a video signal source to that eye of the observer.

Figure 1 shows just one example of an HMD. Other formats are possible: for example an HMD could use a frame more similar to that associated with conventional eyeglasses, namely a substantially horizontal leg extending back from the display portion to the top rear of the user's ear, possibly curling down behind the ear. In other (not full immersion) examples, the user's view of the external environment may not in fact be entirely obscured; the displayed images could be arranged so as to be superposed (from the user's point of view) over the external environment. An example of such an arrangement will be described below with reference to Figure 4.

In the example of Figure 1, a separate respective display is provided for each of the user's eyes. A schematic plan view of how this is achieved is provided as Figure 2, which illustrates the positions 100 of the user's eyes and the relative position 110 of the user's nose. The display portion 50, in schematic form, comprises an exterior shield 120 to mask ambient light from the user's eyes and an internal shield 130 which prevents one eye from seeing the display intended for the other eye. The combination of the user's face, the exterior shield 120 and the interior shield 130 form two compartments 140, one for each eye. In each of the compartments there is provided a display element 150 and one or more optical elements 160. The way in which the display element and the optical element(s) cooperate to provide a display to the user will be described with reference to Figure 3.

Referring to Figure 3, the display element 150 generates a displayed image which is (in this example) refracted by the optical elements 160 (shown schematically as a convex lens but which could include compound lenses or other elements) so as to generate a virtual image 170 which appears to the user to be larger than and significantly further away than the real image generated by the display element 150. As an example, the virtual image may have an apparent image size (image diagonal) of more than 1 m and may be disposed at a distance of more than 1 m from the user's eye (or from the frame of the HMD). In general terms, depending on the purpose of the HMD, it is desirable to have the virtual image disposed a significant distance from the user. For example, if the HMD is for viewing movies or the like, it is desirable that the user's eyes are relaxed during such viewing, which requires a distance (to the virtual image) of at least several metres. In Figure 3, solid lines (such as the line 180) are used to denote real optical rays, whereas broken lines (such as the line 190) are used to denote virtual rays.
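By way of a non-limiting illustration, the relationship between the display element 150, the optical elements 160 and the virtual image 170 can be approximated with the thin-lens equation. The following Python sketch assumes a single ideal thin lens; the function name and the example focal length and display distance are hypothetical values chosen for illustration and are not taken from the disclosure.

```python
def virtual_image(display_distance_m, focal_length_m):
    """Thin-lens sketch (assumption: one ideal thin convex lens).

    Returns (image_distance_m, magnification). A negative image distance
    denotes a virtual image on the same side of the lens as the display.
    """
    if not 0 < display_distance_m < focal_length_m:
        # A magnified virtual image needs the display inside the focal length.
        raise ValueError("display must lie between the lens and its focal point")
    # Thin-lens equation: 1/f = 1/d_o + 1/d_i  =>  d_i = 1 / (1/f - 1/d_o)
    image_distance = 1.0 / (1.0 / focal_length_m - 1.0 / display_distance_m)
    magnification = -image_distance / display_distance_m
    return image_distance, magnification


# Hypothetical values: 50 mm focal length, display 48 mm from the lens.
distance_m, scale = virtual_image(0.048, 0.05)
```

With these assumed values the sketch places the virtual image about 1.2 m from the lens with a magnification of about 25, which is in keeping with the example above of a virtual image disposed more than 1 m from the user's eye.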

An alternative arrangement is shown in Figure 4. This arrangement may be used where it is desired that the user's view of the external environment is not entirely obscured. However, it is also applicable to HMDs in which the user's external view is wholly obscured. In the arrangement of Figure 4, the display element 150 and optical elements 200 cooperate to provide an image which is projected onto a mirror 210, which deflects the image towards the user's eye position 220. The user perceives a virtual image to be located at a position 230 which is in front of the user and at a suitable distance from the user.

In the case of an HMD in which the user's view of the external surroundings is entirely obscured, the mirror 210 can be a substantially 100% reflective mirror. The arrangement of Figure 4 then has the advantage that the display element and optical elements can be located closer to the centre of gravity of the user's head and to the side of the user's eyes, which can produce a less bulky HMD for the user to wear. Alternatively, if the HMD is designed not to completely obscure the user's view of the external environment, the mirror 210 can be made partially reflective so that the user sees the external environment, through the mirror 210, with the virtual image superposed over the real external environment.

In the case where separate respective displays are provided for each of the user's eyes, it is possible to display stereoscopic images. An example of a pair of stereoscopic images for display to the left and right eyes is shown in Figure 5. The images exhibit a lateral displacement relative to one another, with the displacement of image features depending upon the (real or simulated) lateral separation of the cameras by which the images were captured, the angular convergence of the cameras and the (real or simulated) distance of each image feature from the camera position.
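The dependence of the lateral displacement on camera separation and feature distance can be illustrated with a simplified model. The following Python sketch assumes idealised pinhole cameras with parallel optical axes (that is, zero angular convergence); the function name, the focal length expressed in pixels and the example values are hypothetical and are given only for illustration.

```python
def disparity_px(camera_separation_m, feature_distance_m, focal_length_px):
    """Lateral displacement of a feature between the left and right images.

    Assumption: parallel pinhole cameras, so disparity = f * B / Z.
    Converged cameras, as mentioned in the text, would need an extra term.
    """
    if feature_distance_m <= 0:
        raise ValueError("feature must be in front of the cameras")
    return focal_length_px * camera_separation_m / feature_distance_m


# Hypothetical values: 65 mm separation, 1000-pixel focal length.
near = disparity_px(0.065, 1.0, 1000.0)   # feature 1 m away
far = disparity_px(0.065, 10.0, 1000.0)   # feature 10 m away
```

Under these assumptions a nearer feature exhibits a larger displacement than a more distant one, which is the depth cue exploited by the stereoscopic pair.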

Note that the lateral displacements in Figure 5 could in fact be the other way round, which is to say that the left eye image as drawn could in fact be the right eye image, and the right eye image as drawn could in fact be the left eye image. This is because some stereoscopic displays tend to shift objects to the right in the right eye image and to the left in the left eye image, so as to simulate the idea that the user is looking through a stereoscopic window onto the scene beyond. However, some HMDs use the arrangement shown in Figure 5 because this gives the impression to the user that the user is viewing the scene through a pair of binoculars. The choice between these two arrangements is at the discretion of the system designer.

In some situations, an HMD may be used simply to view movies and the like. In this case, there is no change required to the apparent viewpoint of the displayed images as the user turns the user's head, for example from side to side. In other uses, however, such as those associated with virtual reality (VR) or augmented reality (AR) systems, the user's viewpoint needs to track movements with respect to a real or virtual space in which the user is located.

Figure 6 schematically illustrates an example virtual reality system and in particular shows a user wearing an HMD connected to a Sony® PlayStation 3® games console 300 as an example of a base device. The games console 300 is connected to a mains power supply 310 and (optionally) to a main display screen (not shown). A cable, acting as the cables 82, 84 discussed above (and so acting as both power supply and signal cables), links the HMD 20 to the games console 300 and is, for example, plugged into a USB socket 320 on the console 300. Note that in the present embodiments, a single physical cable is provided which fulfils the functions of the cables 82, 84. In Figure 6, the user is also shown holding a pair of hand-held controllers 330 which may be, for example, Sony® Move® controllers which communicate wirelessly with the games console 300 to control (or to contribute to the control of) game operations relating to a currently executed game program.

The video displays in the HMD 20 are arranged to display images generated by the games console 300, and the earpieces 60 in the HMD 20 are arranged to reproduce audio signals generated by the games console 300. Note that if a USB type cable is used, these signals will be in digital form when they reach the HMD 20, such that the HMD 20 comprises a digital to analogue converter (DAC) to convert at least the audio signals back into an analogue form for reproduction.

Images from the camera 122 mounted on the HMD 20 are passed back to the games console 300 via the cable 82, 84. Similarly, if motion or other sensors are provided at the HMD 20, signals from those sensors may be at least partially processed at the HMD 20 and/or may be at least partially processed at the games console 300. The use and processing of such signals will be described further below.

The USB connection from the games console 300 also provides power to the HMD 20, according to the USB standard.

Figure 6 also shows a separate display 305 such as a television or other openly viewable display (by which it is meant that viewers other than the HMD wearer may see images displayed by the display 305) and a camera 315, which may be (for example) directed towards the user (such as the HMD wearer) during operation of the apparatus. An example of a suitable camera is the PlayStation Eye camera, although more generally a generic “webcam”, connected to the console 300 by a wired (such as a USB) or wireless (such as WiFi or Bluetooth) connection, may be used.

The display 305 may be arranged (under the control of the games console) to provide the function of a so-called “social screen”. It is noted that playing a computer game using an HMD can be very engaging for the wearer of the HMD but less so for other people in the vicinity (particularly if they are not themselves also wearing HMDs). To provide an improved experience for a group of users, where the number of HMDs in operation is fewer than the number of users, images can be displayed on a social screen. The images displayed on the social screen may be substantially similar to those displayed to the user wearing the HMD, so that viewers of the social screen see the virtual environment (or a subset, version or representation of it) as seen by the HMD wearer. In other examples, the social screen could display other material such as information relating to the HMD wearer’s current progress through the ongoing computer game. For example, the HMD wearer could see the game environment from a first person viewpoint whereas the social screen could provide a third person view of activities and movement of the HMD wearer’s avatar, or an overview of a larger portion of the virtual environment. In these examples, an image generator (for example, a part of the functionality of the games console) is configured to generate some of the virtual environment images for display by a display separate to the head mountable display.

Figure 7 schematically illustrates a similar arrangement (another example of a virtual reality system) in which the games console is connected (by a wired or wireless link) to a so-called “break-out box” acting as a base or intermediate device 350, to which the HMD 20 is connected by a cabled link 82, 84. The break-out box has various functions in this regard. One function is to provide a location, near to the user, for some user controls relating to the operation of the HMD, such as (for example) one or more of a power control, a brightness control, an input source selector, a volume control and the like. Another function is to provide a local power supply for the HMD (if one is needed according to the embodiment being discussed). Another function is to provide a local cable anchoring point. In this last function, it is not envisaged that the break-out box 350 is fixed to the ground or to a piece of furniture; rather, instead of there being a very long trailing cable from the games console 300, the break-out box provides a locally weighted point so that the cable 82, 84 linking the HMD 20 to the break-out box will tend to move around the position of the break-out box. This can improve user safety and comfort by avoiding the use of very long trailing cables.

It will be appreciated that the localisation of processing in the various techniques described in this application can be varied without changing the overall effect, given that an HMD may form part of a set or cohort of interconnected devices (that is to say, interconnected for the purposes of data or signal transfer, but not necessarily connected by a physical cable). So, processing which is described as taking place “at” one device, such as at the HMD, could be devolved to another device such as the games console (base device) or the break-out box. Processing tasks can be shared amongst devices. Source signals, on which the processing is to take place, could be distributed to another device, or the processing results from the processing of those source signals could be sent to another device, as required. So any references to processing taking place at a particular device should be understood in this context. Similarly, where an interaction between two devices is basically symmetrical, for example where a camera or sensor on one device detects a signal or feature of the other device, it will be understood that unless the context prohibits this, the two devices could be interchanged without any loss of functionality.

As mentioned above, in some uses of the HMD, such as those associated with virtual reality (VR) or augmented reality (AR) systems, the user's viewpoint needs to track movements with respect to a real or virtual space in which the user is located.

This tracking is carried out by detecting motion of the HMD and varying the apparent viewpoint of the displayed images so that the apparent viewpoint tracks the motion.

Figure 8 schematically illustrates the effect of a user head movement in a VR or AR system.

Referring to Figure 8, a virtual environment is represented by a (virtual) spherical shell 250 around a user. This provides an example of a virtual display screen (VDS). Because of the need to represent this arrangement on a two-dimensional paper drawing, the shell is represented by a part of a circle, at a distance from the user equivalent to the separation of the displayed virtual image from the user. A user is initially at a first position 260 and is directed towards a portion 270 of the virtual environment. It is this portion 270 which is represented in the images displayed on the display elements 150 of the user's HMD. It can be seen from the drawing that the VDS subsists in three-dimensional space (in a virtual sense) around the position in space of the HMD wearer, such that the HMD wearer sees a current portion of the VDS according to the HMD orientation.

Consider the situation in which the user then moves his head to a new position and/or orientation 280. In order to maintain the correct sense of the virtual reality or augmented reality display, the displayed portion of the virtual environment also moves so that, at the end of the movement, a new portion 290 is displayed by the HMD.

So, in this arrangement, the apparent viewpoint within the virtual environment moves with the head movement. If the head rotates to the right side, for example, as shown in Figure 8, the apparent viewpoint also moves to the right from the user's point of view. If the situation is considered from the aspect of a displayed object, such as a displayed object 300, this will effectively move in the opposite direction to the head movement. So, if the head movement is to the right, the apparent viewpoint moves to the right but an object such as the displayed object 300 which is stationary in the virtual environment will move towards the left of the displayed image and eventually will disappear off the left-hand side of the displayed image, for the simple reason that the displayed portion of the virtual environment has moved to the right whereas the displayed object 300 has not moved in the virtual environment.
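The opposite apparent motion of a stationary displayed object can be illustrated numerically. The following Python sketch assumes a simple perspective projection with a hypothetical 90 degree horizontal field of view, and considers rotation about the vertical axis only; the function name, angle conventions and field of view are illustrative assumptions rather than features of the disclosure.

```python
import math


def screen_x(object_azimuth_deg, head_yaw_deg, horizontal_fov_deg=90.0):
    """Horizontal position of a stationary object in the displayed image.

    Assumptions: angles increase to the user's right; the result is
    normalised so that -1 is the left edge and +1 is the right edge.
    Returns None when the object is not in front of the user at all.
    """
    # Angle of the object relative to the viewing direction, in [-180, 180).
    relative = (object_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(relative) >= 90.0:
        return None
    half_fov = math.radians(horizontal_fov_deg / 2.0)
    return math.tan(math.radians(relative)) / math.tan(half_fov)


centred = screen_x(0.0, 0.0)      # object straight ahead
shifted = screen_x(0.0, 20.0)     # head turned 20 degrees to the right
off_image = screen_x(0.0, 60.0)   # head turned 60 degrees to the right
```

In this sketch, turning the head to the right gives a negative value, meaning that the object moves towards the left of the displayed image, and a value below -1 indicates that the object has passed off the left-hand side of the image.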

Figures 9a and 9b schematically illustrate HMDs with motion sensing. The two drawings are in a similar format to that shown in Figure 2. That is to say, the drawings are schematic plan views of an HMD, in which the display element 150 and optical elements 160 are represented by a simple box shape. Many features of Figure 2 are not shown, for clarity of the diagrams. Both drawings show examples of HMDs with a motion detector for detecting motion of the observer’s head.

In Figure 9a, a forward-facing camera 322 is provided on the front of the HMD. This may be the same camera as the camera 122 discussed above, or may be an additional camera. This does not necessarily provide images for display to the user (although it could do so in an augmented reality arrangement). Instead, its primary purpose in the present embodiments is to allow motion sensing. A technique for using images captured by the camera 322 for motion sensing will be described below in connection with Figure 10. In these arrangements, the motion detector comprises a camera mounted so as to move with the frame; and an image comparator operable to compare successive images captured by the camera so as to detect inter-image motion.

Figure 9b makes use of a hardware motion detector 332. This can be mounted anywhere within or on the HMD. Examples of suitable hardware motion detectors are piezoelectric accelerometers or optical fibre gyroscopes. It will of course be appreciated that both hardware motion detection and camera-based motion detection can be used in the same device, in which case one sensing arrangement could be used as a backup when the other one is unavailable, or one sensing arrangement (such as the camera) could provide data for changing the apparent viewpoint of the displayed images, whereas the other (such as an accelerometer) could provide data for image stabilisation.

Figure 10 schematically illustrates one example of motion detection using the camera 322 of Figure 9a.

The camera 322 is a video camera, capturing images at an image capture rate of, for example, 25 images per second. As each image is captured, it is passed to an image store 400 for storage and is also compared, by an image comparator 410, with a preceding image retrieved from the image store. The comparison uses known block matching techniques (so-called “optical flow” detection) to establish whether substantially the whole image has moved since the time at which the preceding image was captured. Localised motion might indicate moving objects within the field of view of the camera 322, but global motion of substantially the whole image would tend to indicate motion of the camera rather than of individual features in the captured scene, and in the present case because the camera is mounted on the HMD, motion of the camera corresponds to motion of the HMD and in turn to motion of the user’s head.
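One possible form of such an inter-image comparison is sketched below in Python. This is a deliberately simplified, whole-image matching search under stated assumptions: images are small greyscale arrays held as lists of rows, only integer shifts up to a hypothetical `max_shift` are tried, and the function name is illustrative. It estimates a single global displacement and does not separate localised motion from global motion as a practical image comparator 410 might.

```python
def estimate_global_shift(prev, curr, max_shift=2):
    """Estimate the whole-image displacement between two frames.

    Returns (dx, dy) such that curr[y + dy][x + dx] best matches
    prev[y][x], using the mean absolute difference over the overlap.
    Assumption: max_shift is smaller than the image width and height.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Only compare pixels that exist in both frames for this shift.
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    total += abs(curr[y + dy][x + dx] - prev[y][x])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

A practical implementation would typically match many smaller blocks independently and treat agreement between most of the block displacements as an indication of camera motion; the exhaustive search above is chosen only for brevity.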

The displacement between one image and the next, as detected by the image comparator 410, is converted to a signal indicative of motion by a motion detector 420. If required, the motion signal is converted to a position signal by an integrator 430.
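The role of the integrator 430 can be illustrated as a running sum of the per-image motion values. The following Python sketch assumes that the motion signal is supplied as a sequence of displacements, one per captured image, along a single axis; the function name and the starting position are hypothetical.

```python
def integrate_motion(displacements, start=0.0):
    """Accumulate per-image displacements into a position track.

    Assumption: each entry is the displacement since the previous image,
    so the position after each image is the running total.
    """
    position = start
    track = []
    for step in displacements:
        position += step
        track.append(position)
    return track


# Hypothetical motion signal: right by 1, right by 2, then left by 1.
positions = integrate_motion([1.0, 2.0, -1.0])
```

Because each position is a sum of earlier measurements, any error in the motion signal also accumulates, which is one reason a system might combine this approach with another sensing arrangement.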

As mentioned above, as an alternative to, or in addition to, the detection of motion by detecting inter-image motion between images captured by a video camera associated with the HMD, the HMD can detect head motion using a mechanical or solid state detector 332 such as an accelerometer. This can in fact give a faster response in respect of the indication of motion, given that the response time of the video-based system is at best the reciprocal of the image capture rate. In some instances, therefore, the detector 332 can be better suited for use with higher frequency motion detection. However, in other instances, for example if a high image rate camera is used (such as a 200 Hz capture rate camera), a camera-based system may be more appropriate. In terms of Figure 10, the detector 332 could take the place of the camera 322, the image store 400 and the comparator 410, so as to provide an input directly to the motion detector 420. Or the detector 332 could take the place of the motion detector 420 as well, directly providing an output signal indicative of physical motion.
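The bound on the response time of the video-based system can be made concrete with the two capture rates mentioned in the text. The following Python sketch simply evaluates the reciprocal of the capture rate; the function name is hypothetical, and processing delays beyond the capture interval are ignored.

```python
def min_response_ms(capture_rate_hz):
    """Best-case response time of a camera-based detector, in milliseconds.

    Assumption: the only delay considered is the interval between images,
    i.e. the reciprocal of the image capture rate.
    """
    if capture_rate_hz <= 0:
        raise ValueError("capture rate must be positive")
    return 1000.0 / capture_rate_hz


standard = min_response_ms(25)    # 25 images per second
fast = min_response_ms(200)       # 200 Hz capture rate camera
```

On this basis a camera capturing 25 images per second cannot respond in less than 40 ms, whereas a 200 Hz camera reduces that bound to 5 ms.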

Other position or motion detecting techniques are of course possible. For example, a mechanical arrangement by which the HMD is linked by a moveable pantograph arm to a fixed point (for example, on a data processing device or on a piece of furniture) may be used, with position and orientation sensors detecting changes in the deflection of the pantograph arm. In other embodiments, a system of one or more transmitters and receivers, mounted on the HMD and on a fixed point, can be used to allow detection of the position and orientation of the HMD by triangulation techniques. For example, the HMD could carry one or more directional transmitters, and an array of receivers associated with known or fixed points could detect the relative signals from the one or more transmitters. Or the transmitters could be fixed and the receivers could be on the HMD. Examples of transmitters and receivers include infra-red transducers, ultrasonic transducers and radio frequency transducers. The radio frequency transducers could have a dual purpose, in that they could also form part of a radio frequency data link to and/or from the HMD, such as a Bluetooth® link.
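As a simplified illustration of detection by triangulation, the following Python sketch locates a transmitter in two dimensions from the bearings measured at two receivers at known fixed points. The function name, the coordinate conventions (bearings measured anticlockwise from the positive x axis) and the example geometry are assumptions made for illustration; the sketch recovers position only, not orientation.

```python
import math


def triangulate(receiver_a, bearing_a_deg, receiver_b, bearing_b_deg):
    """Intersect two bearing lines to estimate a transmitter position.

    Assumptions: 2D geometry, noise-free bearings, receivers at known
    (x, y) positions. Raises ValueError if the bearings are parallel.
    """
    ax, ay = receiver_a
    bx, by = receiver_b
    dax, day = math.cos(math.radians(bearing_a_deg)), math.sin(math.radians(bearing_a_deg))
    dbx, dby = math.cos(math.radians(bearing_b_deg)), math.sin(math.radians(bearing_b_deg))
    # Solve receiver_a + t * da = receiver_b + s * db for t.
    denominator = dax * dby - day * dbx
    if abs(denominator) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    t = ((bx - ax) * dby - (by - ay) * dbx) / denominator
    return ax + t * dax, ay + t * day


# Hypothetical layout: receivers 2 m apart, bearings of 45 and 135 degrees.
x, y = triangulate((0.0, 0.0), 45.0, (2.0, 0.0), 135.0)
```

With the assumed layout the two bearing lines meet midway between the receivers and 1 m in front of them; additional receivers would allow a three-dimensional position and a consistency check on the measurements.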

Figure 11 schematically illustrates image processing carried out in response to a detected position or change in position of the HMD.

As mentioned above in connection with Figure 10, in some applications such as virtual reality and augmented reality arrangements, the apparent viewpoint of the video being displayed to the user of the HMD is changed in response to a change in actual position or orientation of the user’s head.

With reference to Figure 11, this is achieved by a motion sensor 450 (such as the arrangement of Figure 10 and/or the motion detector 332 of Figure 9b) supplying data indicative of motion and/or current position to a required image position detector 460, which translates the actual position of the HMD into data defining the required image for display. An image generator 480 accesses image data stored in an image store 470 if required, and generates the required images from the appropriate viewpoint for display by the HMD. The external video signal source can provide the functionality of the image generator 480 and act as a controller to compensate for the lower frequency component of motion of the observer’s head. It does so by changing the viewpoint of the displayed image so as to move the displayed image in the opposite direction to that of the detected motion, thereby changing the apparent viewpoint of the observer in the direction of the detected motion.

Figure 12 schematically illustrates a handheld peripheral that may be used by a user in order to interact with the virtual environment. An example of such a peripheral is the Sony® PlayStation® Move® Controller, such as the controllers 330 in Figures 6 and 7.

The controller 1200 comprises a plurality of buttons 1210 arranged on its surface. The buttons 1210 may be located anywhere on the surface of the controller 1200, including on the rear of the controller 1200. In addition to the buttons 1210, pressure sensors or the like (not shown) may be provided so as to allow a user to interact with the controller in a greater range of ways.

The controller 1200 also comprises a tracking element 1220 that acts as a marker that a camera associated with a processing device (such as the camera 315 associated with the console 300 in Figures 6 and 7) is able to identify in order to locate the controller 1200 in the environment. Information generated in this manner may be supplemented with other sensor data, such as data from gyroscopes or accelerometers forming part of the controller 1200 and wirelessly transmitting data to the processing device, in order to accurately determine the location and/or orientation of the controller 1200. The tracking element 1220 may be operable to light up in a predetermined colour, or emit a predetermined wavelength or wavelength range of light (not necessarily visible), so that the controller can be located by recognition, within an image captured by the camera used for tracking the controller 1200, of a region of that colour. The size of the element 1220 in the captured image may be used to calculate the distance of the element from the camera, for example, or other methods (such as comparing disparity in the left/right images from a stereoscopic camera) may be used instead.
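By way of illustration, the two distance estimates mentioned above may be sketched using the standard pinhole camera relations. This is a minimal sketch only: the function names, the focal length expressed in pixels, the physical diameter of the tracking element and the stereo baseline are all assumed parameters that are not specified in this disclosure.

```python
def distance_from_apparent_size(focal_length_px: float,
                                real_diameter_m: float,
                                apparent_diameter_px: float) -> float:
    """Pinhole-model range estimate: Z = f * D / d.

    f is the focal length in pixels, D the physical diameter of the tracking
    element and d its diameter as measured in the captured image.
    """
    if apparent_diameter_px <= 0:
        raise ValueError("apparent diameter must be positive")
    return focal_length_px * real_diameter_m / apparent_diameter_px


def distance_from_disparity(focal_length_px: float,
                            baseline_m: float,
                            disparity_px: float) -> float:
    """Stereo range estimate: Z = f * B / disparity.

    B is the separation of the left and right cameras and the disparity is
    the horizontal shift of the element between the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

In either case a smaller measurement in the image (a smaller apparent diameter or a smaller disparity) corresponds to a controller that is further from the camera.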

While such a controller may be useful for direct inputs using the provided buttons or the like, the location/orientation tracking may also be useful for interacting with a virtual scene. The operation of such a controller may also be dependent upon which hand it is held in.

Figure 13 schematically illustrates a controller 1200 with a cone 1300 that extends longitudinally from the tracking element 1220. The cone 1300 may be thought of similarly to a representation of the way that light is emitted from a flashlight, for example. The cone 1300, while not physically present in the real environment, may be displayed to a user in the virtual environment if it is deemed appropriate for a particular application.

The cone 1300 may be used to define a volume in a particular direction so as to indicate a ‘pointing’ gesture by the user. A user is therefore able to use the controller 1200 to indicate an object of particular interest within an environment, for example. A cone may be advantageous in interpreting such an action, as it allows the user to indicate an area (such as the location of a group of people) as well as not being restricted to a particular depth within the image. The determining of a particular object (or objects) of interest with the cone 1300 is discussed in more detail below. In some examples, the identified object(s) of interest may be object(s) associated with sound sources which the user wants to hear.

In some embodiments a user may use a pair of such controllers to provide inputs such as in Figures 6 and 7. This may allow a user to indicate two different regions; for example regions in which sound sources should be audible to a user. Alternatively, or in addition, the controllers may be used in conjunction with one another. For example, by pointing each controller to a similar location a region of overlap may be defined, which may be taken as the region from which the user wishes to hear sound. This may assist the user by increasing the accuracy of the indication of an area in the scene.

Of course, while Figure 13 (and the following description) makes reference to cones, any suitable shape may be used. For example, pyramids or cylinders may be suitable alternatives. It may be preferable that a solid is defined to have an apex at the controller 1200 and a surface arranged opposite to the apex in the direction extending from the front of the controller 1200; however, this disclosure should not be read as being limited in this fashion. In some embodiments, any polyhedron may be suitable. In some embodiments, instead of a cone, a ‘laser-pointer’ style implementation may be used such that a narrow volume is defined (so that the volume more closely resembles the beam of a laser pointer than that of a flashlight), so as to allow the high-precision identification of a sound source.

In general, the following Figures refer to a virtual environment or a virtual scene; however it should be appreciated that the methods described may be equally suited to use with a see-through HMD. This means that instead of virtual elements, the scenes may relate to real objects and people; in such a case, it may be possible to perform processing to separate sound sources from one another and to selectively relay sounds in the environment to a user. For example, users of see-through HMDs in the same room may be able to indicate another user using the controller in order to receive audio from the microphone associated with that user’s HMD.

Figure 14 schematically illustrates a virtual scene 1400 that may be displayed to a user. The virtual scene 1400 comprises a first character 1410 and a pair of other characters 1420. Each of the characters may correspond to in-game characters or avatars (which may or may not be controlled by other, real players), other users of HMDs (for example, captured using a camera or seen with a see-through HMD system), or other people who are not using HMDs. In this example, it is considered that the first character 1410 is a character that the user is interested in listening to, whilst the other characters 1420 are of lesser (or no) interest to the user.

This Figure is representative of an advantageous use of the arrangement in the present disclosure. In a social environment, a user may find it difficult to hear the person they are conversing with if there is a large amount of noise from other sources - such as passing vehicles, or other people’s conversations. By being able to identify which sound sources are of interest, these other sources may be omitted from the sound provided to the user and thus they may focus on the desired sounds more easily.

While Figure 14 shows the characters 1410 and 1420 as people, this should not be seen as limiting. The methods described in the present disclosure are equally applicable to virtual representations (or real versions) of inanimate objects or non-human beings; examples include televisions and radios that are associated with sound outputs that a user may expect to hear.

In order to indicate that the character 1410 is of interest, the user orients the controller so as to be directed in the virtual environment towards the first character 1410. This is shown in Figure 14 with the controller 1200 and an associated cone being displayed encompassing the character 1410; however the user’s view of the scene may differ from that shown.

In some embodiments, the user is unable to see either the controller 1200 or the corresponding cone (for example, in a full-immersion HMD system). In that case, the user may only be able to see the scene 1400 (or a portion of the scene 1400). Alternatively, the user may be able to see the cone, but not the controller 1200, in order to assist with aiming. This helps the user to ensure that they have indicated the correct object or character of interest.

Alternatively, the user may be presented with a view of the controller 1200, or a virtual representation of the controller 1200. A virtual representation may be a virtual reconstruction of the controller 1200, or alternatively it may be a virtual object that is different from the physical controller 1200. The virtual object may be selected to be an object associated with capturing sound (such as a microphone), or an object that is more in keeping with a virtual environment that is being rendered (such as a magical item in a fantasy environment, for example a wand) so as to increase the user’s sense of immersion while still providing the desired functionality.

Any suitable combination of the display methods described above may also be used, so as to provide an improved interaction for the user.

Once the user has directed the controller 1200 towards the character 1410, processing is performed to determine an appropriate sound output to be provided to the user. In some embodiments, any sound source that is identified as corresponding to an object or character within the cone is used to provide a sound output to the user. This may be subject to constraints on depth (such as only the first object in the depth direction that appears in the cone being used for a sound output) or the like, so as to refine the sound output.
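One possible form of this processing is sketched below in Python. It assumes that each sound source is reduced to a single point in the virtual environment and that the cone is described by an apex, an axis direction, a half-angle and a length; the class and function names are hypothetical and this is not the only way in which the test could be implemented.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Cone:
    apex: Vec3             # mapped virtual position of the controller
    axis: Vec3             # pointing direction (non-zero, any length)
    half_angle_rad: float  # angle between the axis and the cone surface
    length: float          # distance from the apex to the opposing surface

    def depth_of(self, point: Vec3) -> float:
        """Distance of the point from the apex, measured along the axis."""
        axis_len = math.sqrt(sum(a * a for a in self.axis))
        return sum((p - o) * a
                   for p, o, a in zip(point, self.apex, self.axis)) / axis_len

    def contains(self, point: Vec3) -> bool:
        depth = self.depth_of(point)
        if depth < 0.0 or depth > self.length:
            return False
        dist = math.dist(point, self.apex)
        # Inside when cos(angle to axis) = depth / dist >= cos(half-angle).
        return depth >= dist * math.cos(self.half_angle_rad)


def select_sources(cone: Cone, sources: Dict[str, Vec3],
                   first_only: bool = False) -> List[str]:
    """Names of the sources inside the cone, nearest in depth first.

    With first_only=True only the first source in the depth direction is
    returned, which corresponds to the depth constraint mentioned above.
    """
    hits = sorted((cone.depth_of(pos), name)
                  for name, pos in sources.items() if cone.contains(pos))
    names = [name for _, name in hits]
    return names[:1] if first_only else names
```

A narrow half-angle gives the ‘laser-pointer’ style behaviour, while a larger half-angle or length admits more sources.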

In some embodiments, processing may be performed so as to identify objects within the cone, and to use the sound outputs of these objects to generate sound for a user. For example, if a user were to direct the controller towards another user’s leg, the user would be able to hear the other user speaking even though the voice does not originate from the other user’s leg.

In some embodiments, processing may be performed so as to identify objects and/or sounds that are associated with objects within the cone. For example, if the user were to instead direct the controller 1200 towards one of the characters 1420 then the sound output for both characters may be provided to the user in recognition of the fact that they are likely to be in conversation due to their proximity. Alternatively, or in addition, it could be deduced that they are in conversation (or otherwise interacting) using other metrics, such as analysis of sound outputs, identification of belonging to an in-game party, or identifying if they are directing their respective controllers towards each other, for example.
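The proximity-based association described above may be sketched as follows. The distance threshold and the function name are assumptions made for illustration; the other metrics mentioned (sound analysis, party membership, controller direction) are not modelled here.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]


def expand_by_proximity(indicated: Iterable[str],
                        positions: Dict[str, Vec3],
                        threshold: float) -> List[str]:
    """Add any source lying within `threshold` of an indicated source.

    Sources that are close together are treated as likely to be interacting,
    so indicating one of them also selects its neighbours.
    """
    indicated = list(indicated)
    selected = set(indicated)
    for name, pos in positions.items():
        if any(math.dist(pos, positions[i]) <= threshold for i in indicated):
            selected.add(name)
    return sorted(selected)
```

For example, indicating one of two characters standing one metre apart with a threshold of two metres would select both, while a character ten metres away would be left out.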

Of course, any combination of the above methods (and other suitable methods) may be used in order to provide a sound output to a user. The selected combination may be dependent upon the properties of the virtual/real scene (such as the number of people/sound sources present and/or their distribution within the scene), so as to provide an improved sound output to the user.

Figure 15 schematically illustrates a virtual environment 1500 that simulates a cinema experience. A cinema screen 1510 and a number of seats 1520 are present in the environment so as to present a cinema-style environment to the user.

In the environment 1500, the user generally would like to hear the sound from the movie only; the sound of other viewers is generally regarded as being undesirable. In view of this, the user is able to direct a controller towards the screen 1510 in order to only hear the sound from the movie. The user is also able to direct the controller towards other users if they are interested in hearing the sound outputs (for example, hearing a friend’s commentary on the movie).

It should be noted that this is an example of a situation in which a sound source is not within the cone defined by the controller’s position and orientation, but it is possible to derive the desired sound based upon the cone. Put simply, the user is likely to direct the controller towards the screen 1510, rather than the speakers that actually emit the sound. However, by identifying that the screen 1510 is used to display content that has associated sound, it is possible to provide the desired sound output to the user.

Such an association between the object identified by the cone and a sound source may be identified in a number of ways. In some examples, this is achieved via metadata associated with an object in a virtual image; for instance, an ‘associated sound source’ field could be provided in the object description. In some examples, image processing is performed to identify the object that is indicated; for instance, the image processing could identify that there is a screen in the scene and then seek an associated sound track. In some examples, appropriate audio is associated with an object, but not played to the user - in effect, the screen would be a muted audio source that only becomes unmuted when a user directs the controller 1200 towards the audio source.
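The metadata-based association, together with the muted-until-indicated behaviour, may be sketched as follows. The data layout is hypothetical: the field name `associated_sound_source` mirrors the example field mentioned above, while the class names and the lookup by name are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SceneObject:
    name: str
    # Metadata field naming the audio associated with this object, if any.
    associated_sound_source: Optional[str] = None


@dataclass
class AudioTrack:
    name: str
    muted: bool = True  # associated with an object but not yet played


def indicate(obj: SceneObject,
             tracks: Dict[str, AudioTrack]) -> Optional[AudioTrack]:
    """Unmute and return the track linked to an indicated object.

    Returns None when the object has no associated sound source, or when
    the named track is not available.
    """
    if obj.associated_sound_source is None:
        return None
    track = tracks.get(obj.associated_sound_source)
    if track is not None:
        track.muted = False
    return track
```

In the cinema example, the screen 1510 would carry a reference to the movie sound track, whereas a seat 1520 would carry none.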

In some embodiments, it may be possible to pass the control of the cone definition to another peripheral or body part of the user. For example, the user may become weary of constantly directing a controller towards a screen during a movie. In such a case, it may be advantageous to define a fixed cone, or a cone that moves in dependence upon something other than the controller.

In some examples, the user is able to indicate that a current cone is to be fixed for use for future interaction. This may be on the basis of the cone remaining directed towards a particular object for a predetermined amount of time (for example, a user-defined period of time), until a predetermined event in the virtual scene (such as a new character appearing, or a movie finishing) or until further inputs are provided by a user, for example. A user may be able to orient a controller in a desired fashion and then press a button (or provide any other suitable input) in order to indicate that that particular cone should be used for future interaction.

In some examples, this cone is defined in real- or virtual-world space, such that the cone is consistently aimed at the same point in space. In some examples, the cone may be fixed upon a particular object in the environment, and is operable to track that object (so as to follow a conversation with a person who is walking). In further examples, the position of the cone is fixed with respect to the user’s viewpoint, such that if the user moves their head then the cone moves too.
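The three ways of fixing the cone described above may be sketched as a choice of the point at which the cone is aimed. This is a simplified illustration: the names are hypothetical, and head rotation is limited to yaw about a vertical y-axis, which is an assumption made to keep the example short.

```python
import math
from enum import Enum
from typing import Tuple

Vec3 = Tuple[float, float, float]


class Anchor(Enum):
    WORLD_POINT = 1     # always aimed at the same point in space
    TRACKED_OBJECT = 2  # follows a particular object in the environment
    VIEWPOINT = 3       # fixed relative to the user's head


def cone_target(anchor: Anchor, fixed_point: Vec3, object_position: Vec3,
                head_position: Vec3, head_yaw_rad: float,
                view_offset: Vec3) -> Vec3:
    """Return the point at which the fixed cone should currently be aimed."""
    if anchor is Anchor.WORLD_POINT:
        return fixed_point
    if anchor is Anchor.TRACKED_OBJECT:
        return object_position
    # VIEWPOINT: rotate the stored offset by the head yaw, then translate
    # it by the head position so that the cone moves with the head.
    x, y, z = view_offset
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    rotated = (x * c + z * s, y, -x * s + z * c)
    return tuple(h + r for h, r in zip(head_position, rotated))
```

Calling the function once per frame with the latest object and head data yields a target that stays put, follows the object, or follows the head, depending on the anchor chosen.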

In some embodiments, the user is able to indicate that they wish to pass the cone definition from the controller to the HMD, for example, or any other peripheral or even body part. Any suitable manner of indicating this may be used. For example, the user may be able to move the controller to touch the HMD (which may be detected using a camera, any kind of proximity sensing, or by comparing results from tracking each object, for example) in order to indicate that the HMD should be used to define the cone. This may be performed in conjunction with a button press or the like to indicate the intention of the user. Once such an action is performed, the position and/or orientation of the HMD (or other device) is used to define a cone. In this particular example, the cone would be defined to be in the direction of the user’s current view if the cone were facing forwards at the time.

Of course, it should be appreciated that the cone could be ‘attached’ to the HMD at any angle - for example, the cone could be facing behind the user so as to hear sounds from the rear, with the cone still moving about the initial position in dependence upon the motion of the HMD.

Figure 16 schematically illustrates a selection of ways in which the cone may be modified by a user. An example of a user input that may cause the modification is described, however this should not be seen as limiting; any suitable user input may be used to indicate that such a modification should be made to the cone.

A cone 1600 represents a default cone size for a particular embodiment. The size of the cone 1600 may be determined in dependence upon any number of factors. For example, the number of sound sources in the environment could be considered (such as a busier environment corresponding to a smaller cone, so as to allow a user more precision in identifying sound sources). Technical restrictions of the processing device through which content is provided may also be considered, such that a cone is defined that limits the number of selectable sound sources to a number that may be processed by the device.

A cone 1610 illustrates a first modification to the cone size; the angle at the apex of the cone 1610 is much larger than that of the cone 1600. This is apparent from the radius of the base of the cone 1610; it incorporates a much larger area than that of the cone 1600. In the example provided, such a modification is obtained by rotating the controller 1200 about its axis. This may be advantageous in that a larger target area may be used, such that a greater number of sound sources may be indicated.

A cone 1620 illustrates a second modification to the cone size; the length of the sides (the distance between the apex and the opposing surface) has been increased relative to the cone 1600. In the example provided, this modification is obtained by squeezing the controller (detected by the pressure sensor(s) discussed above). This may be advantageous in that a longer range is provided, such that further away sound sources may be indicated.

A cone 1630 illustrates a third modification to the cone size; this modification is a combination of the first two modifications, such that the cone 1630 is both broadened and lengthened relative to the initial cone 1600. This may be achieved by providing the corresponding inputs for each modification simultaneously or sequentially; in the present example, this would mean that the user squeezes the controller 1200 to increase the length and rotates the controller 1200 to increase the breadth.

In some embodiments the size of the cone may be adjusted in a continuous manner such that any value between a minimum and maximum size may be selected by a user. In other embodiments, a number of preset values are determined that the user is able to cycle through by providing inputs. In some embodiments, different settings are provided in different manners, and/or the user is able to determine which setting type is associated with each setting (for example, the user may specify that the breadth may be adjusted in a continuous manner, while each squeeze of the controller 1200 changes the length to the next preset value).
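The two adjustment styles may be sketched as follows, assuming that a cone dimension (such as its breadth or its length) is represented by a single number. The function names and the wrap-around behaviour when cycling presets are assumptions made for illustration.

```python
from typing import Sequence


def adjust_continuous(value: float, delta: float,
                      minimum: float, maximum: float) -> float:
    """Continuous adjustment: any value between the limits may be reached."""
    return max(minimum, min(maximum, value + delta))


def next_preset(current: float, presets: Sequence[float]) -> float:
    """Preset adjustment: each input steps to the next value, wrapping round.

    `current` must be one of the values in `presets`.
    """
    index = presets.index(current)
    return presets[(index + 1) % len(presets)]
```

A user could, for example, map rotation of the controller 1200 to `adjust_continuous` for the breadth and each squeeze to `next_preset` for the length.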

While the above discusses the increasing of the size of the cone 1600, it should be apparent that the cone may instead be made smaller (so as to be more selective, for example) than the default size rather than a user only being able to increase the size of the cone.

A default cone size may be defined by a content provider or the like, or in dependence upon the properties of the real/virtual environment. Alternatively, or in addition, the user may define a preferred cone size that may replace this, or the user may define a multiplier or the like that is used to modify the default size.

The cone size may also be varied in dependence upon factors other than those specified by the user. For example, the size of a user’s cone (and therefore the number of sound sources they may be able to listen to at a given time) may be dependent upon the number of people listening to them. This may be advantageous in that a user is able to engage with groups of people without having to manually adjust the cone size so as to enable them to identify everyone in the group.

In some embodiments, the hand that the controller 1200 is held in by a user may be used to affect the operation of the system. For example, when held in one hand the cone is used to identify a desirable sound, but when held in the other the cone is used to exclude a sound. The hand used for each may be defined by user settings (such as defining a dominant hand) and in the context of the use of the system. A cone of sound exclusion may be advantageous in some embodiments, such as when a user is exploring nature in a virtual environment; in this case, it may be beneficial to be able to indicate that sounds from a partner may not be desirable.

In some embodiments, an ‘aim-assist’ type feature may be provided so as to enable a user to more reliably identify a desired sound source. This may be implemented in any suitable fashion; for example, each of the sound sources present in an environment may be detected and located, and when a user indicates a region not containing a sound source but within a threshold distance of a sound source it may be assumed that the nearest sound source is that which the user intended to identify. The intended source may be dependent upon other factors too, such as those indicating a preference for object type (so as to prioritise the sounds of people over those of vehicles, for example) or those identifying a significance or importance of a sound source.
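A minimal sketch of such an ‘aim-assist’ feature is given below. It assumes that the indicated region is represented by the pointing ray of the controller and that the distance of interest is the perpendicular distance from each source to that ray; both are simplifications, and the function names are hypothetical. Preferences for object type or importance are not modelled.

```python
import math
from typing import Dict, Optional, Sequence


def distance_to_ray(point: Sequence[float], origin: Sequence[float],
                    direction: Sequence[float]) -> float:
    """Shortest distance from a point to a ray starting at `origin`."""
    length = math.sqrt(sum(c * c for c in direction))
    unit = [c / length for c in direction]
    offset = [p - o for p, o in zip(point, origin)]
    # Clamp at zero so that points behind the controller are measured
    # from the controller itself rather than from the backward extension.
    t = max(0.0, sum(a * b for a, b in zip(offset, unit)))
    closest = [o + t * u for o, u in zip(origin, unit)]
    return math.dist(point, closest)


def aim_assist(origin: Sequence[float], direction: Sequence[float],
               sources: Dict[str, Sequence[float]],
               threshold: float) -> Optional[str]:
    """Return the nearest source to the ray if it is within the threshold."""
    best = min(sources.items(),
               key=lambda item: distance_to_ray(item[1], origin, direction),
               default=None)
    if best is None:
        return None
    name, position = best
    if distance_to_ray(position, origin, direction) <= threshold:
        return name
    return None
```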

Figure 17 schematically illustrates an environment in which different objects (characters) are each assigned an importance or significance rating. In this description, importance and significance ratings will be used interchangeably, although in practice they may refer to different measures. For example, an importance rating may be a more personal measure, while a significance rating may be defined by a system or content provider. Nevertheless, each may be used in a similar manner when providing audio to a user.

Such a rating may be determined in any number of ways. For example, previous interaction history may be used to determine a significance rating; this means that objects that are frequently interacted with or friends that are regularly conversed with may be identified as being more important than other objects/people in the environment. Significance may also be determined with regards to volume of the sound, relevance to the user (for example, a particular genre of music or conversations comprising keywords that relate to a user’s interests), or proximity of the sound source to the user for example.

In the scene 1700, three characters are present; a first character 1710 with an importance of 7, a second character 1720 with an importance of 3, and a third character 1730 with an importance of 10. These importance ratings may be displayed to a user, so as to indicate which sound sources they may be interested in listening to, or they may be hidden from the user so as not to visually interfere with their experience. Alternatively, an indication of the rating could be provided using non-numerical symbols or by highlighting sound sources in a particular colour or the like.

In some embodiments, the size of the sound source representation (rather than the object from which sound originates) may be dependent upon the importance rating. For example, the third character 1730 may have a significantly larger sound source representation than the second character 1720 in view of the difference in importance rating. This means that a user is more likely to indicate a more important element as it presents a larger target, so to speak.

The user has used the controller 1200 to indicate that the sound source they wish to focus on is that of the character 1710. This means that the characters 1720 and 1730 may not be audible in many embodiments. However, in some embodiments the user may also be able to hear the character 1730 in view of the high importance rating; this could be because of the enlarged sound source representation (in this case, enlarged so as to cover the character 1720), or simply because it is within a threshold distance of the area identified by the user using the controller 1200. Any other suitable discriminating factor may be used to determine when an important sound source should be heard even when not specifically identified.
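The effect of an enlarged sound source representation may be sketched as follows. The sketch assumes that each representation is a sphere whose radius is proportional to the importance rating, and that the distance of each source from the pointing axis has already been computed; the scale factor and the function name are assumptions rather than values taken from this disclosure.

```python
from typing import Dict, List


def audible_sources(offsets_m: Dict[str, float],
                    importance: Dict[str, int],
                    base_radius_m: float = 0.1) -> List[str]:
    """Sources whose importance-scaled representation reaches the axis.

    `offsets_m` gives the distance of each source from the pointing axis;
    a source is audible when that distance does not exceed its radius,
    taken here as base_radius_m multiplied by the importance rating.
    """
    return sorted(name for name, offset in offsets_m.items()
                  if offset <= base_radius_m * importance[name])


if __name__ == "__main__":
    # Importance ratings from the scene 1700; the offsets are invented.
    offsets = {"1710": 0.0, "1720": 0.9, "1730": 0.9}
    ratings = {"1710": 7, "1720": 3, "1730": 10}
    print(audible_sources(offsets, ratings))  # ['1710', '1730']
```

With these illustrative numbers the character 1730 is heard despite lying as far from the axis as the character 1720, because its representation is larger.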

In some embodiments, a user may be able to assign an importance rating to an object or character. For example, a user could indicate that anyone on their friends list should be assigned a 10 so as to always (or at least more often than not) be able to hear them in a virtual environment without requiring them to be identified using the controller 1200.

In some embodiments, a user’s own significance rating may be used to determine the default size of their cone; this may be advantageous in a social environment in which it is expected that a celebrity is likely to engage with more people than the average user, for example.

Figure 18 schematically illustrates a scene 1800 in which a plurality of characters are present, each demonstrating a different sound status. While symbols are used to illustrate the status in this Figure, any other suitable illustration method may be used, if any. For example, symbols may be displayed only when a user attempts to listen to the sound source, or coloured borders around the source may be used instead.

A first character 1810 is displayed with a ‘stop’ symbol above them. This is to indicate that a user is unable to hear sounds from that particular source. This may be because they lack permission; for example, if the character 1810 has restrictive privacy settings or if the user is subject to privacy settings of their own (such as children not being allowed to interact with strangers).

A second character 1820 is shown with a ‘tick’ symbol above them. This is to indicate that the user is able to hear sounds from the character 1820. This may be useful for when the user is in an environment with a large number of characters they are not able to hear, at which point it may be less visually disturbing to indicate those who a user can interact with rather than those that they cannot.

A third character 1830 is shown with a ‘£’ symbol above them, to indicate that some form of payment is required before a user can listen to them. Examples of this include an in-game guide who offers hints - these may be paid for by a user who is stuck on a particular problem. Alternatively, this could be used in conjunction with a digital music store to indicate that a user must purchase the music first.

A fourth character 1840 is displayed with a speaker symbol above their head. This is to indicate that that character 1840 is listening to the user. This may be advantageous in an online lecture environment, to see if users are paying attention, for example, or simply to identify which characters may be interested in interacting.

Of course, the symbols described here are merely exemplary and any number of additional or alternative symbols to indicate these or other statuses may be employed by the skilled person when implementing a system according to the current disclosure.

Figure 19 schematically illustrates an example of a use scenario in which a user is unable to see the sound sources that they would like to hear. Figure 19 comprises a scene 1900 with an obstacle 1910 that obscures a user’s view of at least a portion of the environment 1900. By moving the controller 1200 to the position/orientation 1920, the user is able to identify a sound source that is obscured by the obstacle 1910. This may be advantageous in a gameplay environment in which an enemy may be hiding around the corner, for example.

Figure 20 schematically illustrates an audio generation system for generating audio in dependence upon the position and orientation of a peripheral associated with the audio generation system. Figure 20 includes a virtual scene analysis unit 2000, a peripheral identification unit 2010, a sound identification unit 2020 and a sound output unit 2030.

The virtual scene analysis unit 2000 is operable to locate virtual sounds and/or objects within a virtual environment, and to identify links between them where appropriate and/or possible (as some sounds may not have a visible source in the environment, such as the wind or a source outside of a virtual room).

The peripheral identification unit 2010 is operable to detect the real-world location and orientation of the peripheral (such as the physical location of the controller 1200, although ‘peripheral’ could instead refer to a user’s hands or the like rather than a peripheral in the traditional sense), and to map these to a virtual position and orientation in the virtual environment.
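A very simple form of this mapping is sketched below. It assumes that the real and virtual coordinate axes are aligned, so that only a uniform scale and a translation are needed for the position and the pointing direction carries over unchanged; a practical system may also require a rotation between the two frames. All names and parameters are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def map_to_virtual(real_position: Vec3, real_direction: Vec3,
                   scale: float, virtual_origin: Vec3) -> Tuple[Vec3, Vec3]:
    """Map a tracked real-world pose to a virtual position and direction.

    The position is scaled and offset by the virtual origin; the direction
    is normalised to unit length so that it can be used as a cone axis.
    """
    position = tuple(o + scale * p
                     for o, p in zip(virtual_origin, real_position))
    length = math.sqrt(sum(c * c for c in real_direction))
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    direction = tuple(c / length for c in real_direction)
    return position, direction
```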

In addition to this, the peripheral identification unit 2010 is operable to detect any inputs being provided via the peripheral, if these are to be used to control the audio generation process. In some embodiments, the peripheral identification unit 2010 is operable to generate a visual representation of the mapped virtual position and orientation for display to a user.

The sound identification unit 2020 is operable to identify an indicated virtual sound source in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral, and to identify audio corresponding to that source (so that the identified audio can for example provide an output audio signal to the user of the apparatus such as the operator of an HMD). In some embodiments, the sound identification unit 2020 is operable to identify two or more indicated sound sources.

The sound output unit 2030 is operable to generate an audio output in dependence upon the audio identified by the sound identification unit, via headphones or speakers or any other suitable sound output method.

In some embodiments, a head-mountable display device is provided that has one or more audio transducers operable to provide the generated audio output to a user.

Figure 21 schematically illustrates a virtual scene analysis unit 2000, comprising a sound location unit 2100, an object location unit 2110 and an object identification unit 2120.

The sound location unit 2100 is operable to locate sound sources in the virtual environment. This may be performed by analysing metadata associated with a scene, or by performing processing to identify objects in a scene that are generally associated with sound outputs (such as televisions or people), for example. Processing may be performed that simulates a virtual microphone in the virtual environment, so as to capture sound information for the virtual environment in real-time.

The object location unit 2110 is operable to locate objects that correspond to the sound sources located by the sound location unit 2100. This may be performed using image processing (such as edge detection) to identify the boundaries of an object at the location of the sound. Alternatively, or in addition, metadata associated with the virtual environment that provides information about the location of objects within the environment may be used to locate objects corresponding to the located sounds.

The object identification unit 2120 is operable to identify the objects that are located by the object location unit 2110. This may comprise a categorisation, for example, such as into people/media/other objects or into available/unavailable sources (such as the paid example above), or any other categorisation. In addition to this, the object identification unit 2120 may be operable to identify a significance of identified objects in the environment.

Alternatively, or in addition, this identification of objects may comprise generating a list of all of the sound sources in the environment and providing them with a name or other identifier. This may be advantageous for playing back the sound to a user, as it may enable the correct sound to be more easily identified. In addition, if categorisation or the like is used, determining any properties based upon the category may be simplified.

The object identification unit 2120 may be operable to determine the status of an indicated sound source, the status being indicative of any restrictions on a user’s potential interactions with that sound source (see Figure 18 and the associated discussion, for example). Information about the status of an object may be included in metadata, for example, or be generated in dependence upon a user’s personal settings or the like - such as generating an ‘allowed’ list for interactions in dependence upon a user’s friends list.
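As an illustration only, the status determination described above might be sketched as follows. The function names, the status strings ('allowed', 'restricted', 'paid') and the rule that metadata takes precedence over the friends-list result are assumptions made for this sketch and are not taken from the disclosure.

```python
def build_allowed_list(source_owners, friends):
    """Return the ids of sound sources whose owner is on the user's friends list.

    source_owners maps a sound source id to the name of the user who owns it
    (a hypothetical data layout chosen for this sketch).
    """
    return {source_id for source_id, owner in source_owners.items() if owner in friends}


def source_status(source_id, allowed, metadata_status=None):
    """Return a status string for an indicated sound source.

    Assumption: a status supplied in scene metadata takes precedence;
    otherwise the status falls back to the friends-list based allowed list.
    """
    if metadata_status is not None and source_id in metadata_status:
        return metadata_status[source_id]
    return "allowed" if source_id in allowed else "restricted"
```

For example, with `source_owners = {"avatar_a": "alice", "avatar_b": "bob"}` and a friends list containing only `"alice"`, only `avatar_a` would be placed on the allowed list.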

By combining information about the objects and sound sources that are identified, the scene analysis unit 2000 may be operable to identify corresponding sound sources and objects within the environment; this means that a sound source may be associated with a particular object. This may allow a user to indicate an object and hear the correct sound, even if the sound source is not directly indicated.
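One possible way of forming such an association is to pair each located object with the nearest located sound source, as in the minimal sketch below. The nearest-neighbour rule, the distance threshold `max_distance` and the class and function names are assumptions for illustration; the disclosure does not prescribe a particular matching method.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SoundSource:
    source_id: str
    position: Vec3


@dataclass
class SceneObject:
    object_id: str
    position: Vec3


def associate_sources_with_objects(
    sources: List[SoundSource],
    objects: List[SceneObject],
    max_distance: float = 0.5,
) -> Dict[str, str]:
    """Map each object id to the id of its nearest sound source.

    An object is only associated with a source if the two lie within
    max_distance of one another (an assumed threshold, in scene units).
    """
    mapping: Dict[str, str] = {}
    for obj in objects:
        nearest = min(sources, key=lambda s: dist(s.position, obj.position), default=None)
        if nearest is not None and dist(nearest.position, obj.position) <= max_distance:
            mapping[obj.object_id] = nearest.source_id
    return mapping
```

With such a mapping, indicating the object `television` could be resolved to the sound source associated with it, even where the sound source itself is not indicated directly.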

Figure 22 schematically illustrates the peripheral identification unit 2010, comprising a peripheral location unit 2200, a peripheral orientation detection unit 2210, an input detection unit 2220, and a squeeze detection unit 2230.

The peripheral location unit 2200 is operable to locate a peripheral (such as the controller 1200) in the real-world environment. This may be performed using image tracking with images captured of the peripheral in use, or by using accelerometers and the like that are associated with the peripheral.

The peripheral orientation detection unit 2210 is operable to detect the orientation of the peripheral. This may be performed using image tracking with images captured of the peripheral in use, or by using accelerometers and the like that are associated with the peripheral. The orientation of the peripheral may be defined using any suitable convention; for example, defining the orientation relative to a display device, or using any suitable coordinate system.

The input detection unit 2220 is operable to detect inputs from the controller, such as a button press. These button presses may be operable to control the processing, for example so as to start or stop audio playback, or modify the volume of the playback or the like.

The squeeze detection unit 2230 is operable to detect pressure-based inputs by a user, using pressure sensors arranged on the peripheral for example. The use of pressure-based inputs is described above, although they may be used to control any aspect of the processing. For example, if no squeeze at all is detected (or an appropriate detection result that indicates that the user is not holding the peripheral) then all sound from the environment may be provided to the user rather than using the selective method described in this application.
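The fallback behaviour in the example above could be sketched as follows. The normalised pressure value, the `hold_threshold` parameter and the function name are hypothetical and are used only to make the example concrete.

```python
def sources_to_play(all_sources, indicated_sources, squeeze_pressure, hold_threshold=0.05):
    """Choose which sound sources contribute to the audio output.

    squeeze_pressure is assumed to be a normalised sensor reading in [0, 1].
    Below hold_threshold the peripheral is treated as not being held, and
    all sound from the environment is provided; otherwise only the
    indicated sources are used.
    """
    if squeeze_pressure < hold_threshold:
        return list(all_sources)
    return [source for source in all_sources if source in indicated_sources]
```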

One or both of the input detection unit 2220 and the squeeze detection unit 2230 may be used to enable the peripheral identification unit 2010 to be operable to detect inputs from the peripheral (such as the controller 1200).

In some embodiments, the peripheral identification unit is operable to detect which hand a peripheral is held in by a user. This may be achieved by using information provided directly by a user, position/orientation tracking of the peripheral, or from inputs from the peripheral itself (such as using an array of pressure sensors to determine which hand is holding the peripheral).

For example, the user could identify a left- and/or a right-hand controller upon launching a game. Alternatively, or in addition, position/orientation tracking may be able to identify that the controller is predominantly on a particular side of a user, or adopts a position/orientation that is only possible using one of the user’s hands (such as pointing in the direction of a particular range of angles, which is different for each hand in normal use). Pressure sensors may be used to sense the location of a user’s fingers on the controller, which may be used to identify which hand the controller is held in.
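A minimal sketch of the position-based approach is given below, assuming that tracking supplies a series of lateral offsets of the controller relative to the user (positive values to the user's right). The offset convention, the `margin` parameter and the function name are assumptions for illustration.

```python
from statistics import mean


def detect_hand(lateral_offsets, margin=0.05):
    """Guess which hand holds the controller from tracked lateral offsets.

    lateral_offsets: non-empty sequence of controller positions relative to
    the user's centre line, in metres, positive to the user's right
    (an assumed convention).
    Returns 'right', 'left', or 'unknown' when the average offset is
    within margin of the centre line.
    """
    average = mean(lateral_offsets)
    if average > margin:
        return "right"
    if average < -margin:
        return "left"
    return "unknown"
```

A practical system might combine such a result with user-provided information or pressure sensor data, as described above, rather than relying on position alone.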

The sound identification unit 2020 may be operable to use the detection of which hand is used to hold a peripheral when identifying indicated sound sources; in some embodiments, for example, holding a controller in one hand results in the ‘normal’ use described above, whilst holding a controller in the other hand means that the position/orientation is used to identify an indicated sound source as a sound source that is not to be used for audio generation, such that audio from other sound sources from the environment is to be used for audio generation instead.
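The hand-dependent behaviour described above might be sketched as follows. Which hand corresponds to the 'normal' use is not specified in the disclosure; the `select_hand` parameter and the function name are assumptions.

```python
def select_sources_for_hand(all_sources, indicated, holding_hand, select_hand="right"):
    """Return the sound sources to be used for audio generation.

    If the peripheral is held in select_hand, the indicated sources are used
    (the 'normal' use). If it is held in the other hand, the indicated
    sources are excluded and the remaining sources are used instead.
    """
    if holding_hand == select_hand:
        return [source for source in all_sources if source in indicated]
    return [source for source in all_sources if source not in indicated]
```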

Figure 23 schematically illustrates the sound identification unit 2020, comprising a target identification unit 2300 and a sound acquisition unit 2310.

The target identification unit 2300 is operable to identify a targeted sound source using the information obtained by the peripheral identification unit 2010. This may comprise determining which sound sources are within the region of the environment indicated by the peripheral position and orientation. Any other processing to determine targets may also be performed in line with the above description, such as that which considers significance ratings or the like.

The target identification unit 2300 may be operable to define the cone (as described above) that indicates a virtual area or volume for identifying sound sources in the environment, in dependence upon the position and orientation information identified by the peripheral identification unit 2010. In such an embodiment, the sound identification unit 2020 is therefore operable to use the mapped location and orientation of the peripheral to define a virtual volume projecting from the peripheral into the environment, and may also be operable to generate a visual representation of the virtual volume for display to a user. Further to this, the sound identification unit 2020 in such an embodiment is operable to identify indicated sound sources within this virtual volume.
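By way of illustration, a test of whether a sound source lies within a cone projecting from the peripheral could be sketched as below. The half-angle and range parameters, their default values and the function names are assumptions for this sketch; the disclosure does not limit the virtual volume to these parameters.

```python
import math


def in_cone(apex, direction, half_angle_deg, max_range, point):
    """Return True if point lies inside a cone projecting from apex.

    apex: mapped virtual position of the peripheral.
    direction: mapped pointing direction (need not be normalised).
    half_angle_deg, max_range: assumed parameters defining the volume.
    """
    offset = [p - a for p, a in zip(point, apex)]
    distance = math.sqrt(sum(c * c for c in offset))
    if distance == 0.0:
        return True  # a point at the apex is treated as inside (a choice made here)
    if distance > max_range:
        return False
    direction_length = math.sqrt(sum(c * c for c in direction))
    if direction_length == 0.0:
        raise ValueError("direction must be a non-zero vector")
    cos_angle = sum(a * b for a, b in zip(offset, direction)) / (distance * direction_length)
    return cos_angle >= math.cos(math.radians(half_angle_deg))


def indicated_sources(apex, direction, sources, half_angle_deg=15.0, max_range=10.0):
    """Return the ids of sound sources located within the virtual volume.

    sources maps a sound source id to its position in the virtual environment.
    """
    return [
        source_id
        for source_id, position in sources.items()
        if in_cone(apex, direction, half_angle_deg, max_range, position)
    ]
```

In such a sketch, inputs from the peripheral could be used to vary `half_angle_deg` or `max_range`, so as to widen, narrow or extend the virtual volume.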

The target identification unit 2300 may also be operable to use inputs from the peripheral (as detected by the peripheral identification unit 2010) when defining the virtual volume.

The target identification unit 2300 may be operable to identify a sound source in dependence upon an indicated object, such that a sound source associated with the indicated object may be identified in the case that the sound source is not indicated directly. Such a process may utilise information about corresponding sound sources and objects within the environment as identified by the scene analysis unit 2000.

The sound acquisition unit 2310 is operable to acquire the audio corresponding to the identified targets. This may comprise analysing metadata associated with the sound sources so as to identify the location of the audio and then acquiring it, for example from local storage or streaming via a network connection.

In some embodiments, the virtual scene analysis unit 2000 is replaced (or supplemented) by a real-world scene analysis unit that is operable to provide similar processing using camera inputs that represent the environment surrounding the user. In such an arrangement, the sound acquisition unit 2310 is further operable to obtain sound information using a directional microphone or a microphone array that is able to capture sound information selectively from one or more sources.

Figure 24 schematically illustrates a method for generating an output sound.

A step 2400 comprises analysing a scene, the scene being a virtual or real environment. This analysis is performed so as to locate sound sources and objects within the environment.

A step 2410 comprises obtaining the location and orientation information for the peripheral.

A step 2420 comprises identifying an indicated sound source, using the information acquired in steps 2400 and 2410.

A step 2430 comprises outputting the sound corresponding to the identified sound source or sound sources.
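The ordering of steps 2400 to 2430 can be sketched as a simple pipeline. In the sketch below each step is supplied as a callable; these callables and the function name are hypothetical placeholders, not interfaces defined by the disclosure.

```python
def generate_output_sound(analyse_scene, get_peripheral_pose, identify_indicated, output_sound):
    """Run the four steps of the method in order and return the indicated sources.

    analyse_scene():            returns located sound sources (step 2400)
    get_peripheral_pose():      returns (location, orientation) (step 2410)
    identify_indicated(s, p):   returns the indicated sources (step 2420)
    output_sound(indicated):    outputs the corresponding sound (step 2430)
    """
    sources = analyse_scene()                       # step 2400
    pose = get_peripheral_pose()                    # step 2410
    indicated = identify_indicated(sources, pose)   # step 2420
    output_sound(indicated)                         # step 2430
    return indicated
```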

Figure 25 schematically illustrates a process for analysing a scene, the process corresponding to step 2400 of Figure 24.

A step 2500 comprises locating sound sources in the environment. This may be the process of using metadata or the like to identify sound sources in a virtual environment, or using directional microphones or the like to identify sound sources in a real environment.

A step 2510 comprises locating objects in the environment. This may also be performed using metadata describing a virtual environment, or may comprise image processing or the like (such as edge detection) on images of a real or virtual environment so as to identify objects. Information may be generated identifying objects with locations that correspond to the located sound sources in particular, although objects not corresponding to sound sources may also be located.

Figure 26 schematically illustrates a process for obtaining peripheral use information, the process corresponding to step 2410 of Figure 24. This information may be used to identify the cone (or other indicator) that is defined by operation of the peripheral.

A step 2600 comprises obtaining the location of the peripheral. This may be achieved using sensor data, such as data from gyroscopes or accelerometers associated with the peripheral, or via image processing performed on images captured of the peripheral device when in use, for example.

A step 2610 comprises obtaining the orientation of the peripheral. As when obtaining the location of the peripheral, this may be achieved using sensor data, such as data from gyroscopes or accelerometers associated with the peripheral, or via image processing performed on images captured of the peripheral device when in use, for example.

A step 2620 comprises detecting inputs from the peripheral. This may include button presses, squeezes, or any other suitable inputs. These inputs may be provided to the peripheral identification unit 2010 (or a processing device comprising this unit) using a wireless or wired connection, for example.

Figure 27 schematically illustrates a process for identifying an indicated sound source, the process corresponding to step 2420 of Figure 24.

A step 2700 comprises the identification of a target sound source in the environment. This comprises using information about the location/orientation of the peripheral in conjunction with information about the location of sound sources in the environment in order to identify which of the one or more sound sources in the environment are being indicated by the user.

A step 2710 comprises the acquisition of the sounds corresponding to the identified target sound sources. This may comprise examining metadata in order to identify the location of the sounds on local storage or online, for example.

Variations of the techniques discussed above can be implemented within the scope of the appended claims. These include one or more of the following:

The selection techniques of Figure 14 can be combined with face recognition in the real-world environment to detect sound sources of interest around a room.

Within the selection area 1410 or near to it, analysis of tone or voice/accent can be carried out to select voices of interest (for example child, male, female) and disallow others.

As one example of using the system in reverse, the selection technique can be used to remove sounds, for example to selectively ‘mute’ specific people or sounds that are annoying to the user.

Within the ensemble of selected sounds, machine translation and/or automatic subtitling can be applied to the selected voices.

A visual indicator can be provided so that users know if they are being listened to by someone. This could be, for example, an icon, or the selection cone 1300 could be made visible to others. This could provide a safety aspect, in that users would know if someone is eavesdropping.

More generally, the system can be used “in reverse” by a user to choose which people are allowed to hear that user’s sounds. Such an arrangement will now be described with reference to Figures 28 and 29.

Figure 28 schematically represents an audio transmission system for transmitting audio in dependence upon the position and orientation of a peripheral associated with the audio transmission system. The system uses techniques corresponding to those discussed above, and comprises:

a virtual scene analysis unit 2800 operable to locate virtual sound receivers in a virtual environment, each virtual sound receiver being associable with a corresponding user;

a peripheral identification unit 2810 operable to detect the real-world location and orientation of the peripheral, and to map these to a virtual position and orientation in the virtual environment; and

a recipient identification unit 2820 operable to identify an indicated virtual sound receiver in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral, and to allow transmission of audio to that virtual sound receiver.

Figure 29 is a schematic flowchart illustrating an audio transmission method for transmitting audio in dependence upon the position and orientation of a peripheral associated with the audio transmission system, the method comprising:

locating (at a step 2900) virtual sound receivers in a virtual environment, each virtual sound receiver being associable with a corresponding user (which is to say, the virtual object, such as an avatar, can be (but does not have to be) associated with a user who can listen to the sound);

detecting (at a step 2910) the real-world location and orientation of the peripheral;

mapping (at a step 2920) the detected real-world location and orientation to a virtual position and orientation in the virtual environment;

identifying (at a step 2930) an indicated virtual sound receiver in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral; and

allowing (at a step 2940) transmission of audio to that virtual sound receiver.
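A minimal sketch of steps 2920 to 2940 is given below. It assumes that the mapping of the real-world location is a uniform scale and offset, that the pointing direction is used unchanged in the virtual environment, and that receivers are identified by an angular test about the pointing direction. These assumptions, together with the function names and the `send` callable, are for illustration only.

```python
import math


def map_to_virtual(real_position, scale=1.0, origin=(0.0, 0.0, 0.0)):
    """Step 2920 (position only): map a real-world position to a virtual position.

    Assumes a uniform scale and an offset to the virtual origin.
    """
    return tuple(o + scale * p for p, o in zip(real_position, origin))


def indicated_receivers(virtual_position, direction, receivers, half_angle_deg=15.0):
    """Step 2930: return the receivers lying within half_angle_deg of the pointing direction.

    receivers maps a user name to the virtual position of that user's sound receiver.
    """
    cos_limit = math.cos(math.radians(half_angle_deg))
    direction_length = math.sqrt(sum(c * c for c in direction))
    chosen = []
    for user, position in receivers.items():
        offset = [p - q for p, q in zip(position, virtual_position)]
        offset_length = math.sqrt(sum(c * c for c in offset))
        if offset_length == 0.0 or direction_length == 0.0:
            continue  # no meaningful angle can be computed
        cos_angle = sum(a * b for a, b in zip(offset, direction)) / (offset_length * direction_length)
        if cos_angle >= cos_limit:
            chosen.append(user)
    return chosen


def transmit(audio, allowed_receivers, send):
    """Step 2940: allow transmission of audio to each indicated receiver.

    send(user, audio) is a hypothetical transport callable supplied by the caller.
    """
    for user in allowed_receivers:
        send(user, audio)
```

In this sketch, audio is passed only to those receivers that are indicated by the peripheral; receivers outside the indicated region receive nothing.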

It will be appreciated that example embodiments can be implemented by computer software operating on a general purpose computing system such as a games machine. In these examples, computer software, which when executed by a computer, causes the computer to carry out any of the methods discussed above is considered as an embodiment of the present disclosure. Similarly, embodiments of the disclosure are provided by a non-transitory, machine-readable storage medium which stores such computer software.

It will also be apparent that numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the disclosure may be practised otherwise than as specifically described herein.

Claims

1. An audio generation system for generating audio in dependence upon the position and orientation of a peripheral associated with the audio generation system, the system comprising:

a virtual scene analysis unit operable to locate virtual sound sources in a virtual environment;

a peripheral identification unit operable to detect the real-world location and orientation of the peripheral, and to map these to a virtual position and orientation in the virtual environment; and

a sound identification unit operable to identify an indicated virtual sound source in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral, and to identify audio corresponding to that source.

2. A system according to claim 1, wherein the peripheral identification unit is operable to generate a visual representation of the mapped virtual position and orientation for display to a user.

3. A system according to claim 1, comprising a sound output unit operable to generate an audio output in dependence upon the audio identified by the sound identification unit.

4. A system according to claim 3, comprising a head-mountable display device having one or more audio transducers operable to provide the generated audio output to a user.

5. A system according to claim 1, wherein the sound identification unit is operable to use the mapped location and orientation of the peripheral to define a virtual volume projecting from the peripheral into the environment.

6. A system according to claim 5, wherein the sound identification unit is operable to generate a visual representation of the virtual volume for display to a user.

7. A system according to claim 5, wherein the sound identification unit is operable to identify indicated sound sources within this virtual volume.

8. A system according to claim 5, wherein:

the peripheral identification unit is operable to detect inputs from the peripheral; and

the sound identification unit is operable to use these inputs when defining the virtual volume.

9. A system according to claim 1, wherein:

the peripheral identification unit is operable to detect which hand a peripheral is held in by a user; and

the sound identification unit is operable to use this detection when identifying indicated sound sources.

10. A system according to claim 8, wherein the sound identification unit is operable to identify an indicated sound source as a sound source that is not to be used for audio generation, and to identify audio from other sound sources from the environment to be used for audio generation.

11. A system according to claim 1, wherein the scene analysis unit is operable to determine the status of an indicated sound source, the status being indicative of any restrictions on a user’s potential interactions with that sound source.

12. A system according to claim 1, wherein:

the scene analysis unit is operable to identify objects within the environment, and to identify a significance of identified objects in the environment; and

the sound identification unit is operable to identify indicated sound sources in dependence upon the significance of the objects.

13. A system according to claim 1, wherein:

the scene analysis unit is operable to identify corresponding sound sources and objects within the environment; and

the sound identification unit is operable to identify a sound source in dependence upon an indicated object, such that a sound source associated with the indicated object may be identified in the case that the sound source is not indicated directly.

14. An audio generation method for generating audio in dependence upon the position and orientation of a peripheral associated with an audio generating system, comprising:

locating virtual sound sources in a virtual environment;

detecting the real-world location and orientation of the peripheral;

mapping the real-world location and orientation of the peripheral to a virtual position and orientation in the virtual environment;

identifying an indicated virtual sound source in the virtual environment in dependence upon the mapped location and orientation of the peripheral; and

identifying audio corresponding to the identified source.

15. An audio transmission system for transmitting audio in dependence upon the position and orientation of a peripheral associated with the audio transmission system, the system comprising:

a virtual scene analysis unit operable to locate virtual sound receivers in a virtual environment, each virtual sound receiver being associable with a corresponding user;

a peripheral identification unit operable to detect the real-world location and orientation of the peripheral, and to map these to a virtual position and orientation in the virtual environment; and

a recipient identification unit operable to identify an indicated virtual sound receiver in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral, and to allow transmission of audio to that virtual sound receiver.

16. An audio transmission method for transmitting audio in dependence upon the position and orientation of a peripheral associated with the audio transmission system, the method comprising:

locating virtual sound receivers in a virtual environment, each virtual sound receiver being associable with a corresponding user;

detecting the real-world location and orientation of the peripheral;

mapping the detected real-world location and orientation to a virtual position and orientation in the virtual environment;

identifying an indicated virtual sound receiver in the virtual environment in dependence upon the mapped virtual location and orientation of the peripheral; and

allowing transmission of audio to that virtual sound receiver.

17. Computer software which, when executed by a computer, causes the computer to execute the method of claim 14 or claim 16.