WO2021206734A1 - 3d sound reconstruction using head related transfer functions with wearable devices - Google Patents

3d sound reconstruction using head related transfer functions with wearable devices

Info

Publication number
WO2021206734A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
head
audio
audio signals
position coordinates
Prior art date
Application number
PCT/US2020/027743
Other languages
French (fr)
Inventor
Mithra VANKIPURAM
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2020/027743 priority Critical patent/WO2021206734A1/en
Publication of WO2021206734A1 publication Critical patent/WO2021206734A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S7/00Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30Control circuits for electronic adaptation of the sound field
    • H04S7/302Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303Tracking of listener position or orientation
    • H04S7/304For headphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/10Earpieces; Attachments therefor ; Earphones; Monophonic headphones
    • H04R1/1041Mechanical or electronic switches, or control elements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00Stereophonic arrangements
    • H04R5/033Headphones for stereophonic communication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2420/00Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Stereophonic System (AREA)

Abstract

Systems, devices, and methods may include determining a baseline position of a head of a user and monitoring movement of the head of the user from the baseline position. Head-related transfer functions (HRTFs) may be generated to provide audio signals enabling three-dimensional (3D) sound to the user through an audio device.

Description

3D Sound Reconstruction Using Head Related Transfer Functions With Wearable Devices
BACKGROUND
[0001] Head-related transfer functions (HRTFs) may be used to provide three-dimensional (3D) sound reconstruction to a user. For example, two HRTFs may be used to synthesize a binaural sound perceived as originating from a particular point in space.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The written disclosure herein describes illustrative examples that are nonlimiting and non-exhaustive. Reference is made to several of such illustrative examples that are depicted in the figures described below.
[0003] FIG. 1 illustrates an example block diagram of a system for implementing head-related transfer functions (HRTFs) with a wearable device.
[0004] FIG. 2 illustrates a simplified view of devices for determining positions of one or more portions of a user.
[0005] FIG. 3 illustrates a flowchart of a method to implement one or more HRTFs with a wearable device to provide a user with three-dimensional (3D) audio.
[0006] FIG. 4 illustrates an example block diagram of a wearable device.
DETAILED DESCRIPTION
[0007] Systems, devices, and methods used to detect and update head-related transfer functions (HRTFs) with a wearable device are disclosed herein. Head-related transfer functions are used to produce perceived directional audio (e.g., three-dimensional (3D) sound or sound reconstruction) that provides a user with an enhanced listening experience. In some examples, 3D sound reconstruction may provide instructions (e.g., navigational instructions and/or prompts) that are perceived by the user as emanating from a specific point in space.
[0008] A system may utilize the position and orientation of a user’s head (e.g., the local position of the head) to provide sounds that are perceived by the user as emanating from a particular point or area (e.g., a geographically fixed point or area). Where navigational instructions are being implemented, the user’s location (e.g., the global position of the user) may be used with the local position of the user’s head to determine how the 3D sound is provided to the user.
[0009] Directional audio may be implemented with a wearable device or devices (e.g., headphones, jewelry, a pin, glasses, another tracking device, etc.) that are coupled to a portion of the user (e.g., the user’s head). For example, the wearable device may include a position device for determining and/or tracking the position of the user’s head in one or more axes of movement (e.g., rotational and/or translational movement). The position device may comprise one or more features for motion or position tracking, such as, for example, an inertial measurement unit (IMU) with one or more accelerometers, gyroscopes, magnetometers, pressure sensors, and/or temperature sensors. For example, the position device may measure rotational movement of the user’s head along three axes of movement (e.g., pitch, roll, and yaw).
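By way of a minimal illustrative sketch (assuming a Python environment, a hypothetical 100 Hz gyroscope stream, and naive integration rather than any particular position device's implementation), the rotational tracking described above could be approximated by accumulating the IMU's angular rates into pitch, roll, and yaw angles:

    import math

    DT = 1.0 / 100.0  # assumed gyroscope sample rate of 100 Hz (hypothetical)

    class HeadOrientation:
        """Tracks head rotation by integrating gyroscope angular rates."""

        def __init__(self):
            self.pitch = 0.0  # radians
            self.roll = 0.0
            self.yaw = 0.0

        def update(self, gx, gy, gz, dt=DT):
            # gx, gy, gz: angular rates (rad/s) about the pitch, roll, and yaw axes
            self.pitch += gx * dt
            self.roll += gy * dt
            self.yaw += gz * dt
            # wrap yaw into [-pi, pi) for convenience
            self.yaw = (self.yaw + math.pi) % (2 * math.pi) - math.pi
            return self.pitch, self.roll, self.yaw

Integration of this kind drifts over time, which is one reason the examples below pair it with intermittent captures of a known head position.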
[0010] In some examples, the position device may measure translation (e.g., displacement) along one or more of three axes. For example, the position device may track x-axis movement, y-axis movement, z-axis movement, and/or combinations thereof. In some examples, x-axis position and z-axis position on a surface or plane may be determined by satellite (e.g., via global positioning systems (GPS)) and y-axis position may be determined by one or more of satellite signals, pressure sensors, and/or temperature sensors.
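As an illustrative sketch of the y-axis case (the constant reference pressure and the simple model are assumptions, not details of the position device), a pressure reading could be mapped to an approximate vertical position with the international barometric formula:

    def pressure_to_altitude_m(pressure_pa, sea_level_pa=101325.0):
        """Estimate vertical (y-axis) position from a barometric pressure
        reading; sea_level_pa would normally be calibrated against a known
        local reference rather than the standard-atmosphere constant."""
        return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))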
[0011] In some examples, the wearable device may provide the audio signals to the user. For example, headphones (e.g., over-ear headphones, in-ear headphones, on-ear headphones, one or more earbuds, etc.) may include an integrated IMU in one or both sides of the headphones (e.g., in the left and/or right portion at each respective ear of the user).
[0012] Where examples of the present disclosure are implemented with audio devices that move at least partially in unison with the user, determining and/or tracking movement of the user (e.g., the user's head) may assist in providing the directional audio to the user. For example, as the user turns their head, audio intended to come from a select direction (e.g., a fixed point) may no longer be perceived correctly by the user because the user has reoriented their own local position. Utilization of a position device may further refine the user's experience and at least partially ensure sounds are provided from the intended direction.
[0013] For example, where intuitive navigation using 3D audio is implemented, the audio may be provided at certain locations as perceived by the user. Navigation systems may use auditory feedback to provide information to a user (e.g., through audio systems in automobiles or via headphones in communication with a mobile device). In some examples, a system may use 3D spatial audio to provide navigational feedback in the direction in which the user is guided. For example, the sound may appear to come from the destination or from the location where the user should turn. In some examples, portions of the audio may be modulated to enhance audio effects and provide the navigational information at a more intuitive level for users. For example, the volume of the audio may be modulated while directions are being provided (e.g., the volume may be increased as the user approaches a direction change or a target of the navigation). By way of further example, other audio modulation may be implemented, such as altering audio levels in a mix. For example, when the user is also listening to music, the volume of the music as perceived in one location may be decreased while the audio volume of the directional instructions in the same perceived location may be increased. Such manipulation of the audio may be used to provide navigational information and instructions at a relatively more intuitive level for users.
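A minimal sketch of such modulation, assuming a simple linear ramp and illustrative distances that are not prescribed by the examples above, might raise the navigation prompt and duck the music as the user nears a turn:

    def navigation_gains(distance_to_turn_m, ramp_start_m=200.0, full_volume_m=20.0):
        """Return (prompt_gain, music_gain) as the user approaches a turn.

        The prompt gain rises and the music perceived at the same location is
        ducked; the ramp shape and distances are illustrative assumptions.
        """
        if distance_to_turn_m >= ramp_start_m:
            t = 0.0
        elif distance_to_turn_m <= full_volume_m:
            t = 1.0
        else:
            t = (ramp_start_m - distance_to_turn_m) / (ramp_start_m - full_volume_m)
        prompt_gain = 0.4 + 0.6 * t   # ramps from 0.4 up to 1.0
        music_gain = 1.0 - 0.7 * t    # ducks from 1.0 down to 0.3
        return prompt_gain, music_gain

For instance, navigation_gains(50.0) returns a prompt gain of about 0.9 and a music gain of about 0.42, so the directional instruction dominates the mix as the direction change approaches.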
[0014] In order to provide 3D audio, the position of the user and/or the position of the device providing the audio may be used to effectively process the audio (e.g., with one or more HRTFs). For example, providing the 3D audio may include aligning the user's head (e.g., in a known manner) with a frame of reference of a mobile device providing the navigation (e.g., a mobile phone or computer, a navigational system of a vehicle, etc.). The audio may not be efficiently provided as one or both of the user or the mobile device continue to move and reorient relative to one another.
[0015] While examples of the instant disclosure discuss the use of 3D audio for navigation instruction by way of example, other examples may include other types of audio provided to the user (e.g., directional music, surround sound, sound effects, other audio feedback, etc.). For example, a surround sound experience may be provided for one or more users (e.g., wearing headphones) where the position of the user may be determined and/or monitored according to examples of the present disclosure (e.g., a position of the user's head in relation to a screen or another reference point).
[0016] As discussed above, movement data for the HRTFs may be provided from sensors of the position device in the wearable device (e.g., via embedded accelerometers, gyroscopes, and/or other sensors) and/or from other devices monitoring the user (e.g., a camera). The sensors may detect relative acceleration and/or rotations of the head to enable local orientation of the user's head. As discussed below in more detail, the local coordinates from the user's head may be converted to global coordinates with another device (e.g., another device that is in communication with and/or controlling the position device monitoring the movement of the user's head).
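As a sketch of that conversion, restricted to yaw for brevity (the function and its inputs are illustrative assumptions rather than a prescribed interface), the head orientation tracked in the local frame could be expressed in global coordinates by composing it with the heading of the device that observed it:

    import math

    def head_yaw_global(head_yaw_rel_device_rad, device_heading_global_rad):
        """Compose the head yaw observed relative to another device with that
        device's global (e.g., compass-derived) heading to express the head
        orientation in global coordinates. A yaw-only composition is an
        illustrative simplification."""
        yaw = head_yaw_rel_device_rad + device_heading_global_rad
        return (yaw + math.pi) % (2 * math.pi) - math.pi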
[0017] In some examples, a mobile device (e.g., a mobile phone, a computer of a vehicle, other handheld devices, etc.) may be used with the position device of the wearable device where data from the position device is provided to the mobile device. The mobile device, or another connected device, may use the position data to provide the HRTFs and generate the 3D audio that is provided back to the user.
In tracking the position of the user's head, the mobile device may act to detect (e.g., intermittently detect) the position of the user's head with a method other than the sensors in the position device. For example, the mobile device may use cameras to capture the orientation of the user's head to establish a known position of the user's head (e.g., a baseline position, an updated verified position, an initial position). The position of the user's head is considered "known" because the camera captures the actual position for a given time or time frame. The system may then rely on motion tracking provided by the position device to estimate the position of the user's head between the known captures as the head deviates from the detected known position.
[0018] Capture of the user's head may take place intermittently (e.g., at scheduled intervals and/or when the user's head is detected within view of a forward-facing or screen-side camera) or may be maintained continuously as long as the user's head is in view of the camera. For example, the mobile device may capture the position of the user's head every time the user looks at the device (e.g., every time the user checks their mobile phone).
[0019] In some examples, the mobile device may use one or more systems of the device to further correlate the detected position of the user's head. For example, the mobile device may detect the current global location of the user (e.g., via a global positioning system (GPS) and/or data from a compass of the mobile device) to set the global coordinates of the head. By way of further example, the mobile device may employ a local position device or sensors, such as those discussed above (e.g., an IMU of the mobile device), to determine the orientation of the mobile device. The mobile device may combine those local coordinates with the local coordinates detected from the user's head, and then correlate those local coordinates with the detected GPS position. After establishing a known position of the user's head, the mobile device may start receiving accelerometer and gyroscope readings from the IMU in the position device to update relative changes to the user's HRTFs.
However, as such readings may be prone to errors and/or noise over time, additional (e.g., subsequent) readings of the known position of the user's head may be performed to update the position of the user's head. Each known position capture may enable recalibration of the position of the user's head to global coordinates when the user looks at the mobile device, and the recalibration can be used to more effectively generate the HRTFs.
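A minimal sketch of that recalibration loop, again restricted to yaw and using hypothetical method names, could dead-reckon between captures and reset the estimate whenever a known position is observed:

    class DriftCorrectedYaw:
        """Integrates gyroscope yaw rate and resets to camera-observed yaw.

        The IMU-only estimate accumulates error over time; whenever the mobile
        device captures the head (e.g., the user looks at the phone), the
        observed yaw replaces the integrated value as an updated baseline.
        """

        def __init__(self, initial_yaw_rad=0.0):
            self.yaw = initial_yaw_rad

        def on_gyro(self, yaw_rate_rad_s, dt):
            # dead reckoning between known captures
            self.yaw += yaw_rate_rad_s * dt
            return self.yaw

        def on_camera_capture(self, observed_yaw_rad):
            # recalibrate to the known (observed) position; a real system might
            # blend estimates rather than hard-reset
            self.yaw = observed_yaw_rad
            return self.yaw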
[0020] As noted above, in some examples, the mobile device may be a device other than a mobile phone. For example, the computer system in an automobile may be utilized, where one or more cameras, positioned in locations toward which the user's face is typically directed in the automobile, are used to capture the position of the user's face.
[0021] FIG. 1 illustrates an example block diagram of a system 100 for implementing HRTFs with a wearable device where, as discussed above, known positions of the user's head may be provided (e.g., intermittently provided) to enhance the ability of the system 100 to track the user's head in 3D space. As shown in FIG. 1, the system 100 may include multiple devices in communication with each other, where each device includes a number of subsystems. For example, the system 100 may include a first device (e.g., a mobile device 102) and a second device (e.g., a wearable device 104), where the mobile device 102 receives data from the wearable device 104 and may, optionally, transmit data to the wearable device 104 (e.g., instructions, status queries, etc.).
[0022] The mobile device 102 may include a position capture subsystem 106, a local coordinate tracking subsystem 108, a global tracking subsystem 110 (e.g., a global positioning system (GPS)), and a communications subsystem 112. In some examples, the mobile device 102 may include one or more of a processor 114, memory 116, and a network interface 118 (e.g., to enable wired and/or wireless communications with other devices) connected to a computer-readable storage medium 120 (e.g., a non-transitory computer-readable storage medium) via a communication bus 122. The processor 114 may execute or otherwise process instructions stored in the computer-readable storage medium 120. The instructions stored in the computer-readable storage medium 120 include operational modules 106 through 112 to implement the subsystems described herein.
[0023] The wearable device 104 may include a local coordinate tracking subsystem 124, a communications subsystem 126, and an audio subsystem 128.
In some examples, the wearable device 104 may include one or more of a processor 130, memory 132, and a network interface 134 (e.g., to enable wired and/or wireless communications with other devices) connected to a computer-readable storage medium 136 (e.g., a non-transitory computer-readable storage medium) via a communication bus 138. The processor 130 may execute or otherwise process instructions stored in the computer-readable storage medium 136. The instructions stored in the computer-readable storage medium 136 include operational modules 124 through 128 to implement the subsystems described herein.
[0024] One or both of the mobile device 102 and the wearable device 104 may include one or more position devices for tracking position and/or movements of the user and/or associated devices (e.g., respective inertial measurement units (IMU) 140, 142 of the mobile device 102 and the wearable device 104).
[0025] In some examples, the wearable device may include one or more integrated audio devices 144 (e.g., where the wearable device 104 comprises headphones, one or more earbuds, etc.). In some examples, the audio device may be provided separately from the wearable device 104.
[0026] FIG. 2 illustrates a simplified view of the devices (e.g., the mobile device 102 and the wearable device 104) for determining positions (e.g., local and global positions) of one or more portions of a user 200 in order to provide 3D audio to the user 200. As shown in FIG. 2, each of the mobile device 102 and the wearable device 104 may be independently positioned and/or tracked in respective 3D coordinate systems (e.g., relative to local coordinates on an x-axis, a y-axis, and a z-axis). For example, the position of the mobile device 102 may be tracked with local coordinate system 202 and the wearable device 104 (e.g., positioned on the head of the user 200) may be tracked with local coordinate system 204. Both the local coordinate systems 202, 204 will be oriented in 3D space relative to a global coordinate system 206 (e.g., a GPS position system provided by data from multiple satellites), which may be defined on a surface 208 over which both the mobile device 102 and the wearable device 104 are positioned.
[0027] Referring to FIGS. 1 and 2, the position capture subsystem 106 of the mobile device 102 may include an imaging or optical device (e.g., camera 146) used to capture a position of at least a portion of the user 200 or a device worn by the user 200. For example, when the head of the user 200 is in view of the camera 146, the camera 146 may image (e.g., automatically image without further intervention from the user 200) the head of the user 200 to determine the local coordinates of the head of the user 200 on the local coordinate system 204.
[0028] In order to determine the position of the head of the user 200 relative to the mobile device 102 and/or a global position of the user 200, the local coordinate tracking subsystem 108 and the global tracking subsystem 110 may reconcile the local coordinates of the mobile device 102 on the local coordinate system 202 and the local coordinates of the wearable device 104 on the local coordinate system 204. For example, the local coordinates of the mobile device 102 and the wearable device 104 may be reconciled to each other relative to a selected orientation or may both be transformed onto the global coordinate system 206. In some examples, the local coordinate tracking subsystem 108 of the mobile device 102 may include the IMU 140 of the mobile device 102 that tracks the local orientation of the mobile device 102.
[0029] The resulting coordinates may be utilized to provide a known (e.g., observed) position of the head of the user 200 in 3D space. The communications subsystem 112 may receive additional data relating to movement of the head of the user 200 from the known position provided by the mobile device 102. For example, the local coordinate tracking subsystem 124 of the wearable device 104 (e.g., including IMU 142) may track movements of the head of the user 200 and send that data to the mobile device 102 via the communications subsystem 126. The tracked movements of the user 200 may be used to infer subsequent positions of the head of the user 200 based on the movement data from the IMU 142 as the head moves from (e.g., deviates from) the known position.
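As an illustrative sketch of the reconciliation, once the head's global position and heading are known, a geographically fixed point can be expressed as an offset in the head's local frame; the yaw-only rotation and the (east, up, north) axis convention below are simplifying assumptions:

    import numpy as np

    def global_to_head_local(point_global, head_pos_global, head_yaw_rad):
        """Express a geographically fixed point as an offset in the head's
        local frame, given the head's global position and yaw (heading).
        Axes are assumed to be (east, up, north); a full implementation would
        apply the complete pitch/roll/yaw rotation."""
        offset = np.asarray(point_global, float) - np.asarray(head_pos_global, float)
        c, s = np.cos(head_yaw_rad), np.sin(head_yaw_rad)
        # rotation about the vertical (up) axis that undoes the head yaw;
        # the exact sign depends on the chosen yaw convention
        rot = np.array([[c, 0.0, -s],
                        [0.0, 1.0, 0.0],
                        [s, 0.0, c]])
        return rot @ offset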
[0030] As the position of the head of the user 200 changes, a device of the system 100 (e.g., the mobile device 102, the wearable device 104) may continue to provide (e.g., calculate, update) HRTFs in order to generate the 3D audio that is provided back to the user 200 via the audio subsystem 128. For example, each iterative position of the user's head may be provided at selected intervals to update the HRTFs such that the perceived location of the audio remains substantially constant even when the user 200 continues to alter the position of their head.
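A minimal sketch of the rendering step, assuming a head-related impulse response (HRIR) pair has already been selected for the current head-relative direction (the lookup itself is not shown), convolves the mono source with the left and right responses:

    import numpy as np

    def render_binaural(mono, hrir_left, hrir_right):
        """Convolve a mono signal with a left/right HRIR pair of equal length
        to produce a two-channel signal perceived as arriving from the
        corresponding direction. Inputs are 1-D sample arrays at a common
        sample rate."""
        left = np.convolve(mono, hrir_left)
        right = np.convolve(mono, hrir_right)
        stereo = np.stack([left, right], axis=-1)
        peak = float(np.max(np.abs(stereo)))
        # normalise to avoid clipping after convolution
        return stereo / peak if peak > 0.0 else stereo

Re-selecting the HRIR pair as updated head positions arrive is what keeps the perceived source location substantially constant while the head moves.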
[0031] As discussed above, this process may continue in a cyclical manner where known positions of the head are provided (e.g., intermittently provided) to update the HRTFs and ensure that the correct position of the head is being used in the providing of the 3D audio.
[0032] FIG. 3 illustrates a flowchart 300 of a method of implementing one or more HRTFs with a wearable device to provide a user with three-dimensional (3D) audio.
In some examples, the systems and respective devices (e.g., the mobile device 102 and the wearable device 104) may be similar to those examples illustrated in and described with reference to FIGS. 1 and 2.
[0033] At 302, a baseline position of a head of a user may be determined with a first device (e.g., the mobile device 102 (FIGS. 1 and 2)) capable of viewing the head of the user.
[0034] At 304, movement of the head of the user departing from the baseline position may be monitored with a second device (e.g., the wearable device 104 (FIGS. 1 and 2)) in communication with the first device to determine one or more additional positions of the head of the user. For example, a motion-tracking device of the wearable device (e.g., comprising headphones) may track movement of the user’s head and the position of the user’s head may be determined from the motion tracking.
[0035] At 306, head-related transfer functions (HRTFs) may be generated with the baseline position and the one or more additional positions of the head of the user.
[0036] At 308, audio signals enabling three-dimensional (3D) sound reconstruction based on the HRTFs may be provided to the user through an audio device.
[0037] In order to at least partially ensure correct orientation of the 3D audio, each new position of the head may be used in an HRTF and the audio signals may be updated accordingly. For example, at 310, the baseline position of the head of the user may be updated (e.g., intermittently updated) with the first device. The baseline position may comprise an image of the head providing a known position of the head. At 312, subsequent movement of the head of the user may be monitored from the updated baseline position with the second device to determine further additional positions of the head of the user.
[0038] In some examples, 306 and 308 may repeat as each additional position of the head of the user is used in the HRTFs and associated 3D audio is generated and provided to the user.
[0039] In some examples, 310 and 312 may repeat as the user continues to move their head and yet additional positions are detected directly and/or inferred through detection of movement.
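Tying the flowchart together, a high-level sketch of blocks 302 through 312 might look as follows; the mobile, wearable, and audio_out objects and every method called on them are hypothetical stand-ins for the subsystems described with reference to FIGS. 1 and 2, not an interface defined by this disclosure:

    def run_3d_audio_loop(mobile, wearable, audio_out, source_global):
        """High-level sketch of FIG. 3 (302-312) with hypothetical subsystem
        methods: camera-based baseline capture, IMU motion tracking, HRTF
        generation, and 3D audio output."""
        head_pose = mobile.capture_head_baseline()                     # 302
        while True:
            motion = wearable.read_imu()                               # 304 / 312
            head_pose = head_pose.advanced_by(motion)                  # dead reckoning
            if mobile.head_in_camera_view():
                head_pose = mobile.capture_head_baseline()             # 310: update baseline
            hrtfs = mobile.generate_hrtfs(head_pose, source_global)    # 306
            audio_out.play(mobile.render_audio(hrtfs))                 # 308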
[0040] FIG. 4 illustrates an example block diagram of a wearable device 400, which may be similar to those discussed above. As shown in FIG. 4, the wearable device 400 may include a motion tracking subsystem 402, a communications subsystem 404, and an audio subsystem 406. The wearable device 400 may include an external interface 408 to enable communications with another device (e.g., the mobile device discussed above).
[0041] The motion tracking subsystem 402 may track a position of a head of a user. The communications subsystem 404 may receive audio signals and communicate position data from the motion tracking subsystem 402 to an external device (e.g., a mobile device) via the external device interface 408. The position data may update the position of the head of the user from an initial position of the head of the user that is determined by the external device. The audio subsystem 406 (e.g., a three-dimensional (3D) audio subsystem) may receive the audio signals and transmit the audio signals to the user. The audio signals may provide three-dimensional (3D) sound based on head-related transfer functions (HRTFs) derived from the position data from the motion tracking subsystem 402.
[0042] Specific examples of the disclosure are described above and illustrated in the figures. It is, however, appreciated that many adaptations and modifications could be made to the specific configurations and components detailed above. In some cases, well-known features, structures, and/or operations are not shown or described in detail. Furthermore, the described features, structures, or operations may be combined in any suitable manner. It is also appreciated that the components of the examples as generally described, and as described in conjunction with the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, all feasible permutations and combinations of examples are contemplated. Furthermore, it is appreciated that changes may be made to the details of the above-described examples without departing from the underlying principles thereof.
[0043] In the description above, various features are sometimes grouped together in a single example, figure, or description thereof for the purpose of streamlining the disclosure. This method of disclosure, however, is not to be interpreted as reflecting an intention that any claim now presented or presented in the future requires more features than those expressly recited in that claim. Rather, it is appreciated that inventive aspects lie in a combination of fewer than all features of any single foregoing disclosed example. The claims are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate example. This disclosure includes all permutations and combinations of the independent claims with their dependent claims.

Claims

What is claimed is:
1. A method comprising: determining a baseline position of a head of a user with a first device capable of viewing the head of the user; monitoring movement of the head of the user from the baseline position with a second device in communication with the first device to determine an additional position of the head of the user; generating head-related transfer functions (HRTFs) with the baseline position and the additional position of the head of the user; and providing audio signals enabling three-dimensional (3D) sound reconstruction based on the head-related transfer functions (HRTFs) to the user through an audio device.
2. The method of claim 1, further comprising: updating the baseline position of the head of the user with the first device; and monitoring subsequent movement of the head of the user from the updated baseline position with the second device to determine further additional positions of the head of the user.
3. The method of claim 1, wherein the second device comprises the audio device.
4. The method of claim 1, wherein providing the audio signals comprises emitting the audio signals through the audio device comprising a set of headphones worn by the user proximate ears of the user.
5. The method of claim 1, further comprising capturing an image of the head of the user with a camera of the first device to determine the baseline position.
6. The method of claim 5, further comprising automatically capturing the image of the head of the user with a camera of the first device comprising a mobile device to determine the baseline position when the mobile device detects that the head of the user is in view of the camera.
7. The method of claim 1, wherein determining the baseline position comprises: determining first local position coordinates of the first device with the first device; recording second local position coordinates of the second device with the first device; and combining the first local position coordinates and the second local position coordinates to determine the baseline position.
8. The method of claim 7, further comprising: determining global position coordinates with the first device; and aligning the baseline position with the global position coordinates.
9. The method of claim 1, further comprising tracking the movement of the head of the user with an inertial measurement unit (IMU) of the second device.
10. The method of claim 1, further comprising delivering the audio signals to the user as a navigational direction in a perceived direction of an end target of the navigational direction via the head related transfer functions (HRTFs).
11. The method of claim 10, further comprising modulating a volume of the audio signals as the user approaches the end target of the navigational direction via the head related transfer functions (HRTFs).
12. A non-transitory computer-readable medium with instructions stored thereon that, when executed by a processor of a computing device, cause the computing device to: establish a first set of local position coordinates of a head of a user with a first device capable of viewing the head of the user; receive data relating to subsequent movement of the head of the user from the first set of local position coordinates from a second device in communication with the first device to determine a subsequent set of local position coordinates of the head of the user; generate head-related transfer functions (HRTFs) by combining the first set of local position coordinates and the subsequent set of local position coordinates; and provide audio signals with three-dimensional (3D) sound based on the head-related transfer functions (HRTFs) to the user through an audio device.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions further cause the computing device to: establish a second set of local position coordinates of the head of the user with the first device; and monitor additional movement of the head of the user from the second set of local position coordinates with the second device to determine additional sets of local position coordinates of the head of the user.
14. A wearable device comprising: a motion tracking subsystem comprising an inertial measurement unit (IMU) to track a position of a head of a user; a communications subsystem to receive audio signals and to communicate position data from the motion tracking subsystem to an external device, the position data to update the position of the head of the user from an initial position of the head of the user determined by the external device; and a three-dimensional (3D) audio subsystem to receive the audio signals and to transmit the audio signals to the user, the audio signals to provide three-dimensional (3D) sound based on head-related transfer functions (HRTFs) derived from the position data from the motion tracking subsystem.
15. The wearable device of claim 14, wherein the three-dimensional (3D) audio subsystem is further to receive the audio signals comprising navigational directions, the navigational directions being modified by the head-related transfer functions (HRTFs) to provide the navigational directions to the user in a perceived direction of an end target of the navigational directions.
PCT/US2020/027743 2020-04-10 2020-04-10 3d sound reconstruction using head related transfer functions with wearable devices WO2021206734A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2020/027743 WO2021206734A1 (en) 2020-04-10 2020-04-10 3d sound reconstruction using head related transfer functions with wearable devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2020/027743 WO2021206734A1 (en) 2020-04-10 2020-04-10 3d sound reconstruction using head related transfer functions with wearable devices

Publications (1)

Publication Number Publication Date
WO2021206734A1 true WO2021206734A1 (en) 2021-10-14

Family

ID=78023533

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/027743 WO2021206734A1 (en) 2020-04-10 2020-04-10 3d sound reconstruction using head related transfer functions with wearable devices

Country Status (1)

Country Link
WO (1) WO2021206734A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013052132A2 (en) * 2011-10-03 2013-04-11 Qualcomm Incorporated Image-based head position tracking method and system
US20150189423A1 (en) * 2012-07-13 2015-07-02 Razer (Asia-Pacific) Pte. Ltd. Audio signal output device and method of processing an audio signal
US20160131761A1 (en) * 2014-11-10 2016-05-12 Valve Corporation Positional tracking systems and methods
US20170295446A1 (en) * 2016-04-08 2017-10-12 Qualcomm Incorporated Spatialized audio output based on predicted position data


Legal Events

Code and Description
• Code 121 (Ep): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20930131; Country of ref document: EP; Kind code of ref document: A1
• Code NENP: Non-entry into the national phase. Ref country code: DE
• Code 122 (Ep): PCT application non-entry in European phase. Ref document number: 20930131; Country of ref document: EP; Kind code of ref document: A1