EP4240026A1 - Audio rendering - Google Patents

Audio rendering

Info

Publication number
EP4240026A1
Authority
EP
European Patent Office
Prior art keywords
angular direction
audio
audio object
listener
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22159713.1A
Other languages
German (de)
English (en)
Inventor
Miikka Tapani Vilermo
Lasse Juhani Laaksonen
Arto Juhani Lehtiniemi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to EP22159713.1A priority Critical patent/EP4240026A1/fr
Publication of EP4240026A1 publication Critical patent/EP4240026A1/fr
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30: Control circuits for electronic adaptation of the sound field
    • H04S 7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S 7/303: Tracking of listener position or orientation
    • H04S 2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S 2400/15: Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • Examples of the disclosure relate to audio rendering. Some relate to reducing processing requirements for spatial audio rendering.
  • Spatial audio enables spatial properties of a sound scene to be reproduced for a user so that the user can perceive the spatial properties. This can provide an immersive audio experience for a user or could be used for other applications. Delays or latencies within the processing of the spatial audio can reduce the quality of the audio experience for a listener.
  • an encoder apparatus comprising means for: providing an audio signal comprising at least one audio object, wherein the at least one audio object is located at a first angular direction and is rendered to a second angular direction that is different to the first angular direction; and providing metadata indicative of the first angular direction of the at least one audio object.
  • the second angular direction may be independent of a head orientation of a listener.
  • the second angular direction may be a predefined angular direction.
  • the second angular direction may be ninety degrees relative to a reference.
  • the second angular direction may be less than ninety degrees relative to a reference.
  • the reference may comprise the first angular direction.
  • a computer program comprising computer program instructions that, when executed by processing circuitry, cause the operations of the encoder apparatus described above to be performed.
  • a decoder apparatus comprising means for: obtaining an audio signal comprising at least one audio object, wherein the at least one audio object is located at a first angular direction but is rendered to a second angular direction; obtaining metadata indicative of the first angular direction of the at least one audio object; determining a head orientation of a listener; and generating a rendering of the at least one audio object for the head orientation of the listener using the rendering to the second angular direction and a mapping of the at least one audio object to a third angular direction.
  • the mapping of the at least one audio object to a third angular direction may comprise rendering the at least one audio object to a third angular direction wherein the third angular direction is at a predetermined angular position relative to the second angular direction.
  • the generating of the rendering of the at least one audio object for head orientation of the listener may comprise mixing of the rendering of the at least one audio object to the second angular direction and the rendering of the audio object to the third angular direction.
  • the mixing may be weighted based on the head orientation of the listener.
  • the generating of the rendering of the at least one audio object for head orientation of the listener may be performed in the time domain.
  • a computer program comprising computer program instructions that, when executed by processing circuitry, cause the operations of the decoder apparatus described above to be performed.
  • an apparatus comprising means for: determining a field of view for at least a first listener and a second listener; determining whether an audio object is located within the field of view; and rendering the audio object for the first listener and the second listener, wherein the rendering is based on whether or not the audio object is within the field of view.
  • the rendering of the audio object for the second listener may be generated based on the rendering of the audio object for the first listener.
  • the rendering of the audio object for the second listener may be generated by mapping the rendering of the audio object for the first listener to a different angular orientation.
  • the rendering may comprise at least one of: binaural rendering, stereo rendering.
  • the field of view may be determined based on a device configured to be viewed by the first listener and the second listener.
  • Spatial audio enables spatial properties of a sound scene to be reproduced for a user so that the user can perceive the spatial properties of the original sound scene.
  • When the head position of the listener changes, the rendering of the spatial audio has to be adjusted to take into account the changes in the head position of the listener. This adjustment has to take place in real time. If there are significant latencies in the processing of the spatial audio this can lead to delays that are perceptible to the listener. These delays can reduce the audio quality and make the spatial audio sound unrealistic. If there is a significant delay, a listener in a mediated reality environment or other spatial audio application could have difficulties in determining which audio corresponds to a visual object. Examples of the disclosure make use of symmetries and other properties of the sound scenes to reduce the processing requirements for providing spatial audio and so reduce the effects of these latencies.
  • Figs. 1A to 1E show symmetrical relationships between different audio signals. Examples of the disclosure can make use of these relationships so as to reduce the processing required for rendering spatial audio.
  • Fig. 1A shows a listener 101 and an audio object 103.
  • the listener 101 is listening to spatial audio using a headset 105 or other suitable audio device.
  • the headset 105 can be configured to provide binaural audio to the listener 101.
  • the headset 105 provides a right signal R to the right ear of the listener 101 and a left signal L to the left ear of the listener 101.
  • the right signal R and the left signal L can comprise binaural signals or any other suitable type of signals.
  • the audio object 103 is located in front of the listener 101 and slightly to the right of the listener 101.
  • the right signal R and the left signal L can be rendered so that this location of the audio object 103 can be perceived by the listener 101.
  • Fig. 1B shows another arrangement for the listener 101 and the audio object 103. This arrangement has a symmetrical relationship to the arrangement that is shown in Fig. 1A . In the arrangement of Fig. 1B the audio object 103 has been reflected about the dashed line. The audio object 103 is now positioned in front of the listener 101 and to the left of the listener 101 rather than the right of the listener 101.
  • the headset 105 provides a right signal R' to the right ear of the listener 101 and a left signal L' to the left ear of the listener 101.
  • the right signal R' and the left signal L' can comprise binaural signals or any other suitable type of signals.
  • the audio scenes represented in Figs. 1A and 1B are mirror images of each other. This means that the first left signal L is the same as the second right signal R' and the first right signal R is the same as the second left signal L'.
  • If the signals L and R are available then the signals R' and L' can be obtained from the original signals L and R.
  • the audio scene shown in Fig. 1B can be recreated by swapping over the signals used for the audio scene in Fig. 1A .
  • Fig. 1C shows another arrangement for a listener 101 and an audio object 103.
  • the listener 101 is facing to the right and the audio object 103 is positioned directly to the left of them.
  • a first left signal L is provided to the left ear of the listener 101 and a first right signal R is provided to the right ear of the listener 101.
  • In Fig. 1D the listener 101 has rotated through 180° compared to the arrangement shown in Fig. 1C.
  • the listener 101 is facing to the left and the audio object 103 is positioned directly to the right of them.
  • a second left signal L' is provided to the left ear of the listener 101 and a second right signal R' is provided to the right ear of the listener 101.
  • the symmetry of the arrangements shown in Figs. 1C and 1D means that, in an analogous manner to that shown in Figs. 1A and 1B , the first left signal L is the same as the second right signal R' and the first right signal R is the same as the second left signal L'.
  • the signals could be swapped for each other in this situation. That is, the audio scene shown in Fig. 1D can be rendered by swapping the signals used for rendering the audio scene shown in Fig. 1C .
  • Fig. 1E shows a further arrangement in which the listener 101 has rotated through 90° compared to the arrangements shown in Figs. 1C and 1D .
  • the listener 101 in this arrangement is facing directly towards the audio object 103.
  • a third left signal L" is provided to the left ear of the listener 101 and a third right signal R" is provided to the right ear of the listener 101.
  • the audio scene shown in Fig. 1E can be approximated by mixing the audio scene from Fig. 1C and the audio scene from Fig. 1D.
  • the third left signal L" can be generated from a mix of the first left signal L and the second left signal L'.
  • the third right signal R" can be generated from a mix of the first right signal R and the second right signal R'.
  • A similar approximation can also be used in scenarios in which the listener 101 is not facing directly at the audio object 103 but is facing at an angle intermediate between those shown in Figs. 1C, 1D and 1E.
  • the signals needed for the left and right ear could still be obtained by mixing the first left signal L and the second left signal L' and the first right signal R and the second right signal R' respectively.
  • the signals that are mixed would not be mixed equally; different factors could be applied to the respective signals. The different factors would be dependent upon the actual angular position of the audio object 103.
  • Figs. 1A to 1E therefore show that, if first left signals L and first right signals R can be obtained for an audio object 103, these can be used to generate the left and right signals for other angular orientations of a listener 101. These approximations are sufficiently accurate to provide adequate spatial audio rendering for the listener 101. Examples of the disclosure make use of these spatial relationships to reduce the processing required for the spatial audio rendering. This reduction can help to reduce latencies and provide improved spatial audio for the listener 101.
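  • The relationships of Figs. 1A to 1E can be summarised in a few lines of code. The following is a minimal sketch, not taken from the patent text, which assumes the binaural signals are plain numpy arrays and uses an illustrative 50/50 mix for the intermediate orientation of Fig. 1E.

```python
# Minimal sketch (not from the patent text) of the symmetries in Figs. 1A to 1E.
# Assumption: binaural signals are plain 1-D numpy arrays of equal length.
import numpy as np

def mirror_scene(left, right):
    """Mirror-image scene (Figs. 1A/1B, 1C/1D): each ear receives the other ear's signal."""
    return right.copy(), left.copy()

def intermediate_scene(left, right, w=0.5):
    """Approximate an orientation between the two mirrored scenes (Fig. 1E)
    by mixing the original pair (weight w) with the swapped pair (weight 1 - w)."""
    left_m, right_m = mirror_scene(left, right)
    return w * left + (1 - w) * left_m, w * right + (1 - w) * right_m

# Example: placeholder one-second binaural rendering at 48 kHz.
L = np.random.randn(48000)
R = np.random.randn(48000)
L_mid, R_mid = intermediate_scene(L, R)  # approximation for the listener facing the object
```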
  • Fig. 2 shows an example method that can be used to implement some examples of the disclosure.
  • the method of Fig. 2 could be implemented using an encoder apparatus.
  • the encoder apparatus could be provided within an encoder 705 as shown in Fig. 7 or in any other suitable system.
  • the method comprises, at block 201, providing an audio signal.
  • the audio signal comprises at least one audio object 103.
  • the audio object 103 is located at a first angular direction.
  • the first angular direction can be determined relative to a device, a user, a coordinate system or any other suitable reference point.
  • the angular position of the audio object 103 is determined by the audio scene that is represented by the audio signals. For example, it can be determined by the position of the microphones that capture the audio from the audio object 103.
  • the audio object 103 is rendered to a second angular direction.
  • the second angular direction is different to the first angular direction. For instance, if the audio object 103 is positioned at thirty degrees, then this is the first angular direction.
  • the second angular direction could be a different angular direction such as ninety degrees or sixty degrees or any other suitable direction.
  • the angle that is used for the second angular direction can be independent of a head orientation of a listener 101.
  • the head orientation of the listener does not need to be known when the audio signal is being generated. This can mean that the encoding apparatus does not need to perform head tracking or obtain any head tracking data.
  • the second angular direction can be a predefined angular direction.
  • the predefined angular direction can be defined relative to a reference coordinate system or axis, relative to a head orientation of the listener 101, relative to the actual angular direction of one or more audio objects 103 or relative to any other suitable references.
  • the second angular direction is ninety degrees relative to a reference.
  • the reference could be the first angular direction that represents the actual position of the audio object 103.
  • the reference could be a direction that is determined to be a front facing or forward direction. In other examples the second angular direction could be less than ninety degrees relative to the reference.
  • the rendering of the audio object 103 provides information that enables the audio object to be rendered as though it is positioned at the second angular direction. For instance, if the second angular direction is ninety degrees, then the rendering of the audio signal to the second angular direction would enable a left signal and right signal to be generated as though the audio object 103 is positioned at ninety degrees.
  • the audio signal that is provided at block 201 therefore provides rendering information that enables rendering of the audio object 103 at the second angular direction.
  • the rendering information is not determined for the actual or correct angular direction of the audio object 103.
  • the angular direction of the rendering information and the actual or correct angular direction of the audio object 103 will be different.
  • the rendering information could comprise any information that indicates how the audio signal comprising the audio object 103 should be processed in order to produce an audio output with the audio object 103 at the second angular direction.
  • the rendering information could comprise metadata, mixing matrices or any other suitable type of information.
  • the method also comprises, at block 203, providing metadata where the metadata is indicative of the first angular direction of the audio object 103.
  • the metadata can be indicative of the actual angular position of the audio object 103 within the audio scene.
  • the metadata can be provided with the audio signal.
  • the metadata can be provided in any suitable format.
  • the audio signal and the metadata can be associated together. For example, they can be encoded into a single bitstream.
  • the combination of the metadata indicative of the first angular direction of the audio object 103 and the audio signal with the audio object 103 rendered to a second angular direction comprise sufficient information to enable the audio object 103 to be rendered to the correct angular location for any angular orientation of the head of the listener 101.
  • the audio signal can comprise a plurality of audio objects.
  • Each of the audio objects 103 can be rendered to a second angular direction.
  • the second angular direction can be different for the different audio objects 103.
  • the second angular direction can be the same for two or more of the audio objects 103.
  • the metadata comprises information about the actual angular direction for each of the audio objects within the audio signal.
  • the number of audio objects 103 and the angular direction of the audio objects 103 can be determined by the audio scene that is represented by the audio signals.
  • the audio signal and the metadata can be provided to a decoder apparatus.
  • the decoder apparatus can be provided in an audio playback device.
  • the audio playback device can be configured to decode the audio signals and process the decoded audio signal to enable spatial audio playback.
  • the spatial audio playback could comprise binaural audio or any other suitable type of audio.
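  • As an illustration of the encoder-side method of Fig. 2, the sketch below packages an audio object rendered to a fixed second angular direction together with metadata carrying the first angular direction. The simple sine/cosine panner standing in for a real binaural or HRTF renderer, the dict-based metadata and all names are assumptions for illustration only.

```python
# Sketch of the encoder-side method of Fig. 2 under stated assumptions: a mono
# object signal as a numpy array, a crude sine/cosine pan standing in for a real
# binaural/HRTF renderer, and dict-based metadata. None of these choices are
# prescribed by the disclosure.
import numpy as np

def pan_to_angle(mono, angle_deg):
    """Crude two-channel stand-in for rendering a mono object to an azimuth angle."""
    theta = np.deg2rad((angle_deg + 90.0) / 2.0)  # map [-90, 90] degrees to a [0, 90] degree pan
    return np.stack([np.cos(theta) * mono, np.sin(theta) * mono])  # shape (2, n_samples)

def encode(mono_object, first_angular_direction_deg, second_angular_direction_deg=90.0):
    # Block 201: render the object to the second angular direction, which is
    # independent of the listener's head orientation.
    audio_signal = pan_to_angle(mono_object, second_angular_direction_deg)
    # Block 203: metadata indicative of the first (actual) angular direction.
    metadata = {"first_angular_direction_deg": first_angular_direction_deg}
    return audio_signal, metadata

signal, meta = encode(np.random.randn(48000), first_angular_direction_deg=30.0)
```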
  • Fig. 3 shows another example method that can be used to implement some examples of the disclosure.
  • the method of Fig. 3 could be implemented using a decoder apparatus.
  • the decoder apparatus could be provided within a decoder 709 as shown in Fig. 7 or in any other suitable system.
  • the method of Fig. 3 could be performed by an apparatus that has received the audio signal and metadata generated using the method of Fig. 2 , or any other suitable method.
  • the method comprises, at block 301 obtaining an audio signal.
  • the audio signal can be an audio signal as provided using the method of Fig. 2 , or any other suitable method.
  • the audio signal comprises at least one audio object 103 where the at least one audio object 103 is located at a first angular direction in the actual audio scene but is rendered to a second angular direction within the audio signal.
  • the second angular direction is not the correct angular direction of the audio object 103.
  • the method also comprises, at block 303, obtaining metadata indicative of the first angular direction of the audio object 103.
  • This metadata enables the decoder to obtain information about the actual angular direction of the audio object 103 within an actual audio scene.
  • the method comprises determining a head orientation of the listener 101.
  • the head orientation of the listener 101 can comprise an indication of the direction in which the listener 101 is facing.
  • the head orientation of the listener 101 can be provided in a single plane, for example the head orientation can comprise an indication of an azimuthal angle indicative of the direction the listener 101 is facing.
  • a headset or earphones can comprise one or more sensors that can be configured to detect movement of the head of the listener 101 and so can enable the orientation of the head of the listener 101 to be determined.
  • a head tracking device could be provided that is separate to a playback or decoding device.
  • the method comprises generating a rendering of the audio object 103 for the head orientation of the listener 101.
  • the rendering enables the audio object 103 to be perceived at the correct angular orientation. That is, the rendering enables the audio object 103 to be perceived to be at the first angular direction corresponding to the actual orientation of the audio object 103.
  • the rendering of the audio object 103 to the second angular direction can be the rendering that is received with the audio signal.
  • the mapping of the rendering of the audio object 103 to a third angular direction can comprise any suitable rendering of the audio object 103 to the third angular direction.
  • the third angular direction can be at a predetermined angular position relative to the second angular direction.
  • the rendering of the audio object to a third angular direction can be determined by making use of spatial relationships between the second angular direction and the third angular direction. For example, a symmetrical relationship can be used to allow a mapping between left signals and right signals for the respective angular directions.
  • the third angular direction can be selected so that the left signal for the third angular direction is the same as the right signal for the second angular direction.
  • the right signal for the third angular direction would be the same as the left signal for the second angular direction.
  • If the second angular direction is determined to be 90° to the right of a reference, then the third angular direction could be 90° to the left of the reference. This could generate a symmetrical arrangement.
  • the angles that are used for the second angular direction and the third angular direction can be independent of the orientation of the head of the listener 101. This can enable the same second angular direction and the third angular direction to be used for different orientations of the head of the listener 101. This can enable the same second angular direction and third angular direction to be used as the listener 101 moves their head.
  • To generate the rendering of the audio object 103 for the head orientation of the listener 101, the rendering of the audio object 103 to the second angular direction and the rendering of the audio object 103 to the third angular direction are mixed. Any suitable process can be used to mix the respective renderings.
  • the mixing can be weighted based on the head orientation of the listener 101. The weighting can be such that a larger factor is applied to the rendering whose angular direction is closest to the actual angular orientation of the audio object 103.
  • the generating of the rendering of the audio object 103 for head orientation of the listener 101 can be performed in the time domain. This means that there is no need to transform the audio signals to a frequency domain. This can reduce latency of the processing of the audio signals. This can also reduce the processing requirements for determining the rendering for the audio objects as a listener moves their head.
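  • A sketch of the decoder-side method of Fig. 3 is shown below. It assumes the bitstream has already been unpacked into a two-channel numpy array A (the object rendered to the second angular direction) plus metadata giving the first angular direction, with head yaw supplied by a head tracker. The clipped linear weight is one plausible choice consistent with the weighted time-domain mixing described here, not necessarily the exact weighting of the disclosure.

```python
# Sketch of the decoder-side method of Fig. 3. Assumptions: A is the received
# (2, n_samples) rendering to the second angular direction, the metadata gives the
# first angular direction, and head yaw comes from a head tracker. The clipped
# linear weight is illustrative; everything stays in the time domain.
import numpy as np

def decode_for_head(A, first_dir_deg, head_yaw_deg, offset_deg=90.0):
    B = A[::-1, :]  # map to the third angular direction by swapping left and right channels
    w = np.clip((offset_deg - first_dir_deg + head_yaw_deg) / (2.0 * offset_deg), 0.0, 1.0)
    return w * A + (1.0 - w) * B  # weighted time-domain mix for the current head orientation

out = decode_for_head(np.random.randn(2, 48000), first_dir_deg=30.0, head_yaw_deg=10.0)
```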
  • Fig. 4 schematically shows an example listener 101 and audio object 103 and a respective first angular direction 401 and second angular direction 403.
  • the first angular direction 401 is the actual angular direction of the audio object 103.
  • the listener 101 is facing towards the audio object 103 so that the first angular direction 401 is directly ahead of the listener 101.
  • the audio object 103 could be positioned at different angular directions in other examples of the disclosure.
  • the first angular direction is given by the angle α with respect to a reference 405.
  • the reference 405 is an arbitrary axis that can be fixed within a coordinate system.
  • Other references could be used in other examples of the disclosure.
  • the second angular direction 403 is the direction to which the audio object 103 is rendered within the audio signal.
  • the second angular direction 403 is a rotation of 90° clockwise from the first angular direction 401.
  • In this example, the angle that is used for the second angular direction 403 is dependent upon the first angular direction 401.
  • In other examples, the angle that is used for the second angular direction 403 can be independent of the first angular direction 401.
  • the second angular direction 403 could be determined by a reference or point in a coordinate system.
  • the second angular direction 403 is a rotation of 90° clockwise from the first angular direction 401 but other angular relationships could be used in other examples.
  • the audio signal that is provided would comprise a rendering of the audio object 103 to the second angular direction 403.
  • the rendering to the second angular direction 403 is such that, if that rendering was used, the listener 101 would perceive the audio object 103 to be located at the second angular direction 403 instead of the first angular direction 401.
  • the audio signal is provided with metadata indicative of the first angular direction 401.
  • Fig. 5 schematically shows the example listener 101 and audio object 103 and the respective first angular direction 401, second angular direction 403 and third angular direction 501.
  • the first angular direction 401 and the second angular direction 403 are as shown in Fig. 4 .
  • the third angular direction 501 is a direction to which the rendering of the audio object 103 can be remapped.
  • the third angular direction 501 is selected to provide a symmetrical arrangement around the first angular direction 401.
  • the symmetrical arrangement can be obtained by selecting an angle that is the same size as the angle between the first angular direction 401 and the second angular direction 403 but is in a different direction.
  • In this example the second angular direction 403 is a rotation of 90° clockwise from the first angular direction 401 and so the third angular direction 501 is a rotation of 90° anticlockwise from the first angular direction 401.
  • Other sized angles and references could be used in other examples of the disclosure.
  • a rendering of the audio object 103 to the third angular direction 501 is generated by mapping the rendering to the second angular direction 403 to the third angular direction 501.
  • the mapping makes use of the symmetrical properties of the signals.
  • the mapping to the third angular direction 501 can be generated by swapping the left and right signals that are used for the rendering to the second angular direction 403.
  • the rendering to the second angular direction 403 can comprise A where A is a spatial audio signal.
  • the spatial audio signals could comprise binaural or stereo signals or any other suitable type of signal.
  • the rendering to the second angular direction 403 can therefore comprise a left channel signal A_left and a right channel signal A_right.
  • the rendering to the third angular direction 501 would comprise a spatial audio signal B.
  • the spatial audio signal B would comprise a left channel signal B_left and a right channel signal B_right.
  • the B_left and B_right signals would be generated by swapping the left and right channels of the audio signal A. That is, B_left = A_right and B_right = A_left.
  • Fig. 6 schematically shows the example listener 101 and audio object 103 and the respective first angular direction 401, second angular direction 403, third angular direction 501 and the reference 405.
  • the head orientation of the listener 101 is indicated by the angle β.
  • the listener 101 is facing toward the audio object 103 but that does not need to be the case in other implementations of the disclosure.
  • the listener 101 could be facing in other directions in other examples of the disclosure and/or the audio object 103 could be in a different position.
  • a rendering for the audio object 103 for the head orientation of the listener 101 is generated based on the rendering of the audio object 103 to the second angular direction 403 and the rendering or mapping of the audio object 103 to the third angular direction 501.
  • the rendering of the audio object 103 for head orientation of the listener 101 can be generated by mixing the rendering of the audio object 103 to the second angular direction 403 and the rendering of the audio object 103 to the third angular direction 501.
  • Audio played to listener = ((90 - α + β) / 180) × A + ((90 + α - β) / 180) × B, where angles are in degrees and:
  • α is the first angular direction of the audio object 103
  • β is the orientation of the head of the listener 101
  • A is a spatial audio signal rendered to the second angular direction 403
  • B is a spatial audio signal rendered to the third angular direction 501.
  • B is the same as A but with the left and right channels swapped.
  • The approximations described herein enable the mixing equation to be used for any orientation of the head of the listener 101. For instance, the listener 101 could be facing towards the back so that the audio object 103 is behind the listener 101. This approximation can make use of the fact that spatialised audio from an angle of Y degrees is very similar to spatialised sound from 180-Y degrees. This simplification can provide spatial audio of adequate quality.
  • In this example the second angular direction has been selected as 90° to the side of the first angular direction 401. Any angle θ could be used in other examples.
  • Audio played to listener = ((θ - α + β) / 2θ) × A + ((θ + α - β) / 2θ) × B
  • the mixing equations can be configured so that the relative emphasis used for the rendering of the audio object 103 to the second angular direction 403 and the rendering of the audio object 103 to the third angular direction 501 is dependent upon whether the head orientation of the listener 101 is closer to the second angular direction 403 or the third angular direction 501. That is, if the head orientation of the listener 101 is closer to the second angular direction 403 then the rendering to the second angular direction 403 would be given a higher weighting than the rendering to the third angular direction 501. Conversely if the head orientation of the listener 101 is closer to the third angular direction 501 then the rendering to the third angular direction 501 would be given a higher weighting than the rendering to the second angular direction 403.
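  • A quick numeric check of the reconstructed mixing weights is sketched below; the values of α, β and θ are illustrative only.

```python
# Numeric check of the mixing weights (alpha = first angular direction,
# beta = head orientation, theta = offset of the second/third directions);
# the example values are illustrative.
def mix_weights(alpha_deg, beta_deg, theta_deg=90.0):
    w_a = (theta_deg - alpha_deg + beta_deg) / (2.0 * theta_deg)  # weight applied to A
    w_b = (theta_deg + alpha_deg - beta_deg) / (2.0 * theta_deg)  # weight applied to B
    return w_a, w_b

print(mix_weights(alpha_deg=0.0, beta_deg=0.0))    # facing the object: (0.5, 0.5)
print(mix_weights(alpha_deg=0.0, beta_deg=90.0))   # head turned 90 degrees one way: (1.0, 0.0)
print(mix_weights(alpha_deg=0.0, beta_deg=-90.0))  # turned the other way: (0.0, 1.0)
```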
  • the examples described herein have been restricted to determining the orientation of the head of the listener 101 in the horizontal plane. This can be sufficient to cover many applications that use head tracking. It is possible to extend the examples of the disclosure to cover rotations of the head of the listener 101 out of the plane, for example if the listener 101 tilts their head. To account for the listener 101 tilting their head, additional audio signals can be provided. In such examples, an audio signal can be provided with a rendering at an angle above the angular position of the audio object 103. This can then be mapped to an angular direction below the audio object 103 and the two renderings can be mixed as appropriate to take into account any tilting of the head of the listener 101.
  • Examples of the disclosure can also be used for audio scenes that comprise a plurality of audio objects 103.
  • each audio object 103 can be treated separately so that an audio signal and rendering to a second angular direction 403 can be provided for each audio object 103.
  • the renderings to the third angular direction 501 can be determined for each of the audio objects 103 and the signals for each audio object 103 can be mixed depending on the head orientation of the listener 101 and then summed for playback to the listener 101.
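  • The per-object processing described above can be sketched as follows, assuming each audio object arrives as its own pre-rendered two-channel signal plus its first angular direction; the clipped weighting reuses the assumptions of the earlier decoder sketch and is illustrative.

```python
# Sketch of per-object processing: each pre-rendered object is mixed for the
# current head orientation and the results are summed for playback.
import numpy as np

def mix_object(A, first_dir_deg, head_yaw_deg, theta_deg=90.0):
    B = A[::-1, :]  # third angular direction: channels swapped
    w = np.clip((theta_deg - first_dir_deg + head_yaw_deg) / (2.0 * theta_deg), 0.0, 1.0)
    return w * A + (1.0 - w) * B

def render_scene(objects, head_yaw_deg):
    """objects: iterable of (A, first_dir_deg) pairs, each A of shape (2, n_samples)."""
    return np.sum([mix_object(A, d, head_yaw_deg) for A, d in objects], axis=0)

scene = [(np.random.randn(2, 48000), 30.0), (np.random.randn(2, 48000), -45.0)]
playback = render_scene(scene, head_yaw_deg=15.0)
```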
  • the audio signals that are used can also comprise other audio in addition to the audio objects 103.
  • the other audio could comprise ambient or non-directional or other types of audio.
  • the ambient or non-directional audio can be rendered using an assumed head orientation of the listener 101.
  • the assumed head orientation could be a front facing direction or any other suitable direction.
  • the ambient or non-directional audio can be rendered without applying head tracking because it can be assumed to be the same in all directions.
  • the non-directional audio could comprise music tracks or background audio or any other suitable type of audio.
  • Examples of the disclosure provide for improved spatial audio.
  • the calculations that are used in examples of the disclosure do not require very much data to be sent and also do not require very much processing to be carried out, compared to processing in the frequency domain.
  • This reduction in the processing requirements means that the processing could be carried out by small processors within headphones or headsets or other suitable playback devices.
  • This can provide a significant reduction in latencies introduced by the processing compared to systems in which all the processing is done by a central processing device, such as a mobile phone, and then transmitted to the headphones or other playback device.
  • all of the processing can be performed in the time domain. This also provides for a reduction in latency.
  • Examples of the disclosure can also be used for scenarios in which a plurality of listeners 101 are listening to the same audio scene but are each using their own headsets for play back of the audio.
  • the head tracking can be performed by the playback device and the data sent from the encoding device to the playback device is independent of the orientation of the head of the listener 101. This means that the encoding device can send the same audio signal and metadata to each decoding device. This can reduce the processing requirements for the encoding device.
  • Fig. 7 schematically shows an example system 701 that can be used to implement examples of the disclosure.
  • the system 701 comprises an encoder 705 and a decoder 709.
  • the encoder 705 and the decoder 709 can be in different devices.
  • the encoder 705 could be provided in an audio capture device or a network device and the decoder 709 could be provided in a headset or other device configured to enable audio playback.
  • the encoder 705 and the decoder 709 could be in the same device.
  • a device can be configured for both audio capture and audio playback.
  • the system 701 is configured so that the encoder 705 is configured to obtain an input comprising audio signals 703.
  • the audio signals 703 can be obtained from two or more microphones configured to capture spatial audio or from any other suitable source.
  • the encoder 705 can comprise any means that can be configured to encode the audio signals 703 to provide a bitstream 707 as an output.
  • the bitstream 707 can comprise the audio signal with an audio object 103 rendered to a second angular direction 403 and also metadata indicative of the first angular direction 401 of the audio object 103.
  • the bitstream 707 can be transmitted from a device comprising the encoder 705 to a device comprising the decoder 709.
  • the bitstream 707 can be transmitted using any suitable means.
  • the bitstream 707 can be transmitted using wireless connections.
  • the wireless connections could comprise a low-power means such as Bluetooth or any other appropriate means.
  • the encoder 705 and the decoder 709 could be in the same device and so the bitstream 707 can be stored in the device comprising the encoder 705 and can be retrieved and decoded by the decoder 709 when appropriate.
  • the decoder 709 can be configured to receive the bitstream 707 as an input.
  • the decoder 709 comprises means that can be configured to decode the bitstream 707.
  • the decoder 709 can also comprise means for determining a head orientation for the listener.
  • the decoder 709 can be configured to generate a rendering of the audio object 103 at a third angular direction 501 and use the rendering of the audio object 103 to the second angular direction 403 and the rendering of the audio object 103 to the third angular direction 501 to generate a rendering of the audio object 103 for the head orientation of the listener 101.
  • the decoder 709 provides the spatial audio output 711 for the listener.
  • the spatial audio output 711 can comprise binaural audio or any other suitable type of audio.
  • Fig. 8 shows another example method that could be used in some examples of the disclosure.
  • the method of Fig. 8 could be implemented using an encoder apparatus or a decoder apparatus or any other suitable type of apparatus.
  • the method of Fig. 8 can be used where two or more listeners 101A, 101B are rendering the same content but are each associated with their own playback device. For example, two or more listeners 101A, 101B could be viewing the same content on a screen or other display but could each be wearing their own headset or earphones.
  • the method comprises, at block 801, determining a field of view for at least a first listener 101A and a second listener 101B.
  • the field of view can be determined based on information such as the locations of the listeners 101A, 101B and the directions in which they are facing.
  • the field of view could be determined based on a device that is shared by the listeners 101. For instance, it could be determined that the listeners 101A, 101B are sharing a display or screen so that both listeners 101A, 101B are viewing the same display or screen. In such cases the position of the display or screen could be used to determine the field of view of the listeners 101.
  • the field of view can be determined to comprise an angular range.
  • the angular range can be derived with respect to any suitable reference.
  • the reference could be the positions of the listeners 101A, 101B, the position of one or more devices and/or any other suitable object.
  • the field of view can have an angular range of about 30° or any other suitable range.
  • the method comprises determining whether an audio object 103 is located within the field of view.
  • the method can determine whether an audio object 103 is located within the field of view of each of the plurality of listeners 101A, 101B. Where there are more than two listeners 101 it can be determined whether or not the audio object 103 is within a shared field of view of a subset of the listeners 101.
  • the angular direction of the audio object 103 can be determined.
  • the angular direction of the audio object 103 can be determined based on metadata provided with an audio signal or by using any suitable process. Once the angular direction of the audio object 103 has been determined it can be determined whether or not this angular direction falls within the field view as determined at block 801.
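  • The field-of-view test of blocks 801 and 803 can be sketched as below; the reference direction, the 30° width and the wrap-around handling are illustrative assumptions.

```python
# Sketch of blocks 801 and 803: define the shared field of view as an angular
# range and test whether the object's direction (from the metadata) falls inside it.
def in_field_of_view(object_dir_deg, fov_centre_deg, fov_width_deg=30.0):
    # Smallest signed difference between the two directions, wrapped to [-180, 180).
    diff = (object_dir_deg - fov_centre_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_width_deg / 2.0

print(in_field_of_view(10.0, fov_centre_deg=0.0))   # True: roughly towards the shared screen
print(in_field_of_view(170.0, fov_centre_deg=0.0))  # False: behind the listeners
```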
  • the audio object 103 is rendered for both the first listener 101A and the second listener 101B.
  • the rendering can comprise binaural rendering, stereo rendering or any other suitable type of spatial audio rendering.
  • the rendering of the audio object 103 can be based upon whether or not the audio object 103 is within the field of view of the listeners 101A, 101B.
  • the rendering of the audio object 103 can be the same for both the first listener 101A and the second listener 101B if the audio object 103 is outside of the field of view.
  • the rendering of the audio object 103 can be different for the first listener 101A and the second listener 101B if the audio object 103 is inside the field of view.
  • the rendering of the audio object 103 for the second listener 101B can be generated based on the rendering of the audio object 103 for the first listener 101A. Processes similar to those shown in Figs. 2 to 6 can be used to generate the rendering of the audio object 103 for the second listener 101B based on the rendering of the audio object 103 for the first listener 101A. In such examples, from the point of view of the second listener 101B, the rendering for the first listener 101A would be a rendering to the second angular direction 403. This could be processed using the methods described above to provide a rendering to the correct angular direction for the second listener 101B.
  • the rendering of the audio object 103 for the second listener 101B can be generated by mapping the rendering of the audio object 103 for the first listener 101A to a different angular orientation and mixing the respective renderings as appropriate based on the angular direction of the audio object 103 relative to the second listener 101B.
  • the rendering for the first listener 101A and the second listener 101B can be generated by summing and averaging binauralized or stereo audio objects 103.
  • An audio signal based on the summed, or averaged, audio objects 103 can then be sent to each of the playback devices of the listeners 101A, 101B. This can enable different renderings for each of the different listeners 101A, 101B.
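  • The per-listener decision described above can be sketched as follows for two or more listeners sharing a screen; the helper names, the 30° field of view and the channel-swap mixing are illustrative assumptions rather than the normative algorithm of the disclosure.

```python
# Sketch of the rendering decision of Fig. 8 for listeners sharing a screen: an
# object inside the shared field of view is mixed separately from each listener's
# own angle, while an object outside it gets one rendering reused for everyone.
import numpy as np

def _mix(A, obj_dir_deg, head_yaw_deg, theta_deg=90.0):
    B = A[::-1, :]  # mirrored rendering: left and right channels swapped
    w = np.clip((theta_deg - obj_dir_deg + head_yaw_deg) / (2.0 * theta_deg), 0.0, 1.0)
    return w * A + (1.0 - w) * B

def render_for_listeners(A, obj_dirs_deg, head_yaws_deg, fov_centre_deg, fov_width_deg=30.0):
    """A: object pre-rendered to the second direction, shape (2, n_samples).
    obj_dirs_deg / head_yaws_deg: per-listener object directions and head yaws."""
    outputs, shared = [], None
    for obj_dir, head_yaw in zip(obj_dirs_deg, head_yaws_deg):
        diff = (obj_dir - fov_centre_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= fov_width_deg / 2.0:
            outputs.append(_mix(A, obj_dir, head_yaw))  # inside the field of view: per-listener
        else:
            if shared is None:
                shared = _mix(A, obj_dir, head_yaw)     # outside: compute once and reuse
            outputs.append(shared)
    return outputs

A = np.random.randn(2, 48000)
outs = render_for_listeners(A, obj_dirs_deg=[20.0, -20.0], head_yaws_deg=[5.0, -5.0],
                            fov_centre_deg=0.0)
```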
  • Fig. 9 shows a plurality of listeners 101A, 101B and audio objects 103A, 103B.
  • the method of Fig. 8 could be used to provide spatial audio for the listeners 101A, 101B in Fig. 9 .
  • the plurality of listeners 101A, 101B are each listening to spatial audio using their own playback device.
  • the playback device comprises a headset 105A, 105B.
  • Each listener 101A, 101B within the plurality of listeners 101 is associated with a different headset 105A, 105B.
  • Fig. 9 two listeners 101A, 101B are shown. In other examples there could be more than two listeners 101A, 101B. Similarly, Fig. 9 also shows two audio objects 103A, 103B. The audio scene could comprise other numbers of audio objects 103 in other examples of the disclosure.
  • Each of the listeners 101A, 101B is listening to the same audio content. For instance, in the example of Fig. 9 each of the listeners 101A, 101B is viewing a device 901.
  • the device 901 could be displaying images or video content and the audio content could be corresponding to the images displayed on the display of the device 901.
  • the listeners 101A, 101B can be listening to the same audio scene via the headsets 105A, 105B. However, because the listeners 101A, 101B are in different positions and can be facing in different orientations the spatial aspects of the audio scene can be different for each of the listeners 101A, 101B.
  • the listeners 101A, 101B are positioned adjacent to each other so that the second listener 101B is to the right of the first listener 101A.
  • Each of the listeners 101A, 101B is facing towards the device 901.
  • the first listener 101A is facing to the front and slightly to the right to view the device 901 and the second listener 101B is facing to the front and slightly to the left to view the device 901.
  • Other arrangements of the listeners 101A, 101B could be used in other examples of the disclosure.
  • a field of view of the listeners 101A, 101B can be defined using any suitable parameters.
  • the field of view could be defined so that it covers the device 901 that is being viewed by each of the listeners 101A, 101B. In some examples the field of view could be determined based on the direction in which the listeners 101A, 101B are facing.
  • the field of view can cover an angular region that covers the span of the device 901.
  • the field of view can have an angular range of about 30° or any other suitable range.
  • the example of Fig. 9 also comprises a first audio object 103A and a second audio object 103B.
  • the first audio object 103A and the second audio object 103B are positioned in different locations around the listeners 101A, 101B.
  • the first audio object 103A is positioned at an angular direction that would be behind the listeners 101A, 101B. Both the first listener 101A and the second listener 101B are facing away from the first audio object 103A. In this case the first audio object 103A would be determined to not be within the field of view of the listeners 101A, 101B.
  • the second audio object 103B is positioned in front of the listeners 101A, 101B.
  • the second audio object 103B is positioned at an angular direction that overlaps with the angular direction of the device 901. Both the first listener 101A and the second listener 101B are facing towards the second audio object 103B. In this case the second audio object 103B would be determined to be within the field of view of the listeners 101A, 101B.
  • the first audio object 103A could be rendered to be the same for both the first listener 101A and the second listener 101B. This does not necessarily recreate an accurate representation of the audio scene. However, to create spatial audio of an adequate quality it is sufficiently accurate for the listeners 101A, 101B to perceive that the audio object 103A is behind them.
  • the precise angular direction of audio objects 103A, 103B behind the listeners 101A, 101B is not as important as the angular direction of audio objects 103A, 103B that are in front of the listeners 101A, 101B because the listeners 101A, 101B cannot see the audio objects 103A that are behind them.
  • In the example of Fig. 9 the audio object that is outside of the field of view of the listeners 101A, 101B is positioned behind the listeners 101A, 101B.
  • In other examples the audio object could be to the far left or the far right. Such an object would be perceived to be to the left of both of the listeners 101A, 101B or to the right of both of the listeners 101A, 101B respectively. Therefore, in such cases the audio object 103 could be approximated to be at the same angular direction for each of the listeners 101A, 101B. This can enable the same rendering to be used for each of the listeners 101A, 101B.
  • the second audio object 103B could be rendered differently for each of the listeners 101A, 101B.
  • the second audio object 103B is positioned slightly to the right of the first listener 101A but slightly to the left of the second listener 101B. This means that the second audio object 103B would have spatial parameters that would be perceived differently by each of the listeners 101A, 101B. To enable these different spatial parameters to be taken into account the second audio object 103B would be rendered differently for each of the listeners 101A, 101B.
  • the rendering of the second audio object 103B that is within the field of view would take into account the angular direction between each of the respective listeners 101A, 101B and the audio object 103B. In some examples the methods shown in Figs. 2 to 6 and described above could be used to enable the rendering of the second audio object 103B for the different listeners 101A, 101B.
  • the different rendering for the different listeners 101A, 101B could take into account head tracking. For instance, if the first listener 101A moves their head this can change the rendering of the audio object 103B for that first listener 101A but not for the second listener 101B.
  • Examples of the disclosure therefore provide for an efficient method of processing audio for a plurality of listeners 101A, 101B while still providing spatial audio of a sufficient quality.
  • Using the same rendering for audio objects 103 that are outside of the field of view of the listeners 101A, 101B can reduce the processing requirements.
  • using different rendering for audio objects 103 that are within the field of view can ensure that adequate spatial audio quality is provided.
  • Fig. 10 schematically shows an example apparatus 1001 that could be used in some examples of the disclosure.
  • the apparatus 1001 could comprise a controller apparatus and could be provided within an encoder device 705 or a decoder device 709 as shown in Fig. 7, or in any other suitable type of device.
  • the apparatus 1001 comprises at least one processor 1003 and at least one memory 1005. It is to be appreciated that the apparatus 1001 could comprise additional components that are not shown in Fig. 10 .
  • the apparatus 1001 can be implemented as processing circuitry.
  • the apparatus 1001 can be implemented in hardware alone, can have certain aspects in software (including firmware) alone, or can be a combination of hardware and software (including firmware).
  • the apparatus 1001 can be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 1007 in a general-purpose or special-purpose processor 1003 that can be stored on a computer readable storage medium (disk, memory etc.) to be executed by such a processor 1003.
  • the processor 1003 is configured to read from and write to the memory 1005.
  • the processor 1003 can also comprise an output interface via which data and/or commands are output by the processor 1003 and an input interface via which data and/or commands are input to the processor 1003.
  • the memory 1005 is configured to store a computer program 1007 comprising computer program instructions (computer program code 1009) that controls the operation of the apparatus 1001 when loaded into the processor 1003.
  • the computer program instructions of the computer program 1007 provide the logic and routines that enable the apparatus 1001 to perform the methods illustrated in Figs. 2, 3 and 8.
  • the processor 1003 by reading the memory 1005 is able to load and execute the computer program 1007.
  • the apparatus 1001 could be comprised within an encoder apparatus.
  • the apparatus 1001 therefore comprises: at least one processor 1003; and at least one memory 1005 including computer program code 1009, the at least one memory 1005 and the computer program code 1009 configured to, with the at least one processor 1003, cause the apparatus 1001 at least to perform the encoding method described above in relation to Fig. 2.
  • the apparatus 1001 could be comprised within a decoder apparatus.
  • the apparatus 1001 therefore comprises: at least one processor 1003; and at least one memory 1005 including computer program code 1009, the at least one memory 1005 and the computer program code 1009 configured to, with the at least one processor 1003, cause the apparatus 1001 at least to perform the decoding method described above in relation to Fig. 3.
  • the apparatus 1001 can comprise: at least one processor 1003; and at least one memory 1005 including computer program code 1009, the at least one memory 1005 and the computer program code 1009 configured to, with the at least one processor 1003, cause the apparatus 1001 at least to perform one or more of the methods described above in relation to Figs. 2, 3 and 8.
  • the computer program 1007 can arrive at the apparatus 1001 via any suitable delivery mechanism 1011.
  • the delivery mechanism 1011 can be, for example, a machine readable medium, a computer-readable medium, a non-transitory computer-readable storage medium, a computer program product, a memory device, a record medium such as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a solid-state memory, an article of manufacture that comprises or tangibly embodies the computer program 1007.
  • the delivery mechanism can be a signal configured to reliably transfer the computer program 1007.
  • the apparatus 1001 can propagate or transmit the computer program 1007 as a computer data signal.
  • the computer program 1007 can be transmitted to the apparatus 1001 using a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPan (IP v 6 over low power personal area networks) ZigBee, ANT+, near field communication (NFC), Radio frequency identification, wireless local area network (wireless LAN) or any other suitable protocol.
  • the computer program 1007 can comprise computer program instructions for causing an apparatus 1001 to perform at least the methods described above in relation to Figs. 2, 3 and 8.
  • the computer program instructions can be comprised in a computer program 1007, a non-transitory computer readable medium, a computer program product, a machine readable medium. In some but not necessarily all examples, the computer program instructions can be distributed over more than one computer program 1007.
  • Although the memory 1005 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry, some or all of which can be integrated/removable and/or can provide permanent/semi-permanent/dynamic/cached storage.
  • Although the processor 1003 is illustrated as a single component/circuitry it can be implemented as one or more separate components/circuitry, some or all of which can be integrated/removable.
  • the processor 1003 can be a single core or multi-core processor.
  • references to "computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc. or a “controller”, “computer”, “processor” etc. should be understood to encompass not only computers having different architectures such as single /multi- processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.
  • circuitry can refer to one or more or all of the following:
  • circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware.
  • circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
  • the blocks illustrated in Figs. 2, 3 and 8 can represent steps in a method and/or sections of code in the computer program 1007.
  • the illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks, and the order and arrangement of the blocks can be varied. Furthermore, it can be possible for some blocks to be omitted.
  • a property of the instance can be a property of only that instance or a property of the class or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example, can where possible be used in that other example as part of a working combination but does not necessarily have to be used in that other example.
  • 'a' or 'the' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising a/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use 'a' or 'the' with an exclusive meaning then it will be made clear in the context. In some circumstances the use of 'at least one' or 'one or more' may be used to emphasise an inclusive meaning but the absence of these terms should not be taken to infer any exclusive meaning.
  • the presence of a feature (or combination of features) in a claim is a reference to that feature or (combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features).
  • the equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way.
  • the equivalent features include, for example, features that perform substantially the same function, in substantially the same way to achieve substantially the same result.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
EP22159713.1A 2022-03-02 2022-03-02 Audio rendering Pending EP4240026A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22159713.1A EP4240026A1 (fr) Audio rendering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP22159713.1A EP4240026A1 (fr) Audio rendering

Publications (1)

Publication Number Publication Date
EP4240026A1 (fr) 2023-09-06

Family

ID=80625025

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22159713.1A Pending EP4240026A1 (fr) Audio rendering

Country Status (1)

Country Link
EP (1) EP4240026A1 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210037335A1 (en) * 2018-04-09 2021-02-04 Dolby International Ab Methods, apparatus and systems for three degrees of freedom (3dof+) extension of mpeg-h 3d audio
EP3777246A1 (fr) * 2018-04-09 2021-02-17 Dolby International AB Procédés, appareil, et systèmes pour une extension à trois degrés de liberté (3dof +) d'un audio 3d mpeg-h

Similar Documents

Publication Publication Date Title
US11055057B2 (en) Apparatus and associated methods in the field of virtual reality
CN111434126B (zh) Signal processing device and method, and program
US20230096873A1 (en) Apparatus, methods and computer programs for enabling reproduction of spatial audio signals
US20230254659A1 (en) Recording and rendering audio signals
US9843883B1 (en) Source independent sound field rotation for virtual and augmented reality applications
US11099802B2 (en) Virtual reality
US20210343296A1 (en) Apparatus, Methods and Computer Programs for Controlling Band Limited Audio Objects
EP4240026A1 (fr) Audio rendering
US20230110257A1 (en) 6DOF Rendering of Microphone-Array Captured Audio For Locations Outside The Microphone-Arrays
US20230377276A1 (en) Audiovisual rendering apparatus and method of operation therefor
US11696085B2 (en) Apparatus, method and computer program for providing notifications
EP3691298A1 (fr) Appareil méthode et programme d'ordinateur permettant la communication audio en temps réel entre utilisateurs en audio immerssion
US20240259758A1 (en) Apparatus, Methods and Computer Programs for Processing Audio Signals
EP4210351A1 (fr) Service audio spatial
WO2023118643A1 (fr) Appareil, procédés et programmes informatiques pour générer une sortie audio spatiale
GB2568726A (en) Object prioritisation of virtual content
EP4164256A1 (fr) Appareil, procédés et programmes informatiques pour traiter un contenu audio spatial
US10200807B2 (en) Audio rendering in real time
CN118042345A Method, device and storage medium for implementing spatial sound effects based on a free viewpoint

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN