WO2023165800A1 - Spatial rendering of reverberation - Google Patents

Spatial rendering of reverberation

Info

Publication number
WO2023165800A1
WO2023165800A1 (PCT/EP2023/053283)
Authority
WO
WIPO (PCT)
Prior art keywords
reverberation
reverberator
encoded
parameters
bitstream
Application number
PCT/EP2023/053283
Other languages
French (fr)
Inventor
Antti Johannes Eronen
Sujeet Shyamsundar Mate
Arto Juhani Lehtiniemi
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2023165800A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/305: Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00: Acoustics not otherwise provided for
    • G10K15/08: Arrangements for producing a reverberation or echo sound
    • G10K15/12: Arrangements for producing a reverberation or echo sound using electronic time-delay networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00: Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10: General applications
    • H04R2499/15: Transducers incorporated in visual displaying devices, e.g. televisions, computer displays, laptops
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303: Tracking of listener position or orientation
    • H04S7/304: For headphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/305: Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S7/306: For headphones

Definitions

  • The present application relates to apparatus and methods for generating and employing spatial rendering of reverberation, but not exclusively to spatial rendering of reverberation in augmented reality and/or virtual reality apparatus.
  • Reverberation refers to the persistence of sound in a space after the actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying the spatial impression of an environment, reproducing reverberation perceptually accurately is important. Room acoustics are often modelled with an individually synthesized early reflection portion and a statistical model for the diffuse late reverberation.
  • Figure 1 depicts an example of a synthesized room impulse response where the direct sound 101 is followed by discrete early reflections 103 which have a direction of arrival (DOA) and diffuse late reverberation 105 which can be synthesized without any specific direction of arrival.
  • DOA direction of arrival
  • the delay d1(t) 102 in Figure 1 can be seen to denote the direct sound arrival delay from the source to the listener and the delay d2(t) 104 can denote the delay from the source to the listener for one of the early reflections (in this case the first arriving reflection).
  • One method of reproducing reverberation is to utilize a set of N loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTFs)).
  • the loudspeakers are positioned around the listener somewhat evenly.
  • Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.
  • the reverberation produced by the different loudspeakers has to be mutually incoherent.
  • the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as RT60 time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio).
  • Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a Feedback-Delay-Network (FDN) reverberator with suitable tuning of the delay line lengths, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel.
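As an illustration of the noise-sequence approach mentioned above, the following Python sketch (illustrative only; the sample rate, RT60 value and channel count are assumptions, not values from this application) generates mutually incoherent reverberant tails by shaping independent white-noise sequences with one shared exponential decay envelope:

```python
import numpy as np

def incoherent_reverb_tails(n_channels=4, rt60=0.8, fs=48000, seed=0):
    """Generate mutually incoherent reverberant tails sharing one RT60.

    Each channel uses an independent white-noise sequence shaped by the
    same exponential envelope, so the channels are uncorrelated with each
    other but share identical decay characteristics.
    """
    rng = np.random.default_rng(seed)
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # A -60 dB decay over rt60 seconds: envelope = 10^(-3 t / RT60).
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal((n_channels, n)) * envelope

tails = incoherent_reverb_tails()
print(np.corrcoef(tails)[0, 1])  # near zero: channels are mutually incoherent
```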
  • FDN Feedback-Delay-Network
  • the different reverberant signals effectively have the same features, and the reverberation is typically perceived to be similar to all directions.
  • Reverberation spectrum or level can be controlled using the diffuse-to-direct ratio (DDR), which describes the ratio of the energy (or level) of reverberant sound energy to the direct sound energy (or the total emitted energy of a sound source).
  • DDR diffuse-to-direct ratio
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising means configured to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • the at least one frequency band data may be organised as octave bands.
  • the at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
  • the means configured to generate the bitstream may be configured to generate the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
  • the apparatus may be further configured to obtain a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter may be associated with the virtual scene.
  • the at least one reverberation parameter may be a frequency dependent reverberation parameter.
  • the resources may be one of: encoded bitrate; encoded bits; and channel capacity.
  • the means configured to select, based on the comparison, the one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may be configured to: select the encoded at least one reverberation parameter in a high bitrate mode; and select the encoded at least one frequency band data in a low bitrate mode.
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising means configured to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • the bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein the means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part based on the indicator.
  • the means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part based on the indicator and configured to: determine the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determine the bitstream comprises the at least one frequency band data in a low bitrate mode.
  • the means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
  • the means configured to initialize at least one reverberator based on the reverberator parameters may be configured to initialize the at least one reverberator using the at least one reverberator parameter independently of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
  • a method for an apparatus for assisting spatial rendering in at least one acoustic environment comprising: obtaining at least one reverberation parameter; converting the obtained at least one reverberation parameter into at least one frequency band data; encoding the at least one frequency band data; encoding the at least one reverberation parameter; comparing resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generating a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • the at least one frequency band data may be organised as octave bands.
  • the at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
  • Generating the bitstream may comprise generating the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
  • the method may further comprise obtaining a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter is associated with the virtual scene.
  • the at least one reverberation parameter may be a frequency dependent reverberation parameter.
  • the resources may be one of: encoded bitrate; encoded bits; and channel capacity.
  • Generating the bitstream comprising the reverberation parameter part comprising the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may comprise: selecting the encoded at least one reverberation parameter in a high bitrate mode; and selecting the encoded at least one frequency band data in a low bitrate mode.
  • a method for an apparatus for assisting spatial rendering in at least one acoustic environment comprising: obtaining a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decoding the reverberation parameter part to generate decoded reverberation parameters; obtaining reverberator parameters from the decoded reverberation parameters; initializing at least one reverberator based on the reverberator parameters; obtaining at least one input audio signal associated with the at least one acoustic environment; and generating an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • the bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part based on the indicator.
  • Obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part based on the indicator, wherein determining the reverberation parameter part based on the indicator may comprise: determining the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determining the bitstream comprises the at least one frequency band data in a low bitrate mode.
  • Obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part comprising an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
  • Initializing at least one reverberator based on the reverberator parameters may comprise initializing the at least one reverberator using the at least one reverberator parameter independently of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • the at least one frequency band data may be organised as octave bands.
  • the at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
  • the apparatus caused to generate the bitstream may be caused to generate the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
  • the apparatus may be further caused to obtain a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter may be associated with the virtual scene.
  • the at least one reverberation parameter may be a frequency dependent reverberation parameter.
  • the resources may be one of: encoded bitrate; encoded bits; and channel capacity.
  • the apparatus caused to select, based on the comparison, the one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may be caused to: select the encoded at least one reverberation parameter in a high bitrate mode; and select the encoded at least one frequency band data in a low bitrate mode.
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • the bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein the apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part based on the indicator.
  • the apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part based on the indicator and caused to: determine the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determine the bitstream comprises the at least one frequency band data in a low bitrate mode.
  • the apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
  • the apparatus caused to initialize at least one reverberator based on the reverberator parameters may be caused to initialize the at least one reverberator using the at least one reverberator parameter independently of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising: obtaining circuitry configured to obtain at least one reverberation parameter; converting circuitry configured to convert the obtained at least one reverberation parameter into at least one frequency band data; encoding circuitry configured to encode the at least one frequency band data; encoding circuitry configured to encode the at least one reverberation parameter; comparing circuitry configured to compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generating circuitry configured to generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • an apparatus for assisting spatial rendering in at least one acoustic environment comprising: obtaining circuitry configured to obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decoding circuitry configured to decode the reverberation parameter part to generate decoded reverberation parameters; obtaining circuitry configured to obtain reverberator parameters from the decoded reverberation parameters; initializing circuitry configured to initialize at least one reverberator based on the reverberator parameters; obtaining circuitry configured to obtain at least one input audio signal associated with the at least one acoustic environment; and generating circuitry configured to generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • an apparatus for assisting spatial rendering in at least one acoustic environment, comprising: means for obtaining at least one reverberation parameter; means for converting the obtained at least one reverberation parameter into at least one frequency band data; means for encoding the at least one frequency band data; means for encoding the at least one reverberation parameter; means for comparing resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; means for generating a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • an apparatus for assisting spatial rendering in at least one acoustic environment, comprising: means for obtaining a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; means for decoding the reverberation parameter part to generate decoded reverberation parameters; means for obtaining reverberator parameters from the decoded reverberation parameters; means for initializing at least one reverberator based on the reverberator parameters; means for obtaining at least one input audio signal associated with the at least one acoustic environment; and means for generating an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • a computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
  • a computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
  • An apparatus comprising means for performing the actions of the method as described above.
  • An apparatus configured to perform the actions of the method as described above.
  • a computer program comprising program instructions for causing a computer to perform the method as described above.
  • a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
  • An electronic device may comprise apparatus as described herein.
  • a chipset may comprise apparatus as described herein.
  • Embodiments of the present application aim to address problems associated with the state of the art.
  • Figure 1 shows a model of room acoustics and the room impulse response
  • Figure 2 shows an example environment within which embodiments can be implemented showing an audio scene with an audio portal or acoustic coupling
  • Figure 3 shows schematically an example apparatus within which some embodiments may be implemented
  • Figure 4 shows a flow diagram of the operation of the example apparatus as shown in Figure 3;
  • Figure 5 shows schematically an example reverberator controller as shown in Figure 3 according to some embodiments
  • Figure 6 shows a flow diagram of the operation of the example reverberator controller as shown in Figure 5;
  • Figure 7 shows schematically an example reverberator output signals spatialization controller as shown in Figure 3 according to some embodiments
  • Figure 8 shows a flow diagram of the operation of the example reverberator output signals spatialization controller as shown in Figure 7;
  • Figure 9 shows schematically an example reverberator output signals spatializer as shown in Figure 3 according to some embodiments;
  • Figure 10 shows a flow diagram of the operation of the example Reverberator output signals spatializer as shown in Figure 9;
  • Figure 11 shows schematically an example FDN reverberator as shown in Figure 3 according to some embodiments
  • Figure 12 shows schematically an example bitstream generator according to some embodiments
  • Figure 13 shows a flow diagram of the operation of the example feedback filter designer as shown in Figure 12;
  • Figure 14 shows schematically an example apparatus with transmission and/or storage within which some embodiments can be implemented.
  • Figure 15 shows an example device suitable for implementing the apparatus shown in previous figures.
  • Suitable apparatus and possible mechanisms for parameterizing and rendering audio scenes with reverberation can be part of a spatial audio rendering (also known as spatial rendering) system.
  • reverberation can be rendered using, e.g., a Feedback-Delay-Network (FDN) reverberator with a suitable tuning of delay line lengths.
  • FDN Feedback-Delay-Network
  • An FDN allows the reverberation times (RT60) and the energies of different frequency bands to be controlled individually. Thus, it can be used to render the reverberation based on the characteristics of the room or modelled space. The reverberation times and the energies of the different frequencies are affected by the frequency-dependent absorption characteristics of the room.
  • the reverberation spectrum or level can be controlled using a diffuse-to-direct ratio, which describes the ratio of the energy (or level) of reverberant sound energy to the direct sound energy (or the total emitted energy of a sound source).
  • The DDR value indicates the ratio of the diffuse (reverberant) sound energy to the total emitted energy of a sound source.
  • RDR refers to the reverberant-to-direct ratio, which can be measured from an impulse response.
  • the RDR can be calculated by dividing the energy of the reverberant part of the impulse response by the energy of the direct sound, i.e. RDR = sum_t h_rev(t)^2 / sum_t h_dir(t)^2.
  • the logarithmic RDR can be obtained as 10*log10(RDR).
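A minimal sketch of such a measurement, assuming the direct sound can be isolated with a short time window around the strongest peak of the impulse response (the windowing choice is an assumption; the application does not specify the segmentation):

```python
import numpy as np

def rdr_from_impulse_response(h, fs, direct_window_ms=2.5):
    """Estimate the reverberant-to-direct ratio (RDR) from an impulse response.

    The direct sound is approximated as a short window around the strongest
    peak; everything after that window is counted as reverberant energy.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    half = int(direct_window_ms * 1e-3 * fs)
    direct_energy = np.sum(h[max(0, peak - half):peak + half + 1] ** 2)
    reverb_energy = np.sum(h[peak + half + 1:] ** 2)
    rdr = reverb_energy / direct_energy
    return rdr, 10.0 * np.log10(rdr)  # linear RDR and logarithmic RDR
```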
  • In a virtual environment for virtual reality (VR) or a real physical environment for augmented reality (AR), there can be several acoustic environments, each with their own reverberation parameters, which can be different in different acoustic environments.
  • An example of such an environment is shown in Figure 2.
  • The audio scene comprises a first acoustic environment AE1 203, a second acoustic environment AE2 205 and an outdoor area 201.
  • There is an acoustic coupling AC1 207 between the first acoustic environment AE1 203 and the second acoustic environment AE2 205.
  • The sound or audio sources 210 are located within the second acoustic environment AE2 205.
  • The audio sources 210 comprise a first audio source, a drummer, S1 2103 and a second audio source, a guitarist, S2 2102.
  • The listener 202 is further shown moving through the audio scene and is shown in the first acoustic environment AE1 203 at position P1 2001, in the second acoustic environment AE2 205 at position P2 2002 and outdoor 201 at position P3 2003.
  • the encoder converts reverberation parameters of the acoustic environment into reverberator parameters for the FDN reverberator and then creates a bitstream of the optimized reverberator parameters. While the benefit of this approach is that encoder optimization can be used to provide optimal reverberator parameters based on the reverberation characteristics of the virtual environment, the disadvantage is that the bitstream size is not as small as possible. Furthermore, there are known methods where high-perceptual-quality reverberation can be synthesized for physical environments in augmented reality if reverberation parameters are obtained only at the renderer. However, these methods currently lack the possibility of obtaining reverberation parameters from the bitstream.
  • Control of activating/prioritizing reverberators is described in GB2200043.4, which specifically discusses a mechanism for prioritizing reverberators and activating only a subset of them based on the prioritization.
  • GB2200335.4 furthermore describes a method to adjust reverberation level especially in augmented reality (AR) rendering.
  • W02021186107 describes late reverb modelling from acoustic environment information using FDNs and specifically describes designing a DDR filter to adjust the late reverb level based on input DDR data.
  • GB2020673.6 describes a method and apparatus for the fusion of a virtual scene description in a bitstream and a listener space description for 6DoF rendering, and specifically late reverberation modelling for immersive audio scenes where the acoustic environment is a combination of a content-creator-specified virtual scene and listener-consumption-space-influenced listening space parameters.
  • this background describes a method for rendering an AR audio scene comprising virtual scene description acoustic parameters and real-world listening-space acoustic parameters.
  • GB2101657.1 describes how late reverb rendering filter parameters are derived for a low-latency renderer application.
  • GB2116093.2 discusses reproduction of diffuse reverberation, where a method is proposed that enables the reproduction of rotatable diffuse reverberation whose characteristics may be directionally dependent (i.e., having different reverberation characteristics in different directions). The method uses a number of processing paths (at least 3, typically 6-20 paths) producing (virtual) multichannel signals by determining at least two panning gains based on a target direction and the positions of the (virtual) loudspeakers in a (virtual) loudspeaker set (e.g., using VBAP), obtaining mutually incoherent reverberant signals for each of the determined gains (e.g., using outputs of two reverberators tuned to produce mutually incoherent outputs, or using decorrelators), applying the determined gains to the corresponding obtained reverberant signals in order to obtain reverberant multichannel signals, combining the reverberant multichannel signals from the different processing paths, and reproducing the combined signals.
  • the concept as discussed in the embodiments herein relates to reproduction of late reverberation in 6DoF audio rendering systems based on acoustic scene reverberation parameters where the solution is configured to transmit compact reverberation parameters and to convert them to reverberator parameters in a renderer to achieve low reverberation parameter bitstream size for low storage and network bandwidth requirements while still maintaining spatial rendering with high perceptual quality to achieve an immersive audio experience.
  • the apparatus and methods relate to reproduction of late reverberation in 6DoF audio rendering systems based on acoustic scene reverberation parameters where the solution is configured to transmit compact reverberation parameters and to convert them to reverberator parameters in a renderer to achieve low reverberation parameter bitstream size suitable for low storage requirements and low network bandwidth requirements while still maintaining spatial rendering with high perceptual quality to achieve an immersive audio experience.
  • apparatus and methods configured to implement the following operations (within an encoder): obtain frequency-dependent reverberation parameters associated with a virtual acoustic environment; convert the frequency-dependent reverberation parameters to frequency band data; encode the frequency band data into encoded frequency band data; encode the frequency-dependent reverberation parameters into encoded reverberation parameters; compare the bitrate required to transmit 1) the encoded frequency band data or 2) the encoded reverberation parameters; encode into the bitstream compact reverberation parameters which are either 1) the encoded frequency band data or 2) the encoded reverberation parameters based on the comparison, and associate them with the acoustic environment identifier and its dimensions.
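The encoder-side selection can be pictured with the sketch below. The entropy coding is abstracted behind a stand-in encode_payload helper (plain DEFLATE over quantized values, purely for illustration; the actual coding described later uses differential plus Huffman coding), and byte counts stand in for whichever resource measure (bitrate, bits, channel capacity) is compared:

```python
import struct
import zlib

def encode_payload(values):
    # Stand-in for the actual entropy coding: quantize and DEFLATE-compress.
    quantized = [int(round(v * 100)) for v in values]
    return zlib.compress(struct.pack(f"{len(quantized)}i", *quantized))

def build_reverb_payload(freq_params, band_data, env_id, dimensions):
    """Encode both representations, keep the cheaper one, and tag the choice."""
    enc_params = encode_payload(freq_params)  # frequency-dependent parameters
    enc_bands = encode_payload(band_data)     # compact frequency band data
    use_bands = len(enc_bands) < len(enc_params)  # the resource comparison
    return {
        "acoustic_environment_id": env_id,
        "dimensions": dimensions,             # e.g. (xDim, yDim, zDim)
        "is_band_data": use_bands,            # selection indicator
        "reverberation_parameters": enc_bands if use_bands else enc_params,
    }
```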
  • the frequency bands are shown as octave bands.
  • the division of frequency bands can be any suitable division.
  • For example, spacings of 4, 6, 8, or 10 frequency bands can be used.
  • apparatus and methods configured to implement the following operations (within a renderer): obtain from a bitstream the compact reverberation parameters, which are either encoded frequency band data or encoded reverberation parameters, together with an identifier and dimensions of an acoustic environment; decode the received compact reverberation parameters; convert the decoded compact reverberation parameters into reverberator parameters; and initialize at least one reverberator based on the reverberator parameters.
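Continuing the encoder sketch above, the renderer side can be pictured as follows (a sketch; decode_payload simply inverts the illustrative encode_payload, and the octave band mapping is sketched separately further below):

```python
import struct
import zlib

def decode_payload(blob):
    # Inverse of the illustrative encode_payload above.
    raw = zlib.decompress(blob)
    quantized = struct.unpack(f"{len(raw) // 4}i", raw)
    return [q / 100.0 for q in quantized]

def compact_parameters_to_band_data(payload, map_to_octave_bands):
    """Decode the compact parameters and normalize them to octave band data.

    `map_to_octave_bands` converts frequency-dependent data to band data;
    it is only needed when the selection indicator says the payload carries
    frequency data rather than band data.
    """
    values = decode_payload(payload["reverberation_parameters"])
    if payload["is_band_data"]:
        return values                    # already octave band values
    return map_to_octave_bands(values)   # frequency data: map to bands
```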
  • the parameters are encoded into a reverberation bitstream payload.
  • the parameters are not explicitly encoded into a reverberation bitstream payload but the reverberation bitstream payload contains a bit implying that reverberation parameters have been encoded into a scene payload.
  • the initialization of the reverberator using the parameters is implemented in the same manner as when a reverberator is initialized for rendering reverberation for an augmented reality (physical) scene and when reverberation parameters for the augmented reality scene are received directly in the renderer.
  • the reverberator used for rendering reverberation using the compact reverberation parameters can also be used for rendering reverberation for virtual acoustic environments when reverberator parameters are received in the bitstream.
  • the apparatus and methods can be configured to encode the compact reverberation parameters into bitstream when operating in a low bitrate mode.
  • the reverberator used for rendering reverberation using the compact reverberation parameters differs from the reverberator used for rendering reverberation for augmented reality physical scenes or virtual reality scenes when the solution is configured to not operate in a low bitrate mode.
  • an indication in the reverb payload bitstream can be used to indicate to the renderer to use reverberation parameters from the audio scene description in the bitstream.
  • the indication in the reverb payload bitstream to utilize reverberation parameters signals the expectation to perform reverberation auralization during rendering.
  • ISO/IEC 23090-4 MPEG-I Audio Phase 2 will normatively standardize the bitstream and the renderer processing. There will also be an encoder reference implementation, but it can be modified later on as long as the output bitstream follows the normative specification. This allows the codec quality to be improved with novel encoder implementations even after the standard has been finalized.
  • the normative bitstream can contain encoded octave band data or encoded frequency-dependent reverberation data for each acoustic environment.
  • the encoded frequency-dependent reverberation data can be under the reverb payload or under the scene payload; and the normative renderer can decode the bitstream to obtain scene and compact reverberation parameters, decode and map these to reverberator parameters, and render the reverberated signal using the reverberator.
  • the apparatus can in some embodiments be part of a spatial rendering system as described later on.
  • the inputs to the system of apparatus are scene and reverberator parameters 300, listener pose parameters 302 and an audio signal 306.
  • the system of apparatus generates as an output, a reverberated signal 314 (e.g. binauralized with head-related-transfer-function (HRTF) filtering for reproduction to headphones, or panned with Vector-Base Amplitude Panning (VBAP) for reproduction to loudspeakers).
  • HRTF head-related-transfer-function
  • VBAP Vector-Base Amplitude Panning
  • the apparatus comprises a reverberator controller 301.
  • the reverberator controller 301 is configured to obtain or receive the scene and reverberation parameters 300.
  • the scene and reverberation parameters are in the form of a bitstream which contains enclosing room geometry and parameters describing the RT60 times and reverberant-to-direct ratio (RDR) for the enclosure (or Acoustic Environment).
  • the reverberator controller 301 is configured to obtain the bitstream, convert the encoded reverberation parameters into parameters for a reverberator (reverberator parameters), and pass the reverberator parameters to initialize at least one FDN reverberator to reproduce reverberation according to the reverberator parameters.
  • the reverberator parameters 304 can then be passed to the reverberator(s) 305.
  • the apparatus comprises a reverberator or reverberators 305.
  • the reverberator(s) are configured to receive the reverberator parameters 304 and the audio signal s_in(t) (where t is time) 306.
  • the reverberator(s) 305 are configured to reverberate the audio signal 306 based on the reverberator parameters 304.
  • the reverberators 305 in some embodiments output the resulting reverberator output signals s_rev,r(j, t) 310 (where j is the output audio channel index and r the reverberator index). There are several reverberators, each of which produces several output audio signals. These reverberator output signals 310 are input into a reverberator output signals spatializer 307.
  • the apparatus comprises a reverberator output signals spatialization controller 303.
  • the reverberator output signals spatialization controller 303 is configured to receive the scene and reverberation parameters 300 and the listener pose parameters 302 and generate reverberator output channel positions 312.
  • the reverberator output channel positions 312 in some embodiments indicate Cartesian coordinates which are to be used when rendering each of the signals in s_rev,r(j, t). In some other embodiments other representations (or other coordinate systems) such as polar coordinates can be used.
  • the output channel positions can be virtual loudspeaker positions (or positions in a space which are unrelated to an actual or physical loudspeaker but can be used to generate a suitable spatial audio signal format such as binaural audio signals), or actual loudspeaker positions (for example in multi-speaker systems such as 5.1 or 7.2 channel systems).
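For instance, one simple way to place the output channels somewhat evenly around the listener (an illustrative layout only; the application does not mandate any particular placement) is a horizontal ring of virtual loudspeakers expressed in Cartesian coordinates:

```python
import math

def ring_positions(n_channels, radius=2.0, listener=(0.0, 0.0, 0.0)):
    """Evenly spaced virtual loudspeaker positions on a horizontal circle
    centred on the listener, as (x, y, z) Cartesian coordinates."""
    lx, ly, lz = listener
    positions = []
    for d in range(n_channels):
        azimuth = 2.0 * math.pi * d / n_channels
        positions.append((lx + radius * math.cos(azimuth),
                          ly + radius * math.sin(azimuth),
                          lz))
    return positions
```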
  • the apparatus comprises a reverberator output signals spatializer 307.
  • the reverberator output signals spatializer 307 is configured to obtain the reverberator output signals 310 and the reverberator output channel positions 312 and based on these produces an output signal suitable for reproduction via headphones or via loudspeakers.
  • the reverberator output signals spatializer 307 is configured to render each reverberator output into a desired output format, such as binaural, and then sum the signals to produce the output reverberated signal 314.
  • the reverberator output signals spatializer 307 can further use HRTF filtering to render the reverberator output signals 310 in their desired positions indicated by the reverberator output channel positions 312.
  • This reverberation in the reverberated signals 314 is therefore based on the scene and reverberator parameters 300 as was desired and further considers listener pose parameters 302.
  • With respect to Figure 4, a flow diagram is shown of the operations of the example apparatus shown in Figure 3 according to some embodiments.
  • The method may comprise obtaining scene and reverberator parameters and obtaining listener pose parameters, as shown in Figure 4 by step 401.
  • Reverberator output signal spatialization controls are determined based on the obtained scene and reverberator parameters and listener pose parameters, as shown in Figure 4 by step 409.
  • The reverberator spatialization, based on the reverberator output signal spatialization controls, can then be applied to the reverberated audio signals from the reverberators to generate the output reverberated audio signals, as shown in Figure 4 by step 411.
  • the reverberator controller 301 is configured to provide reverberator parameters to the reverberator(s).
  • the reverberator controller 301 is configured to receive encoded reverberation parameters which describe the reverberation characteristics in each acoustic environment (each acoustic environment contains at least one set of reverberation parameters).
  • the reverberator controller can be configured to decode the obtained reverberation parameters and convert the decoded parameters into concrete reverberator parameters.
  • the input to the apparatus can be configured to provide the desired RT60 times per specified frequencies k, denoted RT60(k), and the DDR values DDR(k) or, equivalently, the RDR logarithm logRDR(k).
  • the reverberator controller 301 comprises a reverberator payload selector 501.
  • the reverberator payload selector 501 is configured to determine how reverberation parameters for acoustic environments are represented.
  • an indicator or flag from the bitstream provides the information to make the selection.
  • the reverberation parameters are encoded in the scene payload and the controller is configured to obtain the reverberation parameters from a scene payload section.
  • the reverberation parameters are encoded as frequency data when carried in the scene payload.
  • the reverberation parameters are encoded either as octave band data (without the octave centre frequencies) or as frequency-dependent data with combinations of frequency and control value.
  • the payload selector 501 is then configured to control the decoder 505.
  • the reverberator controller 301 comprises a reverberator method type selector 503.
  • the reverberator method type selector 503 is configured to determine the method type for the reverberator. For example, the parameters of the reverberator can be adjusted so that they produce reverberation having characteristics matching the desired RT60(k) and DDR(k) for the acoustic environment to which this FDN reverberator is to be associated. The method type can depend, for example, on whether the acoustic environment/scene is a virtual reality (VR) scene or an augmented reality (AR) scene.
  • VR virtual reality
  • AR augmented reality
  • the reverberator controller 301 comprises a decoder 505, which is controlled based on the outputs of the reverberator payload selector 501 and the reverberator method type selector 503.
  • When reverberation parameters are encoded as frequency data, the decoder is configured to revert the applied encoding.
  • For example, the decoder is configured to implement Huffman decoding and revert the differential encoding to obtain the frequency dependent RT60(k) and logRDR(k) data.
  • When the reverberation parameters are encoded as octave band data, the decoder 505 is configured to decode the parameters by reverting the Huffman coding and differential encoding applied to the octave band values. In this case no band centre frequencies are transmitted as they are known by the renderer.
  • decoded values directly correspond to RT60(b) and logRDR(b) where b is the octave band index (in other words the ‘frequency data’ mapping described hereafter is not employed).
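Reverting the differential encoding is a running sum over the decoded differences. A minimal sketch, assuming the Huffman stage has already produced the starting value and the integer differences:

```python
from itertools import accumulate

def revert_differential(first_value, diffs):
    """Reconstruct bandwise values from a start value and the decoded
    differences, i.e. value(b) = value(b - 1) + diff(b)."""
    return list(accumulate([first_value] + list(diffs)))

# revert_differential(0.50, [0.05, -0.02]) -> [0.50, 0.55, 0.53]
# (up to floating-point rounding)
```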
  • the reverberator controller 301 further comprises a mapper 507 configured to map the frequency dependent data into octave band data RT60(b) and logRDR(b) where b is the octave band index.
  • the values DDR(k) are mapped to a set of frequency bands b, which can be, e.g., either octave or third octave bands. Mapping of input DDR values to frequency bands b is done by obtaining for each band b the value from the input DDR response DDR(k) at the closest frequency k to the center frequency of band b. Other choices such as Bark bands or frequency bands with linearly-spaced center frequencies are also possible. This results in the frequency mapped DDR values DDR(b) and RT60 values RT60(b).
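A sketch of this nearest-frequency mapping (the octave band centres from 31.5 Hz to 16 kHz are an assumed example set):

```python
import numpy as np

OCTAVE_CENTERS = [31.5, 63.0, 125.0, 250.0, 500.0,
                  1000.0, 2000.0, 4000.0, 8000.0, 16000.0]

def map_to_bands(freqs_k, values_k, centers=OCTAVE_CENTERS):
    """For each band b, take the input value at the frequency k that lies
    closest to the band's centre frequency."""
    freqs_k = np.asarray(freqs_k, dtype=float)
    return [values_k[int(np.argmin(np.abs(freqs_k - fc)))] for fc in centers]

# Example: map DDR values given at 100 Hz, 1 kHz and 10 kHz to octave bands.
ddr_b = map_to_bands([100.0, 1000.0, 10000.0], [0.3, 0.2, 0.1])
```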
  • the reverberator controller 301 comprises a Filter parameter determiner 509 configured to convert the bandwise RT60(b) and logRDR(b) into reverberator parameters.
  • the reverberator parameters can, in some embodiments, comprise the coefficients of each attenuation filter GEQd, feedback matrix coefficients A, and lengths m_d for D delay lines.
  • each attenuation filter GEQd is a graphic EQ filter using M biquad IIR band filters.
  • the first operation is one of obtaining the bitstream containing scene and reverberation parameters as shown in Figure 6 by step 601.
  • the next operation is one of obtaining the encoded reverberation parameters from the scene payload as shown in Figure 6 by step 611. Then having obtained the encoded reverberation parameters they can be decoded to obtain frequency data as shown in Figure 6 by step 613.
  • the obtained frequency data can then be mapped to obtain octave band data as shown in Figure 6 by step 615.
  • the octave band data can then be mapped to control gain data as shown in Figure 6 by step 617.
  • control gain data can then be used to obtain parameters for at least one graphic equalization filter for the reverberator as shown in Figure 6 by step 619.
  • the next operation is one of determining the reverberation method type as shown in Figure 6 by step 605.
  • the encoded reverberation parameters can be obtained from the reverb payload as shown in Figure 6 by step 607. Then the method can pass to step 613 of decoding the encoded reverberation parameters to obtain frequency data as described above.
  • the encoded reverberation parameters can be obtained from decoding the encoded octave band data to obtain the octave band data as shown in Figure 6 step 609. Then the method can pass to step 617 of mapping the octave band data to generate control gain data as described above.
  • the method can pass directly to the operation of obtaining the reverberator parameters as shown in Figure 6 by step 621.
  • The reverberator 305 is shown schematically in Figure 11 as an FDN (Feedback Delay Network) configuration.
  • The reverberator 305 is enabled or configured to produce reverberation whose characteristics match the room parameters. There may be several such reverberators, each parameterized based on the reverberation characteristics of an acoustic environment.
  • An example reverberator implementation comprises a feedback delay network (FDN) reverberator and DDR control filter which enables reproducing reverberation having desired frequency dependent RT60 times and levels.
  • FDN feedback delay network
  • the room (or reverberation) parameters are used to adjust the FDN reverberator parameters such that it produces the desired RT60 times and levels.
  • An example of a level parameter can be the diffuse-to-direct ratio (DDR) (or the diffuse-to-total energy ratio as used in MPEG-I).
  • DDR diffuse-to-direct ratio
  • the outputs from the FDN reverberator are the reverberated audio signals, which for binaural headphone reproduction are mixed down to two output signals and for loudspeaker reproduction typically to more than two output audio signals. Reproducing several outputs, such as 15 FDN delay line outputs, to binaural output can be done, for example, via HRTF filtering.
  • Figure 11 shows an example FDN reverberator in further detail, which can be used to produce D uncorrelated output audio signals.
  • each output signal can be rendered at a certain spatial position around the listener for an enveloping reverb perception.
  • the example FDN reverberator is configured such that the reverberation parameters are processed to generate coefficients GEQd (GEQ1, GEQ2, ..., GEQD) of each attenuation filter 1161, feedback matrix 1157 coefficients A, lengths m_d (m1, m2, ..., mD) for the D delay lines 1159 and DDR energy ratio control filter 1153 coefficients GEQddr.
  • the example FDN reverberator 305 thus shows a D-channel output, by providing the output from each FDN delay line as a separate output.
  • each attenuation filter GEQd 1161 is implemented as a graphic EQ filter using M biquad IIR band filters.
  • the parameters of each graphic EQ comprise the feedforward and feedback coefficients for the M biquad IIR filters, the gains for the biquad band filters, and the overall gain.
  • the reverberator uses a network of delays 1159 and feedback elements (shown as attenuation filters 1161, feedback matrix 1157, combiners 1155 and output gain 1163) to generate a very dense impulse response for the late part.
  • Input samples 1751 are input to the reverberator to produce the reverberation audio signal component which can then be output.
  • the FDN reverberator comprises multiple recirculating delay lines.
  • the unitary matrix A 1157 is used to control the recirculation in the network.
  • Attenuation filters 1161, which may be implemented in some embodiments as graphic EQ filters built from cascades of second-order-section IIR filters, can facilitate controlling the energy decay rate at different frequencies.
  • the filters 1161 are designed such that they attenuate the desired amount in decibels at each pulse pass through the delay line, such that the desired RT60 time is obtained; a minimal loop illustrating this structure is sketched below.
  • each attenuation filter GEQd is a graphic EQ filter using M biquad IIR band filters.
  • each graphic EQ comprises the feedforward b and feedback a coefficients for 10 biquad IIR filters, the gains for the biquad band filters, and the overall gain.
  • a length md for the delay line d can be determined based on virtual room dimensions.
  • a shoebox shaped room can be defined with dimensions xDim, yDim, zDim.
  • when the input to the renderer is an AR scene with a listening space description file, the dimensions are obtained from the listening space description.
  • a shoebox can be fit inside the room and the dimensions of the fitted shoebox can be utilized for obtaining the delay line lengths.
  • the dimensions can be obtained as three longest dimensions in the non-shoebox shaped room, or other suitable method.
  • Such dimensions can also be obtained from a mesh if the bounding box is provided as a mesh.
  • the dimensions can further be converted to modified dimensions of a virtual room or enclosure having the same volume as the input room or enclosure. For example, the ratios 1, 1.3, and 1.9 can be used for the converted virtual room dimensions.
  • the enclosure vertices are obtained from the bitstream and the dimensions can be calculated, along each of the axes x, y, z, by the difference of the maximum and minimum value of the vertices. Dimensions can be calculated the same way when the input is an AR scene to be rendered with a listening space description with the difference that the enclosure vertices are obtained from the listening space description and not from the bitstream.
  • the delays can in some embodiments be set proportionally to standing wave resonance frequencies in the virtual room or physical room.
  • the delay line lengths md can further be made mutually prime; a sketch covering the steps above is given below.
  • the attenuation filter coefficients in the delay lines can furthermore be adjusted so that a desired amount in decibels of attenuation happens at each signal recirculation through the delay line so that the desired RT60(k) time is obtained. This is done in a frequency specific manner to ensure the appropriate rate of decay of signal energy at specified frequencies k.
  • the attenuation filters are designed as cascade graphic equalizer filters, as described in V. Välimäki and J. Liski, "Accurate cascade graphic equalizer," IEEE Signal Process. Lett., vol. 24, no. 2, pp. 176-180, Feb. 2017, for each delay line.
  • the design procedure outlined takes as input a set of command gains at octave bands.
  • Reverberation ratio parameters can refer to the diffuse-to-total energy ratio (DDR) or reverberant-to-direct ratio (RDR) or other equivalent representation.
  • the ratio parameters can be equivalently represented on a linear scale or logarithmic scale.
  • a filter is designed in this step such that, when the filter is applied to the input data of the FDN reverberator, the output reverberation is configured to have the desired energy ratio defined by the DDR(k).
  • the input to the design procedure can in some embodiments be the DDR values DDR(k).
  • when receiving logarithmic RDR values logRDR(b), the values can be converted to linear RDR values as RDR(b) = 10^(logRDR(b)/10).
  • the GEQDDR matches the reverberator spectrum energy to the target spectrum energy.
  • estimates of the RDR of the reverberator output and of the target RDR are obtained.
  • the RDR of the reverberator output can be obtained by rendering a unit impulse through the reverberator using the first reverberator parameters (that is, the parameters of the FDN without the GEQDDR filter the parameters of which are being obtained) and measuring the energy of the reverberator output and energy of the unit impulse and calculating the ratio of these energies.
  • a unit impulse input is generated where the first sample value is 1 and the length of the zero tail is long enough.
  • the length of the zero tail is equal to max(RT60(b)) plus the predelay t, in samples.
  • the monophonic output of the reverberator is of interest, so the outputs are summed over the delay lines d to obtain the reverberator output srev(t) as a function of time t.
  • a long FFT (of length NFFT) is calculated over srev(t) and its absolute value is obtained as Srev(kk) = |FFT(srev(t), NFFT)|.
  • kk are the FFT bin indices.
  • the positive half spectral energy density is obtained as S(kk) = |Srev(kk)|^2, for kk = 0, ..., NFFT/2.
  • the energy of a unit impulse can be calculated or obtained analytically and can be denoted as Su(kk).
  • Band energies are calculated of both the positive half spectral energy density of the reverberator S(kk) and the positive half spectral energy density of the unit impulse Su(kk). Band energies can be calculated as S(b) = sum(S(kk)) over kk = blow, ..., bhigh, where blow and bhigh are the lowest and highest bin index belonging to band b, respectively. The band bin indices can be obtained by comparing the frequencies of the bins to the lower and upper frequencies of each band.
  • the reproduced RDRrev(b) of the reverberator output at the frequency band b is obtained as RDRrev(b) = S(b)/Su(b).
  • ControlGain(b) = 20*log10(ddrFilterTargetResponse(b)) is input as the target response for the graphic equalizer design routine in V. Välimäki and J. Rämö, "Neurally Controlled Graphic Equalizer", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, December 2019.
  • the DDR filter target response (the control gains for the graphic EQ design routine) can also be obtained directly in the logarithmic domain, which from the definitions above amounts to ControlGain(b) = 10*log10(RDR(b)/RDRrev(b)). The first reverberator parameters and the parameters of the reverberator DDR control filter GEQDDR together form the reverberator parameters; the measurement and gain computation are sketched below.
  • the reverberator output signals spatialization controller 303 is configured to receive the scene and reverberator parameters 300 and listener pose parameters 302.
  • the reverberator output signals spatialization controller 303 is configured to use the listener pose parameters 302 and scene and reverberator parameters 300 to determine the acoustic environment where the listener currently is, and to provide the output channels of that reverberator with positions which surround the listener. This means that the reverberation caused by an acoustic enclosure, when the listener is inside it, is rendered as a diffuse signal enveloping the listener.
  • the reverberator output signals spatialization controller 303 comprises a listener acoustic environment determiner 701 configured to obtain the scene and reverberator parameters 300 and listener pose parameters 302 and determine the listener acoustic environment.
  • the reverberator output signals spatialization controller 303 comprises a determiner 703 configured to determine the listener reverberator corresponding to the listener acoustic environment.
  • the reverberator output signals spatialization controller 303 comprises a provider 705 configured to provide or determine the head-tracked output positions for the listener reverberator and generate the output channel positions 312.
  • the output of the reverberator output signals spatialization controller 303 is thus the reverberator output channel positions 312.
  • the method comprises determining the listener acoustic environment as shown in Figure 8 by step 805. Having determined this, the listener reverberator corresponding to the listener acoustic environment is determined as shown in Figure 8 by step 807.
  • the method comprises providing head tracked output positions for the listener reverberator as shown in Figure 8 by step 809.
  • the method comprises outputting the reverberator output channel positions as shown in Figure 8 by step 811.
  • the reverberator corresponding to the acoustic environment where the user currently is located is rendered by the reverberator output signals spatializer 307 as an immersive audio signal surrounding the user. That is, the signals in srev,r(j, t) corresponding to the listener environment are rendered as point sources surrounding the listener.
  • the reverberator output signals spatializer 307 is configured to receive the positions 312 from the reverberator output signals spatialization controller 303. Additionally, it receives the reverberator output signals 310 from the reverberators 305.
  • the reverberator output signals spatializer comprises a head-related transfer function (HRTF) filter 901 which is configured to render each reverberator output into a desired output format (such as binaural).
  • the reverberator output signals spatializer comprises an output channels combiner 903 which is configured to combine (or sum) the signals to produce the output reverberated signal 314.
  • the reverberator output signals spatializer 307 can use HRTF filtering to render the reverberator output signals in their desired positions indicated by reverberator output channel positions.
  • with respect to Figure 10 there is shown a flow diagram showing the operations of the reverberator output signals spatializer according to some embodiments.
  • the method can comprise obtaining reverberator output signals as shown in Figure 10 by step 1000 and obtaining reverberator output channel positions as shown in Figure 10 by step 1001.
  • the method may comprise applying a HRTF filter configured by the reverberator output channel positions to the reverberator output signals as shown in Figure 10 by step 1003.
  • the method may then comprise summing or combining the output channels as shown in Figure 10 by step 1005.
  • the reverberated audio signals can be output as shown in Figure 10 by step 1007; a toy spatializer following steps 1003 and 1005 is sketched below.
  • Figure 14 shows schematically an example system where the embodiments are implemented in an encoder device 1901 which performs part of the functionality, writes data into a bitstream 1921 and transmits that to a renderer device 1941, which decodes the bitstream, performs reverberator processing according to the embodiments and outputs audio for headphone listening.
  • Figure 14 for example shows apparatus, and specifically the renderer device 1941, which is suitable for performing spatial rendering operations.
  • the encoder side 1901 of Figure 14 can be performed on content creator computers and/or network server computers.
  • the output of the encoder is the bitstream 1921 which is made available for downloading or streaming.
  • the decoder/renderer 1941 functionality runs on an end-user device, which can be a mobile device, personal computer, sound bar, tablet computer, car media system, home HiFi or theatre system, head mounted display for AR or VR, smart watch, or any suitable system for audio consumption.
  • the encoder 1901 is configured to receive the virtual scene description 1900 and the audio signals 1904.
  • the virtual scene description 1900 can be provided in the MPEG-I Encoder Input Format (EIF) or in another suitable format.
  • the virtual scene description contains an acoustically relevant description of the contents of the virtual scene, and contains, for example, the scene geometry as a mesh, acoustic materials, acoustic environments with reverberation parameters, positions of sound sources, and other audio element related parameters such as whether reverberation is to be rendered for an audio element or not.
  • the encoder 1901 in some embodiments comprises a reverberation parameter determiner 1911 configured to receive the virtual scene description 1900 and configured to obtain the reverberation parameters.
  • the reverberation parameters can in an embodiment be obtained from the RT60, DDR, predelay, and region/enclosure parameters of acoustic environments.
  • the encoder 1901 furthermore in some embodiments comprises a scene and reverberation payload encoder 1913 configured to obtain the determined reverberation parameters and virtual scene description 1900 and generate suitable encoded scene and reverberation parameters.
  • the output bitstream can reside on a content delivery network (CDN), for example.
  • the scene and reverberation parameters are encoded into a bitstream payload referred to as a reverberation payload (generated by the scene and reverberation payload encoder 1913).
  • input reverberation encoding preferences 1910, in the form of use_reverb_payload_metadata and reverb_method_type, are provided to the reverberation parameter determiner 1911 and the scene and reverberation payload encoder 1913.
  • deriving reverberator parameters based on reverberation parameters can be implemented in some embodiments as described above: the parameters for at least one graphic EQ filter for the reverberator are obtained using the control gain data, and the other reverberator parameters are obtained as indicated above.
  • the encoder 1901 further comprises an MPEG-H 3D audio encoder 1914 configured to obtain the audio signals 1904, MPEG-H encode them and pass them to a bitstream encoder 1915.
  • the encoder 1901 furthermore in some embodiments comprises a bitstream encoder 1915 which is configured to receive the output of the scene and reverberation payload encoder 1913 and the encoded audio signals from the MPEG-H encoder 1914 and generate the bitstream 1921 which can be passed to the decoder/renderer 1941.
  • the bitstream 1921 in some embodiments can be streamed to end-user devices or made available for download or stored.
  • the decoder 1941 in some embodiments comprises a bitstream decoder 1951 configured to decode the bitstream.
  • the decoder 1941 further can comprise a reverberation payload decoder 1953 configured to obtain the encoded reverberation parameters and decode these in an operation inverse to that of the reverberation payload encoder 1913.
  • the listening space description file (LSDF) generator 1971 is configured to generate the LSDF information and pass it to the reverberator controller 1955 and the reverberator output signals spatialization controller 1959.
  • the head pose generator 1957 receives information from a head mounted device or similar and generates head pose information or parameters which can be passed to the reverberator controller 1955, the reverberator output signals spatialization controller 1959 and HRTF processor 1963.
  • the decoder 1941 comprises a reverberator controller 1955 which also receives the output of the scene and reverberation payload decoder 1953 and generates the reverberation parameters for configuring the reverberators and passes these to the reverberators 1961.
  • the decoder 1941 comprises a reverberator output signals spatialization controller 1959 configured to configure the reverberator output signals spatializer 1962.
  • the decoder 1941 in some embodiments comprises an MPEG-H 3D audio decoder 1954 which is configured to decode the audio signals and pass them to the (FDN) reverberators 1961 and the direct sound processor 1965.
  • the decoder 1941 furthermore comprises (FDN) reverberators 1961 configured by the reverberator controller 1955 and configured to implement a suitable reverberation of the audio signals.
  • the output of the (FDN) reverberators 1961 is passed to a reverberator output signal spatializer 1962.
  • the decoder 1941 comprises a reverberator output signal spatializer 1962 configured to apply the spatialization and output to the binaural combiner 1967.
  • the decoder/renderer 1941 comprises a direct sound processor 1965 which is configured to receive the decoded audio signals and to implement any direct sound processing, such as air absorption and distance-gain attenuation. The output is passed to a HRTF processor 1963 which, using the head orientation determination (from a suitable sensor 1991), can generate the direct sound component; this, together with the reverberant component, is passed to a binaural signal combiner 1967.
  • the binaural signal combiner 1967 is configured to combine the direct and reverberant parts to generate a suitable output (for example for headphone reproduction).
  • the decoder comprises a head orientation determiner 1991 which passes the head orientation information to the HRTF processor 1963.
  • the reverberation payload encoder 1913 comprises a frequency dependent RT60 and DDR data obtainer 1201 configured to obtain the frequency dependent RT60 and DDR data values.
  • the reverberation payload encoder 1913 comprises an RT60 and DDR to octave band centre frequency mapper 1203 (to obtain octave band data) which is configured to map the obtained RT60 and DDR values to octave band centre frequencies. This can be implemented by mapping each frequency k to the closest octave band centre frequency b.
  • Weighted linear interpolation can be used to obtain the value of RT60(b) or logRDR(b) at each band centre frequency. If no data is provided above a certain band or below a certain band, the last band value is extrapolated to higher bands (or the first band value is extrapolated to lower bands); this mapping is sketched below.
  • Frequency band divisions can be indicated with a set of centre frequencies like above, or with a set of band low and band high frequencies.
  • a predefined number of frequency band divisions can be known by the encoder and renderer.
  • Each frequency band division can be optionally identified with a unique identifier such as a unique index related to frequency band divisions.
  • Such identifiers and corresponding divisions can be known by the encoder and renderer.
  • new divisions can be formed by the encoder and then signalled to the renderer.
  • the encoder can evaluate different frequency band divisions for mapping the frequency dependent input data.
  • a good match between the input data and the corresponding frequency band division data can be determined.
  • This kind of evaluation can be performed by the encoder for a plurality of frequency band divisions and the frequency band division which best represents the input data, based on the criterion described above, can be selected for representing the input data.
  • Data can then be encoded by sending the values of the input data mapped to the centre frequencies of the selected frequency band division, and the identifier of the used frequency band division.
  • the frequency band divisions have different numbers of frequency bands, which means that explicit identifiers are not needed but the renderer can identify the used frequency band division from the number of values.
  • the reverberation payload encoder 1913 comprises an octave band data encoder 1205.
  • the octave band data encoder 1205 in some embodiments is configured to encode the octave band data by differential encoding methods, for example taking the first value and then encoding the rest of the values as their differences to the first value.
  • the bitstream can contain the first value as such and Huffman codes of the difference values.
  • in some embodiments differential encoding is not applied and the octave band values are encoded into the bitstream as suitable integer values; the differential option is sketched below.
  • the reverberation payload encoder 1913 comprises a frequency dependent RT60 and DDR data encoder 1207.
  • the frequency dependent RT60 and DDR data encoder 1207 is configured to encode the frequency-dependent RT60(k) and DDR(k) data. If the frequency values k are shared between these, then the frequency values need to be encoded only once. They can be difference encoded and Huffman coded like octave band data. Similarly, RT60(k) and DDR(k) can be difference encoded and Huffman coded. In some embodiments the difference encoding and/or Huffman coding are omitted and the values are included into the bitstream as suitable integer values.
  • the reverberation payload encoder 1913 comprises a bitrate comparer/encoder selector 1209; the selector 1209 is configured to compare the bitrate required for transmitting the encoding from the octave band data encoder 1205 and from the frequency dependent RT60 and DDR data encoder 1207 and to select one to be transmitted as compact reverberation parameters.
  • the number of bits required for transmitting the first value and the Huffman codes of the remaining values is compared for the data representations of both encoder options. The one leading to the smallest number of bits is selected, and reverb_method_type is set accordingly to type 2 or type 3; this selection is sketched below.
  • the reverberation payload encoder 1913 comprises a bitstream generator 1211 configured to create a bitstream representation of the selected compact reverberation parameters.
  • First is the operation of obtaining scene and reverberation parameters from the encoder input format as shown in Figure 13 by step 1301.
  • the next step is one of mapping the frequencies of RT60 and DDR data to octave band centre frequencies to obtain octave band data as shown in Figure 13 by step 1305.
  • the method comprises encoding the octave band data as shown in Figure 13 by step 1307 and encoding the frequency-dependent RT60 and DDR data as shown in step 1309.
  • the next operation is one of comparing the bitrate required for transmitting octave band data and RT60/DDR data and selecting one to be transmitted as compact reverberation parameters as shown in Figure 13 by step 1311.
  • the next step is to create (and output) a bitstream representation of the selected compact reverberation parameters as shown in Figure 13 by step 1313.
  • the bitstream can thus carry the information for low bitrate representation of metadata for late reverberation in different methods.
  • Reverb parameters represent filter coefficients for the FDN reverberator attenuation filters and the ratio control filter (DDR control filter), the delay line lengths, and spatial positions for the output delay lines.
  • Other FDN parameters such as feedback matrix coefficients can be predetermined in the encoder and renderer and not included in the bitstream.
  • Encoded reverberation parameters carry RT60 and DDR data in the bitstream, encoded either as frequency dependent data with the frequencies at which the values are provided, or just as (interpolated) values at octave bands (without transmitting the octave band centre frequencies).
  • Other frequency band divisions can be used in some embodiments.
  • RT60 times are mapped into control gains of a graphic EQ FDN attenuation filter; there are either 10 or 31 control gains. DDR values are mapped into control gains of a graphic EQ; there are again either 10 or 31 control gains.
  • reverbPayloadStruct() {
        unsigned int(1) use_reverb_payload_metadata; // decides if reverb or scene payload is used
        if (use_reverb_payload_metadata) {
  • PositionStruct() {
        signed int(32) vertex_pos_x;
        signed int(32) vertex_pos_y;
        signed int(32) vertex_pos_z;
    }
  • in reverbPayloadStruct(), use_reverb_payload_metadata equal to 1 indicates to the renderer that the metadata carried in the reverb payload data structure should be used to perform late reverberation rendering.
  • a value equal to 0 indicates to the renderer that the metadata from scene payload shall be used to perform late reverberation rendering.
  • reverb_method_type equal to 1 indicates to the renderer that the metadata carries information with encoder-optimized reverb parameters for the FDN.
  • a value equal to 2 indicates that the carriage of reverberation metadata carries encoded representation of the RT60 and DDR.
  • a value equal to 3 indicates that the reverb payload data carries octave band data for RT60 and DDR.
  • numberOfSpatialPositions defines the number of output delay line positions for the late reverb payload. This value is defined using an index which corresponds to a specific number of delay lines.
  • the value of the bit string '0b00' signals to the renderer a value of 15 spatial orientations for delay lines.
  • the other three values '0b01', '0b10' and '0b11' are reserved.
  • azimuth defines the azimuth of the delay line with respect to the listener.
  • the range is from -180 to 180 degrees.
  • elevation defines the elevation of the delay line with respect to the listener.
  • the range is from -90 to 90 degrees.
  • numberOfAcousticEnvironments defines the number of acoustic environments in the audio scene.
  • the reverbPayloadStruct() carries information regarding the one or more acoustic environments which are present in the audio scene at that time.
  • An acoustic environment has certain “Reverberation parameters” such as RT60 times which are used to obtain FDN reverb parameters.
  • environment_id defines the unique identifier of the acoustic environment.
  • delayLineLength defines the length in units of samples of the delay line whose attenuation filter is configured using the graphic equalizer (GEQ) filter. The lengths of different delay lines corresponding to the same acoustic environment are mutually prime.
  • filterParamsStruct() describes the graphic equalizer cascade filter used to configure the attenuation filter for the delay lines. The same structure is also used subsequently to configure the filter for the diffuse-to-direct reverberation ratio GEQDDR. The details of this structure are described in the next table; a hypothetical reader for the start of the payload is sketched below.
  • bitstream comprises three structures:
  • EncodedRT60Struct() carries RT60 values (scene_rt60_value) for each frequency band (frequency_value), represented as positive integers.
  • the integers are differentially encoded and Huffman coded integer indices.
  • EncodedDDRStruct() carries DDR values (scene_ddr_coded_value) for each frequency band (frequency_value), represented as positive integers.
  • the frequency bands can be the same or different for RT60 and DDR.
  • the Huffman coding of the differences can be the same or different for RT60 and DDR.
  • bitstream comprises three structures:
  • DDROctaveBandDataStruct() carries DDR values (ddr_encoded_value) for 10 octave bands.
  • RT60OctaveBandDataStruct() carries RT60 values (encoded_rt60_values) for 10 octave bands.
  • the RT60 values can be converted to a suitable integer representation in an embodiment.
  • the integers are differentially encoded and Huffman coded integer indices.
  • in the semantics of filterParamsStruct(), sosLength is the length of each of the second-order section filter coefficients.
  • the filter is configured with coefficients b1, b2, a1 and a2; these are the feedforward and feedback coefficients of the second-order section IIR filters.
  • globalGain specifies the gain factor in decibels for the GEQ; levelDB specifies a sound level offset in decibels for each of the delay lines. Applying such a cascade is sketched below.
  • MPEG-I Audio Phase 2 will normatively standardize the bitstream and the renderer processing. There will also be an encoder reference implementation, but it can be modified later as long as the output bitstream follows the normative specification. This allows improving codec quality with novel encoder implementations even after the standard has been finalized.
  • the portions going to different parts of the MPEG-I standard can be:
  • the encoder reference implementation will contain:
        o deriving the reverberator parameters or compact reverberation parameters for each of the acoustic environments based on their RT60 and DDR;
        o obtaining scene parameters from the encoder input and writing them into the bitstream;
        o writing a bitstream description containing the reverberator or compact reverberation parameters and scene parameters.
  • the normative bitstream shall contain reverberator or compact reverberation parameters described using the syntax described here.
  • the bitstream shall be streamed to end-user devices or made available for download or stored.
  • the normative renderer shall decode the bitstream to obtain the scene and reverberation parameters and perform the compact reverberation parameter decoding and mapping to reverberator parameters as described in the embodiments herein. Moreover, the renderer is configured to take care of reverberation rendering.
  • the complete normative renderer will also obtain other parameters from the bitstream related to room acoustics and sound source properties, and use them to render the direct sound, early reflection, diffraction, sound source spatial extent or width, and other acoustic effects in addition to diffuse late reverberation.
  • the invention presented here focuses on the rendering of the diffuse late reverberation part and in particular how to enable bitrate efficient coding of compact reverberation parameters.
  • the device may be any suitable electronics device or apparatus.
  • the device 2000 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
  • the device may for example be configured to implement the encoder or the renderer or any functional block as described above.
  • the device 2000 comprises at least one processor or central processing unit 2007.
  • the processor 2007 can be configured to execute various program codes such as the methods described herein.
  • the device 2000 comprises a memory 2011.
  • the at least one processor 2007 is coupled to the memory 2011.
  • the memory 2011 can be any suitable storage means.
  • the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007.
  • the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.
  • the device 2000 comprises a user interface 2005.
  • the user interface 2005 can be coupled in some embodiments to the processor 2007.
  • the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005.
  • the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad.
  • the user interface 2005 can enable the user to obtain information from the device 2000.
  • the user interface 2005 may comprise a display configured to display information from the device 2000 to the user.
  • the user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000.
  • the user interface 2005 may be the user interface for communicating.
  • the device 2000 comprises an input/output port 2009.
  • the input/output port 2009 in some embodiments comprises a transceiver.
  • the transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
  • the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
  • the transceiver can communicate with further apparatus by any suitable known communications protocol.
  • the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IRDA).
  • the input/output port 2009 may be configured to receive the signals.
  • the device 2000 may be employed as at least part of the renderer.
  • the input/output port 2009 may be coupled to headphones (which may be head-tracked or non-tracked headphones) or similar.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and its data variants, and CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.


Abstract

An apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising means configured to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.

Description

SPATIAL RENDERING OF REVERBERATION
Field
The present application relates to apparatus and methods for generating and employment of spatial rendering of reverberation, but not exclusively for spatial rendering of reverberation in augmented reality and/or virtual reality apparatus.
Background
Reverberation refers to the persistence of sound in a space after the actual sound source has stopped. Different spaces are characterized by different reverberation characteristics. For conveying a spatial impression of an environment, reproducing reverberation perceptually accurately is important. Room acoustics are often modelled with an individually synthesized early reflection portion and a statistical model for the diffuse late reverberation. Figure 1 depicts an example of a synthesized room impulse response where the direct sound 101 is followed by discrete early reflections 103 which have a direction of arrival (DOA) and diffuse late reverberation 105 which can be synthesized without any specific direction of arrival. The delay d1(t) 102 in Figure 1 can be seen to denote the direct sound arrival delay from the source to the listener and the delay d2(t) 104 can denote the delay from the source to the listener for one of the early reflections (in this case the first arriving reflection).
One method of reproducing reverberation is to utilize a set of N loudspeakers (or virtual loudspeakers reproduced binaurally using a set of head-related transfer functions (HRTF)). The loudspeakers are positioned around the listener somewhat evenly. Mutually incoherent reverberant signals are reproduced from these loudspeakers, producing a perception of surrounding diffuse reverberation.
The reverberation produced by the different loudspeakers has to be mutually incoherent. In a simple case the reverberations can be produced using the different channels of the same reverberator, where the output channels are uncorrelated but otherwise share the same acoustic characteristics such as RT60 time and level (specifically, the diffuse-to-direct ratio or reverberant-to-direct ratio). Such uncorrelated outputs sharing the same acoustic characteristics can be obtained, for example, from the output taps of a Feedback-Delay-Network (FDN) reverberator with suitable tuning of the delay line lengths, or from a reverberator based on using decaying uncorrelated noise sequences by using a different uncorrelated noise sequence in each channel. In this case, the different reverberant signals effectively have the same features, and the reverberation is typically perceived to be similar to all directions.
Reverberation spectrum or level can be controlled using the diffuse-to-direct ratio (DDR), which describes the ratio of the energy (or level) of reverberant sound energy to the direct sound energy (or the total emitted energy of a sound source).
Summary
There is provided according to a first aspect an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising means configured to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
The at least one frequency band data may be organised as octave bands.
The at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
The means configured to generate the bitstream may be configured to generate the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
The apparatus may be further configured to obtain a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter may be associated with the virtual scene.
The at least one reverberation parameter may be a frequency dependent reverberation parameter.
The resources may be one of: encoded bitrate; encoded bits; and channel capacity.
The means configured to select, based on the comparison, the one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may be configured to: select the encoded at least one reverberation parameter in a high bitrate mode; and select the encoded at least one frequency band data in a low bitrate mode.
According to a second aspect there is provided an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising means configured to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
The bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein the means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part based on the indicator.
The means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part based on the indicator and configured to: determine the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determine the bitstream comprises the at least one frequency band data in a low bitrate mode.
The means configured to obtain reverberator parameters from the decoded reverberation parameters may be configured to determine the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
The means configured to initialize at least one reverberator based on the reverberator parameters may be configured to initialize the at least one reverberator using the at least one reverberator parameter independent of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
According to a third aspect there is provided a method for an apparatus for assisting spatial rendering in at least one acoustic environment, the method comprising: obtaining at least one reverberation parameter; converting the obtained at least one reverberation parameter into at least one frequency band data; encoding the at least one frequency band data; encoding the at least one reverberation parameter; comparing resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generating a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
The at least one frequency band data may be organised as octave bands.
The at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
Generating the bitstream may comprise generating the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
The method may further comprise obtaining a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter is associated with the virtual scene.
The at least one reverberation parameter may be a frequency dependent reverberation parameter.
The resources may be one of: encoded bitrate; encoded bits; and channel capacity.
Generating the bitstream comprising the reverberation parameter part comprising the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may comprise: selecting the encoded at least one reverberation parameter in a high bitrate mode; and selecting the encoded at least one frequency band data in a low bitrate mode.
According to a fourth aspect there is provided a method for an apparatus for assisting spatial rendering in at least one acoustic environment, the method comprising: obtaining a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decoding the reverberation parameter part to generate decoded reverberation parameters; obtaining reverberator parameters from the decoded reverberation parameters; initializing at least one reverberator based on the reverberator parameters; obtaining at least one input audio signal associated with the at least one acoustic environment; and generating an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
The bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part based on the indicator.
Obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part based on the indicator, wherein determining the reverberation parameter part based on the indicator may comprise: determining the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determining the bitstream comprises the at least one frequency band data in a low bitrate mode.
Obtaining reverberator parameters from the decoded reverberation parameters may comprise determining the reverberation parameter part comprising an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
Initializing at least one reverberator based on the reverberator parameters may comprise initializing the at least one reverberator using the at least one reverberator parameter independent of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
According to a fifth aspect there is provided an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
The at least one frequency band data may be organised as octave bands.
The at least one frequency band data may further comprise: an index identifying a centre band frequency range; and a number of bands.
The apparatus caused to generate the bitstream may be caused to generate the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
The apparatus may be further caused to obtain a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter may be associated with the virtual scene.
The at least one reverberation parameter may be a frequency dependent reverberation parameter.
The resources may be one of: encoded bitrate; encoded bits; and channel capacity.
The apparatus caused to select, based on the comparison, the one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter may be caused to: select the encoded at least one reverberation parameter in a high bitrate mode; and select the encoded at least one frequency band data in a low bitrate mode.
According to a sixth aspect there is provided an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
The bitstream may further comprise at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein the apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part based on the indicator.
The apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part based on the indicator and caused to: determine the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determine the bitstream comprises the at least one frequency band data in a low bitrate mode.
The apparatus caused to obtain reverberator parameters from the decoded reverberation parameters may be caused to determine the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
The apparatus caused to initialize at least one reverberator based on the reverberator parameters may be caused to initialize the at least one reverberator using the at least one reverberator parameter independent of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
According to a seventh aspect there is provided an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising: obtaining circuitry configured to obtain at least one reverberation parameter; converting circuitry configured to convert the obtained at least one reverberation parameter into at least one frequency band data; encoding circuitry configured to encode the at least one frequency band data; encoding circuitry configured to encode the at least one reverberation parameter; comparing circuitry configured to compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generating circuitry configured to generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
According to an eighth aspect there is provided an apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising: obtaining circuitry configured to obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decoding circuitry configured to decode the reverberation parameter part to generate decoded reverberation parameters; obtaining circuitry configured to obtain reverberator parameters from the decoded reverberation parameters; initializing circuitry configured to initialize at least one reverberator based on the reverberator parameters; obtaining circuitry configured to obtain at least one input audio signal associated with the at least one acoustic environment; and generating circuitry configured to generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
According to a ninth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
According to a tenth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
According to an eleventh aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
According to a twelfth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
According to a thirteenth aspect there is provided an apparatus, for assisting spatial rendering in at least one acoustic environment, comprising: means for obtaining at least one reverberation parameter; means for converting the obtained at least one reverberation parameter into at least one frequency band data; means for encoding the at least one frequency band data; means for encoding the at least one reverberation parameter; means for comparing resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; means for generating a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
According to a fourteenth aspect there is provided an apparatus, for assisting spatial rendering in at least one acoustic environment, comprising: means for obtaining a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; means for decoding the reverberation parameter part to generate decoded reverberation parameters; means for obtaining reverberator parameters from the decoded reverberation parameters; means for initializing at least one reverberator based on the reverberator parameters; means for obtaining at least one input audio signal associated with the at least one acoustic environment; and means for generating an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
According to a fifteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
According to a sixteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus, for assisting spatial rendering in at least one acoustic environment, to perform at least the following: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.

An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
Summary of the Figures
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows a model of room acoustics and the room impulse response;
Figure 2 shows an example environment within which embodiments can be implemented showing an audio scene with an audio portal or acoustic coupling;
Figure 3 shows schematically an example apparatus within which some embodiments may be implemented;
Figure 4 shows a flow diagram of the operation of the example apparatus as shown in Figure 3;
Figure 5 shows schematically an example reverberator controller as shown in Figure 3 according to some embodiments;
Figure 6 shows a flow diagram of the operation of the example reverberator controller as shown in Figure 5;
Figure 7 shows schematically an example reverberator output signals spatialization controller as shown in Figure 3 according to some embodiments;
Figure 8 shows a flow diagram of the operation of the example reverberator output signals spatialization controller as shown in Figure 7;
Figure 9 shows schematically an example reverberator output signals spatializer as shown in Figure 3 according to some embodiments;
Figure 10 shows a flow diagram of the operation of the example reverberator output signals spatializer as shown in Figure 9;
Figure 11 shows schematically an example FDN reverberator as shown in Figure 3 according to some embodiments;
Figure 12 shows schematically an example bitstream generator according to some embodiments;
Figure 13 shows a flow diagram of the operation of the example feedback filter designer as shown in Figure 12;
Figure 14 shows schematically an example apparatus with transmission and/or storage within which some embodiments can be implemented; and
Figure 15 shows an example device suitable for implementing the apparatus shown in previous figures.
Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for parameterizing and rendering audio scenes with reverberation. Thus for example the suitable apparatus and methods can be implemented as part of spatial audio rendering (also known as spatial rendering).
As discussed above reverberation can be rendered using, e.g., a Feedback-Delay-Network (FDN) reverberator with a suitable tuning of delay line lengths. An FDN allows the reverberation times (RT60) and the energies of different frequency bands to be controlled individually. Thus, it can be used to render the reverberation based on the characteristics of the room or modelled space. The reverberation times and the energies of the different frequencies are affected by the frequency-dependent absorption characteristics of the room.
As described above the reverberation spectrum or level can be controlled using a diffuse-to-direct ratio, which describes the ratio of the energy (or level) of reverberant sound energy to the direct sound energy (or the total emitted energy of a sound source). In ISO/IEC JTC1/SC29/WG6 N00054 MPEG-I Immersive Audio Encoder Input Format, the input to the encoder is provided as a DDR value which indicates the ratio of the diffuse (reverberant) sound energy to the total emitted energy of a sound source. Another well-known measure is the RDR, which refers to the reverberant-to-direct ratio and which can be measured from an impulse response. The relation between these two, described in ISO/IEC JTC1/SC29/WG6 N0083 MPEG-I Immersive Audio CfP Supplemental Information, Recommendations and Clarifications, Version 1, is that
10*log10(DDR) = 10*log10(RDR) - 41 dB.
Referring to Figure 1, the RDR can be calculated by
- summing the squares of the sample values of the diffuse late reverberation portion 105
- summing the squares of the sample values of the direct sound portion 101
- calculating the ratio of these two sums to give the RDR.
The logarithmic RDR can be obtained as 10*log10(RDR).
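By way of a worked illustration of the above calculation, the following is a minimal Python sketch; the boundary indices direct_end and late_start are assumed, illustrative names for the portion limits, not terms from the specification:

```python
import numpy as np

def measure_rdr(ir, direct_end, late_start):
    # energy of the direct sound portion (portion 101 in Figure 1)
    direct_energy = np.sum(ir[:direct_end] ** 2)
    # energy of the diffuse late reverberation portion (portion 105 in Figure 1)
    late_energy = np.sum(ir[late_start:] ** 2)
    rdr = late_energy / direct_energy        # linear RDR
    log_rdr = 10.0 * np.log10(rdr)           # logarithmic RDR in dB
    log_ddr = log_rdr - 41.0                 # 10*log10(DDR) = 10*log10(RDR) - 41 dB
    return rdr, log_rdr, log_ddr
```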
In a virtual environment for virtual reality (VR) or a real physical environment for augmented reality (AR) there can be several acoustic environments, each with their own reverberation parameters which can be different in different acoustic environments.
An example of such an environment is shown in Figure 2. In this example there is shown the audio scene comprising a first acoustic environment AE1 203, a second acoustic environment AE2 205 and outdoor 201. There is shown an acoustic coupling AC1 207 between the first acoustic environment AE1 203 and the second acoustic environment AE2 205.
In this example the sound or audio sources 210 are located within the second acoustic environment AE2 205. In this example the audio sources 210 comprise a first audio source, a drummer, S1 210-1 and a second audio source, a guitarist, S2 210-2. The listener 202 is further shown moving through the audio scene and is shown in the first acoustic environment AE1 203 at position P1 200-1, in the second acoustic environment AE2 205 at position P2 200-2 and outdoor 201 at position P3 200-3.
There are known methods where the encoder converts reverberation parameters of the acoustic environment into reverberator parameters for the FDN reverberator and then creates a bitstream of the optimized reverberator parameters. While the benefit of this approach is that encoder optimization can be used to provide optimal reverberator parameters based on the reverberation characteristics of the virtual environment, the disadvantage is that the bitstream size is not as small as possible. Furthermore there are known methods where high-perceptual-quality reverberation can be synthesized for physical environments in augmented reality, if reverberation parameters are obtained only at the renderer. However, these methods currently lack the possibility of obtaining reverberation parameters from the bitstream.
For some usage scenarios, such as ones where content is streamed or downloaded over the air and especially where the end user device is moving such as a mobile phone or a vehicle, it is desirable for the bitrate required for 6DoF reverberation rendering (or spatial rendering) for virtual environments to be as small as possible. This is to ensure fast content download speed, uninterrupted streaming, and/or fast playback startup.
Therefore, there is a need for apparatus and methods which can utilize a compact bitstream for reverberation parameters while still producing high quality reverberation. If the reverberation parameter bitstream is not compact, there can be usage scenarios which produce a poor user experience because of slow download speed, interrupted streaming, and/or slow playback startup. If the reverberation is not of high enough quality, then a poor user experience can occur because of suboptimal audio quality and poor immersion.
Control of activating/prioritizing reverberators is described in GB2200043.4, which specifically discusses a mechanism of prioritizing reverberators and activating only a subset of them based on the prioritization. GB2200335.4 furthermore describes a method to adjust reverberation level especially in augmented reality (AR) rendering. WO2021186107 describes late reverb modelling from acoustic environment information using FDNs and specifically describes designing a DDR filter to adjust the late reverb level based on input DDR data. GB2020673.6 describes a method and apparatus for fusion of a virtual scene description in a bitstream and a listener space description for 6DoF rendering, and specifically for late reverberation modelling for immersive audio scenes where the acoustic environment is a combination of a content creator specified virtual scene as well as listener-consumption-space influenced listening space parameters. Thus, this background describes a method for rendering an AR audio scene comprising virtual scene description acoustic parameters and real-world listening-space acoustic parameters. GB2101657.1 describes how late reverb rendering filter parameters are derived for a low-latency renderer application. GB2116093.2 discusses reproduction of diffuse reverberation, proposing a method that enables the reproduction of rotatable diffuse reverberation where the characteristics of the reverberation may be directionally dependent (i.e., having different reverberation characteristics in different directions), using a number of processing paths (at least 3, typically 6-20 paths) of (virtual) multichannel signals by: determining at least two panning gains based on a target direction and the positions of the (virtual) loudspeakers in a (virtual) loudspeaker set (e.g., using VBAP); obtaining mutually incoherent reverberant signals for each of the determined gains (e.g., using outputs of two reverberators tuned to produce mutually incoherent outputs, or using decorrelators); applying the determined gains to the corresponding obtained reverberant signals in order to obtain reverberant multichannel signals; combining the reverberant multichannel signals from the different processing paths; and reproducing the combined reverberant multichannel signals from the corresponding (virtual) loudspeakers. GB2115533.8 discusses a method for seamless listener transition between acoustic environments.
The concept as discussed in the embodiments herein relates to reproduction of late reverberation in 6DoF audio rendering systems based on acoustic scene reverberation parameters where the solution is configured to transmit compact reverberation parameters and to convert them to reverberator parameters in a renderer to achieve low reverberation parameter bitstream size for low storage and network bandwidth requirements while still maintaining spatial rendering with high perceptual quality to achieve an immersive audio experience.
Thus in some embodiments the apparatus and methods relate to reproduction of late reverberation in 6DoF audio rendering systems based on acoustic scene reverberation parameters where the solution is configured to transmit compact reverberation parameters and to convert them to reverberator parameters in a renderer to achieve low reverberation parameter bitstream size suitable for low storage requirements and low network bandwidth requirements while still maintaining spatial rendering with high perceptual quality to achieve an immersive audio experience.
This can be achieved by apparatus and methods configured to implement the following operations (within an encoder): obtain frequency-dependent reverberation parameters associated with a virtual acoustic environment; convert the frequency-dependent reverberation parameters to frequency band data; encode the frequency band data into encoded frequency band data; encode the frequency-dependent reverberation parameters into encoded reverberation parameters; compare the bitrate required to transmit 1) the encoded frequency band data or 2) the encoded reverberation parameters; encode into the bitstream compact reverberation parameters which are either 1) the encoded frequency band data or 2) the encoded reverberation parameters based on the comparison, and associate them with the acoustic environment identifier and its dimensions.
In the following examples the frequency bands are shown as octave bands. In some embodiments the division of frequency bands can be any suitable division. For example, there could also be several such known, alternative frequency band divisions, identified based on the number of values; an example is spacing for 4, 6, 8, and 10 frequency bands. In some embodiments there is a dictionary of known frequency band centre frequencies and numbers of bands, and, instead of transmitting the centre frequencies, the method is configured to send the index of the known centre-frequency and number-of-bands combination.
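As a non-normative sketch of such a dictionary, the following Python fragment signals a known centre-frequency set by its index; the table entries here are invented examples rather than standardized band divisions:

```python
# Hypothetical dictionary of known band divisions, keyed by index.
BAND_CONFIGS = {
    0: [125.0, 500.0, 2000.0, 8000.0],                               # 4 bands
    1: [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0],                # 6 bands
    2: [63.0, 125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0],  # 8 bands
}

def encode_band_config(centre_frequencies):
    """Return the dictionary index for a known centre-frequency set."""
    for index, known in BAND_CONFIGS.items():
        if known == list(centre_frequencies):
            return index  # transmit this index instead of the frequencies
    raise ValueError("unknown band division; transmit centre frequencies explicitly")
```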
Furthermore in some embodiments there is described apparatus and methods configured to implement the following operations (within a renderer): obtain from a bitstream the compact reverberation parameters which are either encoded frequency band data or encoded reverberation parameters, an identifier and dimensions of an acoustic environment; decode the received compact reverberation parameters; convert the decoded compact reverberation parameters into reverberator parameters;
initialize a reverberator to render reverberation using the reverberator parameters; and receive at least one input signal associated with the virtual acoustic environment and render an immersive output audio signal using the reverberator.
In some embodiments the parameters are encoded into a reverberation bitstream payload.
Furthermore in some embodiments the parameters are not explicitly encoded into a reverberation bitstream payload but the reverberation bitstream payload contains a bit implying that reverberation parameters have been encoded into a scene payload.
In some embodiments the initialization of the reverberator using the parameters is implemented in the same manner as when a reverberator is initialized for rendering reverberation for an augmented reality (physical) scene and when reverberation parameters for the augmented reality scene are received directly in the renderer.
Furthermore in some embodiments, the reverberator used for rendering reverberation using the compact reverberation parameters can also be used for rendering reverberation for virtual acoustic environments when reverberator parameters are received in the bitstream.
In some embodiments the apparatus and methods can be configured to encode the compact reverberation parameters into bitstream when operating in a low bitrate mode.
In some embodiments the reverberator used for rendering reverberation using the compact reverberation parameters differs from the reverberator used for rendering reverberation for augmented reality physical scenes or virtual reality scenes when the solution is configured to not operate in a low bitrate mode.
Furthermore in some embodiments an indication in the reverb payload bitstream can be used to indicate to the renderer to use reverberation parameters from the audio scene description in the bitstream.
In some embodiments, the indication in the reverb payload bitstream to utilize reverberation parameters signals the expectation to perform reverberation auralization during rendering.
It is understood that ISO/IEC 23090-4 MPEG-I Audio Phase 2 will normatively standardize the bitstream and the renderer processing. There will also be an encoder reference implementation, but it can be modified later on as long as the output bitstream follows the normative specification. This allows improving the codec quality also after the standard has been finalized with novel encoder implementations.
With respect to the embodiments described herein, the portions going to different parts of the MPEG-I standard are as follows: the normative bitstream can contain encoded octave band data or encoded frequency-dependent reverberation data for each acoustic environment. The encoded frequency-dependent reverberation data can be under the reverb payload or under the scene payload; and the normative renderer can decode the bitstream to obtain scene and compact reverberation parameters, decode and map these to reverberator parameters, and render the reverberated signal using the reverberator.
With respect to Figure 3 is shown an example system of apparatus suitable for implementing some embodiments. The apparatus can in some embodiments be part of a spatial rendering system as described later on.
The input to the system of apparatus is scene and reverberator parameters 300, listener pose parameters 302 and audio signal 306. The system of apparatus generates as an output, a reverberated signal 314 (e.g. binauralized with head-related-transfer-function (HRTF) filtering for reproduction to headphones, or panned with Vector-Base Amplitude Panning (VBAP) for reproduction to loudspeakers).
In some embodiments the apparatus comprises a reverberator controller 301. The reverberator controller 301 is configured to obtain or receive the scene and reverberation parameters 300. In this example implementation the scene and reverberation parameters are in the form of a bitstream which contains enclosing room geometry and parameters describing the RT60 times and reverberant-to-direct ratio (RDR) for the enclosure (or Acoustic Environment).
The reverberator controller 301 is configured to obtain the bitstream, convert the encoded reverberation parameters into parameters for a reverberator (reverberator parameters), and pass the reverberator parameters to initialize at least one FDN reverberator to reproduce reverberation according to the reverberator parameters. The reverberator parameters 304 can then be passed to the reverberator(s) 305.
In some embodiments the apparatus comprises a reverberator or reverberators 305. The reverberator(s) are configured to receive the reverberator parameters 304 and the audio signal s_in(t) (where t is time) 306. In some embodiments the reverberator(s) 305 are configured to reverberate the audio signal 306 based on the reverberator parameters 304.
The details of the reverberation processing are presented in further detail later.
The reverberators 305 in some embodiments output the resulting reverberator output signals s_rev_r(j, t) 310 (where j is the output audio channel index and r the reverberator index). There are several reverberators, each of which produces several output audio signals. These reverberator output signals 310 are input into a reverberator output signals spatializer 307.
Furthermore the apparatus comprises a reverberator output signals spatialization controller 303. The reverberator output signals spatialization controller 303 is configured to receive the scene and reverberation parameters 300 and the listener pose parameters 302 and generate reverberator output channel positions 312. The reverberator output channel positions 312 in some embodiments indicate cartesian coordinates which are to be used when rendering each of the signals in s_rev_r(j, t). In some other embodiments other representations (or other coordinate systems) such as polar coordinates can be used. The output channel positions can be virtual loudspeaker positions (or positions in a space which are unrelated to an actual or physical loudspeaker but can be used to generate a suitable spatial audio signal format such as binaural audio signals), or actual loudspeaker positions (for example in multi-speaker systems such as 5.1, 7.2 channel systems).
In some embodiments the apparatus comprises a reverberator output signals spatializer 307. The reverberator output signals spatializer 307 is configured to obtain the reverberator output signals 310 and the reverberator output channel positions 312 and based on these produces an output signal suitable for reproduction via headphones or via loudspeakers. In some embodiments the reverberator output signals spatializer 307 is configured to render each reverberator output into a desired output format, such as binaural, and then sum the signals to produce the output reverberated signal 314. For binaural reproduction the reverberator output signals spatializer 307 can further use HRTF filtering to render the reverberator output signals 310 in their desired positions indicated by the reverberator output channel positions 312.
This reverberation in the reverberated signals 314 is therefore based on the scene and reverberator parameters 300 as was desired and further considers listener pose parameters 302.
With respect to Figure 4 is shown a flow diagram showing the operations of example apparatus shown in Figure 3 according to some embodiments.
Thus, for example, the method may comprise obtaining scene and reverberator parameters and obtaining listener pose parameters, as shown in Figure 4 by step 401.
Furthermore the audio signals are obtained, as shown in Figure 4 by step 403. Then the reverberator controls are determined based on the obtained scene and reverberator parameters and listener pose parameters, as shown in Figure 4 by step 405.
Then the reverberators controlled by the reverberator controls are applied to the audio signals as shown in Figure 4 by step 407.
Furthermore the reverberator output signal spatialization controls are determined based on the obtaining scene and reverberator parameters and listener pose parameters as shown in Figure 4 by step 409.
The reverberator spatialization based on the reverberator output signal spatialization controls can then be applied to the reverberated audio signals from the reverberators to generate output reverberated audio signals as shown in Figure 4 by step 411.
Then the output reverberated audio signals are output as shown in Figure 4 by step 413.
With respect to Figure 5 there is shown in further detail an example reverberator controller 301. As discussed above the reverberator controller 301 is configured to provide reverberator parameters to the reverberator(s). The reverberator controller 301 is configured to receive encoded reverberation parameters which describe the reverberation characteristics in each acoustic environment (each acoustic environment contains at least one set of reverberation parameters). The reverberator controller can be configured to decode the obtained reverberation parameters and convert the decoded parameters into concrete reverberator parameters. The input to the apparatus provides the desired RT60 times at specified frequencies k, denoted RT60(k), and DDR values DDR(k). An alternative representation for the DDR is the RDR, or its logarithm logRDR(k). To be useful for reverberation, parameters for a concrete reverberator need to be obtained based on these values, so that the reverberator 305 can then be used for reproducing reverberation.
In some embodiments the reverberator controller 301 comprises a reverberator payload selector 501. The reverberator payload selector 501 is configured to determine how reverberation parameters for acoustic environments are represented. In some embodiments an indicator or flag from the bitstream provides the information to make the selection. Thus, for example, when the bit use_reverb_payload_metadata is not set (equal to 0), the reverberation parameters are encoded in the scene payload and the controller is configured to obtain the reverberation parameters from a scene payload section. In an example embodiment the reverberation parameters are encoded as frequency data when carried in the scene payload.
If use_reverb_payload_metadata is set (equal to 1), the reverberation parameters are encoded either as octave band data (without the octave centre frequencies) or as frequency-dependent data with combinations of frequency and control value.
The payload selector 501 is then configured to control the decoder 505.
Furthermore in some embodiments the reverberator controller 301 comprises a reverberator method type selector 503. The reverberator method type selector 503 is configured to determine the method type for the reverberator. For example the parameters of the reverberator can be adjusted so that they produce reverberation having characteristics matching the desired RT60(k) and DDR(k) for the acoustic environment to which this FDN reverberator is to be associated. For example whether the acoustic environment/scene is a virtual reality (VR) scene or augmented reality (AR) scene.
For example in some embodiments the reverberator method type selector is configured to control the decoding and adjustment of the parameters based on an indicator or flag such that when reverb_method_type == 1 the reverberator parameters are obtained directly from the bitstream, and when reverb_method_type == 2 or reverb_method_type == 3 the reverberator parameters are adjusted or optimized based on the reverberation parameters obtained from the bitstream. In an example embodiment reverb_method_type == 1 and reverb_method_type == 2 and reverb_method_type == 3 are applicable for VR scenes and the processing (optimizing of reverberator parameters) occurring as a result of reverb_method_type == 2 and reverb_method_type == 3 is similar to when reverberation parameters are obtained for an augmented reality (AR) scene.
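A minimal, hypothetical sketch of this selection logic is given below; the bitstream reader object and its method names are assumptions made for illustration only:

```python
def select_reverberation_source(bitstream):
    # When the flag is not set, reverberation parameters sit in the scene payload.
    if bitstream.read_bit("use_reverb_payload_metadata") == 0:
        return "scene_payload_frequency_data"
    method = bitstream.read_uint("reverb_method_type")
    if method == 1:
        return "reverberator_parameters_from_bitstream"  # encoder-optimized parameters
    if method == 2:
        return "reverb_payload_frequency_data"           # optimized in the renderer
    if method == 3:
        return "reverb_payload_octave_band_data"         # optimized in the renderer
    raise ValueError("unknown reverb_method_type")
```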
In some embodiments the reverberator controller 301 comprises a decoder 505, the decoder 505 is controlled based on the outputs of the reverberator payload selector 501 and the reverberator method type selector 503.
Thus when reverberation parameters are encoded as frequency data, the decoder is configured to revert the possible encoding. In an example embodiment the decoder is configured to implement Huffman decoding and to revert the differential encoding to obtain the frequency dependent RT60(k) and logRDR(k) data. In some embodiments the decoder 505, when the reverberation parameters are encoded as octave band data, is configured to decode the parameters by reverting the Huffman coding and differential encoding applied on the octave band values. In this case no band centre frequencies are transmitted as they are known by the renderer. Thus, decoded values directly correspond to RT60(b) and logRDR(b) where b is the octave band index (in other words the 'frequency data' mapping described hereafter is not employed).
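For illustration, if the first band value is transmitted absolutely and the remaining values as differences (an assumption about the differential scheme), reverting the differential encoding after Huffman decoding reduces to a cumulative sum; the entropy decoding itself is omitted from this sketch:

```python
import numpy as np

def revert_differential(huffman_decoded_values):
    # cumulative sum restores absolute band values, e.g. RT60(b) or logRDR(b)
    return np.cumsum(np.asarray(huffman_decoded_values, dtype=float))
```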
In some embodiments the reverberator controller 301 further comprises a mapper 507 configured to map the frequency dependent data into octave band data RT60(b) and logRDR(b) where b is the octave band index. The values DDR(k) are mapped to a set of frequency bands b, which can be, e.g., either octave or third octave bands. Mapping of input DDR values to frequency bands b is done by obtaining for each band b the value from the input DDR response DDR(k) at the closest frequency k to the center frequency of band b. Other choices such as Bark bands or frequency bands with linearly-spaced center frequencies are also possible. This results in the frequency mapped DDR values DDR(b) and RT60 values RT60(b).
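A sketch of this nearest-frequency mapping follows; the octave centre frequencies in the usage comment are assumed example values:

```python
import numpy as np

def map_to_bands(freqs_k, values_k, band_centres_b):
    """For each band b, take the input value at the frequency k closest to the band centre."""
    freqs_k = np.asarray(freqs_k, dtype=float)
    values_k = np.asarray(values_k, dtype=float)
    mapped = [values_k[int(np.argmin(np.abs(freqs_k - fc)))] for fc in band_centres_b]
    return np.asarray(mapped)

# e.g. RT60_b = map_to_bands(k, RT60_k, octave_centres), with assumed octave
# spacing such as octave_centres = [31.25 * 2**i for i in range(10)]
```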
In some embodiments the reverberator controller 301 comprises a filter parameter determiner 509 configured to convert the bandwise RT60(b) and logRDR(b) into reverberator parameters. The reverberator parameters can, in some embodiments, comprise the coefficients of each attenuation filter GEQ_d, feedback matrix coefficients A, and lengths m_d for D delay lines. In this invention, each attenuation filter GEQ_d is a graphic EQ filter using M biquad IIR band filters.
With respect to Figure 6 is shown the operations of the reverberator controller 301 as shown in Figure 5.
The first operation is one of obtaining the bitstream containing scene and reverberation parameters as shown in Figure 6 by step 601.
Then a selection is made to determine whether to obtain the encoded reverberation parameters from the scene payload or from the reverb payload (or in other words whether they are to be obtained from the scene metadata or reverb metadata). This is shown in Figure 6 by step 603.
Where the selection is made to obtain the encoded reverberation parameters from the scene payload (use_reverb_payload_metadata == 0) then the next operation is one of obtaining the encoded reverberation parameters from the scene payload as shown in Figure 6 by step 611. Then having obtained the encoded reverberation parameters they can be decoded to obtain frequency data as shown in Figure 6 by step 613.
The obtained frequency data can then be mapped to obtain octave band data as shown in Figure 6 by step 615.
The octave band data can then be mapped to control gain data as shown in Figure 6 by step 617.
The control gain data can then be used to obtain parameters for at least one graphic equalization filter for the reverberator as shown in Figure 6 by step 619.
Where the selection is made to obtain the encoded reverberation parameters from the reverb payload metadata (use_reverb_payload_metadata == 1) then the next operation is one of determining the reverberation method type as shown in Figure 6 by step 605.
In this example, where the method indicates that the parameters are derived/determined in the renderer and from encoded frequency data (reverb_method_type==2) then the encoded reverberation parameters can be obtained from the reverb payload as shown in Figure 6 by step 607. Then the method can pass to step 613 of decoding the encoded reverberation parameters to obtain frequency data as described above.
Furthermore where the method indicates that the parameters are derived/determined in the renderer (reverb_method_type==3) but from encoded octave band data then the encoded reverberation parameters can be obtained by decoding the encoded octave band data to obtain the octave band data as shown in Figure 6 by step 609. Then the method can pass to step 617 of mapping the octave band data to generate control gain data as described above.
Furthermore where the method indicates that the adjusted reverberator parameters are included in the reverberation payload (and have been derived/determined in the encoder) then the method can pass directly to the operation of obtaining the reverberator parameters as shown in Figure 6 by step 621.
The generation of reverberator parameters is discussed herein in further detail and with respect to an example reverberator 305 as shown schematically in Figure 11 as an FDN (Feedback Delay Network) configuration. The reverberator 305 is enabled or configured to produce reverberation whose characteristics match the room parameters. There may be several such reverberators, each parameterized based on the reverberation characteristics of an acoustic environment. An example reverberator implementation comprises a feedback delay network (FDN) reverberator and a DDR control filter which enables reproducing reverberation having desired frequency dependent RT60 times and levels. The room (or reverberation) parameters are used to adjust the FDN reverberator parameters such that it produces the desired RT60 times and levels. An example of a level parameter can be the diffuse-to-direct ratio (DDR) (or the diffuse-to-total energy ratio as used in MPEG-I). The output from the FDN reverberator is the reverberated audio signals, which for binaural headphone reproduction are then reproduced as two output signals and for loudspeaker output typically as more than two output audio signals. Reproducing several outputs such as 15 FDN delay line outputs to binaural output can be done, for example, via HRTF filtering.
Figure 11 shows an example FDN reverberator in further detail and which can be used to produce D uncorrelated output audio signals. In this example each output signal can be rendered at a certain spatial position around the listener for an enveloping reverb perception.
The example FDN reverberator is configured such that the reverberation parameters are processed to generate coefficients GEQ_d (GEQ_1, GEQ_2, ..., GEQ_D) of each attenuation filter 1161, feedback matrix 1157 coefficients A, lengths m_d (m_1, m_2, ..., m_D) for D delay lines 1159 and DDR energy ratio control filter 1153 coefficients GEQ_DDR. The example FDN reverberator 305 thus shows a D-channel output, by providing the output from each FDN delay line as a separate output.
In some embodiments each attenuation filter GEQ_d 1161 is implemented as a graphic EQ filter using M biquad IIR band filters. With octave bands M=10; thus, the parameters of each graphic EQ comprise the feedforward and feedback coefficients for the biquad IIR filters, the gains for the biquad band filters, and the overall gain.
The reverberator uses a network of delays 1159 and feedback elements (shown as attenuation filters 1161, feedback matrix 1157 and combiners 1155 and output gain 1163) to generate a very dense impulse response for the late part. Input samples 1151 are input to the reverberator to produce the reverberation audio signal component which can then be output.
The FDN reverberator comprises multiple recirculating delay lines. The unitary matrix A 1157 is used to control the recirculation in the network. Attenuation filters 1161, which may be implemented in some embodiments as graphic EQ filters implemented as cascades of second-order-section IIR filters, can facilitate controlling the energy decay rate at different frequencies. The filters 1161 are designed such that they attenuate the desired amount in decibels at each pulse pass through the delay line and such that the desired RT60 time is obtained.
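The recirculating structure can be sketched as follows. This is a deliberately simplified illustration: scalar broadband gains stand in for the per-line attenuation filters GEQ_d, and any unitary matrix (for example a Householder reflection, or a Hadamard matrix scaled to be orthogonal) may be substituted for the feedback matrix A:

```python
import numpy as np

def fdn_process(x, delays, gains, A):
    """Simplified FDN: D recirculating delay lines, one output per line."""
    D = len(delays)
    buffers = [np.zeros(m) for m in delays]   # circular buffers for the delay lines
    write = [0] * D                           # read/write position per line
    y = np.zeros((D, len(x)))
    for t in range(len(x)):
        # read the delayed sample from each line, attenuated once per pass
        outs = np.array([gains[d] * buffers[d][write[d]] for d in range(D)])
        y[:, t] = outs
        feedback = A @ outs                   # recirculate through the unitary matrix
        for d in range(D):
            buffers[d][write[d]] = x[t] + feedback[d]
            write[d] = (write[d] + 1) % delays[d]
    return y
```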
Thus with reverb_method_type == 1, the adjusted reverberator parameters are included in the scene and reverberation parameters. For the FDN reverberator the parameters contain the coefficients of each attenuation filter GEQ_d, feedback matrix coefficients A, and lengths m_d for D delay lines. Not all the parameters need to be adjusted/obtained; constant defined values can be utilized. For example the feedback matrix coefficients A can be tabulated and stored in the renderer or implemented in software and only some parameters adjusted based on the room parameters. In this invention, each attenuation filter GEQ_d is a graphic EQ filter using M biquad IIR band filters.
With octave bands M=10; thus, the parameters of each graphic EQ comprise the feedforward b and feedback a coefficients for 10 biquad IIR filters, the gains for the biquad band filters, and the overall gain.
The number of delay lines D can be adjusted depending on quality requirements and the desired tradeoff between reverberation quality and computational complexity. In an embodiment, an efficient implementation with D=15 delay lines is used. This makes it possible to define the feedback matrix coefficients A, as proposed by Rocchesso in Maximally Diffusive Yet Efficient Feedback Delay Networks for Artificial Reverberation, IEEE Signal Processing Letters, Vol. 4, No. 9, Sep. 1997, in terms of a Galois sequence facilitating efficient implementation.
A length m_d for the delay line d can be determined based on virtual room dimensions. Here, we use the dimensions of the enclosure. For example, a shoebox shaped room can be defined with dimensions xDim, yDim, zDim. When the method is executed in the apparatus for reverb_method_type == 1 the dimensions are obtained from the encoder input file. When reverb_method_type == 2 or reverb_method_type == 3 the dimensions are obtained from the encoder input file (by the encoder device), included into the scene payload of the bitstream, and obtained by the renderer from the scene payload. When the input to the renderer is an AR scene with a listening space description file the dimensions are obtained from the listening space description. If the room is not shaped as a shoebox (or cuboid) then a shoebox can be fit inside the room and the dimensions of the fitted shoebox can be utilized for obtaining the delay line lengths. Alternatively, the dimensions can be obtained as the three longest dimensions in the non-shoebox shaped room, or by another suitable method. Such dimensions can also be obtained from a mesh if the bounding box is provided as a mesh. The dimensions can further be converted to modified dimensions of a virtual room or enclosure having the same volume as the input room or enclosure. For example, the ratios 1, 1.3, and 1.9 can be used for the converted virtual room dimensions. When the method is executed in the renderer (reverb_method_type == 2 or reverb_method_type == 3) then the enclosure vertices are obtained from the bitstream and the dimensions can be calculated, along each of the axes x, y, z, by the difference of the maximum and minimum value of the vertices. Dimensions can be calculated the same way when the input is an AR scene to be rendered with a listening space description, with the difference that the enclosure vertices are obtained from the listening space description and not from the bitstream.
The delays can in some embodiments be set proportionally to standing wave resonance frequencies in the virtual room or physical room. The delay line lengths m_d can further be made mutually prime.
The attenuation filter coefficients in the delay lines can furthermore be adjusted so that a desired amount in decibels of attenuation happens at each signal recirculation through the delay line so that the desired RT60(k) time is obtained. This is done in a frequency specific manner to ensure the appropriate rate of decay of signal energy at specified frequencies k.
For a frequency k, the desired attenuation per signal sample is calculated as attenuationPerSample(k) = -60 / (samplingRate * RT60(k)). The attenuation in decibels for a delay line of length m_d is then attenuationDb(k) = m_d * attenuationPerSample(k).
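Combining the delay-line length derivation and the attenuation computation above, a hedged Python sketch is given below; rounding each length up to a distinct prime (here via sympy's nextprime) is one simple, non-normative way of making the lengths mutually prime:

```python
import numpy as np
from sympy import nextprime

def delay_lengths(dims_m, num_lines, fs=48000, c=343.0):
    """Delay-line lengths in samples from room dimensions, made mutually prime."""
    lengths = []
    for d in range(num_lines):
        dim = dims_m[d % len(dims_m)]          # cycle through the room dimensions
        m = int(round(2.0 * dim * fs / c))     # round-trip propagation delay in samples
        p = nextprime(m)
        while p in lengths:                    # keep lengths distinct; all prime
            p = nextprime(p)                   # implies mutually prime
        lengths.append(p)
    return sorted(lengths)

def attenuation_db(m_d, rt60_k, fs=48000):
    per_sample = -60.0 / (fs * rt60_k)         # attenuationPerSample(k) in dB
    return m_d * per_sample                    # attenuationDb(k) for a line of length m_d
```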
Furthermore when reverberator parameters are derived in the encoder (reverb_method_type==1), the attenuation filters are designed as cascade graphic equalizer filters, as described in V. Valimaki and J. Liski, "Accurate cascade graphic equalizer," IEEE Signal Process. Lett., vol. 24, no. 2, pp. 176-180, Feb. 2017, for each delay line. The design procedure outlined takes as input a set of command gains at octave bands. There are also methods for a similar graphic EQ structure which can support third octave bands, increasing the number of biquad filters to 31 and providing a better match for detailed target responses, such as indicated in Third-Octave and Bark Graphic-Equalizer Design with Symmetric Band Filters, https://www.mdpi.com/2076-3417/10/4/1222/pdf. When reverberator parameters are derived in the renderer (reverb_method_type==2 or reverb_method_type==3 or an AR scene), a neurally controlled graphic equalizer design such as described in Valimaki and Ramo, "Neurally Controlled Graphic Equalizer", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 27, No. 12, December 2019 can be used. Furthermore in some embodiments if the method designs a third octave graphic EQ then the method of Third-Octave and Bark Graphic-Equalizer Design with Symmetric Band Filters, https://www.mdpi.com/2076-3417/10/4/1222/pdf can be employed.
Reverberation ratio parameters can refer to the diffuse-to-total energy ratio (DDR) or reverberant-to-direct ratio (RDR) or other equivalent representation. The ratio parameters can be equivalently represented on a linear scale or logarithmic scale.
A filter is designed in this step such that, when the filter is applied to the input data of the FDN reverberator, the output reverberation is configured to have the desired energy ratio defined by the DDR(k). The input to the design procedure can in some embodiments be the DDR values DDR(k).
When receiving linear DDR values DDR(b), the values can be converted to linear RDR values as
RDR(b) = DDR(b) * 10^(41/10)
When receiving logarithmic RDR values logRDR(b), the values can be converted to linear RDR values as
RDR(b) = 10^(logRDR(b)/10)
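These two conversions can be expressed directly (a trivial sketch):

```python
def ddr_to_rdr(ddr_lin):
    # linear DDR -> linear RDR
    return ddr_lin * 10.0 ** (41.0 / 10.0)

def logrdr_to_rdr(log_rdr_db):
    # logarithmic RDR in dB -> linear RDR
    return 10.0 ** (log_rdr_db / 10.0)
```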
The GEQ_DDR filter matches the reverberator spectrum energy to the target spectrum energy. In order to do this, an estimate of the RDR of the reverberator output and of the target RDR is obtained. The RDR of the reverberator output can be obtained by rendering a unit impulse through the reverberator using the first reverberator parameters (that is, the parameters of the FDN without the GEQ_DDR filter whose parameters are being obtained), measuring the energy of the reverberator output and the energy of the unit impulse, and calculating the ratio of these energies.
In some embodiments a unit impulse input is generated where the first sample value is 1 and the length of the zero tail is long enough. In practice, the length of the zero tail can be adjusted to equal max(RT60(b)) plus the predelay t_predelay in samples. The monophonic output of the reverberator is of interest, so the outputs are summed over the delay lines to obtain the reverberator output s_rev(t) as a function of time t.
A long FFT (of length N_FFT) is calculated over s_rev(t) and its absolute value is obtained as
FFA(kk) = abs(FFT(s_rev(t))).
Here, kk are the FFT bin indices. The positive half spectral energy density is obtained as
S(kk) = 1/N_FFT * FFA(kk)^2, where the energy from the negative frequency indices is added into the corresponding positive frequency indices kk.
The energy of a unit impulse can be calculated or obtained analytically and can be denoted as Su(kk).
Band energies are calculated of both the positive half spectral energy density of the reverberator S(kk) and the positive half spectral energy density of the unit impulse Su(kk). Band energies can be calculated as
S(b) = sum from kk = b_low to b_high of S(kk), and correspondingly Su(b) = sum from kk = b_low to b_high of Su(kk),
where b_low and b_high are the lowest and highest bin index belonging to band b, respectively. The band bin indices can be obtained by comparing the frequencies of the bins to the lower and upper frequencies of each band.
The reproduced RDR_rev(b) of the reverberator output at the frequency band b is obtained as
RDR_rev(b) = S(b) / Su(b).
The target linear magnitude response for GEQ_DDR can be obtained as ddrFilterTargetResponse(b) = sqrt(RDR(b)) / sqrt(RDR_rev(b)), where RDR(b) is the linear target RDR value mapped to frequency band b.
ControlGain(b) = 20*log10(ddrFilterTargetResponse(b)) is input as the target response for the graphic equalizer design routine in Valimaki and Ramo, "Neurally Controlled Graphic Equalizer", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 27, No. 12, December 2019.
The DDR filter target response (control gains for the graphic EQ design routine) can also be obtained directly in the logarithmic domain as
ControlGain(b) = 10*log10(RDR(b)) - 10*log10(RDR_rev(b)).
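Putting the above steps together, a minimal sketch of the control-gain derivation follows. It assumes a reverberator callable returning one output signal per delay line (for example fdn_process from the earlier sketch) and band edges given in Hz; the special handling of the Nyquist bin when folding in negative-frequency energy is neglected for brevity:

```python
import numpy as np

def ddr_control_gains(reverberator, target_rdr_b, band_edges_hz,
                      rt60_max, predelay_s, fs=48000):
    n = int((rt60_max + predelay_s) * fs) + 1
    unit = np.zeros(n)
    unit[0] = 1.0                                   # unit impulse with a long zero tail
    s_rev = reverberator(unit).sum(axis=0)          # monophonic sum over delay lines
    nfft = len(s_rev)
    spec = np.abs(np.fft.rfft(s_rev)) ** 2 / nfft   # positive-half spectral energy density
    spec[1:] *= 2.0                                 # fold in negative-frequency energy
    su = np.full_like(spec, 1.0 / nfft)             # a unit impulse has a flat density
    su[1:] *= 2.0
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    gains_db = []
    for b, (lo, hi) in enumerate(band_edges_hz):
        mask = (freqs >= lo) & (freqs < hi)
        rdr_rev = spec[mask].sum() / su[mask].sum()            # RDR_rev(b)
        gains_db.append(10.0 * np.log10(target_rdr_b[b])
                        - 10.0 * np.log10(rdr_rev))            # ControlGain(b)
    return np.array(gains_db)   # fed to the graphic EQ design routine
```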
The first reverberator parameters and the parameters of the reverberator DDR control filter GEQ_DDR together form the reverberator parameters.
With respect to Figure 7 there is shown in further detail the reverberator output signals spatialization controller 303 as shown in Figure 3.
The reverberator output signals spatialization controller 303 is configured to receive the scene and reverberator parameters 300 and listener pose parameters 302. The reverberator output signals spatialization controller 303 is configured to use the listener pose parameters 302 and scene and reverberator parameters 300 to determine the acoustic environment where the listener currently is and to provide the reverberator output channels with positions which surround the listener. This means that the reverberation when inside an acoustic enclosure, caused by that acoustic enclosure, is rendered as a diffuse signal enveloping the listener.
In some embodiments the reverberator output signals spatialization controller 303 comprises a listener acoustic environment determiner 701 configured to obtain the scene and reverberator parameters 300 and listener pose parameters 302 and determine the listener acoustic environment.
In some embodiments the reverberator output signals spatialization controller 303 comprises a listener reverberator determiner 703 which is configured to determine the listener reverberator corresponding to the listener acoustic environment.
In some embodiments the reverberator output signals spatialization controller 303 comprises a head tracked output positions provider 705 configured to provide or determine the head tracked output positions for the listener reverberator and generate the output channel positions 312.
The output of the reverberator output signals spatialization controller 303 is thus the reverberator output channel positions 312.
With respect to Figure 8 is shown the operations of an example reverberator output signals spatialization controller 303 according to some embodiments.
Thus for example is shown obtaining scene and reverberator parameters as shown in Figure 8 by step 803 and obtaining listener pose parameters as shown in Figure 8 by step 801.
Then the method comprises determining the listener acoustic environment as shown in Figure 8 by step 805. Having determined this, the method comprises determining the listener reverberator corresponding to the listener acoustic environment as shown in Figure 8 by step 807.
Further the method comprises providing head tracked output positions for the listener reverberator as shown in Figure 8 by step 809.
Then the reverberator output channel positions are output as shown in Figure 8 by step 811.
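One plausible, non-normative realization of the head tracked output positions is to centre a fixed virtual loudspeaker layout on the listener and compensate the listener yaw, as in this sketch (the six azimuths are assumed example values):

```python
import numpy as np

def reverb_output_positions(listener_pos, listener_yaw_rad, radius=1.0,
                            azimuths_deg=(0, 60, 120, 180, 240, 300)):
    """Cartesian positions for the reverberator output channels, surrounding the listener."""
    positions = []
    for az in np.radians(azimuths_deg):
        a = az - listener_yaw_rad                 # compensate head rotation (yaw only)
        offset = np.array([radius * np.cos(a), radius * np.sin(a), 0.0])
        positions.append(np.asarray(listener_pos, dtype=float) + offset)
    return positions
```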
In some embodiments, the reverberator corresponding to the acoustic environment where the user currently is, is rendered by the reverberator output signals spatializer 307 as an immersive audio signal surrounding the user. That is, the signals in s_rev_r(j, t) corresponding to the listener environment are rendered as point sources surrounding the listener.
With respect to Figure 9 there is shown in further detail the reverberator output signals spatializer 307. The reverberator output signals spatializer 307 is configured to receive the positions 312 from the reverberator output signals spatialization controller 303. Additionally is received the reverberator output signals 310 from the reverberators 305.
In some embodiments the reverberator output signals spatializer comprises a head-related transfer function (HRTF) filter 901 which is configured to render each reverberator output into a desired output format (such as binaural).
Furthermore in some embodiments the reverberator output signals spatializer comprises an output channels combiner 903 which is configured to combine (or sum) the signals to produce the output reverberated signal 314.
Thus for example for binaural reproduction the reverberator output signals spatializer 307 can use HRTF filtering to render the reverberator output signals in their desired positions indicated by reverberator output channel positions.
With respect to Figure 10 is shown a flow diagram showing the operations of the reverberator output signals spatializer according to some embodiments.
Thus the method can comprise obtaining reverberator output signals as shown in Figure 10 by step 1000 and obtaining reverberator output channel positions as shown in Figure 10 by step 1001.
Then the method may comprise applying a HRTF filter configured by the reverberator output channel positions to the reverberator output signals as shown in Figure 10 by step 1003. The method may then comprise summing or combining the output channels as shown in Figure 10 by step 1005.
Then the reverberated audio signals can be output as shown in Figure 10 by step 1007.
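As an illustrative stand-in for the HRTF filtering and combining steps, the following sketch uses a crude amplitude-panning law; an actual implementation would instead convolve each reverberator output with the HRTF pair for its indicated position before summation:

```python
import numpy as np

def spatialize(rev_outputs, azimuths_rad):
    """rev_outputs: (channels, samples) array; returns a 2-channel output."""
    left = np.zeros(rev_outputs.shape[1])
    right = np.zeros(rev_outputs.shape[1])
    for sig, az in zip(rev_outputs, azimuths_rad):
        pan = 0.5 * (1.0 + np.sin(az))            # crude left/right pan law
        left += np.sqrt(1.0 - pan) * sig          # energy-preserving split
        right += np.sqrt(pan) * sig
    return np.stack([left, right])                # summed, binaural-style output
```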
Figure 14 shows schematically an example system where the embodiments are implemented in an encoder device 1901 which performs part of the functionality, writes data into a bitstream 1921 and transmits it to a renderer device 1941, which decodes the bitstream, performs reverberator processing according to the embodiments and outputs audio for headphone listening. Figure 14 for example shows apparatus, and specifically the renderer device 1941, which is suitable for performing spatial rendering operations.
The encoder side 1901 of Figure 14 can be performed on content creator computers and/or network server computers. The output of the encoder is the bitstream 1921 which is made available for downloading or streaming. The decoder/renderer 1941 functionality runs on an end-user device, which can be a mobile device, personal computer, sound bar, tablet computer, car media system, home HiFi or theatre system, head mounted display for AR or VR, smart watch, or any suitable system for audio consumption.
The encoder 1901 is configured to receive the virtual scene description 1900 and the audio signals 1904. The virtual scene description 1900 can be provided in the MPEG-I Encoder Input Format (EIF) or in other suitable format. Generally, the virtual scene description contains an acoustically relevant description of the contents of the virtual scene, and contains, for example, the scene geometry as a mesh, acoustic materials, acoustic environments with reverberation parameters, positions of sound sources, and other audio element related parameters such as whether reverberation is to be rendered for an audio element or not. The encoder 1901 in some embodiments comprises a reverberation parameter determiner 1911 configured to receive the virtual scene description 1900 and configured to obtain the reverberation parameters. The reverberation parameters can in an embodiment be obtained from the RT60, DDR, predelay, and region/enclosure parameters of acoustic environments.
The encoder 1901 furthermore in some embodiments comprises a scene and reverberation payload encoder 1913 configured to obtain the determined reverberation parameters and the virtual scene description 1900 and generate suitably encoded scene and reverberation parameters. The resulting bitstream can reside on a content delivery network (CDN), for example.
In the embodiments described herein the scene and reverberation parameters are encoded into a bitstream payload referred to as a reverberation payload (generated by the scene and reverberation payload encoder 1913). In some embodiments, input reverberation encoding preferences 1910, in the form of use_reverb_payload_metadata and reverb_method_type, are provided to the reverberation parameter determiner 1911 and the scene and reverberation payload encoder 1913. Depending on the obtained use_reverb_payload_metadata, the encoder 1901 is configured to derive and write reverberator or compact reverberation parameters into the reverberation payload part (use_reverb_payload_metadata == 1) or into the scene payload part (use_reverb_payload_metadata == 0). Furthermore in some embodiments and depending on the determined reverb_method_type, the encoder 1901 is configured to either derive reverberator parameters based on the reverberation parameters provided in the encoder input data (reverb_method_type == 1), or encode reverberation parameters in a compact representation into the bitstream (reverb_method_type == 2 or reverb_method_type == 3).
Deriving reverberator parameters based on reverberation parameters can be implemented in some embodiments as described above: parameters for at least one graphic EQ filter of a reverberator are obtained using the control gain data, and the other reverberator parameters are obtained as indicated above.
The encoder 1901 further comprises an MPEG-H 3D audio encoder 1914 configured to obtain the audio signals 1904, MPEG-H encode them and pass them to a bitstream encoder 1915.
The encoder 1901 furthermore in some embodiments comprises a bitstream encoder 1915 which is configured to receive the output of the scene and reverberation payload encoder 1913 and the encoded audio signals from the MPEG-H encoder 1914 and generate the bitstream 1921 which can be passed to the decoder/renderer 1941. The bitstream 1921 in some embodiments can be streamed to end-user devices or made available for download or stored.
The decoder 1941 in some embodiments comprises a bitstream decoder 1951 configured to decode the bitstream.
The decoder 1941 further can comprise a reverberation payload decoder 1953 configured to obtain the encoded reverberation parameters and decode these in an opposite or inverse operation to the reverberation payload encoder 1913.
The listening space description (LSDF) generator 1971 is configured to generate and pass the LSDF information to the reverberator controller 1955 and the reverberator output signals spatialization controller 1959.
Furthermore the head pose generator 1957 receives information from a head mounted device or similar and generates head pose information or parameters which can be passed to the reverberator controller 1955, the reverberator output signals spatialization controller 1959 and HRTF processor 1963.
The decoder 1941, in some embodiments, comprises a reverberator controller 1955 which also receives the output of the scene and reverberation payload decoder 1953 and generates the reverberation parameters for configuring the reverberators and passes these to the reverberators 1961.
In some embodiments the decoder 1941 comprises a reverberator output signals spatialization controller 1959 configured to configure the reverberator output signals spatializer 1962.
The decoder 1941 in some embodiments comprises an MPEG-H 3D audio decoder 1954 which is configured to decode the audio signals and pass them to the (FDN) reverberators 1961 and the direct sound processor 1965.
The decoder 1941 furthermore comprises (FDN) reverberators 1961 configured by the reverberator controller 1955 and configured to implement a suitable reverberation of the audio signals.
The output of the (FDN) reverberators 1961 is configured to output to a reverberator output signal spatializer 1962.
In some embodiments the decoder 1941 comprises a reverberator output signal spatializer 1962 configured to apply the spatialization and output to the binaural combiner 1967.
Additionally the decoder/renderer 1941 comprises a direct sound processor 1965 which is configured to receive the decoded audio signals and to implement any direct sound processing, such as air absorption and distance-gain attenuation. The processed direct sound is passed to a HRTF processor 1963 which, using the head orientation determination (from a suitable sensor 1991), generates the direct sound component; this, together with the reverberant component, is passed to a binaural signal combiner 1967. The binaural signal combiner 1967 is configured to combine the direct and reverberant parts to generate a suitable output (for example for headphone reproduction).
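As a rough illustration of the distance-gain attenuation and air absorption mentioned above, the sketch below computes per-band gains for one direct sound path. The 1/r law, the near-source clamp and the per-band absorption coefficients alpha_db_per_m are assumptions made for the example, not values taken from the embodiments.

import numpy as np

def direct_sound_gains(distance_m, alpha_db_per_m):
    # Inverse-distance (1/r) attenuation, clamped to avoid blow-up near the source
    distance_gain = 1.0 / max(distance_m, 1.0)
    # Air absorption grows linearly with distance, in dB, per frequency band
    air_db = -np.asarray(alpha_db_per_m, dtype=float) * distance_m
    return distance_gain * 10.0 ** (air_db / 20.0)

# e.g. 10 octave bands with absorption rising towards the high frequencies
gains = direct_sound_gains(5.0, [0.0, 0.0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0])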
Furthermore in some embodiments the decoder comprises a head orientation determiner 1991 which passes the head orientation information to the HRTF processor 1963.
Although not shown, there can be various other audio processing methods applied such as early reflection rendering combined with the proposed methods.
With respect to Figure 12 is shown the example reverberation payload encoder 1913 in further detail.
In some embodiments the reverberation payload encoder 1913 comprises a frequency dependent RT60 and DDR data obtainer 1201 configured to obtain the frequency dependent RT60 and DDR data values, where an acoustic environment has reverberation parameters described with RT60(k) and DDR(k). DDR(k) can be converted to the logarithmic logRDR(k) with logRDR(k) = 10*log10(RDR(k)) = 10*log10(DDR(k)) + 41 dB.
In some embodiments the reverberation payload encoder 1913 comprises an RT60 and DDR to octave band centre frequency mapper (to obtain octave band data) 1203 which is configured to map the obtained RT60 and DDR values to octave band centre frequencies. This can be implemented by mapping each frequency k to the closest octave band centre frequency b. The band centre frequencies can in some embodiments be the following (in Hz): bandCenterFreqs = [31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000].
Weighted linear interpolation can be used to obtain the value of the RT60(b) or logRDR(b) at each band centre frequency. If no data is provided above a certain band or below certain band, the last band value is extrapolated to higher bands (or the first band value is extrapolated to lower bands).
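A minimal Python sketch of this mapping is given below. It assumes plain linear interpolation over the input frequencies (the embodiments only state weighted linear interpolation, so the exact weighting may differ) and relies on numpy's edge-value behaviour for bands outside the provided range, which matches the extrapolation rule above; the +41 dB logRDR conversion is the one given earlier.

import numpy as np

BAND_CENTER_FREQS_HZ = [31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]

def ddr_to_log_rdr(ddr_linear):
    # logRDR(k) = 10*log10(DDR(k)) + 41 dB, per the conversion above
    return 10.0 * np.log10(np.asarray(ddr_linear, dtype=float)) + 41.0

def map_to_octave_bands(freqs_hz, values):
    # Interpolate frequency dependent data (RT60(k) or logRDR(k)) onto the
    # fixed octave band centres; np.interp holds the first/last value constant
    # outside the input range, i.e. flat extrapolation to lower/higher bands
    order = np.argsort(freqs_hz)
    f = np.asarray(freqs_hz, dtype=float)[order]
    v = np.asarray(values, dtype=float)[order]
    return np.interp(BAND_CENTER_FREQS_HZ, f, v)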
In some other embodiments other predefined frequency band divisions can be utilized. Frequency band divisions can be indicated with a set of centre frequencies as above, or with a set of band low and band high frequencies. A predefined number of frequency band divisions can be known by the encoder and renderer. Each frequency band division can optionally be identified with a unique identifier, such as a unique index related to frequency band divisions. Such identifiers and the corresponding divisions can be known by the encoder and renderer. In some embodiments new divisions can be formed by the encoder and then signalled to the renderer. In some embodiments the encoder can evaluate different frequency band divisions for mapping the frequency dependent input data. If a predefined number of input frequencies almost coincide with a number of centre frequencies in a frequency band division, then a good match between the input data and the corresponding frequency band division can be determined. This kind of evaluation can be performed by the encoder for a plurality of frequency band divisions, and the frequency band division which best represents the input data, based on the criterion described above, can be selected for representing the input data. Data can then be encoded by sending the values of the input data mapped to the centre frequencies of the selected frequency band division, together with the identifier of the used frequency band division. In some embodiments the frequency band divisions have different numbers of frequency bands, which means that explicit identifiers are not needed; the renderer can identify the used frequency band division from the number of values.
In some embodiments the reverberation payload encoder 1913 comprises an octave band data encoder 1205. The octave band data encoder 1205 in some embodiments is configured to encode the octave band data by differential encoding methods, for example taking the first value and then encoding the rest of the values as their differences to the first value. The bitstream can then contain the first value as such and Huffman codes of the difference values. In some embodiments such differential encoding is not applied and the octave band values are encoded into the bitstream as suitable integer values.
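The following Python sketch shows the differential step together with a generic Huffman code-length computation that can be used to size the payload. In a real implementation the Huffman codebooks would be fixed by the specification; this is only a sizing illustration under that caveat.

from collections import Counter
import heapq

def differential_encode(band_values):
    # First value as such; remaining values as differences to the first value
    first = band_values[0]
    return first, [v - first for v in band_values[1:]]

def huffman_code_lengths(symbols):
    # Returns {symbol: code length in bits} for a Huffman code over `symbols`;
    # enough to estimate payload size without emitting actual bits
    counts = Counter(symbols)
    if len(counts) == 1:
        return {s: 1 for s in counts}
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, uid, merged))
        uid += 1
    return heap[0][2]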
The reverberation payload encoder 1913 comprises a frequency dependent RT60 and DDR data encoder 1207. The frequency dependent RT60 and DDR data encoder 1207 is configured to encode the frequency-dependent RT60(k) and DDR(k) data. If the frequency values k are shared between these, then the frequency values need to be encoded only once. They can be difference encoded and Huffman coded like the octave band data; similarly, RT60(k) and DDR(k) can be difference encoded and Huffman coded. In some embodiments the difference encoding and/or Huffman coding are omitted and the values are included in the bitstream as suitable integer values.

Furthermore in some embodiments the reverberation payload encoder 1913 comprises a bitrate comparer/encoder selector 1209. The selector 1209 is configured to compare the bitrate required for transmitting the encoding from the octave band data encoder 1205 and from the frequency dependent RT60 and DDR data encoder 1207, and to select one to be transmitted as the compact reverberation parameters. The number of bits required for transmitting the first value and the Huffman codes of the remaining values is compared for the data representations of both encoder options. The one leading to the smallest number of bits is selected, and reverb_method_type is set accordingly to type 2 or type 3.
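Continuing the sketch above, the comparison can be expressed along the following hypothetical lines; the 32-bit width assumed for the first value follows the payload syntax given further below.

def estimate_bits(diffs, code_lengths, first_value_bits=32):
    # First value transmitted as such, differences as Huffman codes
    return first_value_bits + sum(code_lengths[d] for d in diffs)

def select_reverb_method(bits_freq_dependent, bits_octave_band):
    # reverb_method_type 2 = frequency dependent RT60/DDR data,
    # reverb_method_type 3 = octave band data; pick the cheaper one
    return 2 if bits_freq_dependent < bits_octave_band else 3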
In some embodiments the reverberation payload encoder 1913 comprises a bitstream generator 1211 configured to create a bitstream representation of the selected compact reverberation parameters.
With respect to Figure 13 the operation of the encoder shown in Figure 12 is shown according to some embodiments.
First is the operation of obtaining scene and reverberation parameters from the encoder input format as shown in Figure 13 by step 1301.
Then is the operation of obtaining frequency dependent RT60 and DDR data as shown in Figure 13 by step 1303.
The next step is one of mapping the frequencies of RT60 and DDR data to octave band centre frequencies to obtain octave band data as shown in Figure 13 by step 1305.
Then the method comprises encoding the octave band data as shown in Figure 13 by step 1307 and encoding the frequency-dependent RT60 and DDR data as shown in step 1309.
The next operation is one of comparing the bitrate required for transmitting the octave band data and the RT60/DDR data and selecting one to be transmitted as the compact reverberation parameters as shown in Figure 13 by step 1311.
Then the next step is to create (and output) a bitstream representation of the selected compact reverberation parameters as shown in Figure 13 by step 1313.
The bitstream can thus carry the information for a low bitrate representation of metadata for late reverberation using different methods.
Reverb parameters represent filter coefficients for the FDN reverberator attenuation filters and the ratio control filter (DDR control filter), the delay line lengths, and spatial positions for the output delay lines. Other FDN parameters such as feedback matrix coefficients can be predetermined in the encoder and renderer and not included in the bitstream.
Encoded reverberation parameters carry RT60 and DDR data in the bitstream, encoded either as frequency dependent data with the frequencies at which the values are provided, or just as (interpolated) values at octave bands (without transmitting the octave band centre frequencies). Other frequency band divisions can be used in some embodiments.
RT60 times are mapped into control gains of a graphic EQ (the FDN attenuation filter); there are either 10 or 31 control gains. DDR values are likewise mapped into control gains of a graphic EQ, again with either 10 or 31 control gains.
This can be implemented in an example structure such as the following:

reverbPayloadStruct() {
    unsigned int(1) use_reverb_payload_metadata; // decides if reverb or scene payload is used
    if (use_reverb_payload_metadata) {
        unsigned int(2) reverb_method_type; // 1 means current method, 2 means RT60 and DDR encoded, 3 means RT60 and DDR octave band data
        reserved_bits(6) = 0;
        if (reverb_method_type == 1) {
            unsigned int(2) numberOfSpatialPositions;
            unsigned int(8) numberOfAcousticEnvironments;
            for (int i = 0; i < numberOfSpatialPositions; i++) {
                signed int(32) azimuth;
                signed int(32) elevation;
            }
            for (int i = 0; i < numberOfAcousticEnvironments; i++) {
                unsigned int(16) environmentsId;
                filterParamsStruct();
                for (int j = 0; j < numberOfSpatialPositions; j++) {
                    unsigned int(32) delayLineLength;
                    filterParamsStruct();
                }
            }
        } else if (reverb_method_type == 2) {
            for (int i = 0; i < numberOfAcousticEnvironments; i++) {
                unsigned int(16) environmentsId;
                EncodedRT60Struct(); // frequency values can be differentially and Huffman coded together with the RT60 values, i.e. two unsigned int per frequency bin
                EncodedDDRStruct();
                ReverbEnclosureStruct();
            }
        } else if (reverb_method_type == 3) {
            for (int i = 0; i < numberOfAcousticEnvironments; i++) {
                unsigned int(16) environmentsId;
                RT60OctaveBandDataStruct(); // 10 bands
                DDROctaveBandDataStruct(); // 10 bands
                ReverbEnclosureStruct(); // 8 vertices
            }
        }
    }
}

aligned(8) filterParamsStruct() {
    SOSLength;
    if (SOSLength > 0) {
        for (i = 0; i < SOSLength; i++) {
            signed int(32) b1;
        }
        for (i = 0; i < SOSLength; i++) {
            signed int(32) b2;
        }
        for (i = 0; i < SOSLength; i++) {
            signed int(32) a1;
        }
        for (i = 0; i < SOSLength; i++) {
            signed int(32) a2;
        }
        signed int(32) globalGain;
        signed int(32) levelDb;
    }
}

aligned(8) PositionStruct() {
    signed int(32) vertex_pos_x;
    signed int(32) vertex_pos_y;
    signed int(32) vertex_pos_z;
}

aligned(8) ReverbEnclosureStruct() {
    unsigned int(8) num_vertices; // vertices for the enclosure
    for (i = 0; i < num_vertices; i++) {
        PositionStruct();
    }
}

aligned(8) EncodedRT60Struct() {
    unsigned int(8) num_frequency_bands;
    for (i = 0; i < num_frequency_bands; i++) {
        unsigned int(32) frequency_encoded_value;
        unsigned int(32) scene_rt60_encoded_value;
    }
}

aligned(8) EncodedDDRStruct() {
    unsigned int(8) num_frequency_bands;
    for (i = 0; i < num_frequency_bands; i++) {
        unsigned int(32) frequency_encoded_value;
        unsigned int(32) scene_ddr_encoded_value; // represented as logRDR, Huffman coded
    }
}

aligned(8) RT60OctaveBandDataStruct() { // fixed octave bands in the renderer
    for (i = 0; i < 10; i++) {
        unsigned int(32) rt60_encoded_value; // Huffman coded
    }
}

aligned(8) DDROctaveBandDataStruct() { // fixed octave bands in the renderer
    for (i = 0; i < 10; i++) {
        unsigned int(32) ddr_encoded_value; // represented as logRDR, Huffman coded
    }
}
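To make the framing concrete, a minimal MSB-first bit packer for the first fields of reverbPayloadStruct() might look as follows in Python. This is an illustrative sketch only; the normative MPEG-I serialization has considerably more machinery.

class BitWriter:
    # Minimal MSB-first bit packer, just for illustrating the payload framing
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            sum(b << (7 - i) for i, b in enumerate(padded[k:k + 8]))
            for k in range(0, len(padded), 8)
        )

def write_payload_header(use_reverb_payload_metadata, reverb_method_type):
    # Pack the first fields of reverbPayloadStruct(): a 1-bit flag, a 2-bit
    # method type and 6 reserved zero bits, as declared in the syntax above
    w = BitWriter()
    w.write(use_reverb_payload_metadata, 1)
    if use_reverb_payload_metadata:
        w.write(reverb_method_type, 2)
        w.write(0, 6)  # reserved_bits(6)
    return w.to_bytes()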
The semantics of the structure reverbPayloadStruct() are as follows.

use_reverb_payload_metadata equal to 1 indicates to the renderer that the metadata carried in the reverb payload data structure shall be used to perform late reverberation rendering. A value equal to 0 indicates to the renderer that the metadata from the scene payload shall be used to perform late reverberation rendering.

reverb_method_type equal to 1 indicates to the renderer that the metadata carries encoder-optimized reverb parameters for the FDN. A value equal to 2 indicates that the reverberation metadata carries an encoded representation of the RT60 and DDR. A value equal to 3 indicates that the reverb payload data carries octave band data for RT60 and DDR.

numberOfSpatialPositions defines the number of output delay line positions for the late reverb payload. This value is defined using an index which corresponds to a specific number of delay lines. The bit string value '0b00' signals to the renderer a value of 15 spatial orientations for the delay lines. The other three values '0b01', '0b10' and '0b11' are reserved.

azimuth defines the azimuth of the delay line with respect to the listener. The range is between -180 and 180 degrees.

elevation defines the elevation of the delay line with respect to the listener. The range is between -90 and 90 degrees.

numberOfAcousticEnvironments defines the number of acoustic environments in the audio scene. The reverbPayloadStruct() carries information regarding the one or more acoustic environments which are present in the audio scene at that time. An acoustic environment has certain "reverberation parameters", such as RT60 times, which are used to obtain the FDN reverb parameters.

environmentsId defines the unique identifier of the acoustic environment.

delayLineLength defines the length in units of samples for the delay line whose attenuation filter is configured using the graphic equalizer (GEQ) filter. The lengths of different delay lines corresponding to the same acoustic environment are mutually prime.

filterParamsStruct() describes the graphic equalizer cascade filter used to configure the attenuation filter for the delay lines. The same structure is also used subsequently to configure the filter for the diffuse-to-direct reverberation ratio, GEQDDR. The details of this structure are described in the next table.
If reverb_method_type is equal to 2, the bitstream comprises three structures:
• EncodedRT60Struct() carries RT60 values (scene_rt60_encoded_value) for each frequency band (frequency_encoded_value), represented as positive integers. In an implementation embodiment, the integers are differentially encoded and Huffman coded integer indices.
• EncodedDDRStruct() carries DDR values (scene_ddr_encoded_value) for each frequency band (frequency_encoded_value), represented as positive integers. In an implementation embodiment, the DDR value is represented according to the equation logRDR(k) = 10*log10(RDR(k)) = 10*log10(DDR(k)) + 41 dB, and the integers are differentially encoded and Huffman coded integer indices.
• The frequency bands can be the same or different for RT60 and DDR. The Huffman coding of the differences can be the same or different for RT60 and DDR.
If reverb_method_type is equal to 3, the bitstream comprises three structures:
• DDROctaveBandDataStruct() carries DDR values (ddr_encoded_value) for 10 octave bands. In an implementation embodiment, the DDR value is represented according to the equation logRDR(k) = 10*log10(RDR(k)) = 10*log10(DDR(k)) + 41 dB, and the integers are differentially encoded and Huffman coded integer indices.
• RT60OctaveBandDataStruct() carries RT60 values (rt60_encoded_value) for 10 octave bands. The RT60 values can be converted to a suitable integer representation in an embodiment. The integers are differentially encoded and Huffman coded integer indices. A decoding sketch for these differentially encoded values follows below.
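On the renderer side, once the Huffman codes have been entropy decoded into integer differences (the decoding tables are assumed given and are not sketched here), the band values are recovered by inverting the differential step:

def differential_decode(first, diffs):
    # Inverse of the encoder's differential coding: each decoded difference
    # is added back to the first value
    return [first] + [first + d for d in diffs]

# e.g. 10 octave-band RT60 integer values from RT60OctaveBandDataStruct()
rt60_bands = differential_decode(120, [0, -5, -5, -10, -12, -15, -20, -30, -40])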
The semantics of filterParamsStruct() are as follows.

SOSLength is the length of each of the second-order section filter coefficient arrays.

b1, b2, a1, a2: the filter is configured with coefficients b1, b2, a1 and a2. These are the feedforward and feedback IIR filter coefficients of the second-order section IIR filters.

globalGain specifies the gain factor in decibels for the GEQ.

levelDb specifies a sound level offset for each of the delay lines in decibels.
All filterParamsStruct() instances are deserialized into GEQ objects in the renderer.
As indicated earlier, MPEG-I Audio Phase 2 will normatively standardize the bitstream and the renderer processing. There will also be an encoder reference implementation, but it can be modified later on as long as the output bitstream follows the normative specification. This allows improving the codec quality with novel encoder implementations even after the standard has been finalized.
The portions going to different parts of the MPEG-I standard can be:
• The encoder reference implementation will contain:
o deriving the reverberator parameters or compact reverberation parameters for each of the acoustic environments based on their RT60 and DDR;
o obtaining scene parameters from the encoder input and writing them into the bitstream;
o writing a bitstream description containing the reverberator or compact reverberation parameters and scene parameters.
• The normative bitstream shall contain reverberator or compact reverberation parameters described using the syntax described here. The bitstream shall be streamed to end-user devices or made available for download or stored.
• The normative renderer shall decode the bitstream to obtain the Scene and reverberation parameters and perform the compact reverberation parameter decoding and mapping to reverberator parameters as described in the embodiments herein. Moreover, the renderer is configured to take care of reverberation rendering.
• The complete normative renderer will also obtain other parameters from the bitstream related to room acoustics and sound source properties, and use them to render the direct sound, early reflection, diffraction, sound source spatial extent or width, and other acoustic effects in addition to diffuse late reverberation. The invention presented here focuses on the rendering of the diffuse late reverberation part and in particular how to enable bitrate efficient coding of compact reverberation parameters.
With respect to Figure 15, there is shown an example electronic device which may be used as any of the apparatus parts of the system as described above. The device may be any suitable electronics device or apparatus. For example in some embodiments the device 2000 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. The device may for example be configured to implement the encoder or the renderer or any functional block as described above.
In some embodiments the device 2000 comprises at least one processor or central processing unit 2007. The processor 2007 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 2000 comprises a memory 2011. In some embodiments the at least one processor 2007 is coupled to the memory 2011. The memory 2011 can be any suitable storage means. In some embodiments the memory 2011 comprises a program code section for storing program codes implementable upon the processor 2007. Furthermore in some embodiments the memory 2011 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 2007 whenever needed via the memory-processor coupling.
In some embodiments the device 2000 comprises a user interface 2005. The user interface 2005 can be coupled in some embodiments to the processor 2007. In some embodiments the processor 2007 can control the operation of the user interface 2005 and receive inputs from the user interface 2005. In some embodiments the user interface 2005 can enable a user to input commands to the device 2000, for example via a keypad. In some embodiments the user interface 2005 can enable the user to obtain information from the device 2000. For example the user interface 2005 may comprise a display configured to display information from the device 2000 to the user. The user interface 2005 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 2000 and further displaying information to the user of the device 2000. In some embodiments the user interface 2005 may be the user interface for communicating.
In some embodiments the device 2000 comprises an input/output port 2009. The input/output port 2009 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 2007 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA).
The input/output port 2009 may be configured to receive the signals.
In some embodiments the device 2000 may be employed as at least part of the renderer. The input/output port 2009 may be coupled to headphones (which may be head-tracked or non-tracked headphones) or similar.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims

CLAIMS:
1. An apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising means configured to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; and generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
2. The apparatus as claimed in claim 1, wherein the at least one frequency band data is organised as octave bands.
3. The apparatus as claimed in any of claims 1 or 2, wherein the at least one frequency band data further comprises: an index identifying a centre band frequency range; and a number of bands.
4. The apparatus as claimed in any of claims 1 to 3, wherein the means configured to generate the bitstream is configured to generate the bitstream comprising a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
5. The apparatus as claimed in any of claims 1 to 4, further configured to obtain a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter is associated with the virtual scene.
6. The apparatus as claimed in any of claims 1 to 5, wherein the at least one reverberation parameter is a frequency dependent reverberation parameter.
7. The apparatus as claimed in any of claims 1 to 6, wherein the resources are one of: encoded bitrate; encoded bits; and channel capacity.
8. The apparatus as claimed in any of claims 1 to 6, wherein the means configured to generate the bitstream comprising the reverberation parameter part comprising the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter is configured to: select the encoded at least one reverberation parameter in a high bitrate mode; and select the encoded at least one frequency band data in a low bitrate mode.
9. An apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising means configured to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
10. The apparatus as claimed in claim 9, wherein the bitstream further comprises at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein the means configured to obtain reverberator parameters from the decoded reverberation parameters is configured to determine the reverberation parameter part based on the indicator.
11. The apparatus as claimed in any of claims 9 and 10, wherein the means configured to obtain reverberator parameters from the decoded reverberation parameters is configured to determine the reverberation parameter part based on the indicator and configured to: determine the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determine the bitstream comprises the at least one frequency band data in a low bitrate mode.
12. The apparatus as claimed in claim 9, wherein the means configured to obtain reverberator parameters from the decoded reverberation parameters is configured to determine the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
13. The apparatus as claimed in any of claims 9 to 12, wherein the means configured to initialize at least one reverberator based on the reverberator parameters is configured to initialize the at least one reverberator using the at least one reverberator parameter independent of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
14. A method for an apparatus for assisting spatial rendering in at least one acoustic environment, the method comprising: obtaining at least one reverberation parameter; converting the obtained at least one reverberation parameter into at least one frequency band data; encoding the at least one frequency band data; encoding the at least one reverberation parameter; comparing resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; generating a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
15. The method as claimed in claim 14, wherein generating the bitstream comprises generating a selection indicator configured to indicate the selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter within the bitstream.
16. The method as claimed in any of claims 14 or 15, further comprising obtaining a scene description defining a virtual scene forming at least part of the at least one acoustic environment, wherein the at least one reverberation parameter is associated with the virtual scene.
17. The method as claimed in any of claims 14 to 16, wherein the reverberation parameter part comprising the selection based on the comparison comprises: selecting the encoded at least one reverberation parameter in a high bitrate mode; and selecting the encoded at least one frequency band data in a low bitrate mode.
18. A method for an apparatus for assisting spatial rendering in at least one acoustic environment, the method comprising: obtaining a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decoding the reverberation parameter part to generate decoded reverberation parameters; obtaining reverberator parameters from the decoded reverberation parameters; initializing at least one reverberator based on the reverberator parameters; obtaining at least one input audio signal associated with the at least one acoustic environment; and generating an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
19. The method as claimed in claim 18, wherein the bitstream further comprises at least one indicator indicating that the bitstream comprises at least one of: the encoded at least one frequency band data and the encoded at least one reverberation parameter, wherein obtaining reverberator parameters from the decoded reverberation parameters comprises determining the reverberation parameter part based on the indicator.
20. The method as claimed in any of claims 18 or 19, wherein obtaining reverberator parameters from the decoded reverberation parameters comprises determining the reverberation parameter part based on the indicator and further comprises: determining that the bitstream comprises the at least one reverberation parameter in a high bitrate mode; and determining that the bitstream comprises the at least one frequency band data in a low bitrate mode.
21. The method as claimed in claim 18, wherein obtaining reverberator parameters from the decoded reverberation parameters comprises determining that the reverberation parameter part further comprises an indicator indicating that the reverberator parameters are to be determined from at least one reverberation parameter encoded into a scene payload.
22. The method as claimed in any of claims 18 to 21, wherein initializing at least one reverberator based on the reverberator parameters comprises initializing the at least one reverberator using the at least one reverberator parameter independent of whether the at least one acoustic environment is a virtual acoustic environment or an augmented reality acoustic environment.
23. An apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain at least one reverberation parameter; convert the obtained at least one reverberation parameter into at least one frequency band data; encode the at least one frequency band data; encode the at least one reverberation parameter; compare resources required to transmit the encoded at least one frequency band data and the encoded at least one reverberation parameter; and generate a bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising a selection, based on the comparison, of one or more of: the encoded at least one frequency band data; and the encoded at least one reverberation parameter.
24. An apparatus for assisting spatial rendering in at least one acoustic environment, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain a bitstream, the bitstream comprising: an identifier identifying the at least one acoustic environment; information defining at least one dimension of the at least one acoustic environment; and a reverberation parameter part comprising one of: an encoded at least one frequency band data and an encoded at least one reverberation parameter; decode the reverberation parameter part to generate decoded reverberation parameters; obtain reverberator parameters from the decoded reverberation parameters; initialize at least one reverberator based on the reverberator parameters; obtain at least one input audio signal associated with the at least one acoustic environment; and generate an output audio signal based on the application of the at least one reverberator to the at least one input audio signal.
PCT/EP2023/053283 2022-03-02 2023-02-10 Spatial rendering of reverberation WO2023165800A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2202892.2A GB2616280A (en) 2022-03-02 2022-03-02 Spatial rendering of reverberation
GB2202892.2 2022-03-02

Publications (1)

Publication Number Publication Date
WO2023165800A1 true WO2023165800A1 (en) 2023-09-07

Family

ID=81075657

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/053283 WO2023165800A1 (en) 2022-03-02 2023-02-10 Spatial rendering of reverberation

Country Status (2)

Country Link
GB (1) GB2616280A (en)
WO (1) WO2023165800A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694955B (en) * 2017-04-12 2020-11-17 华为技术有限公司 Coding and decoding method and coder and decoder of multi-channel signal
EP3777249A4 (en) * 2018-04-10 2022-01-05 Nokia Technologies Oy An apparatus, a method and a computer program for reproducing spatial audio
GB2588171A (en) * 2019-10-11 2021-04-21 Nokia Technologies Oy Spatial audio representation and rendering

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2200335A (en) 1987-01-28 1988-08-03 Roberts Systems Inc Apparatus for packaging articles
AU2013207549A1 (en) * 2009-04-09 2013-08-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating a synthesis audio signal and for encoding an audio signal
US20200388291A1 (en) * 2017-09-15 2020-12-10 Lg Electronics Inc. Audio encoding method, to which brir/rir parameterization is applied, and method and device for reproducing audio by using parameterized brir/rir information
WO2021186104A1 (en) * 2020-03-16 2021-09-23 Nokia Technologies Oy Rendering encoded 6dof audio bitstream and late updates
WO2021186107A1 (en) 2020-03-16 2021-09-23 Nokia Technologies Oy Encoding reverberator parameters from virtual or physical scene geometry and desired reverberation characteristics and rendering using these

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ROCCHESSO: "Maximally Diffusive Yet Efficient Feedback Delay Networks for Artificial Reverberation", IEEE SIGNAL PROCESSING LETTERS, vol. 4, no. 9, September 1997 (1997-09-01), XP000701914, DOI: 10.1109/97.623041
V. VALIMAKIJ. LISKI: "Accurate cascade graphic equalizer", IEEE SIGNAL PROCESS. LETT., vol. 24, no. 2, February 2017 (2017-02-01), pages 176 - 180, XP011639395, DOI: 10.1109/LSP.2016.2645280
VALIMAKI, RAMD: "Neurally Controlled Graphic Equalizer", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 27, no. 12, December 2019 (2019-12-01), XP011748617, DOI: 10.1109/TASLP.2019.2935809

Also Published As

Publication number Publication date
GB202202892D0 (en) 2022-04-13
GB2616280A (en) 2023-09-06

Similar Documents

Publication Publication Date Title
US11343630B2 (en) Audio signal processing method and apparatus
US9848275B2 (en) Audio signal processing method and device
US20230100071A1 (en) Rendering reverberation
CN110326310B (en) Dynamic equalization for crosstalk cancellation
WO2021186107A1 (en) Encoding reverberator parameters from virtual or physical scene geometry and desired reverberation characteristics and rendering using these
US20240089694A1 (en) A Method and Apparatus for Fusion of Virtual Scene Description and Listener Space Description
US20240196159A1 (en) Rendering Reverberation
WO2023165800A1 (en) Spatial rendering of reverberation
US20230179947A1 (en) Adjustment of Reverberator Based on Source Directivity
GB2618983A (en) Reverberation level compensation
WO2023169819A2 (en) Spatial audio rendering of reverberation
WO2023131744A1 (en) Conditional disabling of a reverberator
US20230143857A1 (en) Spatial Audio Reproduction by Positioning at Least Part of a Sound Field
WO2023135359A1 (en) Adjustment of reverberator based on input diffuse-to-direct ratio
CN116600242B (en) Audio sound image optimization method and device, electronic equipment and storage medium
WO2023213501A1 (en) Apparatus, methods and computer programs for spatial rendering of reverberation
WO2024149548A1 (en) A method and apparatus for complexity reduction in 6dof rendering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23705217

Country of ref document: EP

Kind code of ref document: A1