WO2022237851A1 - Audio encoding and decoding method and apparatus - Google Patents

Audio encoding and decoding method and apparatus

Info

Publication number
WO2022237851A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual speaker
target virtual
encoding
audio
channel signal
Prior art date
Application number
PCT/CN2022/092310
Other languages
English (en)
French (fr)
Inventor
刘帅
高原
王宾
夏丙寅
王喆
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22806813.6A (published as EP4318470A1)
Publication of WO2022237851A1
Priority to US18/504,102 (published as US20240079016A1)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • the embodiments of the present application relate to the technical field of encoding and decoding, and in particular, to an audio encoding and decoding method and device.
  • Three-dimensional audio technology is an audio technology for acquiring, processing, transmitting, rendering, and playing back sound events and three-dimensional sound field information in the real world.
  • the three-dimensional audio technology gives sound a strong sense of space, envelopment, and immersion, offering people an extraordinary "immersive sound" auditory experience.
  • Higher-order ambisonics (HOA) technology is independent of the speaker layout in the recording, encoding, and playback stages, and HOA-format data can be rotated during playback, which gives it higher flexibility in three-dimensional audio reproduction. Therefore, it has received extensive attention and research.
  • HOA technology requires a large amount of data to record detailed sound scene information. Although this scene-based sampling and storage of the 3D audio signal is conducive to preserving and transmitting the spatial information of the audio signal, the amount of data grows as the HOA order increases, and a large amount of data causes difficulties in transmission and storage. Therefore, the HOA signal needs to be encoded and decoded.
  • the HOA signal to be encoded is encoded to generate a virtual speaker signal and a residual signal, and the virtual speaker signal and the residual signal are then further encoded to obtain a code stream.
  • codec processing is performed on the virtual speaker signal and the residual signal of each frame.
  • in this scheme, only the correlation between the signals within the current frame is considered when encoding each frame's virtual speaker signal and residual signal, resulting in high computational complexity and low encoding efficiency.
  • Embodiments of the present application provide an audio encoding and decoding method and device to solve the problem of high computational complexity.
  • the embodiment of the present application provides an audio coding method, including: obtaining the audio channel signal of the current frame, where the audio channel signal of the current frame is obtained by spatially mapping the original higher-order ambisonics (HOA) signal through the first target virtual speaker; when it is determined that the first target virtual speaker and the second target virtual speaker meet a set condition, determining the first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame of the current frame, where the audio channel signal of the previous frame corresponds to the second target virtual speaker; encoding the audio channel signal of the current frame according to the first encoding parameter; and writing the encoding result of the audio channel signal of the current frame into the code stream.
  • the encoding parameters of the current frame can be determined according to the encoding parameters of the previous frame, so the encoding parameters of the current frame need not be recalculated, which can improve encoding efficiency.
  • the method further includes: writing the first encoding parameter into a code stream.
  • the coding parameters determined according to the coding parameters of the previous frame are written into the code stream as the coding parameters of the current frame, so that the peer can obtain the coding parameters while coding efficiency is improved.
  • the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • the inter-channel auditory space parameter includes one or more of an inter-channel level difference (ILD), an inter-channel time difference (ITD), or an inter-channel phase difference (IPD).
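As a concrete, non-normative illustration of these three parameters, the sketch below estimates them for one frame of a channel pair. The patent does not specify its estimators; the textbook definitions used here (energy ratio in dB, cross-correlation lag, mean cross-spectral phase) are assumptions.

```python
import numpy as np

def auditory_space_parameters(left, right, sample_rate):
    """Estimate ILD, ITD, and IPD for one frame of a channel pair.

    `left`/`right` are 1-D float arrays holding one frame per channel.
    These are illustrative textbook estimators, not the patent's own.
    """
    eps = 1e-12
    # ILD: inter-channel level difference in dB (frame energy ratio).
    ild_db = 10.0 * np.log10((np.sum(left ** 2) + eps) /
                             (np.sum(right ** 2) + eps))

    # ITD: lag (in seconds) that maximises the cross-correlation.
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    itd_s = lag / sample_rate

    # IPD: phase of the summed cross-spectrum of the two channels.
    cross = np.fft.rfft(left) * np.conj(np.fft.rfft(right))
    ipd_rad = float(np.angle(np.sum(cross)))

    return ild_db, itd_s, ipd_rad
```

For two identical channels all three parameters come out as zero; scaling one channel changes only the ILD.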
  • the set condition includes that the first spatial position overlaps with the second spatial position; and determining the first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame includes: using the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame.
  • the method further includes: writing a multiplexing identifier into the code stream, where the value of the multiplexing identifier is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.
  • the first spatial position includes the first coordinates of the first target virtual speaker and the second spatial position includes the second coordinates of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first coordinates being the same as the second coordinates; or the first spatial position includes the first serial number of the first target virtual speaker and the second spatial position includes the second serial number of the second target virtual speaker, and the overlapping includes the first serial number being the same as the second serial number; or the first spatial position includes the first HOA coefficient of the first target virtual speaker and the second spatial position includes the second HOA coefficient of the second target virtual speaker, and the overlapping includes the first HOA coefficient being the same as the second HOA coefficient.
  • the spatial position is represented by coordinates, serial numbers, or HOA coefficients, which provides a simple and effective way to determine whether the virtual speaker of the previous frame overlaps with the virtual speaker of the current frame.
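The overlap test and the resulting parameter reuse can be sketched as follows. The dictionary keys, parameter names, and the numeric flag value are illustrative stand-ins; the text only names a "first value" for straight reuse.

```python
import numpy as np

REUSE_FLAG = 1  # illustrative stand-in for the "first value" of the flag

def positions_overlap(curr, prev):
    """Check whether two target-virtual-speaker spatial positions
    overlap, under any of the three representations named above:
    serial numbers, coordinates, or HOA coefficients."""
    if "serial" in curr and "serial" in prev:
        return curr["serial"] == prev["serial"]
    if "coords" in curr and "coords" in prev:
        return np.array_equal(curr["coords"], prev["coords"])
    if "hoa" in curr and "hoa" in prev:
        return np.allclose(curr["hoa"], prev["hoa"])
    return False

def select_encoding_parameters(curr_pos, prev_pos, prev_params):
    """On overlap, reuse the previous frame's parameters and report
    the multiplexing flag to write; otherwise signal (with None) that
    fresh parameters must be computed."""
    if prev_params is not None and positions_overlap(curr_pos, prev_pos):
        return prev_params, REUSE_FLAG
    return None, None
```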
  • the first target virtual speaker includes M virtual speakers, and the second target virtual speaker includes N virtual speakers; the set condition includes that the first spatial position does not overlap with the second spatial position of the second target virtual speaker, and that the m-th virtual speaker included in the first target virtual speaker is located within a set range centered on the n-th virtual speaker included in the second target virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N.
  • determining the first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame includes: adjusting the second encoding parameter according to a set ratio to obtain the first encoding parameter.
  • the encoding parameters of the current frame are obtained by adjusting the encoding parameters of the previous frame, which improves efficiency while accounting for changes in the audio channel signal between frames.
  • the first encoding parameter may be one encoding parameter or multiple encoding parameters
  • the adjustment may reduce or enlarge the parameters; it may also reduce some parameters and leave others unchanged, enlarge some and leave others unchanged, reduce some and enlarge others, or reduce some, leave some unchanged, and enlarge the rest.
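All of these combinations reduce to applying a per-parameter ratio, as in the minimal sketch below; the parameter names are illustrative only.

```python
def adjust_parameters(prev_params, ratios):
    """Adjust the previous frame's encoding parameters by per-parameter
    set ratios: a ratio below 1.0 reduces a parameter, above 1.0
    enlarges it, and a missing entry leaves it unchanged, so every
    mixture listed above is one particular choice of `ratios`.
    """
    return {name: value * ratios.get(name, 1.0)
            for name, value in prev_params.items()}
```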
  • whether the m-th virtual speaker is located within a set range centered on the n-th virtual speaker is determined by a degree of correlation between the m-th virtual speaker and the n-th virtual speaker, where the correlation satisfies the following condition: R = norm(M_H · M̄_H^T)
  • R represents the degree of correlation; norm() represents the normalization operation; M_H is the matrix formed by the coordinates of the virtual speakers included in the first target virtual speaker of the current frame; and M̄_H^T is the transpose of the matrix formed by the coordinates of the virtual speakers included in the second target virtual speaker of the previous frame. When the correlation is greater than a set value, the m-th virtual speaker is located within the set range centered on the n-th virtual speaker.
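The correlation test can be sketched as below. The text does not define norm() precisely; here it is read as normalising the inner products of the coordinate matrices to cosine similarities and taking the largest entry as R, which is an assumption, as is the example threshold.

```python
import numpy as np

def speaker_set_correlation(curr_coords, prev_coords):
    """Correlation R = norm(M_H · M̄_H^T) between the coordinate
    matrices of the current-frame and previous-frame target virtual
    speakers, with norm() read as cosine normalisation (assumption).
    """
    m_h = np.asarray(curr_coords, dtype=float)        # M x 3
    m_bar_t = np.asarray(prev_coords, dtype=float).T  # 3 x N
    product = m_h @ m_bar_t                           # M x N inner products
    row = np.linalg.norm(m_h, axis=1, keepdims=True)
    col = np.linalg.norm(m_bar_t, axis=0, keepdims=True)
    cosine = product / (row * col + 1e-12)
    return float(np.max(cosine))

def within_set_range(curr_coords, prev_coords, threshold=0.9):
    """The condition above: the m-th current speaker lies within the
    set range when R exceeds the set value (threshold is illustrative)."""
    return speaker_set_correlation(curr_coords, prev_coords) > threshold
```

Identical speaker directions give R near 1, well above any reasonable set value; orthogonal directions give R near 0.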
  • the method further includes: writing the multiplexing identifier into the code stream, where the value of the multiplexing identifier is a second value, and the second value indicates that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter according to a set ratio.
  • the method further includes: writing the set ratio into the code stream.
  • the set ratio is notified to the decoding side through the code stream, so that the decoding side determines the encoding parameters of the current frame according to the set ratio, so that the decoding side obtains the encoding parameters while improving encoding efficiency.
  • the embodiment of the present application provides an audio decoding method, including: parsing a multiplexing identifier from the code stream, where the multiplexing identifier indicates that the first encoding parameter of the audio channel signal of the current frame is determined from the second encoding parameter of the audio channel signal of the previous frame of the current frame; determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame; and decoding the audio channel signal of the current frame from the code stream according to the first encoding parameter.
  • the decoding side does not need to parse the encoding parameters from the code stream, which can improve decoding efficiency.
  • determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame includes: when the value of the multiplexing identifier is the first value, which indicates that the first encoding parameter multiplexes the second encoding parameter, obtaining the second encoding parameter as the first encoding parameter.
  • determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame includes: when the value of the multiplexing identifier is the second value, which indicates that the first encoding parameter is obtained by adjusting the second encoding parameter according to a set ratio, adjusting the second encoding parameter according to the set ratio to obtain the first encoding parameter.
  • the method further includes: when the value of the multiplexing identifier is a second value, decoding from the code stream to obtain the set ratio.
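The decoder-side branching described in these paragraphs can be sketched as below. `read_field` is a stand-in for the codec's bitstream reader (a real decoder parses bits, not named fields), and the numeric values 1/2 mirror the "first" and "second" values of the flag.

```python
def decode_first_parameter(read_field, prev_params):
    """Determine the current frame's encoding parameters from the
    multiplexing flag and the previous frame's parameters."""
    flag = read_field("multiplex_flag")
    if flag == 1:                        # first value: straight reuse
        return dict(prev_params)
    if flag == 2:                        # second value: scaled reuse
        ratio = read_field("set_ratio")  # the set ratio is in the stream
        return {k: v * ratio for k, v in prev_params.items()}
    # otherwise the encoding parameters are carried explicitly
    raise ValueError("parameters must be parsed from the code stream")
```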
  • the encoding parameters of the audio channel signal include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • the embodiment of the present application provides an audio encoding device.
  • the audio coding device includes several functional units for implementing any one method of the first aspect.
  • the audio encoding device may include: a spatial encoding unit, configured to obtain the audio channel signal of the current frame, where the audio channel signal of the current frame is obtained by spatially mapping the original higher-order ambisonics (HOA) signal through the first target virtual speaker; and a core encoding unit, configured to determine, when the first target virtual speaker and the second target virtual speaker meet a set condition, the first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame of the current frame, where the audio channel signal of the previous frame corresponds to the second target virtual speaker; encode the audio channel signal of the current frame according to the first encoding parameter; and write the encoding result of the audio channel signal of the current frame into a code stream.
  • the core coding unit is further configured to write the first coding parameter into a code stream.
  • the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • the set condition includes that the first spatial position of the first target virtual speaker overlaps with the second spatial position of the second target virtual speaker; the core encoding unit is specifically configured to use the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame.
  • the core encoding unit is further configured to write the multiplexing identifier into the code stream, where the value of the multiplexing identifier is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.
  • the first spatial position includes the first coordinates of the first target virtual speaker and the second spatial position includes the second coordinates of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first coordinates being the same as the second coordinates; or the first spatial position includes the first serial number of the first target virtual speaker and the second spatial position includes the second serial number of the second target virtual speaker, and the overlapping includes the first serial number being the same as the second serial number; or the first spatial position includes the first HOA coefficient of the first target virtual speaker and the second spatial position includes the second HOA coefficient of the second target virtual speaker, and the overlapping includes the first HOA coefficient being the same as the second HOA coefficient.
  • the first target virtual speaker includes M virtual speakers, and the second target virtual speaker includes N virtual speakers; the set condition includes that the first spatial position does not overlap with the second spatial position of the second target virtual speaker, and that the m-th virtual speaker included in the first target virtual speaker is located within a set range centered on the n-th virtual speaker included in the second target virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N;
  • the core encoding unit is specifically configured to adjust the second encoding parameter according to a set ratio to obtain the first encoding parameter.
  • whether the m-th virtual speaker is located within a set range centered on the n-th virtual speaker is determined by a degree of correlation between the m-th virtual speaker and the n-th virtual speaker, where the correlation satisfies the following condition: R = norm(M_H · M̄_H^T)
  • R represents the degree of correlation; norm() represents the normalization operation; M_H is the matrix formed by the coordinates of the virtual speakers included in the first target virtual speaker of the current frame; and M̄_H^T is the transpose of the matrix formed by the coordinates of the virtual speakers included in the second target virtual speaker of the previous frame. When the correlation is greater than a set value, the m-th virtual speaker is located within the set range centered on the n-th virtual speaker.
  • the core encoding unit is further configured to write the multiplexing identifier into the code stream, where the value of the multiplexing identifier is a second value, and the second value indicates that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter according to a set ratio.
  • the core coding unit is further configured to write the set ratio into the code stream.
  • the embodiment of the present application provides an audio decoding device.
  • the audio decoding device includes several functional units for implementing any one of the methods of the third aspect.
  • the audio decoding device may include: a core decoding unit, configured to parse the multiplexing identifier from the code stream, where the multiplexing identifier indicates that the first encoding parameter of the audio channel signal of the current frame is determined from the second encoding parameter of the audio channel signal of the previous frame of the current frame; determine the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame; and decode the audio channel signal of the current frame from the code stream according to the first encoding parameter; and a spatial decoding unit, configured to perform spatial decoding on the audio channel signal to obtain a higher-order ambisonics (HOA) signal.
  • the core decoding unit is specifically configured to: when the value of the multiplexing flag is a first value, which indicates that the first encoding parameter multiplexes the second encoding parameter, obtain the second encoding parameter as the first encoding parameter.
  • the core decoding unit is specifically configured to: when the value of the multiplexing flag is a second value, which indicates that the first encoding parameter is obtained by adjusting the second encoding parameter according to a set ratio, adjust the second encoding parameter according to the set ratio to obtain the first encoding parameter.
  • the core decoding unit is specifically configured to, when the value of the multiplexing flag is a second value, decode the code stream to obtain the set ratio.
  • the encoding parameters of the audio channel signal include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • the embodiment of the present application provides an audio encoder, where the audio encoder is used to encode an HOA signal.
  • the audio encoder can implement the method described in the first aspect.
  • the audio encoder may include the device described in any design of the third aspect.
  • the embodiment of the present application provides an audio decoder, where the audio decoder is used to decode an HOA signal from a code stream.
  • the audio decoder can implement any one of the methods described in the second aspect.
  • the audio decoder includes the device described in any design of the fourth aspect.
  • the embodiment of the present application provides an audio coding device, including: a non-volatile memory and a processor coupled to each other, where the processor calls the program code stored in the memory to execute the method described in the first aspect or any design of the first aspect.
  • the embodiment of the present application provides an audio decoding device, including: a non-volatile memory and a processor coupled to each other, where the processor calls the program code stored in the memory to execute the method described in the second aspect or any design of the second aspect.
  • the embodiment of the present application provides a computer-readable storage medium storing program code, where the program code includes instructions for performing some or all of the steps of any one of the methods of the first aspect to the second aspect.
  • an embodiment of the present application provides a computer program product, which, when running on a computer, causes the computer to execute part or all of the steps of any one of the methods from the first aspect to the second aspect.
  • the embodiment of the present application provides a computer-readable storage medium, including the code stream obtained by any one of the methods in the first aspect.
  • FIG. 1A is a schematic block diagram of an audio encoding and decoding system 100 in an embodiment of the present application
  • FIG. 1B is a schematic block diagram of an audio encoding and decoding process in an embodiment of the present application
  • FIG. 1C is a schematic block diagram of another audio encoding and decoding system in the embodiment of the present application.
  • FIG. 1D is a schematic block diagram of another audio encoding and decoding system in the embodiment of the present application.
  • FIG. 2A is a schematic structural diagram of an audio encoding component in an embodiment of the present application.
  • FIG. 2B is a schematic structural diagram of an audio decoding component in an embodiment of the present application.
  • FIG. 3A is a schematic flowchart of an audio encoding method in an embodiment of the present application.
  • FIG. 3B is a schematic flow chart of another audio encoding method in the embodiment of the present application.
  • FIG. 4A is a schematic flow chart of an audio encoding and decoding method in an embodiment of the present application.
  • FIG. 4B is a schematic flow chart of another audio encoding and decoding method in the embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an audio encoding process in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an audio encoding device in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an audio decoding device in an embodiment of the present application.
  • the corresponding device may include one or more units, such as functional units, to perform the described one or more method steps (for example, one unit performing one or more steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the drawings.
  • a corresponding method may comprise a step to perform the functionality of one or more units (for example, one step performing the functionality of one or more units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the drawings.
  • the "plurality" mentioned herein means two or more.
  • "and/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, A and B exist simultaneously, or B exists alone.
  • the character “/” generally indicates that the contextual objects are an "or” relationship.
  • FIG. 1A exemplarily shows a schematic block diagram of an audio encoding and decoding system 100 applied in the embodiment of the present application.
  • the audio encoding and decoding system 100 may include an audio encoding component 110 and an audio decoding component 120 .
  • the audio coding component 110 is used for audio coding the HOA signal (or 3D audio signal).
  • the audio encoding component 110 may be implemented by software, or by hardware, or by a combination of software and hardware, which is not specifically limited in this embodiment of the present application.
  • the audio encoding component 110 encodes the HOA signal (or 3D audio signal) and may include the following steps:
  • the pre-processing may include filtering out the low-frequency part of the HOA signal, for example using 20 Hz or 50 Hz as the cut-off frequency, and extracting orientation information from the HOA signal.
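As a minimal sketch of this low-frequency pre-filtering step, the first-order high-pass below shows where the cut-off frequency (20 Hz or 50 Hz) enters; real codecs use much steeper filters, and this implementation is purely illustrative.

```python
import math

def one_pole_highpass(samples, sample_rate, cutoff_hz=20.0):
    """First-order high-pass filter removing content below cutoff_hz.

    `samples` is an iterable of floats; returns a list of filtered
    samples.  The recurrence is the standard RC high-pass discretisation.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)  # pass changes, block DC
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Feeding in a constant (0 Hz) signal, the output decays toward zero, confirming that the sub-cut-off content is removed.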
  • the HOA signal can be collected by the audio collection component and sent to the audio coding component 110 .
  • the audio collection component and the audio coding component 110 may be set in the same device, or they may be set in different devices.
  • the audio encoding component 110 delivers the code stream to the audio decoding component 120 at the decoding end through a transmission channel.
  • the audio decoding component 120 is configured to decode the code stream generated by the audio encoding component 110 to obtain the HOA signal.
  • the audio encoding component 110 and the audio decoding component 120 may be connected in a wired or wireless manner.
  • the audio decoding component 120 obtains the code stream generated by the audio coding component 110 through the connection; or, the audio coding component 110 stores the generated code stream in the memory, and the audio decoding component 120 reads the code stream in the memory.
  • the audio decoding component 120 may be implemented by software; or, it may also be implemented by hardware; or, it may also be implemented by a combination of software and hardware, which is not limited in this embodiment of the present application.
  • the audio decoding component 120 decodes the code stream, and obtaining the HOA signal may include the following steps:
  • the rendered signal is mapped to the listener's headphones or speakers.
  • the earphone of the listener may be an independent earphone or an earphone on a terminal device such as a glasses device.
  • the audio coding component 110 and the audio decoding component 120 may be set in the same device; or, they may also be set in different devices.
  • the device can be a mobile terminal with audio signal processing functions, such as a mobile phone, a tablet computer, a laptop computer, a desktop computer, a Bluetooth speaker, a recording pen, or a wearable device; it may also be a network element with audio signal processing capability in a core network or wireless network, such as a media gateway, a transcoding device, or a media resource server; it may also be an audio codec applied to a virtual reality (VR) streaming service, which is not limited in this embodiment of the present application.
  • the audio encoding component 110 is set in the mobile terminal 130
  • the audio decoding component 120 is set in the mobile terminal 140.
  • the mobile terminal 130 and the mobile terminal 140 are mutually independent electronic devices with audio signal processing capabilities, and they are connected through a wireless or wired network.
  • the mobile terminal 130 includes an audio collection component 131, an audio coding component 110, and a channel coding component 132, wherein the audio collection component 131 is connected to the audio coding component 110, and the audio coding component 110 is connected to the channel coding component 132.
  • the mobile terminal 140 includes an audio playback component 141, an audio decoding component 120, and a channel decoding component 142, wherein the audio playback component 141 is connected to the audio decoding component 120, and the audio decoding component 120 is connected to the channel decoding component 142.
  • after the mobile terminal 130 collects the HOA signal through the audio collection component 131, it encodes the HOA signal through the audio coding component 110 to obtain an encoded code stream; then it encodes the code stream through the channel coding component 132 to obtain a transmission signal.
  • the mobile terminal 130 sends the transmission signal to the mobile terminal 140 through a wireless or wired network, for example, the transmission signal may be sent to the mobile terminal 140 through a wireless or wired network communication device.
  • the communication devices of the wired or wireless network to which the mobile terminal 130 and the mobile terminal 140 belong may be the same or different.
  • after the mobile terminal 140 receives the transmission signal, the transmission signal is decoded by the channel decoding component 142 to obtain the encoded code stream (which may be referred to as the code stream for short); the code stream is decoded by the audio decoding component 120 to obtain the HOA signal; and the audio playback component 141 plays the HOA signal.
  • the embodiment of the present application is described by taking as an example the case where the audio encoding component 110 and the audio decoding component 120 are set in the same network element 150 with audio signal processing capability in a core network or wireless network.
  • the network element 150 includes a channel decoding component 151 , an audio decoding component 120 , an audio encoding component 110 and a channel encoding component 152 .
  • the channel decoding component 151 is connected to the audio decoding component 120
  • the audio decoding component 120 is connected to the audio coding component 110
  • the audio coding component 110 is connected to the channel coding component 152 .
  • after the channel decoding component 151 receives a transmission signal sent by another device, it decodes the transmission signal to obtain a first encoded code stream; the audio decoding component 120 decodes the first encoded code stream to obtain the HOA signal; the audio coding component 110 encodes the HOA signal to obtain a second encoded code stream; and the channel coding component 152 encodes the second encoded code stream to obtain a transmission signal.
  • the other device may be a mobile terminal capable of processing audio signals; or may also be another network element capable of processing audio signals, which is not limited in this embodiment.
  • the audio encoding component 110 and the audio decoding component 120 in the network element may transcode the encoded code stream sent by the mobile terminal.
  • the device installed with the audio encoding component 110 is referred to as an audio encoding device.
  • the audio encoding device may also have an audio decoding function, which is not limited in this embodiment of the present application.
  • the device installed with the audio decoding component 120 may be referred to as an audio decoding device.
  • the audio encoding component 110 may include a spatial encoder 210 and a core encoder 220 .
  • the HOA signal to be encoded is encoded by the spatial encoder 210 to obtain an audio channel signal; that is, the HOA signal to be encoded generates a virtual speaker signal and a residual signal through the spatial encoder 210; the core encoder 220 then encodes the audio channel signal to obtain a code stream.
  • the audio decoding component 120 may include a core decoder 230 and a spatial decoder 240 .
  • the code stream is decoded by the core decoder 230 to obtain the audio channel signal; the spatial decoder 240 can then obtain the reconstructed HOA signal according to the decoded audio channel signal (the virtual loudspeaker signal and the residual signal).
  • the spatial encoder 210 and the core encoder 220 may be two independent processing units.
  • Spatial decoder 240 and core decoder 230 may be two independent processing units.
  • the core encoder 220 usually encodes the audio channel signal as a plurality of mono-channel signals, stereo channel signals or multi-channel signals.
  • the core encoder 220 encodes the audio channel signal of each frame.
  • One possible way is to calculate the encoding parameters of the audio channel signal of each frame, then encode the audio channel signal of the current frame according to the calculated encoding parameters and write it into the code stream, and write the encoding parameters into the code stream.
  • however, this method only considers the correlation between the audio channel signals within a frame and ignores the inter-frame spatial correlation of the audio channel signals, resulting in low coding efficiency.
  • because the audio channel signal is obtained by mapping the original HOA signal onto the target virtual speaker, there is a certain relationship between the inter-frame correlation of the audio channel signal and the selection of the virtual speaker for the HOA signal.
  • when the virtual speakers of adjacent frames are close, the audio channel signal has a strong correlation between frames.
  • in view of this, the embodiment of the present application provides an encoding and decoding method: based on the proximity relationship between the virtual speaker corresponding to the current frame and the virtual speaker corresponding to the previous frame, if the two are close or their positions overlap, the encoding parameters of the current frame can be determined according to the encoding parameters of the previous frame, so that the encoding parameters of the current frame no longer need to be calculated through the calculation algorithm of each encoding parameter, which can improve encoding efficiency.
  • the HOA signal is a three-dimensional (3D) representation of the sound field.
  • HOA signals are usually represented by multiple spherical harmonic coefficients (SHC) or other hierarchical elements.
  • the corresponding HOA signal only has a difference in amplitude between channels, so it can be represented by a single-channel signal together with a set of proportional coefficients corresponding to each channel.
  • the HOA signal is usually converted into an actual speaker signal for playback, or the HOA signal is converted into a virtual loudspeaker (virtual loudspeaker, VL) signal and then mapped to the speaker signal corresponding to both ears for playback.
  • the current frame refers to a segment of sample points of a certain length obtained by collecting the audio signal, such as 960 or 1024 sample points.
  • the previous frame refers to the frame preceding the current frame. For example, if the current frame is the nth frame, the previous frame is the (n-1)th frame. The previous frame may also be referred to as the preceding frame.
  • Audio channel signals may include multi-channel virtual speaker signals, or multi-channel virtual speaker signals and residual signals.
  • the HOA signal to be encoded is mapped to multiple virtual speakers to obtain multi-channel virtual speaker signals and residual signals.
  • the number of channels of the virtual speaker signal and the number of channels of the residual signal may be preset.
  • the audio channel signal may also be called a transmission channel, and other names may also be used, which is not specifically limited in this application.
  • the virtual speaker signal may be acquired by selecting, from the virtual speaker set according to a matching projection algorithm, a target virtual speaker that matches the HOA signal of the current frame to be encoded, and then obtaining the virtual speaker signal according to the HOA signal of the current frame and the selected target virtual speaker.
  • the residual signal can be obtained according to the HOA signal to be encoded and the virtual loudspeaker signal.
  • the coding parameters may include one or more of inter-channel pairing parameters, inter-channel auditory space parameters, or inter-channel bit allocation parameters.
  • the inter-channel pairing parameter is used to characterize the pairing relationship (or called grouping relationship) between the channels to which the multiple audio signals included in the audio channel signal respectively belong.
  • Inter-channel pairing is a calculation method for pairing each transmission channel of an audio signal through correlation and other criteria to realize efficient coding of the transmission channel.
  • the audio channel signal may include a virtual speaker signal and a residual signal.
  • the way to determine the inter-channel pairing parameters is exemplarily described as follows:
  • the audio channel signals can be divided into two groups, one group of virtual speaker signals is called a virtual speaker signal group, and one group of residual signals is called a residual signal group.
  • the virtual loudspeaker signal group includes M virtual loudspeaker signals composed of mono channels, where M is a positive integer greater than 2, and the residual signal group includes N residual signals composed of mono channels, where N is a positive integer greater than 2.
  • the pairing result between channels may be pairwise pairing, pairing of three or more channels, or no pairing between channels. Taking pairwise pairing between channels as an example, the inter-channel pairing parameter refers to the selection result of which different signals in each group form a pair.
  • the virtual speaker signal group includes 4 channels, which are channel 1, channel 2, channel 3 and channel 4 respectively.
  • the inter-channel pairing parameter could be: channel 1 paired with channel 2 and channel 3 paired with channel 4; or channel 1 paired with channel 3 and channel 2 paired with channel 4; or channel 1 paired with channel 2 and channel 3 not paired with channel 4; and so on.
  • the method for determining the pairing parameters between channels is not specifically limited in this application.
  • the method of constructing the inter-channel correlation matrix W can be used to determine the inter-channel pairing parameters, for example, see formula (1):
  • W = | m11 m12 m13 m14 ; m21 m22 m23 m24 ; m31 m32 m33 m34 ; m41 m42 m43 m44 |  (1)
  • m11-m44 each represent the correlation between two channels (mij is the correlation between channel i and channel j); the values of the diagonal elements of the matrix are further set to 0 to obtain W', see formula (2):
  • W' = | 0 m12 m13 m14 ; m21 0 m23 m24 ; m31 m32 0 m34 ; m41 m42 m43 0 |  (2)
  • the principle of pairing between channels may be to select the row and column indices at which an element of W' reaches its maximum value, and the inter-channel pairing parameter may be the indices of that matrix element.
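As an illustration of the pairing principle described above, the sketch below builds an inter-channel correlation matrix W, zeroes its diagonal to obtain W', and repeatedly pairs the two channels for which an element of W' is largest. The use of Pearson correlation via `np.corrcoef` and the greedy pairing loop are assumptions for this sketch, not the patent's normative algorithm.

```python
import numpy as np

def pair_channels(channels):
    """channels: (num_channels, num_samples) array; returns a list of index pairs."""
    num = channels.shape[0]
    w = np.corrcoef(channels)       # formula (1): inter-channel correlation matrix W
    np.fill_diagonal(w, 0.0)        # formula (2): zero the diagonal to obtain W'
    w = np.abs(w)
    pairs, used = [], set()
    while len(used) < num - (num % 2):
        i, j = np.unravel_index(np.argmax(w), w.shape)  # indices of the max element
        if w[i, j] == 0.0:          # remaining channels show no correlation
            break
        pairs.append((int(i), int(j)))
        used.update((int(i), int(j)))
        w[[i, j], :] = 0.0          # drop the paired channels from W'
        w[:, [i, j]] = 0.0
    return pairs
```

For four channels where channels 1 and 2 carry one waveform and channels 3 and 4 another, the sketch pairs (1, 2) and (3, 4), matching the example above.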
  • the inter-channel auditory space parameters are used to characterize the human ear's perception of the acoustic image characteristics of the auditory space.
  • the inter-channel auditory space parameters may include an inter-channel level difference (ILD), an inter-channel time difference (ITD), or an inter-channel phase difference (IPD).
  • the ILD parameter may be a ratio of signal energy of each channel in the audio channel signal to an average value of energy of all channels.
  • the ILD parameter may consist of two parameters, the absolute value of the ratio of each channel and the adjustment direction value. The embodiment of the present application does not specifically limit the manner of determining the ILD, ITD, or IPD.
  • the audio channel signal includes two channel signals, which are channel 1 and channel 2 respectively, and the ITD parameter may be the time difference between the two channels in the audio channel signal.
  • the audio channel signal includes two channel signals, which are channel 1 and channel 2 respectively, and the IPD parameter may be the phase difference between the two channels in the audio channel signal.
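The auditory space parameters above can be illustrated with a small sketch. The concrete definitions chosen here (per-channel energy over the mean energy of all channels for ILD, the lag of the cross-correlation peak for ITD, the mean per-bin phase difference for IPD) are plausible readings of the description, not the patent's exact formulas.

```python
import numpy as np

def ild(channels):
    """Ratio of each channel's energy to the average energy of all channels."""
    energy = np.sum(channels ** 2, axis=1)
    return energy / np.mean(energy)

def itd(ch1, ch2, sample_rate):
    """Time difference between two channels, from the cross-correlation peak."""
    corr = np.correlate(ch1, ch2, mode="full")
    lag = np.argmax(corr) - (len(ch2) - 1)   # lag in samples (sign convention assumed)
    return lag / sample_rate

def ipd(ch1, ch2):
    """Mean phase difference between two channels over frequency bins."""
    p1, p2 = np.angle(np.fft.rfft(ch1)), np.angle(np.fft.rfft(ch2))
    return np.mean(np.angle(np.exp(1j * (p1 - p2))))
```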
  • the inter-channel bit allocation parameter is used to characterize the bit allocation relationship during encoding of the channels to which the multiple audio signals included in the audio channel signal respectively belong.
  • bit allocation between channels may be implemented by using an energy-based bit allocation manner between channels.
  • the channels to be allocated bits include four channels, which are channel 1, channel 2, channel 3 and channel 4 respectively.
  • the channels to which bits are to be allocated may be the channels to which the multiple audio signals included in the audio channel signal belong; or may be multiple channels obtained by downmixing the audio channel signal after channel pairing; or may be multiple channels obtained after inter-channel ILD calculation and downmixing after channel pairing.
  • bit allocation ratios of channel 1, channel 2, channel 3, and channel 4 can be obtained through inter-channel bit allocation, and the bit allocation ratio can be used as an inter-channel bit allocation parameter; for example, channel 1 occupies 3/16, channel 2 occupies 5/16, channel 3 occupies 6/16, and channel 4 occupies 2/16.
  • the manner adopted for allocating bits between channels is not specifically limited in this embodiment of the present application.
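The energy-based bit allocation mentioned above can be sketched as follows. Splitting the bit budget in proportion to per-channel energy and repairing rounding losses by largest fractional remainder are assumptions of this sketch.

```python
import numpy as np

def allocate_bits(channels, total_bits):
    """Allocate total_bits across channels in proportion to channel energy."""
    energy = np.sum(np.asarray(channels, dtype=float) ** 2, axis=1)
    ratio = energy / np.sum(energy)                 # bit allocation ratio per channel
    bits = np.floor(ratio * total_bits).astype(int)
    # hand out bits lost to rounding, largest fractional remainder first
    remainder = ratio * total_bits - bits
    for i in np.argsort(-remainder)[: total_bits - bits.sum()]:
        bits[i] += 1
    return bits
```

With channel energies in the ratio 3:5:6:2 and a 16-bit budget, the allocation is [3, 5, 6, 2], matching the 3/16, 5/16, 6/16, 2/16 example above.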
  • FIG. 3A and FIG. 3B are schematic flowcharts of an encoding method provided by an exemplary embodiment of the present application.
  • the encoding method may be implemented by an audio encoding device, or by an audio encoding component, or by a core encoder.
  • the implementation by the audio coding component is taken as an example.
  • the first target virtual speaker may include one or more virtual speakers, and may also include one or more virtual speaker groups. Each speaker group can contain one or more virtual speakers. The number of virtual speakers included in different virtual speaker groups can be the same or different.
  • Each virtual speaker in the first target virtual speaker performs spatial mapping on the original HOA signal to obtain an audio channel signal.
  • the audio channel signal may include one or more channels of audio signals.
  • a virtual loudspeaker spatially maps the original HOA signal to obtain an audio channel signal for one channel.
  • the first target virtual speaker includes M virtual speakers, where M is a positive integer.
  • the audio channel signals of the current frame may include virtual speaker signals of M channels.
  • the virtual speaker signals of the M channels are in one-to-one correspondence with the M virtual speakers.
  • the first encoding parameter of the audio channel signal of the current frame is determined according to the second encoding parameter of the audio channel signal of the previous frame.
  • the first coding parameter may include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • determining that the first target virtual speaker and the second target virtual speaker corresponding to the audio channel signal of the previous frame of the current frame meet the set condition can be understood as determining that the proximity relationship between the first target virtual speaker and the second target virtual speaker satisfies the set condition. The proximity between the first target virtual speaker and the second target virtual speaker can be understood as the spatial position relationship between the two, or the proximity relationship can be characterized by the spatial correlation between the first target virtual speaker and the second target virtual speaker.
  • the spatial position of the first target virtual speaker is referred to as the first spatial position
  • the spatial position of the second target virtual speaker is referred to as the second spatial position.
  • the first target virtual speaker may include M virtual speakers
  • the first spatial position may include a spatial position of each virtual speaker in the M virtual speakers
  • the second target virtual speaker may include N virtual speakers
  • the second spatial position may include the spatial position of each virtual speaker in the N virtual speakers. Both M and N are positive integers greater than 1.
  • M and N may be the same or different.
  • the spatial position of the target virtual speaker may be characterized by coordinates or sequence numbers or HOA coefficients.
  • For example, M = N.
  • that the first target virtual speaker and the second target virtual speaker corresponding to the audio channel signal of the previous frame of the current frame meet the set condition may include that the first spatial position overlaps with the second spatial position, which can also be understood as the proximity relationship satisfying the set condition.
  • the second encoding parameter may be multiplexed as the first encoding parameter, that is, the encoding parameter of the audio channel signal of the previous frame is used as the encoding parameter of the audio channel signal of the current frame.
  • both the first target virtual speaker and the second target virtual speaker include a plurality of virtual speakers
  • the number of virtual speakers included in the first target virtual speaker and the second target virtual speaker is the same, and the first spatial position overlaps with the second spatial position, It can be described as that the spatial positions of the multiple virtual speakers included in the first target virtual speaker overlap with the spatial positions of the multiple virtual speakers included in the second target virtual speaker in a one-to-one correspondence.
  • the coordinates of the first target virtual speaker are called the first coordinates
  • the coordinates of the second target virtual speaker are called the second coordinates
  • the first spatial position includes the first coordinate of the first target virtual speaker, and the second spatial position includes the second coordinate of the second target virtual speaker; the first spatial position overlapping with the second spatial position then means that the first coordinate and the second coordinate are the same.
  • the coordinates of the multiple virtual speakers included in the first target virtual speaker are the same as the coordinates of the multiple virtual speakers included in the second target virtual speaker in one-to-one correspondence.
  • the serial number of the first target virtual speaker is called the first serial number, and the serial number of the second target virtual speaker is called the second serial number; that is, the first spatial position includes the first serial number of the first target virtual speaker, and the second spatial position includes the second serial number of the second target virtual speaker. The first spatial position overlapping with the second spatial position then means that the first serial number and the second serial number are the same.
  • the serial numbers of the multiple virtual speakers included in the first target virtual speaker are the same as the serial numbers of the multiple virtual speakers included in the second target virtual speaker in one-to-one correspondence.
  • the HOA coefficient of the first target virtual speaker is called the first HOA coefficient
  • the HOA coefficient of the second target virtual speaker is called the second HOA coefficient
  • the first spatial position includes the first HOA coefficient of the first target virtual speaker
  • the second spatial position includes the second HOA coefficient of the second target virtual speaker
  • the first spatial position overlapping with the second spatial position means that the first HOA coefficient is the same as the second HOA coefficient.
  • the HOA coefficients of the multiple virtual speakers included in the first target virtual speaker are the same as the HOA coefficients of the multiple virtual speakers included in the second target virtual speaker in one-to-one correspondence.
  • that the first target virtual speaker and the second target virtual speaker corresponding to the audio channel signal of the previous frame of the current frame meet the set condition may also include that the first spatial position and the second spatial position do not overlap, but the multiple virtual speakers included in the first target virtual speaker are located, in one-to-one correspondence, within a set range centered on the multiple virtual speakers included in the second target virtual speaker. This can likewise be understood as the proximity relationship satisfying the set condition.
  • the first encoding parameter of the audio channel signal of the current frame may be obtained by adjusting the second encoding parameter of the audio channel signal of the previous frame according to a set ratio.
  • the audio channel signal of the current frame may partially multiplex the second encoding parameter of the audio channel signal of the previous frame.
  • for example, the encoding parameters of the virtual speaker signal in the audio channel signal of the current frame multiplex the encoding parameters of the virtual speaker signal in the audio channel signal of the previous frame, while the encoding parameters of the residual signal in the audio channel signal of the current frame are not multiplexed; or, the encoding parameters of the virtual speaker signal in the audio channel signal of the current frame multiplex the encoding parameters of the virtual speaker signal in the audio channel signal of the previous frame, and the encoding parameters of the residual signal in the audio channel signal of the current frame are obtained by adjusting, according to a set ratio, the encoding parameters of the residual signal in the audio channel signal of the previous frame.
  • the first target virtual speaker includes two virtual speakers, respectively virtual speaker 1-1 and virtual speaker 1-2.
  • the audio channel signal of the previous frame includes two virtual speaker signals, FH1 and FH2 respectively
  • the second target virtual speaker includes two virtual speakers, respectively virtual speaker 2-1 and virtual speaker 2-2.
  • if the virtual speaker 1-1 is located within the set range centered on the virtual speaker 2-1, and the virtual speaker 1-2 is located within the set range centered on the virtual speaker 2-2, then the proximity relationship between the first target virtual speaker and the second target virtual speaker satisfies the set condition.
  • the coordinates of the virtual speaker are represented by (horizontal angle azi, pitch angle ele).
  • the coordinates of the virtual speaker 1-1 are (H1_pos_aiz, H1_pos_ele), and the coordinates of the virtual speaker 1-2 are (H2_pos_aiz, H2_pos_ele).
  • the coordinates of the virtual speaker 2-1 are (FH1_pos_aiz, FH1_pos_ele), and the coordinates of the virtual speaker 2-2 are (FH2_pos_aiz, FH2_pos_ele).
  • for example, the proximity relationship between the first target virtual speaker and the second target virtual speaker satisfying the set condition means that the multiple virtual speakers included in the first target virtual speaker are located, in one-to-one correspondence, within a set range centered on the multiple virtual speakers included in the second target virtual speaker.
  • the serial number of the virtual speaker 1-1 is H1_Ind
  • the serial number of the virtual speaker 1-2 is H2_Ind
  • the serial number of the virtual speaker 2-1 is FH1_Ind
  • the serial number of the virtual speaker 2-2 is FH2_Ind.
  • the HOA coefficient of virtual speaker 1-1 is H1_Coef
  • the HOA coefficient of virtual speaker 1-2 is H2_Coef
  • the HOA coefficient of the virtual speaker 2-1 is FH1_Coef
  • the HOA coefficient of the virtual speaker 2-2 is FH2_Coef.
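The one-to-one proximity test in the example above can be sketched as follows. Representing the set range as a maximum angular offset per coordinate axis, and the default threshold value of 5 degrees, are assumptions of this sketch.

```python
def meets_set_condition(current_coords, previous_coords, set_range=5.0):
    """Each argument is a list of (azimuth, elevation) tuples in matching order.

    Returns True when every current-frame virtual speaker lies within the set
    range centered on the corresponding previous-frame virtual speaker.
    """
    if len(current_coords) != len(previous_coords):
        return False
    for (h_azi, h_ele), (fh_azi, fh_ele) in zip(current_coords, previous_coords):
        if abs(h_azi - fh_azi) > set_range or abs(h_ele - fh_ele) > set_range:
            return False
    return True
```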
  • the audio encoding component may also determine that the first target virtual speaker and the second target virtual speaker meet the set condition by determining the correlation between the first target virtual speaker and the second target virtual speaker.
  • the audio coding component may determine the degree of correlation between the first target virtual speaker and the second target virtual speaker according to the first coordinates of the first target virtual speaker and the second coordinates of the second target virtual speaker.
  • the first encoding parameters may multiplex the second encoding parameters.
  • the correlation degree may be determined by the following formula (3): R = norm( Σ_{m=1..N} Σ_{n=1..N} S(H_m, FH_n) )  (3)
  • R represents the degree of correlation
  • norm () represents the normalization operation
  • S () represents the operation of determining the distance
  • H m represents the coordinates of the mth virtual speaker in the first target virtual speaker
  • FH n represents the coordinates of the nth virtual speaker in the second target virtual speaker.
  • S(H m , FH n ) represents determining the distance between the m th virtual speaker included in the first target virtual speaker and the n th virtual speaker included in the second target virtual speaker.
  • m traverses the positive integers not greater than N
  • n traverses the positive integers not greater than N.
  • N is the number of virtual speakers included in each of the first target virtual speaker and the second target virtual speaker.
  • the correlation may be determined by the following formula (4): R = norm( M_H · M_FH^T )  (4)
  • the first target virtual speaker in the current frame includes N virtual speakers, respectively: H1, H2, ... HN
  • the second target virtual speaker in the previous frame includes N virtual speakers, respectively, FH1, FH2, ... FHN.
  • M_H is a matrix formed by the coordinates of the virtual speakers included in the first target virtual speaker of the current frame, and M_FH^T is the transpose of the matrix formed by the coordinates of the virtual speakers included in the second target virtual speaker of the previous frame.
  • the correlation between the first target virtual speaker and the second target virtual speaker determined according to the first coordinates of the first target virtual speaker and the second coordinates of the second target virtual speaker satisfies The conditions shown in the following formula (5):
  • R represents the correlation degree
  • norm() represents the normalization operation
  • max() represents the maximum value operation of the elements in the brackets
  • when the correlation degree is not less than a set value, the first encoding parameter may partially multiplex the second encoding parameter, or the first encoding parameter may be obtained by adjusting the second encoding parameter according to a set ratio; the set value is, for example, a number greater than 0.5 and less than 1.
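A hedged sketch of the correlation degree between the two target virtual speaker sets, in the spirit of formula (4): a normalized product of the coordinate matrices M_H and M_FH^T. Converting (azimuth, elevation) pairs to unit direction vectors and normalizing the trace of the product by the number of speakers are assumptions of this sketch, not the patent's normative definition.

```python
import numpy as np

def direction(azi_deg, ele_deg):
    """Unit direction vector for an (azimuth, elevation) pair in degrees."""
    a, e = np.radians(azi_deg), np.radians(ele_deg)
    return np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])

def correlation_degree(current, previous):
    """current/previous: lists of (azi, ele) tuples; returns R in [-1, 1]."""
    m_h = np.array([direction(*c) for c in current])    # M_H (current frame)
    m_fh = np.array([direction(*c) for c in previous])  # M_FH (previous frame)
    # normalized product of the coordinate matrices: mean of the diagonal
    return np.trace(m_h @ m_fh.T) / len(current)
```

R close to 1 indicates the corresponding speakers nearly coincide; R could then be compared against the set value (greater than 0.5 and less than 1) mentioned above.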
  • when the first spatial position of the first target virtual speaker overlaps with the second spatial position of the second target virtual speaker, the second encoding parameter is multiplexed as the first encoding parameter, and the audio channel signal of the current frame is encoded and written into the code stream.
  • the second encoding parameter can be adjusted according to the set ratio to obtain the first encoding parameter.
  • the first encoding parameter may include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • the value of α can be different for different encoding parameters. For example, the value of α corresponding to the inter-channel pairing parameter is α1, and the value of α corresponding to the inter-channel bit allocation parameter is α2.
  • the audio encoding component also needs to notify the audio decoding component of the first encoding parameter of the audio channel signal of the current frame through the code stream.
  • the audio encoding component may write the first encoding parameter into the code stream, so as to notify the audio decoding component of the first encoding parameter of the audio channel signal of the current frame.
  • the audio encoding component further executes 304a to write the first encoding parameters into the code stream.
  • the decoding side may perform decoding through the following decoding method.
  • the method on the decoding side may be executed by an audio decoding device, or by an audio decoding component, or by a core decoder.
  • the method of performing the decoding side by the audio decoding component is taken as an example.
  • the audio coding component sends the code stream to the audio decoding component, so that the audio decoding component receives the code stream.
  • the audio decoding component decodes the code stream to obtain the first encoding parameter.
  • the audio decoding component decodes the code stream according to the first encoding parameter to obtain the audio channel signal of the current frame.
  • the audio encoding component may write the multiplexing identifier into the code stream, and indicate how to obtain the first encoding parameter of the audio channel signal of the current frame through different values of the multiplexing identifier.
  • the audio encoding component also executes 304b to encode the multiplexing identifier into the code stream.
  • the multiplexing identifier is used to indicate that the first encoding parameter of the audio channel signal of the current frame is determined by the second encoding parameter of the audio channel signal of the previous frame.
  • the multiplexing identifier being the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.
  • the first encoding parameter may not be written in the code stream, thereby reducing resource occupation and improving transmission efficiency.
  • the multiplexing flag is set to a third value to indicate that the first encoding parameter of the audio channel signal of the current frame does not multiplex the second encoding parameter, and the determined first encoding parameter can be written into the code stream.
  • the first encoding parameter may be determined according to the second encoding parameter, or may be obtained through calculation. For example, when the first spatial position does not overlap with the second spatial position, if the multiple virtual speakers included in the first target virtual speaker are located, in one-to-one correspondence, within a set range centered on the multiple virtual speakers included in the second target virtual speaker, the second encoding parameter can be adjusted according to the set ratio to obtain the first encoding parameter; the obtained first encoding parameter and the multiplexing identifier whose value is the third value can then be written into the code stream.
  • otherwise, the first encoding parameter of the audio channel signal of the current frame can be calculated, the first encoding parameter can be written into the code stream, and the multiplexing identifier whose value is the third value can be written into the code stream.
  • the first value is 0 and the third value is 1, or the first value is 1 and the third value is 0.
  • the first value and the third value may also be other values, which are not limited in this embodiment of the present application.
  • the multiplexing identifier is written into the code stream with its value being the first value, indicating that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter; or, the second encoding parameter is adjusted according to a set ratio to obtain the first encoding parameter, and the multiplexing identifier is written into the code stream with its value being the second value, indicating that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter according to the set ratio.
  • the audio encoding component may also write the set ratio into the code stream.
  • otherwise, the first encoding parameter of the audio channel signal of the current frame may be calculated, the first encoding parameter may be written into the code stream, and the multiplexing identifier whose value is the third value may be written into the code stream.
  • the first value is 11, the second value is 01, and the third value is 00.
  • the first value, the second value, and the third value may also be other values, which are not limited in this embodiment of the present application.
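The encoder-side choice among the three values of the multiplexing identifier, as described above, can be sketched as follows. The function name, the dictionary layout, and the use of the concrete bit values 11, 01, and 00 from the example are illustrative assumptions.

```python
def choose_multiplexing(overlaps, within_set_range, set_ratio, compute_params):
    """Return the multiplexing identifier plus whatever must be written to the stream."""
    if overlaps:
        # first value (11): spatial positions overlap; the decoder reuses the
        # previous frame's encoding parameters, so nothing else is written
        return {"flag": 0b11}
    if within_set_range:
        # second value (01): speakers lie within the set range one-to-one;
        # the decoder adjusts the previous parameters by the set ratio
        return {"flag": 0b01, "ratio": set_ratio}
    # third value (00): compute the first encoding parameters and write them
    return {"flag": 0b00, "params": compute_params()}
```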
  • the decoding side can decode through the following decoding method.
  • the method on the decoding side may be executed by an audio decoding device, or by an audio decoding component, or by a core decoder.
  • the method of performing the decoding side by the audio decoding component is taken as an example.
  • the audio coding component sends the code stream to the audio decoding component, so that the audio decoding component receives the code stream.
  • the audio decoding component decodes the code stream to obtain the multiplexing identifier.
  • the audio decoding component determines the first encoding parameter according to the second encoding parameter.
  • the multiplexing identifier may include two values.
  • the value of the multiplexing identifier is the first value to indicate that the first encoding parameter of the audio channel signal of the current frame is to be multiplexed with the second encoding parameter.
  • the value of the multiplexing flag is the third value, indicating that the first encoding parameter of the audio channel signal of the current frame does not multiplex the second encoding parameter.
  • the audio decoding component decodes from the code stream to obtain the multiplexing identifier.
  • when the value of the multiplexing identifier is the first value, the second encoding parameter is multiplexed as the first encoding parameter, and the audio channel signal of the current frame is obtained by decoding the code stream according to the first encoding parameter.
  • when the value of the multiplexing flag is the third value, the first encoding parameter of the audio channel signal of the current frame is obtained by decoding the code stream, and then the audio channel signal of the current frame is obtained by decoding the code stream according to the first encoding parameter obtained by decoding.
  • Alternatively, the multiplexing identifier may take more than two values. The value of the multiplexing identifier being the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.
  • The value of the multiplexing identifier being the second value indicates that the first encoding parameter is obtained by adjusting the second encoding parameter according to a set ratio.
  • The value of the multiplexing identifier being the third value indicates that the first encoding parameter is obtained by decoding from the code stream.
  • The audio decoding component decodes the code stream to obtain the multiplexing identifier.
  • When the value of the multiplexing identifier is the first value, the second encoding parameter is multiplexed as the first encoding parameter, and the audio channel signal of the current frame is decoded from the code stream according to the first encoding parameter.
  • When the value of the multiplexing identifier is the second value, the second encoding parameter is adjusted according to the set ratio to obtain the first encoding parameter, and then the audio channel signal of the current frame is decoded from the code stream according to the obtained first encoding parameter.
  • the set ratio may be pre-configured in the audio decoding component, and the audio decoding component may obtain the configured set ratio, so as to adjust the second encoding parameter according to the set ratio to obtain the first encoding parameter.
  • the set ratio can be written into the code stream by the audio encoding component, and the audio decoding component can decode the code stream to obtain the set ratio.
  • When the value of the multiplexing identifier is the third value, the first encoding parameter of the audio channel signal of the current frame is decoded from the code stream, and then the audio channel signal of the current frame is decoded from the code stream according to the first encoding parameter obtained by decoding.
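The three decoding branches above can be sketched as follows; the flag values follow the example given earlier in the text (first value 11, second value 01, third value 00), while the function and argument names are illustrative assumptions rather than identifiers from the patent.

```python
# Sketch of the three-value multiplexing-identifier branch on the decoding side.
# All names (resolve_first_encoding_parameter, decode_from_stream, ...) are
# illustrative assumptions.

FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 0b11, 0b01, 0b00  # example values from the text

def resolve_first_encoding_parameter(flag, prev_params, decode_from_stream, set_ratio=0.5):
    """Return the first encoding parameter of the current frame.

    flag               -- multiplexing identifier parsed from the code stream
    prev_params        -- second encoding parameter (previous frame), e.g. a list of floats
    decode_from_stream -- callable that decodes the parameter from the code stream
    set_ratio          -- ratio used when flag == SECOND_VALUE (pre-configured or decoded)
    """
    if flag == FIRST_VALUE:
        # Reuse the previous frame's parameter unchanged.
        return list(prev_params)
    if flag == SECOND_VALUE:
        # Adjust the previous frame's parameter by the set ratio.
        return [p * set_ratio for p in prev_params]
    # THIRD_VALUE: the parameter is carried explicitly in the code stream.
    return decode_from_stream()
```

In all three branches the audio channel signal of the current frame is then decoded from the code stream using the returned parameter.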
  • the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • When the first encoding parameter includes multiple parameters, a single multiplexing identifier may be shared by the different parameters, or different multiplexing identifiers may be used for the different parameters.
  • The case where the multiple parameters share the same multiplexing identifier is taken as an example.
  • When the multiplexing identifier is the first value, it indicates that all parameters included in the first encoding parameter multiplex the second encoding parameter of the audio channel signal of the previous frame.
  • For example, the first encoding parameter includes an inter-channel pairing parameter.
  • When the multiplexing identifier is the third value, it indicates that the inter-channel pairing parameter of the audio channel signal of the current frame does not reuse the inter-channel pairing parameter of the audio channel signal of the previous frame and is obtained by decoding from the code stream, or indicates that the inter-channel pairing parameter of the audio channel signal of the current frame is partially multiplexed with the inter-channel pairing parameter of the audio channel signal of the previous frame.
  • As another example, the first encoding parameter includes an inter-channel auditory space parameter.
  • The inter-channel auditory space parameter includes one or more of ILD, IPD, or ITD.
  • One multiplexing flag can indicate whether the multiple parameters included in the inter-channel auditory space parameter of the audio channel signal of the current frame are multiplexed with the inter-channel auditory space parameter of the audio channel signal of the previous frame.
  • When the multiplexing identifier is the second value, it indicates that the inter-channel auditory space parameter of the audio channel signal of the current frame is obtained by adjusting the inter-channel auditory space parameter of the audio channel signal of the previous frame according to the set ratio, or indicates that the inter-channel auditory space parameter of the audio channel signal of the current frame is partially multiplexed with the inter-channel auditory space parameter of the audio channel signal of the previous frame.
  • Alternatively, when the inter-channel auditory space parameter includes multiple parameters, different parameters may use different multiplexing identifiers. Take the inter-channel auditory space parameter including ILD, IPD, and ITD as an example. Whether the ILD of the audio channel signal of the current frame is multiplexed with the ILD of the audio channel signal of the previous frame is indicated by the multiplexing flag Flag_2-1; whether the ITD of the audio channel signal of the current frame is multiplexed with the ITD of the audio channel signal of the previous frame is indicated by the multiplexing flag Flag_2-2; whether the IPD of the audio channel signal of the current frame is multiplexed with the IPD of the audio channel signal of the previous frame is indicated by the multiplexing flag Flag_2-3.
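The per-parameter flags Flag_2-1, Flag_2-2, and Flag_2-3 described above can be sketched as follows; the dictionary-based representation and the exact-equality reuse test are simplifying assumptions for illustration.

```python
# Sketch: independent multiplexing flags for the inter-channel auditory
# space parameters (ILD, ITD, IPD). Field names are illustrative.

PARAMS = ("ILD", "ITD", "IPD")

def pack_auditory_space_flags(cur, prev):
    """Encoder side: one multiplexing flag per parameter, 1 = reuse the
    previous frame's value (here tested by exact equality)."""
    return {name: int(cur[name] == prev[name]) for name in PARAMS}

def apply_auditory_space_flags(flags, prev, decode):
    """Decoder side: reuse the previous frame's value when the flag is set,
    otherwise decode the value from the code stream."""
    return {name: prev[name] if flags[name] else decode(name) for name in PARAMS}
```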
  • the first encoding parameter includes an inter-channel bit allocation parameter.
  • the process of generating the HOA coefficients of the virtual loudspeaker involved in the embodiment of the present application is exemplarily described as follows.
  • the HOA coefficients of the virtual loudspeaker may also be generated in other manners, which are not specifically limited in this embodiment of the present application.
  • The sound pressure of an ideal plane wave expanded on spherical harmonic functions satisfies formula (8):
  • $$p(r,\theta,\varphi,k) = s \sum_{m=0}^{\infty} j^{m} j_{m}(kr) \sum_{\sigma=\pm 1,\,0\le n\le m} Y_{m,n}^{\sigma}(\theta_s,\varphi_s)\, Y_{m,n}^{\sigma}(\theta,\varphi) \qquad (8)$$
  • In formula (8), the angular frequency $w = 2\pi f$, where $f$ is the sound wave frequency and $c$ is the sound speed; $r$ represents the radius of the sphere, $\theta$ represents the horizontal angle, $\varphi$ represents the elevation angle, $k$ indicates the wave number with $k = w/c$, $s$ is the amplitude of the ideal plane wave, and $m$ is the serial number of the HOA order.
  • The first $j$ in $j^{m} j_{m}(kr)$ represents the imaginary unit; the spherical Bessel term $j_{m}(kr)$ does not vary with angle. $Y_{m,n}^{\sigma}(\theta,\varphi)$ is the spherical harmonic function of the direction $(\theta,\varphi)$, and $Y_{m,n}^{\sigma}(\theta_s,\varphi_s)$ is the spherical harmonic function of the direction of the sound source.
  • Formula (9) shows that the sound field can be expanded on a spherical surface according to spherical harmonic functions and expressed using the coefficients $B_{m,n}^{\sigma}$:
  • $$p(r,\theta,\varphi,k) = \sum_{m=0}^{\infty} j_{m}(kr) \sum_{\sigma=\pm 1,\,0\le n\le m} B_{m,n}^{\sigma}\, Y_{m,n}^{\sigma}(\theta,\varphi), \quad B_{m,n}^{\sigma} = s\, j^{m}\, Y_{m,n}^{\sigma}(\theta_s,\varphi_s) \qquad (9)$$
  • Given the coefficients $B_{m,n}^{\sigma}$, the spherical harmonic functions can rebuild the sound field.
  • Truncating the above formula to the N-th term gives the coefficients $B_{m,n}^{\sigma}$ called the N-order HOA coefficients; HOA coefficients may also be called Ambisonics coefficients.
  • The P-order Ambisonics coefficients have $(P+1)^2$ channels. Ambisonics signals above the first order are also called HOA signals. In one possible configuration, the HOA order can be 2 to 10.
  • Superimposing the spherical harmonic functions according to the coefficients corresponding to a sampling point of the HOA signal realizes the reconstruction of the spatial sound field at the time corresponding to that sampling point.
  • The HOA coefficients of the virtual speakers can be generated according to the above description: set $(\theta_s, \varphi_s)$ in formula (8) to the coordinates of the virtual speaker, namely its horizontal angle $\theta_s$ and elevation angle $\varphi_s$; the HOA coefficient of the loudspeaker, also called its Ambisonics coefficient, is then obtained according to formula (8), where $\theta_s$ represents the horizontal angle of the speaker and $\varphi_s$ represents the elevation angle of the speaker.
  • For example, the 16-channel coefficients corresponding to a third-order HOA signal can be obtained from the speaker position coordinates.
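As a small illustration of the channel-count relation and of evaluating spherical harmonics at a speaker direction, the sketch below computes $(P+1)^2$ and the first-order real spherical-harmonic coefficients for a plane wave. The ACN channel ordering and SN3D normalization are assumptions for the example; the patent does not fix a convention.

```python
import math

def num_hoa_channels(order):
    """A P-order Ambisonics signal has (P + 1)**2 channels."""
    return (order + 1) ** 2

def first_order_coeffs(azimuth, elevation):
    """First-order real spherical-harmonic coefficients for a plane wave
    arriving from the given direction (radians), listed as [W, Y, Z, X].
    ACN ordering and SN3D normalization are assumed here."""
    return [
        1.0,                                      # W (omnidirectional)
        math.sin(azimuth) * math.cos(elevation),  # Y
        math.sin(elevation),                      # Z
        math.cos(azimuth) * math.cos(elevation),  # X
    ]
```

For a third-order signal, `num_hoa_channels(3)` gives the 16 channels mentioned above; higher-order coefficients would extend the same pattern with higher-degree harmonics.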
  • the method for determining the target virtual speaker of the current frame and the method for generating the audio channel signal are exemplarily described below.
  • the determination of the target virtual speaker of the current frame and the generation of the audio channel signal may also adopt other manners, which are not specifically limited in this embodiment of the present application.
  • the audio coding component determines the number of virtual speakers included in the first target virtual speaker and the number of virtual speaker signals included in the audio channel signal.
  • the number M of the first target virtual speakers cannot exceed the total number of virtual speakers.
  • For example, the virtual speaker set includes 1024 virtual speakers, and the number K of virtual speaker signals (the virtual speaker signals to be transmitted by the encoder) cannot exceed the number M of the first target virtual speakers.
  • the number M of the first target virtual speakers may also be obtained through the scene signal type parameter.
  • the scene signal type parameter may be a feature value after performing SVD decomposition on the HOA signal to be encoded in the current frame.
  • The number d of sound sources in different directions contained in the sound field can be obtained through the scene signal type parameter, and the number M of the first target virtual speakers satisfies 1 ≤ M ≤ d.
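One way to realize the SVD-based estimate of the number d of directional sources mentioned above is sketched below; the energy threshold and the channels-by-samples frame layout are illustrative assumptions.

```python
import numpy as np

def estimate_source_count(hoa_frame, energy_threshold=0.01):
    """Estimate the number d of directional sources in the sound field from
    the singular values of the HOA frame (channels x samples).

    A singular value counts as a source if it exceeds `energy_threshold`
    times the dominant singular value; the threshold choice is an assumption.
    """
    s = np.linalg.svd(np.asarray(hoa_frame, dtype=float), compute_uv=False)
    return int(np.sum(s > energy_threshold * s[0]))
```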
  • A2 Determine a virtual speaker in the first target virtual speaker according to the HOA signal to be encoded and the candidate virtual speaker set.
  • the representative point may be firstly determined according to the HOA signal to be encoded in the current frame, and then the speaker voting value may be calculated according to the representative point of the HOA signal to be encoded.
  • the loudspeaker voting value may also be directly calculated according to each point of the HOA signal to be encoded in the current frame.
  • the representative point may be a representative sample point in the time domain or a representative frequency point in the frequency domain.
  • the set of speakers in the i-th round may be a set of virtual speakers, including Q virtual speakers; it may also be a subset selected from the set of virtual speakers according to a preset rule.
  • the set of speakers used in different rounds can be the same or different.
  • The voting value of a speaker is obtained by projecting the HOA coefficients of the signal to be encoded onto the HOA coefficients of the loudspeaker, where $\theta$ is the azimuth, $\varphi$ is the pitch angle, and Q is the total number of loudspeakers.
  • The selection criterion for the matching speaker $g_{j,i}$ of the i-th round of voting corresponding to the j-th frequency point is to select, from the voting values corresponding to the Q speakers of the i-th round of voting at the j-th frequency point, the one whose voting value has the largest absolute value.
  • $E_{j,i,g}$ is the voting value of the matching speaker in the i-th round of voting at the j-th frequency point. The HOA coefficient of the signal to be encoded for the next round is updated by removing the matched contribution: $x_{j,i+1} = x_{j,i} - w \cdot E_{j,i,g} \cdot a_{g_{j,i}}$, where the right side of the formula uses the HOA coefficient of the signal to be encoded for the i-th round of voting corresponding to the j-th frequency point, the left side of the formula is the HOA coefficient of the signal to be encoded for the (i+1)-th round of voting corresponding to the j-th frequency point, $a_{g_{j,i}}$ is the HOA coefficient of the matching speaker, and $w$ is the weight value.
  • $w$ may be a preset value satisfying $0 < w \le 1$; in addition, $w$ may be computed adaptively from the signals involved, where norm denotes the operation that obtains the 2-norm.
  • The set of best matching speakers is determined based on the total voting value of the matching speakers. Specifically, the total voting values VOTE_g of all matching speakers can be accumulated, C matching speakers that win the vote are selected as the best matching speaker set according to the size of the total voting value VOTE_g, and the position coordinates of the best matching speaker set are then obtained.
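The round-based voting described above resembles a matching pursuit. A minimal sketch, assuming unit-norm speaker coefficient vectors, a fixed weight w, and a single frequency point, is:

```python
import numpy as np

def vote_for_speakers(x, speaker_coeffs, rounds=2, w=1.0):
    """Round-based voting sketch for one frequency point.

    x              -- HOA coefficients of the signal to be encoded, shape (M,)
    speaker_coeffs -- HOA coefficients of the Q candidate speakers, shape (Q, M),
                      assumed to have unit-norm rows
    Returns the accumulated voting value per speaker (VOTE_g).
    """
    votes = np.zeros(len(speaker_coeffs))
    residual = np.asarray(x, dtype=float).copy()
    for _ in range(rounds):
        # Voting value: projection of the signal onto each speaker's coefficients.
        proj = speaker_coeffs @ residual
        g = int(np.argmax(np.abs(proj)))   # matching speaker of this round
        votes[g] += proj[g]
        # Update the signal for the next round by removing the matched part.
        residual = residual - w * proj[g] * speaker_coeffs[g]
    return votes

def best_matching_speakers(votes, c):
    """Select the C speakers with the largest total voting value."""
    return sorted(np.argsort(-votes)[:c].tolist())
```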
  • The virtual speaker signals are obtained by solving $w = A^{-1} X$, where $A^{-1}$ represents the inverse matrix of matrix A; the size of matrix A is (M × C), C is the number of loudspeakers that won the vote, M is the number of channels of the N-order HOA coefficient, and $a$ represents the HOA coefficient of a best matching speaker, for example, the columns of A are formed by the HOA coefficients of the best matching speakers.
  • X represents the HOA coefficients of the signal to be encoded; the size of the matrix X is (M × L), where L is the number of frequency points and $x$ represents an HOA coefficient of the signal to be encoded, for example, a column of X.
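Since A is generally an (M × C) non-square matrix, $A^{-1}$ can be read as a least-squares solution; the sketch below uses the Moore-Penrose pseudo-inverse as a stand-in, which is an interpretation rather than the patent's literal formula.

```python
import numpy as np

def virtual_speaker_signals(a_best, x):
    """Solve w = A^{-1} X in the least-squares sense.

    a_best -- HOA coefficients of the best matching speakers, shape (M, C)
    x      -- HOA coefficients of the signal to be encoded, shape (M, L)
    Returns the virtual speaker signals, shape (C, L).
    """
    return np.linalg.pinv(np.asarray(a_best, dtype=float)) @ np.asarray(x, dtype=float)
```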
  • the spatial encoder performs spatial encoding processing on the HOA signal to be encoded to obtain the audio channel signal of the current frame and the attribute information of the first target virtual speaker of the audio channel of the current frame, and transmits them to the core encoder.
  • the attribute information of the first target virtual speaker includes one or more items of coordinates, sequence numbers, or HOA coefficients of the first target virtual speaker.
  • the core encoder performs core encoding processing on the audio channel signal to obtain a code stream.
  • The core encoding process may include, but is not limited to, transformation, psychoacoustic model processing, downmixing, bandwidth extension, quantization, and entropy encoding.
  • The core encoding process may operate on audio channel signals in the frequency domain or on audio channel signals in the time domain, which is not limited here.
  • the encoding parameters used in the downmix processing may include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter. That is, the downmix processing may include inter-channel pairing processing, channel signal adjustment processing, inter-channel bit allocation processing, and the like.
  • FIG. 5 is a schematic diagram of a possible encoding process.
  • The audio channel signal of the current frame and the attribute information of the first target virtual speaker of the audio channel of the current frame are output by the spatial encoder.
  • the core encoder performs transient detection on the audio channel signal, and then performs windowing transformation on the signal after transient detection to obtain a frequency domain signal.
  • A noise shaping process is further performed on the frequency domain signal to obtain a shaped audio channel signal. Downmixing processing is then performed on the audio channel signal after noise shaping, which may include an inter-channel pairing operation, channel signal adjustment, and an inter-channel signal bit allocation operation.
  • the embodiment of the present application does not specifically limit the processing sequences of the inter-channel pairing operation, channel signal adjustment, and inter-channel signal bit allocation operations.
  • In this example, the inter-channel pairing processing is performed first. The inter-channel pairing processing is specifically performed according to the inter-channel pairing parameters, and the inter-channel pairing parameters and/or the multiplexing identifier are encoded into the code stream.
  • Whether the inter-channel pairing parameters of the current frame reuse the inter-channel pairing parameters of the previous frame can be determined based on the attribute information of the first target virtual speaker of the current frame (the coordinates, sequence number, or HOA coefficient of the first target virtual speaker) and the attribute information of the second target virtual speaker of the previous frame (the coordinates, sequence number, or HOA coefficient of the second target virtual speaker). Inter-channel pairing processing is performed on the noise-shaped audio channel signals of the current frame according to the determined inter-channel pairing parameters of the current frame to obtain paired audio channel signals.
  • Whether the inter-channel auditory space parameters of the current frame are multiplexed with the inter-channel auditory space parameters of the previous frame can be determined based on the attribute information of the first target virtual speaker of the current frame (the coordinates, sequence number, or HOA coefficient of the first target virtual speaker) and the attribute information of the second target virtual speaker of the previous frame (the coordinates, sequence number, or HOA coefficient of the second target virtual speaker).
  • inter-channel bit allocation processing is performed on the adjusted audio channel signal according to the inter-channel bit allocation parameters, and the inter-channel bit allocation parameters and/or the multiplexing identifier are encoded into the code stream.
  • Whether the inter-channel bit allocation parameters of the current frame are multiplexed with the inter-channel bit allocation parameters of the previous frame can be determined based on the attribute information of the first target virtual speaker of the current frame (the coordinates, sequence number, or HOA coefficient of the first target virtual speaker) and the attribute information of the second target virtual speaker of the previous frame (the coordinates, sequence number, or HOA coefficient of the second target virtual speaker).
  • bit allocation between channels, quantization, entropy coding and bandwidth adjustment can be further performed to obtain a code stream.
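The encoder-side choice among the three multiplexing-flag values can be sketched as follows. The overlap test by exact attribute equality and the "every current speaker lies near some previous speaker" neighbourhood test are simplifying assumptions about the set condition; the flag values follow the example given earlier in the text.

```python
# Sketch of the encoder-side decision: compare the target virtual speakers of
# the current and previous frames, then pick the multiplexing-flag value.
# The values and the neighbourhood test are illustrative assumptions.

FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 0b11, 0b01, 0b00

def choose_multiplexing_flag(cur_speakers, prev_speakers, near):
    """cur_speakers / prev_speakers: lists of speaker attribute tuples
    (e.g. coordinates or sequence numbers); near(a, b) decides whether
    speaker a lies within the set range centered on speaker b."""
    if cur_speakers == prev_speakers:
        return FIRST_VALUE   # positions overlap: reuse previous parameters
    if all(any(near(m, n) for n in prev_speakers) for m in cur_speakers):
        return SECOND_VALUE  # neighbouring: adjust previous parameters by a set ratio
    return THIRD_VALUE       # otherwise encode the parameters explicitly
```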
  • The audio encoding device may include a spatial encoding unit 601, configured to obtain the audio channel signal of the current frame, where the audio channel signal of the current frame is obtained by performing spatial mapping on the original higher order ambisonics (HOA) signal through the first target virtual speaker; and a core encoding unit 602, configured to: when determining that the first target virtual speaker and the second target virtual speaker corresponding to the audio channel signal of the previous frame of the current frame meet the set condition, determine the first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame, encode the audio channel signal of the current frame according to the first encoding parameter, and write the encoding result into a code stream.
  • the core encoding unit 602 is further configured to write the first encoding parameter into a code stream.
  • the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • The set condition includes that the first spatial position overlaps with the second spatial position; the core encoding unit 602 is specifically configured to use the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame.
  • The core encoding unit 602 is further configured to write the multiplexing identifier into the code stream, where the value of the multiplexing identifier is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.
  • The first spatial position includes first coordinates of the first target virtual speaker, the second spatial position includes second coordinates of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first coordinates being the same as the second coordinates; or the first spatial position includes a first sequence number of the first target virtual speaker, the second spatial position includes a second sequence number of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first sequence number being the same as the second sequence number; or the first spatial position includes a first HOA coefficient of the first target virtual speaker, the second spatial position includes a second HOA coefficient of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first HOA coefficient being the same as the second HOA coefficient.
  • the first target virtual speaker includes M virtual speakers, and the second target virtual speaker includes N virtual speakers;
  • the set condition includes the first spatial position and the second spatial position The positions do not overlap and the mth virtual speaker included in the first target virtual speaker is located within a set range centered on the nth virtual speaker included in the second target virtual speaker, wherein m traverses less than or equal to M is a positive integer, n traverses positive integers less than or equal to N;
  • the core encoding unit 602 is specifically configured to adjust the second encoding parameter according to a set ratio to obtain the first encoding parameter.
  • Whether the m-th virtual speaker is located within a set range centered on the n-th virtual speaker is determined by a degree of correlation between the m-th virtual speaker and the n-th virtual speaker, where the correlation satisfies the following condition:
  • $$R = \operatorname{norm}\!\left(M_H \cdot \bar{M}_H^{T}\right)$$
  • where R represents the degree of correlation, norm() represents the normalization operation, $M_H$ is the matrix formed by the coordinates of the virtual speakers included in the first target virtual speaker of the current frame, and $\bar{M}_H^{T}$ is the transpose of the matrix formed by the coordinates of the virtual speakers included in the second target virtual speaker of the previous frame.
  • When the correlation is greater than a set value, the m-th virtual speaker is located within the set range centered on the n-th virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N.
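A minimal sketch of the correlation computation follows. The patent does not fix the norm() operator, so normalizing by the product of the Frobenius norms and the 0.9 threshold are both assumptions for illustration.

```python
import numpy as np

def correlation_degree(cur_coords, prev_coords):
    """R = norm(M_H . M̄_H^T): correlate the coordinate matrix of the current
    frame's target virtual speakers with the transpose of the previous
    frame's. norm() is taken here as division by the product of the
    Frobenius norms (an assumption)."""
    m_h = np.asarray(cur_coords, dtype=float)            # shape (M, dims)
    m_prev_t = np.asarray(prev_coords, dtype=float).T    # shape (dims, N)
    raw = m_h @ m_prev_t                                 # shape (M, N)
    return raw / (np.linalg.norm(m_h) * np.linalg.norm(m_prev_t))

def within_set_range(cur_coords, prev_coords, threshold=0.9):
    """Entry [m, n] is True when the m-th current speaker lies within the set
    range centered on the n-th previous speaker (threshold is illustrative)."""
    return correlation_degree(cur_coords, prev_coords) > threshold
```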
  • The core encoding unit 602 is further configured to write the multiplexing identifier into the code stream, where the value of the multiplexing identifier is a second value, and the second value indicates that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter according to a set ratio.
  • The core encoding unit is further configured to write the set ratio into the code stream.
  • The audio decoding device may include a core decoding unit 701, configured to parse a multiplexing identifier from the code stream, where the multiplexing identifier indicates that the first encoding parameter of the audio channel signal of the current frame is determined through the second encoding parameter of the audio channel signal of the previous frame of the current frame; determine the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame; and decode the audio channel signal of the current frame from the code stream according to the first encoding parameter; and a spatial decoding unit 702, configured to perform spatial decoding on the audio channel signal to obtain a higher order ambisonics (HOA) signal.
  • The core decoding unit 701 is specifically configured to, when the value of the multiplexing flag is a first value, where the first value indicates that the first encoding parameter multiplexes the second encoding parameter, obtain the second encoding parameter as the first encoding parameter.
  • The core decoding unit 701 is specifically configured to, when the value of the multiplexing flag is a second value, where the second value indicates that the first encoding parameter is obtained by adjusting the second encoding parameter according to a set ratio, adjust the second encoding parameter according to the set ratio to obtain the first encoding parameter.
  • the core decoding unit 701 is specifically configured to decode from the code stream to obtain the set ratio when the value of the multiplexing identifier is a second value.
  • the encoding parameters of the audio channel signal include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
  • The position of the core decoding unit 701 corresponds to the position of the core decoder 230 in FIG. 2B.
  • For the specific implementation of the functions of the core decoding unit 701, refer to the specific details of the core decoder 230 in FIG. 2B.
  • the position of the spatial decoding unit 702 corresponds to the position of the spatial decoder 240 in FIG. 2B .
  • the specific implementation of the functions of the spatial decoding unit 702 can refer to the specific details of the spatial decoder 240 in FIG. 2B .
  • The position of the spatial encoding unit 601 corresponds to the position of the spatial encoder 210 in FIG. 2A.
  • For the specific implementation of the functions of the spatial encoding unit 601, refer to the specific details of the spatial encoder 210 in FIG. 2A.
  • the position of the core encoding unit 602 corresponds to the position of the core encoder 220 in FIG. 2A .
  • the specific implementation of the functions of the core encoding unit 602 can refer to the specific details of the core encoder 220 in FIG. 2A .
  • For the specific implementation process of the spatial encoding unit 601 and the core encoding unit 602, refer to the detailed description of the embodiment in FIG. 3A, FIG. 3B, or FIG.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • Also, any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • The techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in this application to emphasize functional aspects of devices for performing the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).


Abstract

An audio encoding and decoding method and apparatus. When encoding the audio channel signal of the current frame, it is first determined whether the first target virtual speaker and the second target virtual speaker corresponding to the audio channel signal of the previous frame of the current frame meet a set condition; if so, the first encoding parameter of the audio channel signal of the current frame is determined according to the second encoding parameter of the audio channel signal of the previous frame, the audio channel signal of the current frame is encoded according to the first encoding parameter to obtain an encoding result, and the encoding result is written into a code stream.

Description

Audio encoding and decoding method and apparatus

Cross-reference to related applications

This application claims priority to the Chinese patent application No. 202110530309.1, entitled "Audio encoding and decoding method and apparatus", filed with the Intellectual Property Administration of the People's Republic of China on May 14, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of this application relate to the field of encoding and decoding technologies, and in particular to an audio encoding and decoding method and apparatus.

Background

Three-dimensional audio technology is an audio technology for acquiring, processing, transmitting, rendering, and playing back sound events and three-dimensional sound field information in the real world. Three-dimensional audio gives sound a strong sense of space, envelopment, and immersion, providing an extraordinary listening experience of "being there". Higher order ambisonics (HOA) technology has the property of being independent of the speaker layout in the recording, encoding, and playback stages, as well as the rotatable-playback characteristic of HOA-format data, and offers higher flexibility in three-dimensional audio playback; it has therefore received wider attention and research.

To achieve a better audio listening effect, HOA technology requires a large amount of data to record more detailed information of the sound scene. Although such scene-based sampling and storage of the three-dimensional audio signal is more conducive to preserving and transmitting the spatial information of the audio signal, the amount of data increases as the HOA order increases; the large amount of data causes difficulties in transmission and storage, so the HOA signal needs to be encoded and decoded.

The HOA signal to be encoded is encoded to generate a virtual speaker signal and a residual signal, and the virtual speaker signal and the residual signal are then further encoded to obtain a code stream. Usually, when encoding the virtual speaker signal and the residual signal, encoding and decoding processing is performed on the virtual speaker signal and the residual signal of each frame. However, only the correlation between the signals of the current frame is considered when encoding the virtual speaker signal and the residual signal of each frame, resulting in high computational complexity and low encoding efficiency.
Summary

The embodiments of this application provide an audio encoding and decoding method and apparatus, to solve the problem of high computational complexity.

According to a first aspect, an embodiment of this application provides an audio encoding method, including: obtaining an audio channel signal of a current frame, where the audio channel signal of the current frame is obtained by performing spatial mapping on an original higher order ambisonics (HOA) signal through a first target virtual speaker; when it is determined that the first target virtual speaker and a second target virtual speaker meet a set condition, determining a first encoding parameter of the audio channel signal of the current frame according to a second encoding parameter of an audio channel signal of a previous frame of the current frame, where the audio channel signal of the previous frame corresponds to the second target virtual speaker; encoding the audio channel signal of the current frame according to the first encoding parameter; and writing an encoding result of the audio channel signal of the current frame into a code stream. With the above method, when encoding the current frame, if the virtual speakers matched for the current frame are adjacent to those matched for the previous frame, the encoding parameter of the current frame can be determined according to the encoding parameter of the previous frame, so that the encoding parameter of the current frame does not need to be recalculated, which can improve encoding efficiency.

In a possible design, the method further includes: writing the first encoding parameter into the code stream. In the above design, the encoding parameter determined according to the encoding parameter of the previous frame is written into the code stream as the encoding parameter of the current frame, which enables the peer end to obtain the encoding parameter while improving encoding efficiency.

In a possible design, the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.

In a possible design, the inter-channel auditory space parameter includes one or more of an inter-channel level difference (ILD), an inter-channel time difference (ITD), or an inter-channel phase difference (IPD).

In a possible design, the set condition includes that the first spatial position overlaps with the second spatial position; and the determining a first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame includes: using the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame. With the above design, when the spatial position of the target virtual speaker of the previous frame overlaps with that of the current frame, the encoding parameter of the previous frame is multiplexed as the encoding parameter of the current frame; considering the inter-frame spatial correlation between audio channel signals, the encoding parameter of the current frame no longer needs to be calculated, which can improve encoding efficiency.

In a possible design, the method further includes: writing a multiplexing identifier into the code stream, where the value of the multiplexing identifier is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter. In the above design, notifying the decoding side of the way to determine the encoding parameter of the current frame by writing the multiplexing identifier into the code stream is simple and effective.

In a possible design, the first spatial position includes first coordinates of the first target virtual speaker, the second spatial position includes second coordinates of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first coordinates being the same as the second coordinates; or the first spatial position includes a first sequence number of the first target virtual speaker, the second spatial position includes a second sequence number of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first sequence number being the same as the second sequence number; or the first spatial position includes a first HOA coefficient of the first target virtual speaker, the second spatial position includes a second HOA coefficient of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first HOA coefficient being the same as the second HOA coefficient. In the above design, representing the spatial position by coordinates, sequence numbers, or HOA coefficients to determine whether the virtual speaker of the previous frame overlaps with that of the current frame is simple and effective.

In a possible design, the first target virtual speaker includes M virtual speakers, and the second target virtual speaker includes N virtual speakers; the set condition includes that the first spatial position of the first target virtual speaker does not overlap with the second spatial position of the second target virtual speaker and the m-th virtual speaker included in the first target virtual speaker is located within a set range centered on the n-th virtual speaker included in the second target virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N; and the determining a first encoding parameter of the audio channel signal of the current frame according to the second encoding parameter of the audio channel signal of the previous frame includes: adjusting the second encoding parameter according to a set ratio to obtain the first encoding parameter. In the above design, when the spatial position of the target virtual speaker of the previous frame does not overlap with but is adjacent to that of the current frame, the encoding parameter of the current frame is adjusted from the encoding parameter of the previous frame; considering the inter-frame spatial correlation between audio channel signals, the encoding parameter of the current frame no longer needs to be calculated through a complex calculation, which can improve encoding efficiency.

In this embodiment of the present invention, the first encoding parameter may be one encoding parameter or multiple encoding parameters, and the adjustment may be scaling down; scaling up; scaling down one part while keeping another part unchanged; scaling up one part while keeping another part unchanged; scaling down one part while scaling up another part; or scaling down one part, keeping one part unchanged, and scaling up another part.
In a possible design, when the first spatial position includes the first coordinates of the first target virtual speaker and the second spatial position includes the second coordinates of the second target virtual speaker, whether the m-th virtual speaker is located within the set range centered on the n-th virtual speaker is determined by the degree of correlation between the m-th virtual speaker and the n-th virtual speaker, where the correlation satisfies the following condition:

$$R = \operatorname{norm}\!\left(M_H \cdot \bar{M}_H^{T}\right)$$

where R represents the degree of correlation, norm() represents the normalization operation, $M_H$ is the matrix formed by the coordinates of the virtual speakers included in the first target virtual speaker of the current frame, and $\bar{M}_H^{T}$ is the transpose of the matrix formed by the coordinates of the virtual speakers included in the second target virtual speaker of the previous frame; when the correlation is greater than a set value, the m-th virtual speaker is located within the set range centered on the n-th virtual speaker. The above design provides a simple and effective way to determine the adjacency between the virtual speaker of the previous frame and that of the current frame.

In a possible design, the method further includes: writing a multiplexing identifier into the code stream, where the value of the multiplexing identifier is a second value, and the second value indicates that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter according to a set ratio.

In a possible design, the method further includes: writing the set ratio into the code stream. With the above design, the set ratio is notified to the decoding side through the code stream, so that the decoding side determines the encoding parameter of the current frame according to the set ratio, enabling the decoding side to obtain the encoding parameter while improving encoding efficiency.
According to a second aspect, an embodiment of this application provides an audio decoding method, including: parsing a multiplexing identifier from a code stream, where the multiplexing identifier indicates that a first encoding parameter of an audio channel signal of a current frame is determined through a second encoding parameter of an audio channel signal of a previous frame of the current frame; determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame; and decoding the audio channel signal of the current frame from the code stream according to the first encoding parameter. With the above design, the decoding side does not need to parse the encoding parameter from the code stream, which can improve decoding efficiency.

In a possible design, determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame includes: when the value of the multiplexing identifier is a first value, where the first value indicates that the first encoding parameter multiplexes the second encoding parameter, obtaining the second encoding parameter as the first encoding parameter. With the above design, the individual encoding parameters do not need to be decoded from the code stream; only the multiplexing identifier needs to be decoded, which can improve decoding efficiency.

In a possible design, determining the first encoding parameter according to the second encoding parameter of the audio channel signal of the previous frame includes: when the value of the multiplexing identifier is a second value, where the second value indicates that the first encoding parameter is obtained by adjusting the second encoding parameter according to a set ratio, adjusting the second encoding parameter according to the set ratio to obtain the first encoding parameter.

In a possible design, the method further includes: when the value of the multiplexing identifier is a second value, decoding the code stream to obtain the set ratio.

In a possible design, the encoding parameters of the audio channel signal include one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.
According to a third aspect, an embodiment of this application provides an audio encoding apparatus; for beneficial effects, refer to the related description of the first aspect, which is not repeated here. The audio encoding apparatus includes several functional units for implementing any one of the methods of the first aspect. For example, the audio encoding apparatus may include: a spatial encoding unit, configured to obtain an audio channel signal of a current frame, where the audio channel signal of the current frame is obtained by performing spatial mapping on an original higher order ambisonics (HOA) signal through a first target virtual speaker; and a core encoding unit, configured to: when it is determined that the first target virtual speaker and a second target virtual speaker meet a set condition, determine a first encoding parameter of the audio channel signal of the current frame according to a second encoding parameter of an audio channel signal of a previous frame of the current frame, where the audio channel signal of the previous frame corresponds to the second target virtual speaker; and encode the audio channel signal of the current frame according to the first encoding parameter, and write an encoding result of the audio channel signal of the current frame into a code stream.

In a possible design, the core encoding unit is further configured to write the first encoding parameter into the code stream.

In a possible design, the first encoding parameter includes one or more of an inter-channel pairing parameter, an inter-channel auditory space parameter, or an inter-channel bit allocation parameter.

In a possible design, the set condition includes that the first spatial position of the first target virtual speaker overlaps with the second spatial position of the second target virtual speaker; and the core encoding unit is specifically configured to use the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame.

In a possible design, the core encoding unit is further configured to write a multiplexing identifier into the code stream, where the value of the multiplexing identifier is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame multiplexes the second encoding parameter.

In a possible design, the first spatial position includes first coordinates of the first target virtual speaker, the second spatial position includes second coordinates of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first coordinates being the same as the second coordinates; or the first spatial position includes a first sequence number of the first target virtual speaker, the second spatial position includes a second sequence number of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first sequence number being the same as the second sequence number; or the first spatial position includes a first HOA coefficient of the first target virtual speaker, the second spatial position includes a second HOA coefficient of the second target virtual speaker, and the overlapping of the first spatial position and the second spatial position includes the first HOA coefficient being the same as the second HOA coefficient.

In a possible design, the first target virtual speaker includes M virtual speakers, and the second target virtual speaker includes N virtual speakers; the set condition includes that the first spatial position of the first target virtual speaker does not overlap with the second spatial position of the second target virtual speaker and the m-th virtual speaker included in the first target virtual speaker is located within a set range centered on the n-th virtual speaker included in the second target virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N; and the core encoding unit is specifically configured to adjust the second encoding parameter according to a set ratio to obtain the first encoding parameter.
在一种可能的设计中，当所述第一空间位置包括所述第一目标虚拟扬声器的第一坐标，所述第二空间位置包括所述第二目标虚拟扬声器的第二坐标时，所述第m个虚拟扬声器是否位于以所述第n个虚拟扬声器为中心的设定范围内通过所述第m个虚拟扬声器与所述第n个虚拟扬声器之间的相关度确定，其中，所述相关度满足如下条件：
R=norm(M_H·M_FH^T)
其中，R表示相关度，norm()表示归一化运算，M_H为当前帧的第一目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵，M_FH^T为前一帧的第二目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵的转置；
当所述相关度大于设定值时,所述第m个虚拟扬声器位于以所述第n个虚拟扬声器为中心的设定范围内。
在一种可能的设计中,所述核心编码单元,还用于将复用标识写入码流,所述复用标识的取值为第二值,所述第二值指示所述当前帧的音频通道信号的第一编码参数通过按照设定比例调整所述第二编码参数获得。
在一种可能的设计中,所述核心编码单元,还用于将所述设定比例写入所述码流。
第四方面，本申请实施例提供一种音频解码装置，有益效果可以参见第二方面的相关描述，此处不再赘述。音频解码装置包括用于实施第二方面的任意一种方法的若干个功能单元。举例来说，音频解码装置可以包括：核心解码单元，用于从码流中解析复用标识，所述复用标识指示当前帧的音频通道信号的第一编码参数通过所述当前帧的前一帧的音频通道信号的第二编码参数确定；根据所述前一帧的音频通道信号的第二编码参数确定所述第一编码参数；根据所述第一编码参数从所述码流中解码所述当前帧的音频通道信号；空间解码单元，用于对所述音频通道信号进行空间解码获得高阶立体混响HOA信号。
在一种可能的设计中,所述核心解码单元,具体用于当所述复用标识的取值为第一值时,所述第一值指示所述第一编码参数复用所述第二编码参数,获得所述第二编码参数作为所述第一编码参数。
在一种可能的设计中,所述核心解码单元,具体用于当所述复用标识的取值为第二值时,所述第二值指示所述第一编码参数通过按照设定比例调整所述第二编码参数获得,按照设定比例调整所述第二编码参数获得所述第一编码参数。
在一种可能的设计中,所述核心解码单元,具体用于当所述复用标识的取值为第二值时,从所述码流中解码获得所述设定比例。
在一种可能的设计中,所述音频通道信号的编码参数包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
第五方面，本申请实施例提供一种音频编码器，所述音频编码器用于编码HOA信号。示例性的，音频编码器可以实现第一方面所述的方法。音频编码器可以包括第三方面中任一设计所述的装置。
第六方面，本申请实施例提供一种音频解码器，所述音频解码器用于从码流中解码HOA信号。示例性的，音频解码器可以实现第二方面的任一种设计所述的方法。音频解码器包括第四方面的任一设计所述的装置。
第七方面,本申请实施例提供一种音频编码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行第一方面或者第一方面的任一设计所述的方法。
第八方面,本申请实施例提供一种音频解码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行第二方面或者第二方面的任一设计所述的方法。
第九方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储了程序代码,其中,所述程序代码包括用于执行第一方面至第二方面的任意一种方法的部分或全部步骤的指令。
第十方面,本申请实施例提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行第一方面至第二方面的任意一种方法的部分或全部步骤。
第十一方面,本申请实施例提供一种计算机可读存储介质,包括第一方面的任意一种方法所获得的码流。
应当理解的是,本申请的第三至十方面的有益效果可以参见第一方面和第二方面的相关描述,不再赘述。
附图说明
图1A为本申请实施例中一种音频编码及解码***100的示意性框图;
图1B为本申请实施例中音频编码及解码流程的示意性框图;
图1C为本申请实施例中另一种音频编码及解码***示意性框图;
图1D为本申请实施例中又一种音频编码及解码***示意性框图;
图2A为本申请实施例中音频编码组件的结构示意图;
图2B为本申请实施例中音频解码组件的结构示意图;
图3A为本申请实施例中一种音频编码方法流程示意图;
图3B为本申请实施例中另一种音频编码方法流程示意图;
图4A为本申请实施例中一种音频编解码方法流程示意图;
图4B为本申请实施例中另一种音频编解码方法流程示意图;
图5为本申请实施例中音频编码流程示意性框图;
图6为本申请实施例中音频编码装置示意图;
图7为本申请实施例中音频解码装置示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。以下描述中,参考形成本公开一部分并以说明之方式示出本申请实施例的具体方面或可使用本申请实施例的具体方面的附图。应理解,本申请实施例可在其它方面中使用,并可包括附图中未描绘的结构或逻辑变化。因此,以下详细描述不应以限制性的意义来理解,且本申请的范围由所附权利要求书界定。例如,应理解,结合所描述方法的揭示内容可以同样适用于执行所述方法的对应设备或***,且反之亦然。例如,如果描述一个或多个具体方法步骤,则对应的设备可以包含如功能单元等一个或多个单元,来执行所描述的一个或多个方法步骤(例如,一个单元执行一个或多个步骤,或多个单元,其中每个都执行多个步骤中的一个或多个),即使附图中未明确描述或说明这种一个或多个单元。另一方面,例如,如果根据如功能单元等一个或多个单元描述具体装置,则对应的方法可以包含一个步骤来执行一个或多个单元的功能性(例如,一个步骤执行一个或多个单元的功能性,或多个步骤,其中每个执行多个单元中一个或多个单元的功能性),即使附图中未明确描述或说明这种一个或多个步骤。进一步,应理解的是,除非另外明确提出,本文中所描述的各示例性实施例和/或方面的特征可以相互组合。
本文所提及的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。同样,“一个”或者“一”等类似词语也不表示数量限制,而是表示存在至少一个。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。
在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
下面描述本申请实施例所应用的***架构。参见图1A所示,图1A示例性地给出了本申请实施例所应用的音频编码及解码***100的示意性框图。如图1A所示,音频编码及解码***100可以包括音频编码组件110和音频解码组件120。音频编码组件110用于对HOA信号(或者3D音频信号)进行音频编码。可选地,音频编码组件110可以通过软件实现,或者也可以通过硬件实现,或者还可以通过软硬件结合的形式实现,本申请实施例对此不作具体限定。
参见图1B所示,音频编码组件110对HOA信号(或者3D音频信号)进行编码可以包括如下几个步骤:
1)对获得到的HOA信号进行音频预处理(audio preprocessing)。预处理可以包括滤除掉HOA信号中的低频部分,比如,以20Hz或者50Hz为分界点,提取HOA信号中的方位信息。
HOA信号可以由音频采集组件采集到并发送至音频编码组件110。可选地,音频采集组件可以与音频编码组件110设置于同一设备中;或者,也可以与音频编码组件110设置于不同设备中。
2)对音频预处理后的信号进行编码处理(Audio encoding)打包(File/Segment encapsulation)获得码流。
3)音频编码组件110通过传输信道将码流发送(Delivery)到解码端的音频解码组件120。
音频解码组件120用于对音频编码组件110生成的码流进行解码获得HOA信号。
可选地,音频编码组件110与音频解码组件120之间可以通过有线或者无线的方式相连。音频解码组件120通过该连接获得音频编码组件110生成的码流;或者,音频编码组件110将生成的码流存储至存储器,音频解码组件120读取存储器中的码流。可选地,音频解码组件120可以通过软件实现;或者,也可以通过硬件实现;或者,还可以通过软硬件结合的形式实现,本申请实施例对此不作限定。
音频解码组件120对码流进行解码,获得HOA信号可包括以下几个步骤:
1)对码流进行解包(File/Segment decapsulation)处理。
2)对解包处理的信号进行音频解码(Audio decoding)处理获得解码信号。
3)对解码信号进行渲染(Audio rendering)处理。
4)渲染处理后的信号映射到收听者耳机(headphones)或者音箱上。收听者耳机可以为独立的耳机也可以是眼镜设备等终端设备上的耳机。
可选地,音频编码组件110和音频解码组件120可以设置在同一设备中;或者,也可以设置在不同设备中。设备可以为手机、平板电脑、膝上型便携计算机和台式计算机、蓝牙音箱、录音笔、可穿戴式设备等具有音频信号处理功能的移动终端,也可以是核心网、无线网中具有音频信号处理能力的网元,比如,媒体网关、转码设备、媒体资源服务器等,还可以是应用于虚拟现实(virtual reality,VR)流(streaming)服务中的音频编解码器,本申请实施例对此不作限定。
示意性地,参考图1C,本实施例以音频编码组件110设置于移动终端130中、音频解码组件120设置于移动终端140中,移动终端130与移动终端140是相互独立的具有音频信号处理能力的电子设备,且移动终端130与移动终端140之间通过无线或有线网络连接。
可选地,移动终端130包括音频采集组件131、音频编码组件110和信道编码组件132,其中,音频采集组件131与音频编码组件110相连,音频编码组件110与信道编码组件132相连。
可选地，移动终端140包括音频播放组件141、音频解码组件120和信道解码组件142，其中，音频播放组件141与音频解码组件120相连，音频解码组件120与信道解码组件142相连。移动终端130通过音频采集组件131采集到HOA信号后，通过音频编码组件110对该HOA信号进行编码，获得编码码流；然后，通过信道编码组件132对编码码流进行编码，获得传输信号。
移动终端130通过无线或有线网络将该传输信号发送至移动终端140,比如可以通过无线或者有线网络的通信设备将该传输信号发送至移动终端140中。移动终端130和移动终端140所属的有线或者无线网络的通信设备可以相同,也可以不同。
移动终端140接收到该传输信号后,通过信道解码组件142对传输信号进行解码获得编码码流(可以简称为码流);通过音频解码组件120对编码码流进行解码获得HOA信号;通过音频播放组件播放该HOA信号。
示意性地,参考图1D,本申请实施例以音频编码组件110和音频解码组件120设置于同一核心网或无线网中具有音频信号处理能力的网元150中为例进行说明。
可选地,网元150包括信道解码组件151、音频解码组件120、音频编码组件110和信道编码组件152。其中,信道解码组件151与音频解码组件120相连,音频解码组件120与音频编码组件110相连,音频编码组件110与信道编码组件152相连。
信道解码组件151接收到其它设备发送的传输信号后,对该传输信号进行解码获得第一编码码流;通过音频解码组件120对第一编码码流进行解码获得HOA信号;通过音频编码组件110对该HOA信号进行编码,获得第二编码码流;通过信道编码组件152对该第二编码码流进行编码获得传输信号。
其中,其它设备可以是具有音频信号处理能力的移动终端;或者,也可以是具有音频信号处理能力的其它网元,本实施例对此不作限定。
可选地,网元中的音频编码组件110和音频解码组件120可以对移动终端发送的编码码流进行转码。
可选地,本实施例中将安装有音频编码组件110的设备称为音频编码设备,在实际实现时,该音频编码设备也可以具有音频解码功能,本申请实施例对此不作限定。将安装有音频解码组件120的设备可以称为音频解码设备。
示意性地,参见图2A所示,音频编码组件110可以包括空间编码器210和核心编码器220。待编码的HOA信号经过空间编码器210进行编码后获得音频信道信号,即待编码的HOA经过空间编码器210产生虚拟扬声器信号和残差信号;核心编码器220对音频信道信号进行编码后获得码流。
示意性地,参见图2B所示,音频解码组件120可以包括核心解码器230和空间解码器240。接收到码流后,通过核心解码器230对码流进行解码后获得音频信道信号;然后空间解码器240根据解码获得的音频信道信号(虚拟扬声器信号和残差信号),可以获得重建的HOA信号。
作为一种举例,空间编码器210和核心编码器220可以是两个独立的处理单元。空间解码器240和核心解码器230可以是两个独立的处理单元。核心编码器220通常情况下将音频信道信号作为多个单通道信号或立体声通道信号或多通道信号进行编码处理。
核心编码器220会对每一帧的音频通道信号进行编码处理。一种可能的方式是,对每一帧的音频通道信号的编码参数进行计算,然后根据计算获得的编码参数对当前帧的音频通道信号进行编码后写入码流,并将编码参数写入码流。而这种方式仅考虑到音频通道信号间的相关性,忽略音频通道信号的帧间空间相关性,导致编码效率较低。
由于音频通道信号是通过目标虚拟扬声器在原始HOA信号上映射获得的，因此音频通道信号的帧间相关性与HOA信号的虚拟扬声器的选择存在一定联系，当各个虚拟扬声器的空间位置相同或邻近时，音频通道信号在帧间有较强相关性。据此，考虑到音频通道信号的帧间相关性，本申请实施例提供一种编解码方式：根据当前帧对应的虚拟扬声器和前一帧对应的虚拟扬声器之间的邻近关系，如果二者邻近或者位置重叠，可以根据前一帧的编码参数确定当前帧的编码参数，从而不再通过各个编码参数的计算算法来计算当前帧的编码参数，可以提高编码效率。
在对本申请实施例提供的编解码方案进行详细描述之前,下面先对本申请实施例可能涉及的一些概念进行简单介绍。本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。
(1)HOA信号是声场的三维(3D)表示。HOA信号通常由多个球谐系数(spherical harmonic coefficient,SHC)或者其它层次元素表示。根据HOA理论,对于理想的具有特定方向的信号(比如,远场的点声源信号或者平面波信号),其对应的HOA信号在各个通道之间只存在幅度上的差异,因此可以用单通道信号和各个通道分别对应的一组比例系数进行表示。HOA技术中通常会将HOA信号转为实际扬声器信号后进行回放,或者将HOA信号转为虚拟扬声器(virtual loudspeaker,VL)信号再映射到双耳对应的扬声器信号进行回放。其中(虚拟)扬声器的选择对重建信号质量至关重要。
(2)当前帧是指对音频信号采集获得的一定长度的样点,比如960点或者1024点。前一帧,是指当前帧的前一帧,比如,当前帧为第n帧,则前一帧为第n-1帧。前一帧也可以称为在先帧。
（3）音频通道信号，可以包括多通道的虚拟扬声器信号，或者包括多通道的虚拟扬声器信号和残差信号。比如，待编码的HOA信号经过多个虚拟扬声器映射获得多通道的虚拟扬声器信号和残差信号。虚拟扬声器信号的通道数和残差信号的通道数可以是预先设定的。音频通道信号也可以称为传输通道，还可以采用其它的名称，本申请对此不作具体限定。作为一种举例，虚拟扬声器信号的获得可以是根据匹配投影算法从虚拟扬声器集合中选择匹配待编码的当前帧HOA信号的目标虚拟扬声器，根据当前帧的HOA信号和选择的目标虚拟扬声器获得虚拟扬声器信号。残差信号可以是根据待编码HOA信号和虚拟扬声器信号获得的。
(4)编码参数。例如,编码参数可以包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
通道间配对参数用于表征音频通道信号包括的多个音频信号分别所属的通道之间的配对关系(或者称为分组关系)。通道间配对音频信号的各个传输通道之间通过相关性等准则进行配对,实现传输通道高效编码的一种计算方法。
作为一种示例，音频通道信号可以包括虚拟扬声器信号和残差信号。如下示例性地描述通道间配对参数的确定方式：
举例来说，音频通道信号可以被划分为两组，虚拟扬声器信号为一组，称为虚拟扬声器信号组，残差信号为一组，称为残差信号组。虚拟扬声器信号组包含M个由单通道组成的虚拟扬声器信号，M为大于2的正整数，残差信号组包含N个由单声道组成的残差信号，N为大于2的正整数。例如，M=4，N=4。通道间配对结果可以为两两通道配对，也可以为三个或更多通道配对，也可以为通道间不配对。以通道间两两配对为例，通道间配对参数指的是在每组内不同的信号组成一对的选择结果。以虚拟扬声器信号组为例，例如虚拟扬声器信号组包括4个通道，分别为通道1，通道2，通道3，通道4。例如，通道间配对参数可以为通道1和通道2配对，通道3和通道4配对，或通道1和通道3配对，通道2和通道4配对，或通道1和通道2配对，通道3和通道4不配对等情况。通道间配对参数确定的方式，本申请不作具体限定。作为一种举例，可以采用构建通道间相关矩阵W的方法确定通道间配对参数，例如，参见公式(1)：
W=[[m11，m12，m13，m14]，[m21，m22，m23，m24]，[m31，m32，m33，m34]，[m41，m42，m43，m44]]    (1)
其中，m11-m44均表示两个通道之间的相关性，进一步令矩阵对角元素值为0，以获得W'，参见公式(2)：
W'=[[0，m12，m13，m14]，[m21，0，m23，m24]，[m31，m32，0，m34]，[m41，m42，m43，0]]    (2)
通道间配对的原则可以是W′中元素取得最大值时的序号,此时通道间配对参数可以为矩阵元素的序号。
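上述基于通道间相关矩阵W的配对思路可以用如下 Python 片段示意（相关性度量采用皮尔逊相关、配对策略采用按 |相关性| 从大到小的贪心选择，均为假设的简化实现，并非本申请规定的算法）：

```python
import numpy as np

def pair_channels(signals):
    """构建通道间相关矩阵 W，置对角线为 0 得到 W'，再按 |相关性| 从大到小贪心配对。"""
    x = np.asarray(signals, dtype=float)
    w = np.corrcoef(x)            # W: 通道间相关矩阵(此处用皮尔逊相关，假设的度量)
    np.fill_diagonal(w, 0.0)      # W': 令矩阵对角元素值为 0
    pairs, used = [], set()
    idx = np.unravel_index(np.argsort(-np.abs(w), axis=None), w.shape)
    for i, j in zip(*idx):        # 按 |W'| 由大到小遍历矩阵元素序号
        if i < j and i not in used and j not in used:
            pairs.append((int(i), int(j)))
            used.update((int(i), int(j)))
    return pairs

# 通道0与通道1为同一信号的缩放(强相关)，通道2与通道3亦然，两组之间不相关
sig = [[1.0, -1.0, 1.0, -1.0],
       [0.5, -0.5, 0.5, -0.5],
       [1.0, 1.0, -1.0, -1.0],
       [2.0, 2.0, -2.0, -2.0]]
pairs = pair_channels(sig)
```

得到的配对结果即可作为通道间配对参数的一种表示（矩阵元素的序号）。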
通道间听觉空间参数用于表征人耳对听觉空间声像特性的感知程度。示例性地,通道间听觉空间参数可以包括通道间声级差(inter-channel level difference,ILD)(也可以称为声道间声级差)、通道间时间差(inter-channel time difference,ITD)(也可以称为声道间时间差)或者通道间相位差(inter-channel phase difference,IPD)(也可以称为声道间相位差)中的一项或者多项。
以ILD参数为例,ILD参数可以为音频通道信号中每个通道的信号能量相对于所有通道能量平均值的比值。作为一种举例,ILD参数可以由各通道的比值绝对值和调整方向值两个参数组成。本申请实施例对ILD、ITD或者IPD的确定方式不作具体限定。
以ITD参数为例,例如音频通道信号包括的两个通道的信号,分别为通道1和通道2,则ITD参数可以为音频通道信号中两个通道的时间差的比值。以IPD参数为例,例如音频通道信号包括的两个通道的信号,分别为通道1和通道2,则IPD参数可以为音频通道信号中两个通道的相位差的比值。
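以 ILD 为例，按"各通道能量相对于所有通道能量平均值的比值"这一描述，下述 Python 片段给出一个示意计算（比值绝对值与调整方向两个分量的组合约定为假设的示例）：

```python
import numpy as np

def ild_params(channels):
    """ILD 示意：每个通道的能量相对于所有通道能量平均值的比值。"""
    x = np.asarray(channels, dtype=float)
    energy = np.sum(x ** 2, axis=1)        # 各通道能量
    mean_energy = float(np.mean(energy))   # 所有通道能量平均值
    ratio = energy / mean_energy           # 各通道的比值
    # 由比值绝对值和调整方向两个参数组成(方向取值约定为假设: 比值>=1 记 +1, 否则 -1)
    return np.abs(ratio), np.where(ratio >= 1.0, 1, -1)

mag, direction = ild_params([[1.0, 1.0, 1.0, 1.0],    # 能量 4
                             [3.0, 1.0, 1.0, 1.0]])   # 能量 12
```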
通道间比特分配参数用于表征音频通道信号包括的多个音频信号分别所属的通道在编码时的比特分配关系。示例性的,通道间比特分配时可以采用根据能量的通道间比特分配方式来实现。例如待分配比特的通道包括4个通道,分别为通道1,通道2,通道3,通道4。待分配比特通道可以是音频通道信号包括的多个音频信号所属的通道,也可以是经过对音频通道信号进行通道配对后的下混获得的多个通道,也可以是经过通道间ILD计算和通道间配对下混后获得的多个通道。通过通道间比特分配可以获得通道1、通道2、通道3和通道4的比特分配比值,该比特分配的比值即可作为通道间比特分配参数,例如通道1占用3/16、通道2占用5/16、通道3占用6/16和通道4占用2/16。通道间比特分配所采用的方式,本申请实施例中不作具体限定。
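对于按能量进行通道间比特分配的方式，一个示意实现如下（按能量占比分配，并将取整余量补给能量最大的通道；该余量处理仅为示例做法，并非本申请规定的算法）：

```python
import numpy as np

def bit_allocation(channels, total_bits):
    """按各通道能量占总能量的比例分配比特(示意实现)。"""
    x = np.asarray(channels, dtype=float)
    energy = np.sum(x ** 2, axis=1)
    share = energy / energy.sum()                       # 通道间比特分配参数(比例)
    bits = np.floor(share * total_bits).astype(int)     # 先按比例向下取整
    bits[np.argmax(share)] += total_bits - bits.sum()   # 取整余量补给能量最大的通道
    return share, bits

# 4 个通道的能量分别为 4, 2, 1, 0.25，共分配 16 比特
share, bits = bit_allocation([[2.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.5]],
                             total_bits=16)
```

share 即通道间比特分配的比值，bits 为据此得到的各通道比特数。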
参见图3A和图3B所示,为本申请一个示例性实施例提供的编码方法的流程示意图。编码方法可以由音频编码设备来实现,或者由音频编码组件来实现,或者由核心编码器来实现。后续描述时,以由音频编码组件来实现为例。
301,获得当前帧的音频通道信号,所述当前帧的音频通道信号是通过第一目标虚拟扬声器对原始HOA信号进行空间映射获得的。
一种可能的示例中,第一目标虚拟扬声器可以包括一个或者多个虚拟扬声器,也可以包括一个或者多个虚拟扬声器组。每个扬声器组可以包括一个或者多个虚拟扬声器。不同 的虚拟扬声器组包括的虚拟扬声器的数量可以相同,也可以不同。第一目标虚拟扬声器中的每个虚拟扬声器均对原始HOA信号进行空间映射获得音频通道信号。音频通道信号可以包括一个或者多个通道的音频信号。例如,一个虚拟扬声器对原始HOA信号进行空间映射获得一个通道的音频通道信号。
例如,第一目标虚拟扬声器包括M个虚拟扬声器,M为正整数。当前帧的音频通道信号可以包括M个通道的虚拟扬声器信号。M个通道的虚拟扬声器信号与M个虚拟扬声器一一对应。
第一目标虚拟扬声器包括的扬声器的数量可以与编码速率或者传输速率相关,也可以与音频编码组件的复杂度相关,也可以通过配置确定。例如,当编码速率较低时,比如等于128kbps时,M=1,当编码速率中等时,比如等于384kbps时,M=4,当编码速率较高时,例如等于768kbps时,M=7。再例如,当编码器复杂度较低时,M=1,当编码器复杂度中等时,M=2,当编码器复杂度较高时,M=6。又例如:当编码速率为128kbps时,且编码复杂度要求较低时,M=1。
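上述 M 随编码速率或编码器复杂度变化的示例取值，可以用如下 Python 片段直观表示（速率档位与取值均沿用文中示例，并非规范规定）：

```python
def num_target_speakers(bitrate_kbps=None, complexity=None):
    """按文中示例返回第一目标虚拟扬声器个数 M (示例取值, 非规范)。"""
    if bitrate_kbps is not None:
        if bitrate_kbps <= 128:     # 编码速率较低, 如 128kbps
            return 1
        if bitrate_kbps <= 384:     # 编码速率中等, 如 384kbps
            return 4
        return 7                    # 编码速率较高, 如 768kbps
    # 按编码器复杂度档位给出的示例取值
    return {"low": 1, "mid": 2, "high": 6}[complexity]

m = num_target_speakers(bitrate_kbps=384)
```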
302,在确定所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器满足设定条件时,根据所述前一帧的音频通道信号的第二编码参数确定当前帧的音频通道信号的第一编码参数。
示例性地,第一编码参数可以包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
例如,确定所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器满足设定条件,可以理解为确定所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器之间的邻近关系满足设定条件,或者理解为所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器之间邻近。邻近关系可以理解为第一目标虚拟扬声器与第二目标虚拟扬声器之间的空间位置关系,或者可以通过第一目标虚拟扬声器与第二目标虚拟扬声器之间的空间相关性表征邻近关系。
作为一种举例,设定条件是否满足可以通过第一目标虚拟扬声器的空间位置与第二目标虚拟扬声器的空间位置来确定。为了便于区分,将第一目标虚拟扬声器的空间位置称为第一空间位置,第二目标虚拟扬声器的空间位置称为第二空间位置。可以理解的是,第一目标虚拟扬声器可以包括M个虚拟扬声器,则第一空间位置可以包括M个虚拟扬声器中每个虚拟扬声器的空间位置。第二目标虚拟扬声器可以包括N个虚拟扬声器,则第二空间位置可以包括N个虚拟扬声器中每个虚拟扬声器的空间位置。M和N均为大于1的正整数。M与N可以相同,也可以不同。示例性地,目标虚拟扬声器的空间位置可以通过坐标或者序号或者HOA系数来表征。可选地,M=N。
一些可能的实施例中,所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器满足设定条件,可以包括第一空间位置与第二空间位置重叠,也可以理解为邻近关系满足设定条件。第一空间位置与第二空间位置重叠时,可以复用第二编码参数作为第一编码参数,即将前一帧的音频通道信号的编码参数作为当前帧的音频通道信号的编码参数。
在第一目标虚拟扬声器和第二目标虚拟扬声器均包括多个虚拟扬声器时,第一目标虚拟扬声器和第二目标虚拟扬声器包括的虚拟扬声器的数量相同,第一空间位置与第二空间 位置重叠,可以描述为第一目标虚拟扬声器包括的多个虚拟扬声器的空间位置与第二目标虚拟扬声器包括的多个虚拟扬声器的空间位置一一对应重叠。
比如,空间位置通过坐标来表征时,为了便于区分,将第一目标虚拟扬声器的坐标称为第一坐标,第二目标虚拟扬声器的坐标称为第二坐标,即第一空间位置包括第一目标虚拟扬声器的第一坐标,第二空间位置包括第二目标虚拟扬声器的第二坐标,则第一空间位置与第二空间位置重叠,即为第一坐标与第二坐标相同。应理解的是,当第一目标虚拟扬声器和第二目标虚拟扬声器均包括多个虚拟扬声器时,第一目标虚拟扬声器包括的多个虚拟扬声器的坐标与第二目标虚拟扬声器包括的多个虚拟扬声器的坐标一一对应相同。
再比如,空间位置通过虚拟扬声器的序号来表征时,为了便于区分,将第一目标虚拟扬声器的序号称为第一序号,第二目标虚拟扬声器的序号称为第二序号,即第一空间位置包括第一目标虚拟扬声器的第一序号,第二空间位置包括第二目标虚拟扬声器的第二序号,则第一空间位置与第二空间位置重叠,即为第一序号与第二序号相同。应理解的是,当第一目标虚拟扬声器和第二目标虚拟扬声器均包括多个虚拟扬声器时,第一目标虚拟扬声器包括的多个虚拟扬声器的序号与第二目标虚拟扬声器包括的多个虚拟扬声器的序号一一对应相同。
又比如,空间位置通过虚拟扬声器的HOA系数来表征时,为了便于区分,将第一目标虚拟扬声器的HOA系数称为第一HOA系数,第二目标虚拟扬声器的HOA系数称为第二HOA系数,即第一空间位置包括第一目标虚拟扬声器的第一HOA系数,第二空间位置包括第二目标虚拟扬声器的第二HOA系数,则第一空间位置与第二空间位置重叠,即为第一HOA系数与第二HOA系数相同。应理解的是,当第一目标虚拟扬声器和第二目标虚拟扬声器均包括多个虚拟扬声器时,第一目标虚拟扬声器包括的多个虚拟扬声器的HOA系数与第二目标虚拟扬声器包括的多个虚拟扬声器的HOA系数一一对应相同。
又一些可能的实施例中，所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器满足设定条件，可以包括第一空间位置与第二空间位置不重叠且第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内。也可以理解为邻近关系满足设定条件。例如，可以确定针对第一目标虚拟扬声器包括的第m个虚拟扬声器是否位于以第二目标虚拟扬声器包括的第n个虚拟扬声器为中心的设定范围内，m遍历小于或者等于M的正整数，n遍历小于或者等于N的正整数，以确定所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器是否满足设定条件。比如，当第一空间位置与第二空间位置不重叠时，如果第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内时，可以按照设定比例调整前一帧的音频通道信号的第二编码参数，获得当前帧的音频通道信号的第一编码参数。又比如，当第一空间位置与第二空间位置不重叠时，如果第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内时，当前帧的音频通道信号可以部分复用前一帧的音频通道信号的第二编码参数。例如，当前帧的音频通道信号中虚拟扬声器信号的编码参数复用前一帧的音频通道信号中虚拟扬声器信号的编码参数，当前帧的音频通道信号中的残差信号的编码参数不复用前一帧的音频通道信号中的残差信号的编码参数。又例如，当前帧的音频通道信号中虚拟扬声器信号的编码参数复用前一帧的音频通道信号中虚拟扬声器信号的编码参数，当前帧的音频通道信号中的残差信号的编码参数由按照设定比例调整前一帧的音频通道信号中的残差信号的编码参数获得。
以当前帧的音频通道信号包括两个虚拟扬声器信号,分别为H1,H2为例,第一目标虚拟扬声器包括两个虚拟扬声器,分别为虚拟扬声器1-1和虚拟扬声器1-2。以前一帧的音频通道信号包括两个虚拟扬声器信号,分别为FH1,FH2为例,第二目标虚拟扬声器包括两个虚拟扬声器,分别为虚拟扬声器2-1和虚拟扬声器2-2。虚拟扬声器1-1位于以虚拟扬声器2-1为中心的设定范围内,虚拟扬声器1-2位于以虚拟扬声器2-2为中心的设定范围内,则第一目标虚拟扬声器与第二目标虚拟扬声器的邻近关系满足设定条件。
比如，以第一空间位置包括第一坐标，第二空间位置包括第二坐标为例，虚拟扬声器的坐标通过(水平角azi，俯仰角ele)表示。虚拟扬声器1-1的坐标为(H1_Pos_azi，H1_Pos_ele)，虚拟扬声器1-2的坐标为(H2_Pos_azi，H2_Pos_ele)。虚拟扬声器2-1的坐标为(FH1_Pos_azi，FH1_Pos_ele)，虚拟扬声器2-2的坐标为(FH2_Pos_azi，FH2_Pos_ele)。当H1_Pos_azi∈[FH1_Pos_azi±TH1]且H1_Pos_ele∈[FH1_Pos_ele±TH2]且H2_Pos_azi∈[FH2_Pos_azi±TH3]且H2_Pos_ele∈[FH2_Pos_ele±TH4]时，第一目标虚拟扬声器与第二目标虚拟扬声器的邻近关系满足设定条件，即第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内。其中，TH1、TH2、TH3和TH4为用于表征设定范围的设定阈值。比如，TH1、TH2、TH3和TH4可以相同也可以不同，或者TH1=TH3，TH2=TH4。
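上述基于坐标的邻近判断可以用如下 Python 片段示意（阈值与坐标均为示例取值，且该示意未处理水平角的 360° 回绕，属于简化假设）：

```python
def within_range(cur, prev, th_azi, th_ele):
    """判断当前帧各虚拟扬声器坐标是否逐一落在以前一帧对应虚拟扬声器为中心、
    水平角 ±th_azi、俯仰角 ±th_ele 的设定范围内(坐标为(水平角, 俯仰角))。"""
    return all(abs(a - fa) <= th_azi and abs(e - fe) <= th_ele
               for (a, e), (fa, fe) in zip(cur, prev))

cur = [(30.0, 10.0), (110.0, 24.0)]
prev = [(32.0, 9.0), (108.0, 25.0)]
near = within_range(cur, prev, th_azi=5.0, th_ele=5.0)                       # 邻近
far = within_range(cur, [(90.0, -40.0), (200.0, 60.0)], th_azi=5.0, th_ele=5.0)  # 不邻近
```

判断为邻近时，编码器可按设定比例调整前一帧的编码参数得到当前帧的编码参数。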
比如,以第一空间位置包括第一序号,第二空间位置包括第二序号为例。虚拟扬声器1-1的序号为H1_Ind,虚拟扬声器1-2的序号为H2_Ind。虚拟扬声器2-1的序号为FH1_Ind,虚拟扬声器2-2的序号为FH2_Ind。当H1_Ind∈[FH1_Ind±TH5]且H2_Ind∈[FH2_Ind±TH6]时,第一目标虚拟扬声器与第二目标虚拟扬声器满足设定条件,即第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内。其中,TH5、TH6为用于表征设定范围的设定阈值。可选地,TH5=TH6。
比如，以第一空间位置包括第一HOA系数，第二空间位置包括第二HOA系数为例。虚拟扬声器1-1的HOA系数为H1_Coef，虚拟扬声器1-2的HOA系数为H2_Coef。虚拟扬声器2-1的HOA系数为FH1_Coef，虚拟扬声器2-2的HOA系数为FH2_Coef。当H1_Coef∈[FH1_Coef±TH7]且H2_Coef∈[FH2_Coef±TH8]时，第一目标虚拟扬声器与第二目标虚拟扬声器满足设定条件，即第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内。其中，TH7、TH8为用于表征设定范围的设定阈值。可选地，TH7=TH8。
在一些可能的实施例中,音频编码组件还可以通过确定第一目标虚拟扬声器与第二目标虚拟扬声器之间的相关度,确定第一目标虚拟扬声器与第二目标虚拟扬声器满足设定条件。
作为一种举例,音频编码组件可以根据第一目标虚拟扬声器的第一坐标与第二目标虚拟扬声器的第二坐标确定第一目标虚拟扬声器与第二目标虚拟扬声器之间的相关度。
比如,音频编码组件确定第一目标虚拟扬声器的第一坐标与第二目标虚拟扬声器的第二坐标相同时,相关度R=1。在该情况下,第一编码参数可以复用第二编码参数。
又比如,当音频编码组件确定第一目标虚拟扬声器的第一坐标与第二目标虚拟扬声器的第二坐标不完全相同时,可以通过如下公式(3)确定相关度。
R=norm(S(H_m，FH_n))    (3)
其中，R表示相关度，norm()表示归一化运算，S()表示确定距离的运算，H_m表示所述第一目标虚拟扬声器中第m个虚拟扬声器的坐标，FH_n表示所述第二目标虚拟扬声器中第n个虚拟扬声器的坐标。S(H_m，FH_n)表示确定第一目标虚拟扬声器包括的第m个虚拟扬声器与第二目标虚拟扬声器包括的第n个虚拟扬声器之间的距离。m遍历不大于N的正整数，n遍历不大于N的正整数。N为第一目标虚拟扬声器与第二目标虚拟扬声器包括的虚拟扬声器的个数。
又比如,当音频编码组件确定第一目标虚拟扬声器的第一坐标与第二目标虚拟扬声器的第二坐标不完全相同时,可以通过如下公式(4)确定相关度。
当前帧的第一目标虚拟扬声器中包括N个虚拟扬声器,分别为:H1,H2,…HN,前一帧的第二目标虚拟扬声器包括N个虚拟扬声器,分别为FH1,FH2,…FHN。
R=norm(M_H·M_FH^T)    (4)
其中，M_H为当前帧的第一目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵，M_FH^T为前一帧的第二目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵的转置。例如，M_H由H1，H2，…，HN的坐标按行排列组成，M_FH由FH1，FH2，…，FHN的坐标按行排列组成，M_FH^T为M_FH的转置。
又比如,根据所述第一目标虚拟扬声器的第一坐标以及所述第二目标虚拟扬声器的第二坐标确定的所述第一目标虚拟扬声器与所述第二目标虚拟扬声器之间的相关度满足如下公式(5)所示的条件:
R=norm(max(θ_i^H·θ_i^FH+φ_i^H·φ_i^FH))    (5)
其中，R表示相关度，norm()表示归一化运算，max()表示括号内元素取最大值运算，θ_i^H表示所述第一目标虚拟扬声器包括的第i个虚拟扬声器的水平角，θ_i^FH表示所述第二目标虚拟扬声器包括的第i个虚拟扬声器的水平角，φ_i^H表示所述第一目标虚拟扬声器包括的第i个虚拟扬声器的俯仰角，φ_i^FH表示所述第二目标虚拟扬声器包括的第i个虚拟扬声器的俯仰角。
当相关度不等于1且大于设定值时,第一编码参数可以部分复用第二编码参数,或者第一编码参数由按照设定比例调整第二编码参数获得。例如,设定值为大于0.5且小于1的数。
303,根据所述第一编码参数对所述当前帧的音频通道信号进行编码并写入码流。也可以描述为,根据所述第一编码参数对所述当前帧的音频通道信号进行编码获得编码结果,并将编码结果写入码流。
一些可能的实施例中,在第一目标虚拟扬声器的第一空间位置与第二目标虚拟扬声器的第二空间位置重叠时,复用第二编码参数作为第一编码参数对当前帧的音频通道信号进行编码并写入码流。
另一些可能的实施例中,当第一空间位置与第二空间位置不重叠时,如果第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内时,可以按照设定比例调整所述第二编码参数获得第一编码参数。
例如，设定比例通过α表示，当前帧的音频通道信号的第一编码参数=α*前一帧的音频通道信号的第二编码参数，其中α取值范围为(0,1)。第一编码参数可以包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。在一些示例中，不同的编码参数，α的取值可以不同。比如，通道间配对参数对应的α的取值为α1，通道间比特分配参数对应的α的取值为α2。
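按设定比例α由前一帧编码参数推导当前帧编码参数的过程，可以用如下 Python 片段示意（以数值型编码参数为例，参数名与α取值均为假设的示例）：

```python
def derive_params(prev_params, alphas):
    """按设定比例 α 由前一帧编码参数得到当前帧编码参数(不同参数可用不同 α)。"""
    return {name: alphas[name] * value for name, value in prev_params.items()}

# prev 为前一帧的第二编码参数(数值示例), alphas 为各参数对应的设定比例(α1, α2)
prev = {"ild": 1.2, "bit_share": 0.25}
cur = derive_params(prev, alphas={"ild": 0.9, "bit_share": 0.8})
```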
进一步地,音频编码组件还需要通过码流向音频解码组件通知当前帧的音频通道信号的第一编码参数。
一些实施例中,音频编码组件可以通过在码流中写入第一编码参数,以实现向音频解码组件通知当前帧的音频通道信号的第一编码参数。参见图3A所示,音频编码组件还执行304a,将第一编码参数写入码流。
结合图3A所述的编码方法,参见图4A所示,解码侧可以通过如下解码方法来解码。解码侧的方法可以由音频解码设备执行,也可以由音频解码组件执行,或者由核心编码器执行。后续以音频解码组件执行解码侧的方法为例。
405a,音频编码组件将码流发送到音频解码组件,从而音频解码组件接收到码流。
406a,音频解码组件从码流中解码获得第一编码参数。
407a,音频解码组件根据第一编码参数从码流中解码获得当前帧的音频通道信号。
另一些实施例中,音频编码组件可以通过在码流中写入复用标识,通过复用标识的不同取值来指示当前帧的音频通道信号的第一编码参数如何获得。参见图3B所示,音频编码组件还执行304b,将复用标识编入码流。复用标识用于指示当前帧的音频通道信号的第一编码参数通过前一帧的音频通道信号的第二编码参数确定。
一种可能的方式中，在第一目标虚拟扬声器的第一空间位置与第二目标虚拟扬声器的第二空间位置重叠时，复用标识为第一值，以指示所述当前帧的音频通道信号的第一编码参数复用所述第二编码参数。可选地，在该方式下，可以不在码流中写入该第一编码参数，减少资源占用，提高传输效率。可选地，在第一目标虚拟扬声器的第一空间位置与第二目标虚拟扬声器的第二空间位置不重叠时，复用标识为第三值，以指示当前帧的音频通道信号的第一编码参数不复用第二编码参数，可以在码流中写入确定的第一编码参数。该第一编码参数可以是根据第二编码参数确定的，也可以是通过计算获得的。比如，当第一空间位置与第二空间位置不重叠时，如果第一目标虚拟扬声器包括的多个虚拟扬声器一一对应位于以第二目标虚拟扬声器包括的多个虚拟扬声器为中心的设定范围内时，可以按照设定比例调整所述第二编码参数获得第一编码参数，然后将获得的第一编码参数写入码流以及将取值为第三值的复用标识写入码流。再比如，当第一目标虚拟扬声器与第二目标虚拟扬声器不满足设定条件时，可以计算当前帧的音频通道信号的第一编码参数，将第一编码参数写入码流，以及将取值为第三值的复用标识写入码流。例如，第一值为0，第三值为1，或者第一值为1，第三值为0。当然第一值、第三值还可以是其它的取值，本申请实施例对此不作限定。
另一种可能的方式中，在第一目标虚拟扬声器的第一空间位置与第二目标虚拟扬声器的第二空间位置重叠时，将复用标识写入码流，复用标识为第一值，以指示所述当前帧的音频通道信号的第一编码参数复用所述第二编码参数。在所述第一空间位置与所述第二空间位置不重叠但满足设定条件时，按照设定比例调整所述第二编码参数获得所述第一编码参数，并将复用标识写入码流中，复用标识取值为第二值，以指示所述当前帧的音频通道信号的第一编码参数通过按照设定比例调整所述第二编码参数获得。可选地，音频编码组件还可以将所述设定比例写入所述码流。在一些示例中，当第一目标虚拟扬声器与第二目标虚拟扬声器不满足设定条件时，可以计算当前帧的音频通道信号的第一编码参数，将第一编码参数写入码流，以及将取值为第三值的复用标识写入码流。例如，第一值为11，第二值为01，第三值为00。当然第一值、第二值、第三值还可以是其它的取值，本申请实施例对此不作限定。
结合图3B对应编码方法,参见图4B所示,解码侧可以通过如下解码方法来解码。解码侧的方法可以由音频解码设备执行,也可以由音频解码组件执行,或者由核心编码器执行。后续以音频解码组件执行解码侧的方法为例。
405b,音频编码组件将码流发送到音频解码组件,从而音频解码组件接收到码流。
406b,音频解码组件从码流中解码获得复用标识。
407b,当复用标识指示当前帧的音频通道信号的第一编码参数通过前一帧的音频通道信号的第二编码参数确定时,音频解码组件根据第二编码参数确定第一编码参数。
408b,根据第一编码参数从码流中解码获得当前帧的音频通道信号。
在一些场景中,复用标识可以包括两种取值,比如,复用标识的取值为第一值,以指示当前帧的音频通道信号的第一编码参数复用第二编码参数。复用标识的取值为第三值,指示当前帧的音频通道的第一编码参数不复用第二编码参数。音频解码组件从码流中解码获得复用标识,当复用标识的取值为第一值时,复用第二编码参数作为第一编码参数,根据复用的第二编码参数从码流中解码获得当前帧的音频通道信号。当复用标识的取值为第三值时,从码流中解码获得当前帧的音频通道信号的第一编码参数,然后根据解码获得的第一编码参数从码流中解码获得当前帧的音频通道信号。
在另一些场景中，复用标识可以包括两种以上取值，复用标识为第一值，以指示所述当前帧的音频通道信号的第一编码参数复用所述第二编码参数。复用标识取值为第二值，以指示按照设定比例调整所述第二编码参数获得所述第一编码参数。复用标识取值为第三值，指示从码流中解码获得第一编码参数。音频解码组件从码流中解码获得复用标识，当复用标识的取值为第一值时，复用第二编码参数作为第一编码参数，根据复用的第二编码参数从码流中解码获得当前帧的音频通道信号。当复用标识的取值为第二值时，根据设定比例调整第二编码参数获得第一编码参数，然后根据获得的第一编码参数从码流中解码获得当前帧的音频通道信号。可选地，设定比例可以是预先配置于音频解码组件中的，音频解码组件可以获得配置的设定比例，从而根据设定比例调整第二编码参数获得第一编码参数。设定比例也可以由音频编码组件写入码流，音频解码组件可以从码流中解码获得设定比例。当复用标识的取值为第三值时，从码流中解码获得当前帧的音频通道信号的第一编码参数，然后根据解码获得的第一编码参数从码流中解码获得当前帧的音频通道信号。
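解码侧根据复用标识的取值分三种情况确定第一编码参数，可以用如下 Python 片段示意（标识取值沿用文中 11/01/00 的示例；read_param、read_ratio 代表从码流解码相应字段的操作，为假设的接口）：

```python
FIRST, SECOND, THIRD = "11", "01", "00"   # 复用标识的三种取值(示例取值)

def decode_param(flag, prev_param, read_param, read_ratio):
    """按复用标识取值决定当前帧编码参数的来源:
    第一值=直接复用前一帧参数, 第二值=按设定比例调整, 第三值=从码流解码。"""
    if flag == FIRST:
        return prev_param                  # 复用第二编码参数
    if flag == SECOND:
        return read_ratio() * prev_param   # 按码流中的设定比例调整
    return read_param()                    # 直接从码流解码第一编码参数

p1 = decode_param(FIRST, prev_param=0.5, read_param=lambda: 0.7, read_ratio=lambda: 0.8)
p2 = decode_param(SECOND, prev_param=0.5, read_param=lambda: 0.7, read_ratio=lambda: 0.8)
p3 = decode_param(THIRD, prev_param=0.5, read_param=lambda: 0.7, read_ratio=lambda: 0.8)
```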
在一些可能的实施例中,第一编码参数包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
在第一编码参数包括多个参数时,针对不同参数可以采用一个复用标识,还可以针对多个参数采用不同的复用标识。
以针对不同参数采用相同的复用标识为例，当复用标识为第一值时，指示第一编码参数包括的参数均复用前一帧的音频通道信号的第二编码参数。
下面针对不同的参数可以采用不同的复用标识进行描述。
作为一种举例，第一编码参数包括通道间配对参数。比如，通过复用标识Flag_1来指示当前帧的音频通道信号的通道间配对参数是否复用前一帧的音频通道信号的通道间配对参数。例如，Flag_1=1时，指示当前帧的音频通道信号的通道间配对参数复用前一帧的音频通道信号的通道间配对参数；Flag_1=0时，指示当前帧的音频通道信号的通道间配对参数不复用前一帧的音频通道信号的通道间配对参数。又例如，Flag_1=11时，指示当前帧的音频通道信号的通道间配对参数复用前一帧的音频通道信号的通道间配对参数；Flag_1=00时，指示当前帧的音频通道信号的通道间配对参数不复用前一帧的音频通道信号的通道间配对参数；Flag_1=01(或者10)，指示当前帧的音频通道信号的通道间配对参数由按照设定比例调整前一帧的音频通道信号的通道间配对参数获得，或者指示当前帧的音频通道信号的通道间配对参数部分复用前一帧的音频通道信号的通道间配对参数。
作为另一种举例,第一编码参数包括通道间听觉空间参数。通道间听觉空间参数中包括ILD、IPD或者ITD中的一项或者多项。
一种可能的方式中,通道间听觉空间参数包括多项参数时,一个复用标识可以指示当前帧的音频通道信号的通道间听觉空间参数包括的多个参数是否复用前一帧的音频通道信号的通道间听觉空间参数。
比如,以通道间听觉空间参数包括ILD、IPD和ITD为例。通过复用标识Flag_2来指示当前帧的音频通道信号的通道间听觉空间参数(包括ILD、IPD和ITD)是否复用前一帧的音频通道信号的通道间听觉空间参数。例如,Flag_2=1时,指示当前帧的音频通道信号的通道间听觉空间参数复用前一帧的音频通道信号的通道间听觉空间参数;Flag_2=0时,指示当前帧的音频通道信号的通道间听觉空间参数不复用前一帧的音频通道信号的通道间听觉空间参数。又例如,Flag_2=11时,指示当前帧的音频通道信号的通道间听觉空间参数复用前一帧的音频通道信号的通道间听觉空间参数;Flag_2=00时,指示当前帧的音频通道信号的通道间听觉空间参数不复用前一帧的音频通道信号的通道间听觉空间参数;Flag_2=01(或者10),指示当前帧的音频通道信号的通道间听觉空间参数由按照设定比例调整前一帧的音频通道信号的通道间听觉空间参数获得,或者指示当前帧的音频通道信号的通道间听觉空间参数部分复用前一帧的音频通道信号的通道间听觉空间参数。
另一种可能的方式中,通道间听觉空间参数包括多项参数时,不同的参数采用不同的复用标识。以通道间听觉空间参数包括ILD、IPD和ITD为例。通过复用标识Flag_2-1来指示当前帧的音频通道信号的ILD是否复用前一帧的音频通道信号的ILD。通过复用标识Flag_2-2来指示当前帧的音频通道信号的ITD是否复用前一帧的音频通道信号的ITD。通过复用标识Flag_2-3来指示当前帧的音频通道信号的IPD是否复用前一帧的音频通道信号的IPD。
作为又一种举例,第一编码参数包括通道间比特分配参数。比如,通过复用标识Flag_3来指示当前帧的音频通道信号的通道间比特分配参数是否复用前一帧的音频通道信号的通道间比特分配参数。例如,Flag_3=1时,指示当前帧的音频通道信号的通道间比特分配参数复用前一帧的音频通道信号的通道间比特分配参数;Flag_3=0时,指示当前帧的音频通道信号的通道间比特分配参数不复用前一帧的音频通道信号的通道间比特分配参数。又例如,Flag_3=11时,指示当前帧的音频通道信号的通道间比特分配参数复用前一帧的音频通道信号的通道间比特分配参数;Flag_3=00时,指示当前帧的音频通道信号的通道间比特分配参数不复用前一帧的音频通道信号的通道间比特分配参数;Flag_3=01(或者10),指示当前帧的音频通道信号的通道间比特分配参数由按照设定比例调整前一帧的音频通道信号的通道间比特分配参数获得,或者指示当前帧的音频通道信号的通道间比特分配参数部分复用前一帧的音频通道信号的通道间比特分配参数。
如下对本申请实施例涉及的虚拟扬声器的HOA系数的生成过程进行示例性地说明。 虚拟扬声器的HOA系数的生成还可以采用其它的方式,本申请实施例对此不作具体限定。
以声波在理想介质中传播为例，波数为k=w/c，角频率w=2πf，f为声波频率，c为声速。则声压p满足如下公式(6)，其中∇²为拉普拉斯算子：
∇²p+k²p=0    (6)
在球坐标下求解公式(6)所示的方程中的p，在无源球形区域内，该方程的解p可以表达为如下公式(7)：
p(r，θ，φ，k)=s·∑_{m=0}^{∞}(2m+1)·j^m·j_m(kr)·∑_{n=-m}^{m}Y_{m,n}^σ(θ，φ)·Y_{m,n}^σ(θ_s，φ_s)    (7)
在上述公式(7)中，r表示球半径，θ表示水平角，φ表示俯仰角，k表示波数，s为理想平面波的幅度，m为HOA阶数的序号，j_m(kr)是球贝塞尔函数，又称径向基函数，j^m中第一个j表示虚数单位，j^m部分不随角度变化。Y_{m,n}^σ(θ，φ)即为θ，φ方向的球谐函数，Y_{m,n}^σ(θ_s，φ_s)是声源方向的球谐函数。
其Ambisonics系数可以表示为公式(8)：
B_{m,n}^σ=s·Y_{m,n}^σ(θ_s，φ_s)    (8)
根据公式(8)进一步获得公式(7)对应的展开形式如公式(9)所示：
p(r，θ，φ，k)=∑_{m=0}^{∞}∑_{n=-m}^{m}(2m+1)·j^m·j_m(kr)·B_{m,n}^σ·Y_{m,n}^σ(θ，φ)    (9)
公式(9)表明声场可以在球面上按球谐函数展开，使用系数B_{m,n}^σ进行表示。或者，已知系数B_{m,n}^σ，可以根据公式(9)重建声场。将上式截断到第N项，以系数B_{m,n}^σ作为对声场的近似描述，则称为N阶的HOA系数，该HOA系数也可以称为Ambisonics系数。P阶Ambisonics系数共有(P+1)²个通道。其中，一阶以上的Ambisonics信号也称为HOA信号。在一种可能的配置下，HOA阶数可以为2至10阶。将球谐函数按照HOA信号一个采样点对应的系数进行叠加，就能实现该采样点对应的时刻空间声场的重构。
根据上述描述可以生成虚拟扬声器的HOA系数。将公式(8)中的θ_s和φ_s设置为虚拟扬声器的坐标，即水平角(θ_s)和俯仰角(φ_s)，根据公式(8)可以获得该扬声器的HOA系数，也称作Ambisonics系数。
对于3阶HOA信号，令理想平面波的幅度s=1，其对应的16通道HOA系数可以通过球谐函数Y_{l,m}(θ，φ)获得，3阶HOA信号对应的16通道HOA系数计算公式具体如表1所示。
表1：3阶HOA信号16个通道各自的球谐函数表达式(原文以图片形式按极坐标给出)
其中表1中，θ表示扬声器水平角，φ表示扬声器的俯仰角。l表示HOA阶数，l=0,1…P；m表示每一阶中的方向参数，m=-l,…,l。按照表1中极坐标中的表达式，可以根据扬声器位置坐标，获得3阶HOA信号对应的16通道系数。
下面对当前帧的目标虚拟扬声器的确定方法以及音频通道信号的生成方法进行示例性地说明。当前帧的目标虚拟扬声器的确定以及音频通道信号的生成还可以采用其它的方式,本申请实施例对此不作具体限定。
A1,音频编码组件确定第一目标虚拟扬声器包括的虚拟扬声器的个数和音频通道信号包括的虚拟扬声器信号的个数。
第一目标虚拟扬声器的个数M不能超过虚拟扬声器总个数，比如，虚拟扬声器集合包括1024个虚拟扬声器，虚拟扬声器信号的个数K(编码器要传输的虚拟扬声器信号)不能超过第一目标虚拟扬声器个数M。
其中,第一目标虚拟扬声器包括的虚拟扬声器的个数M可以与编码速率相关,也可以与编码器复杂度相关,也可以通过用户指定。例如,当速率较低时,例如等于128kbps时,M=1,当速率中等时,例如等于384kbps时,M=4,当速率较高时,例如等于768kbps时,M=7;当编码器复杂度较低时,M=1,当编码器复杂度中等时,M=2,当编码器复杂度较高时,M=6。又例如:当编码速率为128kbps时,且编码复杂度要求较低时,M=1。
可选地，第一目标虚拟扬声器的个数M也可以通过场景信号类型参数获得。例如，场景信号类型参数可以是对当前帧的待编码HOA信号进行SVD分解后的特征值。通过场景信号类型参数可以获得声场中包含不同方向的声源个数d，第一目标虚拟扬声器的个数M满足1≤M≤d。
A2,根据待编码的HOA信号、候选虚拟扬声器集合确定第一目标虚拟扬声器中的虚拟扬声器。
首先，计算待编码HOA信号第j个频点的第i轮次的扬声器投票值P_{jil}，确定第j个频点的第i轮次的匹配扬声器序号g_{j,i}及其对应的投票值P_{j,i,g_{j,i}}。
可以先根据当前帧的待编码HOA信号确定代表点，然后根据待编码HOA信号的代表点计算扬声器投票值。也可以直接根据当前帧的待编码HOA信号的每一个点计算扬声器投票值。代表点可以是时域上的代表样点，也可以是频域上的代表频点。
第i轮次中扬声器集合可以是虚拟扬声器集合,包含Q个虚拟扬声器;也可以按照预先设定的规律从虚拟扬声器集合中选出的子集。不同轮次中使用的扬声器集合可以相同也可以不同。
本实施例以采用待编码HOA信号的L’个代表频点、使用虚拟扬声器集合作为每一轮计算投票值的扬声器为例,给出一种扬声器投票值计算方法:扬声器投票值通过待编码信号的HOA系数与扬声器的HOA系数的投影获得。
具体的步骤包括:
(1)计算待编码信号第j个频点的HOA系数与第l个扬声器的HOA系数的投影值，获得第i轮第l个扬声器的投票值P_{jil}，l=1,2…Q。
以下给出一种求取投影值的实施方法：
P_{jil}=log(E_{jil})或P_{jil}=E_{jil}，其中E_{jil}=(a_j^T·b_l(θ，φ))²
其中，θ为方位角，φ为俯仰角，a_j为待编码信号第j个频点的HOA系数，b_l(θ，φ)为第l个扬声器的HOA系数，l=1,2…Q，Q为扬声器总个数。
(2)根据投票值P_{jil}，l=1,2…Q，获得第j个频点对应的第i轮投票的匹配扬声器g_{j,i}。
例如，第j个频点对应的第i轮投票的匹配扬声器g_{j,i}的选取准则为：从第j个频点对应的第i轮投票的Q个扬声器对应的投票值中选取投票值的绝对值最大的扬声器为第j个频点第i轮投票的匹配扬声器，其序号为g_{j,i}。当l=g_{j,i}时，取得投票值P_{j,i,g_{j,i}}。
(3)若i小于投票轮次数I，则从待编码的第j个频点的HOA信号中减去第j个频点的第i轮投票选中的扬声器的HOA系数，作为第j个频点下一轮次计算扬声器投票值所需的待编码HOA信号：
a_j^{(i+1)}=a_j^{(i)}-w·E_{jig}·b_{g_{j,i}}
其中，E_{jig}为第j个频点第i轮投票的匹配扬声器的投票值，上述公式右侧的a_j^{(i)}为用于第j个频点对应的第i轮投票的待编码信号的HOA系数，公式左侧的a_j^{(i+1)}为用于第j个频点对应的第i+1轮投票的待编码信号的HOA系数，w为权值，可以是预先设定的满足0≤w≤1的值。除此之外，给出一种自适应权值计算方法：
w=E_{jig}/norm(b_{g_{j,i}})²
其中，norm为求取二范数运算，b_{g_{j,i}}为第j个频点第i轮投票的匹配扬声器的HOA系数。
(4)重复(1)至(3)，直到计算出第j个频点的各个轮次匹配扬声器的投票值P_{j,i,g_{j,i}}，i=1,2,…,I。
(5)重复(1)至(4)，直到计算出所有频点的匹配扬声器的投票值P_{j,i,g_{j,i}}，i=1,2,…,I，j=1,2,…,L'。
其次，根据各个代表频点在各个轮次的匹配扬声器序号g_{j,i}及其对应的投票值P_{j,i,g_{j,i}}，计算各个匹配扬声器的总投票值VOTE_g：VOTE_g=∑P_{jig}或VOTE_g=VOTE_g+P_{jig}。
具体实现为对匹配扬声器的序号相等的所有匹配扬声器的投票值P_{j,i,g_{j,i}}进行累加，以获得该匹配扬声器对应的总投票值。
根据匹配扬声器的总投票值确定最佳匹配扬声器集合。具体地可以是，对所有匹配扬声器的总投票值VOTE_g进行选择，根据总投票值VOTE_g的大小选出C个投票胜出的匹配扬声器作为最佳匹配扬声器集合，进而获得最佳匹配扬声器集合的位置坐标(θ_{g1}，φ_{g1})，(θ_{g2}，φ_{g2})，…，(θ_{gC}，φ_{gC})。
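上述A2步骤中基于投影投票选取最佳匹配扬声器的流程，可以用如下 Python 片段示意（投票值取投影能量、残差更新中的权值取法均为假设的简化实现，仅用于说明多轮投票与总投票值累加的流程）：

```python
import numpy as np

def select_best_speakers(x, f, rounds, top_c):
    """匹配投影投票的示意实现。
    x: (L, K) 各频点的待编码 HOA 系数; f: (Q, K) 各候选虚拟扬声器的 HOA 系数。
    每轮对每个频点选投影能量最大的扬声器投票, 并从残差中减去其加权贡献,
    最后按总投票值 VOTE_g 选出 top_c 个最佳匹配扬声器的序号。"""
    x = np.array(x, dtype=float)             # 复制, 避免修改调用方数据
    f = np.asarray(f, dtype=float)
    votes = np.zeros(f.shape[0])
    for _ in range(rounds):
        proj = x @ f.T                       # (L, Q) 各频点对各扬声器的投影值
        e = proj ** 2                        # 投票值(取投影能量, 示意)
        g = np.argmax(e, axis=1)             # 各频点本轮的匹配扬声器序号 g_{j,i}
        for j, gj in enumerate(g):
            votes[gj] += e[j, gj]            # 累加该扬声器的总投票值 VOTE_g
            w = proj[j, gj] / np.sum(f[gj] ** 2)   # 权值(假设的一种取法)
            x[j] -= w * f[gj]                # 更新残差, 供下一轮投票
    return np.argsort(-votes)[:top_c]

# 两个正交的候选方向加一个中间方向, 信号能量主要落在 0 号方向上
f = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
x = np.array([[2.0, 0.1], [1.5, 0.0]])
best = select_best_speakers(x, f, rounds=2, top_c=1)
```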
A3，根据最佳匹配扬声器集合的位置坐标，计算最佳匹配扬声器集合的HOA系数矩阵A=[f_{g1}，f_{g2}，…，f_{gC}]。
A4，根据最佳匹配扬声器集合的HOA系数矩阵A和待编码信号的HOA系数X，计算虚拟扬声器信号H：H=A^{-1}X。
其中，A^{-1}代表矩阵A的逆矩阵，矩阵A的大小为(M×C)，C为投票胜出扬声器个数，M为N阶的HOA系数的声道个数，M=(N+1)²，矩阵A的元素a表示最佳匹配扬声器的HOA系数；X代表待编码信号的HOA系数，矩阵X的大小为(M×L)，L为频点个数，矩阵X的元素x表示待编码信号的HOA系数。
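A4步骤中由H=A^{-1}X计算虚拟扬声器信号，当A非方阵(M≠C)时可用伪逆实现（一种假设的实现方式）。下述片段构造了一个可精确还原的小例子：

```python
import numpy as np

# A: 最佳匹配扬声器集合的 HOA 系数矩阵(每列为一个胜出扬声器的 HOA 系数 f_gc)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])          # 大小 (M=3) x (C=2)
# X: 待编码信号的 HOA 系数, 大小 (M=3) x (L=2 个频点); 此例中 X 恰好位于 A 的列空间内
X = np.array([[2.0, 1.0],
              [1.0, 0.0],
              [1.5, 0.5]])
# A 非方阵时以伪逆代替 A^{-1}(最小二乘意义下的解), H 为虚拟扬声器信号, 大小 (C, L)
H = np.linalg.pinv(A) @ X
```

本例中 X=A·[[2,1],[1,0]]，故伪逆可精确还原虚拟扬声器信号。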
下面结合具体场景,对本申请实施例提供的编码方法流程进行描述。以音频编码组件包括空间编码器和核心编码器为例。
B1,空间编码器针对待编码的HOA信号进行空间编码处理获得当前帧的音频通道信号和当前帧的音频通道的第一目标虚拟扬声器的属性信息,并传输给核心编码器。第一目标虚拟扬声器的属性信息包括第一目标虚拟扬声器的坐标、序号或者HOA系数中的一项或者多项。
B2,核心编码器针对音频通道信号进行核心编码处理获得码流。
核心编码处理可以包括且不限于变换、心理声学模型处理、下混处理、带宽扩展、量化和熵编码等,核心编码处理可以对频域的音频通道信号进行处理也可以对时域的音频通道信号进行处理,此处不做限定。
下混处理采用的编码参数可以包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。即在进行下混处理时，可以包括通道间配对处理、通道信号调整处理、通道间比特分配处理等。
示例性地,参见图5所示,为一种可能的编码流程示意图。
待编码的HOA信号经过空间编码器处理后输出当前帧的音频通道信号和当前帧的音频通道的第一目标虚拟扬声器的属性信息。以音频通道信号为时域信号为例。核心编码器对音频通道信号进行暂态检测，然后对暂态检测后的信号进行加窗变换获得频域信号。进一步针对频域信号进行噪声整形处理获得整形后的音频通道信号。然后对噪声整形处理后的音频通道信号进行下混处理，可以包括通道间配对操作、通道信号调整、通道间信号比特分配操作。本申请实施例不对通道间配对操作、通道信号调整、通道间信号比特分配操作的处理先后顺序进行具体限定。参见图5所示，以先执行通道间配对处理为例，具体根据通道间配对参数来执行通道间配对处理，并将通道间配对参数和/或复用标识编入码流。可以根据当前帧的第一目标虚拟扬声器的属性信息(第一目标虚拟扬声器的坐标、序号或者HOA系数)以及前一帧的第二目标虚拟扬声器的属性信息(第二目标虚拟扬声器的坐标、序号或者HOA系数)确定当前帧的通道间配对参数是否复用前一帧的通道间配对参数。根据确定的当前帧的通道间配对参数对当前帧的噪声整形处理后的音频通道信号进行通道间配对处理获得配对后的音频通道信号。然后针对配对后的音频通道信号进行通道信号调整，比如可以根据通道间听觉空间参数对配对后的音频通道信号进行通道信号调整获得调整后的音频通道信号，并将通道间听觉空间参数和/或复用标识编入码流。可以根据当前帧的第一目标虚拟扬声器的属性信息(第一目标虚拟扬声器的坐标、序号或者HOA系数)以及前一帧的第二目标虚拟扬声器的属性信息(第二目标虚拟扬声器的坐标、序号或者HOA系数)确定当前帧的通道间听觉空间参数是否复用前一帧的通道间听觉空间参数。进一步地，根据通道间比特分配参数对调整后的音频通道信号进行通道间比特分配处理，并将通道间比特分配参数和/或复用标识编入码流。可以根据当前帧的第一目标虚拟扬声器的属性信息(第一目标虚拟扬声器的坐标、序号或者HOA系数)以及前一帧的第二目标虚拟扬声器的属性信息(第二目标虚拟扬声器的坐标、序号或者HOA系数)确定当前帧的通道间比特分配参数是否复用前一帧的通道间比特分配参数。经过通道间比特分配处理后，可以进一步执行量化、熵编码以及带宽调整获得码流。
根据与上述方法相同的发明构思，本申请实施例提供一种音频编码装置。参见图6所示，音频编码装置可以包括空间编码单元601，用于获得当前帧的音频通道信号，所述当前帧的音频通道信号是通过第一目标虚拟扬声器对原始高阶立体混响HOA信号进行空间映射获得的；核心编码单元602，用于在确定所述第一目标虚拟扬声器与所述当前帧的前一帧的音频通道信号对应的第二目标虚拟扬声器满足设定条件时，根据所述前一帧的音频通道信号的第二编码参数确定当前帧的音频通道信号的第一编码参数；根据所述第一编码参数对所述当前帧的音频通道信号进行编码并写入码流。
在一种可能的设计中,所述核心编码单元602,还用于将所述第一编码参数写入码流。
在一种可能的设计中,所述第一编码参数包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
在一种可能的设计中,所述设定条件包括所述第一空间位置与所述第二空间位置重叠;所述核心编码单元602,具体用于将所述前一帧的音频通道信号的第二编码参数作为所述当前帧的音频通道信号的第一编码参数。
在一种可能的设计中,所述核心编码单元602,还用于将复用标识写入码流,所述复用标识的取值为第一值,所述第一值指示所述当前帧的音频通道信号的第一编码参数复用所述第二编码参数。
在一种可能的设计中,所述第一空间位置包括所述第一目标虚拟扬声器的第一坐标,所述第二空间位置包括所述第二目标虚拟扬声器的第二坐标,所述第一空间位置与所述第二空间位置重叠包括所述第一坐标与所述第二坐标相同;或所述第一空间位置包括所述第一目标虚拟扬声器的第一序号,所述第二空间位置包括所述第二目标虚拟扬声器的第二序号,所述第一空间位置与所述第二空间位置重叠包括所述第一序号与所述第二序号相同;或所述第一空间位置包括所述第一目标虚拟扬声器的第一HOA系数,所述第二空间位置包括所述第二目标虚拟扬声器的第二HOA系数,所述第一空间位置与所述第二空间位置重叠包括所述第一HOA系数与所述第二HOA系数相同。
在一种可能的设计中,所述第一目标虚拟扬声器包括M个虚拟扬声器,所述第二目标虚拟扬声器包括N个虚拟扬声器;设定条件包括所述第一空间位置与所述第二空间位置不重叠且所述第一目标虚拟扬声器包括的第m个虚拟扬声器位于以所述第二目标虚拟扬声器包括的第n个虚拟扬声器为中心的设定范围内,其中,m遍历小于或者等于M的正整数,n遍历小于或者等于N的正整数;所述核心编码单元602,具体用于按照设定比例调整所述第二编码参数获得所述第一编码参数。
在一种可能的设计中，当所述第一空间位置包括所述第一目标虚拟扬声器的第一坐标，所述第二空间位置包括所述第二目标虚拟扬声器的第二坐标时，所述第m个虚拟扬声器是否位于以所述第n个虚拟扬声器为中心的设定范围内通过所述第m个虚拟扬声器与所述第n个虚拟扬声器之间的相关度确定，其中，所述相关度满足如下条件：
R=norm(M_H·M_FH^T)
其中，R表示相关度，norm()表示归一化运算，M_H为当前帧的第一目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵，M_FH^T为前一帧的第二目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵的转置；
当所述相关度大于设定值时,所述第m个虚拟扬声器位于以所述第n个虚拟扬声器为中心的设定范围内,其中,m遍历小于或者等于M的正整数,n遍历小于或者等于N的正整数。
在一种可能的设计中,所述核心编码单元602,还用于将复用标识写入码流,所述复用标识的取值为第二值,所述第二值指示所述当前帧的音频通道信号的第一编码参数通过按照设定比例调整所述第二编码参数获得。
在一种可能的设计中,所述核心编码单元,还用于将所述设定比例写入所述码流。
根据与上述方法相同的发明构思,本申请实施例提供一种音频解码装置。参见图7所示,音频解码装置可以包括核心解码单元701,用于从码流中解析复用标识,所述复用标识指示当前帧的音频通道信号的第一编码参数通过所述当前帧的前一帧的音频通道信号的第二编码参数确定;根据所述前一帧的音频通道信号的第二编码参数确定所述第一编码参数;根据所述第一编码参数从所述码流中解码所述当前帧的音频通道信号;空间解码单元702,用于对所述音频通道信号进行空间解码获得高阶立体混响HOA信号。
在一种可能的设计中,所述核心解码单元701,具体用于当所述复用标识的取值为第一值时,所述第一值指示所述第一编码参数复用所述第二编码参数,获得所述第二编码参数作为所述第一编码参数。
在一种可能的设计中,所述核心解码单元701,具体用于当所述复用标识的取值为第二值时,所述第二值指示所述第一编码参数通过按照设定比例调整所述第二编码参数获得,按照设定比例调整所述第二编码参数获得所述第一编码参数。
在一种可能的设计中,所述核心解码单元701,具体用于当所述复用标识的取值为第二值时,从所述码流中解码获得所述设定比例。
在一种可能的设计中,所述音频通道信号的编码参数包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
示例性地,在解码端,图7中,核心解码单元701的位置对应于图2B中核心解码器230的位置,换言之,核心解码单元701的功能的具体实现可以参见图2B中的核心解码器230的具体细节。空间解码单元702的位置对应于图2B中空间解码器240的位置,换言之,空间解码单元702的功能的具体实现可以参见图2B中空间解码器240的具体细节。
示例性地,在编码端,图6中,空间编码单元601的位置对应于图2A中空间编码器210的位置,换言之,空间编码单元601的功能的具体实现可以参见图2A中空间编码器210的具体细节。核心编码单元602的位置对应于图2A中核心编码器220的位置,换言之,核心编码单元602的功能的具体实现可以参见图2A中核心编码器220的具体细节。
还需要说明的是，空间编码单元601、核心编码单元602的具体实现过程可参考图3A、图3B或者图5实施例的详细描述，为了说明书的简洁，这里不再赘述。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由根据硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM 或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (33)

  1. 一种音频编码方法,其特征在于,包括:
    获得当前帧的音频通道信号,所述当前帧的音频通道信号是通过第一目标虚拟扬声器对原始高阶立体混响HOA信号进行空间映射获得的;
    在确定所述第一目标虚拟扬声器与第二目标虚拟扬声器满足设定条件时,根据所述当前帧的前一帧的音频通道信号的第二编码参数确定所述当前帧的音频通道信号的第一编码参数,所述前一帧的音频通道信号与所述第二目标虚拟扬声器对应;
    根据所述第一编码参数对所述当前帧的音频通道信号进行编码;
    将所述当前帧的音频通道信号的编码结果写入码流。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    将所述第一编码参数写入码流。
  3. 如权利要求1或2所述的方法,其特征在于,所述第一编码参数包括通道间配对参数、通道间听觉空间参数或者通道间比特分配参数中的一项或者多项。
  4. 如权利要求1-3任一项所述的方法,其特征在于,所述设定条件包括所述第一目标虚拟扬声器的第一空间位置与所述第二目标虚拟扬声器的第二空间位置重叠;
    所述根据所述前一帧的音频通道信号的第二编码参数确定当前帧的音频通道信号的第一编码参数,包括:
    将所述前一帧的音频通道信号的第二编码参数作为所述当前帧的音频通道信号的第一编码参数。
  5. 如权利要求4所述的方法,其特征在于,所述方法还包括:
    将复用标识写入码流,所述复用标识的取值为第一值,所述第一值指示所述当前帧的音频通道信号的第一编码参数复用所述第二编码参数。
  6. 如权利要求4或5所述的方法,其特征在于,所述第一空间位置包括所述第一目标虚拟扬声器的第一坐标,所述第二空间位置包括所述第二目标虚拟扬声器的第二坐标,所述第一空间位置与所述第二空间位置重叠包括所述第一坐标与所述第二坐标相同;
    或所述第一空间位置包括所述第一目标虚拟扬声器的第一序号，所述第二空间位置包括所述第二目标虚拟扬声器的第二序号，所述第一空间位置与所述第二空间位置重叠包括所述第一序号与所述第二序号相同；
    或所述第一空间位置包括所述第一目标虚拟扬声器的第一HOA系数，所述第二空间位置包括所述第二目标虚拟扬声器的第二HOA系数，所述第一空间位置与所述第二空间位置重叠包括所述第一HOA系数与所述第二HOA系数相同。
  7. 如权利要求1-6任一项所述的方法,其特征在于,所述第一目标虚拟扬声器包括M个虚拟扬声器,所述第二目标虚拟扬声器包括N个虚拟扬声器;
    所述设定条件包括:所述第一目标虚拟扬声器的第一空间位置与所述第二目标虚拟扬声器的第二空间位置不重叠,且所述第一目标虚拟扬声器包括的第m个虚拟扬声器位于以所述第二目标虚拟扬声器包括的第n个虚拟扬声器为中心的设定范围内,其中,m遍历小于或者等于M的正整数,n遍历小于或者等于N的正整数;
    所述根据所述前一帧的音频通道信号的第二编码参数确定当前帧的音频通道信号的第一编码参数,包括:
    按照设定比例调整所述第二编码参数获得所述第一编码参数。
  8. 如权利要求7所述的方法,其特征在于,当所述第一空间位置包括所述第一目标虚拟扬声器的第一坐标,所述第二空间位置包括所述第二目标虚拟扬声器的第二坐标时,所述第m个虚拟扬声器是否位于以所述第n个虚拟扬声器为中心的设定范围内通过所述第m个虚拟扬声器与所述第n个虚拟扬声器之间的相关度确定,其中,所述相关度满足如下条件:
    R=norm(M_H·M_FH^T)
    其中，R表示相关度，norm()表示归一化运算，M_H为当前帧的第一目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵，M_FH^T为前一帧的第二目标虚拟扬声器包括的虚拟扬声器的坐标组成的矩阵的转置；
    当所述相关度大于设定值时,所述第m个虚拟扬声器位于以所述第n个虚拟扬声器为中心的设定范围内。
  9. 如权利要求7或8所述的方法,其特征在于,所述方法还包括:
    将复用标识写入码流,所述复用标识的取值为第二值,所述第二值指示所述当前帧的音频通道信号的第一编码参数通过按照设定比例调整所述第二编码参数获得。
  10. 如权利要求7-9任一项所述的方法,其特征在于,所述方法还包括:将所述设定比例写入所述码流。
  11. An audio decoding method, comprising:
    parsing a multiplexing flag from a bitstream, wherein the multiplexing flag indicates that a first encoding parameter of an audio channel signal of a current frame is determined based on a second encoding parameter of an audio channel signal of a previous frame of the current frame;
    determining the first encoding parameter based on the second encoding parameter of the audio channel signal of the previous frame; and
    decoding the audio channel signal of the current frame from the bitstream based on the first encoding parameter.
  12. The method according to claim 11, wherein the determining the first encoding parameter based on the second encoding parameter of the audio channel signal of the previous frame comprises:
    when a value of the multiplexing flag is a first value, wherein the first value indicates that the first encoding parameter reuses the second encoding parameter, obtaining the second encoding parameter as the first encoding parameter.
  13. The method according to claim 11 or 12, wherein the determining the first encoding parameter based on the second encoding parameter of the audio channel signal of the previous frame comprises:
    when the value of the multiplexing flag is a second value, wherein the second value indicates that the first encoding parameter is obtained by adjusting the second encoding parameter at a set ratio, adjusting the second encoding parameter at the set ratio to obtain the first encoding parameter.
  14. The method according to claim 13, wherein the method further comprises:
    when the value of the multiplexing flag is the second value, decoding the bitstream to obtain the set ratio.
  15. The method according to any one of claims 11 to 14, wherein the encoding parameter of the audio channel signal comprises one or more of an inter-channel pairing parameter, an inter-channel auditory spatial parameter, or an inter-channel bit allocation parameter.
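The decoder side of claims 11 to 14 mirrors the encoder decision. A sketch under assumed conventions (one flag byte followed, for the second value, by a big-endian float32 ratio; the claims specify no bitstream layout):

```python
# Hedged sketch of the decoder of claims 11-14: parse the multiplexing
# flag, then either reuse or rescale the previous frame's parameters.
# The bitstream layout is an assumption for illustration only.
import struct

FLAG_REUSE = 0
FLAG_SCALED = 1

def decode_first_params(bitstream: bytes, second_params):
    flag = bitstream[0]
    if flag == FLAG_REUSE:
        return list(second_params)                          # claim 12: reuse unchanged
    if flag == FLAG_SCALED:
        (ratio,) = struct.unpack_from(">f", bitstream, 1)   # claim 14: ratio follows the flag
        return [p * ratio for p in second_params]           # claim 13: scaled copy
    raise ValueError("set condition not met; decode parameters directly")

stream = bytes([FLAG_SCALED]) + struct.pack(">f", 0.5)
print(decode_first_params(stream, [0.8, 0.4]))  # → [0.4, 0.2]
```

The first encoding parameter is thus recovered without the encoder having transmitted it in full, which is the bit saving the scheme is after.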
  16. An audio encoding apparatus, comprising:
    a spatial encoding unit, configured to obtain an audio channel signal of a current frame, wherein the audio channel signal of the current frame is obtained by spatially mapping an original higher order ambisonics (HOA) signal through a first target virtual speaker; and
    a core encoding unit, configured to: when it is determined that the first target virtual speaker and a second target virtual speaker satisfy a set condition, determine a first encoding parameter of the audio channel signal of the current frame based on a second encoding parameter of an audio channel signal of a previous frame of the current frame, wherein the audio channel signal of the previous frame corresponds to the second target virtual speaker; encode the audio channel signal of the current frame based on the first encoding parameter; and write an encoding result of the audio channel signal of the current frame into a bitstream.
  17. The apparatus according to claim 16, wherein the core encoding unit is further configured to write the first encoding parameter into the bitstream.
  18. The apparatus according to claim 16 or 17, wherein the first encoding parameter comprises one or more of an inter-channel pairing parameter, an inter-channel auditory spatial parameter, or an inter-channel bit allocation parameter.
  19. The apparatus according to any one of claims 16 to 18, wherein the set condition comprises: a first spatial position of the first target virtual speaker overlaps a second spatial position of the second target virtual speaker; and
    the core encoding unit is specifically configured to use the second encoding parameter of the audio channel signal of the previous frame as the first encoding parameter of the audio channel signal of the current frame.
  20. The apparatus according to claim 19, wherein the core encoding unit is further configured to write a multiplexing flag into the bitstream, wherein a value of the multiplexing flag is a first value, and the first value indicates that the first encoding parameter of the audio channel signal of the current frame reuses the second encoding parameter.
  21. The apparatus according to claim 19 or 20, wherein the first spatial position comprises first coordinates of the first target virtual speaker, the second spatial position comprises second coordinates of the second target virtual speaker, and the overlap between the first spatial position and the second spatial position comprises that the first coordinates are the same as the second coordinates; or
    the first spatial position comprises a first index of the first target virtual speaker, the second spatial position comprises a second index of the second target virtual speaker, and the overlap between the first spatial position and the second spatial position comprises that the first index is the same as the second index; or
    the first spatial position comprises a first HOA coefficient of the first target virtual speaker, the second spatial position comprises a second HOA coefficient of the second target virtual speaker, and the overlap between the first spatial position and the second spatial position comprises that the first HOA coefficient is the same as the second HOA coefficient.
  22. The apparatus according to any one of claims 16 to 21, wherein the first target virtual speaker comprises M virtual speakers, and the second target virtual speaker comprises N virtual speakers;
    the set condition comprises: the first spatial position of the first target virtual speaker does not overlap the second spatial position of the second target virtual speaker, and an m-th virtual speaker comprised in the first target virtual speaker is located within a set range centered on an n-th virtual speaker comprised in the second target virtual speaker, where m traverses the positive integers less than or equal to M, and n traverses the positive integers less than or equal to N; and
    the core encoding unit is specifically configured to adjust the second encoding parameter at a set ratio to obtain the first encoding parameter.
  23. The apparatus according to claim 22, wherein when the first spatial position comprises the first coordinates of the first target virtual speaker and the second spatial position comprises the second coordinates of the second target virtual speaker, whether the m-th virtual speaker is located within the set range centered on the n-th virtual speaker is determined based on a correlation between the m-th virtual speaker and the n-th virtual speaker, wherein the correlation satisfies the following condition:
    R = norm(M_H · M̄_H^T)
    where R denotes the correlation, norm() denotes a normalization operation, M_H is a matrix formed by the coordinates of the virtual speakers comprised in the first target virtual speaker of the current frame, and M̄_H^T is the transpose of a matrix formed by the coordinates of the virtual speakers comprised in the second target virtual speaker of the previous frame; and
    when the correlation is greater than a set value, the m-th virtual speaker is located within the set range centered on the n-th virtual speaker.
  24. The apparatus according to claim 22 or 23, wherein the core encoding unit is further configured to write a multiplexing flag into the bitstream, wherein a value of the multiplexing flag is a second value, and the second value indicates that the first encoding parameter of the audio channel signal of the current frame is obtained by adjusting the second encoding parameter at the set ratio.
  25. The apparatus according to any one of claims 22 to 24, wherein the core encoding unit is further configured to write the set ratio into the bitstream.
  26. An audio decoding apparatus, comprising:
    a core decoding unit, configured to: parse a multiplexing flag from a bitstream, wherein the multiplexing flag indicates that a first encoding parameter of an audio channel signal of a current frame is determined based on a second encoding parameter of an audio channel signal of a previous frame of the current frame; determine the first encoding parameter based on the second encoding parameter of the audio channel signal of the previous frame; and decode the audio channel signal of the current frame from the bitstream based on the first encoding parameter; and
    a spatial decoding unit, configured to spatially decode the audio channel signal to obtain a higher order ambisonics (HOA) signal.
  27. The apparatus according to claim 26, wherein the core decoding unit is specifically configured to: when a value of the multiplexing flag is a first value, wherein the first value indicates that the first encoding parameter reuses the second encoding parameter, obtain the second encoding parameter as the first encoding parameter.
  28. The apparatus according to claim 26 or 27, wherein the core decoding unit is specifically configured to: when the value of the multiplexing flag is a second value, wherein the second value indicates that the first encoding parameter is obtained by adjusting the second encoding parameter at a set ratio, adjust the second encoding parameter at the set ratio to obtain the first encoding parameter.
  29. The apparatus according to claim 28, wherein the core decoding unit is specifically configured to: when the value of the multiplexing flag is the second value, decode the bitstream to obtain the set ratio.
  30. The apparatus according to any one of claims 26 to 29, wherein the encoding parameter of the audio channel signal comprises one or more of an inter-channel pairing parameter, an inter-channel auditory spatial parameter, or an inter-channel bit allocation parameter.
  31. An audio encoding device, comprising a non-volatile memory and a processor coupled to each other, wherein the processor invokes program code stored in the memory to perform the method according to any one of claims 1 to 10.
  32. An audio decoding device, comprising a non-volatile memory and a processor coupled to each other, wherein the processor invokes program code stored in the memory to perform the method according to any one of claims 11 to 15.
  33. A computer storage medium, wherein the computer-readable storage medium stores program code, and the program code comprises instructions for performing the method according to any one of claims 1 to 15.
PCT/CN2022/092310 2021-05-14 2022-05-11 Audio encoding and decoding method and apparatus WO2022237851A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22806813.6A EP4318470A1 (en) 2021-05-14 2022-05-11 Audio encoding method and apparatus, and audio decoding method and apparatus
US18/504,102 US20240079016A1 (en) 2021-05-14 2023-11-07 Audio encoding method and apparatus, and audio decoding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110530309.1 2021-05-14
CN202110530309.1A CN115346537A (zh) 2021-05-14 Audio encoding and decoding method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/504,102 Continuation US20240079016A1 (en) 2021-05-14 2023-11-07 Audio encoding method and apparatus, and audio decoding method and apparatus

Publications (1)

Publication Number Publication Date
WO2022237851A1 true WO2022237851A1 (zh) 2022-11-17

Family

ID=83947091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092310 WO2022237851A1 (zh) 2021-05-14 2022-05-11 一种音频编码、解码方法及装置

Country Status (5)

Country Link
US (1) US20240079016A1 (zh)
EP (1) EP4318470A1 (zh)
CN (1) CN115346537A (zh)
TW (1) TW202248995A (zh)
WO (1) WO2022237851A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231850A (zh) * 2007-01-23 2008-07-30 华为技术有限公司 Encoding/decoding method and apparatus
CN105917408A (zh) * 2014-01-30 2016-08-31 高通股份有限公司 Indicating frame parameter reusability for coding vectors
CN108206984A (zh) * 2016-12-16 2018-06-26 南京青衿信息科技有限公司 Codec for transmitting a three-dimensional audio signal over multiple channels and encoding/decoding method thereof
CN109300480A (zh) * 2017-07-25 2019-02-01 华为技术有限公司 Encoding and decoding method and apparatus for a stereo signal
CN110556118A (zh) * 2018-05-31 2019-12-10 华为技术有限公司 Encoding method and apparatus for a stereo signal

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2830060A1 (en) * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Noise filling in multichannel audio coding
CN107731238B (zh) * 2016-08-10 2021-07-16 华为技术有限公司 Multi-channel signal encoding method and encoder
US20180124540A1 (en) * 2016-10-31 2018-05-03 Google Llc Projection-based audio coding
CN112151045B (zh) * 2019-06-29 2024-06-04 华为技术有限公司 Stereo encoding method, stereo decoding method, and apparatus


Also Published As

Publication number Publication date
TW202248995A (zh) 2022-12-16
US20240079016A1 (en) 2024-03-07
EP4318470A1 (en) 2024-02-07
CN115346537A (zh) 2022-11-15

Similar Documents

Publication Publication Date Title
WO2022110723A1 (zh) Audio encoding and decoding method and apparatus
US20240119950A1 (en) Method and apparatus for encoding three-dimensional audio signal, encoder, and system
US20240087580A1 (en) Three-dimensional audio signal coding method and apparatus, and encoder
US20230298601A1 (en) Audio encoding and decoding method and apparatus
WO2022237851A1 (zh) Audio encoding and decoding method and apparatus
WO2022156556A1 (zh) Bit allocation method and apparatus for audio objects
WO2024146408A1 (zh) Scene audio decoding method and electronic device
WO2022257824A1 (zh) Three-dimensional audio signal processing method and apparatus
WO2022262758A1 (zh) Audio rendering ***, method, and electronic device
US20240079017A1 (en) Three-dimensional audio signal coding method and apparatus, and encoder
TWI844036B (zh) Three-dimensional audio signal encoding method, apparatus, encoder, system, computer program, and computer-readable storage medium
WO2024114373A1 (zh) Scene audio encoding method and electronic device
WO2024114372A1 (zh) Scene audio decoding method and electronic device
US20240087578A1 (en) Three-dimensional audio signal coding method and apparatus, and encoder
WO2022262750A1 (zh) Audio rendering ***, method, and electronic device
JP2024517503A (ja) Three-dimensional audio signal coding method and apparatus, and encoder
EP3987824A1 (en) Audio rendering for low frequency effects

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806813

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022806813

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022806813

Country of ref document: EP

Effective date: 20231024

NENP Non-entry into the national phase

Ref country code: DE